Senior Service Observability Engineer at Safaricom
Job role insights
Date posted: September 18, 2025
Closing date: September 25, 2025
Hiring location: Westlands
Qualification: Bachelor Degree
Description
Reporting to the Engineering Lead – Service Availability, the position holder will be responsible for monitoring and observability and for improving the operational aspects of all in-scope systems within Digital IT. The role will also drive automation and DevOps across the different domains and foster service monitoring through proactive initiatives such as AIOps and machine learning, among other available channels.
The role is a fixed-term contract (1 year).
Key Responsibilities:
- Proactively build and implement monitoring services, including end-to-end monitoring, scripting and automation, modern tooling and maintenance software.
- Use AI and machine learning to perform log analysis and create predictive models that help identify potential failures.
- Design, develop and support the in-house observability platform.
- Design and maintain scalable, high-availability observability pipelines and dashboards for microservices and cloud infrastructure.
- Define and enforce SLO, SLI, SLA and error budget standards, set actionable alerts, and drive continuous reliability improvements.
- Partner with SRE, DevOps, development squads and security teams to instrument services using OpenTelemetry and related tooling.
- Build custom agents, exporters, collectors or integrations where off-the-shelf solutions fall short.
Job Requirements:
- Bachelor’s degree in Computer Science, Software Engineering, Business Information Technology, or any other relevant field.
- Domain knowledge of system administration, especially Linux and the Linux kernel.
- Strong skills in Go, Rust and a scripting language like Python or Bash for building custom exporters, scripts and integrations.
- Technical understanding of SRE practices for providing stable services to customers, adhering to availability KPIs, Service Level Objectives and Service Level Indicators, and conforming to the target monthly error budget.
- Proven experience with multiple observability platforms (Prometheus/Grafana, ELK/Elastic, Dynatrace, etc.).
- Deep knowledge of manual and auto-instrumentation using the OpenTelemetry SDK and Collector.
- Hands-on experience with Kubernetes, especially the OpenShift distribution.
- Proficiency with Ansible, Rundeck and Helm, and with integrating observability into build and deployment pipelines.
- Conversant with both ITIL and Agile ways of working.
How to Apply
https://egjd.fa.us6.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX/jobs/preview/915