Senior Site Reliability Engineer (hybrid)

You will play a key role in ensuring the reliability, stability, scalability and security of our Logging Monitoring cloud systems and infrastructure. You will design, implement, and test highly automated solutions to shape the technology platform that fulfills our business and product vision, ultimately bringing value to our customers through positive user experiences.

Key Responsibilities:
- Take end-to-end responsibility, from development to production, for designing, deploying, operating, and continuously improving the performance and fault tolerance of large-scale multi-cloud solutions.
- Ensure system security, data integrity, and high availability of the platform.
- Establish and improve monitoring, logging, and alerting frameworks to detect and resolve issues promptly.
- Keep up with technology trends and identify promising new solutions that meet our requirements.
- Create technical support documentation and provide hands-on troubleshooting and consulting to our customers.

About the team

Our Logging Monitoring squad develops and operates state-of-the-art logging, monitoring and event management platforms to collect application behaviour information, detect and limit service disruptions, and provide the associated reporting capabilities. Our ambition is to empower developers and application and platform owners to identify growing risks, have a clear understanding of their SLAs, reduce the mean time to resolution, and stay ahead of the curve with regard to long-term trends.

About you

We are happy to meet you if you possess:
- Experience in software development, continuous integration/deployment, and system engineering for large-scale, distributed cloud solutions.
- Hands-on expertise in open-source application and infrastructure monitoring tools, e.g., ELK and/or TICK stack, Prometheus and Grafana.
- Experience in distributed tracing with OpenTelemetry and observability platforms.
- Hands-on expertise in container orchestration systems such as Kubernetes running in a hybrid cloud environment such as Azure and VMware.
- Experience programming in one or more of the following languages: Go, Java, Python, and in scripting languages (Shell or PowerShell).
- Passion for sharing knowledge, through interactive sessions as well as documentation.
- Strong analytical and problem-solving skills, as well as the ability to focus on details without losing track of the bigger picture.
- Excellent oral and written English skills; additional language skills are a plus.

Nobody is perfect, and nobody meets 100% of our requirements. If, however, you meet some of the criteria above and are curious about the world of observability, we'll be more than happy to meet you!