Senior SRE Engineer · 15k-30k · 14 months' salary
Fujian · Bachelor's degree or above · 5-10 years · K8S · Azure
Description
The platform team is seeking an experienced Site Reliability Engineer (SRE) to support the rapid expansion of our business. You should be highly attentive to system reliability and keen to identify and resolve system risks so that our systems keep running well. In the platform team, you will provision and maintain infrastructure, propose solutions for the system, and collaborate online with colleagues from different countries.
Responsibilities:
• Participate in on-call duty to respond to, investigate, and resolve system incidents, or handle support tickets from application teams.
• Watch alerts in the monitoring system, provide timely feedback, and resolve the underlying problems.
• Design, implement, and govern infrastructure to achieve high availability & scalability.
• Research and evaluate technical initiatives, delivering complete plans that cover documentation, provisioning, testing, and monitoring.
• Build a service quality system and lead the team in quantifying its indicators.
Required Skills and Qualifications:
• Good spoken and written English, strong learning ability, and solid hands-on skills.
• Proficiency with Azure (Azure resources, network models, and best practices).
• More than 2 years of experience managing AKS/Kubernetes.
• Familiar with Infrastructure as Code, Terraform preferred.
• Familiar with CI/CD automation.
• Familiar with observability technologies such as Prometheus and Grafana.
• Familiar with several of the following middleware: Kafka, MySQL, MongoDB, Elasticsearch, and Redis.
Nice to Have:
• CKA or CKAD certification is a plus.
• Certifications related to cloud native or operations and maintenance are a plus.
• Familiar with Java or Go.