- Manage and maintain cloud-based infrastructure primarily on Microsoft Azure.
- Operate and scale Kubernetes (AKS) clusters including monitoring, upgrades, and performance tuning.
- Ensure secure deployment and operations of cloud services in compliance with best practices and regulatory standards.
- Implement infrastructure as code (IaC) using tools such as Terraform, ARM templates, or Bicep.
- Work closely with development and security teams to build secure CI/CD pipelines and support DevSecOps practices.
- Troubleshoot and resolve system and network issues across cloud environments.
- Manage identity and access controls (e.g., Azure AD, RBAC) and enforce security policies.
- Maintain observability using tools like Prometheus, Grafana, Azure Monitor, and Log Analytics.
- Participate in on-call rotation and incident response as part of the Site Reliability Engineering (SRE) duties.
- Continuously evaluate new technologies and cloud services to improve scalability, reliability, and security.