> job detail
Senior Site Reliability Engineer (SRE) – AWS
Sails Software Inc · Andhra Pradesh, Visakhapatnam, India
// classified as
Other (Adjacent or hard to classify.)
posted
1d ago
languages
bash, python
tools
aws, docker, kubernetes
> stack
bash, python, aws, docker, kubernetes, s3, terraform
> description
SRE – AWS
7+ years of experience
Vizag only, onsite

We are looking for a highly experienced Senior SRE with strong AWS expertise to help design, operate, and scale the infrastructure powering our product platforms. This is a mission-critical role in a fast-moving product development environment, where system reliability, automation, and performance are core business drivers.

Key Responsibilities

Reliability & Operations
Own the reliability, availability, and performance of large-scale production systems.
Establish SLOs, SLAs, and error budgets for mission-critical services.
Lead incident response, root cause analysis, and continuous improvement initiatives.
Design fault-tolerant architectures and disaster recovery strategies.

Cloud & Infrastructure Engineering
Architect, deploy, and manage AWS infrastructure using infrastructure as code (Terraform / CloudFormation).
Optimize cloud costs while maintaining performance and reliability.
Implement multi-region, highly available architectures.
Manage container platforms (Docker, Kubernetes, EKS).

Automation & DevOps
Build automation pipelines for infrastructure provisioning, deployment, and scaling.
Improve CI/CD pipelines and release engineering processes.
Develop tools and scripts to reduce operational toil.

Observability & Performance
Implement comprehensive monitoring, logging, and alerting systems.
Drive performance tuning and capacity planning.
Lead chaos engineering and resilience testing practices.

Leadership & Mentorship
Mentor SREs and DevOps engineers.
Partner with Engineering and Product teams to embed reliability into product design.

Required Skills & Experience
7+ years in Site Reliability Engineering, DevOps, or infrastructure roles.
Deep hands-on experience with AWS services (EC2, EKS, RDS, S3, Lambda, VPC, IAM, etc.).
Expertise in infrastructure as code: Terraform, CloudFormation.
Strong experience with Linux systems, networking, and distributed systems.
Experience with Kubernetes, container orchestration, and microservices environments.
Strong scripting skills (Python, Bash, Go).
Knowledge of security best practices and compliance requirements.

Soft Skills
Strong problem-solving and decision-making ability under pressure.
Excellent communication and stakeholder collaboration.
High ownership and accountability mindset.
Ability to thrive in an aggressively paced product development culture.

Education
Bachelor’s degree in Computer Science, Engineering, or a related field (preferred).