> job detail

Senior Staff Cloud Backend Engineer

Coupang Internal · Bengaluru
// classified as
Other (Adjacent or hard to classify.)
posted
14d ago
location
Bengaluru
languages
tools
aws, azure, docker
> stack
aws, azure, docker, grafana, kubernetes
> description
<p><strong><span data-contrast="auto"> </span></strong><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><strong><span data-contrast="auto">Please complete the attached<span class="Apple-converted-space">&nbsp;</span></span></strong><a href="https://coupang.service-now.com/sp?id=kb_article&amp;sysparm_article=KB0010204"><strong><span data-contrast="none"><span data-ccp-charstyle="Hyperlink">Internal Transfer Request Form</span></span></strong></a><strong><span data-contrast="auto"> and submit.  </span></strong><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><strong><span data-contrast="auto">Please make sure to<span class="Apple-converted-space">&nbsp;</span></span></strong><span style="text-decoration: underline;"><strong><span data-contrast="auto">apply with your Coupang e-mail address</span></strong></span><strong><span data-contrast="auto">.  </span></strong><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><strong><span data-contrast="auto">  </span></strong><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <hr> <p><strong><span data-contrast="auto"> </span></strong><strong><span data-contrast="auto">Company Introduction</span></strong><span data-contrast="auto"> </span><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><span data-contrast="auto">We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” Born out of an obsession to make shopping, eating, and living easier than ever, we’re collectively disrupting the multi-billion-dollar e-commerce industry from the ground up. 
We are one of the fastest-growing e-commerce companies and have established an unparalleled reputation as a dominant and reliable force in South Korean commerce. </span><span class="Apple-converted-space">&nbsp;</span><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><span data-contrast="none">We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the pace we have maintained since our inception. We are all entrepreneurs surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.</span><span class="Apple-converted-space">&nbsp;</span><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><span data-contrast="auto">Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world. </span><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}"> <br></span><span data-contrast="auto"> </span><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false}">&nbsp;</span></p> <p><strong><span data-contrast="auto">Role Overview&nbsp;</span></strong></p> <p><span data-contrast="auto">As a Senior Staff Data Centre Observability and Site Reliability Engineer, you will</span>&nbsp;design, build, and operate scalable observability and reliability solutions for large-scale datacenter infrastructure. 
This role focuses on developing high-performance monitoring and telemetry platforms, ensuring system reliability, and driving operational excellence through automation, performance optimization, and SRE best practices. The ideal candidate will work across the full service lifecycle—design, deployment, and continuous improvement—while collaborating with cross-functional teams to enhance visibility, resilience, and efficiency of critical systems.</p> <p><strong><span data-contrast="auto">What You Will Do</span></strong></p> <div> <h4>Observability and Monitoring</h4> <ul> <li>Design, implement, and maintain observability solutions for datacenter infrastructure, including monitoring, logging, alerting, and telemetry systems.</li> <li>Develop, deploy, and operate large-scale observability and telemetry platforms with a focus on real-time monitoring, high performance, and scalability.</li> <li>Own and contribute to the full lifecycle of observability services—from design and development to deployment and ongoing optimization.</li> <li>Build and enhance monitoring systems to ensure high availability, reliability, and performance of infrastructure.</li> <li>Create and manage dashboards, alerts, and reports to provide clear visibility into system health, performance, and capacity trends.</li> </ul> <h4><strong>Site Reliability Engineering (SRE)</strong></h4> <ul> <li>Apply SRE principles and best practices to improve reliability, scalability, and operational efficiency of datacenter services.</li> <li>Develop and maintain automation for infrastructure provisioning, monitoring, and system management.</li> <li>Lead root cause analysis (RCA) and post-incident reviews, driving corrective actions to prevent recurrence and improve system resilience.</li> </ul> <h4><strong>Performance Optimization</strong></h4> <ul> <li>Analyze system and application performance across the datacenter infrastructure to identify bottlenecks and improvement areas.</li> <li>Implement optimization 
strategies to enhance performance, efficiency, and resource utilization.</li> </ul> <h4><strong>Collaboration</strong></h4> <ul> <li>Partner with cross-functional engineering teams to understand observability and reliability requirements and deliver effective solutions.</li> <li>Collaborate with hardware and software vendors to evaluate, integrate, and optimize new technologies within the ecosystem.</li> </ul> <h4><strong>Security and Compliance</strong></h4> <ul> <li>Ensure observability and reliability solutions adhere to organizational security policies and industry standards.</li> <li>Implement and maintain appropriate security controls to safeguard infrastructure, systems, and data.</li> </ul> <h4><strong>Troubleshooting and Support</strong></h4> <ul> <li>Provide hands-on support for observability and reliability issues, including debugging complex hardware and software problems.</li> <li>Develop and maintain documentation, including troubleshooting guides and operational best practices, to support efficient issue resolution.</li> </ul> <h4><strong>Continuous Improvement</strong></h4> <ul> <li>Stay current with emerging trends, tools, and technologies in observability and SRE, and incorporate them into the platform.</li> <li>Continuously enhance the scalability, reliability, and operational efficiency of datacenter services through proactive improvements.</li> </ul> </div> <p>&nbsp;</p> <p><strong><span data-contrast="auto">Basic </span></strong><strong><span data-contrast="auto">Qualifications</span></strong></p> <ul> <li>Bachelor’s or&nbsp;Master’s degree in Computer Science, Engineering, or&nbsp;a related&nbsp;technical field.&nbsp;</li> </ul> <ul> <li>12+ years of progressive software engineering experience, with a heavy emphasis on distributed systems, cloud-native architectures, or platform operations.&nbsp;</li> <li> <p data-pm-slice="1 1 []">Proven experience in managing and optimizing large-scale datacenter environments</p> </li> </ul> <ul> 
<li>Strong&nbsp;proficiency&nbsp;in&nbsp;<strong>Go</strong>&nbsp;or&nbsp;<strong>Python</strong>, with a deep understanding of networked systems and performance optimization.&nbsp;</li> </ul> <ul> <li>Expert-level knowledge of&nbsp;<strong>Kubernetes</strong>&nbsp;internals (scheduling, controllers) and containerization ecosystems.&nbsp;</li> </ul> <ul> <li>Proven experience with load balancing, service mesh, and request routing at scale.&nbsp;</li> <li> <p data-pm-slice="1 1 []">Proficiency in observability tools and technologies (e.g., Prometheus, Grafana, ELK Stack).</p> </li> <li> <p data-pm-slice="1 1 []">Experience with SRE practices and tools (e.g., Kubernetes, Docker, Terraform).</p> </li> <li> <p data-pm-slice="1 1 []">Familiarity with cloud platforms (AWS, Azure, GCP) and their observability and reliability services</p> </li> </ul> <p>&nbsp;</p> <p><strong><span data-contrast="auto">Preferred Qualifications&nbsp;</span></strong></p> <ul> <li>Prior experience building infrastructure specifically for LLM inference or large-scale training clusters.&nbsp;</li> </ul> <ul> <li>Familiarity&nbsp;with&nbsp;inference, including mixed precision,&nbsp;kernel tuning, or custom hardware accelerators.&nbsp;</li> </ul> <ul> <li>Experience managing hybrid-cloud or multi-AZ&nbsp;deployments across AWS, Azure, or GCP.&nbsp;</li> </ul> <ul> <li>Experience&nbsp;operating&nbsp;in regulated environments with strict security and compliance requirements</li> </ul> <p>&nbsp;</p> <p><strong><span data-contrast="auto">Type of work model</span></strong>&nbsp;</p> <ul> <li data-leveltext="" data-font="Symbol" data-listid="38" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:880,&quot;335559991&quot;:440,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span 
data-contrast="auto">Hybrid</span><span data-ccp-props="{&quot;134245417&quot;:false}"><br></span><span data-contrast="auto">Our hybrid work model: Coupang's hybrid work model is designed to enable a culture of collaboration that acts as a catalyst to enrich the employee experience. Employees are required to work in the office at least 3 days per week, with the flexibility to work from home 2 days a week, depending on role requirements. Some businesses may require more time in the office due to the nature of the work.&nbsp;</span><span data-ccp-props="{&quot;134245417&quot;:false}">&nbsp;<br><br></span></li> </ul> <p><strong><span data-contrast="auto">Details to consider</span></strong><span data-ccp-props="{&quot;134245417&quot;:false}">&nbsp;</span></p> <ul> <li data-leveltext="" data-font="Symbol" data-listid="38" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:880,&quot;335559991&quot;:440,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Those eligible for employment protection (recipients of veteran’s benefits, the disabled, etc.) 
may receive preferential treatment for employment in accordance with applicable laws.&nbsp;<br></span><span data-ccp-props="{&quot;134245417&quot;:false}">&nbsp;</span></li> </ul> <p><strong><span data-contrast="none">Privacy Notice</span></strong><strong><span data-contrast="none">&nbsp;</span></strong><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}">&nbsp;</span></p> <ul> <li data-leveltext="" data-font="Symbol" data-listid="35" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="none">Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below. </span><a href="https://privacy.coupang.com/en/land/jobs/"><span data-contrast="none"><span data-ccp-charstyle="Hyperlink">https://privacy.coupang.com/en/land/jobs/</span></span></a><span data-ccp-props="{&quot;134233279&quot;:true,&quot;134245417&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}">&nbsp;</span></li> </ul>