
SVP, Team Lead, SRE Engineer (Observability Platforms)

DBS · Singapore - East
// classified as
Data Leadership (Heads of data, directors, managers.)
posted
<1d ago
location
Singapore - East
languages
java, python
tools
docker, grafana
> stack
java, python, docker, grafana
> description

As the Team Lead, Site Reliability Engineering (SRE), you will lead a team of engineers to develop, maintain, and scale enterprise Observability and Command Center platforms and services. This role requires a leader with a deep understanding of SRE principles, observability, and the application of Agentic AI, together with a hands-on approach to problem-solving. The ideal candidate will have a strong background in observability, software engineering, data engineering, analytics, and the application of AI, and a proven track record of driving reliability improvements in complex technical environments.

Key Responsibilities:

  • Leadership & Strategy:
    • Develop and execute the SRE strategy and roadmap for enhancing Observability capabilities in alignment with the bank's business goals and technological vision.
    • Lead, mentor, and manage a team of SRE engineers, fostering a culture of collaboration, innovation, and continuous improvement.
  • Reliability & Performance:
    • Design, implement, and maintain robust situational awareness, monitoring, and alerting capabilities to ensure high availability and performance of banking services.
    • Drive the adoption of best practices in system design, capacity planning, and performance optimization.
    • Identify and mitigate potential risks to system reliability, proactively addressing issues before they impact customers.
  • Engineering and Services:
    • Develop and implement enterprise observability and monitoring strategies, platforms and services for the organization.
    • Develop and implement enterprise command center platforms and services for managing incidents and situational awareness.
    • Collaborate with cross-functional teams to establish monitoring tools and metrics, ensuring alignment with business objectives and goals.
  • Automation & Tooling:
    • Champion automation efforts to streamline operational processes, reduce manual intervention, and increase system efficiency.
    • Develop and maintain tools and scripts for infrastructure management, deployment, and monitoring.
  • Collaboration & Communication:
    • Work closely with the application & infrastructure teams to ensure that reliability is built into the architecture and design of new features and services.
    • Communicate reliability goals, progress, and challenges to executive leadership and other stakeholders.
    • Promote a culture of transparency and accountability within the SRE team and across the organization.

Qualifications & Requirements:

  • Education & Experience:
    • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
    • Minimum 10 years of experience in software engineering, data engineering, infrastructure management, or a related technical field.
    • Minimum 5 years of experience in a leadership role within an SRE or DevOps team, preferably in the banking or financial services industry.
  • Technical Skills:
    • Proficiency in programming languages such as Python, Java, or similar.
    • Deep knowledge of monitoring and observability tools (Grafana, ELK stack, etc.).
    • Experience with and good knowledge of building Agentic AI applications, including prompt engineering and RAG (Retrieval-Augmented Generation).
    • Experience building web and workflow applications.
    • Strong understanding of containerization technologies (Docker, Kubernetes).
    • Experience with CI/CD pipelines.
    • Good understanding of and experience with ITIL processes and best practices.
  • Other Skills:
    • Excellent leadership, mentoring, and team-building skills.
    • Strong problem-solving and analytical abilities.
    • Effective communication and interpersonal skills, with the ability to convey complex technical concepts to non-technical stakeholders.
    • Strategic thinking and a proactive approach to identifying and addressing potential issues.

Location:

DBS Asia Hub

Job:

Technology

Schedule:

Regular

Employee Status:

Full time