> job detail
👑Data Leadership
SVP, Team Lead, SRE Engineer (Observability Platforms)
dbs · Singapore - East
// classified as
Data Leadership (Heads of data, directors, managers.)
posted
<1d ago
location
Singapore - East
languages
java, python
tools
docker, grafana
> stack
java, python, docker, grafana
> description
As the Team Lead, Site Reliability Engineering (SRE), you will lead a group of technical engineering staff to develop, maintain, and scale enterprise Observability and Command Center platforms and services. This role requires a leader with a deep understanding of SRE principles, Observability, and the application of Agentic AI, combined with a hands-on approach to problem-solving. The ideal candidate will have a strong background in observability, software engineering, data engineering, analytics, and the application of AI, plus a proven track record of driving reliability improvements in complex technical environments.
Key Responsibilities:
- Leadership & Strategy:
- Develop and execute the SRE strategy and roadmap for enhancing Observability capabilities in alignment with the bank's business goals and technological vision.
- Lead, mentor, and manage a team of SRE engineers, fostering a culture of collaboration, innovation, and continuous improvement.
- Reliability & Performance:
- Design, implement, and maintain robust situational awareness, monitoring, and alerting capabilities to ensure high availability and performance of banking services.
- Drive the adoption of best practices in system design, capacity planning, and performance optimization.
- Identify and mitigate potential risks to system reliability, proactively addressing issues before they impact customers.
- Engineering and Services:
- Develop and implement enterprise observability and monitoring strategies, platforms and services for the organization.
- Develop and implement enterprise command center platforms and services for managing incidents and situational awareness.
- Collaborate with cross-functional teams to establish monitoring tools and metrics, ensuring alignment with business objectives and goals.
- Automation & Tooling:
- Champion automation efforts to streamline operational processes, reduce manual intervention, and increase system efficiency.
- Develop and maintain tools and scripts for infrastructure management, deployment, and monitoring.
- Collaboration & Communication:
- Work closely with the application & infrastructure teams to ensure that reliability is built into the architecture and design of new features and services.
- Communicate reliability goals, progress, and challenges to executive leadership and other stakeholders.
- Promote a culture of transparency and accountability within the SRE team and across the organization.
Qualifications & Requirements:
- Education & Experience:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Minimum 10 years of experience in software engineering, data engineering, infrastructure management, or a related technical field.
- Minimum 5 years of experience in a leadership role within an SRE or DevOps team, preferably in the banking or financial services industry.
- Technical Skills:
- Proficiency in programming languages such as Python, Java, or similar.
- Deep knowledge of monitoring and observability tools (Grafana, ELK stack, etc.).
- Experience with and good knowledge of building Agentic AI applications, including prompt engineering and RAG (Retrieval-Augmented Generation).
- Experience building web and workflow applications.
- Strong understanding of containerization technologies (Docker, Kubernetes).
- Experience with CI/CD pipelines.
- Good understanding of and experience with ITIL processes and best practices.
- Other Skills:
- Excellent leadership, mentoring, and team-building skills.
- Strong problem-solving and analytical abilities.
- Effective communication and interpersonal skills, with the ability to convey complex technical concepts to non-technical stakeholders.
- Strategic thinking and a proactive approach to identifying and addressing potential issues.
Location: DBS Asia Hub
Job: Technology
Schedule: Regular
Employee Status: Full time