Senior Site Reliability Engineer

Government Recruitment Service | London | 05/02/2025

If you would like to find out more about the role, the Site Reliability Engineering team and what it’s like to work at DBT, we are holding a Hiring Manager Q&A session for this role where you can virtually 'meet the team' on Monday 17th February at 12:30pm.

Please click here to book your spot.

About us

The Department for Business and Trade (DBT) has a clear mission - to grow the economy. Our role is to help businesses invest, grow and export to create jobs and opportunities right across the country. We do this in three ways.

Firstly, we help to build a strong, competitive business environment, where consumers are protected and companies rewarded for treating their employees properly.

Secondly, we open international markets and ensure resilient supply chains. This can be through Free Trade Agreements, trade facilitation and multilateral agreements.

Finally, we work in partnership with businesses every day, providing advice, finance and deal-making support to those looking to start up, invest, export and grow.

The Digital, Data and Technology (DDaT) directorate develops and operates tools and services to support us in this mission.

About the role

We are on a mission to build a new cutting-edge developer platform in AWS and migrate existing DBT services from GOV.UK PaaS in the process.

Can we rely on you to make us more reliable? We need Site Reliability Engineers (SREs) to make sure our internet services work as users expect.

Main responsibilities

As a Senior Site Reliability Engineer you will work to give development teams the tools for their job, including application performance monitoring, exception, log and metrics aggregation, dashboards, and declarative CI/CD (continuous integration/continuous delivery) pipelines.

You’ll champion service-level indicators, service-level objectives and error budgets with product teams, and negotiate them with those teams. You’ll help build and scale our global product platform and participate in an on-call rota, for which you will receive an additional allowance.

Specific projects the team are working on include rolling out an observability tool to enhance system monitoring and incident response, and streamlining deployment processes to reduce downtime and speed up feature delivery.

You will be using:

Amazon Web Services
Azure
AWS CodePipeline and AWS CodeBuild
Terraform & AWS Copilot (CloudFormation)
Docker, Elastic Container Service (ECS) and Elastic Container Registry (ECR)
Elasticsearch/OpenSearch
Python and Django framework
PostgreSQL as a service (Amazon RDS)
Sentry
Redis/ElastiCache
