Site Reliability Engineer with Python

Nexus Jobs Limited | London | Permanent | 30/10/2024

Job Description

Our client is looking to bring on a Site Reliability Engineer to help deploy, manage, troubleshoot, and enhance our complex, cloud-based set of internal tools and externally managed services for a variety of users across our wide-ranging organization.

You will have 7 to 10 years of hands-on experience working as a Site Reliability Engineer.

You will work closely with IT, product, and engineering to extend and maintain this set of tools and services and to help debug and resolve problems.

In addition, the ideal candidate will proactively look for system weaknesses, using monitoring and the data we aggregate through the various tools in our organization's IT & DevOps toolkit, and find ways to resolve them before they cause production issues.

Responsibilities

Keep our suite of internal apps and services up and running, and get it back up and running quickly if a failure occurs
Be the technical point person with operational responsibility for two core platforms (one mobile and one web application), i.e. engaging as appropriate on escalations from the IT support group, whether for problem solving, addressing production issues, or enhancing features, and collaborating with engineers and others as needed
Work closely with internal partners and teams as well as external vendors to ensure that we ship software that meets our code quality, security and performance requirements
Write, update, and use our documentation, including runbooks and/or playbooks
Help automate existing internal workflows or build new ones, including ongoing infrastructure needs, testing, failover mitigations, and more
Debug complex problems across our entire web and mobile application stack, advise key stakeholders on solutions, and implement those solutions where appropriate
Develop our internal CI/CD processes further to improve release cadence and developer experience
Participate in the daily/weekly software development process (standups, sprint planning, retros, issue tracking, etc.)
Actively lead post-mortem processes for critical issues, including coordinating meetings and follow-up actions

Qualifications

7+ years of experience in software engineering, software development, and/or system operations
Experience debugging complex problems and implementing timely, cost-effective solutions
Experience designing, building, and operating large-scale production systems
Deep knowledge of Python is preferred, though other languages like Java, Go, Rust, or similar will also be heavily considered
Experience using source control (Git, GitHub) and feature branching strategies
Experience with a variety of open-source databases (MySQL, Postgres, Redis, etc.)
Experience with DevOps engineering, containerization, and container orchestration, such as Docker and Kubernetes
Experience with log monitoring and observability via platforms like Sumo Logic or CloudWatch
Experience automating infrastructure, testing, and deployments using tools like CircleCI. Knowledge of configuration management tooling and infrastructure as code is preferred but not required
Experience working with AWS services; knowledge of the Azure/Google Cloud ecosystems is helpful but not required
Strong familiarity with modern web and mobile application development, including hands-on experience working with JavaScript (TypeScript preferred) and Python stacks
Cross-functional team collaboration experience, especially working with engineers and user experience/product designers, as well as external stakeholders
Strong skills in weighing and managing scope, risk, quality, and timelines
Strong focus on quality, security, performance, and end-user experience

This is an exciting position with a growing organisation based in Central London and New York.

The position can be London or New York based.

The salary for this position will be circa £80K - £100K.

Please send us your CV in Word format, along with your salary and notice period.

Immediate start
