Principal Site Reliability Engineer

Department: DevOps
Employment Type: Full time
Location: North East England
Amplience is looking to grow and expand its business to become one of the best headless content & media delivery platforms in the market. Helping to achieve this will be a core part of the Principal Site Reliability Engineer's role: ensuring that Amplience has a platform that is reliable and can scale to meet future demands in a rapidly expanding business.

The SRE will be responsible for implementing SLOs and SLIs to meet current and future business SLAs, and for defining a monitoring and capacity planning strategy that allows Amplience to plan 6 to 12 months ahead. This includes ensuring that agreed uptimes are met, and identifying current and future risk areas and raising them with stakeholders.

The SRE will work with stakeholders to identify which services would benefit from having SLOs in place, then document and share the agreed SLOs, error budgets and error budget policies. The SRE will also define and agree the supporting SLIs used to measure and report against those objectives. This will be a constantly evolving process as new services are introduced and new challenges arise, and the SRE will actively look for potential problem areas.

The SRE will be required to work closely with all areas of the business, including Operations, Engineering and Customer Success, to ensure a holistic view of expectations and performance.

Key Responsibilities

  • Taking ownership of defining, implementing and maintaining the SLIs and SLOs.
  • Managing a team of SREs in order to deliver the SLIs and SLOs.
  • Ensuring releases are scoped within the SLIs/SLOs and managing the change process where updates are necessary.
  • Working with Product, Engineering and other stakeholders to ensure the Objectives are aligned and correct.
  • Driving improvement processes across the teams.
  • Focusing on reducing MTTR and maintaining the error budget.
  • Maintaining and improving our observability tools.
  • Taking ownership of the Services.
  • Acting as gatekeeper of the production environments, ensuring changes are sound and don't impact reliability, scalability, performance, etc.

Skills Knowledge and Expertise

  • Experience working with infrastructure and application monitoring tools: CloudWatch, Prometheus, Grafana, Kibana, Datadog, etc.
  • Experience with monitoring, instrumentation and metrics that clearly describe service behaviours.
  • Experience defining and implementing incident response management processes.
  • Thorough understanding of automation and orchestration principles.
  • Experience using profilers, APM and tracing.
  • Expert knowledge of AWS; ideally AWS Professional-level certified, or at least able to demonstrate professional-level experience.
  • SRE/DevOps experience and comfortable operating software in a Linux-based environment.

Benefits

  • Competitive salary
  • Flexible working arrangements
  • Discretionary bonus scheme
  • Company pension scheme
  • Employee share options so that everyone can benefit from our success
  • Enhanced maternity & paternity policies
  • Extra holidays once you've been with us for a while
  • The option to purchase additional holidays
  • Charity / volunteer days
  • Life assurance policy
  • Ride to work scheme
  • Season ticket advance loans

About Amplience

Amplience is an API-first, headless CMS and DAM in one: a unified platform for commerce content that does everything you need it to. Organize, find and enrich all your assets from a central library. Optimize and automate your product media, images and videos.

Plan, schedule, produce and deliver customer experiences. Do it all from the same platform.

And do more of it. Better, and faster, than ever.

Our Hiring Process

  • Applied
  • Review Application
  • Recruitment Interview
  • Technical Interview
  • 2nd Technical Interview
  • 3rd Interview
  • Hired

Not quite right?

Register your interest to be notified of any roles that come along that meet your criteria.
