Site Reliability Engineering (SRE) Practitioner℠
Course Description
DURATION - 24 Hours
Introduces a range of practices for advancing service reliability engineering through a mixture
of automation, organizational ways of working and business alignment. Tailored for those
focused on large-scale service scalability and reliability.
OVERVIEW
The SRE (Site Reliability Engineering) Practitioner course introduces ways to economically and
reliably scale services in an organization. It explores strategies to improve agility,
cross-functional collaboration and transparency into service health, with the aim of building
resiliency by design, automation and closed-loop remediation.
The course aims to equip participants with the practices, methods, and tools to engage people
across the organization involved in reliability through the use of real-life scenarios and case
stories. Upon completion of the course, participants will have tangible takeaways to apply
when back in the office, such as implementing SRE models that fit their organizational context,
building advanced observability in distributed systems, building resiliency by design and
responding to incidents effectively using SRE practices.
The course was developed by drawing on key SRE sources, engaging with thought leaders in the
SRE space and working with organizations embracing SRE to extract real-life best practices. It
is designed to teach the key principles and practices necessary for starting SRE adoption.
This course positions learners to successfully complete the SRE Practitioner certification exam.
COURSE OBJECTIVES
At the end of the course, the following learning objectives are expected to be achieved:
1. A practical view of how to successfully implement a flourishing SRE culture in your
organization.
2. The underlying principles of SRE, an understanding of what it is not in terms of
anti-patterns, and how to recognize and avoid those anti-patterns.
3. The organizational impact of introducing SRE.
4. Mastering SLIs and SLOs in a distributed ecosystem, and extending the use of error
budgets beyond their usual role to innovate and avoid risks.
5. Building security and resilience by design in a distributed, zero-trust environment.
6. How to implement full-stack observability and distributed tracing, and bring about an
observability-driven development culture.
©DevOps Institute SREP v1.0 Course Description July 2021