SRE: Site Reliability Engineering
Aspects of Software Engineering
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to IT infrastructure and operations. Site reliability engineering has been described as a specific implementation of DevOps. The main objectives are to create highly reliable and scalable software systems. The field of site reliability engineering originated at Google with Ben Treynor Sloss, who founded a site reliability team after joining the company in 2003.
SRE Lifecycle
Automation: “SRE is what you get when you treat operations as if it’s a software problem.” This means applying software engineering practices such as automating repeatable tasks, automated testing, and keeping changes under source control to operations challenges.
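As a minimal sketch of turning a repeatable manual task into reviewable, testable code, the Python function below deletes `.log` files older than a given age. The task, the function name `clean_old_logs`, and its parameters are hypothetical illustrations, not part of any particular SRE toolkit.

```python
import time
from pathlib import Path


def clean_old_logs(log_dir: str, max_age_days: float, dry_run: bool = False) -> list[str]:
    """Delete *.log files older than max_age_days and return the paths affected.

    Hypothetical toil-automation task: replaces a manual "log in and delete
    old logs" runbook step with code that can be versioned, reviewed, and tested.
    """
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # Only regular files whose last modification is older than the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(str(path))
    return removed
```

Running it with `dry_run=True` first and checking the returned list is one way to apply the same caution a human operator would use before deleting files.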
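SLIs, SLOs and Error Budgets: Service level indicators (SLIs) are quantitative measures of service behavior, typically provided through your APM platform. Traditionally, they cover latency, defined as response time (including queue/wait time) in milliseconds, and availability, commonly defined as the proportion of requests served successfully. Service level objectives (SLOs) set a target value for an SLI over a time window, and they become the common language for cross-functional teams to set guardrails and incentives that drive high levels of service reliability. An error budget is the amount of unreliability your SLO allows: it equals 100% minus the SLO target, so a 99.9% availability SLO leaves a budget of 0.1% failed requests. The part of that budget not yet consumed is your remaining wiggle room.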
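The arithmetic behind an availability SLI and its error budget can be sketched in a few lines of Python. This assumes a request-based SLI (successful requests divided by total requests over one window); the names `availability_report` and `SloReport` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured availability for the window, 0..1
    slo: float               # target availability, 0..1
    budget_total: float      # failed requests the SLO allows in the window
    budget_remaining: float  # allowed failures not yet consumed (negative = overspent)

    @property
    def budget_remaining_pct(self) -> float:
        # Share of the error budget still available, as a percentage.
        return 100.0 * self.budget_remaining / self.budget_total if self.budget_total else 0.0


def availability_report(good: int, total: int, slo: float) -> SloReport:
    """Compute a request-based availability SLI and its error budget for one window."""
    if total <= 0:
        raise ValueError("total must be positive")
    sli = good / total
    budget_total = (1.0 - slo) * total  # error budget = (1 - SLO) x requests
    bad = total - good
    return SloReport(sli, slo, budget_total, budget_total - bad)
```

For example, with a 99.9% SLO and 1,000,000 requests the budget is about 1,000 failed requests; if 400 requests failed, `availability_report(999_600, 1_000_000, 0.999)` reports roughly 600 failures, or about 60% of the budget, still available.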
Performance Engineering: Performance engineering involves checking the speed, reliability, scalability, stability, response time, and resource usage of an application under the anticipated workload. It delivers end-to-end system optimization through a continuous testing and monitoring process, which shifts performance and load testing left, into the development process.
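Load-test results are usually summarized as latency percentiles rather than averages, because a small number of slow requests can hide behind a good mean. The Python sketch below assumes latencies are collected in milliseconds and uses the nearest-rank percentile definition, which is one of several in common use; `percentile` and `latency_summary` are hypothetical names.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize load-test latencies as p50/p95/p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Other tools may interpolate between samples, so their p95 or p99 can differ slightly from the nearest-rank value for small sample sizes.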
Change Management: Change management is a systematic approach to dealing with the transition or transformation of an organization’s goals, processes, or technologies. Its purpose is to implement strategies for effecting and controlling change.
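Incident Management: Incident management is the process of responding to unplanned service interruptions and restoring normal service as quickly as possible. It sits within ITSM (IT service management), which defines how teams design, create, and deliver their services. ITSM is much more than IT support: it covers the policies, processes, and structure behind the whole lifecycle of IT services. ITIL (the Information Technology Infrastructure Library) is a widely used ITSM framework, and incident management is one of its practices.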
Monitoring and Observability: Observability is the practice of instrumenting and monitoring your system so that you can detect and diagnose issues as they happen. Its goal is to provide visibility into all aspects of the system so that issues can be identified and fixed before they cause customer-facing problems.
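Detecting issues as they happen usually comes down to evaluating alert conditions over recent telemetry. The Python sketch below keeps a sliding window of the last N request outcomes and flags when the error rate exceeds a threshold. It is a simplified, count-based illustration; the class name `ErrorRateMonitor` and its parameters are hypothetical, and production systems typically evaluate such rules over time windows in a metrics backend instead.

```python
from collections import deque


class ErrorRateMonitor:
    """Flag when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int, threshold: float, min_requests: int = 1):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.threshold = threshold
        self.min_requests = min_requests
        self._outcomes = deque(maxlen=window)  # True means the request failed
        self._errors = 0  # running count of failures currently in the window

    def record(self, failed: bool) -> bool:
        """Record one request outcome and return whether the alert condition holds."""
        if len(self._outcomes) == self.window and self._outcomes[0]:
            self._errors -= 1  # the oldest outcome (a failure) is about to be evicted
        self._outcomes.append(failed)
        if failed:
            self._errors += 1
        return self.should_alert()

    def error_rate(self) -> float:
        return self._errors / len(self._outcomes) if self._outcomes else 0.0

    def should_alert(self) -> bool:
        # Require enough data before alerting, then compare against the threshold.
        return len(self._outcomes) >= self.min_requests and self.error_rate() > self.threshold
```

Setting `min_requests` prevents an alert from firing on a handful of early failures before the window holds enough data.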
DevOps Focus: DevOps relies on continuous integration and continuous delivery (CI/CD) pipelines and automation tools to update applications and maintain consistency across different software versions and deployment environments.
DevOps vs SRE
- Goal: Both SRE and DevOps aim to bridge the gap between development and operations, but SRE prescribes specific ways of achieving reliability, while DevOps works as a template that guides collaboration.
- Approach: DevOps is a cross-functional approach to collaboration between development and operations, while SRE is an approach to IT operations that treats running the production environment as a software problem.
- Automation: SRE teams aim to automate operational tasks and improve the overall functioning of the organization. DevOps teams automate as well, for example by creating buildpacks, monitoring application health, and provisioning and deploying software according to standards and best practices.
- Use Cases: DevOps is often applied in agile software development projects, while SRE is applied to infrastructure and operations. SREs use its principles to design, build, run, monitor, and improve their systems.