Slogans in SRE
Many chapters in the SRE book, Site Reliability Engineering, open with a thought-provoking slogan.
Ch1 Introduction
Hope is not a strategy.
Ch5 Eliminating Toil
If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow.
Ch7 The Evolution of Automation at Google
Besides black art, there is only automation and mechanization.
– Federico García Lorca (1898–1936), Spanish poet and playwright
Ch9 Simplicity
The price of reliability is the pursuit of the utmost simplicity.
– C.A.R. Hoare, Turing Award lecture
Ch10 Practical Alerting from Time-Series Data
May the queries flow, and the pager stay silent.
– Traditional SRE blessing
Ch12 Effective Troubleshooting
Be warned that being an expert is more than understanding how a system is supposed to work. Expertise is gained by investigating why a system doesn’t work.
– Brian Redman
Ways in which things go right are special cases of the ways in which things go wrong.
– John Allspaw
Ch13 Emergency Response
Things break; that’s life.
Ch14 Managing Incidents
Effective incident management is key to limiting the disruption caused by an incident and restoring normal business operations as quickly as possible.
Ch15 Postmortem Culture: Learning from Failure
The cost of failure is education.
– Devin Carraway
Ch17 Testing for Reliability
If you haven’t tried it, assume it’s broken.
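A similar idea appears in 輕鬆聊系統測試 (SVT) 的三兩事 (a casual chat about system verification testing):
- FVT (Functional Verification Test): assume the features are not ready yet
- SVT (System Verification Test): assume the features all work, and test the non-functional aspects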
Ch22 Addressing Cascading Failures
If at first you don’t succeed, back off exponentially.
– Dan Sandler, Google Software Engineer
Why do people always forget that you need to add a little jitter?
– Ade Oshineye, Google Developer Advocate
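The two Ch22 slogans together describe a concrete retry policy: wait exponentially longer after each failure, and randomize each wait so that many clients do not retry in lockstep. Below is a minimal Python sketch of that idea, using capped exponential backoff with "full jitter" (a uniform random delay between zero and the current ceiling). The names `backoff_delay` and `call_with_retries` and the parameters `base`, `cap`, and `attempts` are hypothetical, chosen only for illustration; they are not taken from the SRE book.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Return the sleep time (seconds) before retry number `attempt` (0-based)."""
    # Exponential growth: base, 2*base, 4*base, ... limited to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: scale the ceiling by a random factor in [0, 1).
    return rng() * ceiling


def call_with_retries(operation, attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep, rng=random.random):
    """Call `operation` up to `attempts` times, backing off between failures."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            sleep(backoff_delay(attempt, base, cap, rng))
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised deterministically, for example by passing a recording function instead of `time.sleep`. A real client would usually also retry only on errors known to be transient, rather than on every exception as this sketch does.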
- Study Notes - SRE Opening and Chapter 1
- Resource Provisioning and DevOps
- 淺談系統監控與 CloudWatch 的應用 (A brief look at system monitoring and using CloudWatch) - AWS User Group Taiwan
- What is Ops?
- 輕鬆聊系統測試 (SVT) 的三兩事 (A casual chat about system verification testing)