Slogan in SRE
Many chapters of SRE: Site Reliability Engineering open with a meaningful slogan. The words are simple, yet they make you stop and think.
Ch1 Introduction
Hope is not a strategy.
You cannot treat luck as a strategy.
Ch5 Eliminating Toil
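If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow.
If a system needs human intervention while it is running normally, treat that as a bug. What counts as "normal" keeps shifting as the system improves.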
Ch7 The Evolution of Automation at Google
Besides black art, there is only automation and mechanization.
Apart from black magic, all that is left is automation and mechanization.
– Federico García Lorca (1898–1936), Spanish poet and playwright
Ch9 Simplicity
The price of reliability is the pursuit of the utmost simplicity.
Reliability comes only from relentlessly pursuing the greatest possible simplicity.
– C.A.R. Hoare, Turing Award lecture
Ch10 Practical Alerting from Time-Series Data
May the queries flow, and the pager stay silent.
Let the queries come on even stronger, and let the pager stay silent forever.
– Traditional SRE blessing
Ch12 Effective Troubleshooting
Be warned that being an expert is more than understanding how a system is supposed to work. Expertise is gained by investigating why a system doesn’t work.
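Be warned: understanding how a system is supposed to work does not make you an expert. Only investigating why a system fails to work does.
– Brian Redman
Ways in which things go right are special cases of the ways in which things go wrong.
A system working normally is just one special case among its countless abnormal states.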
– John Allspaw
Ch13 Emergency Response
Things break; that’s life.
Things break sooner or later; that is just life.
Ch14 Managing Incidents
Effective incident management is key to limiting the disruption caused by an incident and restoring normal business operations as quickly as possible.
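Effective emergency and incident management is the key factor in containing an incident's impact and restoring operations quickly.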
Ch15 Postmortem Culture: Learning from Failure
The cost of failure is education.
Learning is the best way to avoid failure.
– Devin Carraway
Ch17 Testing for Reliability
If you haven’t tried it, assume it’s broken.
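If you have not tested something yourself, assume it is broken.
A similar idea, from 輕鬆聊系統測試 (SVT) 的三兩事 (A Casual Chat About System Verification Testing):
- FVT (Functional Verification Test): assume the functions are not ready yet
- SVT (System Verification Test): assume the functions all work, and test the non-functional side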
Ch22 Addressing Cascading Failures
If at first you don’t succeed, back off exponentially.
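If a request does not succeed, retry with exponentially growing delays.
– Dan Sandler, Google Software Engineer
Why do people always forget that you need to add a little jitter?
Why do people keep forgetting to add a bit of randomness to the delay?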
– Ade Oshineye, Google Developer Advocate
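The two Ch22 quotes fit together as one retry policy: double the waiting ceiling after each failure, and pick a random delay under that ceiling so many clients do not retry in lockstep. Here is a minimal sketch in Python. The names `backoff_delay` and `call_with_retries`, the default values, and the choice to retry on any exception are my own assumptions for illustration, not something the book prescribes.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Return a randomized delay in seconds for a 0-based retry attempt."""
    # Exponential part: the ceiling doubles on every attempt, up to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter part: draw the actual delay uniformly from [0, ceiling].
    return rng.uniform(0, ceiling)


def call_with_retries(operation, retries=5, base=0.1, cap=10.0, sleep=time.sleep):
    """Call `operation`; on failure, back off and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            # Assumption: every exception is retryable. Real code should
            # retry only on errors known to be transient.
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(backoff_delay(attempt, base, cap))
```

Calling `call_with_retries(fetch)` with some hypothetical `fetch` function runs it up to six times with the defaults. The `sleep` parameter exists only so the waiting can be swapped out in tests.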
Further reading
- Study Notes - SRE Opening and Chapter 1
- Resource Provisioning and DevOps
- 淺談系統監控與 CloudWatch 的應用 (A Brief Look at System Monitoring and Using CloudWatch) - AWS User Group Taiwan
- What is Ops?
- 輕鬆聊系統測試 (SVT) 的三兩事 (A Casual Chat About System Verification Testing)