Slogans in SRE

Many chapters of SRE: Site Reliability Engineering open with a thought-provoking slogan. The lines are simple, yet they make you stop and reflect.

Ch1 Introduction

Hope is not a strategy.

Ch5 Eliminating Toil

If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow.

Ch7 The Evolution of Automation at Google

Besides black art, there is only automation and mechanization.
– Federico García Lorca (1898–1936), Spanish poet and playwright

Ch9 Simplicity

The price of reliability is the pursuit of the utmost simplicity.
– C.A.R. Hoare, Turing Award lecture

Ch10 Practical Alerting from Time-Series Data

May the queries flow, and the pager stay silent.
– Traditional SRE blessing

Ch12 Effective Troubleshooting

Be warned that being an expert is more than understanding how a system is supposed to work. Expertise is gained by investigating why a system doesn’t work.
– Brian Redman

Ways in which things go right are special cases of the ways in which things go wrong.
– John Allspaw

Ch13 Emergency Response

Things break; that’s life.

Ch14 Managing Incidents

Effective incident management is key to limiting the disruption caused by an incident and restoring normal business operations as quickly as possible.

Ch15 Postmortem Culture: Learning from Failure

The cost of failure is education.
– Devin Carraway

Ch17 Testing for Reliability

If you haven’t tried it, assume it’s broken.

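A similar idea, from the post "輕鬆聊系統測試 (SVT) 的三兩事" (roughly, "A casual chat about system verification testing (SVT)"):

  • FVT (Functional Verification Test): assume the features are not ready yet
  • SVT (System Verification Test): assume the features are all ready (non-functional testing)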

Ch22 Addressing Cascading Failures

If at first you don’t succeed, back off exponentially.
– Dan Sandler, Google Software Engineer

Why do people always forget that you need to add a little jitter?
– Ade Oshineye, Google Developer Advocate
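
The two quotes above can be read together as one retry policy: wait longer after each failure, and randomize the wait so that many clients do not retry in lockstep. Below is a minimal sketch in Python, not code from the book. The function names `backoff_delay` and `retry`, the default values, and the choice of "full jitter" (a uniform draw between zero and the exponential ceiling) are all assumptions made for illustration.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0):
    """Return a randomized delay in seconds for a zero-based attempt number.

    The ceiling doubles with each attempt (base, 2*base, 4*base, ...) and is
    capped at `cap`. The actual delay is drawn uniformly from [0, ceiling],
    which is the jitter. All parameter names and defaults are hypothetical.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0, ceiling)


def retry(operation, attempts=5, base=0.1, cap=10.0, sleep=time.sleep):
    """Call operation() until it succeeds or `attempts` calls have failed.

    `sleep` is injectable so the policy can be exercised without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap))
```

A call such as `retry(fetch_page, attempts=4)` (with `fetch_page` standing in for any flaky call) would try up to four times. Without the random draw, every client that failed at the same moment would come back at the same moment, which is the pattern the second quote warns about.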



