Abstract
Automatic monitoring that alerts us when there are problems with our systems is a core part of ensuring reliable systems; however, it can also lead to alert fatigue.
This talk will cover using silencing, grouping and inhibition to reduce alert load and improve the signal-to-noise ratio of the remaining alerts.
Presented by
Julien Goodwin
Julien is a Site Reliability Engineer at Google, working day-to-day on maintaining one of the world's largest IP networks. In the past he has worked as a (primarily Linux) sysadmin on educational networks and in small businesses. He has a long history with the Australian FOSS community, was a member of the LCA2008 team, and has spoken at several previous LCAs and Sysadmin Miniconfs, most recently in 2015.