Key Alerting Metrics
Good alerting is critical to operating a SaaS (or any other software) platform. Good alerts are timely, actionable, understandable, and correct. In this context, correct means minimizing both false positives (alerting when there is no issue) and false negatives (not alerting when there is an issue).
Key Alerting Metrics are four metrics, based on customer pain, for monitoring a whole system, subsystem, or microservice. They are chosen so that all production engineers can understand them.
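The idea of a "correct" alert, defined above in terms of false positives and false negatives, can be made measurable. The following is a minimal sketch, not a prescribed method: it assumes a team reviews a period of alerts and incidents and counts the three outcomes. The class name AlertOutcomes, its field names, and the sample counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertOutcomes:
    """Counts from a review period (hypothetical names, for illustration)."""
    true_positives: int   # alert fired and there was an issue
    false_positives: int  # alert fired but there was no issue
    false_negatives: int  # there was an issue but no alert fired


def precision(o: AlertOutcomes) -> float:
    """Share of fired alerts that pointed at a real issue."""
    fired = o.true_positives + o.false_positives
    # Convention assumed here: if nothing fired, there were no false alarms.
    return o.true_positives / fired if fired else 1.0


def recall(o: AlertOutcomes) -> float:
    """Share of real issues that produced an alert."""
    issues = o.true_positives + o.false_negatives
    # Convention assumed here: if there were no issues, nothing was missed.
    return o.true_positives / issues if issues else 1.0


# Made-up sample counts for one review period.
sample = AlertOutcomes(true_positives=18, false_positives=2, false_negatives=6)
print(f"precision={precision(sample):.2f} recall={recall(sample):.2f}")
```

Low precision means the alert is noisy (too many false positives), and low recall means it misses real issues (too many false negatives). Tracking both guards against tuning one at the expense of the other.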
System Impact and Mitigation
The goal of incident response is to minimize the total impact on customers over time through mitigation and root cause resolution. For example, a high-impact, short-duration incident (five-minute total outage) can be as impactful to a customer as a low-impact, long-duration incident (slowness for a full day).
A key component of SaaS incident response is to mitigate the incident, if possible, to lessen the immediate impact on the customer and buy the team time to resolve the issue permanently.
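One way to reason about "total impact over time" is to treat an incident as a sequence of phases, each with a severity and a duration, and sum severity multiplied by duration. This is only a sketch under that assumption. The 0.0 to 1.0 severity scale, the function name total_impact, and the sample numbers are hypothetical and do not come from any standard.

```python
from typing import List, Tuple

# One phase of an incident: (severity, duration in minutes).
# Severity is a hypothetical 0.0-1.0 scale, where 1.0 is a total outage.
Phase = Tuple[float, float]


def total_impact(phases: List[Phase]) -> float:
    """Sum of severity * duration across phases, in 'impact-minutes'."""
    return sum(severity * minutes for severity, minutes in phases)


# Total outage for 70 minutes until the root cause is fixed.
unmitigated = [(1.0, 70)]

# Same incident, but mitigated after 10 minutes: customers then see
# reduced service (severity 0.25) for the 60 minutes the fix takes.
mitigated = [(1.0, 10), (0.25, 60)]

print(total_impact(unmitigated))  # 70.0
print(total_impact(mitigated))    # 25.0
```

Under these made-up numbers, mitigation does not shorten the incident, but it cuts the accumulated customer impact while the permanent fix is in progress. The same function can compare a short, severe incident against a long, mild one.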
Product Delivery Team
For SaaS companies, the Product Delivery Team comprises all individuals involved in building and operating the product. The sub-teams within it share a common goal: continuously delivering customer value.
Jira Ticket Hierarchy
Standardizing a Jira Ticket Hierarchy allows all producers of feature requirements and implementation details (product management, engineering leads, and engineers) and consumers of those requirements (support, SRE, docs, marketing) to collaborate on the right level of detail for their job roles.