Major Service Outage – DDoS Attack
Incident Report for Where's My Staff
Postmortem

Where's My Staff DDoS Incident December 29th 2023

Summary

  • On 12/29/2023, Where's My Staff experienced a major service outage due to a large-scale DDoS attack.
  • The outage lasted for 2 hours 24 minutes, during which core features of the service were inaccessible or disrupted.
  • The attack was volumetric in nature and targeted outdated server software.
  • We worked closely with our cloud provider to implement IP blacklisting and traffic scrubbing at the network level.

Timeline of Events

  • 23:34: Initial signs of increased traffic.
  • 23:35: Performance degradation noticeable to users.
  • 23:40: Attack intensifies; core systems become overwhelmed.
  • 23:41: Incident declared; mitigation efforts begin in collaboration with our cloud provider.
  • 00:42: Traffic filtering and blocking successfully implemented.
  • 00:55: Partial service restoration.
  • 01:03: Full service restored; attack subsides.

Root Cause Analysis

  • The DDoS attack exploited outdated server software in our infrastructure.
  • The attack volume exceeded our baseline traffic capacity, highlighting the need for additional scaling capacity.
  • Initial incident detection could have been faster with more robust monitoring tools.
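
To illustrate the kind of early detection mentioned above, here is a minimal sketch of a traffic spike check that compares each new reading against a rolling baseline. It is not our production monitoring; the class name `SpikeDetector`, the 60-sample window, the 5x multiplier, and the 10-sample warm-up are all hypothetical values chosen for illustration.

```python
from collections import deque


class SpikeDetector:
    """Flags a spike when the current rate far exceeds a rolling baseline."""

    def __init__(self, window: int = 60, factor: float = 5.0, min_samples: int = 10):
        self.samples = deque(maxlen=window)  # recent requests-per-second readings
        self.factor = factor                 # assumed multiplier over the baseline
        self.min_samples = min_samples       # readings required before alerting

    def observe(self, requests_per_second: float) -> bool:
        """Record one reading and return True if it looks like a spike."""
        is_spike = False
        if len(self.samples) >= self.min_samples:
            baseline = sum(self.samples) / len(self.samples)
            # Floor the baseline at 1.0 so a near-idle service does not alert on noise.
            is_spike = requests_per_second > self.factor * max(baseline, 1.0)
        # Keep spike readings out of the baseline so a sustained attack stays flagged.
        if not is_spike:
            self.samples.append(requests_per_second)
        return is_spike
```

Excluding flagged readings from the baseline is a deliberate choice in this sketch: otherwise a long attack would gradually raise the baseline and the alert would clear while the attack was still in progress.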

Corrective Actions

  • Immediate:

    • Implementing IP blacklisting and traffic scrubbing measures at the network level.
    • Upgrading DDoS protection with our cloud provider.
  • Short-Term:

    • Scaling up server capacity to handle larger traffic bursts.
    • Investing in more advanced monitoring and early detection tools.
    • Developing a formal incident response plan with clear roles and communication channels.
  • Long-Term:

    • Exploring multi-cloud or hybrid infrastructure for wider distribution and greater resilience.
    • Regular penetration testing and security audits to identify potential vulnerabilities.
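
As a rough illustration of the IP blacklisting step listed under the immediate actions, the sketch below checks a client address against a list of blocked CIDR ranges using Python's standard `ipaddress` module. The actual filtering was applied at the network level with our cloud provider, so this is only an application-level analogue; the `Blocklist` class and the address ranges (taken from the reserved documentation ranges) are hypothetical.

```python
import ipaddress


class Blocklist:
    """Holds blocked networks and reports whether a client address is blocked."""

    def __init__(self, cidrs):
        # strict=False accepts entries with host bits set, e.g. "203.0.113.7/24".
        self.networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]

    def is_blocked(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        # An IPv4 address tested against an IPv6 network (or vice versa) is never a match.
        return any(ip in net for net in self.networks)


# Example ranges only; these are reserved documentation addresses.
blocklist = Blocklist(["203.0.113.0/24", "198.51.100.7/32", "2001:db8::/32"])
print(blocklist.is_blocked("203.0.113.50"))
print(blocklist.is_blocked("192.0.2.1"))
```

A linear scan is fine for a short list. For very large blocklists, a prefix tree or filtering at the network edge would likely be more appropriate.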

Lessons Learned

  • Importance of Proactive Defense: Investing in DDoS protection and regular security assessments is crucial in an evolving threat landscape.
  • Swift Detection is Key: Improved monitoring is essential for rapid response, reducing the impact of an attack.
  • Communication and Transparency: Clear communication with users throughout the incident built trust and minimized frustration.

Conclusion

This DDoS attack underscores the critical need to address the evolving threat landscape. While complete outage prevention is difficult, the corrective actions outlined will significantly improve our service's resilience and minimize the impact of future attacks.

Posted Feb 18, 2024 - 12:35 UTC

Resolved
Where's My Staff is currently experiencing a major service disruption due to a Distributed Denial of Service (DDoS) attack. Our team is actively working with our infrastructure providers to mitigate the attack and restore service as quickly as possible.
Posted Dec 28, 2023 - 12:14 UTC