Last Week’s Downtime: What Happened and What We’re Doing About It
On August 12th, 2021, Pingboard experienced an extended period of downtime and performance issues. We apologize. We know our customers rely on Pingboard and were disappointed by the interruption. One of our key values at Pingboard is “be transparent”, and we’re committed to upholding transparency in every circumstance. In the spirit of that value, we’ve prepared a detailed report on what happened, what it means for you, and what we’re doing to improve.
On August 12th, 2021, Pingboard was targeted by a coordinated DDoS (distributed denial of service) attack by a malicious party. The attack sent an extreme amount of traffic to our sign-up and sign-in pages, resulting in a prolonged period of degraded performance and sustained stretches of inaccessibility. We’ve included more details below, but we have confirmed that this was not a security breach and that no customer data was obtained by the attackers. Your organization’s data remains secure.
Here’s a timeline of the attack and our response (all timestamps in CDT):
August 12, 2021
- 3:00 AM – DDoS attack targeting our signup page begins; Pingboard experiences performance issues.
- 3:19 AM – The attackers contact our support team notifying us of the attack and demanding ransom money. Pingboard does not engage, choosing to focus on mitigation and prevention.
- 3:22 AM – Application monitoring alerts are delivered to the team, but the alert delivery methods fail to wake our engineering staff.
- 7:15 AM – Our engineers verify the cause of downtime and the target of the attack, confirming the work of a malicious actor.
- 7:45 AM – Established protection protocols (internal blocking/throttling) are used to block all traffic to our signup page, foiling the initial attack and rendering most services of Pingboard operable. Some performance issues remain.
- 8:00 AM – After a debrief with our engineers, our support team begins responding to customers who reached out about the service interruption.
- 8:30 AM – Changes are made to our established protection protocols in order to better mitigate the attack.
- 9:00 AM – Testing of a new Web Application Firewall (WAF) begins.
- 11:39 AM – Our support team issues another round of updates to any customer reaching out about downtime.
- 12:20 PM – The attack evolves and targets new pages of the application.
- 1:00 PM – Mitigation efforts against new attack vectors are implemented and are moderately successful.
- 1:33 PM – We publicly announce the attack via Twitter.
- 4:00 PM – The new WAF is deployed and performance returns to normal. Further tuning is performed to eliminate false positives. Analysis continues to ensure the attack is fully mitigated and the scope of the attack is understood.
- 8:30 PM – After a sustained period of normal performance, our support team updates customers who had reached out previously to confirm performance has returned to normal.
August 13, 2021
- 2:30 AM – The DDoS attack ends after additional attack attempts are unsuccessful due to the new measures put in place.
What does this mean for your Pingboard account?
No data breaches occurred. No customer data was obtained by the attackers. We’re confident about this for the following reasons:
- We successfully separated normal traffic from malicious traffic on the day of the attack and confirmed that all malicious traffic occurred on public endpoints (pages and resources available to the public that are designed not to require authentication).
- There is no evidence of brute-force or credential-stuffing attacks, which are commonly used to attempt access to private information (and which Pingboard has dedicated mechanisms to protect against). Last week’s incident was limited to DDoS attacks targeting public resources.
- Multiple reviews of our logs both during and after the incident reveal no evidence of a security breach.
Since the attack did not target private information and was only intended to render Pingboard inoperable until we paid a fee to the attackers (which Pingboard did not pay), the impact to us and our customers was limited to the app’s sustained period of downtime and instability.
What could we have done better?
- Pingboard has multiple levels of protection in place against malicious actors. However, our efforts to date had prioritized our highest responsibility, protecting customer data, and our defenses against attacks on service uptime at the scale of this DDoS attack were insufficient.
- Our time to respond was slow as our monitoring alerts failed to appropriately notify our team of incidents during overnight hours.
- Our communication of the incident with you, our customers, was slow.
What steps is Pingboard taking to improve?
We’re using the lessons we learned during this incident to better prepare for the future. We successfully fended off the attack and got things back to normal, but we’re not stopping to catch our breath until a number of improvements are made. Here are four steps we have committed to take:
- We’re implementing a new internal notification system for high-priority alerts, such as downtime and suspicious activity. This system will ensure that incidents are investigated by and communicated to our team right away, regardless of when they occur.
- The Web Application Firewall implemented during the attack was successful, and we’re continuing to invest in that resource to make it more robust as an additional layer of protection for the future. We also deployed several updates and changes to Pingboard’s existing internal traffic throttling tools, and we have plans to continue improving that defense. Key public-facing application pages are also being optimized to be more robust when faced with a future attack.
- We are improving our ability to communicate with customers about future incidents more quickly. We’ve implemented a public status page at https://status.pingboard.com so that customers can be notified of service interruptions and follow our progress in addressing them. We will continue to use Twitter as a source for updates as well.
- Internally, we’re updating our existing incident management plan to help us move faster and more efficiently. This includes naming additional stakeholders to take actions across incident triage, communication, and mitigation, and resetting expectations for customer communication timelines.
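For readers curious about what request throttling looks like in general, here is a minimal, illustrative sketch of a token-bucket limiter in Python. It is not Pingboard’s actual implementation: the class name `Throttle`, its parameters, and the choice of a token bucket are all assumptions made for illustration. Each client (for example, an IP address) gets a small burst allowance that refills over time, and requests beyond that allowance are rejected.

```python
import time
from typing import Callable, Dict, Tuple


class Throttle:
    """Hypothetical token-bucket limiter keyed by client (e.g. an IP address)."""

    def __init__(
        self,
        capacity: float = 5.0,          # assumed burst size, in requests
        refill_per_sec: float = 1.0,    # assumed sustained rate, in requests/second
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        # client_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0  # spend one token for this request
        self._buckets[client_id] = (tokens, now)
        return allowed
```

A web application might call `allow()` with the requester’s address before serving a public page such as a sign-up form, and return an HTTP 429 response when it returns `False`. A limiter like this only helps against individual noisy clients; a distributed attack spread across many addresses generally also needs upstream filtering such as a WAF.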
We apologize, again, for the extended interruption in service and the inconvenience it caused. All of us at Pingboard thank you for your business and your continued trust. We’ll continue to work tirelessly to earn them.
If you have any questions, please reach out to firstname.lastname@example.org and our support team will be happy to answer them.