A retrospective on public cloud breaches of 2022, with Rami McCarthy and Houston Hopkins

Introduction

In this post, we look back on the cloud data breaches that were publicly disclosed in 2022, specifically focusing on breaches of companies using large cloud providers like AWS, Azure, and Google Cloud.

Before diving in, we want to emphasize that this is a blameless postmortem article. Securing cloud environments is a challenging task for everyone, and organizations who publicly report their incidents should not be further criticized for choosing to be transparent.

We’ll start by looking at large patterns of breaches in 2022 and what insecure configurations attackers were able to exploit. Then, Rami McCarthy and Houston Hopkins will join us to discuss the learnings we can take away from these incidents.

The data in this post comes from McCarthy’s aws-customer-security-incidents, Hackmageddon, and other publicly available sources. You will find all the references to the data breaches we discuss in the annex.

Cloud security incidents of 2022

In 2022, cloud security incidents were mostly caused by one of the following weaknesses:

Exposed static, long-lived cloud credentials
Elasticsearch instances exposed to the Internet without authentication
Data storage services publicly accessible without authentication
Exploitation of a server-side request forgery (SSRF) vulnerability to steal applications' cloud credentials

Exposed long-lived cloud credentials

In cloud environments, control plane APIs are by design readily available for anyone on the Internet to consume. Compromising long-lived credentials has therefore consistently been—for years—the most popular way attackers gain initial access to cloud environments.

This year, we have seen many long-lived cloud credentials exposed and compromised in GitHub repositories, including AWS access keys and Google Cloud service account tokens. In some cases, these were private repositories mistakenly made public. One of the breaches directly impacted Microsoft. Overall, many people still keep vital keys and credentials in GitHub, and the tools attackers use to detect and steal these keys are becoming more accessible and efficient.

We also see a continuing trend where credentials are included in public software packages such as Android applications and PyPI packages, making them straightforward for attackers to discover.

Finally, attackers continue targeting developers through the software supply chain. In particular, several malicious PyPI and npm packages have been actively attempting to steal cloud credentials.
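To make the discoverability point concrete, here is a minimal sketch of how a scanner can spot candidate AWS access key IDs in text. It relies on the publicly documented key ID shape (a 20-character string starting with AKIA for long-lived IAM user keys or ASIA for temporary credentials). The function name `find_access_key_ids` and the character class are our own simplifications, and real scanners use many more patterns and validation steps.

```python
import re

# Candidate AWS access key IDs: a 4-letter prefix plus 16 uppercase
# letters or digits. "AKIA" marks long-lived IAM user keys, "ASIA"
# marks temporary credentials. The character class is a simplification.
ACCESS_KEY_ID = re.compile(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, key_id) for each candidate key ID (hypothetical helper)."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            kind = "long-lived" if match.group(1) == "AKIA" else "temporary"
            findings.append((line_number, kind, match.group(0)))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    for finding in find_access_key_ids(sample):
        print(finding)
```

A pattern this simple is cheap to run against every commit or published package, which is part of why leaked keys tend to be found quickly.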

Exposed Elasticsearch instances

Unsurprisingly, technologies that tend to be insecure by default without the cloud also end up being insecurely configured in the cloud. That's the case for Elasticsearch, Redis, and MongoDB, which do not enforce any kind of authentication by default.

At least three data breaches involving insecurely configured Elasticsearch instances in AWS were reported in 2022. It's not clear whether they were customer-managed Elasticsearch instances running on top of EC2 or misconfigured clusters using the OpenSearch managed service. Interestingly enough, one of these misconfigurations was from a division of Amazon itself.
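As an illustration of how little effort this kind of exposure takes to detect, the sketch below sends a single unauthenticated HTTP request and classifies the answer. The helper names `classify_status` and `probe` are hypothetical, and the status-code mapping is a rough heuristic rather than a complete check. Only run something like this against systems you own or are authorized to test.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a rough verdict."""
    if status == 200:
        return "open"        # the service answered without any credentials
    if status in (401, 403):
        return "protected"   # authentication or authorization is enforced
    return "unknown"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to base_url and classify the result."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status is still useful.
        return classify_status(error.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

For example, `probe("http://localhost:9200/")` targets the default Elasticsearch HTTP port on a local test instance. An "open" verdict means anyone who can reach that address can read the cluster.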

Storage services allowing public access

Public S3 buckets have been a common cause of data breaches for almost a decade, and they struck again in 2022 with close to 20 publicly reported breaches. We've also seen several instances of breaches due to public Azure Blob Storage, including by Microsoft themselves. Attackers can easily scan for and compromise dangerously configured storage buckets due to a wide range of available tooling and the increasing number of bug bounties.
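One common way a bucket becomes public is a bucket policy that allows any principal. The following sketch flags such statements in a policy document. It is a deliberately simplified heuristic under our own assumptions: the function name `public_statements` is hypothetical, any `Condition` is treated as restrictive, and ACLs, `NotPrincipal`, and account-level settings are ignored.

```python
import json


def _as_list(value):
    """Policies allow a single item or a list in several places; normalize to a list."""
    return value if isinstance(value, list) else [value]


def public_statements(policy_json: str) -> list[dict]:
    """Return Allow statements that grant access to any principal with no condition."""
    policy = json.loads(policy_json)
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        # A wildcard principal is written either as "*" or as {"AWS": "*"}.
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS", []))
        )
        if is_wildcard and not statement.get("Condition"):
            flagged.append(statement)
    return flagged
```

A policy with `"Principal": "*"` and `"Action": "s3:GetObject"` on every object in a bucket would be flagged, while a policy naming a specific account would not.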

Researchers also identified hundreds of exposed Amazon RDS snapshots, several of which contained a plethora of sensitive and personal data.

AWS added public sharing of snapshots in 2015 in response to requests from the data science community, and launched opendata the same year. But one community's feature request is another community's new risk, a seemingly inevitable part of the march of progress.

Server-side request forgery (SSRF) vulnerabilities

Cloud virtual machines such as AWS EC2 instances or Google Cloud Compute Engine instances frequently have a role attached, allowing the applications they run to access the cloud provider's API. However, unless the instance is hardened, this also allows an attacker exploiting an SSRF vulnerability to retrieve cloud credentials for the instance role.
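On AWS, the usual hardening step is to require version 2 of the Instance Metadata Service. IMDSv1 answers a plain GET request, which is exactly what most SSRF bugs let an attacker issue. IMDSv2 first requires a PUT request with a custom header to obtain a session token, something a typical SSRF cannot do. The sketch below only builds the two requests so the difference is visible. The helper names are ours, `role_name` is a placeholder, and nothing is sent over the network.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1 of IMDSv2: a PUT request asking for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def credentials_request(token: str, role_name: str) -> urllib.request.Request:
    """Step 2 of IMDSv2: a GET for the role credentials, carrying the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/iam/security-credentials/{role_name}",
        headers={TOKEN_HEADER: token},
    )
```

When IMDSv2 is enforced, a GET to the credentials path without the token header is rejected, so an attacker who can only control the URL of a server-side GET cannot reach the credentials.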

Exploitation of SSRF vulnerabilities in cloud environments has been a major issue for years. Earlier this year, we found that 93 percent of EC2 instances still allow usage of the Instance Metadata Service v1, leaving them at risk when running an application vulnerable to an SSRF. This continues to be a substantial risk and the cause of numerous vulnerability and bug bounty reports in 2022, especially considering the prevalence of SSRF vulnerabilities in applications, with over 380 reported CVEs in 2021.
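To check where your own fleet stands, a rough audit can be sketched as follows. It assumes each instance is a dictionary shaped like an entry from the EC2 DescribeInstances response, with a `MetadataOptions` block containing `HttpTokens` and `HttpEndpoint`. The function names are hypothetical, treating missing fields as permissive is our assumption, and this is not the methodology behind the 93 percent figure above.

```python
def allows_imdsv1(instance: dict) -> bool:
    """Return True if the instance would still answer IMDSv1 requests."""
    options = instance.get("MetadataOptions", {})
    # Assumption: a missing field is treated as the permissive setting.
    if options.get("HttpEndpoint", "enabled") == "disabled":
        return False  # the metadata service is turned off entirely
    return options.get("HttpTokens", "optional") != "required"


def imdsv1_share(instances: list[dict]) -> float:
    """Percentage of instances that do not enforce IMDSv2."""
    if not instances:
        return 0.0
    exposed = sum(1 for instance in instances if allows_imdsv1(instance))
    return 100.0 * exposed / len(instances)
```

Feeding it the instance records from an inventory export gives a single number to track over time as IMDSv2 enforcement is rolled out.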

Discussion with Rami McCarthy and Houston Hopkins

Rami McCarthy and Houston Hopkins are both long-time security practitioners with extensive experience building and securing cloud environments.

McCarthy works on Infrastructure Security at Figma. He previously worked as Tech Lead Manager for the security team at a healthtech scale-up and as a consultant at NCC Group. Hopkins is currently a Platform Security Engineer at Robinhood, and prior to that he held Distinguished Engineer and leadership roles at Capital One and AllianceData (now Bread Financial). He is dedicated to helping organizations navigate the complex and constantly evolving world of cloud technology, ensuring the safety and security of their data and systems.

On cloud breaches of 2022

The problem of public storage buckets leaking sensitive data is not new. In fact, as McCarthy points out, tooling to identify vulnerable buckets has been available as far back as 2011.

“The issue has gotten better for a certain class of organizations,” McCarthy says. “Organizations leaking S3 buckets are different from those of 2015. We see many small companies and non-core areas of large companies—but fewer startups whose whole business is run out of an open S3 bucket.” But, he says, this kind of data breach is here to stay. “It's nowadays easier to discover open buckets and monetize them through bug bounties or ransomware,” he says, pointing out that increased discoverability balances out the overall decrease of these issues.

Long-lived, static cloud credentials are still one of the most common causes for data breaches in the cloud. While better alternatives exist both for applications and humans, that's only one part of the issue, according to Hopkins. “As we see with SSRF exploitation, the elephant in the room is credential portability,” Hopkins says. “If I retrieve temporary credentials for an EC2 profile from a specific instance, I should only be able to use them from this instance—not from anywhere on Earth.”

While cloud providers now offer guardrail features such as AWS's account-wide S3 Block Public Access or Azure Blob Storage’s similar account-wide feature, many companies don't have adequate staffing to understand and apply these guardrail settings, especially when they are not turned on by default.
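For reference, S3 Block Public Access consists of four boolean settings. The sketch below checks a configuration dictionary for any that are not enabled. The flag names match the S3 PublicAccessBlock configuration, while the helper `missing_guardrails` and the idea of passing the configuration as a plain dictionary are assumptions made for illustration.

```python
# The four settings that make up an S3 Block Public Access configuration.
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_guardrails(config: dict) -> list[str]:
    """Return the Block Public Access settings that are not explicitly enabled."""
    return [flag for flag in REQUIRED_FLAGS if config.get(flag) is not True]
```

An empty result means all four settings are on. Anything else is a short, reviewable list of what remains to be enabled at the account or bucket level.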

“Cloud providers need to make it harder and harder to make critical mistakes,” Hopkins says. “I do believe they want to make improvements on that side.” And as McCarthy points out, AWS released several mechanisms allowing for secure-by-default practices, including EBS default encryption, Control Tower safe defaults, support for custom guardrails with service control policies (SCPs), and enhancements to the console experience. “These should be available and enabled by default—it shouldn't take high-profile incidents,” Hopkins says. That said, new S3 buckets are now secure by default. Since April 2023, AWS has been automatically enabling S3 Public Access Block and disabling ACLs on new buckets, to make it much harder to unintentionally make them public.

A proactive approach from providers can be complementary. For instance, a few years back, AWS started sending emails to customers who had risky configurations such as public S3 buckets, public AMIs, public EBS snapshots, or Internet-exposed reverse proxies allowing access to the instance metadata service.

Overall, the complexity of a system increases the likelihood that it is misused and results in a breach. “No one is immune to misconfigurations, including cloud providers themselves,” Hopkins says. As cloud providers tend to be large corporations that have numerous business units with widely different focus areas, they too are at risk of making mistakes when engineers misconfigure their own public cloud services. “While not strictly new, cloud providers now have more visibility when suffering a breach caused by an incorrect usage of the very services they sell—partly because there are more organizations working to discover and monetize these,” McCarthy says. Cloud providers must also balance disclosure against customer privacy. Today there isn't a great solution to address reporting and anonymizing these incidents. Hopefully it can be addressed in the future.

Only the tip of the iceberg

Incidents we discussed in this post are only the ones that were publicly disclosed. While organizations who publicly share postmortems about their incidents greatly benefit our community, most incidents still go unreported. “I don’t believe we’ll see a broad pattern of organizations self-selecting to disclose their cloud security incidents,” McCarthy says. “I don’t blame them. There is a concrete cost and risk associated with it. We lack a current narrative on the benefits of such an activity beyond altruism.”

Although it's unlikely that the majority of organizations will start reporting on their security incidents purely out of altruism, cloud providers themselves have a role to play. “The most viable and likely sources of data are cloud providers themselves, as they have incentives to highlight risks on the customer side of the shared responsibility model,” McCarthy says.

Indeed, providers are slowly starting to go public about customer incidents they have knowledge of. At re:Inforce 2022, AWS presented common tactics of attackers targeting S3 buckets with ransomware (slides). The recent creation of the AWS Customer Incident Response Team is an initiative that will hopefully bring further transparency on the threats organizations face in cloud environments. Google's Cybersecurity Action Team also releases regular “Threat Horizon” reports detailing the most common attacker tactics they witness in Google Cloud.

However, the cloud industry still has room for improvement. For Hopkins, “Providers need more incentives to report customer incidents they witness, possibly mandated by regulation.” McCarthy adds, “Cloud providers are currently afraid of bad press for being public about customer security incidents. I hope that sentiment flips, and they realize that disclosing and educating on breaches improves the trust our community has in security across the industry.”

Conclusion

In this post, we discussed the most common causes of cloud data breaches in 2022 and what we can learn from them, both individually and collectively as an industry. For a closer look at common cloud misconfigurations in the real world and how to solve them, have a look at our recent State of AWS Security research and our follow-up blog post.