Stories about exposed databases have continued to pop up over the past few years, and a tweet by @GossiTheDog reminded me that the elephant in the room remains somewhat unaddressed:
It's been a few years since I mentioned this, but HDFS (a file system) being presented to the internet with no security whatsoever so still a big problem. It's actually got bigger since this blog, there is petabytes of data exposed by US companies. https://t.co/1ZklUwkZMZ
— Kevin Beaumont (@GossiTheDog) May 8, 2020
I was surprised to see how much data remains exposed by public HDFS clusters, so I decided to revisit the topic. In this blog post I will be looking at 3 databases that I've written about previously: MongoDB, Elastic and HDFS. Let's start off with an overview of the data exposure for those technologies:
The above chart compares the amount of data each database exposed in 2018 and in 2020. Note that the size is measured in TB.
MongoDB's exposure in 2018 was already far lower than the others due to a series of ransomware attacks. Those attacks are still ongoing, but the overall exposure of public MongoDB instances has dropped sharply. This is the only database among the 3 I looked at where the amount of exposed data has actually decreased (from 24 TB to 12.5 TB). MongoDB has come a long way since the early days and their security documentation reflects that. If you're not sure where to start, I recommend checking it out.
Elastic's exposure, on the other hand, has more than tripled in the past few years (904 TB to 3.2 PB). Most of the instances are located in China and nearly all of the clusters are in the cloud, which is generally in line with how NoSQL databases tend to get exposed (insecure images deployed to the cloud). If we take a deeper dive into the size distribution of exposed Elastic clusters, we can see that a few clusters account for a combined 22 PB of data:
For the purpose of this article I didn't include them in the first chart, because it looked suspicious to me that there are ~11 unique Elastic clusters, all on Amazon, each with exactly 2 PB of exposed data. Interestingly, they run on port 80; I would expect a honeypot to run on the default port (9200). This brings me to another recurring issue we see with Shodan: people putting services on non-standard ports and expecting that to offer security.
See here for a full distribution of ports that Shodan sees Elastic clusters on. Please don't make security by obscurity the only protective measure for your infrastructure.
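To illustrate why a non-standard port offers little protection, here is a minimal sketch of how a scanner could recognize an Elastic cluster from its response alone, whatever port it answers on. It assumes the root endpoint returns a JSON banner containing a `version.number` field and the `"You Know, for Search"` tagline, which is what typical Elasticsearch releases send; the function name and the sample banner are hypothetical and only for illustration.

```python
import json


def looks_like_elasticsearch(body: str) -> bool:
    """Return True if an HTTP response body resembles the Elasticsearch root banner.

    Assumption: the banner is a JSON object with a nested version.number
    field and the well-known tagline. The port is never consulted.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON at all (e.g. an HTML page), so not an Elasticsearch banner.
        return False
    if not isinstance(doc, dict):
        return False
    version = doc.get("version")
    has_version = isinstance(version, dict) and "number" in version
    return has_version and doc.get("tagline") == "You Know, for Search"


# Hypothetical banner, shaped like what an open cluster might return on any port.
sample_banner = (
    '{"name": "node-1", "cluster_name": "demo", '
    '"version": {"number": "7.6.2"}, "tagline": "You Know, for Search"}'
)

print(looks_like_elasticsearch(sample_banner))
print(looks_like_elasticsearch("<html><body>It works!</body></html>"))
```

Because the check only inspects the response body, moving the service from 9200 to 80 changes nothing about how easily it can be identified.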
Even without those 22 PB stored in dubious clusters, we can see significant growth in the number of Elastic instances that are insecure and exposing their customers' data to the public Internet.
Finally, HDFS remains the king of exposed data. The amount of publicly exposed data on HDFS clusters continues to grow, from 5.1 PB in 2018 to a whopping 13.1 PB in 2020. The good news is that the number of exposed clusters has gone down significantly, from ~4000 clusters in 2018 to ~800 clusters in 2020. In other words, although the number of insecure servers has gone down, the amount of data stored on the remaining instances has gone up. And the vast majority of exposed HDFS clusters are located in China (much like Elastic).
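The figures quoted in this post can be turned into growth factors and a rough per-cluster average with a few lines of Python. This is a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1000 TB), treats the approximate cluster counts (~4000 and ~800) as exact, and uses variable and function names that are mine rather than anything from Shodan.

```python
# Exposed data in TB as (2018, 2020), taken from the figures in this post.
# Assumption: 1 PB = 1000 TB.
exposure_tb = {
    "MongoDB": (24, 12.5),
    "Elastic": (904, 3200),
    "HDFS": (5100, 13100),
}

# Approximate number of exposed HDFS clusters per year, from the post.
hdfs_clusters = {2018: 4000, 2020: 800}


def growth_factor(before: float, after: float) -> float:
    """Ratio of the 2020 exposure to the 2018 exposure."""
    return after / before


def avg_tb_per_cluster(total_tb: float, clusters: int) -> float:
    """Mean amount of exposed data per cluster, in TB."""
    return total_tb / clusters


for name, (tb_2018, tb_2020) in exposure_tb.items():
    print(f"{name}: x{growth_factor(tb_2018, tb_2020):.1f} between 2018 and 2020")

hdfs_2018, hdfs_2020 = exposure_tb["HDFS"]
avg_2018 = avg_tb_per_cluster(hdfs_2018, hdfs_clusters[2018])
avg_2020 = avg_tb_per_cluster(hdfs_2020, hdfs_clusters[2020])
print(f"HDFS average per cluster: {avg_2018:.1f} TB (2018), {avg_2020:.1f} TB (2020)")
```

Under those assumptions the average exposed HDFS cluster went from roughly 1.3 TB to roughly 16.4 TB, so the remaining clusters are far larger than the ones that disappeared.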