<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[robots.txt - Shodan Blog]]></title><description><![CDATA[The latest news and developments for Shodan.]]></description><link>https://blog.shodan.io/</link><generator>Ghost 0.7</generator><lastBuildDate>Sun, 12 Apr 2026 02:23:30 GMT</lastBuildDate><atom:link href="https://blog.shodan.io/tag/robots-txt/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Presidential Robots and 404s]]></title><description><![CDATA[<p>The field of presidential candidates has started to heat up and the websites are the first stop for a lot of prospective voters. For my purposes though, I was less interested in their political platform and more curious about the technology behind the websites. Others have <a href="https://paulschreiber.com/blog/2015/04/12/presidential-candidate-website-tech-compared/">already compared the SSL</a></p>]]></description><link>https://blog.shodan.io/presidential-robots-and-404s/</link><guid isPermaLink="false">c4aa727d-ac0c-46fa-86ee-17f1905e068e</guid><category><![CDATA[research]]></category><category><![CDATA[presidential candidates]]></category><category><![CDATA[robots.txt]]></category><dc:creator><![CDATA[John Matherly]]></dc:creator><pubDate>Sun, 21 Jun 2015 02:08:52 GMT</pubDate><media:content url="http://blog.shodan.io/content/images/2015/06/white-house-02.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.shodan.io/content/images/2015/06/white-house-02.jpg" alt="Presidential Robots and 404s"><p>The field of presidential candidates has started to heat up and the websites are the first stop for a lot of prospective voters. 
For my purposes, though, I was less interested in their political platforms and more curious about the technology behind their websites. Others have <a href="https://paulschreiber.com/blog/2015/04/12/presidential-candidate-website-tech-compared/">already compared the SSL security</a> of the candidates, so I wanted to check out what sort of information the presidential hopefuls' <strong>robots.txt</strong> files and <strong>404 responses</strong> return. To generate the 404 response I chose a random URL <strong>/test</strong> (turns out I'm really bad at being random).</p>

<p>Without further ado, let me show the results of the requests:</p>
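<p>For reference, here is a minimal sketch of the two requests made against each site: fetch <strong>robots.txt</strong>, then request <strong>/test</strong> to trigger the 404 handler. It uses only the Python standard library; the commented-out URL is just one of the sites covered below.</p>

```python
# Sketch of the checks described in this post: for each candidate site,
# fetch robots.txt and then /test to see how the 404 handler responds.
from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.error import HTTPError

def probe_urls(base):
    """The two URLs checked per site: robots.txt and the 404 probe."""
    return urljoin(base, "/robots.txt"), urljoin(base, "/test")

def probe(base):
    for url in probe_urls(base):
        try:
            with urlopen(url, timeout=10) as resp:
                print(url, resp.status)
        except HTTPError as err:  # a 404 raises rather than returning
            print(url, err.code)

# probe("https://www.hillaryclinton.com")  # prints a status code per URL
```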

<h1 id="democrats">Democrats</h1>

<h4 id="hillaryclinton">Hillary Clinton</h4>

<p><a href="https://www.hillaryclinton.com">https://www.hillaryclinton.com</a></p>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow: /api/
</code></pre>

<p>Looks like the website has an API that isn't publicly documented.</p>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/hillary-404.png" alt="Presidential Robots and 404s"></p>

<h4 id="berniesanders">Bernie Sanders</h4>

<p><a href="https://berniesanders.com/">https://berniesanders.com/</a></p>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow: /wp-admin/
</code></pre>

<p>The website uses WordPress as its CMS.</p>
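<p>That inference comes straight from the file: a <strong>Disallow: /wp-admin/</strong> rule is what a stock WordPress install serves in its robots.txt. A tiny sketch of the heuristic (my own illustrative helper, not part of any library):</p>

```python
# Hypothetical heuristic: a robots.txt that disallows /wp-admin/ is a
# strong hint the site runs WordPress, since that is the stock rule.
def looks_like_wordpress(robots_txt):
    return any(line.lower().startswith("disallow") and "wp-admin" in line
               for line in robots_txt.splitlines())

print(looks_like_wordpress("User-agent: *\nDisallow: /wp-admin/"))  # True
```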

<h6 id="404">404</h6>

<iframe width="560" height="315" src="https://www.youtube.com/embed/Dhot2OJKKZc" frameborder="0" allowfullscreen></iframe>

<h4 id="martinomalley">Martin O'Malley</h4>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow: /wp-admin/
</code></pre>

<p>The website uses WordPress as its CMS.</p>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/malley-404.png" alt="Presidential Robots and 404s"></p>

<h4 id="jimwebb">Jim Webb</h4>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.webb2016.com/sitemap.xml
</code></pre>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/webb-404.png" alt="Presidential Robots and 404s"></p>

<h4 id="lincolnchafee">Lincoln Chafee</h4>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow: /wp-admin/
</code></pre>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/chafee-404.png" alt="Presidential Robots and 404s"></p>

<h1 id="republicans">Republicans</h1>

<h4 id="jebbush">Jeb Bush</h4>

<h6 id="robotstxt">robots.txt</h6>

<p>No robots.txt file available.</p>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/jeb-404.png" alt="Presidential Robots and 404s"></p>

<h4 id="randpaul">Rand Paul</h4>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow:
</code></pre>
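<p>An empty <strong>Disallow</strong> value is easy to misread: it blocks nothing, i.e. the entire site is open to crawlers. Python's standard-library robot parser agrees (example.com stands in for the real site):</p>

```python
# An empty Disallow directive means "nothing is disallowed" -- the
# opposite of what a quick read might suggest.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow:"])
print(parser.can_fetch("*", "https://example.com/anything"))  # True
```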

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/paul-404.png" alt="Presidential Robots and 404s"></p>

<h4 id="tedcruz">Ted Cruz</h4>

<p><a href="https://www.tedcruz.org">https://www.tedcruz.org</a></p>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-agent: *
Disallow: /wp-admin/
</code></pre>

<p>The website uses WordPress as its CMS.</p>

<h4 id="ricksantorum">Rick Santorum</h4>

<p><a href="http://www.ricksantorum.com/">http://www.ricksantorum.com/</a></p>

<h6 id="robotstxt">robots.txt</h6>

<pre><code>User-Agent: *
Disallow: /admin/
Disallow: /utils/
Disallow: /forms/
Disallow: /users/
Sitemap: http://www.ricksantorum.com/sitemap_index.xml
</code></pre>

<p>Based on this information, the website is hosted on NationBuilder's CMS (nationbuilder.com).</p>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/santorum-404.png" alt="Presidential Robots and 404s"></p>

<h4 id="bencarson">Ben Carson</h4>

<p><a href="https://www.bencarson.com/">https://www.bencarson.com/</a></p>

<h6 id="robotstxt">robots.txt</h6>

<p>No robots.txt file available.</p>

<h6 id="404">404</h6>

<p><img src="https://blog.shodan.io/content/images/2015/06/carson-404.png" alt="Presidential Robots and 404s"></p>

<p>Most of them didn't turn out to be very interesting to look at, with the exception of the final candidate I'd like to show:</p>

<h2 id="carlyfiorina">Carly Fiorina</h2>

<p><a href="https://www.carlyfiorina.com">https://www.carlyfiorina.com</a></p>

<h4 id="robotstxt">robots.txt</h4>

<pre><code>User-agent: *
Disallow: /standing-desks2
Disallow: /standing-desks2.html
Disallow: /privacy-policy.html
Disallow: /privacy-policy
Disallow: /terms-of-use.html
Disallow: /terms-of-use
Disallow: /adjustable-height-desk.html
Disallow: /adjustable-height-desk
Disallow: /blank
Disallow: /test
</code></pre>

<h4 id="404">404</h4>

<p><img src="https://blog.shodan.io/content/images/2015/06/carly-auth.png" alt="Presidential Robots and 404s"></p>

<p>It turned out that my <em>random</em> URL of <strong>/test</strong> wasn't random enough and I accidentally stumbled upon a location on Carly Fiorina's website that requires authentication.</p>
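<p>Notably, <strong>/test</strong> is also one of the paths her robots.txt explicitly disallows. Python's standard-library robot parser confirms the rule; this sketch feeds it an abridged copy of the file above:</p>

```python
# Check whether a generic crawler may fetch /test under Carly Fiorina's
# robots.txt rules (abridged from the file shown above).
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /standing-desks2",
    "Disallow: /blank",
    "Disallow: /test",
]
parser = RobotFileParser()
parser.parse(rules)
print(parser.can_fetch("*", "https://www.carlyfiorina.com/test"))  # False
print(parser.can_fetch("*", "https://www.carlyfiorina.com/"))      # True
```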

<p>I took away 4 lessons from this exercise:</p>

<ol>
<li>Wordpress remains incredibly popular  </li>
<li>robots.txt can tell you where the administrative area is  </li>
<li>404s are requested often enough that it's worth investing time in making them nicer</li>
<li>I'm bad at generating random URLs</li>
</ol>
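<p>These checks don't all have to be done live, either: Shodan records robots.txt responses for hosts it crawls. A hedged sketch using the official <code>shodan</code> Python library (<code>pip install shodan</code>; an API key is assumed, and the <code>"http"</code>/<code>"robots"</code> banner fields reflect my understanding of Shodan's banner format):</p>

```python
# Sketch: pull previously collected robots.txt data for an IP out of
# Shodan's host record instead of fetching it from the site directly.
def robots_from_banners(banners):
    """Extract robots.txt bodies from a host's list of service banners."""
    return [b["http"]["robots"]
            for b in banners
            if b.get("http", {}).get("robots")]

def robots_for_ip(api_key, ip):
    import shodan  # third-party; deferred so the helper above stays stdlib-only
    host = shodan.Shodan(api_key).host(ip)
    return robots_from_banners(host["data"])  # one banner per open port
```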

<p>PS: Did you know that Shodan also grabs the <strong>robots.txt</strong> data for each IP? You can access all the information via the <a href="https://developer.shodan.io">Shodan API</a>.</p>]]></content:encoded></item></channel></rss>