The field of presidential candidates has started to heat up, and the candidates' websites are the first stop for a lot of prospective voters. For my purposes, though, I was less interested in their political platforms and more curious about the technology behind the websites. Others have already compared the SSL security of the candidates' sites, so I wanted to check out what sort of information the presidential hopefuls' robots.txt files and 404 responses reveal. To generate the 404 response I requested a random URL, /test (turns out I'm really bad at being random).
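For anyone who wants to repeat the exercise, here is a minimal Python sketch of the two requests per site: one for /robots.txt and one for a path that presumably doesn't exist. The helper names (`probe_urls`, `fetch`) and the User-Agent string are made up for illustration; this is not necessarily how the original requests were made.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def probe_urls(base_url, missing_path="/test"):
    """Return (robots.txt URL, URL that is expected to produce a 404)."""
    return urljoin(base_url, "/robots.txt"), urljoin(base_url, missing_path)


def fetch(url, timeout=10):
    """Return (status_code, body_text). HTTP error statuses such as 404
    are returned to the caller instead of being raised."""
    # Hypothetical User-Agent; some sites reject requests without one.
    req = Request(url, headers={"User-Agent": "robots-survey-sketch"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


# Example usage (performs network requests, so it is left commented out):
# robots_url, missing_url = probe_urls("https://www.hillaryclinton.com")
# print(fetch(robots_url))
# print(fetch(missing_url)[0])
```

Returning the status code instead of raising keeps the 404 case, which is the interesting one here, on the normal code path.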
Without further ado, let me show the results of the requests:
Democrats
Hillary Clinton
https://www.hillaryclinton.com
robots.txt
User-agent: *
Disallow: /api/
Looks like the website has an API that isn't publicly documented.
404

Bernie Sanders
robots.txt
User-agent: *
Disallow: /wp-admin/
The website runs on WordPress.
404
Martin O'Malley
robots.txt
User-agent: *
Disallow: /wp-admin/
The website runs on WordPress.
404

Jim Webb
robots.txt
User-agent: *
Disallow: /wp-admin/
Sitemap: http://www.webb2016.com/sitemap.xml
404

Lincoln Chafee
robots.txt
User-agent: *
Disallow: /wp-admin/
404

Republicans
Jeb Bush
robots.txt
No robots.txt file available.
404

Rand Paul
robots.txt
User-agent: *
Disallow:
404

Ted Cruz
robots.txt
User-agent: *
Disallow: /wp-admin/
The website runs on WordPress.
Rick Santorum
robots.txt
User-Agent: *
Disallow: /admin/
Disallow: /utils/
Disallow: /forms/
Disallow: /users/
Sitemap: http://www.ricksantorum.com/sitemap_index.xml
Based on this information, the website runs on the hosted CMS at nationbuilder.com.
404

Ben Carson
robots.txt
No robots.txt file available.
404

Most of them didn't turn out to be very interesting to look at, with the exception of the final candidate I'd like to show:
Carly Fiorina
robots.txt
User-agent: *
Disallow: /standing-desks2
Disallow: /standing-desks2.html
Disallow: /privacy-policy.html
Disallow: /privacy-policy
Disallow: /terms-of-use.html
Disallow: /terms-of-use
Disallow: /adjustable-height-desk.html
Disallow: /adjustable-height-desk
Disallow: /blank
Disallow: /test
404

It turned out that my random URL of /test wasn't random enough, and I accidentally stumbled upon a location on Carly Fiorina's website that requires authentication.
I took away four lessons from this exercise:
- WordPress remains incredibly popular
- robots.txt can tell you where the administrative area is
- 404 pages must be served often enough that it's worth investing time in making them nicer
- I'm bad at generating random URLs
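The first two lessons can be checked mechanically. Below is a rough sketch that pulls the Disallow entries out of a robots.txt body and flags the /wp-admin/ pattern seen above. The function names are hypothetical, and the parser deliberately ignores which User-agent group a rule belongs to, so treat it as a quick survey tool and not a compliant robots.txt parser.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow value, regardless of User-agent group."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are matched case-insensitively; an empty Disallow
        # means "nothing is disallowed" and is skipped.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def looks_like_wordpress(paths):
    """Heuristic only: a /wp-admin entry hints at WordPress."""
    return any(path.startswith("/wp-admin") for path in paths)


santorum = """User-Agent: *
Disallow: /admin/
Disallow: /utils/
Disallow: /forms/
Disallow: /users/
Sitemap: http://www.ricksantorum.com/sitemap_index.xml"""

print(disallowed_paths(santorum))
print(looks_like_wordpress(disallowed_paths("User-agent: *\nDisallow: /wp-admin/")))
```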
PS: Did you know that Shodan also grabs the robots.txt data for each IP? You can access all the information via the Shodan API.
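As a rough illustration of that last point, the sketch below extracts robots.txt contents from a Shodan host record. It assumes the official `shodan` Python package and assumes that HTTP banners carry the file under `http.robots`; both the field layout and the helper names (`robots_from_host`, `lookup_robots`) are assumptions to verify against the Shodan documentation, and the API key and IP are placeholders.

```python
def robots_from_host(host):
    """Map port -> robots.txt text for every banner that includes one.

    Assumes the host record has a "data" list of banners and that HTTP
    banners store the file under banner["http"]["robots"].
    """
    found = {}
    for banner in host.get("data", []):
        robots = (banner.get("http") or {}).get("robots")
        if robots:
            found[banner.get("port")] = robots
    return found


def lookup_robots(api_key, ip):
    """Fetch a host record from Shodan and extract its robots.txt data."""
    import shodan  # third-party package: pip install shodan

    return robots_from_host(shodan.Shodan(api_key).host(ip))


# Example usage (needs a real API key and network access):
# print(lookup_robots("YOUR_API_KEY", "198.51.100.7"))
```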