Search Engine Improvements
Shodan has seen tremendous growth over the past year, both in the amount of data collected and in the number of users. Due to that increased demand, we started seeing cracks in the search engine's performance. All of our websites are built on top of the same public API that our customers use, so we felt their pain when searches timed out or data couldn't be updated in real-time due to traffic spikes. As a result, we decided to spend Q3 2020 cleaning up some of the technical debt - starting with the search engine. I'd like to highlight a few of the improvements we've made to provide you with a better experience. Note that there are other aspects of the Shodan platform that we'll be making adjustments to in the coming months, but in this article I'll focus on the search engine.
Quick Recap
If you're not familiar with the general rules of the Shodan search query syntax, here's a quick recap:
- By default, Shodan only searches the data property on the banner. To search in other properties you have to specify a filter.
- Query terms and filters are always AND-ed together, whereas comma-separated values within a single filter are OR-ed together.
For example, the following search query looks for services on port 22 OR 80:
port:22,80
Whereas this searches for services that are on port 22 AND are identified as OpenSSH:
port:22 product:OpenSSH
And this searches for OpenSSH services in San Diego OR Austin:
product:OpenSSH city:"San Diego,Austin"
To learn more, please visit our Help Center or check out the search query examples page.
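As a rough illustration of those two rules, the sketch below assembles query strings like the ones in the recap. The `build_query` helper is hypothetical and not part of any Shodan library; it only shows how spaces (AND) and commas (OR) combine.

```python
from typing import Dict, Iterable, Optional, Sequence


def build_query(
    terms: Sequence[str] = (),
    filters: Optional[Dict[str, Iterable[object]]] = None,
) -> str:
    """Join free-text terms and filters into one query string.

    Hypothetical helper for illustration only.
    """
    parts = list(terms)
    for name, values in (filters or {}).items():
        # Values of a single filter are comma-separated, so they are OR-ed.
        joined = ",".join(str(v) for v in values)
        # Quote the value when it contains a space, as in city:"San Diego,Austin".
        if " " in joined:
            joined = f'"{joined}"'
        parts.append(f"{name}:{joined}")
    # Terms and filters are separated by spaces, so they are AND-ed.
    return " ".join(parts)


print(build_query(filters={"port": [22, 80]}))
# port:22,80
print(build_query(filters={"port": [22], "product": ["OpenSSH"]}))
# port:22 product:OpenSSH
print(build_query(filters={"product": ["OpenSSH"], "city": ["San Diego", "Austin"]}))
# product:OpenSSH city:"San Diego,Austin"
```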
Improvements
The main focus of revamping the search backend was to improve search while remaining fully backwards compatible. This means you won't need to make any changes to existing code to take advantage of the new search features.
General Performance
Based on our internal metrics it looks like the new search engine is performing significantly better. And it's keeping up with the rate of data collection in real-time without breaking a sweat.
Downloading Results
Each search API request returns up to 100 results per page, which means that to download all the available search results you have to page through them. In the past, you would often encounter timeouts when paging deeper into the results, and it could take a long time to get the data you asked for - especially if the results weren't cached. We've made significant changes to the way paging works on the backend, so download requests should be faster and no longer time out. You can still use the same website, command-line interface or API as before; it will just be a lot faster now!
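For readers who page through results themselves, here is a minimal sketch of the loop. It assumes the REST search endpoint at https://api.shodan.io/shodan/host/search with `key`, `query` and `page` parameters and a JSON response containing a `matches` list; the `fetch_page` and `iter_results` names are made up for this example, and stopping at the first empty page is a simplification.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, Optional

# Assumed endpoint; check the API documentation before relying on it.
SEARCH_URL = "https://api.shodan.io/shodan/host/search"


def fetch_page(api_key: str, query: str, page: int) -> Dict:
    """Request a single page of search results (performs a network call)."""
    params = urllib.parse.urlencode({"key": api_key, "query": query, "page": page})
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}", timeout=30) as response:
        return json.load(response)


def iter_results(
    fetch: Callable[[str, int], Dict],
    query: str,
    max_pages: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield banners page by page until a page comes back empty."""
    page = 1
    while max_pages is None or page <= max_pages:
        matches = fetch(query, page).get("matches", [])
        if not matches:
            break  # no more results to download
        yield from matches
        page += 1


# Example usage (needs a real API key, so it is left commented out):
# from functools import partial
# for banner in iter_results(partial(fetch_page, "YOUR_API_KEY"), "port:22,80"):
#     print(banner.get("ip_str"), banner.get("port"))
```

Passing the fetch function in as an argument keeps the paging loop separate from the HTTP call, which makes it easy to add retries or to test the loop without network access.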
New functionality
We haven't just improved the search engine though; we've also deployed new features to make your life easier.
Numeric Ranges
You can now ask Shodan to search a range of numbers by putting one of the following operators before a number: <, <=, >=, >
The following search looks for services running on the first 1024 ports:
port:<=1024
You can also use it to exclude ranges. For example, this is how you would search for SSH running on any port that's not within the first 1024 ports:
ssh -port:<=1024
You can also specify multiple ranges - keep in mind that filter values specified using a comma are OR-ed together. This search looks for services on ports less than 1024 OR greater than 6000:
port:<1024,>6000
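To make the range semantics concrete, the following sketch evaluates a filter value such as `<1024,>6000` against a single port number. It is a local illustration only, not how Shodan implements the filter, and the `matches_value` and `matches_filter` names are invented for this example.

```python
import operator

# Longer prefixes come first so that "<=" is not mistaken for "<".
_OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("<", operator.lt),
    (">", operator.gt),
]


def matches_value(value: str, number: int) -> bool:
    """Check one filter value, e.g. "<=1024", ">6000" or a plain "22"."""
    value = value.strip()
    for prefix, compare in _OPERATORS:
        if value.startswith(prefix):
            return compare(number, int(value[len(prefix):]))
    # No range operator: fall back to an exact match.
    return number == int(value)


def matches_filter(filter_value: str, number: int) -> bool:
    """Comma-separated values are OR-ed together."""
    return any(matches_value(v, number) for v in filter_value.split(","))


print(matches_filter("<=1024", 1024))       # True
print(matches_filter("<1024,>6000", 3389))  # False
print(matches_filter("<1024,>6000", 8080))  # True
# Exclusion, as in -port:<=1024, is the negation of the match:
print(not matches_filter("<=1024", 2222))   # True
```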
IPv6
Shodan has been crawling IPv6 for several years but until now it wasn't possible to search for specific IPv6 network ranges. The net search filter now fully supports IPv6!
Conclusion
I'm excited about the new backend and the opportunities we'll have with this improved architecture. Let us know if you're experiencing any problems or have suggestions on what you'd like us to add!