Creating a Robots.txt Honeypot
Posted by: dnrestcom on Nov 29, 2008
One standard form of information discovery and reconnaissance used by malicious attackers is to scan a target website and look for its robots.txt file. The robots.txt file is designed to give spiders and web crawlers instructions about a site's structure and, more importantly, to specify which pages and directories the spider should not crawl. Often these files are used to keep a spider away from sensitive areas of a website, such as administrative interfaces, so that search engines don't index such pages and functionality. It is precisely for this reason that a malicious attacker will look in a robots.txt file: it often provides a roadmap to sensitive data and administrative interfaces.
Knowing that malicious attackers might look into your robots.txt file and explore the listings there allows you to employ a few defensive techniques, or at least set up some early warning measures. One possibility is simply to waste an attacker's time. For instance, if your site has an administrative interface at /admin, you might list a couple hundred non-existent sub-directories and slip /admin into the list near the middle or end. This would prove frustrating for an attacker looking through the robots.txt entries by hand. An attacker using an automated tool, however, is unlikely to be slowed down by false entries in the robots.txt file.
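As a rough illustration of the decoy list described above, here is a minimal Python sketch that generates a robots.txt with a couple hundred fake entries and slips the real path into the second half. The function name, the word list and the naming pattern for decoys are all made up for this example; use whatever looks plausible for your own site.

```python
import random

# Hypothetical word list used to make decoy directory names look plausible.
DECOY_WORDS = ["backup", "private", "staff", "billing",
               "reports", "config", "old", "internal"]


def build_robots_txt(real_paths, decoy_count=200, seed=None):
    """Return (robots_txt_text, decoy_paths) with real paths hidden among decoys."""
    rng = random.Random(seed)
    real = set(real_paths)

    # Generate unique decoy paths that never collide with a real path.
    decoys = set()
    while len(decoys) < decoy_count:
        candidate = "/%s%03d/" % (rng.choice(DECOY_WORDS), rng.randrange(1000))
        if candidate not in real:
            decoys.add(candidate)

    entries = sorted(decoys)
    rng.shuffle(entries)

    # Slip each real path somewhere into the second half of the list.
    for path in real_paths:
        position = rng.randrange(len(entries) // 2, len(entries) + 1)
        entries.insert(position, path)

    lines = ["User-agent: *"] + ["Disallow: " + entry for entry in entries]
    return "\n".join(lines) + "\n", decoys


if __name__ == "__main__":
    text, decoy_paths = build_robots_txt(["/admin"], decoy_count=200)
    print(text)
```

The function also returns the set of decoy paths, since anything that later needs to recognise a request for a fake entry will want that list.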
The honeypot system I'm describing goes a step further. It can be implemented in a number of ways, but the basic idea is the same: you fill your robots.txt file with numerous false entries, and each of these false entries leads to a server response that triggers a blacklisting of the offending IP address. This means that real subdirectories and files can still safely be embedded in the robots.txt, while checking every entry becomes prohibitively costly for an attacker, since a single wrong guess gets them blocked.
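One possible shape for the blacklisting logic, independent of any particular web server, is sketched below in Python. The class name, the one-hour ban and the in-memory dictionary are assumptions made for this example; a real deployment would more likely hook into the server's own access rules or a firewall and keep the list somewhere persistent.

```python
import time


class HoneypotBlacklist:
    """Ban any IP address that requests one of the decoy paths (in-memory sketch)."""

    def __init__(self, decoy_paths, ban_seconds=3600, clock=time.time):
        self.decoy_paths = set(decoy_paths)
        self.ban_seconds = ban_seconds   # assumed ban length, adjust to taste
        self.clock = clock               # injectable so the logic can be tested
        self.banned_until = {}           # ip -> timestamp when the ban expires

    def is_banned(self, ip):
        expires = self.banned_until.get(ip)
        if expires is None:
            return False
        if self.clock() >= expires:
            # The ban has run out, so forget about this address.
            del self.banned_until[ip]
            return False
        return True

    def allow(self, ip, path):
        """Return True if the request should be served, False if it should be refused."""
        if self.is_banned(ip):
            return False
        if path in self.decoy_paths:
            # A decoy was requested: blacklist the caller and refuse the request.
            self.banned_until[ip] = self.clock() + self.ban_seconds
            return False
        return True


if __name__ == "__main__":
    blacklist = HoneypotBlacklist(["/backup042/", "/staff117/"])
    print(blacklist.allow("203.0.113.5", "/index.html"))   # normal page
    print(blacklist.allow("203.0.113.5", "/backup042/"))   # decoy, triggers the ban
    print(blacklist.allow("203.0.113.5", "/index.html"))   # now refused
```

You would call allow() at the top of your request handling and refuse the request whenever it returns False. Crawlers that honor robots.txt should never request a disallowed path, so in principle only clients that ignore the file, or that read it looking for targets, end up on the list.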
