|
Nov 29
2008
|
One standard form of information discovery and reconnaissance used by malicious attackers is to scan a target website and search for robots.txt files. The robots.txt file is designed to provide instructions to spiders or web crawlers about a site's structure and more importantly to specify which pages and directories the spider should not crawl. Often these files are used to keep a spider from crawling sensitive areas of a website, such as administrative interfaces, so that search engines don't cache the existence of such pages and functionality. It is precisely for this reason that a malicious attacker will look in a robots.txt file - they often provide roadmaps to sensitive data and administrative interfaces.