A friend of mine tipped me off about a cool little program called awstats. It reads your web server logs and turns them into some pretty cool graphs and stats. I have it more or less running right now, just not perfectly.

From it, I could see which spiders are accessing my website and how often. It turns out the Inktomi Slurp spider (Hotbot's crawler) was hitting my server a lot, and accounting for a high percentage of the bandwidth the server was using. A quick Google search and a modification to my robots.txt file, and I think I got it to throttle back. Now instead of accessing my site once per second, it is allowed one visit every 6 hours. I'd be shocked if it makes any kind of dent in my real (non-spider) traffic, but if you want to cut down on the amount of bandwidth this leech uses on your server, add the following two lines to the robots.txt file in the root folder of your site (if you don't have one, just create it and add these lines):

User-agent: Slurp
Crawl-delay: 60

The Crawl-delay is the minimum number of seconds you will allow between hits. The 60 above is just an example value; mine is set to 21600 (6 hours).
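
If you want to verify the throttle is actually working, here is a rough sketch that counts Slurp hits per hour straight from your access log. It's written in Python and assumes an Apache combined-format log living at /var/log/apache2/access.log; both the path and the log format are assumptions you may need to adjust for your own server.

import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # assumption: change to wherever your log lives

# In the Apache combined format the timestamp sits in [brackets];
# grab the day/month/year:hour part so hits can be bucketed by hour.
hour_re = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2})')

hits = Counter()
with open(LOG_PATH) as log:
    for line in log:
        if "Slurp" not in line:  # Inktomi's spider identifies itself as "Slurp"
            continue
        match = hour_re.search(line)
        if match:
            hits[match.group(1)] += 1

for hour, count in sorted(hits.items()):
    print(hour, count)

Run it before and after the robots.txt change: at one hit per second you'd see something like 3600 per hour, while with a 21600-second Crawl-delay it should drop to one hit every few hours.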