Weirdbot
Overview
"Weirdbot" is the name I have given to certain bots which are known to generate requests for nonexistent pages. I have not been able to find any source of links to these pages, so I have to assume that the links are being generated by the bot – either in a deliberate attempt to find unlisted pages within the web site, or else due to a bug of some sort.
Normally, the script which generates the /cat/ pages sends me an email whenever there's a 404, but I have disabled that for known weirdbots. When a bad page is requested by one of these bots, the page is generated more or less normally except that the navigational sidebar is omitted and a message is displayed with a link to this wiki page (ordinarily it displays a different message and includes the sidebar). These bots are otherwise treated no differently from any other browser.
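The handling described above could be sketched roughly as follows. This is only an illustration, not the actual vbz.net script: the function names, the returned dictionary, and the idea of matching on user-agent substrings are all assumptions; only the two bot names come from the table under Details.

```python
# Hypothetical sketch; none of these names come from the real /cat/ script.
# The user-agent substrings are the two bots listed under Details.
KNOWN_WEIRDBOTS = ("MJ12bot", "ShopWiki")


def is_weirdbot(user_agent):
    """Return True if the user-agent string matches a known weirdbot."""
    return any(name in user_agent for name in KNOWN_WEIRDBOTS)


def plan_404_response(user_agent):
    """Decide how to respond to a request for a nonexistent /cat/ page."""
    if is_weirdbot(user_agent):
        # Known weirdbot: no email, no sidebar, message linking to this wiki page.
        return {"send_email": False, "show_sidebar": False, "message": "weirdbot"}
    # Anyone else: email the site owner and render the usual 404 page.
    return {"send_email": True, "show_sidebar": True, "message": "standard"}
```

Keeping the decision in one small function means a newly identified weirdbot only needs a new entry in the tuple.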
Seeking Information
I have looked through my log files and been unable to find any evidence that these bots have obtained these bad URLs from pages on vbz.net. The bots themselves leave no referer information, so I can't see where the bad URLs might be coming from.
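For anyone repeating this search, a log scan along these lines might help. It is a sketch that assumes the server writes Apache-style "combined" log lines (request, status, size, referer, user-agent in that order); the regular expression and the function name `referers_for` are illustrative and would need adjusting for any other log format.

```python
import re

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referers_for(log_lines, path):
    """Collect the non-empty referers recorded for requests of `path`."""
    found = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") != path:
            continue
        referer = match.group("referer")
        # A missing referer is logged as "-" in this format.
        if referer not in ("", "-"):
            found.add(referer)
    return found
```

An empty result for a bad URL is consistent with what is described above: the requests arrive with no referer at all.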
I have to wonder how difficult it could possibly be to do this: for each URL in the bot's database, simply record one URL where that link was found. It doesn't have to be the best source in any way; I just need to see one page where a bad URL is being linked to, and I should be able to fix the problem – if indeed it is actually a problem on my site, and not some weird feature/bug in the bot code.
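To show how little bookkeeping this would take, here is a minimal sketch of the idea. It is not based on any real bot's code: `link_sources`, `record_link`, and `source_of` are made-up names, and a plain dictionary stands in for whatever database a bot actually uses.

```python
def record_link(link_sources, url, found_on):
    """Remember the first page on which `url` was seen.

    `link_sources` maps each discovered URL to one page that linked to it.
    Later sightings of the same URL are ignored, so one source is kept per URL.
    """
    link_sources.setdefault(url, found_on)


def source_of(link_sources, url):
    """Return one page where `url` was found, or None if it was never recorded."""
    return link_sources.get(url)
```

With something like this in place, a bot operator could answer "where did you find this URL?" with a single lookup.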
If, on the other hand, these bots are trying different permutations of known valid URLs (for whatever reason), that too would be useful information: I could stop worrying that my scripts are generating bad link URLs when I'm not looking (they certainly never do it when I am looking).
Details
The page requests generally follow the form of valid URLs, but leave out one or more folders. Some examples:
request | what's missing | example valid page | perpetrator | when |
/cat/mt/tb/big/ | two digits between /tb/ and /big/ | http://vbz.net/cat/mt/tb/17/big/ | MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+) | 2006-06-08 |
/cat/mt/q/big/ | three digits between /q/ and /big/ | http://vbz.net/cat/mt/q/050/big/ | MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+) | 2006-06-08 |
/cat/1235/ | supplier, possibly dept also | http://vbz.net/cat/mt/1235/ http://vbz.net/cat/zr/bm/1235/ | ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot) | 2006-06-03 |
/cat/11587/ | supplier (probably lb) | http://vbz.net/cat/lb/11587/ | ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot) | 2006-06-03 |
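The pattern in the table (a valid URL with folders left out) can be checked mechanically. The sketch below is an illustration only; `path_segments` and `missing_segments` are hypothetical helper names, and the check simply tests whether the requested folders appear, in order, within a known valid URL.

```python
from urllib.parse import urlparse


def path_segments(url_or_path):
    """Split a URL or bare path into its non-empty folder names."""
    return [seg for seg in urlparse(url_or_path).path.split("/") if seg]


def missing_segments(request, valid):
    """Return the folders of `valid` that `request` leaves out.

    Returns None if `request` is not simply `valid` with folders removed.
    """
    wanted = path_segments(request)
    missing = []
    matched = 0
    for seg in path_segments(valid):
        if matched < len(wanted) and wanted[matched] == seg:
            matched += 1  # this folder is present in the request
        else:
            missing.append(seg)  # this folder was dropped
    return missing if matched == len(wanted) else None
```

For the first row of the table, `missing_segments("/cat/mt/tb/big/", "http://vbz.net/cat/mt/tb/17/big/")` returns `["17"]`.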