SpamBait 1.0

=========================

Statement of the problem

If you don't know what spam (the e-mail variety of it, not the canned one) is, then you probably don't need to read any further, or to use this program.

Spammers need large lists of e-mail addresses. One way to get these addresses is by combing the Web or Usenet newsgroups with robots that extract e-mail addresses automatically. The robots work by pattern matching: anything that contains an @ and one or more dots is likely an e-mail address, and the word after the last dot can only be one of a few choices, such as .com, .edu, .org, .mil, .de and so on.
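
As a rough illustration of the pattern matching described above, here is a minimal Python sketch. The regular expression, the function name extract_addresses and the short suffix list are assumptions made for this example; real robots may well use different rules.

```python
import re

# Hypothetical, deliberately short suffix list; a real robot would know many more.
SUFFIXES = ("com", "edu", "org", "mil", "de")

# Something before the @, then one or more dot-terminated names,
# ending in one of the known suffixes.
ADDRESS_RE = re.compile(
    r"[A-Za-z0-9._-]+@(?:[A-Za-z0-9-]+\.)+(?:" + "|".join(SUFFIXES) + r")\b"
)

def extract_addresses(text):
    """Return every substring of text that looks like an e-mail address."""
    return ADDRESS_RE.findall(text)
```

A pattern this simple cannot tell a real address from a made-up one, which is exactly what SpamBait relies on.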

The idea behind SpamBait

Once they have collected a list of e-mail addresses, few address collectors have the resources to verify each address individually, or even by domain. Therefore, polluting these lists with large numbers of fake addresses should make them much less valuable: for instance, a mailing that triggers large numbers of failed DNS lookups may alert a system manager that something is going on with a mail server.

Obviously, it would be futile to start writing a list of fake addresses by hand. It is much more efficient to have a little program that can generate, say, one million addresses in about one minute (as SpamBait does on my machine). Then you can put these addresses on your web site or in your Usenet sig, and wait for the robots to do their job. Changing the lists regularly will wreak even more havoc on the robots. Many of the addresses generated by SpamBait are quite believable, while some may look silly or even hilarious, but what does a robot know?

How SpamBait works

SpamBait generates addresses randomly by combining entries read from a few dictionary files. The rules are slightly different for the different parts of an address. The algorithm steps are:

addressee (the part to the left of the @) is generated by concatenating from one to four entries chosen at random from syllabe.txt and name.txt. Examples actually generated by SpamBait are eugenefu, jo-e, ejy, hmamijerry, jeff, cliffadminsu.
computer (the part between the @ and the last dot) is generated by concatenating from one to four entries chosen at random from syllabe.txt and computer.txt. The process is repeated from one to four times to generate computer names, like super.legl.fanfour, hillbogus, route.dr-t.pfr or doublesrvr.ntpow.gelib, which consist of a single name or multiple names separated by dots.
domain is chosen at random from domain.txt, and appended to the address generated so far, for example birodavju@6rz-a.svrpa.threemo.mil, zefoakira@iwexo.jnrouter.srvnycl.fi, grleeji@zatech.bouncestatn.org.
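
The three steps above can be sketched in a few lines of Python. This is only an illustration, not the actual SpamBait code (which is C++): the tiny in-memory lists stand in for the dictionary files, the function names are made up, and merging the two dictionaries into one pool before choosing is an assumption about how "chosen at random from A and B" works.

```python
import random

# Tiny stand-ins for syllabe.txt, name.txt, computer.txt and domain.txt.
SYLLABLES = ["fu", "mi", "ze", "jo", "e", "h"]
NAMES = ["eugene", "jerry", "jeff", "cliff"]
COMPUTERS = ["super", "router", "srvr", "tech"]
DOMAINS = ["com", "com", "edu", "org", "mil", "fi"]

def make_word(pool, rng):
    """Concatenate one to four entries chosen at random from pool."""
    return "".join(rng.choice(pool) for _ in range(rng.randint(1, 4)))

def make_address(rng):
    """Build addressee@computer.domain following the three steps."""
    addressee = make_word(SYLLABLES + NAMES, rng)
    # One to four computer names, joined with dots.
    count = rng.randint(1, 4)
    computer = ".".join(make_word(SYLLABLES + COMPUTERS, rng) for _ in range(count))
    domain = rng.choice(DOMAINS)
    return addressee + "@" + computer + "." + domain

rng = random.Random()
for _ in range(3):
    print(make_address(rng))
```

Passing the random generator in explicitly makes it easy to seed it when you want repeatable output.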

The addresses are then formatted into a simple HTML page containing <A HREF="mailto:addr">addr</A> tags (where addr is the actual address).
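
A minimal sketch of this formatting step in Python follows. Only the mailto tag comes from the description above; the page skeleton, the title, the <BR> separators and the function name format_page are assumptions for the example.

```python
import html

def format_page(addresses, title="Addresses"):
    """Return a simple HTML page with one mailto link per address."""
    lines = ["<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>" % html.escape(title)]
    for addr in addresses:
        # Escaping is a no-op for generated addresses, but is cheap insurance.
        safe = html.escape(addr, quote=True)
        lines.append('<A HREF="mailto:%s">%s</A><BR>' % (safe, safe))
    lines.append("</BODY></HTML>")
    return "\n".join(lines)

print(format_page(["jeff@hillbogus.org", "grleeji@zatech.bouncestatn.org"]))
```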

SpamBait can generate multiple files, each containing a specified number of addresses, and for your convenience it also generates a links.htm file containing links to all the generated address files. For instructions, run SpamBait without parameters in a DOS box.
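
The multi-file output could look roughly like the Python sketch below. Only the name links.htm comes from SpamBait itself; the function name write_files, the addr001.htm naming scheme and the layout of the link page are hypothetical.

```python
import os

def write_files(pages, out_dir, prefix="addr"):
    """Write each page to its own file, plus links.htm pointing at all of them."""
    os.makedirs(out_dir, exist_ok=True)
    names = []
    for i, page in enumerate(pages, start=1):
        name = "%s%03d.htm" % (prefix, i)  # hypothetical naming scheme
        with open(os.path.join(out_dir, name), "w") as f:
            f.write(page)
        names.append(name)
    links = "\n".join('<A HREF="%s">%s</A><BR>' % (n, n) for n in names)
    with open(os.path.join(out_dir, "links.htm"), "w") as f:
        f.write(links + "\n")
    return names
```

Relative links are used so that the whole directory can be uploaded to a web site as it is.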

Further notes

The dictionary files can be expanded. The format is simple: one entry per line, no spaces, no empty lines (the last line must end with a carriage return, or the last entry is not used). If you want one dictionary entry to be chosen more often, write the entry several times in the dictionary. For instance, the domain.txt dictionary contains several com, edu and org entries because these domain suffixes are so common on the real Internet.
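
To see why repeating an entry makes it more likely to be chosen, consider this small Python sketch. The dictionary contents and the function name load_dictionary are made up for the example, and the text is passed in as a string rather than read from a file to keep the example self-contained. Note also that splitlines() accepts a last line without a line ending, which the real program reportedly does not.

```python
import random
from collections import Counter

def load_dictionary(text):
    """Split dictionary text into entries, one per line."""
    return text.splitlines()

# "com" appears three times, every other suffix once.
domain_text = "com\ncom\ncom\nedu\norg\nmil\n"
entries = load_dictionary(domain_text)

# A uniform choice over the lines picks "com" about three times as often.
rng = random.Random(0)
counts = Counter(rng.choice(entries) for _ in range(6000))
for suffix, n in counts.most_common():
    print(suffix, n)
```

No separate weight column is needed: duplicated lines are the weights.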

The syllabe.txt dictionary works better than a random combination of characters: words made of syllables are often readable, while most words made of random characters are unpronounceable or plainly impossible. This dictionary contains single characters in addition to syllables, so that unreadable words or strings of initials are generated now and then, just like in real addresses.

The domain.txt dictionary can probably be improved. I am not sure that all these domain suffixes actually exist, and there may be many real ones which are not present in the dictionary.

After placing the generated files on your web site, you should add a link to them on one of your main pages, so that robots will find their way to the lists. You may add a short explanation of the links to your page, so that human visitors have a clue about what these address lists are. If you are using multiple address files (this is recommended, because a single one-million-address file is not very believable, and loads very slowly if an unfortunate human visitor happens to hit the link), you may link your page to links.htm, or import this file into one of your pages.

Source code

The full source code for the whole project (Micro$oft Vi$ual C++ 6.0) is provided for you to play with. Remember that the code is just a quick hack to get the job done. I shall provide no support.

Warranty

None whatsoever. Enjoy,

KloroX

==============
