# C[omp]ute

© 2006-2013 Andrew Cooke (site) / post authors (content).

## How Many Spammers? A Statistical Approach

From: "andrew cooke" <andrew@...>

Date: Sun, 24 Jun 2007 21:19:24 -0400 (CLT)

I've thought about the following for some time, but I don't think I'll
ever get round to doing anything about it.  Maybe someone else would like
to give it a go...?

If you look at the receipt times of spam emails, I think you may be able
to put an upper limit on the number of spammers.  If there were very many
independent spammers then you would expect spam arrivals to look random:
the number per interval would have a Poisson distribution.  On the other
hand, if they come from a small number of sources you would expect
departures from Poisson statistics (bursts, say, or regular spacing).
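
A minimal sketch of that comparison, assuming one particular statistic:
the variance-to-mean ratio (dispersion index) of the counts per interval,
which is close to 1 for Poisson arrivals.  The statistic, the synthetic
data, and all names and parameters here (`counts_per_interval`,
`dispersion_index`, the burst sizes) are illustrative assumptions, not
taken from real spam logs.

```python
import random
from collections import Counter

def counts_per_interval(times, width, n_bins):
    """Count arrivals in each of n_bins consecutive intervals of the given width."""
    hits = Counter(int(t // width) for t in times)
    return [hits.get(i, 0) for i in range(n_bins)]

def dispersion_index(counts):
    """Variance-to-mean ratio of the counts; close to 1 for Poisson arrivals."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean

rng = random.Random(0)
duration = 10_000.0  # hypothetical observation window, arbitrary time units

# Poisson-like arrivals: times drawn independently and uniformly over the window.
poisson_times = [rng.uniform(0, duration) for _ in range(5_000)]

# Bursty arrivals (assumed model): 100 batches of 50 emails, each batch
# squeezed into a short window of 5 time units.
bursty_times = []
for _ in range(100):
    start = rng.uniform(0, duration - 5)
    bursty_times.extend(start + rng.uniform(0, 5) for _ in range(50))

poisson_d = dispersion_index(counts_per_interval(poisson_times, 10.0, 1000))
bursty_d = dispersion_index(counts_per_interval(bursty_times, 10.0, 1000))
print(f"poisson-like: {poisson_d:.2f}, bursty: {bursty_d:.2f}")
```

Both synthetic streams have the same average rate, so any difference in
the index comes from how the arrivals are grouped in time, not from how
many there are.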

Obviously this isn't exact.  In particular, a single spammer could send
out emails at random times (in which case one would infer more spammers
than actually exist).  I think that's the most significant bias, which is
why the estimate would be an upper limit.

Nor do I have a simple relationship that goes from some statistic to the
number of spammers.  Maybe there is one, under a certain set of
assumptions, but I think you'd probably need to run simulations.  And I
have no idea what the best statistic would be - perhaps you would compare
the power spectrum of the arrival times with the flat spectrum expected
for a Poisson process.
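
The kind of simulation this would need might look like the sketch below.
It assumes a deliberately crude model that is not derived from any data:
the total volume is fixed, it is split evenly between the spammers, and
each spammer sends their whole share in one short "campaign" at a random
time.  It also swaps in the variance-to-mean ratio of counts per interval
in place of a power spectrum, only because it is simpler to compute.  The
names `simulate` and `dispersion_index` and every parameter value are
hypothetical.

```python
import random
from collections import Counter

def dispersion_index(times, width, n_bins):
    """Variance-to-mean ratio of arrivals per interval (about 1 for Poisson)."""
    hits = Counter(int(t // width) for t in times)
    counts = [hits.get(i, 0) for i in range(n_bins)]
    mean = sum(counts) / n_bins
    var = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)
    return var / mean

def simulate(n_spammers, total=5_000, duration=10_000.0, campaign=5.0, seed=0):
    """Assumed model: each spammer sends total // n_spammers emails,
    spread uniformly over one campaign window starting at a random time."""
    rng = random.Random(seed)
    per_spammer = total // n_spammers
    times = []
    for _ in range(n_spammers):
        start = rng.uniform(0, duration - campaign)
        times.extend(start + rng.uniform(0, campaign) for _ in range(per_spammer))
    return times

# Calibration curve: the statistic as a function of the number of spammers.
curve = {n: dispersion_index(simulate(n), 10.0, 1000) for n in (10, 100, 1000, 5000)}
for n, d in curve.items():
    print(f"{n:5d} spammers -> dispersion {d:8.2f}")
```

Under this model the index falls towards 1 as the number of spammers
grows, so a measured value could be read off the curve to give a rough
bound.  The curve depends entirely on the assumed sending behaviour,
though, and a different model would give a different mapping.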

Andrew