Anti-spam form
Tuesday, August 26th, 2008
Pobo has just completed a new contact form for Alice Melvin, an Edinburgh-based artist and illustrator. She was receiving industrial quantities of spam through her website’s contact form and had to take it offline. She was looking for a replacement and I was happy to accept the challenge.
Spam is a tricky and evolving beast, so any attempt to combat it needs a many-pronged approach. Single defences are vulnerable. Captcha, a method of distorting an image and asking the user to decipher it to prove he or she is really a human, is a popular method, but several of the most high-profile ones have been cracked recently. There are also reports of sweatshops run by hackers where huge numbers of captchas are sent by robots, read by humans and sent back so the robots can get on with their evil work.
The real problem, though, is that captcha is inaccessible to visually-impaired users, so it’s illegal (at least in the UK) unless an alternative is provided. Some sites offer an audio equivalent, but how would a deafblind braille user submit such a form?
There are good overviews of the inaccessibility of captcha, along with some proposed alternatives, in the W3C’s discussion and the slightly less dry Sitepoint article. Both are slightly depressing, because nothing strikes the perfect balance between accessibility and security. But they reinforced my impression that only a sequence of barriers would be effective, and that such a sequence can be tweaked in future without having to throw the whole thing out and start again if a particular technique is cracked, as will no doubt happen with captcha sooner or later.
At IfLooksCouldKill we have used the open source CFFormProtect, which wraps up some of the alternatives in a ColdFusion custom tag, and I took this as a basis for my PHP code.
There are five tests:
Time taken: the current time is recorded on the form. This is compared to the time it is received back. Too short and it is blocked. Spam robots can fill in a form in a second or two, but humans take much longer. The time is also encoded in a separate field and when the form is received back, this is checked to make sure the time value hasn’t been altered.
Hidden field: robots usually just fill everything in, making a guess at what is expected (e.g. an email address in an email field). This field is expected to be empty when the form is submitted. It is hidden from view and screenreader users are asked not to fill it in. Robots will, and the form will be blocked.
Text check: the words in the form fields are checked against two lists. The first is bad words that no legitimate user would write: one of these and it’s a red card. The second is words that in combination are good indicators of spam. A kind of points system adds these up and if the threshold is reached – sorry, you’re not getting in.
IP check: the user’s IP address is recorded after a successful form submission and then that’s it; no more submissions from that address. For a while, anyway. Each time the form is submitted, the user’s IP is checked against the list of recorded IPs and times, and a decision is made on the validity of the submission.
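As a rough illustration of the time and hidden-field tests, here is a minimal sketch in Python (not the PHP used on the site). The field names `form_time`, `form_time_sig` and `website`, the secret key and the five-second threshold are all assumptions made for the example, and an HMAC is just one plausible way to "encode" the time so that tampering can be detected.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the browser
MIN_SECONDS = 5            # assumed threshold: anything faster is treated as a robot


def sign(timestamp: str) -> str:
    """Return an HMAC of the timestamp so a changed value can be detected."""
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def new_form_fields(now=None) -> dict:
    """Values to embed in the form when it is rendered."""
    stamp = str(int(time.time() if now is None else now))
    # "website" is the hidden field that real users are asked to leave empty.
    return {"form_time": stamp, "form_time_sig": sign(stamp), "website": ""}


def passes_time_check(fields: dict, now=None) -> bool:
    """Reject forms whose timestamp was altered or that came back too quickly."""
    stamp = fields.get("form_time", "")
    sig = fields.get("form_time_sig", "")
    # Compare as bytes in constant time; a mismatch means the time was edited.
    if not hmac.compare_digest(sign(stamp).encode(), sig.encode()):
        return False
    try:
        rendered_at = int(stamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return current - rendered_at >= MIN_SECONDS


def passes_hidden_field_check(fields: dict) -> bool:
    """Robots tend to fill in every field; humans leave the hidden one empty."""
    return fields.get("website", "") == ""
```

Raising or lowering `MIN_SECONDS` is the obvious dial here: too high and fast typists get blocked, too low and the test stops catching anything.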
This leaves us with a range of knobs and dials that we can tweak to raise or lower the barriers if spam finds its way through or users are being inconvenienced.
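Many of those knobs live in the text and IP checks. The sketch below, again in Python rather than the site's PHP, gathers them into one hypothetical `Knobs` object. The word lists, point values, threshold and ten-minute cooldown are placeholder assumptions, not the values used on the real form, and the in-memory dictionary stands in for whatever storage the site actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Knobs:
    """Tunable settings; every value here is a placeholder assumption."""
    banned_words: frozenset = frozenset({"viagra"})
    scored_words: dict = field(
        default_factory=lambda: {"free": 1, "offer": 1, "click": 2}
    )
    score_threshold: int = 3
    ip_cooldown_seconds: int = 600


def passes_text_check(text: str, knobs: Knobs) -> bool:
    """One banned word is an instant block; scored words add up to a threshold."""
    words = re.findall(r"[a-z']+", text.lower())
    if any(word in knobs.banned_words for word in words):
        return False
    # Each occurrence of a scored word contributes its points.
    score = sum(knobs.scored_words.get(word, 0) for word in words)
    return score < knobs.score_threshold


def passes_ip_check(ip: str, now: float, last_success: dict, knobs: Knobs) -> bool:
    """Allow an IP only if its last successful submission is old enough."""
    previous = last_success.get(ip)
    return previous is None or now - previous >= knobs.ip_cooldown_seconds


def record_success(ip: str, now: float, last_success: dict) -> None:
    """Remember when this IP last got a submission through."""
    last_success[ip] = now
```

If spam starts getting through, lowering `score_threshold` or lengthening `ip_cooldown_seconds` tightens things up without touching the other tests; if real visitors are being turned away, the same settings can be loosened.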



