There’s been a bit of a storm brewing on 2+2. A long-time contributor had his mom accused of using bots by PokerStars. After a few mind-numbingly long threads and several emails to Lee Jones, it was determined that the person reviewing her case either had bad information or had made a mistake analyzing her data. Letters of apology were issued and accepted, but the outcome is almost beside the point. The real issue is the problem of false positives in trying to keep online poker safe and clean.
The lesson for every online poker room is that a false positive can cost you dearly. The few dollars saved by employing too aggressive a fraud program are easily eclipsed by the six- or seven-figure losses in goodwill.
Most poker rooms attempt to prevent cheating, bots, and other terms-of-service violations using automated detection schemes. Either in real time or nightly, the casino is alerted to suspicious activity. For instance, chat spamming is a problem most casinos have measures to prevent. They don’t appreciate people coming onto their site and pimping other poker sites in their table chat. Now, if they see a single user who has shot off 50 single-line messages to 50 different tables, that would be very suspicious. Some casinos might have the ability to monitor this in real time and block the chat privileges of that user immediately. Other, less automated casinos might instead generate a report that evening for their abuse department to review and decide whether or not to take action against the player.
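To make that concrete, here’s a minimal sketch in Python of what such a real-time monitor might look like. The window length, the table threshold, and the ChatFloodMonitor interface are all my own assumptions for illustration; a real poker room would tune these to its own traffic and would probably flag the account for human review rather than act automatically.

```python
from collections import defaultdict
import time

# Hypothetical sketch of a real-time chat-flood monitor.
# The specific numbers are illustrative, not any actual site's rules.
WINDOW_SECONDS = 300    # look at the last five minutes of chat
TABLE_THRESHOLD = 50    # messages to this many distinct tables is suspicious

class ChatFloodMonitor:
    def __init__(self):
        # user_id -> list of (timestamp, table_id) for recent messages
        self.recent = defaultdict(list)

    def record(self, user_id, table_id, now=None):
        """Record a chat message; return True if the user looks like a spammer."""
        now = now if now is not None else time.time()
        self.recent[user_id].append((now, table_id))
        # Drop events that have aged out of the window.
        cutoff = now - WINDOW_SECONDS
        self.recent[user_id] = [e for e in self.recent[user_id] if e[0] >= cutoff]
        # Count how many *distinct* tables this user has chatted at recently.
        distinct_tables = {t for _, t in self.recent[user_id]}
        return len(distinct_tables) >= TABLE_THRESHOLD
```

A site with this kind of hook could mute chat the moment record() returns True, while a less automated operation might just write the flagged user IDs to that evening’s abuse report.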
If the casino has misidentified the abusive patterns, the data collection is flawed, or the casino has set too aggressive a threshold, then it runs the risk of wrongly accusing players of misdeeds: a false positive.
Of course, online poker companies don’t have a monopoly on generating false positives. Email spam filtering has a similar challenge. If the filters are set too aggressively, innocent mail gets marked as spam. If the filters are too passive, spam ends up in the user’s inbox. Generally, you want to err on the side of too passive while continuously refining your detection algorithms to become more accurate.
Probably the most difficult aspect of building a quality algorithm is that not all data is of equal importance. Going back to the email spam example, let’s look at a few pieces of information a spam filter might use to determine whether a piece of email is spam.
HTML – Most spam is sent as HTML. Marketers like putting pictures, bolded phrases, and other formatting into their messages for maximum impact.
Forged Headers – Along with the to, from, and subject lines of an email, there are other pieces of information in the header that most software programs hide from the user. Many spammers will forge some of this header information, inserting false data in the hopes of bypassing more sophisticated spam filtering or disguising the true source of the email.
I could keep going. There are literally hundreds of characteristics that spam detection software might look at. The point is that some of that information should weigh more heavily than the rest. A forged email header is a much more reliable indicator than whether or not the message contains HTML. But a forged header by itself might not be a strong enough indicator; there are legitimate reasons why a header might appear to be forged.
The way to address this problem is to use a scoring system. A forged header might be worth +50 and HTML +5. At some threshold (perhaps +100), the software is confident enough that the message is spam to label it as such. The real challenge is that the human beings who assign scores to each variable and set the threshold levels can and do make mistakes.
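Here’s a toy version of that additive scoring scheme. The +50 forged-header weight, the +5 HTML weight, and the +100 threshold come straight from the example above; the two extra rules and their weights are hypothetical stand-ins I’ve added to round out the sketch.

```python
# Toy additive spam-scoring scheme. Rule names and the last two weights
# are illustrative assumptions, not a real filter's configuration.
SPAM_THRESHOLD = 100

RULES = [
    ("forged_header", 50),
    ("contains_html", 5),
    ("all_caps_subject", 10),    # hypothetical extra rule
    ("known_spam_phrase", 40),   # hypothetical extra rule
]

def score_message(features):
    """features: the set of rule names that fired for this message."""
    return sum(weight for name, weight in RULES if name in features)

def is_spam(features):
    return score_message(features) >= SPAM_THRESHOLD

# A forged header alone (+50) isn't enough to convict...
print(is_spam({"forged_header"}))  # False
# ...but combined with other signals (50 + 40 + 10 = 100) it is.
print(is_spam({"forged_header", "known_spam_phrase", "all_caps_subject"}))  # True
```

It also shows exactly where the humans come in: bump the forged-header weight to +100 and a single misread header convicts an innocent message all by itself.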
That sounds like the problem the 2+2’er’s mom ran into. She somehow tripped enough triggers to get herself labeled as a bot, and once the system labeled her, the humans were reluctant to entertain her claims to the contrary. From what was said about the incident in the 2+2 thread, the criteria being used to label a user as a bot seemed a little hyper-aggressive. Unfortunately, since catching bots and writing bots that can evade being caught is a cat-and-mouse game, the casino isn’t likely to reveal enough about what set off the trigger to allow the accused to offer a possible explanation. Worse, that same attitude doesn’t allow the casino to refine its detection by learning what the user was doing to trigger the false positive.