A new machine learning algorithm can break text-based CAPTCHA systems faster, with less effort, and with higher accuracy than all previous methods. It was developed by academics from the UK and China. CAPTCHA is a system used extensively to distinguish human from machine input. The research results mean that this first line of security defense for many websites is no longer reliable.

The new algorithm is based on the Generative Adversarial Network (GAN), a special class of artificial intelligence algorithms that is useful in scenarios where the algorithm doesn’t have access to large quantities of training data.

This means that attackers won’t need to buy and keep paying for expensive cloud computing servers in order to break text CAPTCHAs in real time on websites. Once an attacker has trained a text CAPTCHA algorithm, they can run it on a regular PC or web server, and launch coordinated DDoS or spam-posting attacks on websites where that CAPTCHA service is in use.

Classic machine learning algorithms usually require millions of data points to train them to perform a task with the desired degree of accuracy. A GAN algorithm has the advantage that it can work with a much smaller batch of initial data points.

The researchers applied this concept to breaking text CAPTCHAs, which in the vast majority of previous research studies had only been attacked with classic machine learning algorithms trained on large quantities of initial data points. The researchers argued that in a real-world scenario, an attacker wouldn’t be able to generate millions of CAPTCHAs on a live website or API without being detected and banned. By contrast, the effort and cost of launching an attack such as the one tested in the current research against a particular CAPTCHA scheme is low.

Once they had trained their GAN solvers by generating up to 200,000 “synthetic” CAPTCHAs, the researchers tested their algorithms against other text CAPTCHA systems used across the Internet, which had previously been tested by other researchers in prior academic work.
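For readers curious about the adversarial idea behind the synthetic CAPTCHAs, here is a toy sketch of a GAN training loop. It is not the researchers' method: it works on single numbers rather than CAPTCHA images, and every name and setting in it (`train_toy_gan`, `generate`, the learning rate, the step count) is an assumption made for illustration. A generator tries to produce values that look like the small "real" set, while a discriminator tries to tell real from generated, and each is updated against the other.

```python
import math
import random


def sigmoid(s):
    """Numerically safe logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(real_samples, steps=2000, lr=0.01, seed=0):
    """Toy 1-D GAN (illustrative assumption, not the paper's model).

    Generator:      G(z) = a*z + b, with z drawn from N(0, 1)
    Discriminator:  D(x) = sigmoid(w*x + c)
    Returns the learned parameters (a, b, w, c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        x_real = rng.choice(real_samples)
        x_fake = a * rng.gauss(0.0, 1.0) + b

        # Discriminator step: minimise -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * (-(1.0 - d_real) * x_real + d_fake * x_fake)
        c -= lr * (-(1.0 - d_real) + d_fake)

        # Generator step: minimise -log D(G(z)) (non-saturating loss).
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        grad_x = -(1.0 - d_fake) * w  # d(loss) / d(generated value)
        a -= lr * grad_x * z
        b -= lr * grad_x

    return a, b, w, c


def generate(a, b, n, seed=1):
    """Draw n synthetic samples from the trained generator."""
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]


if __name__ == "__main__":
    # A small "real" set stands in for the few genuine samples an attacker has.
    data_rng = random.Random(42)
    real = [data_rng.gauss(4.0, 0.5) for _ in range(100)]
    a, b, w, c = train_toy_gan(real)
    synthetic = generate(a, b, 1000)  # many synthetic samples from few real ones
    print(len(synthetic))
```

The point of the sketch is the shape of the loop: a small real set drives the training, and the trained generator can then emit as many synthetic samples as needed. A toy this simple can oscillate rather than settle, so it should not be read as evidence of how well the real system performs.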

The team said their method was able to solve text CAPTCHAs with a 100 percent accuracy rate on sites like Megaupload, Blizzard, and Authorize.NET. Their method also achieved better accuracy than earlier methods on the CAPTCHA systems used on the other 30 sites they tested, which included Amazon, Digg, Slashdot, PayPal, Yahoo, and QQ, to name a few.