Machine learning techniques applied to crack CAPTCHAs

Source: portswigger.net

F-Secure says it’s achieved 90% accuracy in cracking Microsoft Outlook’s text-based CAPTCHAs using its AI-based CAPTCHA-cracking server, CAPTCHA22.

For the last two years, the security firm has been using machine learning techniques to train unique models that solve a particular CAPTCHA, rather than trying to build a one-size-fits-all model.

And, recently, it decided to try the system out on a CAPTCHA used by an Outlook Web App (OWA) portal.

The initial attempt, according to F-Secure, was comparatively unsuccessful, with the team finding that after manually labelling around 200 CAPTCHAs, it could only identify the characters with an accuracy of 22%.

The first issue to emerge was noise. The team determined that the greyscale values of noise pixels and text pixels always fell within two distinct, constant ranges, so tweaks to the tool could filter the noise out.
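As a rough illustration of that kind of range-based filtering, here is a minimal sketch in plain Python. The article does not give the actual ranges or say how CAPTCHA22 implements the filter, so `TEXT_RANGE`, `NOISE_RANGE`, and `filter_noise` are hypothetical names and values chosen only for the example.

```python
from typing import List, Tuple

# An image is modelled as rows of greyscale values, 0 (black) to 255 (white).
Image = List[List[int]]

# Assumed ranges: the article only says the two ranges are distinct and constant.
TEXT_RANGE: Tuple[int, int] = (0, 80)
NOISE_RANGE: Tuple[int, int] = (120, 200)
BACKGROUND = 255


def in_range(value: int, bounds: Tuple[int, int]) -> bool:
    """Return True when value lies inside the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


def filter_noise(image: Image) -> Image:
    """Replace every pixel in the noise range with the background value."""
    return [
        [BACKGROUND if in_range(pixel, NOISE_RANGE) else pixel for pixel in row]
        for row in image
    ]


sample = [
    [255, 150, 30],
    [130, 40, 255],
]
print(filter_noise(sample))
```

Because the two ranges never overlap, a fixed threshold like this removes noise without touching character pixels. A real pipeline would more likely apply the same mask with an image or array library.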

The team also realized that some of the test CAPTCHAs had been labelled incorrectly, with confusion between, for example, ‘l’ and ‘I’ (lower case ‘L’ and upper case ‘i’). Fixing this shortcoming brought the accuracy up to 47%.
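A sketch of how such labelling problems might be surfaced is shown below. It assumes accuracy means a whole-string match between prediction and label, which the article does not specify. The look-alike set beyond 'l' and 'I' and the names `captcha_accuracy` and `needs_review` are illustrative assumptions.

```python
from typing import List

# 'l' and 'I' come from the article; the other look-alikes are assumptions.
AMBIGUOUS = set("lI1O0")


def captcha_accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of CAPTCHAs whose predicted string equals the label exactly."""
    if not labels:
        return 0.0
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


def needs_review(labels: List[str]) -> List[int]:
    """Indices of labels that contain at least one look-alike character."""
    return [i for i, label in enumerate(labels) if AMBIGUOUS & set(label)]


labels = ["aIb3", "xyz9", "hello"]
predictions = ["alb3", "xyz9", "hello"]

print(captcha_accuracy(predictions, labels))
print(needs_review(labels))
```

Here the first prediction may well be right and the label wrong, which is why flagged labels are worth re-checking by hand before blaming the model.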

Pyppeteer pulls the strings

More challenging, though, was handling the CAPTCHA submission to Outlook’s web portal.

There was no CAPTCHA POST request, with the CAPTCHA instead sent as a value appended to a cookie. JavaScript was used to keylog the user as the answer to the CAPTCHA was typed.

“Instead of trying to replicate what occurred in JS, we decided to use Pyppeteer, a browsing simulation Python package, to simulate a user entering the CAPTCHA,” said Tinus Green, a senior information security consultant at F-Secure.

“Doing this, the JS would automatically take care of the submission for us.”
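The general approach Green describes can be sketched with Pyppeteer as follows. This is not F-Secure's code: the URL, the CSS selectors, and the `submit_captcha` function are hypothetical, and the sketch assumes an authorised test against a page you control.

```python
import asyncio
from typing import Optional


async def submit_captcha(
    url: str,
    answer: str,
    input_selector: str = "#captcha-input",  # hypothetical selector
    submit_selector: Optional[str] = None,   # hypothetical, optional
) -> None:
    """Type a CAPTCHA answer into a page the way a user would."""
    # Imported lazily so this module still loads where Pyppeteer is absent.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url)
        await page.waitForSelector(input_selector)
        # Typing key by key fires the keyboard events the page's own
        # JavaScript listens for, so that script handles the submission.
        await page.type(input_selector, answer, {"delay": 50})
        if submit_selector is not None:
            await page.click(submit_selector)
    finally:
        await browser.close()


# Example call (hypothetical URL, needs Pyppeteer and a Chromium download):
# asyncio.run(submit_captcha("https://owa.example.test/login", "aB3dE"))
```

Driving a real browser means the cookie handling and key-logging logic never has to be reverse engineered, at the cost of being slower than sending raw HTTP requests.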

Green added: “We could use this simulation software to solve the CAPTCHA whenever it blocked entries and once solved, we could continue with our conventional attack, hence automating the process once again.

“We have now also refactored CAPTCHA22 for a public release.”

CAPTCHA the flag

CAPTCHAs are challenge-response tests that many websites use to distinguish genuine requests from human users, such as signing up for or accessing a web service, from automated requests by bots.

Spammers, for example, attempt to circumvent CAPTCHAs in order to create accounts they can later abuse to distribute junk mail.

CAPTCHAs are something of a magnet for cybercriminals and security researchers, with web admins struggling to stay one step ahead.

Late last year, for example, PortSwigger Web Security uncovered a security weakness in Google’s reCAPTCHA that allowed it to be partially bypassed by using Turbo Intruder, a research-focused Burp Suite extension, to trigger a race condition.

Soon after, a team of academics from the University of Maryland circumvented the anti-bot mechanism of Google’s reCAPTCHA v2 using a Python-based program called UnCaptcha, which could solve its audio challenges.

Green said: “There is a catch 22 between creating a CAPTCHA that is user friendly – grandma safe as we call it – and sufficiently complex to prevent solving through computers. At this point it seems as if the balance does not exist.”

Web admins shouldn’t, he says, “give away half the required information” through username enumeration, and users should be required to set strong passphrases that conform to NIST standards.

And, he adds: “Accept that accounts can be breached, and therefore implement MFA [multi-factor authentication] as an additional barrier.”
