all 154 comments

[–]gitamar 65 points66 points  (12 children)

Interesting write up! Should be cross-posted to /r/machinelearning or even /r/computervision.

[–]lolcop01 15 points16 points  (10 children)

Thanks for these subreddits! These should make my reddit experience a bit more meaningful. Less cats, more knowledge.

[–][deleted] 26 points27 points  (2 children)

Seems obvious but this tip saved reddit for me: you can unsubscribe from the default subs. Do it. You won't miss it, even if it feels wrong.

[–]Tekmo 8 points9 points  (0 children)

Yeah, /r/programming also improved in quality once it was no longer a default sub

[–][deleted] 2 points3 points  (0 children)

I actually enjoyed reddit more and spent less time on reddit after doing so, which was amazing.

[–]quzybd 1 point2 points  (5 children)

Fun fact: a few years ago Google ran a huge NN on YouTube videos. And what did the network do? It taught itself to recognize CATS!!!

[–]redonculous 11 points12 points  (4 children)

*taught.

[–]gitamar 0 points1 point  (0 children)

You are welcome! Unfortunately they are not that active, but still better than /r/genetic_algorithms

[–][deleted] 0 points1 point  (0 children)

/r/datascience would also appreciate this.

[–]mutantturkey 93 points94 points  (20 children)

I wanna see the cocaine stock ticker!

[–]miekao[S] 37 points38 points  (6 children)

I did too! Unfortunately I never got back around to the idea until after I had heard of the epic SR takedown.

There were a few more interesting problems to solve to go down that route: a good bit of language processing.

Quantities on SR were pretty free-form. It'd have to understand that "QP" could mean "quarter-pound" depending on context, or when "O", "Z", or "Ounce" all mean the same thing, and when they don't, etc. Metric to imperial, and so on. Good stuff.
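For a sense of what that normalization might look like, here's a minimal sketch; the alias table, grams-per-unit values, and parsing rules are illustrative assumptions, not anything from the actual tool:

```ruby
# Illustrative sketch of quantity normalization; the alias table and
# grams-per-unit values are assumptions, not data from the tool.
UNIT_GRAMS = {
  'g' => 1.0, 'gram' => 1.0, 'grams' => 1.0,
  'o' => 28.35, 'z' => 28.35, 'oz' => 28.35, 'ounce' => 28.35,
  'qp' => 113.4,  # "quarter-pound"
  'lb' => 453.6, 'pound' => 453.6
}.freeze

# Parse strings like "2 oz", "QP", or "3.5g" into grams, or nil.
def quantity_to_grams(text)
  m = text.strip.downcase.match(/\A(\d+(?:\.\d+)?)?\s*([a-z]+)\z/)
  return nil unless m && UNIT_GRAMS.key?(m[2])
  (m[1] ? m[1].to_f : 1.0) * UNIT_GRAMS[m[2]]
end
```

The context-dependence the comment mentions (is "O" an ounce or part of a product name?) is exactly what a flat table like this can't capture on its own.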

[–][deleted] 4 points5 points  (1 child)

What about the new SR?

[–]miekao[S] 1 point2 points  (0 children)

Never got around to it, unfortunately.

[–]don-to-koi 1 point2 points  (2 children)

Once the tool noticed I had renamed the file, it slurped it up, and added it to the corpus to train upon. 

How exactly was this done? Does it keep reading the directory from time to time? Do you have some kind of checksum for a file so that it's still recognized as the "same" file even after a name change? Because I'm assuming that the tool was still dumping out other failed logins in the interim. How would it know that the file with the changed name was not a new failed login but a previous failed login that you had corrected?

[–]miekao[S] 2 points3 points  (1 child)

Considering it was still in "scratch pad" mode, it was a terrible inotify hack at the time. Nothing worth talking about architecturally.

[–]don-to-koi 0 points1 point  (0 children)

Oh, I was just curious about the logic/algorithm you used.

[–]indieinvader 1 point2 points  (0 children)

Well, I know what I'm going to be hacking on tonight!

[–]cybrbeast 24 points25 points  (2 children)

Indeed, he did it for the stats, show us the stats!

[–]periphreal 14 points15 points  (0 children)

The experiment didn't end up going much further (and I don't have that data!)

[–][deleted] 10 points11 points  (0 children)

I hope he posts some graphics to /r/dataisbeautiful!

[–]labiaflutteringby 27 points28 points  (5 children)

I wanna play MMO Drug Wars with realtime prices based on the actual black market!

[–]mutantturkey 17 points18 points  (1 child)

the kingpin of your empire just got whacked

Pick an option

1) go to war with the other family 2) take out the mole

[–]Electro_Nick_s 8 points9 points  (0 children)

Why not both?

[–]case9 4 points5 points  (1 child)

Seems flawed because the player actions wouldn't affect the market prices in the game

[–]jmblock2 7 points8 points  (0 children)

except they might...

[–]neoice 2 points3 points  (0 children)

I wanna play MMO Drug Wars with realtime prices based on the actual black market!

it exists, it's called being a drug dealer.

[–]gwern 2 points3 points  (0 children)

You could try using Christin's sanitized datasets: https://arima.cylab.cmu.edu/sr/ He removed most of the information, but I believe cocaine was a separate category on SR1 so it looks like you could get a cocaine ticker of sorts.

(If you don't want SR1 specifically but would settle for successor markets, one could probably use data from Grams or my own dumps.)

[–][deleted] 0 points1 point  (2 children)

The silk road is no longer, and most of the alternatives are using stronger captchas.

[–]I_RAPE_SLOTHS 0 points1 point  (1 child)

It's very much alive and kicking at http://silkroad6ownowfk.onion, and I don't believe it even requires a captcha to login.

[–][deleted] 1 point2 points  (0 children)

That's not the same site that OP was trying to scrape data from.

[–]dd_123 125 points126 points  (52 children)

A great example of why you should always try to avoid creating your own captcha scheme. I can't count the number of times people (especially on forums) think they've come up with some great new scheme which is in fact relatively easy for computers to solve with minimal effort.

[–]TMaster 75 points76 points  (9 children)

FTA:

Because the Silk Road developers had to be paranoid, they couldn't use an external captcha service like ReCaptcha.

Making your own CAPTCHA or taking an off-the-shelf one, it doesn't matter that much anymore anyway. Without making things extremely hard to read for humans you're not going to stop all bots anymore, you're just adding in a hurdle to some degree nowadays.

Before anyone asks: yes, I do think reCAPTCHA does it well, but they too have had their problems with bots in various ways, and the author clearly states why he believes reCAPTCHA wasn't used.

[–]KayRice 28 points29 points  (4 children)

you're just adding in a hurdle to some degree nowadays.

Certainly considering you can pay for CAPTCHA solves by the thousands.

[–]cmseagle 14 points15 points  (3 children)

I'm not at all familiar with such things. What's the going rate for captcha solves in $/captcha?

Edit: Answered my own question. A quick Google and I found one service that does it for $1.39 per 1000 captchas.

[–]Almafeta 10 points11 points  (0 children)

On mturk, 3 cents per 100.

[–]deadstone 2 points3 points  (1 child)

Man, I'd solve captchas for bitcoins.

[–]housemans 2 points3 points  (0 children)

You can.

[–]Genesis2001 12 points13 points  (2 children)

Making your own CAPTCHA or taking an off-the-shelf one, it doesn't matter that much anymore anyway. Without making things extremely hard to read for humans you're not going to stop all bots anymore, you're just adding in a hurdle to some degree nowadays.

This is why I realized (through some reading) that Q&A captcha questions tend to work better: they are easier for humans to solve but harder for computers to crack. (Though it should be stated that Q&A captchas can probably be broken, I have no evidence, just theory, if the question is simple enough, e.g. "2 + 2 = ?" or common everyday questions.)

I tend to prefer questions that relate somehow to the community that people are registering for.

[–]JohnMcPineapple 5 points6 points  (1 child)

Using state of the art captcha images, but with a question instead of random characters, should make it both easier for humans (you can still read the text if one or two characters turn out unreadable) and harder for computers to solve.

[–]Genesis2001 0 points1 point  (0 children)

As mentioned in another comment here, and my own comment, it's not entirely foolproof. Bot creators (by targeting your site specifically) and human spammers (though it takes many hours to spam effectively, making it very low payoff) can still get around Q&A captchas.

[–][deleted] 50 points51 points  (0 children)

The problem with going the other way is that using a standard premade captcha brings its own problems. It's more targeted, has more people trying to break it, and when someone cracks it open on another website, yours is now broken too.

The real problem is captchas in general: they annoy your users, the 'best' ones often fail with humans, and they don't really solve the problem of stopping automated software, thanks to things like Mechanical Turk.

There are better solutions than captchas, but they are generally hard to do and specific to the product, so laziness takes over, I guess.

[–]Mattho 12 points13 points  (0 children)

With your own scheme you are open to someone targeting you specifically. With a common captcha (such as reCaptcha) you are an easier target.

[–]danweber 23 points24 points  (2 children)

For a lot of people, coding up a very simple captcha is the right thing.

Asking "what is 2 + 3?" will stop bots that haven't been written for your site.

Yes, someone can always write a bot that defeats what you have, but it takes a lot longer to write a bot than it does to just make a slight edit to your captcha.
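A minimal sketch of that kind of math captcha; the function names and the 1–9 range are just illustrative:

```ruby
# Sketch of a "what is 2 + 3?" captcha: generate a question/answer
# pair, then check the submitted value. Names are illustrative.
def math_captcha(rng = Random.new)
  a = rng.rand(1..9)
  b = rng.rand(1..9)
  { question: "What is #{a} + #{b}?", answer: a + b }
end

def captcha_passed?(captcha, submitted)
  submitted.to_s.strip.to_i == captcha[:answer]
end
```

As the replies below note, a pattern this regular is exactly what off-the-shelf spam tools already parse.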

[–]imog 14 points15 points  (1 child)

This is false. I ran a site that gets a few million pageviews monthly. Xrumer can detect and properly answer basic math.

For over a year, however, it would automatically fill in password fields. So we would display extra password fields in the code, but not to humans in the browser... Xrumer filled in the passwords, so we knew it was a bot and discarded that input. Humans never saw it and they had no problems.

But ya, Xrumer does a lot of basic Q&A. So when doing Q&A on high-traffic sites, you have to be creative to keep accessibility high for humans but difficulty high for bots. If you have international users, this is even trickier. An example of a good question that is hard for a bot: "which is larger, an elephant or a cat?" Any human can answer; bots cannot, because the question requires advanced interpretation of what is asked... It's much easier to tell a bot to look for certain patterns like "2+2" or "2 plus 2" and actually do the math, so any pattern that looks like a math problem is actually trivial to code for a bot master.

[–]lachy_xe 6 points7 points  (0 children)

For over a year, however, it would automatically fill in password fields. So we would display extra password fields in the code, but not to humans in the browser... Xrumer filled in the passwords, so we knew it was a bot and discarded that input. Humans never saw it and they had no problems.

For those that are interested, this kind of technique for detecting bots is often known as a Honeypot. (The wiki article goes into a lot of detail, but in its simplest form, it is often just a hidden field.)
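A minimal sketch of such a honeypot, assuming a plain Ruby handler with a params hash; the decoy field name is hypothetical:

```ruby
HONEYPOT_FIELD = 'password_confirm_2'.freeze  # hypothetical decoy name

# Render a decoy input that humans never see (hidden via CSS).
def honeypot_html
  "<input type=\"password\" name=\"#{HONEYPOT_FIELD}\" " \
    "style=\"display:none\" tabindex=\"-1\" autocomplete=\"off\">"
end

# A real browser submits the decoy blank; form-filling bots don't.
def likely_bot?(params)
  !params.fetch(HONEYPOT_FIELD, '').strip.empty?
end
```

The same idea works with any field a browser would leave blank; the CSS hiding is the only part humans ever "interact" with, by never seeing it.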

[–]bungle 1 point2 points  (0 children)

I can't count the amount of times people (especially on forums) think they've come up with some great new scheme

In forums, especially, it is really ok to implement your own scheme. Nobody is going to make any effort to break scheme used just by your forum that contains photos of lol cats. It's mainly for preventing forum spamming, and nothing really bad happens if someone breaks it (then just change a scheme a bit).

[–][deleted]  (1 child)

[deleted]

    [–]Philluminati 4 points5 points  (0 children)

    I'm not sure captcha qualifies as security. More like just "noise reduction".

    [–]YourTechGuy 24 points25 points  (19 children)

    Interesting article, but I wouldn't have gone the letter frequency route. While it's easier to use these general frequencies, creating a Markov chain from the trained examples would probably yield better results.

    Optimally, I think the best route would be to fuse multiple classifiers (one based on image data, another on Markov, another on something like Levenshtein distance from words in a chosen dictionary).

    It's a great starter though (far better than random guessing at 1/((26^5)*1000)), I hope he publishes the CAPTCHA files he collected so other people can try their hand at it.
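The Markov-chain idea could be sketched roughly like this: train character-bigram counts on known solutions (or a wordlist), then score candidate strings. This illustrates the suggestion, not the article's actual method:

```ruby
# Sketch of a character-bigram (first-order Markov) scorer: train
# transition counts on known-good strings, then prefer the candidate
# with the highest log-probability under the model.
def train_bigrams(words)
  counts = Hash.new { |h, k| h[k] = Hash.new(1) }  # add-one smoothing
  words.each do |w|
    w.chars.each_cons(2) { |a, b| counts[a][b] += 1 }
  end
  counts
end

def log_score(model, word)
  word.chars.each_cons(2).sum do |a, b|
    row = model[a]
    Math.log(row[b].to_f / row.values.sum)
  end
end
```

Per-character classifier guesses could then be re-ranked by this score instead of taken at face value.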

    [–]miekao[S] 16 points17 points  (7 children)

    Not a bad idea.

    I'll try digging out the solved captchas when I'm not on mobile.

    Something I found interesting: the neural network routinely found typos I had made while solving the captchas. After a bit, I made it alert me when it was suspicious, so they could be corrected.

    [–]nemec 13 points14 points  (0 children)

    The student has become the master ;)

    [–]YourTechGuy 3 points4 points  (3 children)

    That is interesting. Thanks for the quick reply.

    [–]miekao[S] 11 points12 points  (2 children)

    I've added the corpus to the repo.

    It lists less than 1700 solves, some of which were done by hand, some that the software solved. At some point, I pruned failing examples that hit really off-the-wall edge cases, as they were negatively training the neural network.

    [–]nilknarf 13 points14 points  (1 child)

    Thanks for the dataset! Just for fun I made some quick modifications to a captcha solver I wrote for another site to run on this. It is incredibly slow but only got 17 wrong out of the first 100 (which is a decent rate for 100loc): https://gist.github.com/fta2012/034b0686897d94e74b00

    Run with python solver.py with the captcha-corpus in the same folder with just the jpg files.

    Trains on images 500 to 1000 (arbitrarily chosen).

    [–]miekao[S] 8 points9 points  (0 children)

    That is damn amazing.

    [–]maxd 2 points3 points  (1 child)

    Did you consider attempting what the Google image matching service does? Basically, it creates a really low-res version of the image and turns that into a pretty small ASCII search string. It seems like you could turn the 20x20 character images into 5x5 1-bit images representing each character, and run the comparison like that.
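A rough sketch of that downscale-and-compare idea, assuming glyphs arrive as 20x20 arrays of 0/1 pixels; majority vote per cell is one plausible way to binarize (Google's actual service may differ):

```ruby
# Sketch of the low-res fingerprint idea: shrink a 20x20 1-bit glyph
# to 5x5 by majority vote per 4x4 cell, then compare fingerprints by
# Hamming distance. The sizes and the majority rule are assumptions.
def fingerprint(bitmap, out = 5)
  cell = bitmap.size / out
  (0...out).map do |r|
    (0...out).map do |c|
      ink = 0
      cell.times do |i|
        cell.times { |j| ink += bitmap[r * cell + i][c * cell + j] }
      end
      ink * 2 >= cell * cell ? 1 : 0  # 1 if at least half the pixels are set
    end
  end
end

def hamming(fp_a, fp_b)
  fp_a.flatten.zip(fp_b.flatten).count { |a, b| a != b }
end
```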

    [–]miekao[S] 0 points1 point  (0 children)

    Honestly, it never crossed my mind, and I wasn't sure of the effectiveness. It'd be interesting to attempt, though.

    [–][deleted] 2 points3 points  (9 children)

    Curious what you mean by "fusing" - how would that work programmatically?

    [–]YourTechGuy 14 points15 points  (8 children)

    I'm actually happy someone asked about this, as it pertains to one of my past research areas that doesn't get much recognition: information fusion.

    Specifically, the type of info fusion I was referring to was decision-level fusion: using multiple classifier systems to all guess at an answer and then deriving the answer from these answers. This type of fusion can be used in a meta-algorithm called "boosting", where multiple weak (i.e. not very accurate) classifiers are combined (weighted by how accurate they are) into one much more accurate classifier.

    In this case, just using character recognition and a Markov model might be 30% accurate, and the model discussed was 56% accurate. If "boosted" correctly, these two models could be fused into a model that was slightly more accurate (e.g. 70% accurate). The addition of other weak classifiers could then boost the accuracy more (this, of course, is subject to diminishing returns, and sometimes the disparity between two models is too great and they can't be fused).
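Decision-level fusion in its simplest form might look like this sketch, where each classifier's vote is weighted by its accuracy (illustrative only):

```ruby
# Sketch of decision-level fusion by weighted vote: each classifier
# contributes its predicted label, weighted by its known accuracy,
# and the label with the largest total weight wins.
def fused_decision(weighted_votes)
  totals = Hash.new(0.0)
  weighted_votes.each { |label, accuracy| totals[label] += accuracy }
  totals.max_by { |_label, weight| weight }.first
end
```

Two mediocre classifiers that agree can outvote one stronger classifier, which is the intuition behind boosting.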

    I could go on much longer if anyone was interested; it's a very interesting field of research.

    [–]miekao[S] 12 points13 points  (2 children)

    For fuck's sake, please go on much longer. I deal with problems where this would be useful very often.

    Parts of this have swirled around in my head, but combining multiple results in a way that's not haphazard guesswork eludes me. Every attempt I've made at assigning weights to classifiers have been by feel or trial and error. And when iterating on parameters, and finally getting an improvement, I don't think it's all it could be.

    I would absolutely, very emphatically, like to hear anything you want to say on the subject.

    [–]YourTechGuy 20 points21 points  (1 child)

    Okay, so I keep typing responses and for one reason or another the tab keeps closing. I've resorted to writing in a word processor and hopefully you'll actually see this iteration...I'll most certainly forget to add something, so I'll probably edit this later.

    I did most of my work with decision trees and random forests (which are essentially just collections of decision trees), and I've found AdaBoost to be the best algorithm there. As an aside, I've had more success in general with decision tree-based algorithms than neural networks and the like—but they are perfectly good (you certainly had good success with them in this article).

    AdaBoost works by combining classifiers using their current error rate as weights and summing their results to produce one “boosted” decision. What's really cool about it is that it focuses each additional classifier on classifying the currently misclassified examples. As such, you should see your overall error rate decrease fairly substantially with each additional classifier added. There are two main caveats with using AdaBoost: the dataset cannot be noisy and each model must be better than guessing. Beating guessing usually isn't much of a problem—it doesn't need to be much better than guessing anyway—and in this case you beat guessing by a whole lot (guessing is at ~8.416×10^-9 %), and any other classifiers you make would probably be way better than that as well. Noisy datasets are more of a problem (and may be the cause of your current error rate): the very thing that makes this meta-algorithm so good (reducing error by focusing on the misclassified) hurts when some examples just can't be trained from your current feature set; AdaBoost will end up perseverating on those examples and probably won't reduce error much.
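A minimal sketch of the AdaBoost loop described above, using decision stumps on a toy 1-D dataset (the data and stump form are illustrative, not from the article):

```ruby
# Toy AdaBoost sketch matching the description above. Weak learners
# are decision stumps h(x) = (x < t ? p : -p); labels are +1/-1.
def build_stumps(thresholds)
  thresholds.flat_map do |t|
    [+1, -1].map { |p| ->(x) { x < t ? p : -p } }
  end
end

def weighted_error(h, xs, ys, w)
  xs.each_index.sum { |i| h.call(xs[i]) == ys[i] ? 0.0 : w[i] }
end

def adaboost(xs, ys, stumps, rounds)
  w = Array.new(xs.size, 1.0 / xs.size)  # uniform sample weights
  ensemble = []                          # [alpha, stump] pairs
  rounds.times do
    best = stumps.min_by { |h| weighted_error(h, xs, ys, w) }
    err = weighted_error(best, xs, ys, w)
    break if err <= 0.0 || err >= 0.5    # must beat random guessing
    alpha = 0.5 * Math.log((1.0 - err) / err)
    ensemble << [alpha, best]
    # Upweight the misclassified examples, downweight the rest.
    xs.each_index do |i|
      w[i] *= Math.exp(best.call(xs[i]) == ys[i] ? -alpha : alpha)
    end
    z = w.sum
    w.map! { |wi| wi / z }
  end
  ensemble
end

def predict(ensemble, x)
  ensemble.sum { |alpha, h| alpha * h.call(x) } >= 0 ? 1 : -1
end
```

The toy labels [+, -, -, +] are chosen so that no single stump is perfect, which is exactly the situation where boosting earns its keep.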

    For noisy datasets, you should look at BrownBoost. It's similar in that it's also a boosting by majority meta-algorithm (i.e. if most classifiers say that X is the right answer, then it probably is), but handles noisy datasets by “throwing away” data that fails to be classified by different classifiers. As such, it can achieve a better overall accuracy, as it won't be swayed by examples that just won't be classified well.

    A different form of decision-level fusion that could work well here is between SVM and kNN. With both, you'll want to do some dimension reduction (probably aided by Ada or the like) so you can cut down on training time and the complexity of the model (not that you had too many features, but it always helps if you can cut one out). I don't have much to say on these (I didn't work with them much, just read papers), but there is some good literature on how this type of fusion has been applied to various problems.

    A resource I definitely recommend checking out is Boosting from MIT Press—it explains exactly how boosting works—I used to have a copy of it on my desk when I was working on that research project. They also have some lessons on YouTube (just search “boosting MIT”) that give a good introduction. As far as actual ML resources go, you should be modeling with Weka. Weka allows even ML novices to make models: you just plug in some data and start firing away. It's free and performs just as well as the expensive tools. It can scale across a cluster, or if you're tighter on budget (or have really big data), you can just sample and feed Weka something smaller to work on. Best of all, it also functions as a Java library, so you can integrate it into more complex programs seamlessly. The people who publish Weka also write a book on data mining that I'd recommend if you're new to the area. It's not perfect, but it is fairly broad, and you can pick up the international edition (same as the standard) for roughly $17. Just supplement it with some actual practice, as there are few (if any) practice problems.

    While I can't go too in-depth into my research (it would be too easy to link my Reddit identity to my real life—something I'd like to avoid), I always like to give a bit about my credentials with posts like this. I created a classifier (for a nontrivial problem) that had 98.7% accuracy. For reference, the people I was competing against were all in the 50-60% range; I probably could have improved it even more if I had some more time. I'm currently working on creating a new type of biometric authentication scheme, and while it's still too early to know for sure, I'm expecting about 98-99% accuracy out of the box.

    I'm always available via Reddit PM should anyone ever have a question about ML (I love talking about work, so really, fire away). I also do a lot of consulting work for companies of varying sizes, so if you'd rather just offload your ML, don't hesitate to write :).

    Ninja Edit: If you have specific questions about ML/my experience/etc feel free to just post them below. I'm going out tonight but will answer everything that's asked.

    Edit 2: links

    Edit 3: Thank you for the gold! I love talking about information fusion, I'm happy some found it useful!

    [–]miekao[S] 0 points1 point  (0 children)

    Thank you.

    [–]very_mechanical 4 points5 points  (3 children)

    I remember vaguely that most of the leaders in the Netflix-sponsored competition used a combination of a variety of techniques to produce a final answer. That might not have been equivalent to combining classifier systems, though.

    [–]YourTechGuy 3 points4 points  (2 children)

    Funny you should mention Netflix. I was at a data mining conference recently and attended a talk on how Netflix's recommendation system works. It's a very fancy collaborative filtering algorithm.

    I ended up speaking to the guy afterwards; he was incredibly bright. I can post their paper if anyone's interested.

    [–]ultramilkman 2 points3 points  (1 child)

    Could you please post the papers, and go into why you had such a high opinion of him?

    [–]YourTechGuy 1 point2 points  (0 children)

    So I went through my conferences folder and it appears that since the talk was a "tutorial" and not an actual presentation, there is only a brief summary and not a full paper. In any case, here is the link to the summary of his talk. The presenter, Dr. Xavier Amatriain, was very impressive.

    Netflix does publish a lot of papers though, so I'll see if I can find something relevant to their current collaborative-filter based techniques.

    [–]_alexkane_ 2 points3 points  (0 children)

    Yes, please go on

    [–][deleted] 14 points15 points  (13 children)

    "I wrote a Mechanize tool that downloaded 2,000 captcha examples from the site: one every two seconds. Then I solved them all by hand, renaming the files to (solution).jpg. That was not fun."

    Well, holy shit

    [–]miekao[S] 27 points28 points  (10 children)

    Toward the end, I was able to solve captchas on autopilot while maintaining an intelligent conversation, watching Netflix, and yelling at my dog.

    [–]marshall007 2 points3 points  (1 child)

    The experiment didn't end up going much further (and I don't have that data!) ...

    Does this mean you lost the solution set you'd built up?

    [–]miekao[S] 1 point2 points  (0 children)

    No, just the amount of time it ran before the shutdown made the dataset pretty uninteresting.

    [–]me-at-work 2 points3 points  (1 child)

    You've solved the captchas generated by ExpressionEngine! It's a shareware CMS written in PHP.

    I have used it for a few websites; by default, this is all you can configure. I noticed how easy the default captcha would be for computers to solve, so it's fun to see that confirmed!

    When I was using it, I hacked the code a bit to use a stencil font and random words, which makes it considerably more difficult to solve automatically :)

    [–]miekao[S] 1 point2 points  (0 children)

    Thank you. That's a big mystery solved.

    [–]GeorgieCaseyUnbanned 0 points1 point  (4 children)

    why didn't you just hire this out on oDesk? your time is a lot more valuable!

    [–][deleted] 3 points4 points  (0 children)

    Because he was doing it while maintaining an intelligent conversation, watching Netflix, and yelling at his dog. Down time is pretty much wasted time with no value; a good time to do repetitive and boring tasks. If someone doesn't do any of those things and is always being productive, then hiring out would be the smart thing to do.

    [–]yaazz 1 point2 points  (0 children)

    Seems like a great project to put on mechanical turk

    [–]AgentME 1 point2 points  (1 child)

    He could have skipped automating the solver at all and just gone that route for all of the CAPTCHAs too.

    [–]miekao[S] 3 points4 points  (0 children)

    With that attitude, we'd still have scribes and not keyboards, mates.

    Considering the use I actually got of it, though, solid point.

    [–]octnoir 0 points1 point  (0 children)

    True programmer and computer scientist.

    When someone says 'not fun' it usually means:

    "Fuck this fucking fucking piece of shit. I'm doing this absolute bullshit manual labor when a fucking machine should be able to do this. Fuck de fuck fuck fuck fuck fuck!" which goes on for a night, but usually a stupidly longer amount of time than foreseen, like a week.

    And the only polite thing you can say as the victim is 'not fun'. That's the code. Saying anything more is a violation of that code.

    [–]WOnder9393 0 points1 point  (0 children)

    I once made a solver for our university information system's anti-scraping captcha. I was too lazy to solve all samples by hand so I started by coding a script that would split the glyph images from the samples into unnamed categories so that each category contains only images representing the same glyph. That way I only had to manually name the categories (in my case 23 glyphs + some duplicates). When recognizing a captcha, I would then find the best matching category for each glyph image. I managed to get practically 100% success rate (I couldn't find any sample that wouldn't be recognized correctly) but that captcha was really easy to break.
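That clustering step might be sketched like this, with glyphs as small 0/1 bitmaps and a hypothetical pixel-distance threshold:

```ruby
# Sketch of the unnamed-category clustering described above: each
# glyph joins the first category whose exemplar is within a pixel
# distance threshold, otherwise it starts a new category.
def pixel_distance(a, b)
  a.flatten.zip(b.flatten).count { |x, y| x != y }
end

def categorize(glyphs, threshold)
  categories = []  # each category's first glyph serves as its exemplar
  glyphs.each do |g|
    cat = categories.find { |c| pixel_distance(c.first, g) <= threshold }
    cat ? cat << g : categories << [g]
  end
  categories
end
```

With 23 glyphs, only the 23 resulting exemplars need hand labels, rather than every sample.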

    [–]land_stander 6 points7 points  (0 children)

    Interesting read. Glad I am not the only developer who has off the wall ideas like this that never quite reach fruition :)

    I'm playing around with some machine learning stuff with OpenCV right now. I think I'll spend some time on that this weekend now.

    [–]nilknarf 5 points6 points  (0 children)

    I also wrote about solving weak captchas before (but for much simpler captchas): http://franklinta.com/2014/08/24/solving-captchas-on-project-euler/.

    I think the weakness was the same for both of these cases: using a uniform font reduced the problem to image template matching!

    [–]DanAtkinson 4 points5 points  (0 children)

    Hey! This is a bit random but I submitted a pull request to correct a couple of words in the readme file. For some reason, Wired Magazine thinks that it's all my work and would like an interview.

    Whilst I've corrected him, you may wish to reach out to him personally - @a_greenberg.

    [–]Booshanky 6 points7 points  (12 children)

    I've never bothered to check out the Silk Road or Evolution. I know it's done through TOR, but does anyone know a good FAQ somewhere? Google is mostly full of news stories; maybe my search terms suck, haha.

    [–]GetsEclectic 6 points7 points  (7 children)

    [–]Booshanky 2 points3 points  (6 children)

    Hrm, so you just gotta have the right software and hit those .onion links. Neat. Thanks for the heads up!

    [–]rubber_band_man_ 2 points3 points  (5 children)

    I recommend using Tails in a VM.

    [–]drysart 3 points4 points  (4 children)

    Set up two VMs, both connected via a private LAN.

    One VM also has an outgoing connection to the Internet. This VM runs the TOR proxy, and exposes it to the private LAN. The VM does nothing else.

    The second VM only has the private LAN connection and therefore must use the TOR proxy provided by the first VM. This second VM is the VM you actually use for browsing and such.

    You're 100% guaranteed not to accidentally leak any identifying information, because there is none to leak. No public IP, no pre-existing personal files sitting around, etc. The main weakness is that it's an unusual setup, so you stand out, basically because you're not using the Tor Browser like 99% of everyone else hitting a .onion site will be.

    [–]Gimmick_Man 3 points4 points  (2 children)

    Why use a VM instead of just installing Tails on a USB drive?

    [–]AgentME 0 points1 point  (0 children)

    Easier to wipe a VM clean and reset it to a good state each time you use it, even if the software in the VM got exploited. If you boot straight from a usb drive, and your stuff gets exploited, then it could infect the thumb drive, your BIOS, etc. (Not that it's particularly likely, but if you're already bothering to be so paranoid, you might as well go that extra bit.)

    [–]Roadside-Strelok 0 points1 point  (0 children)

    aka Whonix

    [–][deleted] 1 point2 points  (0 children)

    Check out OpenBazaar too and if you have the wherewithal, contribute!

    [–]xraystyle 1 point2 points  (2 children)

    /u/Booshanky... I recognize that username.

    [–]Booshanky 0 points1 point  (1 child)

    Lemme guess, CGN? Haha

    [–]xraystyle 0 points1 point  (0 children)

    CGN ftw. Check your PMs.

    [–]holambro 6 points7 points  (6 children)

    So does this totally debunk the assertion by the FBI that they found the SR server because of a leaky captcha? It certainly appears that way to me.

    DPR's lawyers should go have a chat with Mike and see if he's willing to testify on their behalf.

    [–]miekao[S] 6 points7 points  (0 children)

    I just used Mechanize to rip the images. I don't know if it was in fact being served from what "should've been" a non-routable leaked IP address.

    [–]drysart 2 points3 points  (0 children)

    No, this has nothing to do with the FBI's assertion.

    [–]minlite 5 points6 points  (3 children)

    They're bullshitting. They used illegal methods for spying and want to make it all legal by claiming it was a leaked IP address from the captcha. Many security researchers have confirmed that it's not true.

    [–]vwermisso 4 points5 points  (2 children)

    Could I trouble you for a source?

    Because all I've heard was the opposite. Some blackhat even messaged the dude running it saying his captcha was leaking his IP before it was fixed.

    [–]minlite 0 points1 point  (1 child)

    [–]happyscrappy 1 point2 points  (0 children)

    Do you read your sources before posting them?

    "So, does this mean the FBI did get its information from the NSA illegally and that Tor's encryption has been broken?

    Cubrilov doesn't think so."

    "And neither Cubrilovic and Sandvik is accusing the FBI of lying. They argue only that its account of entering “miscellaneous” characters into the site is a carefully cloaked description of injecting commands into the Silk Road’s login fields."

    [–]mserenio 11 points12 points  (0 children)

    I am more of a front-end, design guy but I kinda understood how he went about it. Awesome stuff.

    [–]dummer_august 2 points3 points  (0 children)

    The hardest part for me in this article would be to solve 2000 captchas by hand. (I usually get 4 out of 5 wrong)

    [–]tigertom 2 points3 points  (1 child)

    He is surprised that J is so rare; if you have ever played Scrabble you would know that from how the scoring and tile counts work:

    1 point: E ×12, A ×9, I ×9, O ×8, N ×6, R ×6, T ×6, L ×4, S ×4, U ×4

    2 points: D ×4, G ×3

    3 points: B ×2, C ×2, M ×2, P ×2

    4 points: F ×2, H ×2, V ×2, W ×2, Y ×2

    5 points: K ×1

    8 points: J ×1, X ×1

    10 points: Q ×1, Z ×1

    [–]miekao[S] 0 points1 point  (0 children)

    I've always played Scrabble buzzed, and never noticed. Thanks for pointing this out.

    [–]snkscore 2 points3 points  (0 children)

    Great write up!

    [–]XTornado 1 point2 points  (1 child)

    I would have paid somebody else to solve the 2000 captchas :P Or, much better, I would have made a faucet for some new "altcoin" or something similar, requiring people to solve a captcha, using these images, and saving what people typed.

    [–]miekao[S] 4 points5 points  (0 children)

    I think people are imagining the captchas being solved as they're normally seen in the wild: On a web page with a request/response cycle.

    Solving 2000 already-fetched captchas, on a local machine, only took about twice as long as typing 2000 words, if it were done uninterrupted.

    [–]octnoir 1 point2 points  (0 children)

    Very nice article and way to segment step by step how to break a Captcha and use existing tech/methods to do so.

    This is a prime example of a security principle: Captcha is not an unbreakable wall - it will be gotten over and is simply a small obstacle.

    The question is how much of an obstacle you want to create, and how you deal with the ones that get over it. The latter is the more important question for you, the website creator, to answer, and very few consider it once they put up a simple captcha and think they're safe from automation/bots.

    [–]xmsxms 3 points4 points  (2 children)

    Given those captchas I would have expected a 100% success rate, not 50%. No offence to the author, but much more difficult captchas are solved with a much higher success rate.

    [–]Lengador 4 points5 points  (1 child)

    I did think something similar; they look like very easy captchas. That being said, the author claimed to have gotten the result after only 12 hours of coding, which I think is very impressive.

    [–]miekao[S] 5 points6 points  (0 children)

    Yeah, I had a big, blunt hammer that'd work more than half the time, so I called it done.

    [–]muyuu 1 point2 points  (0 children)

    No surprise there. The captcha is as terrible as it gets.

    [–]Crashthatch 0 points1 point  (0 children)

    Great writeup. Very interesting.

    I've often wondered how hard it would be to write something to crack some of these "easy" custom captchas. Had never thought of using a spell-checker / wordlist to spot "impossible" words and improve.

    [–]nakilon 0 points1 point  (0 children)

    Ruby FTW

    [–][deleted]  (4 children)

    [deleted]

      [–]thefallingoff 2 points3 points  (0 children)

      Ruby, as marklit pointed out.

      First, it takes a hash (associative array) and sorts it according to the value, which returns an array of two-element [letter, count] arrays. Then the map method iterates over each element applying the first method (you can probably guess what that does) and returns a new array made up of just the letters. Then join simply joins the array elements and returns a string.
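As an illustration of the expression being described (with made-up counts), the shape would be roughly:

```ruby
# Illustrative reconstruction (made-up counts): sort the letter-count
# hash by value, keep the letters, and join them into one string.
counts = { 'e' => 120, 'j' => 2, 't' => 90, 'q' => 1 }
by_frequency = counts.sort_by { |_letter, n| n }.map(&:first).join
# least frequent first: "qjte"
```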

      [–]marklit 1 point2 points  (2 children)

      It's Ruby, not Python. It's calculating how many times each letter appears in a string.

      [–]kreiger 3 points4 points  (1 child)

      No, it creates a string of the letters in the English alphabet in order of frequency.

      [–]miekao[S] 0 points1 point  (0 children)

      Thanks, this is the correct answer. The code wasn't particularly built for clarity, given its experimental nature.