
Akinator versus AI

The other night I was having trouble sleeping and my mind wandered to, of all things, that old Akinator game that was popular in the mid-aughts. Amazingly, it’s still online, and it looks like there are even mobile apps, but my memory was of playing it on the web.

If you don’t know Akinator, it’s basically a bot that tries to guess a character you’re thinking of based on your answers to yes-or-no questions. Ostensibly a computer playing twenty questions, or so I’ve read. When I was younger, Akinator blew my mind, though my older programmer self is less impressed. Not that I could do a better job or anything.
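(For the curious, the twenty-questions idea is simple enough to sketch. Here’s a toy version in Python, with made-up data and a greedy heuristic, not Akinator’s actual algorithm: each turn, ask whichever question splits the remaining candidates closest to 50/50.)

```python
# Toy twenty-questions engine: greedily pick the question that splits
# the remaining candidates most evenly. Data and names are made up.

CHARACTERS = {
    "Luke Skywalker": {"is_fictional": True, "uses_a_lightsaber": True},
    "Harry Potter":   {"is_fictional": True, "uses_a_lightsaber": False},
    "MrBeast":        {"is_fictional": False, "uses_a_lightsaber": False},
}

def best_question(candidates: set[str], questions: list[str]) -> str:
    """Pick the question whose yes/no split is closest to even."""
    def imbalance(q: str) -> int:
        yes = sum(CHARACTERS[c][q] for c in candidates)
        return abs(yes - (len(candidates) - yes))
    return min(questions, key=imbalance)

def play(answer_fn) -> str:
    candidates = set(CHARACTERS)
    questions = list(next(iter(CHARACTERS.values())))  # question keys
    while len(candidates) > 1 and questions:
        q = best_question(candidates, questions)
        questions.remove(q)
        reply = answer_fn(q)  # True for "yes", False for "no"
        candidates = {c for c in candidates if CHARACTERS[c][q] == reply}
    return candidates.pop() if candidates else "I give up!"

if __name__ == "__main__":
    # Thinking of Luke Skywalker:
    print(play(lambda q: {"is_fictional": True, "uses_a_lightsaber": True}[q]))
```

An even split roughly halves the candidate pool each turn, which is how twenty well-chosen questions can in principle distinguish about a million (2^20) characters.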

So in a fit of nostalgia I fired up Akinator and gave it a softball of Luke Skywalker and it did… kinda badly? It got the answer in roughly 40 questions, and some of the questions were interesting:

“Does your character play in ‘Harry Potter’?”

I guess there are a lot of Potter fans out there if Akinator needs to lead with this one.

“Is your character a famous YouTuber?”

YouTube was not (or maybe barely) a thing when I first played Akinator, but understandable. I wonder how many times it guesses “Mr. Beast”.

“Does your character talk about basketball?”

It asked this question after I had already affirmed that I was thinking of a Star Wars character. Do I now know something about Yoda?

And really a bunch of other odd ones. It first guessed Anakin Skywalker, and after I kept going it landed on Lego Luke Skywalker, which I gave it.

The Competition

That was fun, but not as impressive as I remembered. It got me thinking: with the AI “revolution” in full swing, would any old LLM do better than Akinator nowadays? I decided to run an experiment and, since I’m a Kagi Assistant user, try an Akinator prompt across a range of different models:

You are Akinator, a mind-reading genie. I will think of a character and you will try to guess that character based on my answers to your yes or no questions.

As you ask each question, number them so that it is easy to see how many questions were asked. Instead of asking a question, you may guess the character, but you only have three guesses before you lose the game. You may ask as many questions as you like before guessing.
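I played these by hand in Kagi Assistant, but if you wanted to script the game loop, a rough sketch against any OpenAI-compatible chat API might look like this (the endpoint, key, and model name below are placeholders, not Kagi’s actual API):

```python
# Rough sketch of the game loop against an OpenAI-compatible chat API.
# The base_url, api_key, and model name are placeholders.
from openai import OpenAI

PROMPT = "You are Akinator, a mind-reading genie. ..."  # the prompt above

client = OpenAI(base_url="https://example.com/v1", api_key="sk-placeholder")
history = [{"role": "system", "content": PROMPT}]
turns = 0

while True:
    reply = client.chat.completions.create(
        model="some-model",  # swap in each model under test
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print(reply)
    turns += 1

    answer = input("> ")  # e.g. "yes", "no", or "correct!" to end the game
    if answer.lower() == "correct!":
        break
    history.append({"role": "user", "content": answer})

print(f"Done after {turns} turns.")
```

From there it’s just a matter of tallying how many of those turns were questions versus guesses.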

Here’s how well each one did; the fewer guesses and questions, the better. All games were played with “Luke Skywalker” in mind as the answer.

| Model | Guesses | Questions |
| --- | --- | --- |
| DeepSeek Chat V3 | 1 | 6 |
| Nova Pro | 1 | 7 |
| GPT 4o | 1 | 11 |
| Claude 3 Opus | 1 | 11 |
| Llama 3.3 70B | 1 | 12 |
| Claude 3.5 Haiku | 1 | 13 |
| GPT 4o mini | 3 | 18 |
| Claude 3.5 Sonnet | 3 | 19 |
| Gemini Pro | 3 | 35 |
| Llama 3.1 405B | 3 | 23 |
| Nova Lite | 3 | 24 |
| Mistral Pixtral | 3 | 26 |
| Mistral Large | 3 | 57 |
| Qwen QwQ 32b | 0 | 208 |

Clearly many of the LLMs out-akinatored Akinator. I didn’t save any of the transcripts, but if you’re interested I’d say it’s more fun to just try the above prompt yourself. A couple of caveats I noticed along the way:

At times I worried I had led the model down a bad road with my answer to a wishy-washy question. While I’m certainly a factor in this “experiment”, I decided it’s still fair to judge the model for asking the question in the first place.

Worth noting that I’m not an expert on AI models, so this may have been a poor test for some of them. If Star Wars material never appears in a model’s training data, for example, it’s probably very hard for it to ever get the right answer.