What’s the hardest word to guess, when playing hangman? I’ll come back to that.
Last year, Nick Berry wrote a fantastic blog post about the optimal strategy for Hangman. He showed that the best guessesto make to get your first “hit” in a game of hangman arenot the most-commonly occurring letters in written English, because these aren’t the most commonly-occurringletters in individual words. He also showed that the first guesses should be adjusted based on the length of the word (the most common letter in 5-letter words is ‘S’, but the mostcommon letter in 6-letter words is ‘E’). In short: hangman’s a more-complex game than you probably thought it was! I’d like to take his work a step further, and work out which word isthehardest word: that is – assuming you’re playing an optimal strategy, what word takes the most-guesses?
First, though, we need to understand how hangman is perfectly played. Based on the assumption that the “executioner” player is choosing words randomly, and that no clue is given as tothe nature of the word, we can determine the best possible move for all possible states of the game by using a data structure known as a tree. Suppose our opponent has chosen athree-letter word, and has drawn three dashes to indicate this. We know from Nick’s article that the best letter to guess is A. And then, if our guess iswrong, the nextbest letter to guess is E. But what if our first guess is right? Well, then we’ve got an “A” in one or more positions on the board, and we need to work out thenext bestmove: it’s unlikely to be “E” – very few three-letter words have both an “A” and an “E” – and of course what letter we should guess next depends entirely on whatpositionsthe letters are in.
What we’re actually doing here is afiltering exercise: of all of the possible letters we could choose, we’re considering what possible results that could have. Then foreach of those results, we’re considering what guesses we could makenext, and so on. At each stage, we compare all of the possible moves to a dictionary of all possiblewords, and filter out all of the words itcan’t be: after our first guess in the diagram above, if we guess “A” and the board now shows “_ A _”, then we know that of the600+ three-letter words in the English language, we’re dealing with one of only about 134. We further refine our guess by playing the odds: of those words, more of them have a “C” inthan any other letter, so that’s our second guess. If it has a C in, that limits the options further, and we can plan the next guess accordingly. If itdoesn’t have a Cin, thatstill provides us with valuable information: we’re now looking for a three-letter word with an A in the second position and no letter C: that cuts itdown to 124 words (and our next guess should be ‘T’). This tree-based mechanism for working out the best moves is comparable to that used by other game-playing computers. Hangman issimple enough that it can be “solved” by contemporary computers (like draughts –solved in 2007 – but unlike chess: while modern chess-playingcomputers can beat humans, it’s still theoretically possible to build future computers that will beat today’s computers).
Now that we can simulate the way that a perfect player would play against a truly-random executioner, we can use this to simulate games of hangman forevery possible word(I’m using version 0.7 of this British-English dictionary).In other words, we set up two computer players: the first chooses a word from the dictionary, the second plays “perfectly” to try to guess the word, and we record how many guesses ittook. So that’s what I did. Here’s the Ruby code I used. It’s heavily-commented andprobably pretty understandable/good learning material, if you’re into that kind of thing. Or if you fancy optimising it, there’s plenty of scope for that too (I knocked it out on alunch break; don’t expect too much!). Or you could use it as the basis to make a playable hangman game. Go wild.
Running the program, we can see that the hardest three-letter word is “xxv”, which would take 22 guesses (20 of them wrong!) to get. But aside from the roman numeral for 25, I don’tthink that “xxv” is actually a word. Perhaps my dictionary’s not very good. “Oak”, though, is definitely a word, and at 20 guesses (17 wrong), it’s easily enough to hang your opponentno matter how many strokes it takes to complete the gallows.
There are more tougher words in the four-letter set, like the devious “quiz”, “jazz”, “zinc”, and “faux”. Pick one of those and your opponent – unless they’ve seen this blog post! – isincredibly unlikely to guess it before they’re swinging from a rope.
As we get into the 5, 6, and 7-letter words you’ll begin to notice a pattern: that thehardest words with any given number of letters geteasier the longerthey are. That’s kind of what you’d expect, I suppose: if there were a hypothetical word that contained every letter in the alphabet, then nobody would ever fail to (eventually) get it.
When we make a graph of each word length, showing which proportion of the words require a given number of “wrong” guesses (by an optimised player), we discover a “sweet spot” window inwhich we’ll find all of the words that an optimised player will always fail to guess (assuming that we permit up to 10 incorrect guesses before they’re disqualified). Thewindow seems small for the number of times I remember seeing people actually lose at hangman, which implies to me that human players consistently play sub-optimally, and do notadequately counteract that failing by applying an equal level of “smart”, intuitive play (knowing one’s opponent and their vocabulary, looking for hints in the way the game ispresented, etc.).
In case you’re interested, then, here are the theoretically-hardest words to throw at your hangman opponent. While many of the words there feel like they would quite-rightly bedifficult, others feel like they’d be easier than their ranking would imply: this is probably because they contain unusual numbers of vowels or vowels in unusual-but-telling positions,which humans (with their habit, inefficient under normal circ*mstances, of guessing anextended series of vowels to begin with) might be faster to guess than acomputer.
Word | Guesses taken | “Wrong” guesses needed |
---|---|---|
quiz | 24 | 20 |
jazz | 22 | 19 |
jazzy | 22 | 18 |
quaff | 22 | 18 |
zinc | 21 | 17 |
oak | 20 | 17 |
vex | 20 | 17 |
vox | 20 | 17 |
foxing | 22 | 16 |
foxed | 21 | 16 |
queued | 20 | 16 |
fuzzy | 20 | 16 |
quay | 20 | 16 |
pinup | 20 | 16 |
fox | 19 | 16 |
yuk | 19 | 16 |
vaquero | 22 | 15 |
jazzier | 21 | 15 |
quizzed | 21 | 15 |
hazing | 21 | 15 |
favour | 21 | 15 |
yoking | 21 | 15 |
quays | 20 | 15 |
quark | 20 | 15 |
joked | 20 | 15 |
guyed | 20 | 15 |
foyer | 20 | 15 |
bumph | 20 | 15 |
huge | 19 | 15 |
quip | 19 | 15 |
gibe | 19 | 15 |
rump | 19 | 15 |
guan | 19 | 15 |
quizzed | 19 | 15 |
oaks | 19 | 15 |
murk | 19 | 15 |
fezzes | 19 | 15 |
yuck | 19 | 15 |
keno | 19 | 15 |
kazoo | 19 | 15 |
Download a longer list (there’s plenty more which you’d expect to “win” with) |
If you use this to give you an edge in your next game, let me know how it works out for you!