Wordle - Frequency analysis approach
An attempt using frequency analysis

Like most of you, I recently found my social media feed filled with strange-looking green, black, and yellow squares showing Wordle scores. At first I had no idea what it was, but the continual flood of them made me curious enough to find out what the hype was all about.

After learning about the game, I realised it's a simple twist on Mastermind, which we used to play as kids. Following my habit of losing friends by codifying lazy solutions to games (you might recall I have already spoilt Sudoku for some), I decided to analyse whether there was an efficient way to guess based on statistical measures.

In this game, the search space consists of 5-letter English words. You might already know that some sites have dug into the Wordle source code and found the word list it uses. For my approach, I stuck with a generic English word list downloaded from here. The drawback is that some high-likelihood words may not be accepted by Wordle as valid guesses, in which case you have to pick another word from the list of recommendations instead.
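As a rough sketch of that first step, the corpus can be narrowed down to 5-letter words as below. This assumes the downloaded list is a plain text file with one word per line; the local file name `words_alpha.txt` and the helper `five_letter_words` are hypothetical, for illustration only.

```python
def five_letter_words(lines):
    """Keep lowercase, purely alphabetic 5-letter words, dropping duplicates."""
    seen = set()
    words = []
    for line in lines:
        word = line.strip().lower()
        # isascii() keeps out accented letters, which Wordle does not use
        if len(word) == 5 and word.isascii() and word.isalpha() and word not in seen:
            seen.add(word)
            words.append(word)
    return words


# Usage, assuming a local copy of the list saved as "words_alpha.txt" (hypothetical):
# with open("words_alpha.txt") as f:
#     words = five_letter_words(f)
```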

An initial naïve approach seems simple. Break up the word list into its characters and perform a simple frequency distribution analysis of the occurrences.
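A minimal sketch of this naïve count using Python's `collections.Counter`. The function name `letter_frequencies` and the five-word sample are illustrative assumptions, not taken from the original code or data:

```python
from collections import Counter


def letter_frequencies(words):
    """Share of each letter over all characters in the corpus, ignoring position."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    # most_common() orders letters from most to least frequent
    return {ch: n / total for ch, n in counts.most_common()}


words = ["arose", "bares", "spice", "stare", "crane"]  # toy sample, not real data
freqs = letter_frequencies(words)
top5 = list(freqs)[:5]
```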

[Figure: frequency distribution of letters across all 5-letter words in the list]

Looking at the diagram above, the five most frequently occurring letters are 'a', 'e', 's', 'o' and 'r', and one might immediately think of 'arose' as a likely first guess.
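One hypothetical way to surface candidates like 'arose' is to look for words that are anagrams of those top letters. A small sketch, where the helper name `anagrams_of` and the sample words are made up for illustration:

```python
def anagrams_of(words, letters):
    """Words that use exactly the given letters, each once."""
    target = sorted(letters)
    return [w for w in words if sorted(w) == target]


# e.g. the five most frequent letters from the overall count
candidates = anagrams_of(["arose", "bares", "soare", "aeros", "stare"], "aesor")
```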

But this naïve approach ignores the positions of the characters and only aggregates counts across the entire corpus. What if we compute the frequencies separately for each of the 5 positions? As the table below shows, the distributions differ, as one might expect.

[Table: letter frequency distribution for each of the 5 positions]

For example, the most common first letter of a 5-letter word is 's', at around 11.4%, while 'a' is the most common second letter, at around 18% of all 5-letter words.
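A sketch of the per-position count, again with a hypothetical helper name and a toy word list. Each position gets its own table of letter shares, ordered from most to least common:

```python
from collections import Counter


def positional_frequencies(words, length=5):
    """One table per position: the share of words with each letter at that position."""
    tables = []
    for pos in range(length):
        counts = Counter(word[pos] for word in words)
        tables.append({ch: n / len(words) for ch, n in counts.most_common()})
    return tables


words = ["arose", "bares", "spice", "stare", "crane"]  # toy sample, not real data
tables = positional_frequencies(words)
first_letter, share = next(iter(tables[0].items()))  # most common first letter
```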

Thus, an alternative approach we might try would be:

  • Compute the letter distribution for each position
  • Choose the position and letter with the highest frequency (e.g. in the distribution above, 's' in the last position scores the highest, at 19.8%)
  • Filter the word list, conditioning on that letter being locked in that position (i.e. keep only words that have it there)
  • Repeat the frequency analysis and selection over the remaining 4 positions
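The steps above could be sketched roughly as follows. This is my own reconstruction under a few assumptions (ties go to whichever position and letter come first, and the loop stops once a single word remains), not necessarily how the original code does it:

```python
from collections import Counter


def greedy_word(words, length=5):
    """Lock the most frequent (position, letter) pair, keep only matching words,
    then repeat on the remaining positions until one word is left."""
    candidates = list(words)
    open_positions = set(range(length))
    while open_positions and len(candidates) > 1:
        best = None  # (count, position, letter)
        for pos in sorted(open_positions):
            # Raw counts rank the same as percentages, since every position
            # shares the same denominator (the number of candidates).
            letter, n = Counter(w[pos] for w in candidates).most_common(1)[0]
            if best is None or n > best[0]:
                best = (n, pos, letter)
        _, pos, letter = best
        # Condition on the locked letter at that position
        candidates = [w for w in candidates if w[pos] == letter]
        open_positions.remove(pos)
    return candidates[0] if candidates else None


words = ["arose", "bares", "spice", "stare", "crane"]  # toy sample, not real data
start = greedy_word(words)
```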

[Figure: output of the positional frequency analysis]

Using this approach on the words_alpha list, the analysis above suggests that the best starting word would be 'bares'. Using the Wordle word list gleaned from the source code instead, it would be 'spice'.

You might have spotted that I also tried to restrict the first few guesses to words with 5 unique letters. This narrows the search space faster, and it also avoids what appeared to be buggy handling of repeated letters, which seemed to be flagged inconsistently (I have not verified this extensively yet).
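A simple way to express the unique-letter preference, sketched under the assumption that we fall back to the full list when no such word remains (that fallback, and the helper name, are my own choices rather than something described above):

```python
def prefer_unique_letters(words):
    """Keep only words whose letters are all distinct; fall back to all words if none are."""
    unique = [w for w in words if len(set(w)) == len(w)]
    # Fallback (an assumption): late in the game the answer may repeat a letter
    return unique or list(words)
```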

After each guess, the approach above is simply repeated on a word list filtered by:

  1. Characters known to be in the right positions
  2. Characters that are not in the word
  3. Characters in the wrong places (must appear, but not in the position tried)
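The three filters could be sketched as below. The function and argument names are hypothetical, and this simplified version assumes the grey set never contains a letter that is also green or yellow, so it does not cover every repeated-letter case:

```python
def matches(word, greens, greys, yellows):
    """Check a word against what earlier guesses revealed.

    greens:  {position: letter} for letters known to be in the right position
    greys:   letters known not to be in the word (assumed not to overlap
             with any green or yellow letter, so repeated letters are simplified)
    yellows: {letter: positions} for letters that must appear, but not at
             any of the positions where they were already tried
    """
    if any(word[pos] != ch for pos, ch in greens.items()):
        return False
    if any(ch in word for ch in greys):
        return False
    for ch, tried in yellows.items():
        if ch not in word or any(word[pos] == ch for pos in tried):
            return False
    return True


def filter_words(words, greens, greys, yellows):
    return [w for w in words if matches(w, greens, greys, yellows)]


words = ["arose", "bares", "spice", "stare", "crane"]  # toy sample
remaining = filter_words(words, greens={4: "e"}, greys={"o"}, yellows={"a": {0}})
```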

The frequency distribution process is then repeated until the puzzle is solved.
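To tie it together, here is a self-contained sketch of the guess, feedback and filter loop. Note that it swaps in a slightly different filter from the three rules above: it keeps only words that would have produced exactly the same feedback for the last guess, which also handles repeated letters. The `feedback` and `solve` helpers and the 'G'/'Y'/'.' encoding are assumptions for illustration:

```python
from collections import Counter


def feedback(guess, target):
    """Wordle-style feedback: 'G' right position, 'Y' wrong position, '.' absent.
    A repeated letter is only marked as often as it occurs in the target."""
    result = ["."] * len(guess)
    remaining = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
        else:
            remaining[t] += 1  # target letters still available for 'Y'
    for i, g in enumerate(guess):
        if result[i] != "G" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


def greedy_word(words):
    """Lock the most frequent (position, letter) pair, filter, and repeat."""
    candidates = list(words)
    open_positions = set(range(len(candidates[0])))
    while open_positions and len(candidates) > 1:
        best = None  # (count, position, letter)
        for pos in sorted(open_positions):
            letter, n = Counter(w[pos] for w in candidates).most_common(1)[0]
            if best is None or n > best[0]:
                best = (n, pos, letter)
        _, pos, letter = best
        candidates = [w for w in candidates if w[pos] == letter]
        open_positions.remove(pos)
    return candidates[0]


def solve(words, target, max_guesses=6):
    """Guess, read the feedback, keep only consistent words, and repeat."""
    candidates = list(words)
    guesses = []
    while candidates and len(guesses) < max_guesses:
        guess = greedy_word(candidates)
        guesses.append(guess)
        pattern = feedback(guess, target)
        if pattern == "G" * len(target):
            break
        candidates = [w for w in candidates if feedback(guess, w) == pattern]
    return guesses
```

Because a wrong guess never produces an all-'G' pattern against itself, it is always dropped from the candidates, so the loop cannot repeat the same guess.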

This is probably a fun-killer, and also a trivial implementation. But it highlights how frequency distributions might be used instead of blindly brute-forcing through the search space.

I hope I don't lose friends by killing the fun of another game, but I also hope this sparks some ideas for approaching day-to-day problems (or fun, in this case, sorry) using data and logic.

The code can be found here, though it is not production grade; it's just something that works, for fun.

