Title image of Writing an algorithm to solve Wordle

Writing an algorithm to solve Wordle

8 February 2022

·
C#

What is Wordle?

Wordle is a word guessing game that exploded in popularity at the start of 2022. I’d be surprised if you haven’t tried it. It’s literally everywhere atm. But just in case I’m still going to explain the rules.

It’s really simple: you get 6 guesses to guess a word. The word is always 5 letters. After each guess you get told which of your letters are a) contained in the word and b) in the correct position.

This game is perfect for an algorithm. It’s one of those games where you develop an algorithm in your head as you play. So let’s turn up the lo-fi focus music and write some code.

The game Wordle

Before we start:

The code is written in C# and can be run as a simple console application.

Also when building a word guessing algorithm it helps to have some words to guess. Luckily you can find anything on Github. I found a repo with a decent list so a couple 100k words. So yeah thank you random Github repo.

Dumb Algorithm

Start simple and build on it. We’re going to start with a dumb algorithm then improve it. Our dumb algorithm is not going to bother remembering the correct positions of letters. Or even which letters are confirmed to not be in the word.

We’re just going to keep a list of which letters are confirmed in the word. And use them to randomly guess a word until we find the Wordle.

Diagram of possible letter of ‘Sharp’

Narrowing down the available words

Code

This keeps the logic simple enough for a single method:

public class DumbAlgorithm : IWordleAlgorithm
{
    public int GuessWorld(string wordle)
    {
        var words = File.ReadAllLines("words_alpha.txt")
            .Where(x => x.Length == wordle.Length)
            .ToArray();

        var knownLetters = new List<char>();
        var guesses = new List<string>();
        var rnd = new Random();

        guesses.Add(words[rnd.Next(words.Length)]);

        while (guesses.Last() != wordle)
        {
            // Find which guess letters are contained in the Wordle
            var foundLetters = guesses.Last().Where(x => wordle.Contains(x));
            knownLetters.AddRange(foundLetters.Where(x => !knownLetters.Contains(x)));

            Console.WriteLine($"Guess #{guesses.Count()}: {guesses.Last()} " +
                $"found {foundLetters.Count()} letters: {string.Join(',', foundLetters)}");

            // Filter our word choices for the next guess based on the known letters
            words = words.Where(x =>
                knownLetters.All(y => x.Contains(y) &&
                !guesses.Last().Contains(x))).ToArray();

            guesses.Add(words[rnd.Next(words.Length)]);
        }

        Console.WriteLine($"Wordle guessed in {guesses.Count()} guesses!");

        return guesses.Count();
    }
}

Results

Let’s see how it performs guessing the word Sharp:

Guess #1: froth found 2 letters: r,h
Guess #2: shier found 3 letters: s,h,r
Guess #3: trush found 3 letters: r,s,h
Guess #4: grush found 3 letters: r,s,h
Guess #5: hyrse found 3 letters: h,r,s
Guess #6: ashur found 4 letters: a,s,h,r
Guess #7: tahrs found 4 letters: a,h,r,s
Guess #8: shear found 4 letters: s,h,a,r
Guess #9: haars found 5 letters: h,a,a,r,s
Guess #10: sharn found 4 letters: s,h,a,r
Guess #11: harts found 4 letters: h,a,r,s
Guess #12: hards found 4 letters: h,a,r,s
Guess #13: brash found 4 letters: r,a,s,h
Guess #14: haras found 5 letters: h,a,r,a,s
Guess #15: hares found 4 letters: h,a,r,s
Guess #16: rheas found 4 letters: r,h,a,s
Wordle guessed in 17 guesses!

Ouch 😂

I was expecting it to be bad but I didn’t expect it to be 17 guesses bad.

To make sure this result isn’t an outlier we should run try a few words and see what the average is. For a good test I’ve separated 500 words from our words list. Testing each of them 500 times each will weed out any outliers. This gives us 250,000 runs of our algorithm which is more than sufficient to a decent average.

The test gave us an average of 24.99 guesses per Wordle.

Yeah… time to move on to a smarter version I think.

Smart Algorithm

Our smart algorithm will build on our dumb algorithm. The dumb algorithm was a good start but we need to make more intelligent decisions. Instead of randomly picking a word that contains the same letters we also need to make sure they’re also in the same position.

Better comparison of our guess word and the Wordle will give us more information to use. From this we can have more detailed rules for picking our next guess:

Old rules for picking the next guess:

  • Must be a new guess
  • Guess must contain all found letters

New rules:

  • Must be a new guess
  • Guess must contain all found letters
  • Guess must have found letters in the found position
  • Guess must not have found letters in the confirmed incorrect position
  • Guess must not contain any letters confirmed not in the word.

Code

Because this version is slightly more complicated we can’t jam it into a single method. We need to split it into it’s separate components.

First let’s start with a better way to compare our guess and the Wordle:

public struct LetterResult
{
    public char Letter;
    public int Position;
    public bool FoundInWord;
    public bool RightPosition;
}

internal static class WordComparer
{
    public static IEnumerable<LetterResult> Compare(string wordle, string guess)
    {
        var results = new List<LetterResult>();

        for (int i = 0; i < guess.Length; i++)
        {
            var result = new LetterResult
            {
                Letter = guess[i],
                Position = i,
            };

            for (int wI = 0; wI < wordle.Length; wI++)
            {
                if (guess[i] == wordle[wI])
                {
                    result.FoundInWord = true;

                    if (i == wI)
                    {
                        result.RightPosition = true;
                    }
                }
            }

            results.Add(result);
        }

        return results;
    }
}

Got to love a loop in a loop.

This excellently named class just loops through each letter of the guess and compares it each of the World’s letters. It records if each letter is found and if it is in the correct position.

From here we need to store the state of each letter we guess. The GuessLetter and GuessState classes are where we do this:

internal class GuessLetter
{
    public bool LetterInWordle { get; set; }
    public List<int> YellowPositions { get; set; }
    public int? FoundPosition { get; set; }

    public readonly char Letter;

    public GuessLetter(char guessLetter)
    {
        Letter = guessLetter;
    }
}

internal class GuessState
{
    public List<GuessLetter> guessInfo { get; private set; }

    public GuessState()
    {
        this.guessInfo = new List<GuessLetter>();
    }

    public void AddGuessResults(IEnumerable<LetterResult> guessResults)
    {
        //.....
    }
}

Next we can validate potential guesses against this state:

internal static class PotentialGuessValidator
{
    public static bool Validate(string potentialGuess, List<string> previousGuesses, GuessState guessState)
    {
        if (previousGuesses.Contains(potentialGuess))
        {
            return false;
        }

        foreach (var guessLetter in guessState.guessInfo)
        {
            if (guessLetter.LetterInWordle && !potentialGuess.Contains(guessLetter.Letter))
            {
                return false;
            }

            if (!guessLetter.LetterInWordle && potentialGuess.Contains(guessLetter.Letter))
            {
                return false;
            }

            if (guessLetter.CorrectPosition != null && potentialGuess[guessLetter.CorrectPosition.Value] != guessLetter.Letter)
            {
                return false;
            }

            foreach (var incorrectPosition in guessLetter.IncorrectPositions)
            {
                if (potentialGuess[incorrectPosition] == guessLetter.Letter)
                {
                    return false;
                }
            }
        }

        return true;
    }
}

The Validate method is a simple set of if statements to validate any potential guess complies to the rules we set out earlier.

And finally we can pull it all together into a method we can use:

public int GuessWorld(string wordle)
{
    var words = _words.Where(x => x.Length == wordle.Length).ToArray();

    var rnd = new Random();

    var guessState = new GuessState();

    var guesses = new List<string>();
    guesses.Add(words[rnd.Next(words.Length)]);

    while (guesses.Last() != wordle)
    {
        var results = WordComparer.Compare(wordle, guesses.Last());

        Console.WriteLine($"Guess #{guesses.Count()}: {guesses.Last()} " +
            $"found {results.Where(x => x.FoundInWord).Count()} " +
            $"letters: {string.Join(',', results.Where(x => x.FoundInWord).Select(x => x.Letter))}");

        guessState.AddGuessResults(results);

        words = words.Where(x => PotentialGuessValidator.Validate(x, guesses, guessState)).ToArray();

        guesses.Add(words[rnd.Next(words.Length)]);
    }

    Console.WriteLine($"Wordle guessed in {guesses.Count()} guesses!");

    return guesses.Count();
}

There’s definitely better ways of structuring this code but who cares? let’s see some results!

Results

Let’s try running it on the word sharp:

Guess #1: snoga found 2 letters: s,a
Guess #2: samal found 3 letters: s,a,a
Guess #3: spacy found 3 letters: s,p,a
Guess #4: staph found 4 letters: s,a,p,h
Wordle guessed in 5 guesses!

That’s abit better!

And from our tests we got an average of 5.17 guesses. A really big improvement from the dumb version which brings the average down below the Wordle 6 maximum number of guesses.

Amazing what a few extra guessing rules can achieve. Can we improve even more?

Big Brain Algorithm

Okay here we go big brain time.

Till this point we’ve been filtering our guessing words then picking one at random. This works but I think we can do better.

When playing the game, a player will often choose a word with alot of vowels as their first word. This is because vowels are very common. Eliminating them can significantly narrow down the word possibilities We could apply this same logic to our algorithm to rank our guesses by how many possibilities they will eliminate.

Letter frequency in English is well defined we won’t need to write anything to work this out. According to this handout from the University of Notre Dame, the most common letter is ‘E’ which appears in 11.16% of English words. At the other end of the scale is the letter ‘Q’ which only appears in 0.19% of words.

In terms of Wordle this means the Wordle is 56 times more likely to contain a the letter ‘E’ compared to ‘Q’.

That’s not all we can do. There’s another parameter we can rank with think would be fun to use. It’s the most common positions for those letters. If we’re going to guess with the most common letter ‘E’ shouldn’t we guess with it in it’s most common position?

A quick bit of code to work this out:

public static int Score(string guess,
    Dictionary<char, int> letterCommonalityScores,
    Dictionary<char, Dictionary<int, int>> letterPositionalScores)
{
    var score = 0;

    var usedLetters = new List<char>();

    for (int i = 0; i < guess.Length; i++)
    {
        // Only score each letter once otherwise we end up
        // with a word full of 'e's
        if (!usedLetters.Contains(guess[i]))
        {
            score = score + letterCommonalityScores[guess[i]];
        }

        score = score + letterPositionalScores[guess[i]][i];

        usedLetters.Add(guess[i]);
    }

    return score;
}

And this give us results like these for the letters ‘E’ and ‘G’ in 5 letter words:

Most likely positions of the letters E and G

If you’re going to guess a word with the letter ‘G’ the best position the first position. And for the letter ‘E’ it is the 4th position.

Combining most common letters and most common positions into a scoring system is the easiest way to rank our guesses. We can assign each guess points based on the following:

LetterPoints Value
e26
….….
q1

Most common letters

PositionPoints Value
1st5
5th1

Most common position for each letter

Results

Guess #1: saner found 3 letters: s,a,r
Guess #2: soral found 3 letters: s,r,a
Guess #3: sutra found 3 letters: s,r,a
Guess #4: shari found 4 letters: s,h,a,r
Wordle guessed in 5 guesses!

Our test word ‘Sharp’ took 5 guesses which is the same as our smart algorithm. But interestingly our first guess found three correct letters compared to just 2 from our smart algorithm.

From our tests we get an average of 4.75 which is another improvement!

Conclusion

I’m happy we managed to get the average number of guesses down from 24.99 to 4.75. Although I was expecting to be able to get lower.

The big issue now is the word data set. Most of the words are very uncommon and would not appear in the game Wordle. On the last set of results it guessed the words “saner”, “soral”, “sutra” and “shari” none of which I’ve ever heard of.

A cool data set to include would be how often each word is actually used in the English language. That way we could rank our guesses based on how common the word is as well how common the letters are. If anyone wants to try this I love love to see the results 😀

All code available on Github to play with: https://github.com/Liam-Hunt/wordle-solving-algorithm