-
Notifications
You must be signed in to change notification settings - Fork 0
/
wordle.html
381 lines (293 loc) · 18.1 KB
/
wordle.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>typon's website</title>
<link href="css/generated.css" rel="stylesheet">
<link href="https://unpkg.com/[email protected]/themes/prism-solarizedlight.min.css" rel="stylesheet" />
</head>
<body class="bg-gray-600">
<div class="grid grid-cols-12">
<div class="p-4 hidden col-span-3 md:block bg-orange-100 grid grid-rows-2 justify-items-stretch min-h-screen">
<div id="sidebar-header" class="font-urdu justify-self-center text-center p-6 border-8 border-slate-800 ring-2 ring-slate-800 ring-offset-8 ring-offset-orange-100">
<a href="index.html">
<span class="text-6xl text-amber-600">حَسَن</span>
<span class="text-6xl text-orange-900">فاروق</span>
</a>
</div>
<div class="bg-black-100 h-1">
<!-- Divider -->
</div>
</div>
<div id="main" class="p-4 col-span-12 md:col-span-9 bg-orange-50">
<h2 class="font-niceserif text-2xl text-slate-800 font-semibold">Solving Wordle with Z3</h1>
<article class="mt-4 w-3/4 flex flex-col">
<p class="mb-2">
I've been enjoying this viral new word game eponymously named <a href="https://www.powerlanguage.co.uk/wordle/" class="text-slate-800 underline decoration-dotted" >Wordle</a>. The aim of the game is to guess the day's word in six or less tries.
I'll go through an example game to give an idea of the rules.
</p>
<img class="w-1/4 place-self-center" src="game.png" alt="Example game">
<p class="mt-2">
In this particular game:
<ul class="list-inside list-disc">
<li>the hidden word was
<span class="bg-green-900 text-white py-px px-0.5">G</span>
<span class="bg-green-900 text-white py-px px-0.5">O</span>
<span class="bg-green-900 text-white py-px px-0.5">R</span>
<span class="bg-green-900 text-white py-px px-0.5">G</span>
<span class="bg-green-900 text-white py-px px-0.5">E</span>, guessed correctly on the fifth try.
</li>
<li>the first guess was
<span class="bg-amber-400 text-white py-px px-0.5">R</span>
<span class="bg-amber-400 text-white py-px px-0.5">E</span>
<span class="bg-gray-600 text-white py-px px-0.5">N</span>
<span class="bg-gray-600 text-white py-px px-0.5">T</span>
<span class="bg-gray-600 text-white py-px px-0.5">S</span>, which was incorrect. However it gave us clues:
<span class="bg-amber-400 text-white py-px px-0.5">R</span> is found withing in the correct word, but not at this location (same with <span class="bg-amber-400 text-white py-px px-0.5">E</span>) and <span class="bg-gray-600 text-white py-px px-0.5">N</span>, <span class="bg-gray-600 text-white py-px px-0.5">T</span>, <span class="bg-gray-600 text-white py-px px-0.5">S</span> are not found in the correct word.
</li>
<li>the second guess confirms that <span class="bg-green-900 text-white py-px px-0.5">O</span> appears in the correct word and at the same location we guessed.
</li>
<li>Note that letters are allowed to appear more than once in the correct word - however if a letter appears only once, but our guess contains that letter multiple times, all except one instance will be marked gray.
</li>
</p>
<p class="mt-2">
Armed with these rules and an English dictionary, we can encode these rules as constraint satisfaction program and try to solve it using an SMT solver.
My go to SMT solver is <a href="https://github.com/Z3Prover/z3" class="text-slate-800 underline decoration-dotted">Z3</a> because it is simple to install, has well-made python bindings and an easy to use API.
</p>
<p class="mt-2">
With pip, cmake and a C++ compiler as prerequisites, you can simply install z3 with the following command:
<pre><code class="language-bash">$ pip install z3-solver</code></pre>
</p>
<p class="mt-4">
My goal is to model this problem with variables and constraints that are easy to understand - hopefully helpful for people who are new to SMT solvers.
Let's first start by instantiating the z3 solver class and constructing our variables:
<pre><code class="language-python">import z3
word_length = 5
def define_letter_variables():
return [z3.Int(f"letter_{index}") for index in range(word_length)]
if __name__ == "__main__":
solver = z3.Solver()
letter_vars = define_letter_variables()
</code></pre>
We are going to model each letter of our guess as a separate <code class="language-python">z3.Int</code>. z3 can represent arbitrarily large integers - but we only need to model each variable as a letter - ranging from 0 - 25 for the English alphabet. To model our variables in that legal range, let's add some constraints:
<pre><code class="language-python">def add_alphabet_modeling_constraints(solver, letter_vars):
for letter_var in letter_vars:
solver.add(letter_var >= 0, letter_var <= 25)
return solver
</code></pre>
Just to sanity check our work so far, let's run the solver and see what it gives us:
<pre><code class="language-python">letter_to_index_map = {letter: index for index, letter in enumerate("abcdefghijklmnopqrstuvwxyz")}
index_to_letter_map = {index: letter for letter, index in letter_to_index_map.items()}
def pretty_print_solution(model, letter_vars):
word = []
for index, letter_var in enumerate(letter_vars):
word.append(index_to_letter_map[model[letter_var].as_long()])
print(''.join(word))
if __name__ == "__main__":
solver = z3.Solver()
letter_vars = define_letter_variables()
solver = add_alphabet_modeling_constraints(solver, letter_vars)
print("Solving...")
result = solver.check()
print(result)
assert result == z3.sat
model = solver.model()
pretty_print_solution(model, letter_vars)
</code></pre>
<pre><code class="language-bash">$ python wordle.py
Solving...
sat
aaaaa
</code></pre>
A pretty boring result. The solver decided to set each of our letter variables to 0, resulting in the final word guess of <code class="language-python">aaaaa</code>.
</p>
<p class="mt-4">
We'll need to teach the solver about English if we want it to pick words that actually make sense. We can use the dictionary that UNIX-like OS's come with:
<pre><code class="language-python">
def remove_plurals(words):
five_letter_words = list(filter(lambda word: len(word) == 5, words))
four_letter_words = set(filter(lambda word: len(word) == 4, words))
all_five_letter_words_ending_in_s = set(filter(lambda word: word[4] == "s", five_letter_words))
singular_five_letter_words = list(filter(lambda word: not (word in all_five_letter_words_ending_in_s and word[:4] in four_letter_words), five_letter_words))
return singular_five_letter_words
def remove_words_with_invalid_chars(words):
valid_chars_set = set(letter_to_index_map.keys())
def contains_only_valid_chars(word):
return set(word).issubset(valid_chars_set)
return filter(contains_only_valid_chars, words)
def load_dictionary(dictionary_path=None):
with open(dictionary_path, "r") as f:
all_legal_words = set(word.strip() for word in f.readlines())
words = remove_words_with_invalid_chars(all_legal_words)
words = list(words)
words = remove_plurals(words)
return words
if __name__ == "__main__":
...
words = load_dictionary(dictionary_path="/usr/share/dict/words")
</code></pre>
Now that we have loaded the dictionary, cleaned up invalid characters and removed plurals (I've found that Wordle only uses singular words for answer words, if you found otherwise let me know!), we can add
constraints to force the solver to pick only legal words.
<pre><code class="language-python">def add_legal_words_constraints(solver, words, letter_vars):
all_words_disjunction = []
for word in words:
word_conjuction = z3.And([letter_vars[index] == letter_to_index_map[letter] for index, letter in enumerate(word)])
all_words_disjunction.append(word_conjuction)
solver.add(z3.Or(all_words_disjunction))
return solver
if __name__ == "__main__":
...
solver = add_legal_words_constraints(solver, words, letter_vars)
</code></pre>
The <code class="language-python">add_legal_words_constraints</code> function seems innocuous, but it adds a whole bunch of constraints and makes up the meat of the solver. Read in boolean logic, you can think of the solver now having the following constraints, given our dictionary only consisted of two words: "hello" and "world".
<pre><code class="language-python">
(letter_0 == "h" and letter_1 == "e" and letter_2 == "l" and letter_3 == "l" and letter_4 == "o") or
(letter_0 == "w" and letter_1 == "o" and letter_2 == "r" and letter_3 == "l" and letter_4 == "d")
</code></pre>
Now we are ready to run the solver and get valid guesses only! Let's give it a try:
<pre><code class="language-bash">$ python wordle.py
Solving...
sat
badge
</code></pre>
Awesome! We get <code class="language-python">badge</code> as our guess. We can input that into Wordle and see what we get.
For the Wordle on January 10th, 2022, I get the following result:
<img class="mt-2 w-1/4 place-self-center" src="first_guess.png" alt="Badge">
A relatively bad hit - we got 4 letters wrong and only one right, but in the wrong position.
Let's add constraints to the solver to prevent it from picking these letters again and tell it that it should definitely consider <span class="bg-amber-400 text-white py-px px-0.5">E</span> in it's future guesses.
<pre><code class="language-python">def add_doesnt_contain_letter_constraint(solver, letter_vars, letter):
for letter_var in letter_vars:
solver.add(letter_var != letter_to_index_map[letter])
return solver
def add_contains_letter_constraint(solver, letter_vars, letter):
solver.add(z3.Or([letter_var == letter_to_index_map[letter] for letter_var in letter_vars]))
return solver
def add_invalid_position_constraint(solver, letter_vars, letter, position):
solver.add(letter_vars[position] != letter_to_index_map[letter])
return solver
if __name__ == "__main__":
...
for banned_letter in "badg":
solver = add_doesnt_contain_letter_constraint(solver, letter_vars, banned_letter)
solver = add_contains_letter_constraint(solver, letter_vars, "e")
solver = add_invalid_position_constraint(solver, letter_vars, "e", 4)
</code></pre>
We prevent
<span class="bg-gray-600 text-white py-px px-0.5">B</span>
<span class="bg-gray-600 text-white py-px px-0.5">A</span>
<span class="bg-gray-600 text-white py-px px-0.5">D</span>
<span class="bg-gray-600 text-white py-px px-0.5">G</span> from appearing in any position and tell the solver that while <span class="bg-amber-400 text-white py-px px-0.5">E</span> must appear in one of the positions, it can't appear in the fifth position.
Let's run the solver again:
<pre><code class="language-bash">$ python wordle.py
Solving...
sat
every
</code></pre>
Okay, great. It kept the <span class="bg-amber-400 text-white py-px px-0.5">E</span>, but not at the fifth position and didn't guess either of
<span class="bg-gray-600 text-white py-px px-0.5">B</span>
<span class="bg-gray-600 text-white py-px px-0.5">A</span>
<span class="bg-gray-600 text-white py-px px-0.5">D</span>
<span class="bg-gray-600 text-white py-px px-0.5">G</span>.
Let's input that guess and see what we get.
<img class="mt-2 w-1/4 place-self-center" src="second_guess.png" alt="Tuner">
</p>
<p class="mt-4">
Wonderful, we got an exact match for
<span class="bg-green-900 text-white py-px px-0.5">E</span>
<span class="bg-green-900 text-white py-px px-0.5">R</span>
<span class="bg-green-900 text-white py-px px-0.5">Y</span>. Note that the <span class="bg-gray-600 text-white py-px px-0.5">E</span> in the first position is grayed out - this means that it should appear only once in this word. Let's add a constraint for that as well as adding more banning constraints.
<pre><code class="language-python">def add_exact_letter_position_constraint(solver, letter_vars, letter, position):
solver.add(letter_vars[position] == letter_to_index_map[letter])
return solver
def add_letter_appears_once_constraint(solver, letter_vars, letter):
unique_letter_disjunction = []
for letter_var in letter_vars:
this_letter_conjunction = [letter_var == letter_to_index_map[letter]]
for other_letter_var in letter_vars:
if letter_var == other_letter_var:
continue
this_letter_conjunction.append(other_letter_var != letter_to_index_map[letter])
unique_letter_disjunction.append(z3.And(this_letter_conjunction))
solver.add(z3.Or(unique_letter_disjunction))
return solver
if __name__ == "__main__":
...
for banned_letter in "badgv":
solver = add_doesnt_contain_letter_constraint(solver, letter_vars, banned_letter)
solver = add_letter_appears_once_constraint(solver, letter_vars, "e")
solver = add_exact_letter_position_constraint(solver, letter_vars, "e", 2)
solver = add_exact_letter_position_constraint(solver, letter_vars, "r", 3)
solver = add_exact_letter_position_constraint(solver, letter_vars, "y", 4)
</code></pre>
With these new constraints, running the solver produces the guess:
<pre><code class="language-bash">$ python wordle.py
Solving...
sat
fiery
</code></pre>
<img class="mt-2 w-1/4 place-self-center" src="third_guess.png" alt="Tuner">
</p>
<p class="mt-4">
A complete miss :(. Let's prevent the solver from guessing both these letters:
<pre><code class="language-python">if __name__ == "__main__":
...
for banned_letter in "badgvfi":
solver = add_doesnt_contain_letter_constraint(solver, letter_vars, banned_letter)
...
</code></pre>
<pre><code class="language-bash">$ python wordle.py
Solving...
sat
query
</code></pre>
<img class="mt-2 w-1/4 place-self-center" src="fourth_guess.png" alt="Tuner">
</p>
<p class="mt-4">
Yay! The solver guessed the correct word:
<span class="bg-green-900 text-white py-px px-0.5">Q</span>
<span class="bg-green-900 text-white py-px px-0.5">U</span>
<span class="bg-green-900 text-white py-px px-0.5">E</span>
<span class="bg-green-900 text-white py-px px-0.5">R</span>
<span class="bg-green-900 text-white py-px px-0.5">Y</span>
</p>
<p class="mt-4">
While the modeling we have so far works pretty well, we can further improve it by taking into account letter frequencies (how often a letter appears across words) and word frequencies (how often a word appears in a large corpus). In my testing I also found that the solver tended to guess words that I wouldn't really go for because they're words that, while legal, are obscure and never really used in common English. Wordle tends to use words that are not very obscure, but also not very common.
We can use z3's optimization features to encode a cost function - valuing guesses that contain more frequent letters early on and more obscure words later as we have more information about the word's makeup.
</p>
<p class="mt-4">
Here's a snippet of such a model:
</p>
<pre><code class="language-python">def optimize_search(solver, word_to_freq, letter_vars):
def maximize_letter_frequency():
normalized_letter_frequency_map = make_normalized_letter_frequency_map(letter_frequency_map)
for freq_var, letter_var in zip(letter_frequency_vars, letter_vars):
for letter_index, letter in index_to_letter_map.items():
solver.add(z3.Implies(letter_var == letter_index, freq_var == normalized_letter_frequency_map[letter]))
def maximize_word_frequency():
normalized_word_frequency_map = make_normalized_word_frequency_map(word_to_freq)
for word, freq in normalized_word_frequency_map.items():
word_conjuction = z3.And([letter_vars[index] == letter_to_index_map[letter] for index, letter in enumerate(word)])
solver.add(z3.Implies(word_conjuction, word_frequency == freq))
letter_frequency_vars = [z3.Real(f"letter_{index}_frequency") for index in range(len(letter_vars))]
letter_frequency_sum = z3.Real(f"letter_frequency_sum")
word_frequency = z3.Real(f"word_frequency")
# Each letter frequency can be [0, 1.0], we divide the sum by num of letters to get a normalized sum
solver.add(letter_frequency_sum == (z3.Sum(letter_frequency_vars) / len(letter_vars)))
maximize_letter_frequency()
maximize_word_frequency()
# We weight common words a bit higher than common letters
# otherwise the solver goes for words like "eerie" and similar weirdness
solver.maximize((0.7*word_frequency) + (0.3*letter_frequency_sum))
return solver
</code></pre>
<p class="mt-4">
I highly recommend that you give modeling a problem with Z3 (or any SMT solver) a try - it's rewarding and allows you to express algorithms declaratively that otherwise become unmaintainable messes.
</p>
</article>
</div>
</div>
<script src="https://unpkg.com/[email protected]/components/prism-core.min.js"></script>
<script src="https://unpkg.com/[email protected]/plugins/autoloader/prism-autoloader.min.js"></script>
</body>
</html>