Cifras y Letras (2)

Posted on . Updated on .

In the previous post, I mentioned the numeric rounds from the Spanish TV show Cifras y Letras. I will talk about the other side of the coin today.

In the letter rounds, the contestans are given 9 letters and they need to find the longest possible word they can using those letters. The letters are not chosen in a fully random way. Each contestant, in turns, asks to be given a vowel or a consonant, until the 9 letters are completed. As there are only 5 vowels and they usually ask for 4 or 5 vowels, it’s not uncommon to have two or even three repeated vowels. If they are given two E’s, for example, they can make words with two E’s, or maybe only one, but not three. Very simple actually.

I read about a quite interesting algorithm to solve this problem as fast as possible in Pedro Reina's website. Let’s suppose you have a dictionary in a database, with all the words that are valid for the game (plurals are not accepted unless they are present explicitly in the dictionary, and the only verb forms accepted are infinitive, gerund and past participle). If you were given 9 letters, an initial solution everybody can give is to generate all the permutations for those 9 elements, skipping repeated permuations, followed by the permutations of 8 elements, etc. The number of possible permutations is as much as (9! + 8! + 7! + 6! + 5! + 4! + 3! + 2! + 1!). That is, 409113. Quite a lot of permutations that have to be checked against the dictionary database to find available words.

But Pedro Reina and others had a very good idea, preprocessing the dictionary database first. For each word with 9 or less characters, which are the only interesting ones for this game, you sort its letters to obtain and store a "key" for that word. For example, the word "three" is associated with the key "eehrt". If you do that, you’ll find that a key is usually shared between several words. These keys allow for a reverse lookup. That is, take the given 9 letters, sort them, and search for words with that key in the database. With only one search, that query will return every possible 9-letter word that can be formed with those 9 letters, if there’s any. Instead of having to generate 9! permutations, you only have to do 1 search.

If you have noticed, this allows a jump from permutations to combinations. The total number of combinations you need to form is as much as 511. Forming 511 words is much faster than forming 409113, and easier in my opinion. You sort the initial 9 letters and that’s the only 9-letter combination. By removing each one of those letters you get the as much as 9 8-letter combinations. For each of those, you try to remove one more, etc. Generating permutations is slightly more complicated.

On top of that, as they usually choose 4 or 5 vowels, the probability of having at least a repeated vowel is quite high. In 4 vowels, this probability is 80.8%, and in 5 vowels, it’s 96.16%. This means it’s very usual to have less than 511 combinations.

Update: I have coded a very small C++ program to generate the combinations. Its code is public domain and I can send it on request. It’s only 100 lines long and obviously lacks any database code, as I don’t have an electronic dictionary.

comments powered by Disqus