### Cifras y Letras

Posted on . Updated on .

There’s a Spanish TV show called Cifras y Letras. Literally, Figures and Letters. In it, two contestants play several games. The one who has more points at the end, wins. The two main types of games are the Figures and Letters rounds. In each show, they play 4 rounds of Figures and 8 rounds of Letters. I will be talking about the Figures game.

In it, the contestants are given 6 random numbers from the set \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 75, 100}. Then, they are asked to combine them arithmetically, using only natural numbers and exact operations, in order to find another random number from the range [100, 999]. One example: given the numbers (5, 5, 6, 25, 9, 7), obtain the number 881. Only sums, substractions, multiplications and divisions are allowed. One possible answer is ((7 * 5 * 26) + 6). Or, step by step:

• 7 * 5 = 35

• 35 * 25 = 875

• 875 + 6 = 881

Some days ago I coded two programs that solve this problem. One of them in Python, and I made an almost literal translation of the code to C++. The C++ version takes up more space, obviously, because the syntax is not as flexible, and data structures such as lists are not handled natively, but implemented in the standard library. However, in my own tests it runs more than 10 times faster than the Python version. Both programs are public domain code and I can mail them on request. Don’t hesitate to ask for any of the two, especially if you can’t get the C++ tarball.

Internally, I represented the problem as a graph. Starting from a root node, I generate partial solutions as childs of that node, recursively. It’s not a tree, however. Each graph node holds three important pieces of data. First, the list of operations needed to reach that node from the root node. Second, the number that was introduced when generating that node. Third, a list of remaining numbers to be operated, in which the introduced number is present.

To generate the list of children from a given node, I essentially generate all the possible pairs of numbers operated in every possible way, discarding invalid operations. Each child node has a new list of operations identical to the one from the parent node, plus one more operation. The result of that operation is the new number introduced in the child node, and the new list of numbers stems from the one of the parent, in which the pair of numbers operated is replaced with the result of the operation.

This graph is explored recursively and fully applying backtracking in order to find the shortest solution. Two small optimizations are used in both versions, and one small drawback is present in the C++ version. First, the list of numbers are always kept sorted in descending order, because this eases forming valid operations. Second, when we visit a node with a given list of numbers, that list is added to a set, so that if the same list is generated twice, it’s skipped. The drawback present in the C++ version is the way of storing those lists of numbers. The Python version benefits from having a native hash table implementation. By using number lists (or touples, to be precise) as the hash table key, and boolean values, you can easily and efficiently represent a set. However, the C++ standard library doesn’t have a hash table data structure as is. Maps and sets are specifically asked to preserve order, so they internally use some kind of binary tree and their operations are O(log(n)). The new technical revision 1 (or TR1) for the C++ language features a standard container called unordered_set that would be very useful here to store all those lists and ask if one of them is already present. I will update the program code as soon as the revision becomes an official standard, and I have a compiler that implements it.

In any case, I’ve been using the C++ version while watching the show and most problems are solved in under a second. In the show, the contestants are given around 30 seconds to solve each problem. The program sometimes gives very short but "artificial" solutions to the problems. For example, if you are given numbers (100, 6, 3, 3, 1, 1) and asked to find number 606, you will probably do 100 * 6) + (3 + 3. However, the program, by choosing the shortest solution, outputs ((100 + 1) * 6). This is not a very good example, since someone may have done it the second way, but sometimes the solutions given by the program, while valid, are really weird. And I mean really weird.

If I wanted the program to give "human" solutions I would either output all possible solutions instead of only the best, or implement a new and somehow complicated criterion to choose solutions that use low and round numbers above weird solutions. But that’s not trivial to do, and not worth it in my opinion. The reason to choose the shortest solution in the first place was to avoid having solutions with unused results. Only as an example, the first solution the program finds for the last problem I mentioned could have contained (1 * 1) as one of the operations, even though it’s useless, and the resulting 1 may not even be used at all.