CMake: simple, effective and efficient


Introduction

Writing programs that support multiple platforms is sometimes complicated. The bigger a program is, the closer it sits to the system and the more libraries it uses, the harder it is to keep it multiplatform. Creating such a program requires special care when writing the code, often regardless of the programming language, and also when creating the build system for the program. Apart from the code itself, if you write it in a language that needs to be compiled, like C or C++, you must (or at least should, if you don’t want to lose your mind) somehow specify the building rules, that is, how the program needs to be compiled and linked to obtain a runnable program or usable library, or both.

When someone realized that typing the compiler commands by hand every time they wanted to rebuild a project was tiresome at best, they had to do something about it. Probably one of the first things they did was to create shell scripts to run the compiler commands, saving quite a lot of time. However, some time later they also realized that compiling code took a lot of time. With modern computers, this need has partly disappeared. Still, it takes a good number of minutes to completely compile a big project like the Linux kernel or KDE, and some programs have unusually long compile times (LyX comes to mind). For the developers of those programs it’s important to be able to compile them efficiently, because sometimes you make a small change in one of the source files and want to check that it works as expected and that you haven’t introduced an idiotic bug. You want to rebuild that source file and integrate it into the final binary, but only that. You don’t want to rebuild the whole thing, taking minutes, again, and you don’t want to type the long command lines by hand either.

Makefiles

So this was probably the starting point for "make" tools. These tools are a wonderful invention. In a file named "Makefile", placed in the directory holding the source files, you specify the rules to build your project: the names of the target files (programs, libraries and object files, among others) and, for each target, two things. First, the other files the target depends on, if any. Second, how to obtain the target from the files it depends on. For example, the target program "hello" depends on "hello.c" and is obtained by running "gcc -o hello hello.c". Targets can be generated by running any command, not only compiler calls. A second good thing about Makefiles is that you can assign values to variables and use them throughout the file. For example, you can define a variable named "COMPILER", assign it the value "gcc" and then use "$(COMPILER)" through the whole file instead of "gcc", so if you want to change the compiler, a one-line change is enough. On top of that, "make" tools, the ones that read Makefiles and run the specified commands, only regenerate a target when one of its dependencies has a more recent modification time than the target itself. So if you call "make" on a source directory in which you have made a small change to a source file, only the files that depend on it (and the ones that depend on those, recursively) will be rebuilt, saving a lot of time.
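Putting those pieces together, a minimal Makefile for the "hello" example could look like this (remember that the recipe line must start with a tab character):

    # The compiler is stored in a variable, so changing it is a one-line edit.
    COMPILER = gcc

    # "hello" depends on "hello.c" and is only rebuilt when hello.c is newer.
    hello: hello.c
	    $(COMPILER) -o hello hello.c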

This could have been the end of the chain: every time you start a project, you write the source files and a Makefile, and use the Makefile to specify how the project is to be built. But it’s not, because for multiplatform programs, even within the same family of platforms (BSD and SunOS, for example), there may be differences in the compiler, the include directories, the library directories, the linker or the underlying architecture. Enough that sometimes you need different Makefiles for each platform. I first noticed this in some college programs I had to create and test on several different platforms. They used sockets and, under SunOS, you had to specify a couple of extra linker flags for the program to link successfully. These extra linker flags were not needed on other platforms; in fact, using them on other platforms resulted in linker errors. Sometimes the differences are very small and easy to detect, and you can write a single Makefile. Some other times you really have to write different Makefiles because it’s easier, or use nested Makefiles, where the top-level Makefile is the platform-specific one defining things to be used by the rest of the Makefiles. As complicated as you want or need. Not to mention that if you want to support Windows natively, the available software is usually different and there may be no "make" tool at all.
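As an illustration, GNU make lets you handle a small difference like that in a single Makefile. The flags below are the classic SunOS socket libraries; I can’t promise they were the exact ones my college programs needed, so treat this as a hypothetical sketch:

    # Detect the platform at build time (GNU make syntax).
    OS := $(shell uname -s)

    # On SunOS, the socket functions live in separate libraries.
    ifeq ($(OS),SunOS)
    EXTRA_LIBS = -lsocket -lnsl
    endif

    sockprog: sockprog.c
	    gcc -o sockprog sockprog.c $(EXTRA_LIBS)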

automake and autoconf

automake and autoconf are tools that help you create a build system for your software package, trying to overcome some of the problems mentioned above. They are very popular in the world of free and open source software. When you download a source package and the first step to build it is to run a shell script named "configure", it’s probably using automake and autoconf. These tools, using template files and shell scripts, are able to detect the specific details of your platform and generate, from a template, a Makefile suited to your particular system. After running the "configure" script, a Makefile is generated, and you can then proceed to run "make" and use the generated Makefile to build the project. This is quite impressive: they support a lot of platforms and are used by thousands of projects. Their age is also a guarantee of stability and correctness.
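For reference, the typical build sequence for such a package is the familiar three-step dance:

    ./configure     # detect platform details and generate the Makefile
    make            # build the project using the generated Makefile
    make install    # install the build results (usually as root)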

However, they do have a number of detractors. While I learned about their existence many years ago, I never really dug into them. For my small projects, Makefiles were everything I needed, and I had my doubts about learning automake and autoconf, because critics always mentioned how hard they are to understand. It is a running joke that only a handful of people really understand automake and autoconf, and everyone else just copies from each other when they need to do something. Back in reality, there were many people who genuinely understood automake and autoconf, and some of them were not very kind when they talked about these tools. The KDE developers, while planning KDE4, decided that with these tools even something like adding a new source file was too hard, given the size of the project. automake and autoconf didn’t scale very well to projects of that size, according to them. Apart from that, a number of notable projects in fact avoid using them. vsftpd, the FTP daemon, and Postfix, the mail transfer agent, come to mind. If I recall correctly (I may be wrong at this moment), Postfix uses a set of Makefiles, one for each platform it supports, and vsftpd uses a quite clever mechanism to detect what it needs using some relatively simple shell scripts. I recommend having a look at the vsftpd build system because it’s very interesting. Given the source code quality of vsftpd and the cleverness of its build scripts, I get the impression that the programmer behind it is experienced, and that its build scripts are a solution the author has probably been simplifying and fine-tuning over the years. They are very insightful.

Apart from the problems just mentioned (hard to understand, do not scale very well), autoconf and automake also have some problems related to their architecture. All the templates and shell scripts you have to include in your project to use these tools add considerable weight to the final source tarball, and they are a bit slow. To overcome these problems, several alternatives have been created, and one that looks likely to stick is CMake, if only because it’s being used by the KDE project for KDE4 and is being adopted by more people as a result.

CMake

For the reasons already mentioned, I learned CMake directly instead of going the autoconf and automake route. I don’t know how complicated they are, but CMake is simple. I had no problems understanding it. Instead of writing a Makefile, you create a file named CMakeLists.txt. While the syntax of this file has nothing in common with a Makefile, its purpose is very similar, it’s very easy to understand, and small projects will have very simple build files. I first noticed this simplicity when QtCurve, a theme for Qt4, Qt3 and GTK2, changed its build system to CMake. The tarball weighed a few hundred kibibytes before and, when it switched to CMake, it started to weigh under one hundred kibibytes. If the project is much bigger, the proportional impact on the tarball size will be much smaller, of course.
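To give an idea of how simple the build files can be, here is a minimal sketch of a CMakeLists.txt for the "hello" program from the Makefile example above (the project name is mine):

    # Version of CMake this file was written for.
    cmake_minimum_required(VERSION 2.6)

    project(hello C)

    # Build the "hello" executable from hello.c.
    add_executable(hello hello.c)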

CMake is also significantly faster than autoconf and automake. This is because CMake is a native, compiled tool that directly interprets the build files, like "make" does with Makefiles. This contrasts with the "configure" shell script. In big projects, the build time can be significantly reduced if you switch to CMake, as the KDE developers reported. One last difference I particularly like about CMake is that it encourages keeping the source files and the build environment in different places. After running "configure" from a source directory, you end up with a lot of generated files, and you then run "make" to build the project. This generates more files, and you usually have to type "make distclean" so the build files and generated files are completely deleted and you end up with the original source directory again. CMake avoids this by recommending the use of build directories. You create an empty directory and instruct CMake to configure, there, the project whose source files live in some other directory. CMake then uses that directory to generate its many files and a Makefile, and to store the build results, leaving the source directory intact. This is handy for keeping several builds with different compiler flags around at the same time, among other things.
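In practice, an out-of-source build looks like this (the directory names here are just an example):

    mkdir build
    cd build
    cmake ../hello-src    # configure the project whose sources live elsewhere
    make                  # build here; the source directory stays intact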

CMake also takes a generally simpler approach to the problem. You create the source files and CMakeLists.txt somewhere, and that’s all. That’s what you distribute. You don’t need to run anything to generate a "configure" script or update some metadata files. You distribute the source files and CMake, on the user’s machine, will run and use CMakeLists.txt to produce the needed Makefile. This is another difference from autoconf and automake, which try to be at least a bit self-contained. In other words, their source tarballs try to contain everything, or almost everything, you need to build the project, except the "make" tool and the compilers. With CMake, the destination machine needs to have CMake installed to parse CMakeLists.txt. While this can be considered a problem, it solves more problems than it creates, because CMake does not need to run as many tests when configuring a project. When you install CMake on your system, it can perform the common checks and gather information about your system at that moment, and it doesn’t have to rerun those tests again and again for every project from a "configure" script. This is a second factor in CMake being faster. So give CMake a try for your C and C++ projects. I’m sure you’ll find it’s a fine tool. Did I mention it works on Windows too, out of the box, and has an optional GUI/curses interface to easily configure the project or change the project configuration?
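The curses interface, for the record, is the "ccmake" tool, and it is invoked just like "cmake" itself:

    mkdir build
    cd build
    ccmake ../hello-src   # browse and edit the project configuration interactively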
