Geek Blight - Mercurial vs Git

Mercurial vs Git

Posted on 2009-04-07T13:50Z. Updated on 2020-07-25T19:13Z.

Edit from 2009-04-10: as time passes and I receive feedback, this article is being refined and modified subtly to remove typos, make concepts more clear, or clarify when something is not really needed. Special thanks to Martin Geisler for pointing out my mistakes related to Mercurial.

There are many blog posts and articles all over the Internet providing comparisons between Git and Mercurial. Most of them only briefly describe the main differences and then try to decide which one is better. However, I didn’t find many articles explaining the differences in detail from a neutral point of view and that’s what I’ll try to do here, also providing links to relevant documentation. For simple uses like a single user managing a private project, Git and Mercurial are equivalent. Their workflow differs a little due to the underlying differences, but their usage doesn’t seem to be very far apart. However, those differences start being noticeable when you collaborate with more users and, the more complex the project is, the more you will notice them. Hopefully, this information could be useful to Mercurial users wanting to know how Git works and vice versa, as well as novice users who are not using either one yet. In that case, I suggest you to experiment a little bit with both instead of trying to make a theorethical decision based on what you read in the documentation or in articles like this one. I will focus on four main aspects.

The repository structure, that is, how each one of them record changes and history.
The noticeable differences in how they manage the branching process.
Documentation.
Their two popular hosting sites, GitHub and Bitbucket.

Repository structure

Mercurial and Git differ a lot in the way they store changes and history. Some of those differences are irrelevant from the user’s perspective while others are not. I won’t provide a very verbose explanation of either one. The specific details are well covered in chapter 3 of "Mercurial: The Definitive Guide" for Mercurial and the "Git Object Model" chapter in the Git Community Book. A very brief description of each one: Git does not store differences between different versions of the same file. For each new version of a file, Git stores a copy of that version. Mercurial, on the other hand, stores file differences for a limited number of times and a new full copy from time to time in order to optimize the time needed to reconstruct a particular revision. Also, it uses a binary comparison algorithm that works for both binary or text files. Knowing that, I think it’s possible to read any of those book chapters and understand the details. If you read both chapters, you will notice that the model in Git looks more simple, while the way Mercurial stores changes is not trivial. A consequence of this underlying model is that Git repositories, without repacking, tend to use more disk space than Mercurial repositories. Git can pack and compress objects to change this, but this means that you need to run a command from time to time. Also, this process can take quite a lot of time depending on the size of the repository and the amount of unpacked objects. However, its model is probably easier to understand once you have access to a proper explanation.

They also share similarities. In both of them, history is represented as a sequence of commits, each commit being identified by a character sequence that turns out to be the SHA1 sum of, well, something. Each commit has one or two parent commits, allowing for different branches to exist. This connection is what creates the concept of history in the repository (things were in a given state, then a change happened from that point and we ended up with this new state). The exact wording people use to describe this is "history is a directed acyclic graph". However, many people won’t know what exactly is a directed acyclic graph, and things can only get worse when it’s called DAG. I’m not sure the Git or Mercurial creators thought "I’m going to make the project history be a directed acyclic graph". Being a DAG is probably more of a consequence than a starting point in the design of either tool. Knowing it’s a DAG probably won’t help you understand anything really.

Differences in branching

In both Git and Mercurial, a given state (using Git names, a commit; using Mercurial names, a changeset) can be the parent for more than one other state. This means that the project history has the notion of branches in the sense that history can diverge at some point. Using an ASCII diagram:

        C ···
       /
A --- B
       \
        D ···

State B can be the starting point for states C and D. C and D would, somehow, indicate that the parent state is state B. In both Git and Mercurial, too, a state can have more than one parent. This means that it is possible to join branches that were once separated:

··· V --- W
           \
            Z ···
           /
··· X --- Y

Z would somehow indicate that its parent states would be W and Y.

And despite sharing all that in common, the way they export this functionality to the user is quite different, and workflows differ a lot in how you are expected to proceed in different situations. The most common expression you will hear about this is "in Git, heads or branches are explicit, while in Mercurial they are implicit". But what does this really mean? I read the previous expression in several places and didn’t fully understand what it meant until I read a very useful document that appeared in Hacker News. It’s "Understanding Git Conceptually". This tutorial or explanation to Git fails to explain properly the object model described in the Git book chapter I mentioned above, but the Git book lacks, as of the time I’m writing this, an explanation on Git branching as good was the one described in this tutorial. I recommend you to read it at some point after having read the chapter, but not too late. It is better to understand that from the beginning.

Briefly, Mercurial doesn’t have the notion of a branch or head as something by itself (I know I’m lying, but read until the end). In Mercurial, a branch is present because history diverged at some point, and a head is a history state that has no children. A head is not explicitly marked as a head, and there is no data structure called a branch. It’s implicit by looking at the project history, more or less.

However, in Git, this information is explicit. There is a type of data structure present in the repository called a "head". As the tutorial mentioned above explains, a head is like an arrow pointing to a specific commit or state in the history of the project, giving it a name. It’s possible to have several heads in one repository, and one of them is always the currently active one you’re working on. When you commit changes, the current active head points to the commit that will be the parent of the new one you are creating. After it has been created, the arrow is moved "forward" to point to the new commit. Very important too, these arrows always have a name, be it "master" or "experimental" or "testingsomething" or "whoohoo". Finally, these arrows or heads are created or destroyed as needed. When you branch, you create a new one. When you merge, you no longer need one of the arrows, but you can keep it if you want to.

Going back to Mercurial, you don’t need to do anything special to create a new branch. If someone (or even yourself) starts working on the repository from a point in time, and different commits are created starting at that point, two anonymous branches will automagically appear implicitly. At this point they reside in two different repositories. If you then pull changes from one of them to the other, the two anonymous branches will also automagically (implicitly) appear in the same repository. At that point, Mercurial suggests you to merge both to continue committing changes. You can still force Mercurial to go to any point in history and create a commit there. The commit can create a new branch or extend an existing one. However, working this way can be a bit confusing unless you use tags, named branches or a GUI to see the project history graphically.

In Git, however, you cannot do that. One of the branches is going to need a new name, like "work_on_feature_X". Actually, you can create anonymous branches like in Mercurial but it’s not the common practice and it would produce an error in old versions. So, for the rest of this document, we will suppose it still produces the error. Why? On the one hand, you worked on the default branch and moved the head forward. On the other hand, someone did the same in parallel and ended up with the head somewhere else. If you then try to pull changes, the head would either have to point to your last commit, or to their last commit. That’s why you need to give names to heads (or branches, at this point you can guess both terms are nearly equivalent). The fact that you can (and probably should) remove one of the heads after merging work means that your branch name doesn’t really have to be unique or special. Also, in Git you can pull changes without being forced or suggested to merge. After all, one of the heads is the active one and the fact that you have another head with another name pointing to another branch does not prevent you from committing more changes to your active head and moving it forward. You can merge whenever you want to and change the active head at will. The usual practice in Git to have named branches for everything makes it a bit easier to jump to any head without confusion and getting an idea of what you were doing in that branch.

While I still have to mention Mercurial’s named branches, these differences already translate to different workflows for both Git and Mercurial.

In Mercurial, branching is simpler, as you don’t need to do anything special. You simply clone the repository and start working. When you’re done, you (or someone else) pulls-and-merges. This is also needed when you want to branch your own repository. Let’s suppose you have a repository which is the official branch and want to start working on an experimental feature just like other developer, in its own branch so as not to mess up your main copy. To do that, you clone the repository to some other directory in your hard drive. In addition, as pulls and merges are a bit tied, you should keep different branches in different repository clones until you’re ready to merge.

In Git, the process is not so simple but it is more flexible. When working on a feature, you should create a new named branch explicitly. This is specially important if you’re a Mercurial user. So if you want to contribute something to a project, you clone the repository but do not immediately start working. First, you should create a branch for your work. The advantage is that you are likely to work with the same directory all the time. Also, the explicit heads allow you to continue working after a pull, before merging, and its named heads help you track what you were doing. In Mercurial, this is usually achieved by giving useful names to your repository clones. If your main repository sits at directory "foo", you will usually clone that repository to another one called "foo-new-feature" to know what you were working on in that branch.

Despite everything I said above about Mercurial branches being anonymous, in Mercurial there are also named branches. However, they are a different concept. In Mercurial, the name of the branch is stored with each commit. This means that branches cannot be deleted as they are in Git (in Git, it was simply deleting the arrow). You can create new named branches and commit changes to them, and can merge work from one branch to another like you merge anonymous branches. However, the inability to delete a named branch means that they should have unique names, and that they should be used for a different purpose. My impression is that, in Git, there is no difference between a short-term branch (a branch created to implement a new feature or fix a bug) and a long-term branch (like creating a branch for a stable release and only put commits in that branch to fix security issues or annoying bugs). In Mercurial, anonymous branches are used for the first case and named branches are used for the second case.

Documentation

The documentation quality varies from one project to another. Mercurial has always had its documentation well organized. Its manpages are good and describe the features in a comprehensible way. Some time ago, and still in some aspects, the documentation of Git was not very good. It has improved a lot, however, and many unofficial books and tutorials have appeared to fill the gaps in the official documentation, the most notable example being the community book I mentioned above. Still, as you can see, its documentation is still not fully unified. For example, for an aspect as important as branching, I would have expected the book to have a clear and comprehensive explanation such as the one present in the tutorial "Understanding Git Conceptually", also mentioned above. However, the book doesn’t have one, and the tutorial doesn’t have the beautiful and easy graphical explanation of the object model as the one in the book. So, to fully understand Git when you’re still a novice user, I think it’s still partly true that you cannot read one single document. You need to get information from various sources.

This changes when you already know the principles of Git and are looking for specific information about a command. The manpages of Git are very well written and provide lots of clear examples, even with ASCII schemes like the ones I used above.

github and bitbucket

GitHub and Bitbucket are two sites that provide hosting for projects using Git or Mercurial, respectively, to manage source code. They both have nonfree plans that give projects additional hosting services or higher limits in several aspects, but they both have free plans so any developer can host projects in it. GitHub appeared first and is probably responsible for a good amount of Git’s popularity. It provided an easy way to make distributed development. Any user can register on the site and host their own repositories, fork existing ones and communicate with other projects easily by requesting pulls, etc. Bitbucket appeared later as a clone of GitHub for people prefering Mercurial to Git. Nowadays, they differ a bit in some aspects but they allow you to do exactly the same: upload a repository and give you a project wiki you can use to write news, documentation, FAQs or whatever you want. Basically, your project can live by itself being hosted completely in GitHub or Bitbucket, both the source code and its documentation.

Despite sharing a lot of things in common, there are things you can miss in GitHub from Bitbucket and vice versa. Some examples:

GitHub has statistics.
GitHub is more popular.
Bitbucket has an integrated issue tracker.
Bitbucket’s wiki is versioned (you can work on it from your local hard drive and then push changes).

Conclusion

I hope this comparison is easy enough to understand for novice users to both programs, and I hope to have been specific enough for experts in one of the two programs wanting to use or know the other one. I would say that both Git and Mercurial seem to be converging over time in many of the additional features, while keeping the base a bit different. For example, some months ago I checked Git and it didn’t have a feature I loved from Mercurial: the ability to create a bundle (a file containing a group of commits/changesets) that could be sent by email or using a USB stick. Nowadays, it can do that too. From Git, I loved the ability to put your local changes aside (git-stash operation) to do something and being able to select specific difference chunks in files to be committed (you don’t need to commit all the changes inside a single file). Nowadays, Mercurial can do that too. And I suppose the same goes for their respective hosting sites GitHub and Bitbucket. I feel both will converge to provide the same functionality.

I have not talked about the Git staging area intentionally, leaving it for the end. It is well described in most tutorials and documents you can read about it, but let’s only mention that Git introduced a new concept that other tools lack (for good or for bad), which is the staging area. In Git, when you want to commit something you have to "prepare" the commit first, indicating which content you want to commit. This is is performed with the "add" command. With it, you will take snapshots of the current state of files and record them in an area ready to be committed. This provides a lot of flexibility when creating commits and performing some operations, but it also has its dangers. For example, if you modify a file, prepare it to be committed with "git add file" and then modify the file again to make a minor correction, you need to remember to add it again, or you will commit the file as it was before the correction. The commit operation has the "-a" option to minimize the probability of making this mistake.

Another important comment I need to make before closing this article is that Git has a lot of operations to manipulate the project’s history (e.g. git-rebase), while Mercurial does not. In Mercurial, the concept of history is a bit more immutable. Some people will like the immutability of Mercurial, while some others will prefer to be able to mutate history like Git allows you to do.

The only opinion I’m going to give over all of this is that I think both Git and Mercurial are great.

Some time later…

Events that happened, or things I found out, after writing the article above.

Mercurial has a rebase extension.
GitHub got an in-site issue tracker.
Google Code added support for Mercurial (but not Git).
The Pro Git book was published.
A guide to branching in Mercurial was written by Steve Losh.

Load comments