Git is a mysterious beast. Out of the top 30 most voted questions on Stack Overflow, 10 of them are about git.
So the obvious question is “What is git?”. I’m not looking for an answer like “it’s a distributed version control system for tracking changes in source code”. That is already well covered on its Wikipedia page.
No. What I’m asking is what is git, really? It can’t be that easy if there are so many people baffled by it. Granted, if you are a developer of any kind, you are 99% likely to be using git these days to store your code and collaborate with others. However, most people don’t go past a couple of sub-commands: “clone, commit, push, and pull”. And for most of my career, I stuck with those myself. The fun starts when you start cherry-picking and rewriting history because then you really need to understand what git is, and how to use it.
One thing that confused me for a long time is why they put “distributed” in the Wikipedia definition. It turns out that it comes from a comparison with the previous generation of version-control systems like SVN or CVS, where there was one central repository of code and everybody worked off of that. A bit like a dictatorship.
If you’re anything like me, by now you’re thinking, but wait, git is exactly the same, all of us on my team push and pull code from GitHub, GitLab, or Bitbucket! And you’re entirely correct. Although git was created as a distributed system, most of us use it connected to a centralized repository. This is for convenience’s sake: everyone has a full local copy, but the central repository makes sure the code is always available to the whole team and that anyone can fetch the latest and greatest from one place. You can use git to exchange code just between you and your colleague without requiring a code repository service, but it isn’t very convenient. So I think of git more like a democracy, in which we use it as we want to use it.
Other parts of git also make it work better in a distributed environment. For example, branching (working on a changed version of the code) is a very light activity, because only a reference is created instead of copying all the files to a new location to be edited.
There are dozens of ways to use git. The best known is GitFlow, with Trunk-Based Development as its counterpoint.
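To make the “only a reference is created” part concrete, here is a toy Python sketch. It is not git’s real storage format, and the names commits, branches, and create_branch are made up for illustration. It only shows the idea that a branch is a name pointing at a commit, so creating one copies a pointer and nothing else.

```python
# Toy model (an assumption for illustration, not git's actual internals):
# commits live in one shared store, and a branch is just a name -> commit id.
commits = {
    "c1": {"parent": None, "message": "initial commit"},
    "c2": {"parent": "c1", "message": "add README"},
}
branches = {"main": "c2"}


def create_branch(name, start_from):
    """Create a branch by copying a single pointer; no commit data is duplicated."""
    branches[name] = branches[start_from]


commits_before = len(commits)
create_branch("feature", "main")

print(branches)                        # {'main': 'c2', 'feature': 'c2'}
print(len(commits) == commits_before)  # True: the commit store did not grow
```

Both names now point at the same commit. They only start to differ once someone adds a new commit on one of them.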
Consider git as a tree of changes from the ground up (let’s ignore the tree’s actual roots for simplicity). You have your root of the system which is the base of the trunk of the tree. That trunk grows with some changes made to it and then it starts splitting into branches (much like when you are alone in a project and then more people join the project).
One way to approach the challenge of working on the same piece of code is to have feature branches (or colleague branches if you will). People will make changes to their branch and eventually they “merge” the branch back into the trunk. Ok, ok, the metaphor doesn’t work so well. Branches don’t usually reunite with trees after a while. Just assume that it would happen in a hypothetical world.
What is merging? It’s joining changes from the trunk and the branch all into the trunk.
Most of the time this is fine because the changes in the trunk are in different places from the changes on the branch. Now and then a change on the trunk and a change on the branch collide because they are trying different things on the same part of the tree. In that case, you need to choose which change wins (or combine the two by hand) to resolve the merge conflict.
When you move to another state or country and live there for a while, you will typically grow apart from your old acquaintances back home, right? People grow in different directions. The same happens with feature branches. The longer they stay separated from the trunk (i.e. not merged), the more different they become and the harder it gets to merge them back in when they come home.
So another way to use git is to do Trunk-Based Development. The purest version of this idea is that people should not use branches at all. Instead, they should all send their changes directly to the trunk. This has the advantage that there will hardly be any merge conflicts, but it has some disadvantages as well, like introducing bugs into the main line of development and breaking the trunk, which is used to build and deploy the software we’re making. You can think of it like calling your mom (or colleague) every day to check in. If you do that, it’s less likely that you will drift apart. This alone is a regular source of great philosophical (borderline religious) debates.
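Here is a rough Python sketch of that idea, under heavy simplifying assumptions: each version of the code is a dict from a location (think line number) to its content, and three_way_merge is a hypothetical helper, not how git actually implements merging. It compares the trunk and the branch against their common starting point and reports a conflict only when both changed the same location in different ways.

```python
def three_way_merge(base, trunk, branch):
    """Merge two sets of changes made on top of a common base.

    Each argument maps a location (say, a line number) to its content.
    Returns (merged, conflicts); conflicting locations are left out of
    merged so that a human can decide which change wins.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(trunk) | set(branch)):
        b, t, f = base.get(key), trunk.get(key), branch.get(key)
        if t == f:
            value = t              # both sides agree (or neither changed it)
        elif t == b:
            value = f              # only the branch changed this location
        elif f == b:
            value = t              # only the trunk changed this location
        else:
            conflicts.append(key)  # both changed it, in different ways
            continue
        if value is not None:      # None means the location was deleted
            merged[key] = value
    return merged, conflicts


base = {1: "title", 2: "intro", 3: "outro"}

# Changes in separate locations merge cleanly.
clean, clean_conflicts = three_way_merge(
    base,
    {1: "Title", 2: "intro", 3: "outro"},    # trunk edited location 1
    {1: "title", 2: "intro", 3: "The end"},  # branch edited location 3
)

# Changes to the same location collide.
partial, collisions = three_way_merge(
    base,
    {1: "Title", 2: "intro", 3: "outro"},
    {1: "TITLE", 2: "intro", 3: "outro"},
)

print(clean, clean_conflicts)
print(partial, collisions)
```

In the second call, location 1 ends up in the conflict list rather than in the merged result, which is the point where you would step in and pick a winner.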
But git is even more flexible than this! What if I told you that you could use git to rewrite history?
The way git works (and our trees as well) is that it grows with time and with the changes (aka commits) made to it. One nice side-effect from this structure is that you can look into how the code looked two weeks ago or who made a certain code change (aptly named git blame).
The thing is that development is not always pretty. Sometimes we are exploring a concept and go back and forth in our experiments, and sometimes we just don’t know exactly what we’re doing. If you are a historian, you will initialize a repository, make changes to it, push them to the trunk, pull anything that other people did, and a history line keeps forming. If you made a mistake, what do you do? You revert that mistake, which in practice means you make a new change that is the reverse of what you just did. Like when you use Wite-Out to cover up a mistake you made with a pen. The mistake is covered, but you can see that there was something there before.
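A small Python sketch of the historian’s approach, as a toy model only: history, commit, revert, and current_state are hypothetical names, and real git stores snapshots rather than the (old, new) pairs used here. What it shows is that reverting adds a new commit that undoes the mistake instead of deleting the mistake from the record.

```python
history = []  # list of commits, oldest first


def commit(message, changes):
    """Record a change; changes maps a file name to (old_content, new_content)."""
    history.append({"message": message, "changes": changes})


def revert(index):
    """Undo commit number `index` by appending a new commit with the inverse change."""
    original = history[index]
    inverse = {name: (new, old) for name, (old, new) in original["changes"].items()}
    commit('Revert "{}"'.format(original["message"]), inverse)


def current_state():
    """Replay every commit in order to get the current file contents."""
    state = {}
    for entry in history:
        for name, (_, new) in entry["changes"].items():
            if new is None:
                state.pop(name, None)  # the file was deleted
            else:
                state[name] = new
    return state


commit("add greeting", {"hello.txt": (None, "hi")})
commit("break greeting", {"hello.txt": ("hi", "hii")})
revert(1)  # undo the second commit

print([entry["message"] for entry in history])
print(current_state())
```

The file is back to its earlier content, but the history now has three entries, mistake included.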
Keeping every step like this has the advantage of a complete trace of what everybody did, but it can get pretty messy to read after a while. Enter rebase. Rebasing is the act of transplanting your branch onto the top of the trunk. This means that the history of the trunk will be all up to date and then your changes will be put on top of those as if you never branched at all! Much nicer to read, and a more creative way to solve the issue, but not so historically accurate. Along this line of thinking you have reset, which goes back in time; cherry-picking, which allows you to pick a specific change from another branch or the trunk; and a few others.
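To picture what rebasing and cherry-picking do to history, here is a toy Python sketch. Everything in it (make_commit, log, cherry_pick, rebase) is made up for illustration, and commit messages stand in for the actual code changes that real git would replay. The idea it shows is that a rebase copies the branch’s commits, one by one, on top of the current trunk, producing new commits and leaving the old ones behind.

```python
import itertools

_ids = itertools.count(1)


def make_commit(message, parent=None):
    """A commit is a small record that points at its parent."""
    return {"id": next(_ids), "message": message, "parent": parent}


def log(tip):
    """Return the messages from the oldest commit up to `tip`."""
    messages = []
    while tip is not None:
        messages.append(tip["message"])
        tip = tip["parent"]
    return messages[::-1]


def cherry_pick(commit, onto):
    """Copy a single commit on top of `onto`, as a brand-new commit."""
    return make_commit(commit["message"], parent=onto)


def rebase(branch_tip, fork_point, new_base):
    """Replay the commits made after `fork_point` on top of `new_base`."""
    to_replay = []
    current = branch_tip
    while current is not fork_point:
        to_replay.append(current)
        current = current["parent"]
    tip = new_base
    for old in reversed(to_replay):  # oldest first
        tip = cherry_pick(old, tip)
    return tip


a = make_commit("A")
b = make_commit("B", a)    # trunk: A - B
f1 = make_commit("F1", a)  # feature branch forked from A
f2 = make_commit("F2", f1)
c = make_commit("C", b)    # meanwhile the trunk moved on: A - B - C

new_tip = rebase(f2, fork_point=a, new_base=c)

print(log(f2))       # the original branch history
print(log(new_tip))  # the rewritten history, as if the branch started from C
```

In this model a reset would simply be pointing a branch name at an older commit, for example new_tip["parent"], which is why it feels like going back in time.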
So, git makes sense, but it’s also very confusing. Like democracy, it doesn’t tell you how to do what you do. In some cases, it will give you hints, but until you really understand it, it will be a bit magical.
I hope that this text helped you understand why git can be so confusing and a bit of how it works from a 10,000-foot view. If you’re more of a practitioner than a reader, go play this game. You’re welcome. :)