Skip the preamble and jump directly to the guides:
Git in practice (14 minutes)
How git stores a repository (10 minutes)
Version control is a systematic way of managing multiple versions of programs, of documents, of databases, and so on. Using version control is (in my mind) crucial for scientific reproducibility; it also provides a magical time machine for reverting to previous versions of code or text, which can be quite good for our sanity. It also makes working collaboratively with others – again, on code, on papers, and so on – much easier.
Git is an extremely powerful system for distributed version control. It has been overwhelmingly adopted, and as a result there are numerous guides to get you up and running (and every question you might have has almost certainly already been asked and answered on Stack Overflow). For many, this xkcd comic provides a nice summary:
Well, that certainly captures how I felt about working with git when I first started using it.
I have two main goals with this page. First, I want to briefly talk about version control with git in practice: what are the main git commands we use all of the time? What are some of the most important patterns (“workflows”) used in interacting with git, and what scientific tasks is each well suited for? Second, I want to give you the right mental model of how git stores your repository: what is actually going on when you use those basic git commands? My hope is that with this mental model you’ll be able to escape the “git as a black box” attitude I was trapped in for a while, and that when you decide you want to learn more about git you’ll already be well equipped to do so. As a side benefit, this small amount of extra information should also make you more comfortable using git in practice.
Git in practice
There are two important sets of things to learn when it comes to working with git. The first is the basic set of commands: How do you actually create a new “version” of whatever your project is? How do you go back and forth between versions or compare differences between them? How do you synchronize your changes with collaborators? I will cover the command-line version of these (I think it’s nice not to have to rely on a GUI or the git integration of your favorite editor).
The second set is a byproduct of (a) git’s “branching” model of projects and (b) the fact that git is a distributed version control system. “Distributed” here means that no copy of the project is more or less important than any other, except by convention. This has several nice features, but it also means there are many different patterns for interacting with git and using it for version control. Thus, it is helpful to adopt consistent patterns of using git. These patterns are usually called “workflows.” While there are many different possible workflows, I think there are two that are most useful for solo projects or those involving only a small number of collaborators at a given time. I’ll describe them both after covering the core commands. Read more here!
How git stores a repository
What is going on under the hood when you use all of those core commands? What does it mean that git stores your repository as a directed acyclic graph? What is going to happen when you merge two branches of your repo together?
I think it helps to know even a little bit about how git stores your project and its version history — read more here!
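To make the “directed acyclic graph” idea a little more concrete, here is a minimal Python sketch. It is a toy model, not how git is actually implemented: the commit names (A through M), the `parents` dictionary, and the helper functions are all invented for illustration, and real git identifies commits by content hashes rather than letters. The one idea it borrows is that every commit points back at its parent commit(s), and that a merge starts by finding the most recent commit the two branches have in common.

```python
from collections import deque

# Toy history (hypothetical): each commit maps to the list of its parents.
# A <- B <- C          (one branch)
#       \
#        D <- E        (a second branch that split off at B)
# M is a merge commit, so it has two parents: C and E.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
    "M": ["C", "E"],
}


def ancestors(commit, parents):
    """Return every commit reachable from `commit` by following parent links,
    including `commit` itself."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents[current])
    return seen


def merge_bases(a, b, parents):
    """Return the common ancestors of `a` and `b` that are not themselves
    ancestors of another common ancestor (the 'most recent' shared commits)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    best = [
        c for c in common
        if not any(other != c and c in ancestors(other, parents) for other in common)
    ]
    return sorted(best)


if __name__ == "__main__":
    print(merge_bases("C", "E", parents))   # the two branches split at B
    print(sorted(ancestors("M", parents)))  # the merge commit can reach everything
```

In this toy history the two branches share A and B, and B is the more recent of the two, so it is the natural starting point for combining the changes made in C with those made in D and E. Git has a real command in the same spirit, `git merge-base`, though its implementation is considerably more sophisticated than this sketch.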
Learn more
As mentioned above, there are many resources out there to help you learn more about git. If you want more introductory material, I recommend Software Carpentry’s Git Novice tutorial. You could also check out some slides I made, which cover the basics of working locally and working remotely.
But if you really want to dive in, I think the Pro Git book is the place to both start and, eventually, become an expert in this material.