Writing collaborative papers with LaTeX

LaTeX is still the default when it comes to writing scientific papers, particularly those for which the typesetting of mathematics is important. In some ways I find it fun (and humbling) to think that when I’m writing papers in LaTeX I’m working with a system that predates both Linux and my decidedly old-fashioned text editor of choice — to say nothing of predating my scientific career!

There is a rich ecosystem of packages and add-ons that let you customize how your work will be typeset when using LaTeX, and there are also numerous strategies for engaging in collaborative writing. These range from emailing files back and forth, to editing a common file hosted on a shared drive, to using distributed version control systems. Not surprisingly, the combination of high configurability, multiple collaboration strategies, and inertia leads to a lot of highly opinionated views: what packages you should use, how you should manage your bibliography, what font and microtypography you should use, whether you should ever use \left and \right… One could go on for a long time like this.

When working by yourself, you should do whatever you like. If you’re writing a paper with your advisor (me?) or with a coauthor, the purpose of everything below is just to outline what I view as a reasonable set of default practices. These are broken down into a few sections: reasonable rules of thumb, a reasonable document preamble and associated packages, ways to work with a shared bibliography, and ways to work with shared writing tools like Git and Overleaf. If you think I’ve left out something important (or if you think some of my “reasonable” suggestions are anything but), let me know!

Reasonable rules of thumb

File names

On the one hand, you are probably storing your main .tex file in a folder that makes identifying it relatively straightforward. On the other hand… it’s not great to get into the habit of calling everything main.tex and bibliography.bib (or grant.pdf, or finalPaper.doc, or…). So go ahead: give your files meaningful names. Future you will thank you.

Directory structure

There are a handful of items that are nice to keep together with your paper, and I suggest a simple structure for keeping things tidy. The root folder contains the .tex and .bib files, along with perhaps a .tex file for all of the submission materials you’ll need to keep track of (cover letter text, suggested referees, response to referees, and so on). A figures/ subdirectory contains all of the figures that get included in the .tex file. So far, so good.

bandedPhasesInTopologicalFlocks/
|-- topologicalFlocking.pdf
|-- topologicalFlocking.tex
|-- topologicalFlockingBibliography.bib
|-- submissionMaterial.tex
|-- figures/
|   |-- coarseGrainedFieldSnapshot.png
|   |-- correlationLengthScaling.png
|   |-- ...
|-- data/
|   |-- dataForFieldSnapshots.nc
|   |-- dataForXiPlots.csv
|   |-- ...
|-- scripts/
|   |-- generateFigures.nb
|-- deprecated/

Next, a data/ subdirectory contains the data you need to generate the figures in the paper. This does not (usually) mean all of the raw, unprocessed data; instead, I think it is helpful to have files containing just the data points you will manipulate and plot to make the figures. A separate but related scripts/ folder contains the code (perhaps a Mathematica notebook, or some Python files) that turns these files into the actual figures that go into your paper. The important point is that it should be very easy to iterate both on the details of your figures (the aesthetics, the font sizes, the axis labels) and on some fundamentals of their presentation (Are they on a linear or log scale? What is the range of the x-axis?). The structure suggested here decouples making plots from the (often slow) process of running analysis scripts on the raw data to produce the compressed representation of it that you actually care about.

Finally, I often have a deprecated/ folder. Yes, everything will be under version control, so in principle it is straightforward to go back to previous versions of a figure, or to previous drafts of a paper so that you can run latexdiff. In practice, it is sometimes nice to also just put older versions of things you might want to bring back in a convenient place, in a way that doesn’t assume your collaborators like using git to move back and forth between tagged versions of a project. With all of that, I often end up with a paper directory that looks something like the one displayed above.

One-sentence-per-line

It might take some getting used to, but I advocate a general “one-sentence-per-line” approach to writing TeX documents. This style (in which, as you will probably infer, you add a line break after every sentence and a blank line between paragraphs) probably arose from version control systems. In many of these systems the default way of looking at differences between two versions is a per-line view: under such a system, a change to a single word in the middle of a paragraph results in the entire paragraph being marked as “changed.” I think that is still a reasonable motivation: git diff defaults to this kind of line-by-line view of changes, and it is more helpful to have something a bit more fine-grained. There are also arguments to be made that one-sentence-per-line writing (and the related concept of “semantic linefeeds”) helps you visualize and think about your writing in a different way.

What does this look like? Rather than writing this:

Algebra is the offer made by the devil to the mathematician. The devil says: "I will give you this powerful machine, it will answer any question you like. All you need to do is give me your soul: give up geometry and you will have this marvellous machine." (Nowadays you can think of it as a computer!) Of course we like to have things both ways; we would probably cheat on the devil, pretend we are selling our soul, and not give it away. Nevertheless, the danger to our soul is there, because when you pass over into algebraic calculation, essentially you stop thinking; you stop thinking geometrically, you stop thinking about the meaning.

You would write this (here I’m using the semantic-linefeed style rather than strict sentence-per-line, only because so few characters fit on each line of these sample code boxes):

Algebra is the offer made by the devil to the mathematician.
The devil says:
"I will give you this powerful machine,
it will answer any question you like.
All you need to do is give me your soul:
give up geometry and you will have this marvellous machine."
(Nowadays you can think of it as a computer!)
Of course we like to have things both ways; 
we would probably cheat on the devil,
pretend we are selling our soul,
and not give it away.
Nevertheless, the danger to our soul is there,
because when you pass over into algebraic calculation,
essentially you stop thinking;
you stop thinking geometrically,
you stop thinking about the meaning.

Certainly LaTeX doesn’t care whether or not you have these extra line breaks after your sentences; it will typeset everything just as well. Ultimately, I think this style is helpful (and definitely convenient if anyone you are collaborating with uses non-mouse-based text editors). In addition to making git diffs easier to read by default, it gives some nice visual feedback about the rhythm of your writing. But I’m not a stickler for it; I just think it’s a very reasonable practice.

By the way: I mentioned above that git diff defaults to whole-line view of changes, but that is just the default behavior. If you find yourself working on a paper where for whatever reason people do not want to adopt a one-sentence-per-line formatting, you can get more useful git diffs by adding some options. For instance:

$ git diff --color-words HEAD~1

will highlight changed words within a line rather than highlighting the entire line as being changed (here I’m just using the “HEAD~1” shorthand to refer to the commit before the current one — feel free to substitute whatever commit you want instead).

Meaningful labeling

LaTeX makes it easy to reference figures, equations, and parts of your document with logical labels — what, then, would be the point of labeling things like \label{figure1} or \label{coolPartOfThePaper}? Be reasonable, and choose instead things like \label{sec:methods} and \label{fig:divergingCorrelationLength} and \label{eq:NewtonsFirstLaw}.
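As a small sketch of how such labels read in practice (the section, equation, and caption are placeholders, and the image name is borrowed from the example directory above, assuming \graphicspath points at figures/), each label carries a prefix saying what kind of object it names. Note that a figure's \label must come after its \caption.

```latex
\section{Methods}\label{sec:methods}

\begin{equation}
  \mathbf{F} = 0 \quad\Rightarrow\quad \dot{\mathbf{v}} = 0
  \label{eq:NewtonsFirstLaw}
\end{equation}

\begin{figure}
  \includegraphics[width=\linewidth]{correlationLengthScaling.png}
  \caption{Placeholder caption.}
  \label{fig:divergingCorrelationLength} % after \caption, not before
\end{figure}

% Elsewhere in the text (\eqref comes from amsmath):
Section~\ref{sec:methods}, Eq.~\eqref{eq:NewtonsFirstLaw}, and
Fig.~\ref{fig:divergingCorrelationLength} are all referred to by name.
```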

Linewidth figures

I think figures should be aesthetically reasonable (with points and curves that are legible both in color and in grayscale), should have legible font sizes (yes, even for the axis labels of an inset), and should have their size specified as a fraction of the linewidth. Making figures that are all exactly the right size, with fonts that match up exactly, can be a bit of a fiddly business, but I recommend doing your best to make the actual figure size (i.e., the size of the figure file) the same as what you want the printed size to be. That way, when LaTeX scales the image to the requested fraction of the linewidth, the scale factor is close to one and the fonts come out at the size you chose.
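A minimal sketch of what that looks like in a two-column RevTeX document, with file names taken from the illustrative directory listing and hypothetical labels: a single-column figure uses a fraction of \linewidth, while the starred figure* environment spans both columns, where \linewidth becomes the full text width.

```latex
% Single-column figure: width given as a fraction of the column width.
\begin{figure}
  \includegraphics[width=0.95\linewidth]{correlationLengthScaling.png}
  \caption{Caption text.}
  \label{fig:correlationLengthScaling}
\end{figure}

% Figure spanning both columns; inside figure*, \linewidth is the full text width.
\begin{figure*}
  \includegraphics[width=\linewidth]{coarseGrainedFieldSnapshot.png}
  \caption{Caption text.}
  \label{fig:fieldSnapshot}
\end{figure*}

% Temporarily typesetting this in the body prints the column width in points,
% which is a convenient target width when exporting the figure file:
The column is \the\linewidth\ wide.
```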

Adding comments

Overleaf (and other solutions, but on this page I want to focus on the combination of Overleaf and git) provides some nice commenting and document review features. I tend to think that these are most useful when you are going over the final pre-submission versions of a paper, and want to iterate back and forth over a small number of points. For the remainder of the writing process (when many of your collaborators might not ever be looking at the Overleaf page — see my comments on git synchronization below), I recommend defining new commands in your document preamble that show comments in a different color (or color and font; whatever). By defining such a “comment from author X” command you make it easy to search the document for any instances of it (which I thus view as an improvement over just commenting out your comments with a percent sign).

The basic version of this just uses the color or xcolor package, like so:

\newcommand{\daniel}[1]{{\color{blue} #1}}

If you prefer, the slightly fancier todonotes package can be quite useful, too:

\newcommand{\dms}[1]{\todo[size=\tiny,color=white,textcolor=blue,bordercolor=blue,linecolor=blue]{[DMS]: #1}}
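Because every comment goes through a macro, it is also easy to hide them all at once, for example when preparing the submitted version. A minimal sketch of a variant of the \daniel command, assuming a hypothetical switch called \ifshowcomments and a hypothetical second coauthor command \jane:

```latex
\usepackage{xcolor}

% Hypothetical switch: change to \showcommentsfalse before submission.
\newif\ifshowcomments
\showcommentstrue

\ifshowcomments
  \newcommand{\daniel}[1]{{\color{blue} #1}}
  \newcommand{\jane}[1]{{\color{red} [JD]: #1}} % hypothetical coauthor
\else
  % Comments expand to nothing: they vanish from the PDF,
  % but stay in the .tex source and remain searchable.
  \newcommand{\daniel}[1]{}
  \newcommand{\jane}[1]{}
\fi
```

Searching the source for \jane{ or \daniel{ still finds every outstanding comment, whichever way the switch is set.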

A reasonable preamble

I’ll jump straight to it: here’s what I currently think is a pretty reasonable preamble for your LaTeX documents that are meant to be submitted at some point to a journal.

\documentclass[reprint]{revtex4-2}
\usepackage{amsmath}
\usepackage{amssymb}
%\usepackage{natbib}
%\usepackage{float}
\usepackage{graphicx}
\graphicspath{{./figures/}{./}}
\usepackage[labelformat=simple]{subcaption}
\renewcommand\thesubfigure{(\alph{subfigure})}
\usepackage{xcolor}
\usepackage{todonotes}
\usepackage{physics}
\usepackage{hyperref}
\usepackage[capitalize]{cleveref}

What follows are some brief comments on all of these components (with comments on bibliography management saved for later).

Document class

Assuming you are writing a physics paper, I think the default APS styling provided by the revtex4-2 document class is a good place to start. If you use it, you do not have to explicitly load some other very reasonable packages (such as the float and natbib packages commented out above). I frequently bounce back and forth between RevTeX’s two-column (reprint) option and a one-column, no-title-page option.
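For concreteness, here is a sketch of the two class lines I have in mind. The onecolumn and notitlepage options exist in RevTeX 4.2, but treat this exact combination as one reasonable choice rather than the only one; journal options such as aps or prl can be added to either line.

```latex
% Two-column layout, close to what the journal will print:
\documentclass[reprint]{revtex4-2}

% One-column draft without a separate title page
% (easier to read on screen and to annotate):
% \documentclass[onecolumn,notitlepage]{revtex4-2}
```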

AMS packages

The amsmath package (for aligned equations, matrices, and many other mathematical conveniences) and the amssymb package (for… various symbols) are extremely common, and extremely useful.
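A short illustration of both packages at work (the equations and labels are placeholders): align and pmatrix come from amsmath, and the blackboard-bold \mathbb comes from the AMS fonts that amssymb loads.

```latex
\begin{align}
  \mathbf{x} &\in \mathbb{R}^{2}, \label{eq:positionSpace} \\
  R(\theta) &= \begin{pmatrix}
      \cos\theta & -\sin\theta \\
      \sin\theta & \cos\theta
    \end{pmatrix}. \label{eq:rotationMatrix}
\end{align}
```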

Figures, subfigures, and floats

The graphicx package is the de facto standard for including figures in papers, and I like to additionally use its \graphicspath command so that I don’t have to manually specify relative paths for the figures I want to include, just a list of directories I might put those images in (each ending with a slash). The float package gives extra control over how figures float in the document (for instance, the [H] placement specifier).

Some people like to use one of a small handful of packages for managing figures that contain multiple subfigures. Ultimately, I don’t care all that much for these options, but if you have to pick one there is a natural best choice: the subfigure package is deprecated, and the subfig package is unmaintained (and has bugs when used with hyperref). (If you do want to use the subfig package, note that it is not particularly compatible with RevTeX 4.2, and will generate poorly formatted figure captions unless you load it like \usepackage[caption=false]{subfig}.)

Thus, in the above configuration I recommend the subcaption package. To generate (1) a label for the figure as a whole, (2) labels for all subfigures, and (3) a single caption, as you would expect for a journal article, you can use this kind of structure:

\begin{figure}
\begin{subcaptiongroup}
\includegraphics[width=\linewidth]{figure1a.png}
\phantomcaption\label{fig:figure1a}
\includegraphics[width=\linewidth]{figure1b.png} 
\phantomcaption\label{fig:figure1b}
\end{subcaptiongroup}
\caption{Two subfigures: \subref{fig:figure1a} the first subfigure,
and \subref{fig:figure1b} the second subfigure.}
\label{fig:figure}
\end{figure}

With this setup, you will get a result consistent with how multi-subfigure figures are expected to work in typical physics journals.
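If the two panels should sit side by side rather than stacked, the same structure should work with narrower images. A sketch, reusing the same placeholder file names and labels:

```latex
\begin{figure}
\begin{subcaptiongroup}
% Each \phantomcaption steps the subfigure counter without printing anything.
\includegraphics[width=0.49\linewidth]{figure1a.png}\phantomcaption\label{fig:figure1a}%
\hfill
\includegraphics[width=0.49\linewidth]{figure1b.png}\phantomcaption\label{fig:figure1b}
\end{subcaptiongroup}
\caption{Two subfigures: \subref{fig:figure1a} the first subfigure,
and \subref{fig:figure1b} the second subfigure.}
\label{fig:figure}
\end{figure}
```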

Colors

The xcolor package (which I often actually invoke with something like \usepackage[dvipsnames,x11names]{xcolor}) provides very convenient ways of specifying custom colors. This might be useful for putting some text in various colors (as in the “Adding Comments” section above), or for specifying how your hyperlinks and citations appear, or for making graphics with TikZ (if you’re into that kind of thing).
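A minimal sketch of the kinds of definitions I mean; the name authorBlue and its RGB values are hypothetical, while ForestGreen is one of the names provided by the dvipsnames option:

```latex
\usepackage[dvipsnames,x11names]{xcolor}

% A custom color from RGB values (hypothetical name and values).
\definecolor{authorBlue}{RGB}{0,84,147}
% A lighter tint derived from it: 40% authorBlue mixed with white.
\colorlet{authorBlueLight}{authorBlue!40}

% In the document body:
This sentence is \textcolor{ForestGreen}{partly green}
and \textcolor{authorBlue}{partly blue}.
```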

Todo notes

The handy todonotes package provides the \todo{Message here} command, which puts a hard-to-miss message in the margin of your document, and it also provides a \listoftodos command that lets you put a to-do list into your document (i.e., not just scattered throughout the margins).
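A sketch of typical usage (the messages are placeholders). Margins in two-column RevTeX are narrow, so inline notes can be easier to read there, and loading the package with the disable option is meant to switch all of its notes off at once, for instance for the submitted version.

```latex
\usepackage{todonotes}
% For the submitted version:
% \usepackage[disable]{todonotes}

% In the document body, e.g. right after \maketitle:
\listoftodos % collects every note into a single list

Some claim that needs a citation.\todo{Find the original reference.}

\todo[inline]{Rewrite this paragraph once the new data are in.}

\missingfigure{Schematic of the simulation geometry.}
```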

Physics package

The physics package is the subject of some debate, but it is a convenient place to start. It has some weird conventions, and some behaviors which are almost certainly bugs, but it also makes typesetting vector calculus (and various bra-ket notations) a bit easier out of the box. I think it’s reasonable to include it, but if you want to define a bunch of custom newcommands instead I’d think that was reasonable, too.
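To give a flavor of what it offers (the equations are just placeholders): \pdv typesets a partial derivative, \vb a bold vector, and the bra-ket macros handle the delimiters for you. One example of its odd conventions is that it redefines \div as the divergence operator, so the usual ÷ symbol is no longer available under that name.

```latex
\usepackage{physics}

% Continuity equation: partial derivative, divergence, bold vector.
\begin{equation}
  \pdv{\rho}{t} + \div(\rho\,\vb{v}) = 0
\end{equation}

% Inner product and expectation value in bra-ket notation.
\begin{equation}
  \braket{\phi}{\psi}, \qquad \expval{\hat{H}}{\psi}
\end{equation}
```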

Hyper- and clever referencing

Finally, we have the hyperref package (which is very finicky, and, with a few exceptions such as cleveref, has to be loaded last in your preamble). Despite being fussy in this way, it is excellent, and should almost always be used so that your PDF has hypertext references.
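A sketch of the end of the preamble, showing the load order and one way to make links colored text instead of boxes. The particular colors are only a suggestion, and the sketch assumes xcolor was loaded earlier.

```latex
\usepackage{hyperref}
\hypersetup{
  colorlinks=true,  % color the link text instead of drawing boxes around it
  linkcolor=blue,   % internal links: sections, equations, figures
  citecolor=teal,   % citations
  urlcolor=magenta  % external URLs
}
% cleveref is one of the exceptions: it must come after hyperref.
\usepackage[capitalize]{cleveref}
```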

The cleveref package is a nice convenience: with it, you do not need to worry about typing Eq.~\ref{eq:newtonsEquations} or Figure~\ref{fig:schematic} shows that... Instead you can just use \cref{eq:newtonsEquations}, or \Cref{fig:schematic} at the start of a sentence, and cleveref will figure out the correct contextual text to put around the reference. It also plays nicely with, e.g., the subcaption package, so that you can write \cref{fig:figure} or \Cref{fig:figure1a} contains... as desired, and the output it generates is easy to customize.
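As an example of that customization, here is a sketch that switches to abbreviated names mid-sentence and full names at the start of a sentence; the specific abbreviations are just one reasonable choice. These lines go in the preamble after cleveref is loaded.

```latex
\crefname{equation}{Eq.}{Eqs.}
\crefname{figure}{Fig.}{Figs.}
\crefname{subfigure}{Fig.}{Figs.}
\Crefname{figure}{Figure}{Figures}

% In the text: \Cref at the start of a sentence, \cref elsewhere,
% and several labels can be passed to a single command.
\Cref{fig:figure} summarizes the result; compare \cref{fig:figure1a,fig:figure1b}
with the prediction of \cref{eq:newtonsEquations}.
```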

Reasonable bibliography management

We want to be able to easily use and reuse citation data, and we certainly do not want to have to sort the entries in our bibliography by hand. The good news is that LaTeX has several options for doing all of this, and the good news/bad news is that our choice is already made for us: because we will be submitting papers to journals that only support using BibTeX, that’s what we’ll use.

I know that some people like to use a single global .bib file for all of their projects. While this is reasonable, I think it quickly gets unwieldy when you start trying to merge these files together when collaborating: multiple PIs all have these thousands-of-references behemoths, with repeated entries that have different citation keys (or separate entries for both the published and arXiv versions of a paper), and it’s just a mess. So I think a reasonable choice is to have a separate .bib file containing all of the relevant citation information for the paper in question; if on the side you want to later incorporate that into your global bibliography file, that’s easy enough to do.
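With the per-paper .bib file from the example directory, the end of the document might look like the sketch below. RevTeX normally selects the APS bibliography style from its class options, so the explicit \bibliographystyle line should be harmless but may be redundant.

```latex
% Just before \end{document}. The .bib extension is omitted; BibTeX adds it.
\bibliographystyle{apsrev4-2} % APS style distributed with revtex4-2
\bibliography{topologicalFlockingBibliography}
```

The usual compile cycle is then pdflatex, bibtex, and pdflatex twice more, so that citations and cross-references resolve.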

The only remaining thing is to choose a reasonable naming convention for your citation keys. Personally, I like something like “firstAuthorLastName/year/firstWordOfTitle” as a good default, leading to citations in your text like \cite{noether1918invariante}. This style also happens to basically be the default Google Scholar format.

Google Scholar

Speaking of Google Scholar, please do not type your BibTeX entries by hand. Ultimately you will probably find yourself using some citation and reference management system, but even if not: a reasonable start for generating entries for your .bib file is to go to Google Scholar, search for the reference in question, click on “cite” and then click on “BibTeX”. This will give you a block of text you can copy and paste into your file (you can even go to the settings on Google Scholar, scroll down to the “Bibliography manager” section, and toggle the “Show links to import citations into BibTeX” option, which will save you a few clicks).

Google Scholar’s bibliography entries are not perfect (and it annoys me that they don’t include the DOI of papers for easy hyperlinking in the bibliography), but starting with them will almost certainly lead to fewer total errors in your bibliography compared to attempting to do everything by hand.

Zotero

There are many different tools you could use to help manage not only your bibliographies but your references in general. Over the years I’ve used systems ranging from “just try to remember everything” to Mendeley to Papers. They are all reasonable choices (well, other than the “rely on memory” option), they all have different pluses and minuses, and there is no particular reason to try to synchronize this particular choice across a collaboration. Use whatever you like.

At the moment, I think the free version of Zotero has all of the features that I need, and it works quite well for me. This includes convenient ways to look through the references I’ve taken note of, and a browser plugin to quickly add papers (from the arXiv, from a journal website, etc.) to my Zotero library. If you use Zotero, you will almost certainly also want the Better BibTeX (BBT) plugin, which provides a few very nice features. It makes it easy to customize the style in which BibTeX information is exported to various .bib files, including the style of the citation key (if you’re curious, I use “auth.lower + year + shorttitle(3,3)”).

You can also, by the way, set BBT up so that it creates an auto-updating global .bib file for your library. If you are the kind of person who does like having a monolithic bibliography this is very convenient. At the same time, Zotero makes it easy to extract a library of references from the .aux file generated when you compile a .tex document. This is extremely handy if you (1) use a monolithic bibliography yourself and (2) want to be able to extract references from a paper you are coauthoring and add any you don’t already have to your library.

Collaboration: Git and Overleaf

I have only a little bit left to add here. I think it goes without saying that papers should be version controlled, and if you and your coauthors are all happy with git, then setting up a shared remote repository for your paper and collaborating by pushing and pulling should be straightforward.

If that is not the case, then I recommend using Overleaf as a starting point. Overleaf has several important features: it makes working on a shared .tex document — in an environment that can compile that .tex document! — extremely straightforward, and it has reasonable reviewing and version/history functionality (assuming you are working at an institution that provides access to the upgraded version of it).

Its main downside is that editing in your browser is not fun; most people probably prefer working in whatever their local setup is. Fortunately, even this is not a problem. Overleaf provides functionality to either directly sync an Overleaf project with one of your existing GitHub repositories (GitHub Synchronization) or to have the Overleaf project function as a kind of git repo itself (Git integration). These are, unfortunately, also “Overleaf premium” features; if you do not have access to them, I suggest just setting up your own remote repo and coordinating that way.