Git submodules vs. Google’s repo tool
Category : Tools
There are a lot of articles on the internet bashing each of these tools, but in our opinion most of the criticism comes from misunderstanding a tool’s design or from trying to apply it in an inappropriate context.
This post summarizes the general rules of thumb we at Otomato follow when choosing a way to assemble a project from multiple git repositories, which is an admittedly non-trivial situation.
First of all, whenever possible, we recommend integrating your components at the binary package level rather than compiling everything from source each time. That is: package components as jars, npms, eggs, rpms or docker images, upload them to a binary repo and pull them in as versioned dependencies during the build.
Still, sometimes this is not an optimal solution, especially if you do a lot of feature branch development (which in itself is an anti-pattern in the classical Continuous Delivery approach – see here for example).
For these cases we stick to the following guidelines.
Git Submodules :
Pros:
- An integrated solution, part of git since v1.5
- Deterministic relationship definition (the parent project always points to a specific commit in the submodule)
- Integration points are recorded in the parent repo.
- Easy to recreate historical configurations.
- Total separation of lifecycles between the parent and the submodules.
- Supported by the Jenkins git plugin.
Cons:
- Management overhead. (Separate clones are needed to introduce changes in submodules)
- Developers get confused if they don’t understand the inner mechanics.
- Need for additional commands (‘git clone --recursive’ and ‘git submodule update’)
- External tool support is not perfect (Bitbucket, SourceTree, IDE plugins)
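To make the "points to a specific commit" property concrete: the parent repo stores each submodule as a tree entry with mode 160000 (a "gitlink") whose object ID is the pinned commit. The Python sketch below pulls those entries out of text in the format printed by ‘git ls-tree’. The sample output, paths and hashes are invented for illustration, and `pinned_submodules` is a hypothetical helper of ours, not part of git.

```python
from typing import Dict

# Tree-entry mode that git uses for a submodule ("gitlink") in the parent repo.
GITLINK_MODE = "160000"


def pinned_submodules(ls_tree_output: str) -> Dict[str, str]:
    """Map submodule path -> pinned commit ID.

    Expects text in `git ls-tree` format, one entry per line:
    <mode> <type> <object><TAB><path>
    """
    pins = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        mode, obj_type, obj_id = meta.split()
        if mode == GITLINK_MODE and obj_type == "commit":
            pins[path] = obj_id
    return pins


# Illustrative input only: these paths and hashes are made up.
SAMPLE = "\n".join([
    "100644 blob " + "a" * 40 + "\tREADME.md",
    "160000 commit " + "b" * 40 + "\tlibs/parser",
    "040000 tree " + "c" * 40 + "\tsrc",
    "160000 commit " + "d" * 40 + "\tlibs/ui",
])

if __name__ == "__main__":
    for path, commit in pinned_submodules(SAMPLE).items():
        print(f"{path} is pinned to {commit[:8]}")
```

Because the pin is part of the parent's tree, checking out an old parent commit also tells you exactly which submodule commits belonged to it.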
Google’s repo :
Pros:
- Tracking synchronized development effort is easier.
- Gerrit integration (?)
- A separate Jenkins plugin.
Cons:
- An external, obscure mechanism
- Requires an extra repository for management.
- Non-deterministic relationship definition (each repo version can be defined as a floating head)
- Hard to reconstruct old versions.
- No support in major repo managers (Bitbucket, GitLab) or GUI tools.
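The "floating head" point can be checked mechanically. A repo manifest is an XML file in which each `project` element may carry a `revision` attribute, falling back to the one on the `default` element. The sketch below flags every project whose revision is not a full 40-character commit ID. The manifest content is invented for illustration, `classify_revisions` is a hypothetical helper, and the rule is a simplification: it ignores revisions set on `remote` elements and treats tags as floating too.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# A full SHA-1 commit ID; anything else (branch, tag, nothing) counts as floating here.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def classify_revisions(manifest_xml: str) -> Dict[str, str]:
    """Map project name -> "pinned" or "floating"."""
    root = ET.fromstring(manifest_xml.strip())
    default = root.find("default")
    default_rev = default.get("revision") if default is not None else None
    result = {}
    for project in root.findall("project"):
        # A project without its own revision inherits the default one.
        rev = project.get("revision", default_rev)
        pinned = rev is not None and SHA1_RE.fullmatch(rev) is not None
        result[project.get("name")] = "pinned" if pinned else "floating"
    return result


# Illustrative manifest only: the names, URL and hash are made up.
SAMPLE_MANIFEST = """
<manifest>
  <remote name="origin" fetch="https://git.example.com/" />
  <default remote="origin" revision="main" />
  <project name="platform/core" path="core" />
  <project name="platform/ui" path="ui" revision="release-1.2" />
  <project name="platform/parser" path="parser"
           revision="0123456789abcdef0123456789abcdef01234567" />
</manifest>
"""

if __name__ == "__main__":
    for name, state in classify_revisions(SAMPLE_MANIFEST).items():
        print(f"{name}: {state}")
```

A manifest in which every project comes out as "pinned" gives you roughly the determinism submodules provide by default, at the cost of updating the manifest on every integration.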
In general: whenever we want to integrate separate, decoupled components with distinct lifecycles, we recommend submodules over repo, but adopting them must come with proper education about the special workflow they require. In the long run it pays off: integration points can be managed in a deterministic manner, and with knowledge comes confidence in the tool.
If you find that your components are too tightly coupled, or you need continuous, intensive development happening concurrently in multiple repos, you should probably use git subtrees or just spare yourself the headache and drop everything into one big monorepo. (This depends, of course, on how big your codebase is.)
To read more about git subtree – see here.
The important thing to understand is that software integration is never totally painless and there is no perfect cure for the pain. Choose the solution that makes your life easier and take on the responsibility of learning the accompanying workflow. As they say: “it’s not the tool, it’s how you use it.”
I’ll be happy to hear what you think, as this is a controversial issue with many different opinions flying around on the internet.
And – if you’re looking for some git consulting – drop us a note and we’ll be happy to help.