Jakub Suder's blog on Cocoa and web development

Sharing code between projects with git subtree

Categories: Programming 18 comments

I came across a problem recently. I have a project called xBlip which I’ve described before – it’s an iPhone client for a Polish Twitter-like service Blip. This project has a backend part which I keep in a subdirectory “ObjectiveBlip” and which I’ve tried to keep as separate from the rest as possible, with the intention that it might be one day extracted as a separate project.

Now I got the idea that I could write a desktop application for Mac that does the same – and of course I could reuse that backend part for that. I would also like to create a separate project on Github with just the backend, so that theoretically someone might use it in future for some purpose.

But this means that I would be maintaining three separate copies of the same code, which I’d have to keep in sync somehow. So the question is, how to do this best?

There are a few ways in Git to share code between projects (for example, git submodules), but most of them are intended only for one-way communication, i.e. downloading updates to a library maintained by someone else into your project. Here, I want two-way communication: I could extend the backend code while working on either the iPhone or the Mac application (working directly on the backend-only project wouldn’t usually make sense), and then broadcast the changes to the other two projects. I also don’t want the solution to be inconvenient for people who download the project, as is the case with git submodules: their directories are initially empty, and you have to update them manually once you’ve downloaded the main code.

So I started looking for a way to pull this off – and I found a script called “git subtree” which seems to do exactly what I need (confusingly, there’s another plugin also called git subtree which is completely unrelated to the first one…). It took me some time (and a few emails to the author, Avery Pennarun) to figure out how to use it, so I thought I’d post a tutorial here in case anyone has a similar situation.

So, here’s what we need to do… (grab a coffee, it’s going to be long):


Starting point

We have one project – xBlip – with the backend code in ObjectiveBlip/ and the UI code in other subdirectories. We want to make a second project with just the backend code, and a third one with a Mac application which reuses it, and set up a way to sync the changes between these three.

Extracting ObjectiveBlip

There are (at least) two ways to extract the backend project from xBlip: I can either use git subtree to extract ObjectiveBlip’s whole history, or I can copy the files manually. If I chose the first option, I’d do:

git subtree split -P ObjectiveBlip -b export

This would create a new ‘export’ branch in my repo, containing only the commits and changes that had anything to do with the ObjectiveBlip directory, and ignoring anything that happened outside it. That way, the new project would have some kind of history from the beginning. Then, I would create a new repository out of that specific branch (I learned that trick from Avery):

cd ~/Projects
mkdir ObjectiveBlip
cd ObjectiveBlip
git init
git fetch ../xblip export
git checkout -b master FETCH_HEAD

This looks weird because normally when you create a new repo (git init), the first thing you do is make the initial commit. Here, we instead fetch existing commits from an existing repo, and only commits from a specific branch (“export”), and then we manually create a master branch out of the fetched commits.
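
The fetch-and-checkout trick can also be wrapped in a small shell function. This is only a sketch: new_repo_from_branch is a name made up for this example, and it assumes that the source repo is reachable as a local path and already has the branch in question (for instance the ‘export’ branch produced by git subtree split).

```shell
# Create a fresh repository in directory $3 whose master branch starts
# from branch $2 of the existing repository at path $1.
# new_repo_from_branch is a hypothetical helper name.
# The body runs in a subshell, so the caller's directory is unchanged.
new_repo_from_branch() (
    src=$(cd "$1" && pwd) || exit 1      # absolute path of the source repo
    branch=$2
    dest=$3
    mkdir -p "$dest" && cd "$dest" || exit 1
    git init -q || exit 1
    # fetch only the named branch; git records its tip as FETCH_HEAD
    git fetch -q "$src" "$branch" || exit 1
    # create master from the fetched commits and check it out
    git checkout -q -b master FETCH_HEAD
)
```

With the directory layout used in this post, the call would be something like: new_repo_from_branch ~/Projects/xblip export ~/Projects/ObjectiveBlip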

I’ve decided not to do that; the reason is that the commits that would form ObjectiveBlip’s history weren’t created with this separate project in mind – they were done as a part of coding on xBlip. And while it’s possible to extract only the relevant information with git subtree, the commits just wouldn’t always make sense. It would all be a bit artificial.

So instead I extracted the files manually and created a fresh project with no history:

cd ~/Projects
cp -R xblip/ObjectiveBlip .
cd ObjectiveBlip
git init
git add .
git commit -m "extracted ObjectiveBlip from iPhone xBlip"
git remote add origin git@github.com:jsuder/ObjectiveBlip.git
git push origin master

Adding ObjectiveBlip back to xBlip as a subproject

In order to move commits around between projects, I need to have ObjectiveBlip repo added as a remote in both application projects. I will then see all ObjectiveBlip commits in a separate branch (objblip/master), and I will decide how to copy commits between that branch and the master branch.

cd ~/Projects/xblip
git remote add objblip git@github.com:jsuder/ObjectiveBlip.git
git fetch objblip
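
Once the fetch is done, the subproject’s commits can be inspected before anything is merged. Here is a minimal sketch of how that could look, using only plain git commands; preview_subproject is a made-up helper name, and it assumes the subproject lives in a subdirectory of the current branch while the fetched branch has the same files at its root.

```shell
# Show the commits on the fetched branch, then a summary of how its
# root tree differs from the subdirectory in the current commit.
# preview_subproject is a hypothetical helper name.
preview_subproject() {
    prefix=$1      # subdirectory in this repo, e.g. ObjectiveBlip
    upstream=$2    # fetched branch, e.g. objblip/master
    git log --oneline "$upstream" &&
    # HEAD:<prefix> names the tree of that directory in the current commit
    git diff --stat "HEAD:$prefix" "$upstream"
}
```

For the setup in this post that would be: preview_subproject ObjectiveBlip objblip/master (drop --stat to see the full diff).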

The graph in GitX looks like this at this point:

(I cheated a bit and did some tricks involving commit --amend in order to force GitX to draw the graph this way; the author didn’t really foresee such a configuration, and GitX kind of freaks out sometimes when you work with git subtree, showing long and messy lines, or even lines that break and continue somewhere else…)

To add ObjectiveBlip into the master branch as a subproject, I need to delete the existing files first:

git rm -r ObjectiveBlip
git commit -m "removed ObjectiveBlip files"

Now, to join the subproject I need to use git subtree add with the option --prefix ObjectiveBlip (or -P ObjectiveBlip). There are actually two ways to do that; I can do it either with an additional option --squash, or without it. Squash means that the subproject commits that you add into your main project are merged into one.

Let’s try the version without squash first:

git subtree add -P ObjectiveBlip -m "readded ObjectiveBlip as a subproject" \
  objblip/master

If you don’t use squash, the commits will be kept intact, so both application projects will contain a complete history, commit by commit, of the changes in ObjectiveBlip code. They will form a separate timeline parallel to your main one, but it will be connected to your main timeline at the points of merges, so if you look at a one-dimensional commit list (e.g. Github “commits” page), it will show the backend commits mixed with frontend commits. What’s worse, any commit you make to the subproject while working on the application’s master and backport to the other timeline (and I certainly will be making commits this way, because it’s easier to develop the backend if I can constantly test it in the actual app), will appear two times on the “commits” page – once in the main timeline, and once in the backend timeline.

I know, you probably didn’t understand any of this. Maybe this graph will clear things up:

This is the state of the xBlip repository after a few commits made in the xBlip repo and in the ObjectiveBlip repo.

The left vertical line is the main (master) timeline, which contains normal code of my project, with ObjectiveBlip in a subdirectory. The right vertical line is ObjectiveBlip’s timeline, which contains its files at the root of the project, and none of the UI code. Note that this isn’t really a direct git merge, and you can’t use the plain git merge command to make the joins, or bad things will happen. You have to use git subtree to “translate” the commits for you.

Note also that the commit named “added foo to ObjectiveBlip” appears twice, once in the original version, and once in the “translated” version, and both versions will be visible on the “commits” page on Github. I could prevent that if I used ‘git rebase’ to delete the commit in the left timeline after I copied it to the right timeline, but that’s one extra thing I’d have to remember…
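
This doesn’t change what Github shows, but locally git can list just the main timeline. A small sketch, assuming the subtree merges were made while on master (so that master is the first parent of each merge commit); main_timeline is a made-up name:

```shell
# List only the commits made directly on the main timeline: at each
# merge, git follows the first parent and skips the merged-in side.
# main_timeline is a hypothetical helper name.
main_timeline() {
    git log --first-parent --oneline "${1:-HEAD}"
}
```

The merge commits themselves still show up in this list, but the commits from the right-hand timeline, including the “translated” copy of a backported commit, do not.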

Merging and splitting

After the initial merge with git subtree add, for subsequent merges you use git subtree merge (for any command, you need to remember to use the prefix option to tell it the location of the subdirectory). If you make any commits to the subproject inside master, you can use git subtree split to backport the commits to the right timeline; pass it a --branch option with a name of a branch to be created or updated to point to the newest commit, and then push it to the external repository. Note that it’s usually better to keep the changes to files inside subproject’s directory and to files outside it in separate commits, e.g. make a commit “added foo to ObjectiveBlip” and then separately “added FooController in the UI”, even if you worked on both parts simultaneously.
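
To check whether a commit follows that advice before splitting, something like the following could be used. It’s only a sketch: commit_is_mixed is a made-up name, it looks at ordinary (non-merge) commits only, and it treats the prefix as a plain directory name without any special regex characters.

```shell
# Succeed if the given commit (default: HEAD) changes files both inside
# and outside the subproject directory.
# commit_is_mixed is a hypothetical helper name.
commit_is_mixed() (
    prefix=$1
    commit=${2:-HEAD}
    # paths changed by the commit, one per line (--root covers the first commit)
    files=$(git diff-tree --root --no-commit-id --name-only -r "$commit")
    # grep -c exits non-zero on a count of 0, hence the "|| true"
    inside=$(printf '%s\n' "$files" | grep -c "^$prefix/" || true)
    outside=$(printf '%s\n' "$files" | grep -v "^$prefix/" | grep -c . || true)
    [ "$inside" -gt 0 ] && [ "$outside" -gt 0 ]
)
```

If commit_is_mixed ObjectiveBlip succeeds for your latest commit, you may want to redo it as two commits, for example by staging the subdirectory first (git add ObjectiveBlip), committing, and then staging and committing the rest.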

Here’s a list of commands that were used to create the graph above:

# add the subproject - creates the first merge point,
# adds an ObjectiveBlip/ subdirectory
git subtree add -P ObjectiveBlip -m "readded ObjectiveBlip as a subproject" \
  objblip/master

# after we create the commit "added readme"
# in external ObjectiveBlip repo:
git fetch objblip
git subtree merge -P ObjectiveBlip -m "merged changes in ObjectiveBlip" \
  objblip/master

# after we make the changes to both UI and the backend while working
# in xblip master, we backport the relevant commit to the timeline on
# the right, and push it to objblip repo; note that the second commit
# is ignored, as it contains only changes unrelated to ObjectiveBlip
git subtree split -P ObjectiveBlip -b backport
git push objblip backport:master

# after we update readme in ObjectiveBlip repo:
git fetch objblip
git subtree merge -P ObjectiveBlip -m "merged changes in ObjectiveBlip" \
  objblip/master

Using squash

I’ve decided to use the version with --squash instead. If you use squash, you will actually have 3 timelines (!) in your application repo… First will be the master, second – the subproject one, and the third one will be the squashed one. What’s important is that the squashed timeline will be merged with the master timeline, but the original subproject timeline will be kept completely separate, and you don’t even have to push it to Github with your application project.

Again, a graph will (hopefully) clear this up a bit:

The left vertical line is the master, the right one contains the squashed commits. The subproject timeline – the one that is used to make pushes and fetches from the external repository – will appear separately from the rest, either on top, or at the bottom. You only need it locally and you don’t need to push it to the ‘origin’ repo.

Here are the commands used this time:

git subtree add -P ObjectiveBlip --squash \
  -m "readded ObjectiveBlip as a subproject" objblip/master

...

git fetch objblip
git subtree merge -P ObjectiveBlip --squash \
  -m "merged changes from ObjectiveBlip" objblip/master

...

# note: for split, you don't pass --squash
# (there's currently no way to squash the backported commits)
git subtree split -P ObjectiveBlip -b backport
git push objblip backport:master

...

git fetch objblip
git subtree merge -P ObjectiveBlip --squash \
  -m "merged changes from ObjectiveBlip" objblip/master

There are two practical differences between the two strategies in how your commit timelines will look:

  • with squash, there will always be only one commit per merge in the right (squashed) timeline; this may be good or bad, depending on what you expect, but I think most of the time you probably won’t need every single commit from the subproject to appear in your timeline
  • with squash, commits backported from master to the subproject will not appear a second time in the right timeline, because they will be a part of one of the squashed commits

I believe you can use either of the approaches depending on how you want your commit graph to look. But please pick one at the beginning and stick to it, or bad things will happen…
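
One way to make sure you stick to one strategy is to keep the prefix, the remote and the squash setting in a single place. The functions below are a rough sketch of that idea, not something from git subtree itself: all the names (SUBTREE_PREFIX, SUBTREE_REMOTE, SUBTREE_SQUASH, DRY_RUN, run, subproject_pull, subproject_push) are made up for this example, and the merge is done without -m, so git will use its default merge message.

```shell
# Hypothetical wrappers that pin the prefix, remote and squash choice
# in one place. All names below are assumptions for illustration.
SUBTREE_PREFIX=ObjectiveBlip
SUBTREE_REMOTE=objblip
SUBTREE_SQUASH=--squash    # set to an empty string for the non-squash strategy

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring changes from the external repo into the subdirectory.
subproject_pull() {
    run git fetch "$SUBTREE_REMOTE" &&
    # $SUBTREE_SQUASH is left unquoted on purpose so that an empty
    # value adds no argument at all
    run git subtree merge -P "$SUBTREE_PREFIX" $SUBTREE_SQUASH "$SUBTREE_REMOTE/master"
}

# Backport commits made in the subdirectory and push them out.
# split is never given --squash.
subproject_push() {
    run git subtree split -P "$SUBTREE_PREFIX" -b backport &&
    run git push "$SUBTREE_REMOTE" backport:master
}
```

Setting DRY_RUN=1 before calling subproject_pull or subproject_push prints the git commands instead of running them, which is a cheap way to check what would happen.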

18 comments:

darkjames

What about git submodule?

darkjames

The question before should be: "What's wrong with git submodule?"

Anyway found it.

Unlike the 'git submodule' command, git subtree doesn't produce
any special constructions (like .gitmodule files or gitlinks) in
your repository, and doesn't require end-users of your
repository to do anything special or to understand how subtrees
work.

I still prefer to use git submodule :)

a/ It's better known.
b/ It's built-in into git suite.

Psi

Inconvenience for end users is one thing, but the bigger problem is that - as the git submodule manual says:

you cannot modify the contents of the submodule from within the main project

And that's the whole point, I wanted to have the backend extracted as a separate project, but still be able to make updates to it while working on one of the frontend apps (because it's much easier to test the changes in the backend if I can use the frontend to check if the new feature works).

bronson

Great writeup! I actually started writing my git subtree before I found out about apenwarr's. It uses the same technique (subtree merge) to solve the same problem, it just goes about it differently.

You can see that I hit a wall -- I ended up spending all my time working around Bash limitations rather than adding features, and I wasn't willing to take the time to rewrite it in Ruby or Perl, so... I guess my git-subtree is dead. Wish github had a place to put abandoned projects without deleting them.

Why not git submodules? Because once a submodule is in your project, you can't merge, bisect, etc, without a lot of hassle. They basically prevent you from using the coolest features of git. They do work great if you have one developer who's moving forward on one branch. Once you add another programmer and start merging, it doesn't take long to discover first-hand why submodules are so horrible. :)

DW

Hi, I am fairly new to version control in general but git subtree seems to be what I need. I have asked a fairly basic usage question here though:
http://stackoverflow.com/questions/2503816
and would be grateful for any help.

Luciano

I was searching for this kind of feature, and while I was reading this post and comments, an idea came to my mind. What if I put the subtree as a normal git-managed directory and just ignore it from the main tree? For example:

/ mainProject folder
|-- .git
|-- .gitignore file (contents: libProject/)
|-- ...
|-- [a lot of files and folders specific to mainProject]
|-- ...
|-- libProject
|- .git
|- [a lot of files and folders common to other projects]

Would that work?

Luciano

Fixing the tree:

/ mainProject folder
|-- .git
|-- .gitignore file (contents: libProject/)
|-- ...
|-- [a lot of files and folders specific to mainProject]
|-- ...
|-- libProject
....|- .git
....|- [a lot of files and folders common to other projects]

Psi

Luciano: Yes, it would work in that you would have a subdirectory within the main project directory, you could update it separately and even push updates back to the sub-repo; but there's one problem: the directory wouldn't be included in the main project's repository - if you or someone else clones your main project's repo on a new machine, they wouldn't have the libProject directory at all. So this would only work if you agreed that anytime someone clones your project, they have to manually clone the subproject too - otherwise they won't be able to compile or run your project because the subproject's files will be missing.

Boryn

Geee... This is probably what I would also need :) I have to find several hours to test it out :)

I tried submodules but I am not very happy about them either. You have to update them manually by "git submodule update" which is not very convenient. And committing also demands independent commits...

Hope your solution will help me and other devs :)

Thanks

Jason Boxman

Thanks for the write up with the examples. I was about to start playing with git-subtree and now with your screenshots it makes a lot more sense.

Cynthia Kiser

Would it be possible for you to update this example with multiple splits and pushes upstream? I am using this as a model and if I reuse the same branch for my second split, when I merge into the remote for my library, I get duplicate commits - same log messages and files but with different SHA1 signatures.

So far, developing a library from within a super project using submodules is a lot clearer to me. The comment about not modifying the submodule from the main project is a bit misleading. All you have to do is cd into the submodule and then you can make whatever changes you like and push them to your submodule's own repository just like any other repository.

Psi

Honestly, since that time I had some problems with git subtree, like you've described - even though it did work before... maybe they've changed something in one of the later versions, or maybe it doesn't handle all possible cases yet. I haven't tried to solve this.

I haven't thought of using git submodules as you say - maybe it will be a better solution after all... the downside of git subtree is that it adds a lot of complexity, and you have to think about all these branches and merges and stuff.

Carlos

Great post, thanks!

I use exactly the same work flow and I think I can benefit from the better adapted features of subtree over submodule.
One stupid question: When in the process can you fetch code from a subproject and check those changes before committing them on the superproject tree? In other words: Imagine I have two superprojects (A and B), sharing code from a subproject C, for which they have a subtree as you describe. If I change C when working on project A, I succeed in pushing the changes upstream. Even downloading those changes onto project B. But what I would like is to fetch the code on project B, and do a git diff to check what's changed before I commit the changes on project B.
Is there a way to do this? Doing a git diff after your git fetch does not show the changes..
Thanks!

Psi

@Carlos: after you do 'git fetch ...', the commits are downloaded to your local repo and you can browse them as long as you like before you do 'git subtree merge ...'. You can for example switch to the branch that was updated (in my case: 'git checkout objblip/master') and use 'git log ...' to show recent commits on that branch.

Carlos

That worked, thanks!
I made the non-squashed version work, but I also consider the squashed version more useful.
I followed the command sequence you propose and I get an error when I try to push upstream changes made in the superproject.
Following your example I did (using your project names):

(...)
git subtree split -P ObjectiveBlip -b backport

And got this error:

fatal: bad object 0e630d0e534b235073a449dd08b96700541ca97b

I read this interesting post: http://stackoverflow.com/questions/5760331/git-subtree-is-not-retaining-history-so-i-cannot-push-subtree-changes-how-can-i

but they don't really provide an answer to the questions, which correspond exactly with what I would ask. Also, I wonder what I am doing wrong if I follow the exact same sequence as the one you propose. Did you ever encounter this issue when looking into this?
Thanks!

Reuben Cummings

I get the following Error: Use --prefix instead of bare filenames.

When I try to use a local remote path instead of pointing to github. These were my steps:

# create git repo
mkdir -p /Users/reubano/Documents/Projects/test/export
cd /Users/reubano/Documents/Projects/test
git init
touch test.php
touch test.xml
touch README.textile
git add .
git commit -m "initial commit"

# add lib_general as remote repo
git remote add lib_general ~/Documents/Projects/lib_general
git fetch lib_general master
git subtree add -P lib_general -m "add lib_general as a subtree" lib_general master

# update project with lib_general changes
git fetch lib_general master
git subtree merge -P lib_general -m "merge changes from lib_general subtree" lib_general # this step errors

Projects/

├─test/
│ │
│ ├─lib_general/
│ │ │
│ │ └─General.inc.php
│ │
│ ├─files...

├─lib_general/
| │
| └─General.inc.php


Any idea how to fix it?

Reuben Cummings

I figured it out. The last line should read

git subtree merge -P lib_general -m "merge changes from lib_general subtree" master

MihaiP

Hi guys,
I tried to install git subtree on Mac and after first step (cp git-subtree.sh "$(git --exec-path)"/git-subtree), when I try to use git subtree I get always an error.

sudo git subtree split -P ObjectiveBlip export
fatal: cannot exec 'git-subtree': Operation not permitted

Can you help me with this?
