Split a large Git repository

My latest project was set up with several subprojects in one large repository. When we set it up, having everything in one place helped our workflow. After a few weeks, though, we found that our initial Git repository setup was no longer helpful.

Our initial repository LargeRepository looked something like this:

  • LargeRepository/
    • .git/
    • Frontend/
    • Admin/
    • API/
    • Services/

Frontend is an AngularJS web application that runs on the client and only requests data from our API. Admin, API and Services are Java projects that run on the server side.

Problems

We decided to split our LargeRepository into smaller ones because of the following problems:

Merging over and over again

The Frontend and Backend of our web application are independent of each other, but because they shared one repository we still had a lot of unnecessary merges.

Continuous Integration

We’re using Git hooks to trigger the build and deployment of our web application. We didn’t want to deploy a new version of the Backend when only the Frontend had changed.

Goal

How could we split our large project and extract only the Frontend directory, without losing its commit history, while getting rid of everything else?
Git is an amazing piece of software, but does it have a solution for our problem?

Approach

Yes, Git has a solution! We can use Git’s filter-branch command to rewrite the history of a repository so that only one specific directory remains, or so that one specific directory is removed.

Procedure

This takes several steps.

Preparation

First of all, we make a clone of our local large repository. This clone will later become the split repository that contains only the Frontend.

$ git clone --no-hardlinks /path/to/LargeRepository /path/to/FrontendRepo

If it was successful you should see something like this:

Cloning into 'FrontendRepo'...
done.
Checking connectivity... done

The --no-hardlinks parameter makes Git copy the files under .git/objects instead of hardlinking them to the repository we’re cloning from. As this will be a new repository, it has to be independent of the old one.

Now change the directory to the cloned repository:
$ cd /path/to/FrontendRepo

Action

Option A: Split-out & keep one specific directory

Relocate all files of the Frontend directory into the root of our new repository.
$ git filter-branch --subdirectory-filter Frontend HEAD

Output:

Rewrite 4a85024ef3c733422da8b7e866670657e043bcbe (15/15)
Ref 'refs/heads/master' was rewritten
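
To see what the subdirectory filter does without touching a real project, the following sketch builds a throwaway repository in a temporary directory and runs the same command on it. The file names, commit messages and the Demo identity are made up for illustration; FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer Git versions print for filter-branch.

```shell
#!/bin/sh
# Throwaway demo: build a tiny repository, then keep only Frontend/.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause in newer Git
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir Frontend API
echo 'console.log("hi");' > Frontend/app.js
echo 'class Api {}' > API/Api.java
git add .
git commit -q -m "Add Frontend and API"
echo '// change' >> Frontend/app.js
git commit -q -am "Update Frontend only"
echo '// change' >> API/Api.java
git commit -q -am "Update API only"

# Same command as in Option A.
git filter-branch --subdirectory-filter Frontend HEAD >/dev/null

# Frontend's files now sit in the repository root, and only the
# commits that touched Frontend are left in the history.
root_files=$(git ls-tree --name-only HEAD)
commit_count=$(git rev-list --count HEAD)
echo "files in root: $root_files"     # files in root: app.js
echo "commits kept:  $commit_count"   # commits kept:  2
```

The third commit disappears because it only touched API, which is the behaviour we want for a Frontend-only repository.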

Option B: Remove one specific subdirectory & keep everything else

Remove the Frontend directory from every commit and keep everything else where it is. The --prune-empty option drops commits that become empty because they only touched Frontend.

$ git filter-branch --tree-filter "rm -rf Frontend" --prune-empty HEAD

Output:

Rewrite 9885ee8c351c23bc12326c83e0b2c2fe9152a350 (33/33)
Ref 'refs/heads/master' was rewritten
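
A tree filter checks out every commit into a temporary directory, which can get slow on long histories. A commonly used alternative is an index filter with git rm --cached, which rewrites the index directly. The sketch below tries it on a throwaway repository; it is not the command used above, and the file names and the Demo identity are invented for illustration.

```shell
#!/bin/sh
# Throwaway demo: remove Frontend/ from all commits via --index-filter.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause in newer Git
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir Frontend API
echo 'console.log("hi");' > Frontend/app.js
echo 'class Api {}' > API/Api.java
git add .
git commit -q -m "Add Frontend and API"
echo '// change' >> Frontend/app.js
git commit -q -am "Update Frontend only"
echo '// change' >> API/Api.java
git commit -q -am "Update API only"

# --ignore-unmatch keeps the filter from failing on commits
# that do not contain Frontend at all.
git filter-branch \
    --index-filter 'git rm -r -q --cached --ignore-unmatch Frontend' \
    --prune-empty HEAD >/dev/null

remaining=$(git ls-tree --name-only HEAD)
commit_count=$(git rev-list --count HEAD)
echo "left in root: $remaining"      # left in root: API
echo "commits kept: $commit_count"   # commits kept: 2
```

The Frontend-only commit is pruned because it is empty once Frontend is gone.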

Cleanup

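We don’t want to be able to restore anything, so we remove all references to old objects we no longer need. Keep in mind that the remote-tracking refs under refs/remotes/origin/ still point at the old history, so those objects can only be garbage-collected once origin has been removed (see the last step).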

Remove original refs

$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

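The command itself prints nothing. Listing all refs with a plain git for-each-ref beforehand shows the backup ref that filter-branch created under refs/original/: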

05fad00ddbf8920dda5f3e7708c94aa06c06d13a commit refs/heads/master
9885ee8c351c23bc12326c83e0b2c2fe9152a350 commit refs/original/refs/heads/master
9885ee8c351c23bc12326c83e0b2c2fe9152a350 commit refs/remotes/origin/HEAD
9885ee8c351c23bc12326c83e0b2c2fe9152a350 commit refs/remotes/origin/master

Expire reflog entries

$ git reflog expire --expire=now --all

Reset repository directory

$ git reset --hard

Output:

HEAD is now at 05fad00 Your last commit message

Garbage collection

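By default git gc keeps unreachable objects for a two-week grace period, so we pass --prune=now to drop them immediately.

$ git gc --prune=now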

Output:

Counting objects: 594, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (490/490), done.
Writing objects: 100% (594/594), done.
Total 594 (delta 221), reused 88 (delta 16)
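
Whether the cleanup really removed the old history can be checked on a throwaway clone. The sketch below is only an illustration with invented file names and a made-up Demo identity. Two details matter for the check: origin is removed before garbage collection, because remote-tracking refs would otherwise keep the old commits reachable, and git gc runs with --prune=now so that unreachable objects are dropped immediately.

```shell
#!/bin/sh
# Throwaway demo: verify that the pre-rewrite commit is gone after cleanup.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause in newer Git
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/LargeRepository"
cd "$work/LargeRepository"
mkdir Frontend API
echo 'console.log("hi");' > Frontend/app.js
echo 'class Api {}' > API/Api.java
git add .
git commit -q -m "Add Frontend and API"

git clone -q --no-hardlinks "$work/LargeRepository" "$work/FrontendRepo"
cd "$work/FrontendRepo"
old_tip=$(git rev-parse HEAD)            # commit id before the rewrite
git filter-branch --subdirectory-filter Frontend HEAD >/dev/null

# Drop everything that still points at the old history.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git remote rm origin
git reflog expire --expire=now --all
git gc --quiet --prune=now

# The old commit object should no longer exist in this repository.
if git cat-file -e "$old_tip" 2>/dev/null; then
    old_tip_present=yes
else
    old_tip_present=no
fi
backup_refs=$(git for-each-ref refs/original/)
echo "old commit still present: $old_tip_present"   # old commit still present: no
```

Running git count-objects -vH before and after the cleanup is another quick way to see how much space was freed.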

Push to new remote origin

Now our repository is split and ready to be pushed to a new remote repository.

$ git remote rm origin
$ git remote add origin https://git@yourgitserver.tld/Frontend.git
$ git push origin master
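
Before pushing to the real server, the remote switch can be rehearsed locally. In this sketch a bare repository in a temporary directory stands in for the new remote; the paths, the single commit and the Demo identity are assumptions made for the demo only.

```shell
#!/bin/sh
# Throwaway demo: replace origin and push to a local bare repository.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q --bare "$work/Frontend.git"      # stand-in for the new remote
git init -q "$work/FrontendRepo"
cd "$work/FrontendRepo"
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
echo 'console.log("hi");' > app.js
git add app.js
git commit -q -m "Initial Frontend commit"
git remote add origin "$work/LargeRepository"   # mimic the origin left over from the clone

# Same three steps as above, with a local path instead of the URL.
git remote rm origin
git remote add origin "$work/Frontend.git"
git push -q origin master

local_tip=$(git rev-parse master)
remote_tip=$(git --git-dir="$work/Frontend.git" rev-parse master)
echo "local:  $local_tip"
echo "remote: $remote_tip"
```

If both lines show the same commit id, the new remote has received the rewritten master branch.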

