Git Filter-Repo Submodule Extraction

Use git-filter-repo to Extract Folders into Submodules and Boost Code Reuse


Reuse Code by Extracting Folders and Creating Submodules (📷: cigdem)

“I will always choose a lazy person to do a difficult job because a lazy person will find an easy way to do it.” — Frank B. Gilbreth Sr

Management 🗂

Managing projects with duplicated code and tangled file systems is a fool's errand. Impending deadlines and pressure to complete tasks lead to corner-cutting and hasty decision-making. Laziness in the present often causes exponentially more work in the future, whereas an ounce of critical thinking and strategy can save you from tons of hard work down the line.

Strategy 🤔

While working on a new C++ project, two of our engineers noticed some similarities with a legacy C++ project. The new project has fundamental architectural differences from the older one, but shares a few pieces of functionality. We wanted to share modules between the projects using only tools we were already using, and decided that extracting pieces of code into standalone repos and adding those repos as Git submodules was the optimal strategy for our situation. Submodules are repos inside of repos: they allow code to be shared while being modified and tracked separately from their parent repos. Additionally, by using git-filter-repo we can extract our folder of interest and preserve all of the commits that touched it.

Extracting the Submodule 🏗

The recommended tool for submodule extraction is git-filter-repo. It doesn’t ship with Git by default, but Git’s official documentation recommends it for this type of operation.

To use git-filter-repo you’ll need to perform the following steps:

  1. Ensure you have Python 3 installed. On newer versions of Windows, installing Python via the Windows Store is the easiest way to get Python configured for git-filter-repo.

  2. Download git-filter-repo.py, but save the file as git-filter-repo (no extension). If you’re on macOS or Linux, you’ll also need to run chmod +x git-filter-repo to make it executable.

  3. Add the folder containing git-filter-repo to PATH (Windows, macOS, Linux).
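Once the three steps above are done, a quick check along these lines can confirm that everything is reachable from your shell. This is only a minimal sketch: require_tool is a made-up helper name, not part of Git or git-filter-repo, and on Windows it assumes a POSIX-style shell such as Git Bash.

```shell
#!/bin/sh
# require_tool is a hypothetical helper: it reports whether a command is on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check each prerequisite; a missing tool is reported but does not abort the loop.
for tool in git python3 git-filter-repo; do
  require_tool "$tool" || true
done
```

If git-filter-repo shows up as missing, the folder you saved it to is most likely not on PATH yet.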

Once you’ve installed git-filter-repo you’ll want to clone a new copy of the repo you plan to extract the folder from. You can specify the name of the new repo as the last argument to git clone to remind yourself what you were doing in case you get interrupted or distracted:

git clone https://github.com/user/repo name-of-new-repo

Set your working directory to the root folder of the repo you just cloned. This example uses the main branch, but you can filter a repo from the commits on any branch:

cd name-of-new-repo && git checkout main

Before performing any repo filtering we want to remove the origin remote so that we don’t accidentally push any destructive changes to the parent repo.

git remote rm origin

Next, run git filter-repo, passing the --subdirectory-filter argument followed by the folder you'd like to turn into a submodule. This command removes any folders and commits that aren’t related to the folder specified by --subdirectory-filter and moves the files in the specified path to the root directory of the repo. Note that Windows users should use forward slashes (/) rather than the traditional backslashes in paths:

git filter-repo --subdirectory-filter path/to/folder --force

Notice we use the --force flag in the command above. By default, git filter-repo refuses to run unless the repo looks like a fresh clone, and removing the origin remote makes that check fail. Because we are working in a throwaway clone with no remote to push to, passing --force to override the check is safe.
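To get a feel for which commits survive the filter, it can help to list the commits that touch the folder before you run it. The sketch below does this in a throwaway repo, so it is safe to run anywhere. The lib and app folders, the file names, and the commit_file helper are all made up for illustration, and the count is only an approximation of what git filter-repo keeps (merge commits, for example, may be handled differently).

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so no real repo is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# commit_file is a hypothetical helper: write a file and commit it.
commit_file() {
  mkdir -p "$(dirname "$1")"
  echo "$2" > "$1"
  git add "$1"
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$2"
}

commit_file lib/util.h "add util"
commit_file app/main.cpp "add app"
commit_file lib/util.h "update util"

# Count the commits that touch lib/ versus all commits on the branch.
kept=$(git log --format=%s -- lib | wc -l | tr -d ' ')
total=$(git rev-list --count HEAD)
echo "commits touching lib/: $kept of $total"
```

In your own clone, the equivalent preview is git log --oneline -- path/to/folder.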

Now that the repo is filtered, you can push its contents to the submodule repo. Create a new, empty repo in GitHub and add its URL as the origin remote:

git remote add origin https://github.com/user/submodule

Finally, push the current branch to the new repo:

git push --set-upstream origin main

Using the Submodule 📦

At this point, you’ve successfully extracted the folder to a submodule! To use the submodule, switch over to the parent repo and run the following command:

git submodule add https://github.com/user/submodule

That’s it! You can now modify the files within the submodule and pull the changes into multiple repos.
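In the original parent repo, the extracted folder still exists as regular files, so you will probably want to remove it and add the submodule at the same path. The sketch below walks through that flow using two throwaway local repos in place of GitHub. The repo names (extracted, parent) and the file util.h are made up, and the protocol.file.allow setting is only there because recent Git versions block local-path submodule clones by default; you should not need it with a GitHub URL.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits below.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the extracted repo (the GitHub URL in practice).
git init -q extracted
(cd extracted && echo "// util" > util.h && git add util.h && git commit -q -m "add util")

# Stand-in for the parent repo, which still holds the original folder.
git init -q parent
cd parent
mkdir -p path/to/folder
echo "// util" > path/to/folder/util.h
git add . && git commit -q -m "initial"

# 1. Remove the original copy of the folder and commit the removal.
git rm -r -q path/to/folder
git commit -q -m "Remove folder ahead of submodule"

# 2. Add the extracted repo as a submodule at the same path and commit.
git -c protocol.file.allow=always submodule --quiet add "$work/extracted" path/to/folder
git commit -q -m "Add folder as submodule"

# Read the registered path back from .gitmodules.
submodule_path=$(git config -f .gitmodules --get-regexp 'submodule\..*\.path' | cut -d' ' -f2)
echo "submodule registered at: $submodule_path"
```

Anyone cloning the parent repo afterwards can pass --recurse-submodules to git clone so the submodule contents are fetched as well.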

You can learn more about extracting git submodules via the following resources:

  • Splitting a subfolder out into a new repository (GitHub)

  • Create a submodule repository from a folder and keep its git commit history (Stack Overflow)

  • How to Git clone including submodules (Stack Overflow)

Thanks for reading!

Want to Connect?

If you found the information in this tutorial useful please subscribe on Hashnode, follow me on Twitter, and/or subscribe to my YouTube channel.
