Git Filter Repo

Advanced Git

Amanda Crawford-Adamo

Software and Data Engineer

What is git filter-repo?

Git Filter-Repo Command

git filter-repo

A tool for rewriting Git repository history quickly and safely.

  • Renames or removes files and directories throughout history
  • Operates on all branches simultaneously

Purposes

  1. Remove sensitive data (e.g., passwords, tokens)
  2. Clean up unnecessary files
  3. Restructure repositories
  4. Reduce repository size

Filter-Repo process

  1. Install git filter-repo using pip

    pip install git-filter-repo
    
  2. Remove secrets.txt from every commit

    git filter-repo --path secrets.txt --invert-paths
    

Filter-Repo Related Filters

--path: selects the file or directory paths to keep in the rewritten history

--invert-paths: inverts the selection, so the paths given with --path are removed and everything else is kept
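
The two filters can be tried safely in a throwaway repository. The sketch below is only an illustration: the temporary directory, the file app.py, the demo identity, and the commit messages are all made up for the example. filter-repo normally refuses to run outside a fresh clone, so the sketch passes --force, and it skips the rewrite if the tool is not installed.

```shell
set -eu

# Build a throwaway repository (all names here are hypothetical).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1 adds both files; commit 2 only touches app.py.
echo "print('hello')" > app.py
echo "password=hunter2" > secrets.txt
git add app.py secrets.txt
git commit -q -m "Add app and secrets"
echo "print('hello again')" >> app.py
git commit -q -am "Update app"

if command -v git-filter-repo >/dev/null 2>&1; then
    # --path selects secrets.txt; --invert-paths flips the selection,
    # so secrets.txt is dropped and everything else is kept.
    git filter-repo --path secrets.txt --invert-paths --force
    status=rewritten
else
    echo "git-filter-repo is not installed; skipping the rewrite"
    status=skipped
fi

# Count the commits on any branch that still mention secrets.txt.
remaining=$(git log --all --oneline -- secrets.txt | wc -l | tr -d ' ')
echo "status=$status, commits touching secrets.txt: $remaining"
```

After a successful rewrite, no commit on any branch should still reference secrets.txt.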


Filter-Repo result

Output

  Parsed 150 commits
  New history written in 0.10 seconds; now repacking/cleaning...
  Repacking your repo and cleaning out old unneeded objects

Key Implications

  • All branches and commits were updated
  • All commit hashes were changed
  • A force push is needed after this step
  • Team members will need to clone the repo again
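
Why a force push is needed can be seen with plain Git. The sketch below does not run filter-repo; as a stand-in it uses git commit --amend, which also changes a commit hash. The local bare repository playing the role of the remote, the file name, and the demo identity are assumptions for the example.

```shell
set -eu

# A local bare repository stands in for the shared remote (hypothetical setup).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/local" 2>/dev/null
cd "$work/local"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Publish an initial commit.
echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app"
git push -q origin HEAD
branch=$(git symbolic-ref --short HEAD)
old_hash=$(git rev-parse HEAD)

# Stand-in for a history rewrite: amending produces a new commit hash.
git commit -q --amend -m "Add app (rewritten)"
new_hash=$(git rev-parse HEAD)

# A normal push is refused because the remote still holds the old history.
if git push -q origin HEAD 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi

# A force push replaces the remote branch with the rewritten history.
git push -q --force origin HEAD
remote_hash=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
echo "plain push: $plain_push; remote now at $remote_hash"
```

After a real filter-repo run, the origin remote is removed as a safety measure, so it has to be added again with git remote add before pushing.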

When to use filter-repo

Use cases

  • Removing sensitive data (e.g., passwords)
  • Cleaning up bloated repositories
  • Renaming or reorganizing files across all commits
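
Renaming across all commits can be sketched with the --path-rename option of filter-repo. The directory names src and lib, the file app.txt, and the demo identity are hypothetical; --force is passed because the throwaway repository is not a fresh clone, and the rewrite is skipped if the tool is not installed.

```shell
set -eu

# Throwaway repository with one file under src/ (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir src
echo "v1" > src/app.txt
git add src
git commit -q -m "Add src/app.txt"

if command -v git-filter-repo >/dev/null 2>&1; then
    # Rewrite every commit so that paths under src/ appear under lib/.
    git filter-repo --path-rename src/:lib/ --force
    status=rewritten
else
    echo "git-filter-repo is not installed; skipping the rewrite"
    status=skipped
fi

# Commits that still reference the old directory, and the files tracked now.
old_paths=$(git log --all --oneline -- src | wc -l | tr -d ' ')
files=$(git ls-files)
echo "status=$status, commits touching src/: $old_paths, tracked: $files"
```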

Tips

  • Always back up your repository before using filter-repo
  • Coordinate with collaborators before pushing rewritten history
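
One way to take the backup is a mirror clone, which copies every branch and tag. The sketch below uses a throwaway repository; the directory names and the demo identity are assumptions, and in practice the clone source would be the real repository.

```shell
set -eu

# Throwaway repository standing in for the real project (hypothetical names).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app"

# Back up all refs before rewriting anything.
git clone -q --mirror "$work/repo" "$work/repo-backup.git"

# Sanity check: the backup holds the same number of commits.
original_count=$(git rev-list --all --count)
backup_count=$(git --git-dir="$work/repo-backup.git" rev-list --all --count)
echo "original: $original_count commits, backup: $backup_count commits"
```

If the rewrite goes wrong, the original history can be recovered by cloning from the backup directory.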

Let's practice!
