Git Large File Storage

Advanced Git

Amanda Crawford-Adamo

Software and Data Engineer

What is Git Large File System?

Git LFS Command:

git lfs
  • Git LFS: Git Large File Storage
  • Replace large files in repo
  • Small pointer files
  • Large files separate from repo

Benefits:

  1. Reduced repository size
  2. Faster cloning and fetching
  3. Efficient binary file handling
  4. Improved collaboration on large files
Advanced Git

Git LFS initialization process

  • Initialize Git LFS

    git lfs install
    
  • Setup files to track and generate .gitattributes file

    git lfs track "*.csv"
    
  • Add to git index .gitattributes with tracking config

    git add .gitattributes
    
  • Commit new changes
    git commit -m "Track CSV files"
    

A gif of file cabinet showing raised files

Advanced Git

Git LFS update process

  1. Add new file using git add
    git add large_file.csv
    
  2. Commit and push the changes
    git commit -m "Update large CSV file"
    git push origin main
    
  3. Download changes
    git pull
    git lfs pull  # If needed to explicitly download LFS content
    
Advanced Git

When to use Git LFS

When to use:

  1. Need to track changes to large datasets (CSV, JSON, etc.)
  2. Machine learning models
  3. Binary assets (images, videos)
  4. Version control compressed or installer files

When not to use:

  1. Infrequently updated large files
  2. Small text files, like code
  3. Tight storage quotas
Advanced Git

Best practices

  1. Efficient large file management
  2. Improved collaboration on data-heavy projects
  3. Seamless integration with Git workflow

Tips:

  1. Track files selectively
  2. keep your team informed about LFS usage
  3. Regularly prune LFS cache
Advanced Git

Let's practice!

Advanced Git

Preparing Video For Download...