Configuring DVC Remotes

Introduction to Data Versioning with DVC

Ravi Bhadauria

Machine Learning Engineer

Recap

  • Initializing DVC repository
    • Run dvc init
    • Repo inside workspace (/path/to/my-project)
  • Setting up DVC cache
    • Temporary staging area within .dvc directory
      • /path/to/my-project/.dvc/cache
    • Track files and stage them in the cache using dvc add
  • Now: DVC Remotes
    • External storage
    • Track and share assets

The Need for DVC Remotes

  • DVC Remotes: Location for Data Storage
  • Similar to Git remotes, but for cached data
  • Benefits of using remotes
    • Synchronize large files and directories
    • Centralize or distribute data storage
    • Save local space

Schematic of data flow from DVC workspace to cache and remote

Supported Storage Types

Graphic of supported storage types in DVC

Setting up Remotes

  • Setting remotes
    • dvc remote add <name> <location>
  • S3 bucket
$ dvc remote add s3_remote \
   s3://mys3bucket
  • DVC config changes
 ['remote "s3_remote"']
     url = s3://mys3bucket
  • GCP bucket
$ dvc remote add gcp_remote \
   gs://myGCPbucket
  • Azure
$ dvc remote add azure_remote \
   azure://mycontainer/path
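The location passed to dvc remote add determines the storage type through its URL scheme. As a rough sketch of that mapping, the hypothetical helper remote_type below (not part of DVC) classifies a location string. It only covers the schemes shown on this slide and treats everything else as a local path, which is a simplifying assumption.

```shell
# Hypothetical helper: guess the storage type from a remote location string.
# Only the schemes shown on the slide are covered; anything else is treated
# as a local path, which is a simplification.
remote_type() {
    case "$1" in
        s3://*)    echo "Amazon S3" ;;
        gs://*)    echo "Google Cloud Storage" ;;
        azure://*) echo "Azure Blob Storage" ;;
        *)         echo "local path" ;;
    esac
}

# Classify the example locations used on the slides.
for location in s3://mys3bucket gs://myGCPbucket azure://mycontainer/path /tmp/dvc; do
    printf '%s -> %s\n' "$location" "$(remote_type "$location")"
done
```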

Local Remotes

  • Local remotes are used for rapid prototyping
  • Use system directories or Network Attached Storage
$ dvc remote add mylocalremote /tmp/dvc
  • Set the default remote with the -d flag
$ dvc remote add -d mylocalremote /tmp/dvc
  • Default remote assigned in the core section of .dvc/config
[core]
remote = mylocalremote
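To make the role of the core section concrete, here is a minimal sketch that reads the default remote name back out of a config file. It works on a throwaway sample file rather than a real repository, and default_remote is a hypothetical helper written for illustration, not a DVC command. The url entry in the sample is an assumption about what the file would contain after the dvc remote add -d command above.

```shell
# Throwaway file standing in for .dvc/config (not a real repository).
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
[core]
    remote = mylocalremote
['remote "mylocalremote"']
    url = /tmp/dvc
EOF

# Hypothetical helper: print the value of "remote" inside the [core] section.
default_remote() {
    awk '
        /^[ \t]*\[/ { in_core = ($0 ~ /^[ \t]*\[core\]/); next }  # track the current section
        in_core && /^[ \t]*remote[ \t]*=/ {
            sub(/^[ \t]*remote[ \t]*=[ \t]*/, "")                 # drop the key and "="
            print
            exit
        }
    ' "$1"
}

default="$(default_remote "$config_file")"
echo "default remote: $default"
```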

Listing Remotes

  • Listing remotes
$ dvc remote list
s3_remote    s3://mys3bucket
local_remote /tmp/dvcremote
  • Reads from .dvc/config
 ['remote "s3_remote"']
     url = s3://mys3bucket
 ['remote "local_remote"']
     url = /tmp/dvcremote
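Since the listing is derived from .dvc/config, the relationship can be sketched by extracting name and URL pairs from a file with the same layout. This is an illustration only: it parses a throwaway sample file with awk, list_remotes is a hypothetical helper, and none of it reflects how DVC itself reads its configuration.

```shell
# Throwaway file mirroring the config shown on the slide.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
[core]
    remote = local_remote
['remote "s3_remote"']
    url = s3://mys3bucket
['remote "local_remote"']
    url = /tmp/dvcremote
EOF

# Hypothetical helper: print "<name> <url>" for every remote section,
# roughly what dvc remote list shows.
list_remotes() {
    awk -F'"' '
        /^[ \t]*\[.remote "/ { name = $2; next }   # remote section: remember its name
        /^[ \t]*\[/          { name = ""; next }   # any other section: not a remote
        name != "" && /^[ \t]*url[ \t]*=/ {
            sub(/^[ \t]*url[ \t]*=[ \t]*/, "")     # keep only the URL value
            print name, $0
        }
    ' "$1"
}

remotes="$(list_remotes "$config_file")"
echo "$remotes"
```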

Modifying Remote Configuration

  • Customize a remote's settings with dvc remote modify
$ dvc remote modify s3_remote connect_timeout 300
  • DVC config file change
 ['remote "s3_remote"']
     url = s3://mys3bucket
     connect_timeout = 300

Summary

  • DVC remotes are used to share data and ML models
  • A variety of local and cloud-based storage locations are supported
  • Add remotes: dvc remote add
    • Use -d flag to specify default
  • List remotes: dvc remote list
  • Modify remotes: dvc remote modify

Let's practice!
