Data Processing in Shell
Susan Sun
Data Person
The Python standard library has a collection of built-in functions and modules.
Data science packages like scikit-learn and statsmodels are not part of it; they are installed with pip, the standard package manager for Python, via the command line.
pip -h
Usage:
pip <command> [options]
Commands:
install Install packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
Check which versions of pip and Python are in use:
pip --version
pip 19.1.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)
python --version
Python 3.5.2
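When several Python installations coexist, the pip found on the PATH may belong to a different interpreter than the one you run. A minimal sketch of one way to check, assuming the interpreter is available under the name python3 (an assumption; the slides use python):

```shell
# Run pip as a module of a specific interpreter, so the reported pip
# is guaranteed to belong to that interpreter.
# Assumption: the interpreter is called python3 on this machine.
pip_info=$(python3 -m pip --version)
echo "$pip_info"
```

The path and the Python version in parentheses at the end of the output show which installation this pip manages.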
If pip is giving an upgrade warning:
WARNING: You are using pip version 19.1.1, however version 19.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Upgrade pip using itself:
pip install --upgrade pip
Collecting pip
|################################| 1.4MB 10.7MB/s
Successfully installed pip-19.2.1
pip list displays the Python packages in your current Python environment:
pip list
Package     Version
----------- -------
agate       1.6.1
agate-dbf   0.2.1
agate-excel 0.2.3
agate-sql   0.5.4
Babel       2.7.0
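The pip list table can get long. Because the output is plain text, it can be filtered with ordinary shell tools. A small sketch that looks up one package by name (here pip itself, which normally appears in its own list); python3 -m pip is used as an assumed stand-in for the pip command:

```shell
# Keep only the row whose package name is exactly "pip" (case-insensitive).
# grep exits with a non-zero status if no line matches.
pip_row=$(python3 -m pip list 2>/dev/null | grep -i '^pip ')
echo "$pip_row"
```

Replacing the pattern with another package name, such as '^agate', would show only the matching rows.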
pip install installs the specified package and any dependencies it requires:
pip install scikit-learn
Collecting scikit-learn
Downloading https://files.pythonhosted.org/packages/1f/af/e3c3cd6f61093830059138624dbd26d938d6da1caeec5aeabe772b916069/scikit_learn-0.21.3-cp35-cp35m-manylinux1_x86_64.whl (6.6MB)
|################################| 6.6MB 32.5MB/s
Collecting scipy>=0.17.0 (from scikit-learn)
Downloading https://files.pythonhosted.org/packages/14/49/8f13fa215e10a7ab0731cc95b0e9bb66cf83c6a98260b154cfbd0b55fb19/scipy-1.3.0-cp35-cp35m-manylinux1_x86_64.whl (25.1MB)
|################################| 25.1MB 35.5MB/s
...
By default, pip install installs the latest version of the library (if the package is already installed, pip leaves it as is unless --upgrade is given):
pip install scikit-learn
Successfully built sklearn
Installing collected packages: joblib, scipy, scikit-learn, sklearn
Successfully installed joblib-0.13.2 scikit-learn-0.21.3 scipy-1.3.0 sklearn-0.0
To install a specific (or older) version of the library:
pip install scikit-learn==0.19.2
Collecting scikit-learn==0.19.2
Downloading https://files.pythonhosted.org/packages/b6/e2/a1e254a4a4598588d4fe88b45ab88a226c289ecfd0f6c90474eb6a9ea6b3/scikit_learn-0.19.2-cp35-cp35m-manylinux1_x86_64.whl (4.9MB)
|################################| 4.9MB 15.6MB/s
Installing collected packages: scikit-learn
Successfully installed scikit-learn-0.19.2
Upgrade the scikit-learn package using pip:
pip install --upgrade scikit-learn
Collecting scikit-learn
Downloading https://files.pythonhosted.org/packages/1f/af/e3c3cd6f61093830059138624dbd26d938d6da1caeec5aeabe772b916069/scikit_learn-0.21.3-cp35-cp35m-manylinux1_x86_64.whl (6.6MB)
|################################| 6.6MB 41.5MB/s
Requirement already satisfied, skipping upgrade: numpy>=1.11.0 in /usr/local/lib/python3.5/dist-packages (from scikit-learn) (1.16.4)
Collecting scipy>=0.17.0 (from scikit-learn)
Installing collected packages: scipy, joblib, scikit-learn
Successfully installed joblib-0.13.2 scikit-learn-0.21.3 scipy-1.3.0
To pip install multiple packages, separate the package names with spaces:
pip install scikit-learn statsmodels
Upgrade multiple packages:
pip install --upgrade scikit-learn statsmodels
A requirements.txt file contains a list of packages to be installed:
cat requirements.txt
scikit-learn
statsmodels
Many Python developers include a requirements.txt file in their GitHub repos.
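The == syntax used earlier to install a specific version also works inside a requirements file, one requirement per line. A minimal sketch that writes such a file and counts its entries; the pinned version is the one from the earlier install output, and the comment line is only there to illustrate the file format:

```shell
# Write a requirements file, one requirement per line.
# Note: this overwrites any requirements.txt in the current directory.
cat > requirements.txt <<'EOF'
# Lines starting with # are comments.
scikit-learn==0.21.3
statsmodels
EOF

# Count the lines that are neither comments nor blank,
# i.e. the requirements pip would be asked to install.
package_count=$(grep -cv -e '^#' -e '^$' requirements.txt)
echo "$package_count"
```

Pinning versions this way lets everyone who installs from the file get the same release of a package.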
-r allows pip install to install packages from a pre-written file:
-r, --requirement <file>
Install from the given requirements file. This option can be used multiple times.
In our example:
pip install -r requirements.txt
is the same as
pip install scikit-learn statsmodels
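The freeze command from the pip -h listing outputs installed packages in requirements format, so it can generate such a file from the current environment instead of writing it by hand. A sketch, with the file name frozen-requirements.txt chosen here purely for illustration and python3 -m pip assumed as a stand-in for pip:

```shell
# Save the installed packages, pinned to their current versions,
# in requirements format. The file name is a hypothetical example.
python3 -m pip freeze > frozen-requirements.txt

# The saved file can be passed to -r later, e.g. on another machine:
#   pip install -r frozen-requirements.txt

# Show how many requirement lines were written.
wc -l < frozen-requirements.txt
```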