Filtering pandas DataFrames

Intermediate Python

Hugo Bowne-Anderson

Data Scientist at DataCamp

brics

import pandas as pd
brics = pd.read_csv("path/to/brics.csv", index_col = 0)
brics
         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
Intermediate Python

Goal

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
  • Select countries with area over 8 million km2
  • 3 steps
    • Select the area column
    • Do comparison on area column
    • Use result to select countries
Intermediate Python

Step 1: Get column

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
brics["area"]
BR     8.516
RU    17.100
IN     3.286
CH     9.597
SA     1.221
Name: area, dtype: float64    # - Need Pandas Series
  • Alternatives:
brics.loc[:,"area"]
brics.iloc[:,2]
Intermediate Python

Step 2: Compare

brics["area"]
BR     8.516
RU    17.100
IN     3.286
CH     9.597
SA     1.221
Name: area, dtype: float64
brics["area"] > 8
BR     True
RU     True
IN    False
CH     True
SA    False
Name: area, dtype: bool
is_huge = brics["area"] > 8
Intermediate Python

Step 3: Subset DF

is_huge
BR     True
RU     True
IN    False
CH     True
SA    False
Name: area, dtype: bool
brics[is_huge]
   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0
Intermediate Python

Summary

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.988
is_huge = brics["area"] > 8
brics[is_huge]
   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0
brics[brics["area"] > 8]
   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0
Intermediate Python

Boolean operators

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
import numpy as np
np.logical_and(brics["area"] > 8, brics["area"] < 10)
BR     True
RU    False
IN    False
CH     True
SA    False
Name: area, dtype: bool
brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
   country   capital   area  population
BR  Brazil  Brasilia  8.516       200.4
CH   China   Beijing  9.597      1357.0
Intermediate Python

Let's practice!

Intermediate Python

Preparing Video For Download...