Filtering pandas DataFrames

Intermediate Python

Hugo Bowne-Anderson

Data Scientist at DataCamp

brics

import pandas as pd
brics = pd.read_csv("path/to/brics.csv", index_col = 0)
brics
         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
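
The CSV path above is a placeholder, so the frame cannot be loaded as written. As a minimal sketch, assuming the file holds exactly the rows printed on this slide, the same DataFrame can be built by hand (the `data` dictionary name is only illustrative):

```python
import pandas as pd

# Values copied from the table printed on the slide.
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.100, 3.286, 9.597, 1.221],
    "population": [200.40, 143.50, 1252.00, 1357.00, 52.98],
}

# The two-letter codes become the row labels, as index_col=0 does for the CSV.
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])
print(brics)
```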

Goal

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
  • Select countries with an area greater than 8 million km²
  • 3 steps
    • Select the area column
    • Perform a comparison on the area column
    • Use the result to select countries

Step 1: Get the column

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
brics["area"]
BR     8.516
RU    17.100
IN     3.286
CH     9.597
SA     1.221
Name: area, dtype: float64    # Need a pandas Series
  • Alternatives:
brics.loc[:,"area"]
brics.iloc[:,2]
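
To check that the three spellings really return the same thing, here is a small sketch. It rebuilds `brics` by hand from the values on the slide (an assumption, since the CSV is not available) and also shows why single brackets matter: double brackets return a DataFrame rather than a Series.

```python
import pandas as pd

# Hand-built copy of the slide's table (assumed to match the CSV).
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.100, 3.286, 9.597, 1.221],
        "population": [200.40, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

by_bracket = brics["area"]       # column label in single brackets
by_loc = brics.loc[:, "area"]    # all rows, column selected by label
by_iloc = brics.iloc[:, 2]       # all rows, column selected by position

# All three are the same pandas Series.
same = by_bracket.equals(by_loc) and by_bracket.equals(by_iloc)
print(same)

# Double brackets give a one-column DataFrame instead of a Series.
as_frame = brics[["area"]]
print(type(by_bracket).__name__, type(as_frame).__name__)
```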

Step 2: Compare

brics["area"]
BR     8.516
RU    17.100
IN     3.286
CH     9.597
SA     1.221
Name: area, dtype: float64
brics["area"] > 8
BR     True
RU     True
IN    False
CH     True
SA    False
Name: area, dtype: bool
is_huge = brics["area"] > 8
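
The comparison produces a boolean Series with the same row labels as `brics`. The sketch below inspects that mask; it uses a reduced, hand-built frame holding only the `area` column, which is an assumption made to keep the example short.

```python
import pandas as pd

# Reduced copy of brics: only the area column, values taken from the slide.
brics = pd.DataFrame(
    {"area": [8.516, 17.100, 3.286, 9.597, 1.221]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

is_huge = brics["area"] > 8

# The mask keeps the row labels and holds one boolean per row.
print(is_huge.dtype)

# True counts as 1, so sum() gives the number of matching rows.
n_huge = is_huge.sum()

# Labels of the rows where the mask is True.
huge_labels = list(is_huge[is_huge].index)
print(n_huge, huge_labels)
```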

Step 3: Subset the DataFrame

is_huge
BR     True
RU     True
IN    False
CH     True
SA    False
Name: area, dtype: bool
brics[is_huge]
   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0

Summary

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
is_huge = brics["area"] > 8
brics[is_huge]
   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0
brics[brics["area"] > 8]
   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0
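
As a quick check that the two-step and one-line forms are interchangeable, the sketch below compares them on a hand-built copy of `brics` (assumed to match the CSV). It also shows one possible variation that the slides do not cover: passing the mask to `.loc` together with a list of columns.

```python
import pandas as pd

# Hand-built copy of the slide's table (assumed to match the CSV).
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.100, 3.286, 9.597, 1.221],
        "population": [200.40, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Two steps: store the mask, then use it.
is_huge = brics["area"] > 8
two_step = brics[is_huge]

# One line: build the mask inside the brackets.
one_line = brics[brics["area"] > 8]

identical = two_step.equals(one_line)
print(identical)

# Variation: filter rows with the mask and keep only some columns.
names_only = brics.loc[is_huge, ["country", "capital"]]
print(names_only)
```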

Boolean operators

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
import numpy as np
np.logical_and(brics["area"] > 8, brics["area"] < 10)
BR     True
RU    False
IN    False
CH     True
SA    False
Name: area, dtype: bool
brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
   country   capital   area  population
BR  Brazil  Brasilia  8.516       200.4
CH   China   Beijing  9.597      1357.0
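
The slide uses `np.logical_and()`. pandas Series also support the element-wise `&` operator, which should give the same mask; the sketch below compares the two on a reduced, hand-built frame (an assumption, with only the columns needed here). Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`.

```python
import numpy as np
import pandas as pd

# Reduced copy of brics with values taken from the slide.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "area": [8.516, 17.100, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# NumPy function, as on the slide.
with_numpy = np.logical_and(brics["area"] > 8, brics["area"] < 10)

# Element-wise operator; note the parentheses around each comparison.
with_operator = (brics["area"] > 8) & (brics["area"] < 10)

masks_match = with_numpy.equals(with_operator)
print(masks_match)

between_8_and_10 = brics[with_operator]
print(between_8_and_10)
```

Python's plain `and` keyword does not work here: it asks for a single truth value of the whole Series, which pandas refuses with a `ValueError`.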

Let's practice!
