Selecting data in pandas

Python for R Users

Daniel Chen

Instructor

Manually create DataFrame

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]},
    index=['x', 'y', 'z'])

print(df)
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9

Subsetting columns

df = pd.DataFrame({
 'A': [1, 2, 3],
 'B': [4, 5, 6], 
 'C': [7, 8, 9]}, 
 index = ['x', 'y', 'z'])
df
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9
df['A']
x    1
y    2
z    3
Name: A, dtype: int64
df.A
x    1
y    2
z    3
Name: A, dtype: int64
df[['A', 'B']]
   A  B
x  1  4
y  2  5
z  3  6
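
The column selections above return two different kinds of object, which matters for whatever code runs next. A minimal sketch, assuming the same `df` as on the slide; the names `one_col_series` and `one_col_frame` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# A single label returns a one-dimensional Series.
one_col_series = df['A']

# A list of labels (even a list of one) returns a DataFrame.
one_col_frame = df[['A']]

print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(one_col_frame.shape)            # (3, 1)
```

Dot access such as `df.A` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so bracket access is the safer default.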

Subsetting rows

  • Row label (.loc) vs. row position (.iloc)
  • Python starts counting from 0, so the first row is at position 0
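
The two points above can be sketched side by side. This example assumes the same `df` used throughout the slides. It also shows slicing, which the slides do not cover: a position slice with `.iloc` excludes its end point, while a label slice with `.loc` includes it.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# Position 0 and label 'x' refer to the same (first) row.
first_by_position = df.iloc[0]
first_by_label = df.loc['x']
print(first_by_position.equals(first_by_label))  # True

# Position slices exclude the end: positions 0 and 1 only.
print(df.iloc[0:2].index.tolist())     # ['x', 'y']

# Label slices include the end: 'x' through 'y'.
print(df.loc['x':'y'].index.tolist())  # ['x', 'y']
```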

Subsetting rows .iloc

df
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9
df.iloc[0]
A    1
B    4
C    7
Name: x, dtype: int64
df.iloc[[0, 1]]
   A  B  C
x  1  4  7
y  2  5  8
df.iloc[0, :]
A    1
B    4
C    7
Name: x, dtype: int64
df.iloc[[0, 1], :]
   A  B  C
x  1  4  7
y  2  5  8
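
`.iloc` also accepts column positions and negative positions, which the examples above only hint at with `:`. A short sketch under the assumption that `df` is the same three-row frame; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# Negative positions count from the end, so -1 is the last row.
last_row = df.iloc[-1]
print(last_row.name)  # z

# Row position first, column position second.
cell = df.iloc[0, 0]
print(cell)  # 1

# Lists of positions on both axes: rows x, y and columns A, B.
block = df.iloc[[0, 1], [0, 1]]
print(block.shape)  # (2, 2)
```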

Subsetting rows .loc

df
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9
df.loc['x']
A    1
B    4
C    7
Name: x, dtype: int64
df.loc[['x', 'y']]
   A  B  C
x  1  4  7
y  2  5  8
df
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9
df.loc['x', 'A']
1
df.loc[['x', 'y'], ['A', 'B']]
   A  B
x  1  4
y  2  5
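
The row-and-column form of `.loc` shown above also works with `:` and with label slices. The following is a sketch of a few such combinations, assuming the same `df`; the variable names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# All rows, columns 'A' through 'B' (label slices include the end).
a_to_b = df.loc[:, 'A':'B']
print(a_to_b.columns.tolist())  # ['A', 'B']

# Rows from 'y' onward, two named columns.
tail_ac = df.loc['y':, ['A', 'C']]
print(tail_ac.index.tolist())    # ['y', 'z']
print(tail_ac.columns.tolist())  # ['A', 'C']
```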

Conditional subsetting

df[df.A == 3]
   A  B  C
z  3  6  9
df[(df.A == 3) | (df.B == 4)]
   A  B  C
x  1  4  7
z  3  6  9
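
The conditions above use `|` for "or". A sketch of the other common combinations follows, assuming the same `df`: `&` for "and", `~` for "not", and a boolean mask passed to `.loc` together with a column label. Each comparison needs its own parentheses because `&`, `|` and `~` bind more tightly than `==` or `<`.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# "and": both conditions must hold (only row y qualifies).
both = df[(df.A >= 2) & (df.C < 9)]
print(both.index.tolist())  # ['y']

# "not": invert a condition.
not_three = df[~(df.A == 3)]
print(not_three.index.tolist())  # ['x', 'y']

# Filter rows and pick a column in one step with .loc.
b_where_a_gt_1 = df.loc[df.A > 1, 'B']
print(b_where_a_gt_1.tolist())  # [5, 6]
```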

Attributes

df.shape
(3, 3)
df.shape()
--------------------------------------------------------------------
TypeError                          Traceback (most recent call last)
<ipython-input-17-0e566b70f572> in <module>()
----> 1 df.shape()

TypeError: 'tuple' object is not callable
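
The error above comes from calling an attribute as if it were a method. A small sketch of the difference, assuming the same `df`: attributes such as `shape` and `columns` are read without parentheses, while methods such as `head` are called with them.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# Attributes: stored values, no parentheses.
n_rows, n_cols = df.shape
print(n_rows, n_cols)        # 3 3
print(df.columns.tolist())   # ['A', 'B', 'C']

# Methods: functions attached to the object, called with parentheses.
first_two = df.head(2)
print(first_two.index.tolist())  # ['x', 'y']

# callable() tells the two apart.
print(callable(df.shape), callable(df.head))  # False True
```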

Let's practice!
