Python for R Users
Daniel Chen
Instructor
np.NaN, np.NAN, and np.nan are all the same as the NA value in R
Check for missing values with pd.isnull and pd.notnull
pd.isnull is an alias for pd.isna
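A minimal sketch of these checks. The Series `s` is made up for illustration and does not come from the slides. A dedicated check is needed because NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value
s = pd.Series([1.0, np.nan, 3.0])

missing = pd.isnull(s)    # True where the value is missing
present = pd.notnull(s)   # element-wise opposite of pd.isnull

print(missing.tolist())   # [False, True, False]
print(present.tolist())   # [True, False, True]

# Comparing with == does not find missing values: NaN != NaN
print(np.nan == np.nan)   # False

# pd.isna gives the same answer as pd.isnull
print(pd.isna(s).equals(missing))  # True
```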
df
           name  treatment_a  treatment_b
0    John Smith          NaN            2
1      Jane Doe         16.0           11
2  Mary Johnson          3.0            1
a_mean = df['treatment_a'].mean()
a_mean
9.5
df['a_fill'] = df['treatment_a'].fillna(a_mean)
df
           name  treatment_a  treatment_b  a_fill
0    John Smith          NaN            2     9.5
1      Jane Doe         16.0           11    16.0
2  Mary Johnson          3.0            1     3.0
apply method
df = data.frame('a' = c(1, 2, 3),
                'b' = c(4, 5, 6))
apply(df, 2, mean)
a b
2 5
apply(df, 1, mean)
2.5 3.5 4.5
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})
df.apply(np.mean, axis=0)
A 2.0
B 5.0
dtype: float64
df.apply(np.mean, axis=1)
0 2.5
1 3.5
2 4.5
dtype: float64
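For a simple reduction such as the mean, the same numbers can be obtained without apply, because DataFrame.mean takes an axis argument. The sketch below reuses the df from the slide. The range function at the end is an illustrative addition, not part of the original material.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# axis=0 works down the rows: one result per column, like apply(df, 2, mean) in R
col_means = df.mean(axis=0)

# axis=1 works across the columns: one result per row, like apply(df, 1, mean) in R
row_means = df.mean(axis=1)

# apply remains useful for custom functions (this lambda is illustrative)
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)

print(col_means.tolist())   # [2.0, 5.0]
print(row_means.tolist())   # [2.5, 3.5, 4.5]
print(col_ranges.tolist())  # [2, 2]
```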
Tidy Data Paper: http://vita.had.co.nz/papers/tidy-data.pdf
df
           name  treatment_a  treatment_b
0    John Smith          NaN            2
1      Jane Doe         16.0           11
2  Mary Johnson          3.0            1
df_melt = pd.melt(df, id_vars='name')
df_melt
           name     variable  value
0    John Smith  treatment_a    NaN
1      Jane Doe  treatment_a   16.0
2  Mary Johnson  treatment_a    3.0
3    John Smith  treatment_b    2.0
...
df_melt_pivot = pd.pivot_table(df_melt,
                               index='name',
                               columns='variable',
                               values='value')
df_melt_pivot
variable      treatment_a  treatment_b
name
Jane Doe             16.0         11.0
John Smith            NaN          2.0
Mary Johnson          3.0          1.0
df_melt_pivot.reset_index()
variable          name  treatment_a  treatment_b
0             Jane Doe         16.0         11.0
1           John Smith          NaN          2.0
2         Mary Johnson          3.0          1.0
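The melt and pivot steps can be run end to end as a round trip. This is a sketch that rebuilds the slide's df by hand. The column names 'treatment' and 'result' passed to var_name and value_name are illustrative choices, not names used in the slides. Note that pivot_table aggregates duplicate index/column pairs, using the mean by default.

```python
import numpy as np
import pandas as pd

# Rebuild the example data shown on the slide
df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Mary Johnson'],
    'treatment_a': [np.nan, 16, 3],
    'treatment_b': [2, 11, 1],
})

# Wide to long: var_name and value_name are optional (names here are illustrative)
long_df = pd.melt(df, id_vars='name', var_name='treatment', value_name='result')

# Long back to wide, then turn the 'name' index back into a regular column
wide_df = (
    pd.pivot_table(long_df, index='name', columns='treatment', values='result')
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover 'treatment' label on the columns

print(long_df.shape)         # (6, 3)
print(list(wide_df.columns)) # ['name', 'treatment_a', 'treatment_b']
```

The rows come back sorted by name rather than in their original order, which matches the pivoted output shown on the slide.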
groupby: split-apply-combine
df_melt
           name     variable  value
0    John Smith  treatment_a    NaN
1      Jane Doe  treatment_a   16.0
2  Mary Johnson  treatment_a    3.0
3    John Smith  treatment_b    2.0
4      Jane Doe  treatment_b   11.0
5  Mary Johnson  treatment_b    1.0
df_melt.groupby('name')['value'].mean()
name
Jane Doe        13.5
John Smith       2.0
Mary Johnson     2.0
Name: value, dtype: float64
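The same split-apply-combine pattern works with other grouping columns and with several summaries at once. A sketch, assuming the melted data shown above (rebuilt by hand here). Grouping by 'variable' and using agg with 'mean' and 'count' are illustrative choices, not taken from the slides.

```python
import numpy as np
import pandas as pd

# Rebuild the melted data shown on the slide
df_melt = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Mary Johnson'] * 2,
    'variable': ['treatment_a'] * 3 + ['treatment_b'] * 3,
    'value': [np.nan, 16.0, 3.0, 2.0, 11.0, 1.0],
})

# Split by treatment, apply two summaries, combine into one table
summary = df_melt.groupby('variable')['value'].agg(['mean', 'count'])
print(summary)

# Missing values are skipped: treatment_a averages only 16.0 and 3.0
print(summary.loc['treatment_a', 'mean'])   # 9.5
print(summary.loc['treatment_a', 'count'])  # 2
```

Skipping NaN is also why John Smith's mean in the per-name result is 2.0 rather than NaN.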