Python for R Users
Daniel Chen
Instructor
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Mary Johnson'],
    'treatment_a': [np.nan, 16, 3],
    'treatment_b': [2, 11, 1]
})
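# Reshape wide to long: one row per name/treatment pair
df_melt = pd.melt(df, id_vars='name')
# Mean of the melted values per person (NaN is skipped)
df_melt.groupby('name')['value'].mean()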
name
Jane Doe        13.5
John Smith       2.0
Mary Johnson     2.0
Name: value, dtype: float64
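By default `pd.melt` names the new columns `variable` and `value`. The sketch below assumes you would rather name them yourself, using the `var_name` and `value_name` parameters of `pd.melt`; the names `treatment` and `result` are illustrative choices, not part of the course. It also counts the non-missing values per person, which shows why John Smith's mean is 2.0: his NaN is simply left out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Mary Johnson'],
    'treatment_a': [np.nan, 16, 3],
    'treatment_b': [2, 11, 1],
})

# Name the new columns instead of using the defaults 'variable' and 'value'
long_df = pd.melt(df, id_vars='name', var_name='treatment', value_name='result')
print(long_df)

# Same per-person mean as before, using the custom column name
means = long_df.groupby('name')['result'].mean()
print(means)

# count() ignores NaN, so John Smith has only one observed value
counts = long_df.groupby('name')['result'].count()
print(counts)
```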
df_melt.groupby('name')['value'].agg(['mean', 'max'])
              mean   max
name
Jane Doe      13.5  16.0
John Smith     2.0   2.0
Mary Johnson   2.0   3.0
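Passing a list to `agg` names the output columns after the functions. pandas (0.25 and later) also supports "named aggregation", where each keyword argument has the form `new_column=(input_column, function)`, which reads roughly like dplyr's `summarise()`. A minimal sketch; the output names `mean_value`, `max_value` and `n_missing` are illustrative, and the lambda shows that a custom function can sit next to the built-in ones.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Mary Johnson'],
    'treatment_a': [np.nan, 16, 3],
    'treatment_b': [2, 11, 1],
})
df_melt = pd.melt(df, id_vars='name')

# Named aggregation: output_column=(input_column, function)
summary = df_melt.groupby('name').agg(
    mean_value=('value', 'mean'),
    max_value=('value', 'max'),
    # Custom function: number of missing values in each group
    n_missing=('value', lambda s: s.isna().sum()),
)
print(summary)
```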
df = pd.DataFrame({
    'status': ['sick', 'healthy', 'sick'],
    'treatment_a': [np.nan, 16, 3],
    'treatment_b': [2, 11, 1]
})
df
    status  treatment_a  treatment_b
0     sick          NaN            2
1  healthy         16.0           11
2     sick          3.0            1
# dtype=int gives the 0/1 output below; pandas 2.0+ returns True/False by default
pd.get_dummies(df, dtype=int)
   treatment_a  treatment_b  status_healthy  status_sick
0          NaN            2               0            1
1         16.0           11               1            0
2          3.0            1               0            1
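By default `pd.get_dummies` encodes only the string (object) and categorical columns, which is why the numeric treatment columns pass through unchanged. The sketch below uses three real `get_dummies` parameters: `columns` to pick exactly which columns to encode, `drop_first=True` to drop the first level (`healthy`) so the indicators are not redundant, roughly like the reference level R's `model.matrix()` drops under its default contrasts, and `dtype=int` for 0/1 values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'status': ['sick', 'healthy', 'sick'],
    'treatment_a': [np.nan, 16, 3],
    'treatment_b': [2, 11, 1],
})

# Encode only 'status', drop its first level ('healthy'),
# and store the indicator as 0/1 integers
dummies = pd.get_dummies(df, columns=['status'], drop_first=True, dtype=int)
print(dummies)
```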