Exploratory Data Analysis in Python
George Boorman
Curriculum Manager, DataCamp
For example:
print(planes["Destination"].value_counts())
Cochin 4391
Banglore 2773
Delhi 1219
New Delhi 888
Hyderabad 673
Kolkata 369
Name: Destination, dtype: int64
planes["Destination"].value_counts(normalize=True)
Cochin 0.425773
Banglore 0.268884
Delhi 0.118200
New Delhi 0.086105
Hyderabad 0.065257
Kolkata 0.035780
Name: Destination, dtype: float64
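The two calls above can be tried on a small stand-in frame. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not the `planes` data.

```python
import pandas as pd

# Hypothetical stand-in for planes["Destination"]; values are illustrative only
toy = pd.DataFrame(
    {"Destination": ["Cochin", "Cochin", "Delhi", "Cochin",
                     "Kolkata", "Delhi", None, "Cochin"]}
)

# Absolute frequency of each category (missing values are excluded by default)
counts = toy["Destination"].value_counts()

# Relative frequency: each count divided by the number of non-missing rows
shares = toy["Destination"].value_counts(normalize=True)

print(counts)
print(shares)
```

Because missing values are dropped before counting, the relative frequencies sum to 1 over the non-missing rows only.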
pd.crosstab(planes["Source"], planes["Destination"])
Destination  Banglore  Cochin  Delhi  Hyderabad  Kolkata  New Delhi
Source
Banglore            0       0   1199          0        0        868
Chennai             0       0      0          0      364          0
Delhi               0    4318      0          0        0          0
Kolkata          2720       0      0          0        0          0
Mumbai              0       0      0        662        0          0
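A cross-tabulation of two columns counts how often each pair of values occurs. The sketch below shows this on a made-up frame (`toy` and its rows are assumptions, not the `planes` data) and compares it with a `groupby` count, which lists only the pairs that actually appear.

```python
import pandas as pd

# Hypothetical stand-in for the planes DataFrame; rows are illustrative only
toy = pd.DataFrame({
    "Source": ["Delhi", "Delhi", "Kolkata", "Mumbai", "Delhi"],
    "Destination": ["Cochin", "Cochin", "Banglore", "Hyderabad", "Cochin"],
})

# Wide table: one row per Source, one column per Destination, cells are counts
route_counts = pd.crosstab(toy["Source"], toy["Destination"])

# Long-format equivalent: only the routes that occur in the data
long_counts = toy.groupby(["Source", "Destination"]).size()

print(route_counts)
print(long_counts)
```

Routes that never occur show up as 0 in the wide table and are simply absent from the long one.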
Source | Destination | Median Price (IDR) |
---|---|---|
Banglore | Delhi | 4232.21 |
Banglore | New Delhi | 12114.56 |
Chennai | Kolkata | 3859.76 |
Delhi | Cochin | 9987.63 |
Kolkata | Banglore | 9654.21 |
Mumbai | Hyderabad | 3431.97 |
pd.crosstab(planes["Source"], planes["Destination"],
            values=planes["Price"], aggfunc="median")
Destination  Banglore   Cochin   Delhi  Hyderabad  Kolkata  New Delhi
Source
Banglore          NaN      NaN  4823.0        NaN      NaN    10976.5
Chennai           NaN      NaN     NaN        NaN   3850.0        NaN
Delhi             NaN  10262.0     NaN        NaN      NaN        NaN
Kolkata        9345.0      NaN     NaN        NaN      NaN        NaN
Mumbai            NaN      NaN     NaN     3342.0      NaN        NaN
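Passing `values` and `aggfunc` makes each cell an aggregate of a third column instead of a count. The sketch below uses a made-up frame (`toy` and its prices are assumptions, not the `planes` data) and shows that routes with no rows come back as `NaN`, as in the output above.

```python
import pandas as pd

# Hypothetical stand-in for the planes DataFrame; prices are illustrative only
toy = pd.DataFrame({
    "Source": ["Delhi", "Delhi", "Delhi", "Kolkata", "Kolkata"],
    "Destination": ["Cochin", "Cochin", "Cochin", "Banglore", "Banglore"],
    "Price": [9000.0, 10000.0, 12000.0, 8000.0, 9000.0],
})

# Median Price for every Source/Destination pair; missing pairs become NaN
median_prices = pd.crosstab(
    toy["Source"], toy["Destination"],
    values=toy["Price"], aggfunc="median",
)

# Long-format equivalent that lists only the routes present in the data
median_by_route = toy.groupby(["Source", "Destination"])["Price"].median()

print(median_prices)
print(median_by_route)
```

The `groupby` form can be easier to read when most cells of the wide table would be `NaN`.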
Source | Destination | Median Price (IDR) | Median Price (dataset) |
---|---|---|---|
Banglore | Delhi | 4232.21 | 4823.0 |
Banglore | New Delhi | 12114.56 | 10976.50 |
Chennai | Kolkata | 3859.76 | 3850.0 |
Delhi | Cochin | 9987.63 | 10262.0 |
Kolkata | Banglore | 9654.21 | 9345.0 |
Mumbai | Hyderabad | 3431.97 | 3342.0 |
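One way to read the comparison table is to compute the gap between the two median columns for each route. The sketch below is illustrative only: the column names `expected` and `dataset` are hypothetical, the expected figures are copied from the table, and the dataset figures are copied from the `pd.crosstab()` output shown earlier.

```python
import pandas as pd

# Figures copied from the slides; column names are hypothetical
comparison = pd.DataFrame({
    "Source": ["Banglore", "Banglore", "Chennai", "Delhi", "Kolkata", "Mumbai"],
    "Destination": ["Delhi", "New Delhi", "Kolkata", "Cochin", "Banglore", "Hyderabad"],
    "expected": [4232.21, 12114.56, 3859.76, 9987.63, 9654.21, 3431.97],
    "dataset": [4823.0, 10976.5, 3850.0, 10262.0, 9345.0, 3342.0],
})

# Absolute and relative gap between the dataset median and the expected median
comparison["difference"] = comparison["dataset"] - comparison["expected"]
comparison["pct_difference"] = 100 * comparison["difference"] / comparison["expected"]

# Sort so the routes that deviate most from expectations come first
ranked = comparison.sort_values("pct_difference", key=abs, ascending=False)
print(ranked)
```

Sorting by the absolute percentage gap puts the routes whose prices differ most from expectations at the top.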