Exploratory Data Analysis in Python
George Boorman
Curriculum Manager, DataCamp
print(salaries.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 594 entries, 0 to 593
Data columns (total 9 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Working_Year 594 non-null int64
1 Designation 567 non-null object
2 Experience 561 non-null object
3 Employment_Status 563 non-null object
4 Salary_In_Rupees 566 non-null object
5 Employee_Location 554 non-null object
6 Company_Location 570 non-null object
7 Company_Size 535 non-null object
8 Remote_Working_Ratio 571 non-null float64
dtypes: float64(1), int64(1), object(7)
memory usage: 41.9+ KB
None
print(salaries["Salary_In_Rupees"].head())
0 20,688,070.00
1 8,674,985.00
2 1,591,390.00
3 11,935,425.00
4 5,729,004.00
Name: Salary_In_Rupees, dtype: object
Salary_In_Rupees is stored as object (strings containing comma separators) and needs to be converted to the float data type.
pd.Series.str.replace("characters to remove", "characters to replace them with")
salaries["Salary_In_Rupees"] = salaries["Salary_In_Rupees"].str.replace(",", "")
print(salaries["Salary_In_Rupees"].head())
0 20688070.00
1 8674985.00
2 1591390.00
3 11935425.00
4 5729004.00
Name: Salary_In_Rupees, dtype: object
salaries["Salary_In_Rupees"] = salaries["Salary_In_Rupees"].astype(float)
salaries["Salary_USD"] = salaries["Salary_In_Rupees"] * 0.012
print(salaries[["Salary_In_Rupees", "Salary_USD"]].head())
Salary_In_Rupees Salary_USD
0 20688070.0 248256.840
1 8674985.0 104099.820
2 1591390.0 19096.680
3 11935425.0 143225.100
4 5729004.0 68748.048
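The cleaning and conversion steps above can be sketched end to end on a small stand-in frame. This is a minimal sketch, not the course's dataset: `toy` and `RUPEES_TO_USD` are hypothetical names introduced here, the three strings are copied from the `head()` output shown earlier, and 0.012 is the rate used in the code above.

```python
import pandas as pd

# Toy stand-in for the salaries DataFrame (values copied from the head() output above)
toy = pd.DataFrame(
    {"Salary_In_Rupees": ["20,688,070.00", "8,674,985.00", "1,591,390.00"]}
)

# Remove the comma separators as literal characters, then cast the strings to float
toy["Salary_In_Rupees"] = (
    toy["Salary_In_Rupees"].str.replace(",", "", regex=False).astype(float)
)

# Conversion rate taken from the code above; the constant name is an assumption
RUPEES_TO_USD = 0.012
toy["Salary_USD"] = toy["Salary_In_Rupees"] * RUPEES_TO_USD

print(toy.dtypes)
print(toy)
```

Passing `regex=False` makes it explicit that the comma is matched literally rather than as a regular expression pattern.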
salaries.groupby("Company_Size")["Salary_USD"].mean()
Company_Size
L 111934.432174
M 110706.628527
S 69880.980179
Name: Salary_USD, dtype: float64
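As a rough illustration of what the grouped mean does, the sketch below uses a made-up six-row frame (`toy` and its numbers are hypothetical, not course data). One row has a missing Company_Size, mirroring the missing values reported by `info()`; by default `groupby` leaves such rows out of every group.

```python
import pandas as pd

# Made-up data; the last row has no company size
toy = pd.DataFrame(
    {
        "Company_Size": ["S", "M", "S", "L", "L", None],
        "Salary_USD": [100.0, 200.0, 300.0, 400.0, 600.0, 999.0],
    }
)

# One mean per company size; the row with a missing key is not assigned to any group
mean_by_size = toy.groupby("Company_Size")["Salary_USD"].mean()
print(mean_by_size)
```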
salaries["std_dev"] = salaries.groupby("Experience")["Salary_USD"].transform(lambda x: x.std())
print(salaries[["Experience", "std_dev"]].value_counts())
Experience std_dev
SE 52995.385395 257
MI 63217.397343 197
EN 43367.256303 83
EX 86426.611619 24
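The reason `.transform()` is used here rather than a plain aggregation is that it returns one value per row, aligned with the original index, so the result can be assigned straight back as a column. The sketch below contrasts the two on made-up data; `toy`, `per_group`, and `per_row` are hypothetical names.

```python
import pandas as pd

# Made-up data: two experience levels
toy = pd.DataFrame(
    {
        "Experience": ["EN", "EN", "SE", "SE", "SE"],
        "Salary_USD": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)
grouped = toy.groupby("Experience")["Salary_USD"]

# Aggregation: one standard deviation per group, indexed by Experience
per_group = grouped.std()

# Transform: the group's standard deviation repeated on every row of that group
per_row = grouped.transform(lambda x: x.std())
toy["std_dev"] = per_row

print(per_group)
print(toy)
```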
salaries["median_by_comp_size"] = (
    salaries.groupby("Company_Size")["Salary_USD"].transform(lambda x: x.median())
)
print(salaries[["Company_Size", "median_by_comp_size"]].head())
Company_Size median_by_comp_size
0 S 60833.424
1 M 105914.964
2 S 60833.424
3 L 95483.400
4 L 95483.400
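For built-in statistics such as the median, `.transform()` also accepts the function name as a string, which avoids the lambda. The sketch below, on made-up data with hypothetical names (`toy`, `with_lambda`, `with_name`), suggests the two spellings give the same per-row result.

```python
import pandas as pd

# Made-up data: three company sizes
toy = pd.DataFrame(
    {
        "Company_Size": ["S", "M", "S", "L", "L"],
        "Salary_USD": [10.0, 20.0, 30.0, 40.0, 60.0],
    }
)
grouped = toy.groupby("Company_Size")["Salary_USD"]

# Lambda form, as in the code above
with_lambda = grouped.transform(lambda x: x.median())

# String alias for the same built-in aggregation
with_name = grouped.transform("median")

toy["median_by_comp_size"] = with_name
print(toy)
```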