Working with numeric data

Exploratory Data Analysis in Python

George Boorman

Curriculum Manager, DataCamp

The original salaries dataset

print(salaries.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 594 entries, 0 to 593
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype  
--   ------                --------------  -----  
 0   Working_Year          594 non-null    int64  
 1   Designation           567 non-null    object 
 2   Experience            561 non-null    object 
 3   Employment_Status     563 non-null    object 
 4   Salary_In_Rupees      566 non-null    object 
 5   Employee_Location     554 non-null    object 
 6   Company_Location      570 non-null    object 
 7   Company_Size          535 non-null    object 
 8   Remote_Working_Ratio  571 non-null    float64
dtypes: float64(1), int64(1), object(7)
memory usage: 41.9+ KB
None

Salary in rupees

print(salaries["Salary_In_Rupees"].head())
0    20,688,070.00
1     8,674,985.00
2     1,591,390.00
3    11,935,425.00
4     5,729,004.00
Name: Salary_In_Rupees, dtype: object

Converting strings to numbers

  • Remove the commas from the values in Salary_In_Rupees

  • Convert the column to the float data type

  • Create a new column by converting the currency


Converting strings to numbers

pd.Series.str.replace("characters to remove", "characters to replace them with")
salaries["Salary_In_Rupees"] = salaries["Salary_In_Rupees"].str.replace(",", "")

print(salaries["Salary_In_Rupees"].head())
0    20688070.00
1     8674985.00
2     1591390.00
3    11935425.00
4     5729004.00
Name: Salary_In_Rupees, dtype: object

Converting strings to numbers

salaries["Salary_In_Rupees"] = salaries["Salary_In_Rupees"].astype(float)
  • 1 Indian Rupee = 0.012 US Dollars
salaries["Salary_USD"] = salaries["Salary_In_Rupees"] * 0.012
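
The three steps above can be sketched end to end on a small stand-in DataFrame. This is a minimal illustration, assuming only the five salary strings previewed earlier rather than the full dataset. Passing regex=False is an addition not shown on the slides; it makes the comma a literal character regardless of pandas version.

```python
import pandas as pd

# Stand-in data: the five salary strings previewed earlier (not the full dataset).
salaries = pd.DataFrame(
    {
        "Salary_In_Rupees": [
            "20,688,070.00",
            "8,674,985.00",
            "1,591,390.00",
            "11,935,425.00",
            "5,729,004.00",
        ]
    }
)

# Step 1: strip the thousands separators; the values are still strings.
cleaned = salaries["Salary_In_Rupees"].str.replace(",", "", regex=False)

# Step 2: convert the cleaned strings to floats.
salaries["Salary_In_Rupees"] = cleaned.astype(float)

# Step 3: convert rupees to US dollars at the slide's rate of 0.012.
salaries["Salary_USD"] = salaries["Salary_In_Rupees"] * 0.012

print(salaries.dtypes)
```

Both columns should now report a float64 dtype, so numeric methods such as .mean() work on them.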

Previewing the new column

print(salaries[["Salary_In_Rupees", "Salary_USD"]].head())
   Salary_In_Rupees  Salary_USD
0        20688070.0  248256.840
1         8674985.0  104099.820
2         1591390.0   19096.680
3        11935425.0  143225.100
4         5729004.0   68748.048

Adding summary statistics into a DataFrame

salaries.groupby("Company_Size")["Salary_USD"].mean()
Company_Size
L    111934.432174
M    110706.628527
S     69880.980179
Name: Salary_USD, dtype: float64
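
As a rough illustration of what this aggregation returns, the sketch below runs the same groupby call on a made-up five-row DataFrame. The salary figures are hypothetical and chosen only to make the arithmetic easy to check. The result has one row per company size, not one row per employee.

```python
import pandas as pd

# Hypothetical toy data; these values are not from the salaries dataset.
df = pd.DataFrame(
    {
        "Company_Size": ["S", "M", "S", "L", "L"],
        "Salary_USD": [50000.0, 100000.0, 70000.0, 90000.0, 110000.0],
    }
)

# Aggregating collapses the five rows into one mean per group.
mean_by_size = df.groupby("Company_Size")["Salary_USD"].mean()

print(mean_by_size)
print(len(df), "rows in, ", len(mean_by_size), "rows out")
```

Because the output is shorter than the original DataFrame, it cannot be assigned directly as a new column with one value per row.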

Adding summary statistics into a DataFrame

Group by the Experience column

salaries["std_dev"] = salaries.groupby("Experience")

Adding summary statistics into a DataFrame

Then select the Salary_USD column

salaries["std_dev"] = salaries.groupby("Experience")["Salary_USD"]

Adding summary statistics into a DataFrame

Call the pandas .transform() method

salaries["std_dev"] = salaries.groupby("Experience")["Salary_USD"].transform(

Adding summary statistics into a DataFrame

Apply a lambda function

salaries["std_dev"] = salaries.groupby("Experience")["Salary_USD"].transform(lambda x: x.std())

Adding summary statistics into a DataFrame

print(salaries[["Experience", "std_dev"]].value_counts())
Experience         std_dev
SE                 52995.385395        257
MI                 63217.397343        197
EN                 43367.256303         83
EX                 86426.611619         24
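
To see why .transform() fits here, the sketch below applies it to a made-up five-row DataFrame (the values are hypothetical, not from the salaries dataset). It returns one value per original row, so the result lines up with the DataFrame's index. The string alias "std" is shown as an assumed equivalent of the lambda; it is not used on the slides.

```python
import pandas as pd

# Hypothetical toy data; these values are not from the salaries dataset.
df = pd.DataFrame(
    {
        "Experience": ["EN", "EN", "SE", "SE", "SE"],
        "Salary_USD": [40000.0, 60000.0, 90000.0, 100000.0, 110000.0],
    }
)

# transform() broadcasts each group's standard deviation back to every row
# of that group, so the result has the same length as df.
df["std_dev"] = df.groupby("Experience")["Salary_USD"].transform(lambda x: x.std())

# The built-in string alias computes the same sample standard deviation.
df["std_dev_alias"] = df.groupby("Experience")["Salary_USD"].transform("std")

print(df)
```

Every row in the same Experience group carries the same std_dev value, which is why value_counts() on the two columns reports one row per group.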

Adding summary statistics into a DataFrame

salaries["median_by_comp_size"] = salaries.groupby("Company_Size") \
                                        ["Salary_USD"].transform(lambda x: x.median())
print(salaries[["Company_Size", "median_by_comp_size"]].head())
  Company_Size  median_by_comp_size
0            S            60833.424
1            M           105914.964
2            S            60833.424
3            L            95483.400
4            L            95483.400

Let's practice!
