Congratulations

Exploratory Data Analysis in Python

George Boorman

Curriculum Manager, DataCamp

Inspection and validation

Histogram of book ratings

books["year"] = books["year"].astype(int)
books.dtypes
name       object
author     object
rating    float64
year        int64
genre      object
dtype: object

Aggregation

books.groupby("genre").agg(
    mean_rating=("rating", "mean"),
    std_rating=("rating", "std"),
    median_year=("year", "median")
)
|  genre      | mean_rating | std_rating | median_year |
|-------------|-------------|------------|-------------|
|   Childrens |    4.780000 |   0.122370 |      2015.0 |
|     Fiction |    4.570229 |   0.281123 |      2013.0 |
| Non Fiction |    4.598324 |   0.179411 |      2013.0 |

Handling missing data

print(salaries.isna().sum())
Working_Year            12
Designation             27
Experience              33
Employment_Status       31
Employee_Location       28
Company_Size            40
Remote_Working_Ratio    24
Salary_USD              60
dtype: int64

Handling missing data

  • Drop missing values
  • Impute the mean, median, or mode
  • Impute by sub-group

salaries_dict = salaries.groupby("Experience")["Salary_USD"].median().to_dict()
salaries["Salary_USD"] = salaries["Salary_USD"].fillna(salaries["Experience"].map(salaries_dict))
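
As a minimal sketch of sub-group imputation, the snippet below runs the same two steps on a small made-up DataFrame. The rows and values are hypothetical and only stand in for the real salaries data; the column names Experience and Salary_USD are the ones used on the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the salaries DataFrame
salaries = pd.DataFrame({
    "Experience": ["Entry", "Entry", "Entry", "Senior", "Senior", "Senior"],
    "Salary_USD": [50000.0, 60000.0, np.nan, 120000.0, np.nan, 140000.0],
})

# Median salary per experience level (missing values are skipped by default)
salaries_dict = salaries.groupby("Experience")["Salary_USD"].median().to_dict()

# Fill each missing salary with the median of that row's experience level
salaries["Salary_USD"] = salaries["Salary_USD"].fillna(
    salaries["Experience"].map(salaries_dict)
)

print(salaries)
```

Mapping Experience through the dictionary produces a Series aligned with the original rows, so fillna only touches the rows where Salary_USD was missing.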

Analyzing categorical data

salaries["Job_Category"] = np.select(conditions, 
                                     job_categories, 
                                     default="Other")
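
The slide does not show how conditions and job_categories are built. One plausible construction, assuming the job titles live in the Designation column and using hypothetical category names and search patterns, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical job titles; the real data has many more rows
salaries = pd.DataFrame({
    "Designation": ["Data Scientist", "Machine Learning Engineer",
                    "Data Analyst", "Head of Data", "Research Scientist"],
})

# Assumed category names, one per condition below (same order)
job_categories = ["Data Science", "Data Analytics", "Machine Learning"]

# One boolean Series per category, based on assumed substrings in the title
conditions = [
    salaries["Designation"].str.contains("Data Scientist"),
    salaries["Designation"].str.contains("Analyst|Analytics"),
    salaries["Designation"].str.contains("Machine Learning|ML"),
]

# The first matching condition wins; rows matching none get "Other"
salaries["Job_Category"] = np.select(conditions,
                                     job_categories,
                                     default="Other")

print(salaries)
```

The order of conditions matters: np.select takes the choice for the first condition that is True, so more specific patterns are usually listed first.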

Bar chart of the number of jobs per category


Applying lambda functions

salaries["std_dev"] = salaries.groupby("Experience")["Salary_USD"].transform(lambda x: x.std())
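
To illustrate what transform does here, the following sketch uses hypothetical toy data. It returns one value per original row rather than one per group, and passing the string "std" is expected to give the same numbers as the lambda.

```python
import pandas as pd

# Hypothetical toy data standing in for the salaries DataFrame
salaries = pd.DataFrame({
    "Experience": ["Entry", "Entry", "Senior", "Senior"],
    "Salary_USD": [50000.0, 60000.0, 120000.0, 140000.0],
})

grouped = salaries.groupby("Experience")["Salary_USD"]

# Lambda version, as on the slide: each row gets its group's standard deviation
salaries["std_dev"] = grouped.transform(lambda x: x.std())

# Equivalent built-in version using the aggregation name
salaries["std_dev_builtin"] = grouped.transform("std")

print(salaries)
```

Because the result keeps the original index, it can be assigned straight back as a new column, which a plain groupby aggregation would not allow without a merge.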

Handling outliers

sns.boxplot(data=salaries,
            y="Salary_USD")
plt.show()

Box plot of data professionals' salaries: the 25th percentile is the bottom of the box, the 50th percentile is the middle line, and the 75th percentile is the top of the box
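
The slide only shows the box plot. One common way to act on it, sketched here on hypothetical values, is to treat points more than 1.5 times the interquartile range beyond the quartiles as outliers, which matches seaborn's default whisker length. The 1.5 multiplier and the choice to drop the outliers rather than keep them are assumptions, not something the slide prescribes.

```python
import pandas as pd

# Hypothetical salaries with one extreme value
salaries = pd.DataFrame({
    "Salary_USD": [40000, 50000, 55000, 60000, 65000, 70000, 80000, 400000],
})

# Quartiles and interquartile range
q1 = salaries["Salary_USD"].quantile(0.25)
q3 = salaries["Salary_USD"].quantile(0.75)
iqr = q3 - q1

# Assumed thresholds: 1.5 * IQR below Q1 and above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows inside the thresholds
no_outliers = salaries[(salaries["Salary_USD"] >= lower) &
                       (salaries["Salary_USD"] <= upper)]

print(no_outliers)
```

Whether to remove such rows depends on the question being asked; an extreme salary may be a data error or a genuine, informative observation.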


Patterns over time

sns.lineplot(data=divorce, x="marriage_month", y="marriage_duration")
plt.show()

Line plot of the relationship between marriage month and marriage duration


Correlation

sns.heatmap(divorce.corr(numeric_only=True), annot=True)
plt.show()

Correlation heatmap for the divorce data


Distributions

sns.kdeplot(data=divorce, x="marriage_duration", hue="education_man", cut=0)
plt.show()

KDE plot of marriage duration with hue set to education_man and cut equal to zero


Cross-tabulation

pd.crosstab(planes["Source"], planes["Destination"],
            values=planes["Price"], aggfunc="median")
Destination  Banglore   Cochin   Delhi  Hyderabad  Kolkata  New Delhi
Source                                                               
Banglore          NaN      NaN  4823.0        NaN      NaN    10976.5
Chennai           NaN      NaN     NaN        NaN   3850.0        NaN
Delhi             NaN  10262.0     NaN        NaN      NaN        NaN
Kolkata        9345.0      NaN     NaN        NaN      NaN        NaN
Mumbai            NaN      NaN     NaN     3342.0      NaN        NaN
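
For comparison, here is a small sketch on hypothetical flights. Without values and aggfunc, pd.crosstab counts how often each combination occurs; with them, it aggregates the given column for each combination and leaves NaN where a route has no rows.

```python
import pandas as pd

# Hypothetical flights standing in for the planes DataFrame
planes = pd.DataFrame({
    "Source": ["Delhi", "Delhi", "Kolkata", "Kolkata", "Mumbai"],
    "Destination": ["Cochin", "Cochin", "Banglore", "Banglore", "Hyderabad"],
    "Price": [10000.0, 12000.0, 9000.0, 9500.0, 3300.0],
})

# Default behaviour: frequency of each Source/Destination pair
counts = pd.crosstab(planes["Source"], planes["Destination"])

# With values and aggfunc: median Price for each pair
median_price = pd.crosstab(planes["Source"], planes["Destination"],
                           values=planes["Price"], aggfunc="median")

print(counts)
print(median_price)
```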

pd.cut()

Provide the bins

planes["Price_Category"] = pd.cut(planes["Price"],
                                  labels=labels,
                                  bins=bins)
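
The slide leaves labels and bins undefined. A minimal sketch, assuming quartile-based bin edges and hypothetical category names, could be:

```python
import pandas as pd

# Hypothetical ticket prices standing in for the planes DataFrame
planes = pd.DataFrame({"Price": [3000.0, 5500.0, 9000.0, 14000.0, 20000.0]})

# Assumed bin edges: zero, the three quartiles, and the maximum price
twenty_fifth = planes["Price"].quantile(0.25)
median = planes["Price"].median()
seventy_fifth = planes["Price"].quantile(0.75)
maximum = planes["Price"].max()
bins = [0, twenty_fifth, median, seventy_fifth, maximum]

# Assumed labels: one per interval, so one fewer than the number of edges
labels = ["Economy", "Premium Economy", "Business Class", "First Class"]

planes["Price_Category"] = pd.cut(planes["Price"],
                                  labels=labels,
                                  bins=bins)

print(planes)
```

By default each interval includes its right edge but not its left, so a price equal to the 25th percentile falls in the first category, and a price equal to the lowest edge would become NaN unless include_lowest=True is passed.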

Data snooping

Heatmap of correlation coefficients for each number of stops


Generating hypotheses

sns.barplot(data=planes, x="Airline", y="Duration")
plt.show()

Bar plot of duration by airline


Next steps


Congratulations!

