Generating new features

Exploratory Data Analysis in Python

George Boorman

Curriculum Manager, DataCamp

Correlation

sns.heatmap(planes.corr(numeric_only=True), annot=True)
plt.show()

Heatmap showing a Pearson correlation coefficient of 0.54 between Price and Duration
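
The heatmap above is drawn from the table that `DataFrame.corr()` returns. As a minimal sketch, the same table can be inspected without plotting; the `toy` DataFrame and its values are invented for illustration and are not the course's `planes` data.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
toy = pd.DataFrame({
    "Airline": ["A", "B", "A", "C", "B"],
    "Duration": [2.0, 5.5, 3.0, 10.0, 7.5],
    "Price": [3000.0, 7000.0, 4200.0, 12500.0, 9000.0],
})

# numeric_only=True leaves out the object column "Airline",
# so only Duration and Price appear in the result.
corr = toy.corr(numeric_only=True)
print(corr)
```

Any column stored as `object` is left out of this table, which is why a column such as Total_Stops only shows up in the heatmap after it has been converted to a number.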

Viewing data types

print(planes.dtypes)
Airline                    object
Date_of_Journey    datetime64[ns]
Source                     object
Destination                object
Route                      object
Dep_Time           datetime64[ns]
Arrival_Time       datetime64[ns]
Duration                  float64
Total_Stops                object
Additional_Info            object
Price                     float64
dtype: object

Total stops

print(planes["Total_Stops"].value_counts())
1 stop      4107
non-stop    2584
2 stops     1127
3 stops       29
4 stops        1
Name: Total_Stops, dtype: int64

Cleaning total stops

planes["Total_Stops"] = planes["Total_Stops"].str.replace(" stops", "")

planes["Total_Stops"] = planes["Total_Stops"].str.replace(" stop", "")
planes["Total_Stops"] = planes["Total_Stops"].str.replace("non-stop", "0")
planes["Total_Stops"] = planes["Total_Stops"].astype(int)
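
The order of the replacements matters: removing " stop" before " stops" would turn "2 stops" into "2s", and the integer conversion would then fail. The sketch below reproduces the same steps on a small hypothetical Series (`stops` is an invented stand-in for `planes["Total_Stops"]`). Passing `regex=False` is an assumption added here to make the literal matching explicit.

```python
import pandas as pd

# Hypothetical sample of the raw labels seen in value_counts().
stops = pd.Series(["1 stop", "non-stop", "2 stops", "3 stops", "4 stops"])

cleaned = (
    stops.str.replace(" stops", "", regex=False)     # "2 stops" -> "2"
         .str.replace(" stop", "", regex=False)      # "1 stop"  -> "1"
         .str.replace("non-stop", "0", regex=False)  # "non-stop" -> "0"
         .astype(int)                                # strings -> integers
)
print(cleaned.tolist())
```

Note that `astype(int)` raises an error if any value is missing or still contains text, so missing values need to be handled before this step.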

Correlation

sns.heatmap(planes.corr(numeric_only=True), annot=True)
plt.show()

Heatmap showing Pearson correlations of 0.62 between Price and Total Stops and 0.74 between Duration and Total Stops

Dates

print(planes.dtypes)
Airline                    object
Date_of_Journey    datetime64[ns]
Source                     object
Destination                object
Route                      object
Dep_Time           datetime64[ns]
Arrival_Time       datetime64[ns]
Duration                  float64
Total_Stops                 int64
Additional_Info            object
Price                     float64
dtype: object

Extracting month and weekday

planes["month"] = planes["Date_of_Journey"].dt.month

planes["weekday"] = planes["Date_of_Journey"].dt.weekday
print(planes[["month", "weekday", "Date_of_Journey"]].head())
   month  weekday   Date_of_Journey
0      9        4        2019-09-06
1     12        3        2019-12-05
2      1        3        2019-01-03
3      6        0        2019-06-24
4     12        1        2019-12-03
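
In the output above, `weekday` is an integer where Monday is 0 and Sunday is 6, so 2019-09-06 (a Friday) maps to 4. A minimal sketch on a hypothetical three-row frame shows the mapping; the `day_name` column is an extra added here only to make the numbers readable.

```python
import pandas as pd

# Three dates taken from the printed output above.
frame = pd.DataFrame({
    "Date_of_Journey": pd.to_datetime(["2019-09-06", "2019-12-05", "2019-06-24"])
})

frame["month"] = frame["Date_of_Journey"].dt.month
frame["weekday"] = frame["Date_of_Journey"].dt.weekday      # Monday=0 ... Sunday=6
frame["day_name"] = frame["Date_of_Journey"].dt.day_name()  # readable check only
print(frame)
```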

Departure and arrival times

planes["Dep_Hour"] = planes["Dep_Time"].dt.hour
planes["Arrival_Hour"] = planes["Arrival_Time"].dt.hour
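
The `.dt.hour` attribute keeps only the hour component and drops the minutes, so 09:25 and 09:55 both become 9. A short sketch, using invented timestamps rather than the course data:

```python
import pandas as pd

# Hypothetical departure times; values are invented for illustration.
dep_time = pd.Series(pd.to_datetime([
    "2019-09-06 09:25",
    "2019-09-06 09:55",
    "2019-12-05 18:05",
]))

dep_hour = dep_time.dt.hour  # integer hour of day, 0 to 23
print(dep_hour.tolist())
```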

Correlation

Heatmap showing no relationship between the datetime attributes and price

Creating categories

print(planes["Price"].describe())
count     7848.000000
mean      9035.413609
std       4429.822081
min       1759.000000
25%       5228.000000
50%       8355.000000
75%      12373.000000
max      54826.000000
Name: Price, dtype: float64
Range                 Ticket type
<= 5228               Economy
> 5228 and <= 8355    Premium Economy
> 8355 and <= 12373   Business Class
> 12373               First Class

Descriptive statistics

twenty_fifth = planes["Price"].quantile(0.25)

median = planes["Price"].median()
seventy_fifth = planes["Price"].quantile(0.75)
maximum = planes["Price"].max()

Labels and bins

labels = ["Economy", "Premium Economy", "Business Class", "First Class"]

bins = [0, twenty_fifth, median, seventy_fifth, maximum]

pd.cut()

Call pd.cut()

planes["Price_Category"] = pd.cut(



pd.cut()

Pass the data

planes["Price_Category"] = pd.cut(planes["Price"],



pd.cut()

Set the labels

planes["Price_Category"] = pd.cut(planes["Price"],
                                  labels=labels,


pd.cut()

Provide the bins

planes["Price_Category"] = pd.cut(planes["Price"],
                                  labels=labels,
                                  bins=bins)
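
Putting the quantile, label, and bin steps together, the following is a self-contained sketch on a small hypothetical price Series (`prices` is invented for illustration and stands in for `planes["Price"]`). By default `pd.cut()` builds right-closed intervals, so a value equal to a bin edge falls into the lower category, and the maximum falls into the last one.

```python
import pandas as pd

# Hypothetical prices; not the course data.
prices = pd.Series([1759.0, 3873.0, 6218.0, 8355.0,
                    11087.0, 13302.0, 13882.0, 54826.0])

# Bin edges taken from the distribution itself.
twenty_fifth = prices.quantile(0.25)
median = prices.median()
seventy_fifth = prices.quantile(0.75)
maximum = prices.max()

labels = ["Economy", "Premium Economy", "Business Class", "First Class"]
bins = [0, twenty_fifth, median, seventy_fifth, maximum]  # 5 edges -> 4 bins

# Intervals are (0, q25], (q25, median], (median, q75], (q75, max].
categories = pd.cut(prices, labels=labels, bins=bins)
print(pd.DataFrame({"Price": prices, "Price_Category": categories}))
```

There must be exactly one more bin edge than there are labels, and the edges must be unique and increasing. A value at or below the first edge (here 0) would come back as missing.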

Price categories

print(planes[["Price","Price_Category"]].head())
     Price   Price_Category
0  13882.0      First Class
1   6218.0  Premium Economy
2  13302.0      First Class
3   3873.0          Economy
4  11087.0   Business Class

Price category by airline

sns.countplot(data=planes, x="Airline", hue="Price_Category")
plt.show()

Countplot of the number of flights per airline by price category; Jet Airways has the most First Class tickets
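
The counts behind a plot like this can also be read as a table. The sketch below uses `pd.crosstab()`, which is a different tool from the countplot on the slide, on a hypothetical five-row frame. The airline name "Other Airline" and all rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; not the course data.
toy = pd.DataFrame({
    "Airline": ["Jet Airways", "Jet Airways", "Other Airline",
                "Other Airline", "Jet Airways"],
    "Price_Category": ["First Class", "Business Class", "Economy",
                       "Economy", "First Class"],
})

# Rows are airlines, columns are price categories, cells are counts.
counts = pd.crosstab(toy["Airline"], toy["Price_Category"])
print(counts)
```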

Let's practice!
