Data validation

Exploratory Data Analysis in Python

Izzy Weber

Curriculum Manager, DataCamp

Validating data types

books.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
--   ------  --------------  -----  
 0   name    350 non-null    object 
 1   author  350 non-null    object 
 2   rating  350 non-null    float64
 3   year    350 non-null    float64 
 4   genre   350 non-null    object 
dtypes: float64(2), object(3)
memory usage: 13.8+ KB
books.dtypes
name       object
author     object
rating    float64
year      float64
genre      object
dtype: object
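
The dtype check above can also be done programmatically. The following is a minimal sketch, assuming a two-row stand-in for `books` (the rows are taken from the table shown later in these slides; the variable names `year_is_int` and `rating_is_float` are illustrative and not part of the course):

```python
import pandas as pd

# Small stand-in for the books DataFrame (illustrative rows only)
books = pd.DataFrame({
    "name": ["11/22/63: A Novel", "1984 (Signet Classics)"],
    "author": ["Stephen King", "George Orwell"],
    "rating": [4.6, 4.7],
    "year": [2011.0, 2017.0],  # stored as float, as on the slide
    "genre": ["Fiction", "Fiction"],
})

# Check whether each column has the kind of dtype we expect
year_is_int = pd.api.types.is_integer_dtype(books["year"])
rating_is_float = pd.api.types.is_float_dtype(books["rating"])

print("year is integer:", year_is_int)
print("rating is float:", rating_is_float)
```

A failed check such as `year_is_int` being false is the signal that the column needs to be converted.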

Updating data types

books["year"] = books["year"].astype(int)

books.dtypes
name       object
author     object
rating    float64
year        int64
genre      object
dtype: object
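
The conversion above works because `year` has no missing values (350 non-null). As a hedged aside that goes beyond the slides: a float column that does contain `NaN` cannot be cast with `.astype(int)`. The sketch below uses a hypothetical `years` series to show the failure and one possible alternative, the nullable `"Int64"` dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical year column with one missing value (not from the course data)
years = pd.Series([2016.0, 2011.0, np.nan], name="year")

# Plain int cannot represent NaN, so pandas raises an error
try:
    years.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# The nullable "Int64" dtype keeps the missing value as <NA>
years_nullable = years.astype("Int64")

print("astype(int) failed:", cast_failed)
print(years_nullable.dtype)
```

Whether to use a nullable dtype or to handle the missing values first depends on the analysis.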

Updating data types

| Type       | Python name |
|------------|-------------|
| String     | str         |
| Integer    | int         |
| Float      | float       |
| Dictionary | dict        |
| List       | list        |
| Boolean    | bool        |

Validating categorical data

books["genre"].isin(["Fiction", "Non Fiction"])
0       True
1       True
2       True
3       True
4      False
       ...  
345     True
346     True
347     True
348     True
349    False
Name: genre, Length: 350, dtype: bool

Validating categorical data

~books["genre"].isin(["Fiction", "Non Fiction"])
0      False
1      False
2      False
3      False
4       True
       ...  
345    False
346    False
347    False
348    False
349     True
Name: genre, Length: 350, dtype: bool

Validating categorical data

books[books["genre"].isin(["Fiction", "Non Fiction"])].head()
|   |                          name |              author | rating | year |       genre |
|---|-------------------------------|---------------------|--------|------|-------------|
| 0 | 10-Day Green Smoothie Cleanse |            JJ Smith |    4.7 | 2016 | Non Fiction |
| 1 |             11/22/63: A Novel |        Stephen King |    4.6 | 2011 |     Fiction |
| 2 |             12 Rules for Life |  Jordan B. Peterson |    4.7 | 2018 | Non Fiction |
| 3 |        1984 (Signet Classics) |       George Orwell |    4.7 | 2017 |     Fiction |
| 5 |         A Dance with Dragons  | George R. R. Martin |    4.4 | 2011 |     Fiction |
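
The three steps above (membership mask, negated mask, filtering) can be combined to list the categories that fall outside the expected set. This is a minimal sketch, assuming a hypothetical three-row `books` frame; the `"Childrens"` value and the names `allowed`, `is_valid`, and `unexpected` are made up for illustration:

```python
import pandas as pd

# Hypothetical data; the third genre is deliberately outside the allowed set
books = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C"],
    "genre": ["Fiction", "Non Fiction", "Childrens"],
})

allowed = ["Fiction", "Non Fiction"]

# True where the genre is one of the allowed values
is_valid = books["genre"].isin(allowed)

# Count the values that are NOT in the allowed set
unexpected = books.loc[~is_valid, "genre"].value_counts()

# Keep only the rows with a valid genre
valid_books = books[is_valid]

print(unexpected)
print(valid_books)
```

Inspecting `unexpected` before filtering helps distinguish genuine extra categories from typos or inconsistent spelling.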

Validating numerical data

books.select_dtypes("number").head()
|   | rating | year |
|---|--------|------|
| 0 |    4.7 | 2016 |
| 1 |    4.6 | 2011 |
| 2 |    4.7 | 2018 |
| 3 |    4.7 | 2017 |
| 4 |    4.8 | 2019 |
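
`select_dtypes` also accepts an `exclude` argument, which can be handy for listing the non-numeric columns in the same way. A small sketch, assuming a two-row stand-in for `books` with only three of its columns (the names `numeric_cols` and `other_cols` are illustrative):

```python
import pandas as pd

# Stand-in for books with a subset of its columns
books = pd.DataFrame({
    "name": ["10-Day Green Smoothie Cleanse", "11/22/63: A Novel"],
    "rating": [4.7, 4.6],
    "year": [2016, 2011],
})

# Names of the numeric columns
numeric_cols = books.select_dtypes("number").columns.tolist()

# Names of everything that is not numeric
other_cols = books.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```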

Validating numerical data

books["year"].min()
2009
books["year"].max()
2019
import seaborn as sns
import matplotlib.pyplot as plt

sns.boxplot(data=books, x="year")
plt.show()

Figure: a box plot of publication year in the books data

Validating numerical data

sns.boxplot(data=books, x="year", y="genre")
plt.show()

Figure: a box plot of publication year in the books data, split by genre
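
The range checks shown with `.min()`, `.max()`, and the box plots can also be expressed without a plot. The sketch below assumes a four-row stand-in for `books` (rows taken from the table earlier in these slides) and uses the 2009 to 2019 range reported above as the expected bounds; the names `in_range`, `out_of_range`, and `year_range_by_genre` are illustrative:

```python
import pandas as pd

# Stand-in for books, using four rows from the earlier table
books = pd.DataFrame({
    "year": [2016, 2011, 2018, 2017],
    "genre": ["Non Fiction", "Fiction", "Non Fiction", "Fiction"],
})

# True where the year lies within the expected range (inclusive)
in_range = books["year"].between(2009, 2019)

# Rows that fall outside the expected range
out_of_range = books[~in_range]

# Minimum and maximum year per genre, a numeric analogue of the split box plot
year_range_by_genre = books.groupby("genre")["year"].agg(["min", "max"])

print(out_of_range)
print(year_range_by_genre)
```

An empty `out_of_range` frame suggests the column contains no implausible years, at least with respect to the chosen bounds.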

Let's practice!
