Data validation

Exploratory Data Analysis in Python

Izzy Weber

Curriculum Manager, DataCamp

Validating data types

books.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
--   ------  --------------  -----  
 0   name    350 non-null    object 
 1   author  350 non-null    object 
 2   rating  350 non-null    float64
 3   year    350 non-null    float64 
 4   genre   350 non-null    object 
dtypes: float64(2), object(3)
memory usage: 13.8+ KB
books.dtypes
name       object
author     object
rating    float64
year      float64
genre      object
dtype: object
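
The same check can be automated rather than read off by eye. Below is a minimal sketch on a small made-up DataFrame (the rows are placeholders, not the course data); the `expected` mapping is a hypothetical helper introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for the books DataFrame (values are made up)
books = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C"],
    "rating": [4.7, 4.6, 4.8],
    "year": [2016.0, 2011.0, 2019.0],
})

# Hypothetical expectations: column name -> dtype name we want to see
expected = {"rating": "float64", "year": "int64"}

# Collect every column whose actual dtype differs from the expected one
mismatched = {
    col: str(books[col].dtype)
    for col, want in expected.items()
    if str(books[col].dtype) != want
}
print(mismatched)
```

Here `year` is reported because it is stored as a float even though publication years are whole numbers.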

Updating data types

books["year"] = books["year"].astype(int)

books.dtypes
name       object
author     object
rating    float64
year        int64
genre      object
dtype: object
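
Casting floats to integers discards any fractional part, and `.astype(int)` raises an error if the column contains missing values. A minimal sketch, assuming a toy `year` Series with made-up values, that checks both conditions before converting:

```python
import pandas as pd

# Toy float column standing in for books["year"] (values are made up)
year = pd.Series([2016.0, 2011.0, 2019.0], name="year")

# Only cast when no information would be lost
has_no_missing = bool(year.notna().all())
is_whole = bool((year % 1 == 0).all())

if has_no_missing and is_whole:
    year = year.astype(int)

print(year.dtype)
```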

Updating data types

| Type       | Python name |
|------------|-------------|
| String     | str         |
| Integer    | int         |
| Float      | float       |
| Dictionary | dict        |
| List       | list        |
| Boolean    | bool        |
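
Several of these Python type names can be passed directly to `.astype()`. A small sketch on a made-up Series (`dict` and `list` describe container values and are not typical `.astype()` targets, so they are left out here):

```python
import pandas as pd

# Made-up integer values for illustration
s = pd.Series([0, 1, 2])

as_float = s.astype(float)  # 0 -> 0.0
as_bool = s.astype(bool)    # zero -> False, non-zero -> True
as_str = s.astype(str)      # 0 -> "0"

print(as_float.tolist(), as_bool.tolist(), as_str.tolist())
```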

Validating categorical data

books["genre"].isin(["Fiction", "Non Fiction"])
0       True
1       True
2       True
3       True
4      False
       ...  
345     True
346     True
347     True
348     True
349    False
Name: genre, Length: 350, dtype: bool

Validating categorical data

~books["genre"].isin(["Fiction", "Non Fiction"])
0      False
1      False
2      False
3      False
4       True
       ...  
345    False
346    False
347    False
348    False
349     True
Name: genre, Length: 350, dtype: bool

Validating categorical data

books[books["genre"].isin(["Fiction", "Non Fiction"])].head()
|   |                          name |              author | rating | year |       genre |
|---|-------------------------------|---------------------|--------|------|-------------|
| 0 | 10-Day Green Smoothie Cleanse |            JJ Smith |    4.7 | 2016 | Non Fiction |
| 1 |             11/22/63: A Novel |        Stephen King |    4.6 | 2011 |     Fiction |
| 2 |             12 Rules for Life |  Jordan B. Peterson |    4.7 | 2018 | Non Fiction |
| 3 |        1984 (Signet Classics) |       George Orwell |    4.7 | 2017 |     Fiction |
| 5 |         A Dance with Dragons  | George R. R. Martin |    4.4 | 2011 |     Fiction |
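
The three steps above (build the mask, invert it, filter with it) can be combined to list the unexpected categories and keep only the valid rows. A minimal sketch on toy data; the titles and the `"Childrens"` label are made up for illustration, and `valid_genres` is a name introduced here.

```python
import pandas as pd

# Toy stand-in for the books DataFrame (values are made up)
books = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C", "Book D"],
    "genre": ["Fiction", "Non Fiction", "Childrens", "Fiction"],
})

valid_genres = ["Fiction", "Non Fiction"]

# True where the genre is one of the allowed values
is_valid = books["genre"].isin(valid_genres)

# ~ inverts the mask: which genre labels fall outside the allowed set?
unexpected = books.loc[~is_valid, "genre"].unique()

# Keep only the rows with a valid genre
valid_books = books[is_valid]

print(list(unexpected))
print(valid_books)
```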

Validating numerical data

books.select_dtypes("number").head()
|   | rating | year |
|---|--------|------|
| 0 |    4.7 | 2016 |
| 1 |    4.6 | 2011 |
| 2 |    4.7 | 2018 |
| 3 |    4.7 | 2017 |
| 4 |    4.8 | 2019 |
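
`select_dtypes` also accepts an `exclude` argument, which gives the complementary view. A short sketch on made-up data:

```python
import pandas as pd

# Toy stand-in for the books DataFrame (values are made up)
books = pd.DataFrame({
    "name": ["Book A", "Book B"],
    "rating": [4.7, 4.6],
    "year": [2016, 2011],
})

# Numeric columns only
numeric = books.select_dtypes("number")

# Everything that is not numeric
non_numeric = books.select_dtypes(exclude="number")

print(numeric.columns.tolist())
print(non_numeric.columns.tolist())
```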

Validating numerical data

books["year"].min()
2009
books["year"].max()
2019
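import matplotlib.pyplot as plt
import seaborn as sns

# Visualize the distribution of publication years
sns.boxplot(data=books, x="year")
plt.show()

A boxplot of publication years in the books data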
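
Besides inspecting the minimum, maximum, and boxplot, a range check can flag individual rows. A minimal sketch on toy data, assuming 2009 to 2019 (the minimum and maximum shown above) is the expected range; the out-of-range value is made up to show the check firing.

```python
import pandas as pd

# Toy stand-in for the books DataFrame; 1019 is a deliberately wrong year
books = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C"],
    "year": [2016, 2011, 1019],
})

# between() is inclusive on both ends by default
in_range = books["year"].between(2009, 2019)

# Rows whose year falls outside the expected range
out_of_range = books[~in_range]
print(out_of_range)
```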

Validating numerical data

sns.boxplot(data=books, x="year", y="genre")
plt.show()

A boxplot of publication years in the books data, grouped by genre
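
The per-genre ranges shown in the plot can also be computed as numbers. A minimal sketch on made-up data, using `groupby` with `agg`:

```python
import pandas as pd

# Toy stand-in for the books DataFrame (values are made up)
books = pd.DataFrame({
    "genre": ["Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "year": [2011, 2017, 2016, 2018],
})

# Earliest and latest publication year within each genre
year_range = books.groupby("genre")["year"].agg(["min", "max"])
print(year_range)
```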

Let's practice!
