Cleaning Data in Python
Adel Nehme
VP of AI Curriculum, DataCamp
import pandas as pd
flights = pd.read_csv('flights.csv')
flights.head()
flight_number economy_class business_class first_class total_passengers
0 DL140 100 60 40 200
1 BA248 130 100 70 300
2 MEA124 100 50 50 200
3 AFR939 140 70 90 300
4 TKA101 130 100 20 250
Using multiple fields in a dataset to check data consistency
flight_number economy_class business_class first_class total_passengers
0 DL140 100 + 60 + 40 = 200
1 BA248 130 + 100 + 70 = 300
2 MEA124 100 + 50 + 50 = 200
3 AFR939 140 + 70 + 90 = 300
4 TKA101 130 + 100 + 20 = 250
sum_classes = flights[['economy_class', 'business_class', 'first_class']].sum(axis=1)
passenger_equ = sum_classes == flights['total_passengers']
# Find and filter out rows with inconsistent passenger totals
inconsistent_pass = flights[~passenger_equ]
consistent_pass = flights[passenger_equ]
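As a minimal, self-contained sketch of the check above: the small table below is hypothetical (it is not read from flights.csv), and the total for BA248 is deliberately set wrong so that the filter has something to catch.

```python
import pandas as pd

# Hypothetical sample rows; the BA248 total (310) is deliberately wrong for illustration
flights = pd.DataFrame({
    'flight_number': ['DL140', 'BA248', 'MEA124'],
    'economy_class': [100, 130, 100],
    'business_class': [60, 100, 50],
    'first_class': [40, 70, 50],
    'total_passengers': [200, 310, 200],
})

# Sum the per-class counts row by row (axis=1) and compare with the reported total
class_columns = ['economy_class', 'business_class', 'first_class']
sum_classes = flights[class_columns].sum(axis=1)
passenger_equ = sum_classes == flights['total_passengers']

# Split the data into rows that fail and rows that pass the check
inconsistent_pass = flights[~passenger_equ]
consistent_pass = flights[passenger_equ]

print(inconsistent_pass['flight_number'].tolist())  # ['BA248']
```

Keeping both subsets lets you inspect or repair the inconsistent rows separately while continuing the analysis on the consistent ones.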
users.head()
user_id Age Birthday
0 32985 22 1998-03-02
1 94387 27 1993-12-04
2 34236 42 1978-11-24
3 12551 31 1989-01-03
4 55212 18 2002-07-02
import pandas as pd
import datetime as dt
# Convert to datetime and get today's date
users['Birthday'] = pd.to_datetime(users['Birthday'])
today = dt.date.today()
# For each row in the Birthday column, calculate the year difference
# (calendar years only; it does not check whether this year's birthday has passed)
age_manual = today.year - users['Birthday'].dt.year
# Find instances where ages match
age_equ = age_manual == users['Age']
# Find and filter out rows with inconsistent age
inconsistent_age = users[~age_equ]
consistent_age = users[age_equ]
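A plain year difference can be off by one when the birthday has not yet occurred in the current year. The sketch below is one possible refinement, not part of the original slides: the helper `age_on` is a hypothetical name, and the reference date is fixed at an assumed 2020-06-01 so the result is reproducible. The two rows are taken from the `users.head()` output shown earlier.

```python
import pandas as pd

def age_on(birthdays: pd.Series, reference: pd.Timestamp) -> pd.Series:
    """Return age in completed years on the reference date (hypothetical helper)."""
    years = reference.year - birthdays.dt.year
    # True where the birthday falls later in the year than the reference date
    not_yet = (birthdays.dt.month > reference.month) | (
        (birthdays.dt.month == reference.month)
        & (birthdays.dt.day > reference.day)
    )
    # Subtract one year for those rows
    return years - not_yet.astype(int)

users = pd.DataFrame({
    'user_id': [32985, 94387],
    'Age': [22, 27],
    'Birthday': pd.to_datetime(['1998-03-02', '1993-12-04']),
})

# Assumed fixed reference date instead of dt.date.today()
reference = pd.Timestamp('2020-06-01')
age_exact = age_on(users['Birthday'], reference)
age_equ = age_exact == users['Age']

inconsistent_age = users[~age_equ]
consistent_age = users[age_equ]
```

On this assumed date the second user is still 26, so the stricter check flags that row, whereas the year difference alone would accept it. Which rule is appropriate depends on how the Age column was originally recorded.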
