Cleaning Data in Python
Adel Nehme
VP of AI Curriculum, DataCamp

| Column | Unit |
|---|---|
| Temperature | 32°C is also 89.6°F |
| Weight | 70 Kg is also 11 st. |
| Date | 26-11-2019 is also 26, November, 2019 |
| Money | 100$ is also 10763.90¥ |
import pandas as pd
temperatures = pd.read_csv('temperature.csv')
temperatures.head()
Date Temperature
0 03.03.19 14.0
1 04.03.19 15.0
2 05.03.19 18.0
3 06.03.19 16.0
4 07.03.19 62.6 <--
# Import matplotlib
import matplotlib.pyplot as plt
# Create a scatter plot
plt.scatter(x = 'Date', y = 'Temperature', data = temperatures)
# Add title, x label and y label
plt.title('Temperature in Celsius March 2019 - NYC')
plt.xlabel('Dates')
plt.ylabel('Temperature in Celsius')
# Show plot
plt.show()


$$C = (F - 32) \times \frac{5}{9}$$
temp_fah = temperatures.loc[temperatures['Temperature'] > 40, 'Temperature']
temp_cels = (temp_fah - 32) * (5/9)
temperatures.loc[temperatures['Temperature'] > 40, 'Temperature'] = temp_cels
# Check that the conversion worked
assert temperatures['Temperature'].max() < 40
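The conversion above can be tried without the CSV file. The following is a minimal sketch, assuming a small hand-built DataFrame that mirrors the five rows shown earlier and assuming that any reading above 40 was recorded in Fahrenheit; the name `is_fahrenheit` is illustrative and not part of the original slides.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the slides
temperatures = pd.DataFrame({
    "Date": ["03.03.19", "04.03.19", "05.03.19", "06.03.19", "07.03.19"],
    "Temperature": [14.0, 15.0, 18.0, 16.0, 62.6],
})

# Assumption: readings above 40 were recorded in Fahrenheit
is_fahrenheit = temperatures["Temperature"] > 40

# Apply C = (F - 32) * 5/9 to the flagged rows only
temperatures.loc[is_fahrenheit, "Temperature"] = (
    temperatures.loc[is_fahrenheit, "Temperature"] - 32
) * 5 / 9

# 62.6 degrees Fahrenheit corresponds to 17.0 degrees Celsius
converted = round(temperatures.loc[4, "Temperature"], 1)
print(converted)
```

Building the boolean mask once and reusing it keeps the selection and the assignment aligned on the same rows.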
birthdays.head()
Birthday First name Last name
0 27/27/19 Rowan Nunez
1 03-29-19 Brynn Yang
2 March 3rd, 2019 Sophia Reilly
3 24-03-19 Deacon Prince
4 06-03-19 Griffith Neal

datetime is useful for representing dates
| Date | datetime format |
|---|---|
| 25-12-2019 | %d-%m-%Y |
| December 25th 2019 | %B %dth %Y |
| 12-25-2019 | %m-%d-%Y |
| ... | ... |
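To see how format codes map onto date strings, here is a small sketch using the standard library's `datetime.strptime`. The sample strings are illustrative. The literal `th` in the last format is matched as plain text, and `%B` assumes English month names (the default C locale).

```python
from datetime import datetime

# (date string, format) pairs; all describe 25 December 2019
samples = [
    ("25-12-2019", "%d-%m-%Y"),
    ("12-25-2019", "%m-%d-%Y"),
    ("December 25th 2019", "%B %dth %Y"),  # "th" is matched literally
]

# Parse each string with its matching format
parsed = [datetime.strptime(text, fmt) for text, fmt in samples]

for (text, fmt), value in zip(samples, parsed):
    print(f"{text!r} with {fmt!r} -> {value.date()}")
```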
pandas.to_datetime()
# Converts to datetime - but won't work!
birthdays['Birthday'] = pd.to_datetime(birthdays['Birthday'])
ValueError: month must be in 1..12
# This works!
birthdays['Birthday'] = pd.to_datetime(birthdays['Birthday'],
                        # Return NaT (missing) for rows where conversion fails
                        errors = 'coerce')
birthdays.head()
Birthday First name Last name
0 NaT Rowan Nunez
1 2019-03-29 Brynn Yang
2 2019-03-03 Sophia Reilly
3 2019-03-24 Deacon Prince
4 2019-06-03 Griffith Neal
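How a column with mixed formats is parsed can depend on the pandas version, so an alternative is to state each candidate format explicitly. The following is a sketch under that assumption, not the approach used in the slides: it parses a hypothetical sample month-first, then day-first, and fills the gaps of the first pass with the second.

```python
import pandas as pd

# Hypothetical sample in the spirit of the Birthday column
raw = pd.Series(["27/27/19", "03-29-19", "24-03-19", "06-03-19"])

# First pass: month-day-year; rows that do not match become NaT
month_first = pd.to_datetime(raw, format="%m-%d-%y", errors="coerce")

# Second pass: day-month-year; again NaT where the format does not fit
day_first = pd.to_datetime(raw, format="%d-%m-%y", errors="coerce")

# Prefer the month-first result and fall back to day-first
parsed = month_first.fillna(day_first)
print(parsed)
```

The order of the passes decides how ambiguous strings such as `06-03-19` are read: here month-first wins, so it becomes 3 June 2019. The first row matches neither format and stays `NaT`.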
birthdays['Birthday'] = birthdays['Birthday'].dt.strftime("%d-%m-%Y")
birthdays.head()
Birthday First name Last name
0 NaT Rowan Nunez
1 29-03-2019 Brynn Yang
2 03-03-2019 Sophia Reilly
3 24-03-2019 Deacon Prince
4 03-06-2019 Griffith Neal
Is 2019-03-08 in August or March?
Set to NA and treat accordingly
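The ambiguity can be made concrete with a short sketch: the same string parses without error under both a year-month-day and a year-day-month format, so only knowledge of the data source can tell which reading is intended. The variable names are illustrative.

```python
from datetime import datetime

ambiguous = "2019-03-08"

# Read as year-month-day: 8 March 2019
as_march = datetime.strptime(ambiguous, "%Y-%m-%d")

# Read as year-day-month: 3 August 2019
as_august = datetime.strptime(ambiguous, "%Y-%d-%m")

print(as_march.strftime("%d %B %Y"))
print(as_august.strftime("%d %B %Y"))
```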