Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
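Use parse_dates to specify that columns contain datetimes (not dtype!)
parse_dates accepts:
- a list of column names to parse
- a list containing lists of columns to combine and parse
- a dictionary mapping new column names to the columns to parse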

import pandas as pd

# List columns of dates to parse
date_cols = ["Part1StartTime", "Part1EndTime"]

# Load file, parsing standard datetime columns
survey_df = pd.read_excel("fcc_survey.xlsx", parse_dates=date_cols)

# Check data types of timestamp columns
print(survey_df[["Part1StartTime",
                 "Part1EndTime",
                 "Part2StartDate",
                 "Part2StartTime",
                 "Part2EndTime"]].dtypes)
Part1StartTime datetime64[ns]
Part1EndTime datetime64[ns]
Part2StartDate object
Part2StartTime object
Part2EndTime object
dtype: object
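To see the effect of parse_dates without the survey file, here is a minimal sketch that uses pd.read_csv on an in-memory sample; read_csv takes a parse_dates argument as well. The sample rows and the names csv_text, raw_df, and parsed_df are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the survey file
csv_text = (
    "Part1StartTime,Part1EndTime,Age\n"
    "2016-03-29 21:23:13,2016-03-29 21:24:53,28\n"
    "2016-03-29 21:24:59,2016-03-29 21:27:09,22\n"
)

# Without parse_dates, the timestamp columns load as plain strings
raw_df = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the listed columns are converted to datetimes
parsed_df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Part1StartTime", "Part1EndTime"],
)

print(raw_df.dtypes)
print(parsed_df.dtypes)
```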
# List columns of dates to parse; the inner list is combined into one column
date_cols = ["Part1StartTime", "Part1EndTime",
             ["Part2StartDate", "Part2StartTime"]]

# Load file, parsing standard and split datetime columns
survey_df = pd.read_excel("fcc_survey.xlsx", parse_dates=date_cols)
print(survey_df.head(3))
Part2StartDate_Part2StartTime Age ... SchoolMajor StudentDebtOwe
0 2016-03-29 21:24:57 28.0 ... NaN 20000
1 2016-03-29 21:27:14 22.0 ... NaN NaN
2 2016-03-29 21:27:13 19.0 ... NaN NaN
[3 rows x 98 columns]
# List columns of dates to parse
date_cols = {"Part1Start": "Part1StartTime",
             "Part1End": "Part1EndTime",
             "Part2Start": ["Part2StartDate", "Part2StartTime"]}

# Load file, parsing standard and split datetime columns
survey_df = pd.read_excel("fcc_survey.xlsx", parse_dates=date_cols)
print(survey_df.Part2Start.head(3))
0 2016-03-29 21:24:57
1 2016-03-29 21:27:14
2 2016-03-29 21:27:13
Name: Part2Start, dtype: datetime64[ns]
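Recent pandas releases (2.2 and later) deprecate combining columns through nested lists or dictionaries in parse_dates. An alternative that does not depend on that feature is to join the columns after loading and parse the result with pd.to_datetime(). The sketch below assumes the date and time are stored as strings in the layout shown; the sample values and the names split_df and combined are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the date and time split across two columns
split_df = pd.DataFrame({
    "Part2StartDate": ["2016-03-29", "2016-03-29"],
    "Part2StartTime": ["21:24:57", "21:27:14"],
})

# Join the two string columns with a space, then parse the result
combined = split_df["Part2StartDate"] + " " + split_df["Part2StartTime"]
split_df["Part2Start"] = pd.to_datetime(combined, format="%Y-%m-%d %H:%M:%S")

print(split_df["Part2Start"])
```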
parse_dates doesn't work with non-standard datetime formats
Use pd.to_datetime() when parse_dates doesn't work
to_datetime() arguments:
- the data frame column to convert
- format: string representation of the datetime format

| Code | Meaning | Example |
|---|---|---|
| %Y | Year (4-digit) | 1999 |
| %m | Month (zero-padded) | 03 |
| %d | Day (zero-padded) | 01 |
| %H | Hour (24-hour clock) | 21 |
| %M | Minute (zero-padded) | 09 |
| %S | Second (zero-padded) | 05 |
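As a small illustration of how the codes combine, the sketch below assembles the example values from the table into one timestamp string and parses it. The string layout with dashes and colons is an assumption chosen for the example, and stamp is a hypothetical name.

```python
import pandas as pd

# The example values from the table, assembled into one timestamp string
stamp = pd.to_datetime("1999-03-01 21:09:05", format="%Y-%m-%d %H:%M:%S")

# strftime applies the same codes in the other direction
print(stamp.strftime("%m/%d/%Y %H:%M"))
```

Literal characters in the format string (dashes, colons, spaces, slashes) must match the data exactly; only the % codes are placeholders.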

format_string = "%m%d%Y %H:%M:%S"
survey_df["Part2EndTime"] = pd.to_datetime(survey_df["Part2EndTime"],
                                           format=format_string)
print(survey_df.Part2EndTime.head())
0 2016-03-29 21:27:25
1 2016-03-29 21:29:10
2 2016-03-29 21:28:21
3 2016-03-29 21:30:51
4 2016-03-29 21:31:54
Name: Part2EndTime, dtype: datetime64[ns]
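Real columns often contain a few values that do not match the format. One possible way to handle them, sketched here on made-up data, is the errors="coerce" option of pd.to_datetime(), which replaces unparseable entries with NaT instead of raising. The series end_times and its values are hypothetical.

```python
import pandas as pd

# Hypothetical values in the month-day-year layout used above, plus one bad entry
end_times = pd.Series(["03292016 21:27:25", "03292016 21:29:10", "not recorded"])

# errors="coerce" turns values that do not match the format into NaT
parsed = pd.to_datetime(end_times, format="%m%d%Y %H:%M:%S", errors="coerce")

print(parsed)
```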