Data cleaning

Analyzing IoT Data in Python

Matthias Voppichler

IT Developer

Missing data

Why data goes missing from IoT devices

  • Unstable connection
  • Power loss
  • Other external factors

When to handle data quality

  • During collection
  • During analysis

Dealing with missing data

Methods for dealing with missing data

  • fill
    • mean
    • median
    • forward-fill
    • backward-fill
  • drop
  • stop the analysis
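
The fill and drop options listed above can be sketched with pandas as follows. This is a minimal illustration, not the course's own code: the DataFrame and its readings are made up, and which strategy is appropriate depends on the sensor and the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical readings at 5-minute intervals; the values are made up for illustration.
index = pd.date_range("2018-10-15 08:00", periods=5, freq="5min")
df = pd.DataFrame({"temperature": [16.7, 16.6, np.nan, np.nan, 16.8]}, index=index)

filled_mean = df.fillna(df.mean())      # replace gaps with the column mean
filled_median = df.fillna(df.median())  # replace gaps with the column median
filled_ffill = df.ffill()               # carry the last valid reading forward
filled_bfill = df.bfill()               # pull the next valid reading backward
dropped = df.dropna()                   # discard rows that contain a missing value
```

Forward-fill and backward-fill preserve the time ordering of the readings, while mean and median ignore it, which is one reason the former are often preferred for time series.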

Detecting missing values

df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 12 entries, 2018-10-15 08:00:00 to 2018-10-15 08:55:00
Data columns (total 3 columns):
temperature      8 non-null float64
humidity         8 non-null float64
precipitation    12 non-null float64
dtypes: float64(3)
memory usage: 384.0 bytes
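
Besides reading the non-null counts from `df.info()`, the missing values can be counted directly. The following is a small sketch on a made-up DataFrame (column names follow the slide, the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few gaps; not the data shown on the slide.
index = pd.date_range("2018-10-15 08:00", periods=4, freq="5min")
df = pd.DataFrame(
    {
        "temperature": [16.7, 16.6, np.nan, 16.8],
        "humidity": [64.2, np.nan, np.nan, 64.3],
        "precipitation": [0.0, 0.0, 0.0, 0.0],
    },
    index=index,
)

missing_per_column = df.isna().sum()  # number of missing values per column
missing_share = df.isna().mean()      # fraction of missing values per column
print(missing_per_column)
print(missing_share)
```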

Dropping missing values

print(df.head())
                     temperature  humidity  precipitation
timestamp                                                
2018-10-15 08:00:00         16.7      64.2            0.0
2018-10-15 08:05:00         16.6       NaN            0.0
2018-10-15 08:10:00         16.5      65.3            0.0
2018-10-15 08:15:00          NaN      65.0            0.0
2018-10-15 08:20:00         16.8      64.3            0.0
df.dropna()
                     temperature  humidity  precipitation
timestamp                                                
2018-10-15 08:00:00         16.7      64.2            0.0
2018-10-15 08:10:00         16.5      65.3            0.0
2018-10-15 08:20:00         16.8      64.3            0.0
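
By default `dropna()` removes every row that has at least one missing value. As a hedged sketch, the `subset` and `how` parameters can narrow this down; the DataFrame below rebuilds the five rows shown above, and the choice of `temperature` as the required column is only an assumption for illustration.

```python
import numpy as np
import pandas as pd

# The five rows shown on the slide, rebuilt by hand.
index = pd.date_range("2018-10-15 08:00", periods=5, freq="5min", name="timestamp")
df = pd.DataFrame(
    {
        "temperature": [16.7, 16.6, 16.5, np.nan, 16.8],
        "humidity": [64.2, np.nan, 65.3, 65.0, 64.3],
        "precipitation": [0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=index,
)

dropped_any = df.dropna()                        # drop rows with any missing value
dropped_temp = df.dropna(subset=["temperature"]) # drop only rows missing temperature
dropped_all = df.dropna(how="all")               # drop only rows where every column is missing
print(len(dropped_any), len(dropped_temp), len(dropped_all))
```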

Filling missing values

df
                     temperature  humidity  precipitation
timestamp                                                
2018-10-15 08:00:00         16.7      64.2            0.0
2018-10-15 08:05:00         16.6       NaN            0.0
2018-10-15 08:10:00         17.0      65.3            0.0
2018-10-15 08:15:00          NaN      65.0            0.0
2018-10-15 08:20:00         16.8      64.3            0.0
df.ffill()
                     temperature  humidity  precipitation
timestamp                                                
2018-10-15 08:00:00         16.7      64.2            0.0
2018-10-15 08:05:00         16.6      64.2            0.0
2018-10-15 08:10:00         17.0      65.3            0.0
2018-10-15 08:15:00         17.0      65.0            0.0
2018-10-15 08:20:00         16.8      64.3            0.0
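
Forward-filling across a long outage can repeat a stale reading many times. One possible safeguard, sketched here on made-up values, is the `limit` parameter, which caps how many consecutive gaps are filled:

```python
import numpy as np
import pandas as pd

# Hypothetical series with three consecutive missing readings.
index = pd.date_range("2018-10-15 08:00", periods=5, freq="5min")
df = pd.DataFrame({"temperature": [16.7, np.nan, np.nan, np.nan, 16.8]}, index=index)

filled_all = df.ffill()             # fills every gap with the last valid reading
filled_limited = df.ffill(limit=1)  # fills at most one consecutive gap
print(filled_limited)
```

With `limit=1`, only the first missing reading after 08:00 is filled and the remaining two stay `NaN`, so longer outages remain visible in later steps.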

Interrupted measurement


print(df.head())
timestamp            temperature  humidity
2018-10-15 00:00:00         13.5      84.7
2018-10-15 00:10:00         13.3      85.6
2018-10-15 00:20:00         12.9      88.8
2018-10-15 00:30:00         12.8      89.2
2018-10-15 00:40:00         13.0      87.7
print(df.isna().sum())
temperature    0
humidity       0
dtype: int64
df_res = df.resample("10min").last()
print(df_res.head())
timestamp            temperature  humidity
2018-10-15 00:00:00         13.5      84.7
2018-10-15 00:10:00         13.3      85.6
2018-10-15 00:20:00         12.9      88.8
2018-10-15 00:30:00         12.8      89.2
2018-10-15 00:40:00         13.0      87.7
print(df_res.isna().sum())
temperature    34
humidity       34
dtype: int64
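
Why the count jumps from 0 to 34 after resampling: a stopped device writes no rows at all, so there is nothing for `isna()` to flag until `resample()` creates the empty time slots. A minimal sketch with made-up readings (the timestamps and values are hypothetical, not the slide's data):

```python
import pandas as pd

# Hypothetical readings every 10 minutes, with no rows between 00:20 and 01:00.
index = pd.to_datetime(
    [
        "2018-10-15 00:00",
        "2018-10-15 00:10",
        "2018-10-15 00:20",
        "2018-10-15 01:00",
        "2018-10-15 01:10",
    ]
)
df = pd.DataFrame({"temperature": [13.5, 13.3, 12.9, 12.6, 12.5]}, index=index)

missing_before = df.isna().sum()      # no NaN: the missing rows simply do not exist
df_res = df.resample("10min").last()  # one row per 10-minute slot, NaN where empty
missing_after = df_res.isna().sum()   # the empty slots now show up as NaN
print(missing_before["temperature"], missing_after["temperature"])
```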

Interrupted measurement

df_res.plot(title="Environment")

Plot with broken lines: data collection stopped

Let's practice!
