Resampling and aggregating observations

Manipulating Time Series Data in R

Harrison Brown

Graduate Researcher in Geography

Sampling frequency

Frequency:

  • Number of observations per unit of time (e.g., per year)
  • e.g., weekly, daily, monthly, ...

Temporal resolution:

  • "High resolution": sampled often
  • "Low resolution": sampled infrequently
  • "High" and "low" are subjective

Conceptual diagram showing two grids: one with a low resolution and one with a high resolution. This diagram highlights the idea of data sampled more frequently or less frequently.
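
To make the idea of sampling frequency concrete, here is a minimal sketch that builds a daily series, thins it to a weekly one, and asks xts to estimate the frequency of each. The data and the names `daily` and `weekly` are made up for illustration and are not part of the course material.

```r
library(xts)

# Synthetic data (an assumption for illustration): one value per day in 2020
days <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
daily <- xts(seq_along(days), order.by = days)

# Keep every 7th observation to get a lower-resolution series
weekly <- daily[seq(1, nrow(daily), by = 7)]

# periodicity() estimates the sampling frequency from the time index
daily_scale <- periodicity(daily)$scale
weekly_scale <- periodicity(weekly)$scale
print(daily_scale)
print(weekly_scale)
```

The same underlying values are described at two resolutions; only the spacing of the time index differs.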

Aggregation

  • High resolution -> low resolution
  • Applies a function such as mean, sum, or max to each chosen interval
  • e.g.:
    • Monthly sum of daily data
    • Weekly mean of hourly values
    • ...
  • Cannot 'reverse' aggregation
    • Monthly total -> daily values?
  • Provides statistics to describe patterns in the data
  • Aggregation reduces information
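
A small sketch of why aggregation cannot be reversed: two hypothetical daily series with very different day-to-day values give the same monthly total. The series `flat` and `spiky` are invented for this illustration.

```r
library(xts)

# Two hypothetical daily series for January 2021
days <- seq(as.Date("2021-01-01"), as.Date("2021-01-31"), by = "day")
flat  <- xts(rep(10, 31), order.by = days)         # 10 on every day
spiky <- xts(c(310, rep(0, 30)), order.by = days)  # everything on day one

# Aggregate each series to a monthly total
flat_total  <- apply.monthly(flat, FUN = sum)
spiky_total <- apply.monthly(spiky, FUN = sum)

# The totals match, so the total alone cannot tell us
# which daily pattern produced it
same_total <- as.numeric(flat_total) == as.numeric(spiky_total)
print(same_total)
```

Once only the monthly total is kept, the daily pattern behind it is gone.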

Aggregating data with xts

xts:

  • eXtensible Time Series
  • Extends the zoo package and zoo class of objects
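  • apply.*() functions

library(xts)
library(ggplot2)  # needed for autoplot() and labs()

# Yearly mean of maunaloa, the weekly CO2 series (assumed to be an xts object)
yearly_mean <-
  apply.yearly(x = maunaloa,
               FUN = mean)
autoplot(yearly_mean) +
  labs(...)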

Graph of the Mauna Loa dataset aggregated to its yearly average. Whereas the original data varied strongly within each year, the aggregated series is a smooth line because its sampling resolution is much lower.
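
The maunaloa object is not defined in these slides, so the following self-contained sketch uses a made-up weekly series with an upward trend and a yearly cycle as a stand-in. The name `co2_like` and every number in it are assumptions, not real CO2 measurements.

```r
library(xts)

# Hypothetical stand-in for maunaloa: weekly values with an
# upward trend plus a seasonal cycle (not real measurements)
weeks <- seq(as.Date("2000-01-03"), as.Date("2002-12-30"), by = "week")
t <- seq_along(weeks)
co2_like <- xts(370 + 0.03 * t + 3 * sin(2 * pi * t / 52),
                order.by = weeks)

# One mean per calendar year; the seasonal swing averages out
yearly_mean <- apply.yearly(co2_like, FUN = mean)
print(yearly_mean)
```

The result has one row per calendar year, which is why the plotted line looks smooth compared with the weekly series.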

Aggregating data with xts

Graph of the Mauna Loa time series, which represents carbon dioxide concentrations sampled weekly. The graph has a general upwards trend, with a periodic, seasonal peak each year.

Graph of the Mauna Loa dataset aggregated to its yearly average. Whereas the original data varied strongly within each year, the aggregated series is a smooth line because its sampling resolution is much lower.

apply.*() functions

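# Daily total of hourly sales
daily_total <-
  apply.daily(hourly_sales,
              FUN = sum)

# Highest daily temperature in each week
weekly_max <-
  apply.weekly(daily_temperature,
               FUN = max)

# Mean daily price in each month
monthly_average <-
  apply.monthly(daily_price,
                FUN = mean)

# Quarterly total of the sales report
apply.quarterly(sales_report,
                FUN = sum)

# Yearly total of monthly salary
apply.yearly(monthly_salary,
             FUN = sum)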
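
The objects in the calls above (hourly_sales, daily_temperature, and so on) are not defined in the slides. As a runnable sketch, the block below creates a hypothetical hourly_sales series with one sale per hour over two full weeks in UTC, then aggregates it to days and to weeks.

```r
library(xts)

# Hypothetical hourly sales: one unit per hour for 14 full days,
# starting on Monday 2023-01-02 (UTC)
hours <- seq(as.POSIXct("2023-01-02 00:00:00", tz = "UTC"),
             by = "hour", length.out = 14 * 24)
hourly_sales <- xts(rep(1, length(hours)), order.by = hours)

# Aggregate the same series to two coarser resolutions
daily_total  <- apply.daily(hourly_sales, FUN = sum)
weekly_total <- apply.weekly(hourly_sales, FUN = sum)

print(nrow(daily_total))
print(nrow(weekly_total))
```

Each aggregated row is stamped with the last observation of its period, so the daily rows carry the 23:00 timestamps of the hourly series.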

Endpoints and period.apply

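xts::endpoints(): returns the row positions of the last observation in each period

xts::period.apply(): applies a function to the rows between consecutive positions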

biweekly_eps <-
  endpoints(x = daily_data,
            on = "weeks",
            k = 2)
biweekly_data <-
  period.apply(x = daily_data,
               INDEX = biweekly_eps,
               FUN = mean)
biweekly_data
2002-05-05 8.148611
2002-05-19 8.146776
2002-06-02 8.060020
2002-06-16 8.028224
2002-06-30 7.944792
2002-07-14 7.930159
...
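
To see what endpoints() hands to period.apply(), the sketch below runs the same two steps on a small made-up daily series. The values 1 to 28 are an assumption for illustration. The endpoints vector starts at 0 and ends at the number of rows, and period.apply() returns one row per interval between consecutive positions.

```r
library(xts)

# Hypothetical daily data: 28 days starting Monday 2021-03-01
days <- seq(as.Date("2021-03-01"), by = "day", length.out = 28)
daily_data <- xts(1:28, order.by = days)

# Row positions that close each two-week block
biweekly_eps <- endpoints(daily_data, on = "weeks", k = 2)
print(biweekly_eps)

# Mean of the rows between consecutive positions
biweekly_mean <- period.apply(daily_data,
                              INDEX = biweekly_eps,
                              FUN = mean)
print(biweekly_mean)
```

Because INDEX is just a vector of row positions, any custom interval can be aggregated this way, not only the calendar periods covered by the apply.*() helpers.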

Let's practice!
