This vignette is meant to serve as an introduction to {slider}. In
it, you’ll learn about the three core functions in the package:
`slide()`

, `slide_index()`

, and
`slide_period()`

, along with their many variants.

slider is a package for rolling analysis using window functions. “Window functions” is a term that I’ve borrowed from SQL that means that some function is repeatedly applied to different “windows” of your data as you step through it. Typical examples of applications of window functions include rolling averages, cumulative sums, and more complex things such as rolling regressions.

## slide()

To better understand window functions, we’ll turn to our first core
function, `slide()`

. `slide()`

is a bit like
`purrr::map()`

. You supply a vector to slide over,
`.x`

, and a function to apply to each window,
`.f`

. With those two things alone, `slide()`

is
almost identical to `map()`

.

```
slide(1:4, ~.x)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
#>
#> [[4]]
#> [1] 4
```

On top of this, you can control the size and placement of the window
by using the additional arguments to `slide()`

. For example,
you can ask for a window of size 3 containing “the current element, as
well as the 2 before it” like this:

```
slide(1:4, ~.x, .before = 2)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 1 2 3
#>
#> [[4]]
#> [1] 2 3 4
```

You’ll notice that the first two elements of the list contain partial
or “incomplete” windows. By default, `slide()`

assumes that
you want to compute on these windows anyways, but if you don’t care
about them, you can change the `.complete`

argument.

```
slide(1:4, ~.x, .before = 2, .complete = TRUE)
#> [[1]]
#> NULL
#>
#> [[2]]
#> NULL
#>
#> [[3]]
#> [1] 1 2 3
#>
#> [[4]]
#> [1] 2 3 4
```

`slide()`

is *size stable*, so you always get an
output that is the same size as your input. Because of that, the partial
results have been replaced by the corresponding missing value. For a
list, that is `NULL`

.

Sometimes, changing the placement of the window is a critical part of
your calculation. For example, you might want a “center alignment” where
you have an equal number of values before and after the current element.
To accomplish this, you can combine the `.before`

argument
with `.after`

to get a centered window. Here we ask for a
window of size 3 containing “the current element, as well as 1 element
before and 1 element after”. It is “centered” because in position 2 we
have a complete window of the current element (2), along with one
element before (1) and one after (3).

```
slide(1:4, ~.x, .before = 1, .after = 1)
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 1 2 3
#>
#> [[3]]
#> [1] 2 3 4
#>
#> [[4]]
#> [1] 3 4
```

`slide()`

can also perform *expanding* windows.
These are the type that allow *cumulative* operations to work. In
prose, an expanding window would be “the current element, along with
every element before this one”. To construct this kind of window, you
can set `.before`

to `Inf`

.

```
slide(1:4, ~.x, .before = Inf)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 1 2 3
#>
#> [[4]]
#> [1] 1 2 3 4
```

`slide()`

is *type-stable*, meaning that it always
returns an object of the same type, and the base form of
`slide()`

always returns a list. So far, this is all that we
have used to illustrate how it works, but practically you are more
likely to use one of the suffixed forms like `slide_dbl()`

or
`slide_int()`

. For example, you might have a vector of sales
data that you want to compute a 3 value moving average on.

## slide_index()

To make things a bit more interesting, let’s assume that the sales vector from the example above is also tied to some “index”, like a date vector of when the sale actually occurred.

```
index_vec <- as.Date("2019-08-29") + c(0, 1, 5, 6)
wday_vec <- as.character(wday(index_vec, label = TRUE))
company <- tibble(
sales = sales_vec,
index = index_vec,
wday = wday_vec
)
company
#> # A tibble: 4 × 3
#> sales index wday
#> <dbl> <date> <chr>
#> 1 2 2019-08-29 Thu
#> 2 4 2019-08-30 Fri
#> 3 6 2019-09-03 Tue
#> 4 2 2019-09-04 Wed
```

This index is increasing but irregular, meaning that we “jumped” from
Friday to Tuesday because there were no sales between those dates. For
the purpose of this example, let’s assume that this is an online company
where it is perfectly reasonable that you *could* have sales on
both Saturday and Sunday (If your use case requires that you “skip over”
weekends and even holidays, you might like {almanac}).

A reasonable business question to ask would be to compute a *3
day* moving average. Is this different from the 3 value moving
average we computed before? Here is the expected result, side by side
with the 3 value one computed using `slide_dbl()`

from
before.

```
#> # A tibble: 4 × 5
#> sales index wday roll_val roll_day
#> <dbl> <date> <chr> <dbl> <dbl>
#> 1 2 2019-08-29 Thu 2 2
#> 2 4 2019-08-30 Fri 3 3
#> 3 6 2019-09-03 Tue 4 6
#> 4 2 2019-09-04 Wed 4 4
```

The difference shows up in the third row, when computing the 3 day
moving average looking back from Tuesday. To understand why they are
different, consider what `slide_dbl()`

does. It uses the
`sales`

column and looks at the “current row, along with two
rows before it” to compute the result. When you are on row 3, this would
select rows 1-3 giving the date range of `[Thu, Tue]`

, which
isn’t 3 days. The correct answer would have been to look back 2 days
from Tuesday, not 2 rows from row 3. This would have given us the date
window of `[Sun, Tue]`

, and only values in that range should
be included in the moving average calculation for row 3. The only row in
that range is row 3, so we should just be averaging the single value of
`6`

to get our result.

`slide_dbl()`

doesn’t give us what we want because it is
*unaware of the index column*. It just looks back a set number of
values. What we need is a function that “knows” about the
`index`

and can adjust accordingly. For that, you can use
`slide_index(.x, .i, .f, ...)`

which has a `.i`

argument to pass an index vector through.

To understand how `slide_index()`

works, take a look at
the following comparison to `slide()`

. For illustration, the
current window of the weekday vector is printed out. Notice that in
position 3, `slide()`

gives us the “wrong” result of
Thursday, Friday and Tuesday, because it just looks back 2 values.

```
wday_vec
#> [1] "Thu" "Fri" "Tue" "Wed"
slide(wday_vec, ~.x, .before = 2)
#> [[1]]
#> [1] "Thu"
#>
#> [[2]]
#> [1] "Thu" "Fri"
#>
#> [[3]]
#> [1] "Thu" "Fri" "Tue"
#>
#> [[4]]
#> [1] "Fri" "Tue" "Wed"
```

On the other hand, `slide_index()`

can be “aware” of the
irregular index vector. By passing it through as `.i`

, and by
swapping a look back period of 2 for the lubridate object of
`days(2)`

, the start of the range is computed as
`.i - days(2)`

, which correctly computes a date window of
`[Sun, Tue]`

for the third element, so that we only capture
Tuesday in the window.

```
slide_index(wday_vec, index_vec, ~.x, .before = days(2))
#> [[1]]
#> [1] "Thu"
#>
#> [[2]]
#> [1] "Thu" "Fri"
#>
#> [[3]]
#> [1] "Tue"
#>
#> [[4]]
#> [1] "Tue" "Wed"
```

Knowing this, we can swap out `slide_dbl()`

for
`slide_index_dbl()`

to see how to correctly compute our 3 day
rolling average.

```
mutate(
company,
roll_val = slide_dbl(sales, mean, .before = 2),
roll_day = slide_index_dbl(sales, index, mean, .before = days(2))
)
#> # A tibble: 4 × 5
#> sales index wday roll_val roll_day
#> <dbl> <date> <chr> <dbl> <dbl>
#> 1 2 2019-08-29 Thu 2 2
#> 2 4 2019-08-30 Fri 3 3
#> 3 6 2019-09-03 Tue 4 6
#> 4 2 2019-09-04 Wed 4 4
```

## slide_period()

With `slide_index()`

, we always returned a vector of the
same size as `.x`

, and the idea was to build indices to slice
`.x`

with using “the current element of `.i`

+
some number of elements before/after it”. `slide_period()`

works a bit differently. It first breaks `.i`

up into “time
blocks” by some period (like monthly), and then uses those blocks to
define how to slide over `.x`

.

To see an example, let’s expand out our `company`

sales
data frame.

```
big_index_vec <- c(
as.Date("2019-08-30") + 0:4,
as.Date("2019-11-30") + 0:4
)
big_sales_vec <- c(2, 4, 6, 2, 8, 10, 9, 3, 5, 2)
big_company <- tibble(
sales = big_sales_vec,
index = big_index_vec
)
big_company
#> # A tibble: 10 × 2
#> sales index
#> <dbl> <date>
#> 1 2 2019-08-30
#> 2 4 2019-08-31
#> 3 6 2019-09-01
#> 4 2 2019-09-02
#> 5 8 2019-09-03
#> 6 10 2019-11-30
#> 7 9 2019-12-01
#> 8 3 2019-12-02
#> 9 5 2019-12-03
#> 10 2 2019-12-04
```

Now say we want to compute the monthly sales, and just return 1 value
per month. Since we have 4 months, we should get 4 values back. What we
really want to do here is break the `index`

up into “time
blocks” of 1 month, and then slide over those. That’s what
`slide_period()`

does.

```
slide_period(big_company, big_company$index, "month", ~.x)
#> [[1]]
#> # A tibble: 2 × 2
#> sales index
#> <dbl> <date>
#> 1 2 2019-08-30
#> 2 4 2019-08-31
#>
#> [[2]]
#> # A tibble: 3 × 2
#> sales index
#> <dbl> <date>
#> 1 6 2019-09-01
#> 2 2 2019-09-02
#> 3 8 2019-09-03
#>
#> [[3]]
#> # A tibble: 1 × 2
#> sales index
#> <dbl> <date>
#> 1 10 2019-11-30
#>
#> [[4]]
#> # A tibble: 4 × 2
#> sales index
#> <dbl> <date>
#> 1 9 2019-12-01
#> 2 3 2019-12-02
#> 3 5 2019-12-03
#> 4 2 2019-12-04
```

Since this returns 4 values, and not the same number of values as
there are in `.x`

, it won’t fit naturally in a
`mutate()`

or `summarise()`

statement. I find the
easiest way to do this is to create a helper function that takes a data
frame and returns one with the summary result for one time block, and
then call that with `slide_period_dfr()`

.

```
monthly_summary <- function(data) {
summarise(data, index = max(index), sales = sum(sales))
}
slide_period_dfr(
big_company,
big_company$index,
"month",
monthly_summary
)
#> # A tibble: 4 × 2
#> index sales
#> <date> <dbl>
#> 1 2019-08-31 6
#> 2 2019-09-03 16
#> 3 2019-11-30 10
#> 4 2019-12-04 19
```

Now you might be thinking, “I can do that with dplyr and lubridate!”, and you’d be right:

```
big_company %>%
mutate(monthly = floor_date(index, "month")) %>%
group_by(monthly) %>%
summarise(sales = sum(sales))
#> # A tibble: 4 × 2
#> monthly sales
#> <date> <dbl>
#> 1 2019-08-01 6
#> 2 2019-09-01 16
#> 3 2019-11-01 10
#> 4 2019-12-01 19
```

But here is where things get interesting! Now what if we want to
compute those monthly sales, but we want the time blocks to be made of
the “current month block, plus 1 month block before it”. For example,
for the month of `2019-09`

, it would include
`2019-08`

and `2019-09`

together in the rolling
summary. There isn’t an easy way to do this in dplyr alone. With slider,
there are two ways to do this.

The first is with `slide_period_dfr()`

, and it is as easy
as adding `.before = 1`

, to select the current month block
and 1 before it.

```
slide_period_dfr(
big_company,
big_company$index,
"month",
monthly_summary,
.before = 1
)
#> # A tibble: 4 × 2
#> index sales
#> <date> <dbl>
#> 1 2019-08-31 6
#> 2 2019-09-03 22
#> 3 2019-11-30 10
#> 4 2019-12-04 29
```

Depending on your use case, you might want to append these results as
a new column in `big_company`

. To do this, we can instead go
back to using `floor_date()`

to generate monthly groupings,
and slide over them using `slide_index_dbl()`

with a lookback
period of 1 month.

```
big_company %>%
mutate(
monthly = floor_date(index, "month"),
sales_summary = slide_index_dbl(sales, monthly, sum, .before = months(1))
)
#> # A tibble: 10 × 4
#> sales index monthly sales_summary
#> <dbl> <date> <date> <dbl>
#> 1 2 2019-08-30 2019-08-01 6
#> 2 4 2019-08-31 2019-08-01 6
#> 3 6 2019-09-01 2019-09-01 22
#> 4 2 2019-09-02 2019-09-01 22
#> 5 8 2019-09-03 2019-09-01 22
#> 6 10 2019-11-30 2019-11-01 10
#> 7 9 2019-12-01 2019-12-01 29
#> 8 3 2019-12-02 2019-12-01 29
#> 9 5 2019-12-03 2019-12-01 29
#> 10 2 2019-12-04 2019-12-01 29
```