Given a forecast and set of observations, compute the Continuous Ranked Probability Score (CRPS) for every time point.
Arguments
- fcst
A forecast object (see output of
create_forecast()
).- obs
An observations data frame.
- summarize
A boolean, defaults to TRUE. If TRUE, a single number will be returned as the score for the forecast. If FALSE, a data frame with columns named
time
,val_obs
, andscore
will be returned, containing the scores for each individual time point. This can be used by plotting functions to colour-code observations, for example.- at
(Optional) A time (must be compatible with
fcst
andobs
). If specified, the score for this time point will be returned. Mutually exclusive withafter
.- after
(Optional) A number. If specified, the score at time
fcst$forecast_time + after
will be returned. Mutually exclusive withat
.
Value
If summarize
is FALSE
,
a data frame containing times, observations, and scores.
Otherwise, the score at the time speficied by either at
or after
.
Details
For a given distribution f
and observation y
,
let F
be the CDF of f
. Then the CRPS is the integral of the square
of F(x) - H(x - y)
, where H
is the Heaviside function.
The CDF is obtained using an emperical distribution function on the forecast data.
Grouping
If summarize=FALSE
is passed, the resulting scores will be grouped by time.
If group columns are present, the data will be grouped by the group columns before scoring.
In either case, the return value will instead be a data frame with columns for the time, observation, score, and group columns if they exist.
See vignette("casteval")
for more details.
Examples
fc <- create_forecast(list(
time=c(1,2,3),
vals=list(c(1,1,1), c(2,2,2), c(3,3,3), c(4,4,4), c(5,5,5)), forecast_time=1)
)
crps(fc, data.frame(time=1:3, val_obs=c(3,4,5)), summarize=FALSE)
#> # A tibble: 3 × 3
#> time score val_obs
#> <dbl> <dbl> <dbl>
#> 1 1 0.4 3
#> 2 2 0.6 4
#> 3 3 1.2 5
crps(fc, data.frame(time=1:3, val_obs=c(3,4,5)), at=2)
#> [1] 0.6