Probabilistic Forecasting Made Simple
Probabilistic forecasting is very cool, but in its current state it is not very approachable.
While researching probabilistic forecasting for a client project, I found a paper that opens the door to any neural network with dropout, which is the majority of them. That is, we can do probabilistic forecasting with essentially any network!
Darts, a brilliant time series library, includes very competent probabilistic forecasting, but it is not applicable to all of its models. This is the reason I started diving into the whole space of probabilistic forecasting. A probabilistic model outputs not only a single prediction value but a distribution of possible values, which ends up with a prediction like:
Probabilistic Model by unit8/darts
Additionally, models like ARIMA and ExponentialSmoothing make this kind of thing very easy: simply run simulations of their state-space models with a bit of randomly sampled error. To solve this for their deep learning models, darts decided to model the distribution using a Likelihood class. What does this mean?
The model does not actually predict a value but a distribution. Using Gaussian, we’d predict two values: mean and std.
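To make the mean/std idea concrete, here is a minimal sketch in plain Python. It is not the darts Likelihood API; it assumes a model has already produced a hypothetical mean and std for one time step (pred_mean and pred_std are made-up values) and shows how that pair gives a training loss, an interval and samples.

```python
import math
import random
from statistics import NormalDist


def gaussian_nll(y_true, mean, std):
    """Negative log-likelihood of y_true under N(mean, std^2)."""
    var = std ** 2
    return 0.5 * (math.log(2 * math.pi * var) + (y_true - mean) ** 2 / var)


# Hypothetical network output for a single time step (illustrative values).
pred_mean, pred_std = 10.0, 2.0
dist = NormalDist(mu=pred_mean, sigma=pred_std)

# Central 90% interval read straight off the predicted distribution.
lower, upper = dist.inv_cdf(0.05), dist.inv_cdf(0.95)

# Sample possible futures instead of reporting a single point.
rng = random.Random(0)
samples = [rng.gauss(pred_mean, pred_std) for _ in range(1000)]

# The loss is smaller when the observation sits close to the predicted mean.
loss_at_mean = gaussian_nll(10.0, pred_mean, pred_std)
loss_far_away = gaussian_nll(16.0, pred_mean, pred_std)
```

Training against a loss like gaussian_nll is what pushes the network to output a sensible std alongside the mean, rather than just a point estimate.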
How to do probabilistic forecasting on any deep learning model
By combining the knowledge in Deep and Confident Prediction for Time Series at Uber by L. Zhu & N. Laptev (2017) with What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? by A. Kendall & Y. Gal (2017), one can conclude that it’s possible to model distributions by using dropout during inference. The Uber paper uses a variant called “Monte Carlo dropout”, which I don’t believe is required to achieve interesting results. It is enough to use the plain dropout module, which randomly zeroes elements with probability \(p\), sampled from a Bernoulli distribution.
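For intuition, this is roughly what a dropout layer does to its input, sketched in plain Python. The function name dropout and the rescaling by 1 / (1 - p) are my illustration of the common "inverted dropout" convention, not code from either paper.

```python
import random


def dropout(values, p, rng):
    """Zero each element with probability p and rescale the survivors.

    Scaling by 1 / (1 - p) keeps the expected value unchanged
    (the "inverted dropout" convention).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    # Each element gets its own Bernoulli draw: dropped with probability p.
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


rng = random.Random(42)
activations = [1.0] * 10_000
dropped = dropout(activations, p=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)  # close to p
mean_after = sum(dropped) / len(dropped)           # close to the original mean of 1.0
```

Because every call draws a fresh random mask, two forward passes over the same input give slightly different outputs, and that variation is what we exploit.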
How do we do this?
- Activate dropout during inference.
- Do \(x\) predictions with a dropout probability \(p\).
- These \(x\) predictions give us a distribution of outputs.
- Build a confidence interval from the points.
    import torch

    # x stochastic forward passes (dropout must stay active), stacked along dim 0
    outs = torch.vstack([model.predict(in_data) for _ in range(x)])

    # z-scores for two-sided confidence levels
    Z_TABLE = {0.8: 1.28, 0.85: 1.44, 0.9: 1.65, 0.95: 1.96, 0.99: 2.58, 0.999: 3.29, 0.9999: 3.89}

    # Confidence interval per time step, with the mean as the forecast line
    mean = outs.mean(dim=0)
    std = outs.std(dim=0)
    lower = mean - Z_TABLE[confidence] * std
    upper = mean + Z_TABLE[confidence] * std
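The four steps can also be sketched end to end without any framework. The toy "model" below is a hypothetical linear layer with dropout on its inputs, standing in for a real network; toy_forward, mc_dropout_interval, the inputs and the weights are all made up for illustration. In PyTorch the key detail is that model.eval() turns dropout off, so the dropout modules have to be switched back to training mode (for example by calling .train() on each nn.Dropout) before sampling.

```python
import random
from statistics import NormalDist, mean, stdev


def toy_forward(inputs, weights, p, rng):
    """One stochastic forward pass of a toy linear model with dropout on its inputs."""
    scale = 1.0 / (1.0 - p)
    return sum(
        w * (0.0 if rng.random() < p else v * scale)
        for w, v in zip(weights, inputs)
    )


def mc_dropout_interval(inputs, weights, p=0.1, n_passes=200, confidence=0.95, seed=0):
    """Repeat the stochastic pass and summarise the spread of the outputs."""
    rng = random.Random(seed)
    outs = [toy_forward(inputs, weights, p, rng) for _ in range(n_passes)]
    center, spread = mean(outs), stdev(outs)
    # Two-sided z-score (about 1.96 for 95%), computed instead of looked up in a table.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * spread, center, center + z * spread


# Hypothetical inputs and weights, chosen only for illustration.
inputs = [0.5, 1.0, 1.5, 2.0]
weights = [0.4, -0.2, 0.3, 0.1]
lower, center, upper = mc_dropout_interval(inputs, weights)
```

The z-score interval assumes the sampled outputs are roughly normal. If they are clearly skewed, taking empirical percentiles of the sampled outputs is a reasonable alternative.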
The possibilities
There are a lot of possibilities; I’ll share the two biggest ones for us.
1. Model Understanding (Weakness/Strength)
By returning a probabilistic forecast, i.e. a distribution/confidence interval, we can learn more about the model and its strengths/weaknesses.
In our projects we’ve seen that it opens a door to really figuring out how to improve our models, by focusing on the areas where the model is the most uncertain. This has improved performance by a substantial amount, which makes the effort worth it.
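One simple way to find those areas, sketched here with made-up numbers, is to rank time steps by how much the stochastic predictions disagree. The passes table is hypothetical data, with one row per forward pass and one column per time step.

```python
from statistics import stdev

# Hypothetical stochastic predictions: one row per forward pass, one column per time step.
passes = [
    [10.1, 12.0, 15.2, 9.9],
    [10.0, 12.4, 13.1, 10.1],
    [9.9, 11.8, 17.0, 10.0],
    [10.2, 12.1, 14.4, 10.0],
]

# Spread across passes for every time step.
per_step_std = [stdev(column) for column in zip(*passes)]

# Time steps ordered from most to least uncertain.
most_uncertain = sorted(
    range(len(per_step_std)), key=per_step_std.__getitem__, reverse=True
)
```

Here the third time step (index 2) comes out on top, so that is where I would look first for missing features or unusual training data.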
2. Downstream Consumer Happiness
We see that our clients trust the model more when they are able to see how confident it is. Building trust between the model and its downstream consumers is really important for delivering a successful project, which once again makes the effort totally worth it!
Bonus: we also found that it opens new possibilities to chain off the model’s inference once it is in production, as your downstream tasks can now make use of a confidence interval rather than a single data point. But inference is far more expensive than usual (remember, we do \(x\) predictions per forecast)!
Sources
Deep and Confident Prediction for Time Series at Uber by L. Zhu & N. Laptev (2017) - https://arxiv.org/pdf/1709.01907.pdf
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? by A. Kendall & Y. Gal (2017) - https://arxiv.org/pdf/1703.04977.pdf