Guide to GFS Ensemble MOS Forecast Probability Distributions
Jon Moskaitis
8/24/06
What do these plots show?
There are two plots, one for the morning low (00z-12z) and one for the
afternoon high (12z-00z) at the current CWFC station. The blue line
is a continuous forecast probability distribution based on the
16-member 00z GFS ensemble MOS. Just above the abscissa, the triangles
show the forecast values for each member: light blue for the
high-resolution operational GFS, green for the lower resolution GFS
control, and red for the 14 perturbed members (also lower resolution).
The black dashed line shows the median of the forecast probability
distribution, the optimal deterministic forecast for the absolute
error-based scoring used in the CWFC (see theory section below). The
optimality of the median can be seen from the grey line, which shows
the expected absolute error as a function of forecasted temperature
value. The expected absolute error is minimized for a forecast of the
median.
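The grey curve can be reproduced in a few lines. The sketch below is only an illustration, not the code behind the plots: the whole-degree temperatures and their probabilities are made-up values (a skewed distribution), and the function names are hypothetical.

```python
def expected_abs_error(forecast, temps, probs):
    """Expected |observation - forecast| under a discrete distribution."""
    return sum(p * abs(t - forecast) for t, p in zip(temps, probs))


def median_of(temps, probs):
    """Smallest temperature whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for t, p in zip(temps, probs):
        cumulative += p
        if cumulative >= 0.5:
            return t
    return temps[-1]


# Hypothetical skewed forecast distribution over whole-degree temperatures.
temps = list(range(60, 71))
raw_weights = [1, 2, 4, 8, 12, 10, 6, 4, 3, 2, 1]
probs = [w / sum(raw_weights) for w in raw_weights]

median = median_of(temps, probs)
mean = sum(t * p for t, p in zip(temps, probs))

# The "grey line": expected absolute error for each candidate forecast.
curve = {t: expected_abs_error(t, temps, probs) for t in temps}
best = min(curve, key=curve.get)

print("median:", median, "mean:", round(mean, 2), "min-EAE forecast:", best)
```

For this skewed example the mean sits above the median, yet the forecast with the smallest expected absolute error coincides with the median.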
How can one use them?
First some cautionary notes:
1) This will not pick up on high temperatures that happen at night or low
temps that occur at the very end of the forecast window (6z on the
'second' night).
2) This does not account for state-dependent model error. If the GFS
operational model can't adequately handle, say, lows moving out of the
Rockies, the ensemble members won't either.
3) The forecast is from 00z, so it is not as new as the MOS products you'll have available from 12z or even 18z.
Even so, the probabilistic interpretation of the forecast is quite
powerful. Note how wide the forecast probability distribution is and
the shape it takes on (Gaussian, skewed, bimodal, etc.), and try to
justify why the distribution might have these characteristics. For
instance, a very wide high temperature forecast probability
distribution may be due to uncertainty in the exact timing of a frontal
passage during the day. Also look to see if the control forecast is
an outlier with respect to the distribution; this is a hint that the
operational GFS forecast (even newer 12z or 18z versions) probably is
not on the right track (note that the only difference between the
operational and control runs is model resolution). In such a
situation, maybe ETA or NGM MOS is more consistent with the ensemble
members? Check those other models to see where they fall relative to
the distribution. Finally, the easiest thing to do is just pick off
the median value and use that as a forecast! One can think of it as
just another deterministic guidance product, just like the conventional
MOS. The risk of deviating from the median can be evaluated using the
grey expected absolute error curve. Perhaps a forecast other than the
median is worthwhile if you are in a gambling mood, or suspect the
forecast probability distribution is underestimating or overestimating
the probability of certain outcomes.
Theory and other details
For the 2005-2006 contest season, the method used to formulate a continuous
forecast probability distribution based on a discrete set of ensemble
members was what I call the 'trained Gaussian mixture model' approach.
In this approach, the forecast probability distribution is composed
of a weighted sum of component Gaussian distributions, which are
assigned to each ensemble member. Each component Gaussian
distribution has a mean value of the ensemble member forecast. The
standard deviations of the component distributions, and their weights
in the sum to make the probability distribution, are decided through a
parameter estimation procedure. For a set of training data (past
ensemble forecasts and observations) the estimation procedure finds the
set of weights and standard deviations that maximize the likelihood of
the verifications given the forecast probability distributions. The
training data used was nine months of non-summer Boston GFS ensemble
MOS forecasts and observations. The weights and standard deviations
derived from this training set were likely suboptimal for other cities
and for the summer, but my verification study of the forecast
probability distributions showed that they reasonably quantified the
forecast uncertainty (at least for high temperature; see link on last
page).
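As a rough illustration of the approach described above, the sketch below builds a mixture density from the member forecasts and evaluates the log-likelihood of a set of verifications. It is a minimal sketch under stated assumptions: the function names, the three-member ensembles, the weights, and the observations are all hypothetical, and the actual estimation procedure (which searches over weights and standard deviations) is not reproduced here.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, members, weights, sigmas):
    """Weighted sum of Gaussians, one centered on each member forecast."""
    return sum(w * gaussian_pdf(x, m, s)
               for m, w, s in zip(members, weights, sigmas))


def log_likelihood(cases, weights, sigmas):
    """Log-likelihood of the observations given the forecast distributions.

    cases is a list of (member_forecasts, observation) pairs.
    """
    return sum(math.log(mixture_pdf(obs, members, weights, sigmas))
               for members, obs in cases)


# Hypothetical training cases: (member forecasts, verifying observation).
cases = [
    ([70.0, 72.0, 75.0], 73.0),
    ([55.0, 56.0, 60.0], 54.0),
    ([80.0, 81.0, 81.0], 84.0),
]
weights = [0.5, 0.25, 0.25]  # hypothetical; must sum to 1

# Compare two candidate settings of the component standard deviations.
ll_narrow = log_likelihood(cases, weights, [0.5, 0.5, 0.5])
ll_wide = log_likelihood(cases, weights, [2.0, 2.0, 2.0])
print("narrow:", ll_narrow, "wide:", ll_wide)
```

In this toy data set the third observation falls well outside the ensemble, so the narrow components are penalized heavily. A parameter estimation procedure would repeat this comparison over many candidate weights and standard deviations and keep the set with the highest likelihood.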
During May 2006, NCEP performed a major upgrade on their GFS ensemble,
increasing the number of perturbed members from 10 to 14 and
significantly changing the manner in which initial ensemble
perturbations are formulated. There has not been sufficient time to
build up a set of training data for this new system to calculate a new
set of component distribution parameters. Thus, I have decided to
revert to a simpler and relatively ad hoc strategy to define the
parameters. For a given forecast variable, I have set all the
component distribution weights equal and all the component distribution
standard deviations equal. This defines the weights, but external
input is still needed to define the standard deviation value for each
variable. Such external input comes from last season's verification
study, where I related expected absolute error (EAE) of the forecast
distribution medians to realized absolute error (AE) of those medians.
For a sample of forecast distributions with similar EAEs, the mean
EAE should be roughly the same as the mean AE. I have set the
component distribution standard deviations to try to better establish
this relationship at the small-EAE limit, assuming last year's AEs
would be representative of this year's. A close EAE-AE relationship
does not mean that the forecast probability distributions are perfect
(they could be systematically biased and overdispersed, for instance),
but it is a good starting point.
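The EAE-AE comparison can be sketched as a simple binning exercise: group past forecasts by the EAE of their medians, then compare the mean EAE with the mean realized AE in each group. The code below is only one plausible way to set this up, not the verification study itself; the function name, bin edges, and the four sample cases are hypothetical.

```python
def eae_ae_by_bin(eae, ae, edges):
    """Return (mean EAE, mean AE, count) for each non-empty EAE bin.

    eae[i] is the expected absolute error of forecast i's median,
    ae[i] is the absolute error that forecast actually realized,
    edges are increasing bin boundaries; each bin is [low, high).
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        idx = [i for i, e in enumerate(eae) if low <= e < high]
        if idx:
            mean_eae = sum(eae[i] for i in idx) / len(idx)
            mean_ae = sum(ae[i] for i in idx) / len(idx)
            rows.append((mean_eae, mean_ae, len(idx)))
    return rows


# Hypothetical sample: four past forecasts (degrees F).
eae = [1.0, 1.2, 2.0, 2.4]
ae = [0.5, 1.5, 3.0, 1.0]
rows = eae_ae_by_bin(eae, ae, edges=[0.0, 1.5, 3.0])

for mean_eae, mean_ae, count in rows:
    print(f"n={count}  mean EAE={mean_eae:.2f}  mean AE={mean_ae:.2f}")
```

If the mean AE in the lowest bin were consistently larger than the mean EAE, that would suggest the component standard deviations are too small, and vice versa.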
So, why is the median of the forecast probability distribution marked
on the plot instead of the mean? Well, given an arbitrary forecast
probability distribution, it is the median of this distribution that is
the optimal deterministic forecast, if the verification system is based
on absolute error. This result is quite straightforward to derive.
Just calculate the expected value of the absolute error according to
the probability distribution, assuming that it represents the
probability distribution of the future observation. Regardless of the
shape of the distribution, it is the median that minimizes expected
absolute error. Of course, the forecast probability distribution is
not the true probability distribution of the future observation, so
this throws a bit of a wrench in the works. Postprocessing that removes
any model bias is necessary (essentially updating the distribution with
more information);
fortunately the MOS technique should handle this. Errors in the shape
of the distribution are fairly unavoidable at this point. Note,
however, that the median is a robust measure of the center of the
distribution that should resist shifting due to modest changes in the
shape of the distribution. So, if a bias-corrected ensemble can capture
the shape of the forecast distribution reasonably well, the median
should be a near-optimal deterministic forecast, under absolute error
verification.
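For completeness, the derivation mentioned above can be written out as follows. The notation is an assumption introduced here, not taken from the plots: p(y) is the forecast probability density of the future observation y, F is its cumulative distribution function, and f is the deterministic forecast.

```latex
\begin{align*}
\mathrm{EAE}(f) &= \int_{-\infty}^{\infty} |y - f|\, p(y)\, dy
  = \int_{-\infty}^{f} (f - y)\, p(y)\, dy + \int_{f}^{\infty} (y - f)\, p(y)\, dy \\
\frac{d\,\mathrm{EAE}}{df} &= \int_{-\infty}^{f} p(y)\, dy - \int_{f}^{\infty} p(y)\, dy
  = F(f) - \bigl(1 - F(f)\bigr) = 2F(f) - 1 \\
\frac{d\,\mathrm{EAE}}{df} = 0 &\iff F(f) = \tfrac{1}{2},
  \qquad \frac{d^{2}\,\mathrm{EAE}}{df^{2}} = 2\,p(f) \ge 0
\end{align*}
```

The first derivative vanishes where half the probability lies on either side of f, which is the definition of the median, and the non-negative second derivative confirms that this point is a minimum. Nothing in the argument depends on the shape of p(y).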