Updated 8/24/2006

What do these plots show?
There are two plots, one for the morning low (00z-12z) and one for the afternoon high (12z-00z) at the current CWFC station.   The blue line is a continuous forecast probability distribution based on the 16-member 00z GFS ensemble MOS.   Just above the abscissa, the triangles show the forecast values for each member: light blue for the high-resolution operational GFS, green for the lower-resolution GFS control, and red for the 14 perturbed members (also lower resolution).   The black dashed line shows the median of the forecast probability distribution, which is the optimal deterministic forecast for the absolute-error-based scoring used in the CWFC (see theory section below).   The optimality of the median can be seen from the grey line, which shows the expected absolute error as a function of the forecast temperature value.   The expected absolute error is minimized for a forecast of the median.
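
To make the plot elements concrete, here is a small Python sketch that builds such a mixture density, locates its median, and traces out the grey expected-absolute-error curve.  The member values, the equal weights, and the common component standard deviation below are made-up numbers for illustration, not the values used operationally.

import numpy as np
from scipy.stats import norm

# Hypothetical 16-member temperature forecasts (deg F); equal weights and a
# common component standard deviation are assumed here purely for illustration.
members = np.array([61, 62, 62, 63, 63, 64, 64, 64, 65, 65, 66, 66, 67, 68, 69, 71], float)
weights = np.full(members.size, 1.0 / members.size)
sigma = 2.5

t = np.linspace(members.min() - 10.0, members.max() + 10.0, 1001)
dt = t[1] - t[0]

# Blue line: the mixture forecast probability density
pdf = (weights * norm.pdf(t[:, None], loc=members, scale=sigma)).sum(axis=1)

# Mixture CDF, used to find the median (black dashed line)
cdf = (weights * norm.cdf(t[:, None], loc=members, scale=sigma)).sum(axis=1)
median = t[np.searchsorted(cdf, 0.5)]

# Grey line: expected absolute error EAE(x) = integral of |x - t| pdf(t) dt,
# evaluated numerically for each candidate deterministic forecast x on the grid
eae = np.array([np.sum(np.abs(x - t) * pdf) * dt for x in t])

print("median forecast: %.1f F" % median)
print("EAE is smallest (%.2f F) at %.1f F" % (eae.min(), t[eae.argmin()]))

Running something like this, the temperature at which the EAE curve bottoms out coincides with the median, which is the point made in the theory section below.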

How can one use them?
First some cautionary notes:
1) This will not pick up on high temperatures that happen at night or low temperatures that occur at the very end of the forecast window (6z on the 'second' night).
2) This does not account for state-dependent model error.   If the GFS operational model can't adequately handle, say, lows moving out of the Rockies, the ensemble members won't either.
3) The forecast is from 00z, so it is not as new as the MOS products you'll have available from 12z or even 18z.

Even so, the probabilistic interpretation of the forecast is quite powerful.   Note how wide the forecast probability distribution is and the shape it takes on (Gaussian, skewed, bimodal, etc.), and try to justify why the distribution might have these characteristics.   For instance, a very wide high temperature forecast probability distribution may be due to uncertainty in the exact timing of a frontal passage during the day.   Also look to see if the control forecast is an outlier with respect to the distribution; this is a hint that the operational GFS forecast (even newer 12z or 18z versions) probably is not on the right track (note that the only difference between the operational and control runs is model resolution).   In such a situation, maybe ETA or NGM MOS is more consistent with the ensemble members?   Check those other models to see where they fall relative to the distribution.   Finally, the easiest thing to do is just pick off the median value and use that as a forecast!   One can think of it as just another deterministic guidance product, just like the conventional MOS.   The risk of deviating from the median can be evaluated using the grey expected absolute error curve.   Perhaps a forecast other than the median is worthwhile if you are in a gambling mood, or suspect the forecast probability distribution is underestimating or overestimating the probability of certain outcomes.

Theory and other details
For the 2005-2006 contest season, the method used to formulate a continuous forecast probability distribution based on a discrete set of ensemble members was what I call the 'trained Gaussian mixture model' approach.   In this approach, the forecast probability distribution is composed of a weighted sum of component Gaussian distributions, one assigned to each ensemble member.   Each component Gaussian distribution has a mean value equal to the ensemble member forecast.   The standard deviations of the component distributions, and their weights in the sum that makes the probability distribution, are determined through a parameter estimation procedure.   For a set of training data (past ensemble forecasts and observations), the estimation procedure finds the set of weights and standard deviations that maximize the likelihood of the verifications given the forecast probability distributions.   The training data used was nine months of non-summer Boston GFS ensemble MOS forecasts and observations.   The weights and standard deviations derived from this training set were likely suboptimal for other cities and for the summer, but my verification study of the forecast probability distributions showed that they reasonably quantified the forecast uncertainty (at least for high temperature; see link on last page).
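
In code, that maximum-likelihood fit might be sketched roughly as follows.  This is my own illustrative reconstruction, not the actual training code: the unconstrained parameterization of the weights and standard deviations, the optimizer, and the starting values are all assumptions.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(params, forecasts, obs):
    """Negative log-likelihood of the verifying observations under the mixture.

    forecasts : (n_cases, n_members) past ensemble member forecasts
    obs       : (n_cases,) verifying observations
    params    : unconstrained weight logits followed by log standard deviations
    """
    n_members = forecasts.shape[1]
    logits, log_sigmas = params[:n_members], params[n_members:]
    weights = np.exp(logits) / np.exp(logits).sum()   # weights are positive and sum to one
    sigmas = np.exp(log_sigmas)                       # standard deviations stay positive
    dens = (weights * norm.pdf(obs[:, None], loc=forecasts, scale=sigmas)).sum(axis=1)
    return -np.sum(np.log(dens + 1e-300))

def fit_mixture(forecasts, obs, first_guess_sigma=2.0):
    """Return the weights and standard deviations that maximize the training likelihood."""
    n_members = forecasts.shape[1]
    x0 = np.concatenate([np.zeros(n_members),
                         np.full(n_members, np.log(first_guess_sigma))])
    res = minimize(neg_log_likelihood, x0, args=(forecasts, obs), method="Powell")
    logits, log_sigmas = res.x[:n_members], res.x[n_members:]
    return np.exp(logits) / np.exp(logits).sum(), np.exp(log_sigmas)

The idea is simply that the weights and component standard deviations end up being whatever values make the past verifying observations most likely under the resulting mixture distributions.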

During May 2006, NCEP performed a major upgrade on their GFS ensemble, increasing the number of perturbed members from 10 to 14 and significantly changing the manner in which initial ensemble perturbations are formulated.   There has not been sufficient time to build up a set of training data for this new system to calculate a new set of component distribution parameters.   Thus, I have decided to revert to a simpler and relatively ad hoc strategy to define the parameters.   For a given forecast variable, I have set all the component distribution weights equal and all the component distribution standard deviations equal.   This defines the weights, but external input is still needed to define the standard deviation value for each variable.   Such external input comes from last season's verification study, where I related expected absolute error (EAE) of the forecast distribution medians to realized absolute error (AE) of those medians.   For a sample of forecast distributions with similar EAEs, the mean EAE should be roughly the same as the mean AE.   I have set the component distribution standard deviations to try to better establish this relationship at the small-EAE limit, assuming last year's AEs would be representative of this year's.   A close EAE-AE relationship does not mean that the forecast probability distributions are perfect (they could be systematically biased and overdispersed, for instance), but it is a good starting point.
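
A rough Python sketch of that kind of calibration is below.  It is only a plausible reconstruction: the helper functions, the grid resolution, and the idea of scanning candidate standard deviations are my assumptions, and for simplicity it compares mean EAE to mean AE over all cases rather than only the small-EAE subset.

import numpy as np
from scipy.stats import norm

def median_and_eae(members, sigma, pad=15.0, n_grid=2001):
    """Median of an equal-weight Gaussian mixture and its expected absolute error."""
    t = np.linspace(members.min() - pad, members.max() + pad, n_grid)
    dt = t[1] - t[0]
    pdf = norm.pdf(t[:, None], loc=members, scale=sigma).mean(axis=1)
    cdf = norm.cdf(t[:, None], loc=members, scale=sigma).mean(axis=1)
    med = t[np.searchsorted(cdf, 0.5)]
    return med, float(np.sum(np.abs(med - t) * pdf) * dt)

def calibration_gap(sigma, past_forecasts, past_obs):
    """Mean EAE minus mean realized AE of the medians over last season's cases."""
    meds, eaes = zip(*(median_and_eae(np.asarray(f), sigma) for f in past_forecasts))
    realized_ae = np.abs(np.asarray(meds) - np.asarray(past_obs))
    return float(np.mean(eaes) - realized_ae.mean())

# One would then scan candidate sigmas, e.g. np.arange(1.0, 6.0, 0.25), and keep
# the value whose calibration_gap is closest to zero.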

So, why is the median of the forecast probability distribution marked on the plot instead of the mean?   Well, given an arbitrary forecast probability distribution, it is the median of that distribution that is the optimal deterministic forecast if the verification system is based on absolute error.  This result is quite straightforward to derive (a sketch of the derivation follows this paragraph).  Just calculate the expected value of the absolute error according to the probability distribution, assuming that it represents the probability distribution of the future observation.  Regardless of the shape of the distribution, it is the median that minimizes the expected absolute error.  Of course, the forecast probability distribution is not the true probability distribution of the future observation, so this throws a bit of a wrench in the works.  Postprocessing that removes any model bias is necessary (essentially updating the distribution with more information); fortunately, the MOS technique should handle this.  Errors in the shape of the distribution are fairly unavoidable at this point.  Note, however, that the median is a robust measure of the center of the distribution that should resist shifting due to modest changes in the shape of the distribution.   So, if a bias-corrected ensemble can capture the shape of the forecast distribution reasonably well, the median should be a near-optimal deterministic forecast under absolute error verification.
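
For reference, here is that derivation, writing p for the forecast probability density, F for its cumulative distribution function, and x for the candidate deterministic forecast:

E|T - x| = \int_{-\infty}^{x} (x - t)\, p(t)\, dt + \int_{x}^{\infty} (t - x)\, p(t)\, dt

\frac{d}{dx} E|T - x| = \int_{-\infty}^{x} p(t)\, dt - \int_{x}^{\infty} p(t)\, dt = 2F(x) - 1

Setting the derivative to zero gives F(x) = 1/2, so the minimizing forecast x is the median, whatever the shape of the distribution.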

---------------------------
Jon Moskaitis
jonmosk@mit.edu
---------------------------