Wednesday, June 6, 2012

Goodness of Fit with CSTAT/CASH

Sherpa has two statistics derived from the Poisson likelihood, called Cash and CSTAT. With these statistics, measuring the goodness of fit requires simulations, and Sherpa's sampling functions make those simulations easy. After fitting the data with one of these statistics, one can simply run a sampler and plot the resulting distribution. The sampler generates a number of parameter sets drawn from the chosen sampling distribution (normal, t, or uniform) centered on the best-fit parameter values. One can then check whether the CSTAT or Cash values for these parameter sets differ greatly from the value obtained by fitting the data.
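As a rough illustration of what the sampler does, here is a pure-Python sketch that needs no Sherpa. It uses the usual definitions of the two statistics, Cash C = 2 Σ (m_i - d_i ln m_i) and CSTAT C = 2 Σ (m_i - d_i + d_i (ln d_i - ln m_i)), where d_i are the observed counts and m_i the model counts. The helper names (`cash`, `cstat`, `sample_statistics`), the one-parameter constant model, the toy counts and the normal width sigma = sqrt(rate/n) are all assumptions made for this example; it is not Sherpa's implementation.

```python
import math
import random


def cash(data, model):
    """Cash statistic: C = 2 * sum(m_i - d_i * ln(m_i))."""
    return 2.0 * sum(m - d * math.log(m) for d, m in zip(data, model))


def cstat(data, model):
    """CSTAT: 2 * sum(m_i - d_i + d_i * (ln(d_i) - ln(m_i))).

    The d_i * ln(d_i) term is taken as 0 for empty bins (d_i = 0).
    """
    total = 0.0
    for d, m in zip(data, model):
        total += m - d
        if d > 0:
            total += d * (math.log(d) - math.log(m))
    return 2.0 * total


def sample_statistics(data, best_rate, sigma, num=1000, seed=0):
    """Hypothetical helper: draw `num` rates from a normal distribution
    centred on the best fit and return (statistic, rate) rows, so that
    column 0 holds the statistic, like the sampler output used in this post."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < num:
        rate = rng.gauss(best_rate, sigma)
        if rate <= 0:  # a Poisson model amplitude must stay positive; redraw
            continue
        model = [rate] * len(data)  # constant model in every bin
        rows.append((cstat(data, model), rate))
    return rows


# Toy counts in 20 bins; for a constant model the Cash/CSTAT best fit is the mean.
data = [3, 5, 2, 4, 6, 3, 4, 5, 2, 3, 4, 4, 5, 3, 2, 6, 4, 3, 5, 4]
best_rate = sum(data) / len(data)
sigma = math.sqrt(best_rate / len(data))  # approximate 1-sigma error on the rate
best_stat = cstat(data, [best_rate] * len(data))
sim = sample_statistics(data, best_rate, sigma, num=1000)
sim_stats = [row[0] for row in sim]
```

Only CSTAT is sampled here. The two statistics differ by a term that depends only on the data, 2 Σ (d_i - d_i ln d_i), so the choice does not move the location of the minimum.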

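from sherpa.astro.ui import *   # not needed inside the sherpa shell
set_stat("cstat")               # or set_stat("cash")
fit()                           # assumes data and a source model are loaded
sim = normal_sample(num=1000)   # column 0 holds the statistic values
plot_cdf(sim[:, 0])             # cumulative distribution of the statistic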

Plotting the cumulative distribution of the statistic values gives an immediate visual comparison of the best-fit statistic with the simulations. The best-fit statistic value should lie close to 0.5 on the CDF, i.e. near the middle (about 50%) of the simulated values. If the best-fit statistic is not close to 50% on the CDF plot, the fit is not good.
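The position read off the CDF plot can also be computed directly. The sketch below, assuming the simulated statistic values are available as a plain list (the function name `cdf_position` and the numbers are hypothetical), returns the height of the empirical CDF at the best-fit value, i.e. the fraction of simulated values that are less than or equal to it.

```python
from bisect import bisect_right


def cdf_position(value, samples):
    """Fraction of simulated statistic values that are <= value,
    i.e. the height of the empirical CDF at `value`."""
    ordered = sorted(samples)
    return bisect_right(ordered, value) / len(ordered)


# Hypothetical numbers: simulated statistic values and a best-fit statistic.
simulated = [110.2, 112.5, 108.9, 115.0, 111.1, 109.7]
print(cdf_position(110.5, simulated))  # 0.5
```

A result near 0.5 corresponds to the best-fit value sitting in the middle of the simulated distribution; the sketch only reports the position and leaves the choice of an acceptable range to the user.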

Using NumPy, we can also compare the minimum and median of the simulated distribution with the best-fit value.

import numpy

# first check the statistic for the current fit, then the simulations:
calc_stat_info()
numpy.min(sim[:, 0])
numpy.median(sim[:, 0])
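The same comparison can be bundled into one small report. This is a sketch only: the function `compare_to_simulations` and the values are hypothetical, and the `lower_stat_found` flag is one possible reading of the minimum check (a simulated parameter set with a lower statistic than the fit suggests the fit may not have reached the minimum).

```python
import statistics


def compare_to_simulations(best_stat, sim_stats):
    """Return the best-fit statistic next to the minimum and median of the
    simulated statistic values (the same quantities as numpy.min/numpy.median)."""
    sim_min = min(sim_stats)
    return {
        "best_fit": best_stat,
        "sim_min": sim_min,
        "sim_median": statistics.median(sim_stats),
        # True if some simulated parameter set gave a lower statistic than the fit.
        "lower_stat_found": sim_min < best_stat,
    }


# Hypothetical values, e.g. copied from calc_stat_info() and the sampler output.
report = compare_to_simulations(108.5, [110.2, 112.5, 108.9, 115.0, 111.1, 109.7])
print(report)
```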