Thursday, April 29, 2010

Define parameter grid and calculate statistics - steppar functionality in Sherpa

It is trivial to define a grid in Sherpa Python and then calculate the statistic values for set of par1, par2 parameters:

low_lim1=0.1
up_lim1=2.0
nstep1=10
low_lim2=1.5
up_lim2=3.0
nstep2=20
par1_grid=numpy.linspace(low_lim1, up_lim1, nstep1)
par2_grid=numpy.linspace(low_lim2, up_lim2, nstep2)

#Now calculate the statistics for all the parameters on the grid

stat_vals=numpy.zeros((nstep1, nstep2))

for i, parval1 in enumerate(par1_grid):
    set_par(par1, parval1)
    for j, parval2 in enumerate(par2_grid):
        set_par(par2, parval2)
        stat_vals[i,j] = calc_stat()
        print '%d %.3f %d %.3f %f' % (i, parval1, j, parval2, stat_vals[i, j])
    

Probing Parameter Space using Sampler in Sherpa

There is no fully Bayesian sampler inside Sherpa yet, however, there is a way to sample parameter space after the fit converged. The main assumption is that sampling starts from the best fit parameter values, the free parameters have Normal distributions with the scales  based on the covariance matrix with the off-diagonal elements to account for correlations.  The function "sample_energy_flux()" returns an array with sampled parameters and calculated flux for these parameters.

The thread   http://cxc.harvard.edu/sherpa/threads/flux_dist/index.py.html

describes how to use the functions to calculate the flux errors.

For the parameter errors it is easy to check the distribution of parameters returned by the sample_energy_flux()

Example:
sample100=sample_energy_flux(0.5,2.,num=1000)
print sample100
[...
[  2.90602963e-10   1.10100135e+00   8.39761377e-01   7.01873798e-01
   2.35626905e+00   1.03667988e+00]...
]

The first column contains the flux value for all the parameters in the other columns.

We can easily calculate 95% bounds for a given parameter:

sherpa-100> print sample100[:,1]
----------> print(sample100[:,1])
[ 1.10121073  1.1011715   1.10100135  1.10191317  1.10233657  1.10103386
  1.10360142  1.09774949  1.0984214   1.09927707  1.10041681  1.09919926

sherpa-102> min(sample100[:,1])
            1.0972469907219351
sherpa-103> max(sample100[:,1])
            1.1065345259787218

We can  use numpy to obtain the properties of the parameter distribution:
sherpa-104> numpy.median(sample100[:,1])
            1.1015496488552339
sherpa-105> numpy.mean(sample100[:,1])
            1.1015685708445886
sherpa-106> numpy.std(sample100[:,1])
            0.0015635753706038143

We can also sort and make a histogram and plot the distribution:
sort_par=numpy.sort(sample100[:,1])
sort_par[0.99*len(sample100[:,1])-1]

# make a histogram of parameter value and plot:
xspace=numpy.linspace(min(sample100[:,1]),max(sample100[:,1]),30)
xlo=xspace[:-1]
xhi=xspace[1:]
parhist=histogram1d(sample100[:,1],xlo,xhi)
parhist100=parhist/1000.
add_histogram(xlo,parhist100)
print_window("hist.png")

The histogram can be saved, but I could not place the figure into the Blogger - a known issue with the Google Blogger editor.