7.5 The proposed response matrix

7.5.1 Dividing the response into components

When a response matrix is stored on disk, usually only the non-zero matrix elements are stored and used. This is done in order to save both disk space and computational time. The procedure as used in XSPEC and the older versions of SPEX is then that for each model energy bin j the relevant column of the response matrix is subdivided into groups. The groups are the continuous pieces of the column where the response is non-zero. In general these groups are stored in a specific order: starting from the lowest energy, all groups of a single energy are given before turning to the next higher photon energy.

This is not the optimal situation neither with respect to disk storage nor with respect to computational efficiency, as is illustrated by the following example. For the XMM/RGS, the response consists of a narrow gaussian-like core with in addition a broad scattering component due to the gratings. The FWHM of the scattering component is typically 10 times larger than that of the core of the response. As a result, if the response would be saved as a ”classical” matrix, we would end up with one response group per energy, namely the combined core and wings response, since these contributions have overlap, namely in the region of the gaussian-like core. As a result, the response becomes large, being a significant fraction of the product of the total number of model energy bins times the total number of data channels. This is not necessary, since the scattering contribution with its ten times larger width needs to be specified only on a model energy grid with ten times fewer bins, as compared to the gaussian-like core. Thus, by separating out the core and the scattering contribution, the total size of the response matrix can be reduced by about a factor of 10! Of course, as a consequence each contribution needs to carry its own model energy grid with it.

Therefore I propose here to subdivide the response matrix into its physical components. Then for each response component, the optimum model energy grid can be determined according to the methods described in section 3, and this model energy grid for the component can be stored together with the response matrix part of that component. Furthermore, at any energy each component may have at most 1 response group. If there would be more response groups, the component should be sub-divided further.

In X-ray detectors other than the RGS detector the subdivision could be quite different. For example, with the ASCA SIS detector one could split up the response e.g. into 4 components: the main diagonal, the Si fluorescence line, the escape peak and a broad component due to split events.

7.5.2 More complex situations

Outline of the problem

In most cases an observer will analyze the data of a single source, with a single spectrum and response for a single instrument. However, more complicated situations may arise. Examples are:

  1. An extended source, where the spectrum may be extracted from different regions of the detector, but where these spectra need to be analysed simultaneously due to the overlap in point-spread function from one region to the other. This situation is e.g. encountered in the analysis of cluster data with ASCA or BeppoSAX.
  2. For the RGS of XMM, the actual data-space in the dispersion direction is actually two-dimensional: the position z where a photon lands on the detector and its energy or pulse height c as measured with the CCD detector. X-ray sources that are extended in the direction of the dispersion axis ϕ are characterised by spectra that are a function of both the energy E and off-axis angle ϕ. The sky photon distribution as a function of (ϕ,E) is then mapped onto the (z,c)-plane. By defining appropriate regions in both planes and evaluating the correct (overlapping) responses, one may analyse extended sources.
  3. One may also fit simultaneously several time-dependent spectra using the same response, e.g. data obtained during a stellar flare.

It is relatively easy to model all these situations (provided that the instrument is understood sufficiently, of course), as we show below.

Sky sectors

First, the relevant part of the sky is subdivided into sectors, each sector corresponding to a particular region of the source, for example a circular annulus centered around the core of a cluster, or an arbitrarily shaped piece of a supernova remnant, etc.

A sector may also be a point-like region on the sky. For example if there is a bright point source superimposed upon the diffuse emission of the cluster, we can define two sectors: an extended sector for the cluster emission, and a point-like sector for the point source. Both sectors might even overlap, as this example shows!

Another example: the two nearby components of the close binary α Cen observed with the XMM instruments, with overlapping psfs of both components. In that case we would have two point-like sky sectors, each sector corresponding to one of the double star’s components.

The model spectrum for each sky sector may and will be different in general. For example, in the case of an AGN superimposed upon a cluster of galaxies, one might model the spectrum of the point-like AGN sector using a power law, and the spectrum from the surrounding cluster emission using a thermal plasma model.

Detector regions

The observed count rate spectra are extracted in practice in different regions of the detector. It is necessary here to distinguish clearly the (sky) sectors and (detector) regions. A detector region for the XMM EPIC camera would be for example a rectangular box, spanning a certain number of pixels in the x- and y-directions. It may also be a circular or annular extraction region centered around a particular pixel of the detector, or whatever spatial filter is desired. For the XMM RGS it could be a specific ”banana” part of the detector-coordinate CCD pulse-height plane (z,c).

Note that the detector regions need not to coincide with the sky sectors, neither should their number to be equal. A good example of this is again the example of an AGN superimposed upon a cluster of galaxies. The sky sector corresponding to the AGN is simply a point, while, for a finite instrumental psf, its extraction region at the detector is for example a circular region centered around the pixel corresponding to the sky position of the source.

Also, one could observe the same source with a number of different instruments and analyse the data simultaneously. In this case one would have only one sky sector but more detector regions, namely one for each participating instrument.

Consequences for the response

In all the cases mentioned above, where there is either more than one sky sector or more than one detector region involved, it is necessary to generate the response contribution for each combination of sky sector - detector region. In the spectral analysis, for each sky sector the model photon spectrum is calculated, and all these model spectra are convolved with the relevant response contributions in order to predict the count spectra for all detector regions. Each response contribution for a sky sector - detector region combination itself may consist again of different response components, as outlined in the previous subsection.

Combining all this, the total response matrix then will consist of a list of components, each component corresponding to a particular sky sector and detector region. For example, let us assume that the RGS has two response contributions, one corresponding to the core of the lsf, and the other to the scattering wings. Let us assume that this instrument observes a triple star where the instrument cannot resolve two of the three components. The total response for this configuration then consists of 12 components, namely 3 sky sectors (assuming each star has its own characteristic spectrum) times two detector regions (a spectrum extracted around the two unresolved stars and one around the other star) times two instrumental contributions (the lsf core and scattering wings).

7.5.3 A response component

We have shown in the previous section how for the combination of a particular source configuration and instrumental properties response matrix components can be defined. In this subsection we investigate how to build such a matrix component.

Let us first return to our discussion on the optimum model energy grid binning. For high-resolution instruments the factor EjFWHM in (7.60) is by definition large, and hence for most energies the requirements for the binning are driven by the model accuracy, not by the effective area accuracy. One might ask whether the approximation of a constant effective area within a model bin instead of the linear approximation (7.54) would not be sufficient in that case. The reason is the following.

In order to acquire the high accuracy, we need to convolve the model spectrum for the bin, approximated as a δ-function centered around Ea, with the instrument response. In most cases we cannot do this convolution analytically, so we have to make approximations. From our expressions for the observed count spectrum s(c), eqns. 7.35) and (7.36), it can be easily derived that the number of counts or count rate for data channel i is given by

     c∫i2  ∫
S =    dc  dER (c,E)f(E)
 i
    ci1
(7.62)

where as before ci1 and ci2 are the formal channel limits for data channel i and Si is the observed count rate in counts/s for data channel i. Interchanging the order of the integrations and defining the mono-energetic response for data channel i by R˜i(E) as follows:

        c∫i2
˜Ri(E) ≡   R(c,E)dc,
       c
        i1
(7.63)

we have

    ∫        ˜
Si =  dEf (E)Ri(E).
(7.64)

From the above equation we see that as long as we are interested in the observed count rate Si of a given data channel i, we get that number by integrating the model spectrum multiplied by the ”effective area” ˜Ri(E) for that particular data channel. We have approximated f(E) for each model bin j by (7.44), so that (7.64) becomes:

     ∑    ˜
Si =    FjRi(Ea,j),
      j
(7.65)

where as before Ea,j is the average energy of the photons in bin j, given by (7.45) and Fj is the total photon flux for bin j, in e.g. photons m-2 s-1. It is seen from (7.65) that we need to evaluate  ˜
Ri not at the bin center Ej but at Ea,j, as expected.

Formally we may split-up R˜i(E) in an effective area part A(E) and a redistribution part ˜r i(E) in such a way that

˜Ri(E ) = A (E )˜ri(E).
(7.66)

We have chosen our binning already in such a way that we have sufficient accuracy when the total effective area A(E) within each model energy grid bin j is approximated by a linear function of the photon energy E. Hence the ”arf”-part of ˜Ri is of no concern. We only need to check how the redistribution (”rmf”) part ˜r i can be calculated with sufficiently accuracy.

For ˜r i the arguments go exactly the same as for A(E), in the sense that if we approximate it locally for each bin j by a linear function of energy, the maximum error that we make is proportional to the second derivative with respect to E of r˜ i, cf. (7.56).

In fact, for a Gaussian redistribution function the following is straightforward to proove, and is left here as an excercise for the reader:

Theorem 1 Assume that for a given model energy bin j all photons are located at the upper bin boundary EjE∕2. Suppose that for all data channels we approximate ˜r i by a linear function of E, the coefficients being the first two terms in the Taylor expansion around the bin center Ej. Then the maximum error δ made in the cumulative count distribution (as a function of the data channel) is given by (7.47) in the limit of small ΔE.

The importance of the above theorem is that it shows that the binning for the model energy grid that we have chosen in section 2 is also sufficiently accurate so that ˜r i(E) can be approximated by a linear function of energy within a model energy bin j, for each data channel i. Since we showed already that our binning is also sufficient for a similar linear approximation to A(E), it follows that also the total response ˜Ri(E) can be approximated by a linear function. Hence we use within the bin j:

R˜i(Ea,j) = ˜Ri(Ej)+ dR˜i-(Ej ) (Ea,j - Ej).
                  dEj
(7.67)

Inserting the above in (7.65) and comparing with (7.36) for the classical response matrix, we obtain finally:

S = ∑  R   F  +R   (E  - E )F ,
 i   j   ij,0 j    ij,1  a    j  j
(7.68)

where Rij,0 is the ”classical” response matrix, evaluated for photons at the bin center (Note: not the average over the bin!), and Rij,1 is its derivative with respect to the photon energy (Note: not to be confused with the data channel!), evaluated also at the bin center. In addition to the ”classical” convolution, we thus get a second term containing the relative offset of the photons with respect to the bin center. This is exactly what we intended to have when we argued that the number of bins could be reduced considerably by just taking that offset into account! It is just at the expense of an additional derivative matrix, which means only a factor of two more disk space and computation time. But for this extra expenditure we gained much more disk space and computational efficiency because the number of model bins is reduced by a factor between 10–100!

Finally a practical note. The derivative Rij,1 can be calculated in practice either analytically or by numerical differentiation. In the last case it is more accurate to evaluate the derivative by taking the difference at E + ΔE∕2 and E - ΔE∕2, and wherever possible not to evaluate it at one of these boundaries and the bin center. This last situation is perhaps only un-avoidable at the first and last energy value.

Also, negative response values should be avoided. Thus it should be ensured that Rij,0 + Rij,1h is everywhere non-negative for -ΔE∕2 h ΔE∕2. This can be translated into the constraint that Rij,1 should be limited always to the following interval:

- 2Rij,0∕ΔE  ≤ Rij,1 ≤ 2Rij,0∕ΔE.
(7.69)

Whenever the calculated value of Rij,1 should exceed the above limits, the limiting value should be inserted instead. This situation may happen for example for a Gaussian redistribution for responses a few σ away from the center, where the response falls of exponentially. However, the response Rij,0 is small anyway for those energies so this limitation is not serious. The only reason is that we want to avoid negative predicted count rates, whatever small they may be.