7.3 Data binning

7.3.1 Introduction

An important problem in data analysis is how the data should be sampled. For example, most X-ray detectors are able to count the individual photons (events) that hit the detector. A number of properties of each event, like its arrival time, position or energy, may be recorded. The electronics of most detectors operate in such a way that not the energy value itself (or a related pulse height) is recorded, but a digitized version of it. A histogram is then produced containing the number of events as a function of the energy or pulse-height channel. The bin size of these data channels ideally should not exceed the resolution of the instrument, otherwise important information may be lost. On the other hand, if the bin size is too small, one may have to deal with low statistics per data channel or with a large computational overhead caused by the large number of data channels. In this section we derive the optimum bin size for the data channels. We start with the Shannon theorem and derive expressions for the errors made by using a particular bin size. From these errors and the statistical noise on the data it is then possible to arrive at the optimum bin size.

7.3.2 The Shannon theorem

There is an important theorem that helps to derive the optimum bin size for any signal: the Shannon (1949) sampling theorem, also sometimes attributed to Nyquist. It states the following:

Let f(t) be a continuous signal. Let g(ω) be its Fourier transform, given by

g(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt.
(7.1)

If g(ω) = 0 for all |ω| > W for a given frequency W, then f(t) is band-limited, and in that case Shannon has shown that

f(t) = f_s(t) \equiv \sum_{n=-\infty}^{\infty} f(n\Delta)\, \frac{\sin\pi(t/\Delta - n)}{\pi(t/\Delta - n)}.
(7.2)

In (7.2), the bin size Δ = 1∕(2W). Thus, a band-limited signal is completely determined by its values on an equally spaced grid with spacing Δ.
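As an illustration (our own sketch, not part of the original text), the reconstruction (7.2) is a one-liner with numpy, whose sinc function is defined as sin(πx)∕(πx), exactly the kernel of the theorem. Here W is taken as an ordinary (cycle) frequency so that Δ = 1∕(2W), and the infinite sum is truncated at |n| ≤ 50:

```python
import numpy as np

def shannon_reconstruct(f, delta, t, nmax=50):
    """Evaluate the Shannon series (7.2) at points t from samples f(n*delta),
    truncating the infinite sum at |n| <= nmax."""
    n = np.arange(-nmax, nmax + 1)
    # np.sinc(x) = sin(pi*x)/(pi*x), the interpolation kernel of (7.2)
    return np.sinc(t[:, None] / delta - n[None, :]) @ f(n * delta)

W = 1.0                           # band limit (cycles per unit of t)
delta = 1.0 / (2.0 * W)           # Shannon bin size
f = lambda t: np.sinc(2 * W * t) + 0.5 * np.sinc(2 * W * (t - 0.3))  # band-limited test signal
t = np.linspace(-2.0, 2.0, 9)
print(np.abs(shannon_reconstruct(f, delta, t) - f(t)).max())  # small; series truncation only
```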

We can easily translate the above Shannon theorem into terms of a response function for an X-ray detector. In this case the variable t denotes the energy. In practice, many response functions are not band-limited (example: a Gaussian line profile). It is common practice in those cases to neglect the Fourier power for frequencies above W, and to reconstruct f(t) from this cut-off Fourier transform. The argument is that for sufficiently large W the neglected power in the tail of the Fourier spectrum is small, so that (7.2) gives a fair approximation to the true signal.

The problem with such a treatment is that it is not obvious what the best choice for W and hence the bin size Δ is. There are theorems which express the maximum error in f(t) in terms of the neglected power. Weiss (1963) has derived an expression for the maximum error introduced (see also Jerri, 1977):

E_a = |f(t) - f_s(t)| \le \frac{2}{\pi} \int_W^{\infty} |g(\omega)|\, d\omega.
(7.3)

One could now proceed by e.g. assuming that this maximum error holds over the entire range where a significant signal is detected, and then, depending upon the number of events used to sample f(t), determine W. However, this yields a large overestimate of the true error, and hence a much smaller bin size than really necessary.
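To give a feeling for the numbers involved, the following sketch (our own illustration, assuming a unit-area Gaussian lsf with width σ, for which g(ω) = exp(−σ²ω²∕2)) evaluates the Weiss bound (7.3) in closed form via the complementary error function:

```python
import numpy as np
from scipy.special import erfc

def weiss_bound(W, sigma=1.0):
    """Upper limit (7.3) for a Gaussian lsf:
    (2/pi) * int_W^inf exp(-sigma^2 w^2 / 2) dw
      = sqrt(2/pi) / sigma * erfc(W * sigma / sqrt(2))."""
    return np.sqrt(2.0 / np.pi) / sigma * erfc(W * sigma / np.sqrt(2.0))

for W in (2.0, 4.0, 6.0):          # cut-off frequency in units of 1/sigma
    print(W, weiss_bound(W))       # the bound drops very rapidly with W
```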

The point to make here is that (7.3) is only useful for continuous signals that are sampled at a discrete set of time intervals, with no measurements in between. X-ray spectra, however, are sampled differently. The measured X-ray spectrum is determined by counting individual photons, the energy of each photon being digitized to a discrete data channel number. Therefore the measured X-ray spectrum is essentially a histogram of the number of events as a function of channel number. The essential difference from the previous discussion is that we do not measure the signal at the data channel boundaries, but the sum (integral) of the signal between the data channel boundaries. Hence for X-ray spectra it is more appropriate to study the integral of f(t) instead of f(t) itself. This is elaborated in the next paragraph.

7.3.3 Integration of Shannon’s theorem

We have shown that for X-ray spectra it is more appropriate to study the integrated signal instead of the signal itself. Let us assume that the X-ray spectrum f(t) represents a true probability distribution: it gives the probability that a photon will be detected in data channel t. The cumulative probability distribution function F(t) is given by

F(t) = \int_{-\infty}^{t} f(x)\, dx.
(7.4)

If we insert (7.4) into the Shannon reconstruction (7.2), interchange the integration and summation, and keep in mind that we cannot evaluate F(t) at arbitrary points but only at the grid points mΔ (integer m) where fs is also sampled, we obtain:

F_s(m\Delta) = \frac{\Delta}{\pi} \sum_{n=-\infty}^{\infty} f(n\Delta) \left\{ \frac{\pi}{2} + \mathrm{Si}[\pi(m - n)] \right\}.
(7.5)

The function Si(x) is the sine integral as defined e.g. in Abramowitz & Stegun (1965). It is an antisymmetric function, Si(−x) = −Si(x), and for large x it oscillates around the constant π∕2 with an amplitude that decreases proportionally to 1∕x. From (7.5) it is easily seen that in the limit Δ → 0, with t = mΔ fixed, this expression reduces to (7.4). For illustration, we list in table 7.1 some values of the weights used in the summation.
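The weights in table 7.1 below are easy to reproduce; a minimal check (our own, not from the original text) using scipy.special.sici, which returns the pair (Si(x), Ci(x)):

```python
import numpy as np
from scipy.special import sici

def shannon_weight(d):
    """w_mn = 1/2 + Si[pi*(m - n)]/pi, with d = m - n, as used in (7.5)."""
    si, _ = sici(np.pi * d)        # sici returns (Si(x), Ci(x))
    return 0.5 + si / np.pi

for d in range(-9, 0):
    print(d, shannon_weight(d), -d, shannon_weight(-d))
# note w(-d) = 1 - w(d): the weights pair up symmetrically around 1/2
```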


Table 7.1: Weights wmn = 1∕2 + Si[π(m − n)]∕π used in the cumulative Shannon approximation.




m - n   wmn        m - n   wmn
 -9    -0.01129      9    1.0112
 -8     0.01268      8    0.9874
 -7    -0.01447      7    1.0144
 -6     0.01686      6    0.9832
 -5    -0.02015      5    1.0201
 -4     0.02504      4    0.9750
 -3    -0.03313      3    1.0331
 -2     0.04862      2    0.9514
 -1    -0.08951      1    1.0895





The expression (7.5) for Fs equals F if f(t) is band-limited. We see in that case that at the grid points F is completely determined by the value of f at the grid points. By inverting this relation, one could express f at the grid points as a unique linear combination of the F-values at the grid. Since Shannon’s theorem states that f(t) for arbitrary t is determined completely by the f-values at the grid, we infer that f(t) can be completely reconstructed from the discrete set of F-values. And then, by integrating this reconstructed f(t), also F(t) is determined.

We conclude that also F(t) is completely determined by the set of discrete values F(mΔ) at t = mΔ for integer values of m, provided that f(t) is band-limited.

In cases of non-band-limited responses, we will use (7.5) to approximate the true cumulative distribution function at the energy grid. In doing this, a small error is made; such errors are known as aliasing errors. They can be calculated easily by comparing Fs(mΔ) with the true F(mΔ) values. Then, for an actually observed spectrum containing counting noise, these sampling errors can be compared to the counting-noise errors. We will conclude that the binning Δ is sufficient if the sampling errors are sufficiently small compared to the noise errors. What is sufficiently small will be quantified below.

For the comparison of the aliased model to the true model we consider two statistical tests: the classical χ2 test and the Kolmogorov-Smirnov test. These are elaborated in the next sections.

7.3.4 The χ2 test

Suppose that f0(x) is the unknown, true probability density function (pdf) describing the observed spectrum. Further let f1(x) be an approximation to f0 using e.g. the Shannon reconstruction as discussed in the previous sections, with a bin size Δ. The probability p0n of observing an event in the data bin number n under the hypothesis H0 that the pdf is f0 is given by

p_{0n} = \int_{(n-1)\Delta}^{n\Delta} f_0(x)\, dx
(7.6)

and similarly the probability p1n of observing an event in data bin number n under the hypothesis H1 that the pdf is f1 is

p_{1n} = \int_{(n-1)\Delta}^{n\Delta} f_1(x)\, dx.
(7.7)

We define the relative difference δn between both probabilities as

p_{1n} \equiv p_{0n}(1 + \delta_n).
(7.8)

Let the total number of events in the spectrum be given by N. Then the random variable Xn, defined as the number of observed events in bin n, has a Poisson distribution with an expected value μn = Np0n.

The classical χ2 test for testing the hypothesis H0 against H1 is now based on the random variable:

X^2 \equiv \sum_{n=1}^{k} \frac{(X_n - Np_{0n})^2}{Np_{0n}}.
(7.9)

The hypothesis H0 will be rejected if X2 becomes too large. In (7.9) k is the total number of data bins. X2 has a (central) χ2 distribution with k degrees of freedom, if the hypothesis H0 is true, and in the limit of large N.

However, if H1 holds, X2 no longer has a central χ2 distribution. It is straightforward to show that if f1 approaches f0, then under the hypothesis H1, X2 has a non-central χ2 distribution with non-centrality parameter

\lambda_c = N \sum_{n=1}^{k} p_{0n}\, \delta_n^2.
(7.10)

This result was derived by Eisenhart (1938). It is evident that λc becomes small when f1 comes close to f0. For λc = 0 the non-central χ2 distribution reduces to the "classical" central χ2 distribution. The expected value and variance of the non-central χ2 distribution are simply k + λc and 2k + 4λc, respectively.
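These moments are easily verified numerically; a quick check with scipy.stats (our own illustration, with arbitrary values for k and λc):

```python
from scipy.stats import chi2, ncx2

k, lam = 1000, 10.0                       # arbitrary example values
print(ncx2.mean(k, lam), k + lam)         # both 1010.0
print(ncx2.var(k, lam), 2 * k + 4 * lam)  # both 2040.0
# for lam -> 0 the non-central distribution reduces to the central one
print(ncx2.ppf(0.95, k, 1e-10), chi2.ppf(0.95, k))
```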

We now need a measure of how much the probability distributions of X2 under the hypotheses H0 and H1 differ. The bin size Δ will be acceptable if the corresponding probability distributions for X2 under H0 (a central χ2 distribution) and under H1 (a non-central χ2 distribution) are close enough.

Using the χ2 test, H0 will in general be rejected if X2 becomes too large, say larger than a given value cα. The probability α that H0 will be rejected if it is true is called the size of the test. Let us take as a typical value α = 0.05. The probability that H0 will be rejected if H1 is true is called the power of the test and is denoted by 1 − β. An ideal statistical test of H0 against H1 would also have a small β (large power). In our case, however, we are not interested in maximizing the discriminating power between both alternatives; we want to know how small λc can be before the distributions of X2 under the two hypotheses become indistinguishable. In other words,

\mathrm{Prob}(X^2 > c_\alpha \mid H_1) = f\alpha,   (7.11)
\mathrm{Prob}(X^2 > c_\alpha \mid H_0) = \alpha,    (7.12)

where ideally f should be close to 1. For f = 1 both distributions would be equal, implying λc = 0 and hence Δ = 0. We adopt here as working values f = 2 and α = 0.05. That is, using a classical χ2 test, H0 will be rejected in only 5 % of the cases if H0 is true, and in 10 % of the cases if H1 is true. In general we are interested in spectra with a large number of bins k. In the limit of large k, both the central and the non-central χ2 distribution can be approximated by a normal distribution with the appropriate mean and variance. Using this approximation, the criterion (7.11) translates into

k + \sqrt{2k}\, q_\alpha = k + \lambda_c + \sqrt{2k + 4\lambda_c}\, q_{f\alpha},
(7.13)

where qα is given by G(qα) = 1 − α, with G the cumulative distribution function of the standard normal distribution. For the standard values we have q0.05 = 1.64485 and q0.10 = 1.28155. Solving (7.13) in the asymptotic limit of large k yields

\lambda_c = \sqrt{2k}\, (q_\alpha - q_{f\alpha}) \equiv p(\alpha, f)\, \sqrt{2k}.
(7.14)

For our particular choice of α = 0.05 and f = 2 we have p(α,f) = 0.3633. Other values are listed in table 7.2.
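The entries of table 7.2 follow from two quantile evaluations of the standard normal distribution; a minimal check (our own):

```python
from scipy.stats import norm

def p_scale(alpha, f):
    """p(alpha, f) = q_alpha - q_{f*alpha} from (7.14), with G(q_a) = 1 - a."""
    return norm.isf(alpha) - norm.isf(f * alpha)

print(p_scale(0.05, 2.0))   # 0.36330
print(p_scale(0.05, 1.5))   # 0.20532
print(p_scale(0.01, 2.0))   # 0.27260
```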

We see that the displacement λc must typically be a small fraction of the natural "width" √(2k) of the χ2 distribution. The constant of proportionality p(α,f) does not depend strongly upon α, as can be seen from table 7.2. The dependence upon f is somewhat stronger, in the sense that p(α,f) approaches 0 as f approaches 1, as it should.


Table 7.2: Scaling factors for the displacement parameter



α      f    p(α,f)
0.01   2.0  0.27260
0.025  2.0  0.31511
0.05   2.0  0.36330
0.10   2.0  0.47513

0.05   1.2  0.09008
0.05   1.5  0.20532
0.05   2.0  0.36330




As a typical value we use the value of 0.3633 for α = 0.05 and f = 2. Combining (7.10) and (7.14) yields

\epsilon = (2k)^{0.25}\, [p(\alpha, f)]^{0.5}\, N^{-0.5},
(7.15)

where the probability-weighted relative error ϵ is defined by

\epsilon^2 \equiv \sum_{n=1}^{k} p_{0n}\, \delta_n^2.
(7.16)

7.3.5 Extension to complex spectra

The χ2 test and the estimates derived above hold for any spectrum. In (7.15), N represents the total number of counts in the spectrum and k the total number of bins. Given a spectrum with pdf f0(x), and adopting a certain bin width Δ, the Shannon approximation f1(x) can be determined, and accordingly the value of ϵ2 can be obtained from (7.16). Using (7.15), the value of N corresponding to Δ is found, and by inverting this relation, the bin width as a function of N is obtained.
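In code the procedure might look as follows (a sketch, assuming the per-bin probabilities p0 of the true pdf and p1 of its Shannon approximation have already been integrated on the chosen grid):

```python
import numpy as np

def max_supported_counts(p0, p1, p_af=0.3633):
    """Largest number of events N for which the binning behind p1 is still
    acceptable, by inverting (7.15): N = p(alpha,f) * sqrt(2k) / eps^2."""
    k = len(p0)
    delta = p1 / p0 - 1.0              # relative differences (7.8)
    eps2 = np.sum(p0 * delta**2)       # probability-weighted error (7.16)
    return p_af * np.sqrt(2.0 * k) / eps2
```

Scanning this over a grid of trial bin widths Δ then yields the bin width as a function of N.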

This method is not always practical, however. First, for many X-ray detectors the spectral resolution is a function of the energy, hence binning with a uniform bin width over the entire spectrum is not always desired.

Another disadvantage is that regions of the spectrum with good statistics are treated the same way as regions with poor statistics. Regions with poor statistics contain less information than regions with good statistics, hence can be sampled on a much coarser grid.

Finally, a spectrum often extends over a much wider energy band than the spectral resolution of the detector. Estimating ϵ2 would then require that the model spectrum, convolved with the instrument response, first be used to determine f0(x). However, the true model spectrum is in general known only after spectral fitting, and the spectral fitting can be done only after a bin size Δ has been set.

It is much easier to characterize the errors in the line-spread function (lsf) of the instrument for a given bin size Δ, since that can be done independently of the incoming photon spectrum. If we restrict our attention to a region r of the spectrum with a width of the order of a resolution element, then in this region the observed spectrum is mainly determined by the convolution of the lsf in this resolution element r with the true model spectrum in a region of comparable width. We ignore here any substantial tails in the lsf, since they are of little importance in determining the optimum bin size.

If all photons within the resolution element were emitted at the same energy, the observed spectrum would simply be the lsf multiplied by the number of photons Nr. For this situation we can easily evaluate ϵr2 for a given binning. If the spectrum is not mono-energetic, the observed spectrum is the convolution of the lsf with the photon distribution within the resolution element, and hence any sampling errors in the lsf will be smoothed out (smaller) compared to the mono-energetic case. It follows that if we determine ϵr2 from the lsf, this is an upper limit to the true value of ϵ2 within the resolution element r.

In (7.10) the contributions to λc from all bins are added. The contribution of a resolution element r to the total is then given by Nrϵr2, where Nr is the number of photons in the resolution element. As a working definition of a resolution element we take a region with a width equal to the FWHM (full width at half maximum) of the instrument. An upper limit to Nr can be obtained by counting the number of events within one FWHM and multiplying this by the ratio of the total area under the lsf (which should equal 1) to the area under the lsf within one FWHM (which is smaller than 1).

We now demand that the contribution to λc from all resolution elements is the same. Accordingly, resolution elements with many counts need to have f1 closer to f0 than regions with less counts.

If there are R resolution elements in the spectrum, we demand therefore for each resolution element r the following condition:

\lambda_{c,r} \equiv N_r\, \epsilon_r^2 = \lambda_c / R
(7.17)

with

\epsilon_r^2 \equiv \sum_{n=1}^{k_r} p_{0n}\, \delta_n^2,
(7.18)

where now p0n and δn are related to the lsf, not to the observed spectrum.

Further kr is the number of bins within the resolution element. The value of λc in (7.17) is given again by (7.14) where now k represents the total number of bins in the spectrum, i.e.

k = \sum_{r=1}^{R} k_r.
(7.19)

Combining everything, we have for each resolution element r:

\epsilon_r = (2k)^{0.25}\, [p(\alpha, f)]^{0.5}\, (RN_r)^{-0.5}.
(7.20)

7.3.6 Difficulties with the χ2 test

The first step in applying the theory derived above is evaluating (7.18) for a given lsf. Let us take a Gaussian lsf, sampled with a step size of 1σ as an example. Table 7.3 summarizes the intermediate steps.


Table 7.3: Contribution to λc,r for a Gaussian lsf

n    p0n             p1n             p0n - p1n        p0n δn²
-8   6.11 × 10^-16    3.04 × 10^-4    -3.04 × 10^-4    1.51 × 10^8
-7   1.28 × 10^-12   -3.33 × 10^-4     3.33 × 10^-4    8.65 × 10^4
-6   9.85 × 10^-10    3.65 × 10^-4    -3.65 × 10^-4    1.35 × 10^2
-5   2.86 × 10^-7    -3.99 × 10^-4     3.99 × 10^-4    5.57 × 10^-1
-4   3.14 × 10^-5     4.60 × 10^-4    -4.29 × 10^-4    5.86 × 10^-3
-3   1.32 × 10^-3     8.77 × 10^-4     4.41 × 10^-4    1.48 × 10^-4
-2   2.14 × 10^-2     2.18 × 10^-2    -4.10 × 10^-4    7.85 × 10^-6
-1   1.36 × 10^-1     1.36 × 10^-1     3.04 × 10^-4    6.78 × 10^-7
 0   3.41 × 10^-1     3.42 × 10^-1    -1.14 × 10^-4    3.80 × 10^-8
 1   3.41 × 10^-1     3.42 × 10^-1    -1.14 × 10^-4    3.80 × 10^-8
 2   1.36 × 10^-1     1.36 × 10^-1     3.04 × 10^-4    6.78 × 10^-7
 3   2.14 × 10^-2     2.18 × 10^-2    -4.10 × 10^-4    7.85 × 10^-6
 4   1.32 × 10^-3     8.77 × 10^-4     4.41 × 10^-4    1.48 × 10^-4
 5   3.14 × 10^-5     4.60 × 10^-4    -4.29 × 10^-4    5.86 × 10^-3
 6   2.86 × 10^-7    -3.99 × 10^-4     3.99 × 10^-4    5.57 × 10^-1
 7   9.85 × 10^-10    3.65 × 10^-4    -3.65 × 10^-4    1.35 × 10^2
 8   1.28 × 10^-12   -3.33 × 10^-4     3.33 × 10^-4    8.65 × 10^4






It is immediately seen that for large values of |n| the probability p0n of a photon falling in bin n decreases rapidly. The absolute error of the approximation is always smaller than 0.0005, but the relative error δn grows rapidly with |n|. Therefore, the largest contribution to λc,r comes from the tails of the Gaussian distribution! In practice one could circumvent this by cutting off the summation at some finite n. It is well known that for the χ2 test to be valid the expected number of events in each bin should not be too small, so the summation could be stopped where Nrp0n becomes of order a few. However, since the largest contribution to ϵr2 comes from the largest values of |n|, ϵr2 is a very steep function of the precise cut-off criterion, which is an undesirable effect.
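The entries of table 7.3 can be reproduced along the following lines (our own sketch, for a unit Gaussian sampled at Δ = σ, truncating the sum in (7.5) at |n| ≤ 100; small differences in the last digits may remain):

```python
import numpy as np
from scipy.special import sici
from scipy.stats import norm

delta = 1.0                        # bin size of one sigma
ngrid = np.arange(-100, 101)       # truncation of the infinite sum in (7.5)
fvals = norm.pdf(ngrid * delta)    # f(n*delta) for a unit Gaussian

def F_shannon(m):
    """Cumulative Shannon approximation (7.5) at grid point m*delta."""
    si, _ = sici(np.pi * (m - ngrid))
    return delta / np.pi * np.sum(fvals * (np.pi / 2.0 + si))

for m in range(-8, 1):
    p0 = norm.cdf(m * delta) - norm.cdf((m - 1) * delta)   # true (7.6)
    p1 = F_shannon(m) - F_shannon(m - 1)                   # approximated (7.7)
    print(m, p0, p1, p0 - p1, p0 * (p1 / p0 - 1.0) ** 2)
# the last column, the per-bin contribution to eps_r^2, peaks in the far tail
```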

An alternative solution can be obtained by using the Kolmogorov-Smirnov test, which is elaborated in the following section.

7.3.7 The Kolmogorov-Smirnov test


Table 7.4: Maximum difference dn for a Gaussian lsf

n    p0n              dn
-8   6.106 × 10^-16    1.586 × 10^-4
-7   1.279 × 10^-12   -1.740 × 10^-4
-6   9.853 × 10^-10    1.909 × 10^-4
-5   2.857 × 10^-7    -2.079 × 10^-4
-4   3.138 × 10^-5     2.209 × 10^-4
-3   1.318 × 10^-3    -2.202 × 10^-4
-2   2.140 × 10^-2     1.896 × 10^-4
-1   1.359 × 10^-1    -1.139 × 10^-4
 0   3.413 × 10^-1     2.675 × 10^-9
 1   3.413 × 10^-1     1.139 × 10^-4
 2   1.359 × 10^-1    -1.896 × 10^-4
 3   2.140 × 10^-2     2.202 × 10^-4
 4   1.318 × 10^-3    -2.209 × 10^-4
 5   3.138 × 10^-5     2.079 × 10^-4
 6   2.857 × 10^-7    -1.909 × 10^-4
 7   9.853 × 10^-10    1.740 × 10^-4
 8   1.279 × 10^-12   -1.586 × 10^-4




A good alternative to the χ2 test for the comparison of probability distributions is the Kolmogorov-Smirnov test. This powerful, non-parametric test is based upon the test statistic D given by

D = \max_x |S(x) - F(x)|,
(7.21)

where S(x) is the observed cumulative distribution for the sample of size N. Clearly, if D is large, F(x) is a bad representation of the observed data set. For large N (typically 10 or larger), the statistic √N D has the limiting Kolmogorov-Smirnov distribution. The hypothesis that the true cumulative distribution is F will be rejected if √N D > cα, where the critical value cα corresponds to a certain size α of the test. A few values of cα are given in table 7.5. The expected value of √N D is √(π∕2) ln 2 = 0.86873, and its standard deviation is π √(1∕12 − (ln 2)²∕(2π)) = 0.26033.
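The limiting Kolmogorov-Smirnov distribution is available in scipy as scipy.stats.kstwobign, which makes these numbers easy to verify (our own check):

```python
from scipy.stats import kstwobign

print(kstwobign.mean())       # 0.86873, i.e. sqrt(pi/2) * ln 2
print(kstwobign.std())        # 0.26033
print(kstwobign.isf(0.05))    # critical value c_alpha = 1.36 for alpha = 0.05
```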


Table 7.5: Critical values cα for the Kolmogorov-Smirnov test as a function of the size of the test α

α     cα      α     cα
0.01  1.63    0.27  1.00
0.05  1.36    0.50  0.83
0.10  1.22    0.95  0.52
0.15  1.14    0.98  0.47
0.20  1.07    0.99  0.44





Similar to the case of the χ2 test, we can argue that a good criterion for determining the optimum bin size Δ is that the maximum difference

\lambda_k \equiv \sqrt{N} \max_m |F_s(m\Delta) - F(m\Delta)|
(7.22)

should be sufficiently small compared to the statistical fluctuations described by D. Analogous to (7.11)–(7.12), we can derive equations determining λk for a given size of the test α and a given f:

F_{KS}(c_\alpha) = 1 - \alpha,              (7.23)
F_{KS}(c_\alpha - \lambda_k) = 1 - f\alpha.     (7.24)

Equation (7.23) gives the critical value of the Kolmogorov statistic as a function of the size α under the hypothesis H0; equation (7.24) gives the corresponding value under the hypothesis H1. Taking α = 0.05 and f = 2, we find λk = 0.134. Again, as in the case of the χ2 test, this quantity depends only weakly upon α: for α = 0.01, f = 2 we would have λk = 0.110.
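Solving (7.23)–(7.24) amounts to two quantile evaluations of the limiting KS distribution; a minimal sketch:

```python
from scipy.stats import kstwobign

def lambda_k(alpha=0.05, f=2.0):
    """Displacement lambda_k = c_alpha - F_KS^{-1}(1 - f*alpha), cf. (7.23)-(7.24)."""
    return kstwobign.ppf(1.0 - alpha) - kstwobign.ppf(1.0 - f * alpha)

print(lambda_k(0.05, 2.0))   # 0.134
print(lambda_k(0.01, 2.0))   # 0.110
```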

As in the case of the χ2 test, we now turn to the case of multiple resolution elements. Suppose that in each of the R resolution elements we perform a Kolmogorov-Smirnov test. How can we combine the tests in the different regions? For the χ2 test we could simply add the contributions from the different regions. For the Kolmogorov-Smirnov test it is more natural to test for the maximum δ:

\delta \equiv \max_r \sqrt{N_r}\, D_r,
(7.25)

where, as before, Nr is the number of photons in resolution element r, and Dr is given by (7.21) evaluated over the interval r. Note that the integrals of Sr(x) and Fr(x) over the interval r should both be 1. Our test statistic therefore combines "normalized" KS tests over the individual resolution elements. All the √Nr Dr are independently distributed according to the KS distribution, hence the cumulative distribution function of their maximum is simply the Rth power of the KS cumulative distribution function. Therefore we obtain, instead of (7.23)–(7.24):

F_{KS}(c_\alpha)^R = 1 - \alpha,              (7.26)
F_{KS}(c_\alpha - \lambda_k)^R = 1 - f\alpha.     (7.27)
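Numerically these equations are again just a pair of quantile evaluations, now with the Rth root applied to the probability levels (a sketch in the spirit of the helper above):

```python
from scipy.stats import kstwobign

def lambda_k_R(R, alpha=0.05, f=2.0):
    """Solve (7.26)-(7.27) for the displacement lambda_k with R resolution elements."""
    c_alpha = kstwobign.ppf((1.0 - alpha) ** (1.0 / R))
    return c_alpha - kstwobign.ppf((1.0 - f * alpha) ** (1.0 / R))

for R in (1, 100, 10000):
    print(R, lambda_k_R(R))    # decreases slowly from 0.134 at R = 1
```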



Figure 7.1: Displacement λk as a function of the number of resolution elements R. A polynomial approximation is given in the text.


The hypothesis H0 is rejected if δ > cα. For our choice of α = 0.05 and f = 2 we can calculate λk as a function of R by numerically solving the above equations using the exact Kolmogorov-Smirnov cumulative probability distribution function. In fig. 7.1 we show the solution. It depends only weakly upon R: it has a maximum value of 0.134 at R = 1 and is less than a factor of 3 smaller at R = 10^10. We can approximate λk as a function of R by the following polynomial:

\lambda_k(R) = \sum_{i=0}^{5} a_i \left( \frac{\log_{10} R}{10} \right)^i,
(7.28)

where the coefficients ai are given in table 7.6. Note the factor of 10 in the denominator! The rms deviation of this approximation is less than 10^-4. The corresponding value of cα is plotted in fig. 7.2, and is approximated with an rms deviation of 0.0004 by the polynomial:



Figure 7.2: Critical value cα as a function of the number of resolution elements R. A polynomial approximation is given in the text.


c_\alpha(R) = \sum_{i=0}^{5} b_i \left( \frac{\log_{10} R}{10} \right)^i,
(7.29)

where the coefficients bi are given in table 7.6. This approximation is only valid for the range 0.5 < R < 10^10.
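Both fits are trivial to transcribe into code (a sketch using the coefficients of table 7.6 below; note the division of log10(R) by 10):

```python
import numpy as np

A = [0.1342, -0.3388, 0.7994, -1.1697, 0.9053, -0.2811]   # a_i from table 7.6
B = [1.3596, 4.0609, -4.3522, 5.2225, -3.7881, 1.1491]    # b_i from table 7.6

def lambda_k_approx(R):
    """Polynomial approximation (7.28) to lambda_k(R)."""
    x = np.log10(R) / 10.0
    return sum(a * x**i for i, a in enumerate(A))

def c_alpha_approx(R):
    """Polynomial approximation (7.29) to c_alpha(R)."""
    x = np.log10(R) / 10.0
    return sum(b * x**i for i, b in enumerate(B))

print(lambda_k_approx(1), c_alpha_approx(1))   # 0.1342, 1.3596 at R = 1
```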


Table 7.6: Coefficients for the approximations for λk and cα



i    ai        bi
0    0.1342    1.3596
1   -0.3388    4.0609
2    0.7994   -4.3522
3   -1.1697    5.2225
4    0.9053   -3.7881
5   -0.2811    1.1491




7.3.8 Examples

Let us apply the theory derived before to some specific line spread functions. First we consider a Cauchy (or Lorentz) distribution, given by

f(x) = \frac{a}{\pi (a^2 + x^2)},
(7.30)

where a is the width parameter. The FWHM equals 2a. Next we consider the Gaussian distribution, given by

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2},
(7.31)

with σ the standard deviation. The FWHM is given by √(ln 256) σ = 2√(2 ln 2) σ, which is approximately 2.3548σ.

We have computed the value of δ = λk∕√Nr (cf. (7.34) below) numerically for both distributions. We have also taken into account the effect of possible phase shifts, i.e. the case where the center of the distribution does not coincide with a grid point but falls at some arbitrary phase of the grid. The maximum bin width Δ∕FWHM as a function of the maximum absolute difference δ between the true and approximated cumulative distribution functions is plotted for both distributions in fig. 7.3.



Figure 7.3: Required bin width Δ as a function of the accuracy parameter δ for a Gaussian distribution (solid line) and a Cauchy distribution (dashed line).


We can approximate both curves with sufficient accuracy by a polynomial in log10(δ) as

\log_{10}(\Delta/\mathrm{FWHM}) = \sum_{i=0}^{4} c_i \left( \frac{\log_{10} \delta}{10} \right)^i,
(7.32)

\log_{10}(\Delta/\mathrm{FWHM}) = \sum_{i=0}^{4} g_i \left( \frac{\log_{10} \delta}{10} \right)^i,
(7.33)

for the Cauchy distribution (7.32) and the Gaussian distribution (7.33), respectively, with the coefficients given in table 7.7.


Table 7.7: Coefficients for the approximations for the Cauchy and Gauss distribution.



i    ci      gi
0    0.400   0.277
1    5.065   3.863
2    9.321   8.470
3    9.333   9.496
4    3.584   3.998




The rms deviation of these fits over the range 10^-9 < δ < 1 is 0.0086 for the Cauchy distribution and 0.0070 for the Gaussian distribution. Outside this range the approximation gets worse; however, those values of δ are of no practical use.

The bin size as a function of the number of resolution elements R and the number of photons per resolution element Nr is now obtained by combining (7.32) or (7.33) with (7.28), using

\lambda_k = \sqrt{N_r}\, \delta.
(7.34)
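Putting everything together for the Gaussian case, a self-contained sketch of the full recipe (our own transcription, combining (7.28), (7.34) and (7.33) with the coefficients of tables 7.6 and 7.7; the result corresponds to the curve family of fig. 7.4):

```python
import numpy as np

A = [0.1342, -0.3388, 0.7994, -1.1697, 0.9053, -0.2811]   # a_i from table 7.6
G = [0.277, 3.863, 8.470, 9.496, 3.998]                   # g_i from table 7.7 (Gaussian)

def bin_width_gaussian(R, Nr):
    """Optimum Delta/FWHM for a Gaussian lsf with R resolution elements
    and Nr counts per resolution element."""
    xr = np.log10(R) / 10.0
    lam = sum(a * xr**i for i, a in enumerate(A))          # lambda_k(R), (7.28)
    delta_acc = lam / np.sqrt(Nr)                          # accuracy parameter, (7.34)
    xd = np.log10(delta_acc) / 10.0
    return 10.0 ** sum(g * xd**i for i, g in enumerate(G)) # (7.33)

for Nr in (100, 1000, 10000):
    print(Nr, bin_width_gaussian(R=1, Nr=Nr))   # weak dependence upon Nr
```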



Figure 7.4: Required bin width Δ relative to the FWHM for a Gaussian distribution as a function of Nr for R=1, 10, 100, 1000 and 10000 from top to bottom, respectively.




Figure 7.5: Required bin width Δ relative to the FWHM for a Lorentz distribution as a function of Nr for R=1, 10, 100, 1000 and 10000 from top to bottom, respectively.


In fig. 7.4 we show the required binning, expressed in FWHM units, for a Gaussian lsf as a function of R and Nr, and in fig. 7.5 for a Lorentzian.

For the Gaussian, the required bin width depends only weakly upon the number of counts Nr. In the case of a pure Lorentzian profile, however, the required bin width is somewhat smaller than for a Gaussian. This is because the Fourier transform of the Lorentzian has relatively more power at high frequencies than that of a Gaussian (exp[−aω] versus exp[−(σω)²∕2], respectively). For the low count rate parts of the spectrum, the binning rule of 1/3 FWHM usually is too conservative!

7.3.9 Final remarks

We have estimated conservative upper bounds for the required data bin size. In the case of multiple resolution elements, we have determined the bounds for the worst-case phase of the grid with respect to the data. In practice it is not likely that all resolution elements have the worst possible alignment. In fact, for the Gaussian lsf the phase-averaged value of δ at a given bin size Δ is always smaller than 0.83 times the worst-case value, and lies between 0.82 and 0.83 times the worst-case value for all Δ∕FWHM < 1.3. Nevertheless, we recommend using the conservative upper limits as given by (7.33).

Another issue is the determination of Nr, the number of events within the resolution element. We have argued that an upper limit to Nr can be obtained by counting the number of events within one FWHM and multiplying this by the ratio of the total area under the lsf (which should equal 1) to the area under the lsf within one FWHM (which is smaller than 1). For the Gaussian lsf this ratio equals 1.314; for other lsfs it is better to determine the ratio numerically rather than adopting the Gaussian value.
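For reference, the Gaussian value follows from the error function: with FWHM = 2√(2 ln 2) σ, the fraction of the lsf area within one FWHM centred on the peak is

\int_{-\mathrm{FWHM}/2}^{+\mathrm{FWHM}/2} \frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}\, dx = \mathrm{erf}\left(\sqrt{\ln 2}\right) \approx 0.761, \qquad 1/0.761 \approx 1.314.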

Finally, in some cases the lsf may contain more than one component. For example, the grating spectra obtained with the EUVE SW, MW and LW detectors have first, second, third etc. order contributions. Other instruments have escape peaks or fluorescence contributions. In general it is advisable to determine the bin size for each of these components individually, and simply to take the smallest of these as the final bin size.