# William J. Pearson

## PhD Candidate at SRON - The Netherlands Institute for Space Research and The Kapteyn Institute - University of Groningen.

I am a PhD student at SRON and The Kapteyn Institute. My work focuses on star formation across cosmic time. I use multi-wavelength data sets, from UV to sub-mm, to derive the physical properties of galaxies. I also employ deep learning techniques to help classify galaxy morphologies. You can find out more below.

Outside of the department, I write silly little programs, play my saxophone or piano and enjoy long walks on the beach. Okay, maybe not long walks on the beach but the rest is true.

# Projects

## Effect of galaxy mergers on star formation rates with deep learning

This is probably the final project for my PhD. Building on my earlier work with CNNs, I am currently looking at what galaxy mergers do to star formation rates. Old wisdom says that objects with high star formation rates (star bursts) are mergers, with the implication that all star bursts are mergers. More recent wisdom has brought this into doubt.

Using data from CANDELS and KiDS, we have trained two more CNNs to complement the SDSS CNN we trained earlier. In doing so, I also had a fun discussion about how we can identify mergers for the KiDS CNN, as that survey does not (or did not) have a merger catalogue. The star formation rate - stellar mass plane was then populated with galaxies (both the merging and non-merging flavours) and we had a look at where the merging galaxies lay with respect to the non-merging galaxies.

Across all three surveys, the merging galaxies can lie wherever they feel like. There are merging quiescent galaxies, merging star forming galaxies and merging star burst galaxies. Within these three populations, merging does not appear to do much either, with very little change in the average star formation rates for merging galaxies.

You can read the paper here.

## Identifying Galaxy Mergers in Observations and Simulations with Deep Learning

A Convolutional Neural Network (CNN, also ConvNet; I wish people could agree on the acronym...) is a useful tool to classify images. This type of neural network is often behind the fancy object detection that you hear Google is doing. In this work, we have trained a CNN to identify galaxy mergers in SDSS observations and in the EAGLE hydrodynamical simulation.

The CNN trained to identify mergers in SDSS is great. It can correctly identify merging and non-merging galaxies more than 90 % of the time. There are reasons why it's so good, though: the training images are obviously merging galaxies to a human eye. With the machine's eye trying to be a human eye, it makes sense that it does so well.

The CNN trained on the EAGLE galaxies is not so good. We know these galaxies are mergers, because the simulation says so, but the merging features are less obvious and many merging galaxies look like single galaxies. So the accuracy is around 65 %. This isn't helped by fiddling with the EAGLE images to make them look like real SDSS galaxies.

We also passed the wrong images through our CNNs: EAGLE images through the SDSS trained CNN and vice versa. EAGLE through the SDSS CNN was terrible, about 50 % accuracy. It's not as bad as it sounds, as there is preferential assignment to the non-merger class, so the network is 'thinking'. But the hard-to-see mergers are not doing well. The other way, SDSS through the EAGLE CNN, is better. It's around 65 % accuracy but it shows promise. I think this can be improved upon and will end up being used with the big upcoming surveys.
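To see why an accuracy near 50 % with preferential assignment to one class is different from coin flipping, it helps to split the accuracy up by class. The sketch below does this for a made-up confusion matrix. The counts (`tp`, `fn`, `tn`, `fp`) and the helper `summarise` are hypothetical and chosen only for illustration; they are not the numbers from the paper.

```python
def summarise(tp, fn, tn, fp):
    """Summarise a binary merger / non-merger confusion matrix.

    tp: mergers labelled as mergers
    fn: mergers labelled as non-mergers
    tn: non-mergers labelled as non-mergers
    fp: non-mergers labelled as mergers
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "merger_recall": tp / (tp + fn),
        "non_merger_recall": tn / (tn + fp),
        "fraction_called_non_merger": (tn + fn) / total,
    }


# Hypothetical, balanced test set: 1000 mergers and 1000 non-mergers,
# with most objects pushed into the non-merger class.
stats = summarise(tp=150, fn=850, tn=900, fp=100)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the overall accuracy is close to 50 %, but the per-class numbers show a network that recovers most non-mergers and misses most mergers, which is a systematic preference rather than random guessing.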

## Main Sequence of Star Forming Galaxies Beyond the Herschel Confusion Limit

Herschel SPIRE data are confused. Not in a mental way, but in a "the data are as deep as they can go" way. We have worked to improve this using some clever statistics (see below). Here we make practical use of the de-blended data by examining the galaxy Main Sequence (MS). If you were unaware, the MS is a tight correlation between the star formation rate (SFR) and stellar mass (M*) of all star forming galaxies. Star burst galaxies lie above the MS and quenched or quiescent galaxies lie below.

Far infrared (FIR)/sub-mm data are important in constraining SFRs. Young stars emit in the UV, which is partially absorbed by dust and then re-radiated thermally in the FIR/sub-mm. Thus, ignoring the FIR regime will result in missing the obscured star formation. Sub-mm interferometers, like ALMA, aren't really survey telescopes and single dish FIR/sub-mm observatories (like JCMT) have to look through the water-filled atmosphere. This leaves Herschel and its (now not as badly) confused data.

With our better SFR estimates, we looked at how the MS evolves from a redshift of 6 down to z = 0.2. As far as I'm aware, this is the widest redshift range for a single, self-consistent MS study. With this, we discovered that the slope of the MS, effectively how high mass galaxies form stars with respect to their low mass counterparts, increases with redshift out to z = 6.0.

You can read the paper here.

## De-blending Deep Herschel Surveys: A Multi-wavelength Approach

Deep far-infrared (FIR) surveys suffer from confusion. They are also useful for studying the star formation rate (SFR) of galaxies, as the UV light from young stars is reprocessed by dust and emitted in the FIR. So, we want to use the FIR data but confusion prevents us from going deep.

To help deal with the confusion issue, the Herschel Extragalactic Legacy Project (HELP), more specifically Peter Hurley, developed a tool called XID+ to de-blend confused images. It uses Bayesian statistics with MCMC to determine the most likely flux for each source in a source catalogue. The source catalogue should come from a higher resolution instrument (typically near infrared or optical) and is assumed to be complete (so it will unfortunately miss really red objects). However, in its vanilla form, the prior on the flux density is fairly simplistic: just a flat prior between zero and the brightest pixel in the map, and the same for all objects.

To improve this, with help from Peter, we have implemented a more informed, truncated Gaussian prior for flux densities. The mean and standard deviation for the Gaussian come from spectral energy distribution (SED) fitting using a multi-wavelength source catalogue and the SED modelling and fitting tool CIGALE. CIGALE can generate estimates for the fluxes in any band when it fits SEDs to the data. As with vanilla XID+, we truncate the Gaussian to be between zero and the brightest pixel in the map.
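The difference between the two priors can be sketched in a few lines of plain Python. This is only an illustration of a flat prior versus a truncated Gaussian prior on a single source's flux density; it does not use XID+ or CIGALE, and the function names and the numbers (`brightest_pixel`, `sed_mean`, `sed_sigma`) are hypothetical.

```python
import math


def flat_prior_pdf(flux, upper):
    """Flat prior: uniform density between zero and `upper`."""
    return 1.0 / upper if 0.0 <= flux <= upper else 0.0


def _norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_gaussian_pdf(flux, mean, sigma, upper):
    """Gaussian prior truncated to [0, upper] and renormalised."""
    if not 0.0 <= flux <= upper:
        return 0.0
    z = (flux - mean) / sigma
    gauss = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # Probability mass of the untruncated Gaussian inside [0, upper].
    mass = _norm_cdf((upper - mean) / sigma) - _norm_cdf((0.0 - mean) / sigma)
    return gauss / mass


# Hypothetical values in mJy: the brightest pixel in the map, and the
# mean and standard deviation that an SED fit might predict for one source.
brightest_pixel = 50.0
sed_mean, sed_sigma = 8.0, 3.0

for flux in (2.0, 8.0, 30.0):
    flat = flat_prior_pdf(flux, brightest_pixel)
    informed = truncated_gaussian_pdf(flux, sed_mean, sed_sigma, brightest_pixel)
    print(f"flux = {flux:5.1f} mJy  flat = {flat:.4f}  informed = {informed:.4f}")
```

The flat prior treats every flux density up to the brightest pixel as equally plausible, while the informed prior concentrates the probability around the SED prediction and still allows nothing outside the same bounds.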

To make sure we are not just whimsically adding a useless prior, we compared the vanilla prior with our informed prior using data from ALMA. With CIGALE, the ALMA flux densities were estimated with the flat prior and informed prior and compared to ALMA observations. We found that the informed prior provides better agreement with the ALMA observations than the vanilla prior.

# Latest Preprints

Preprints of the latest papers. Assuming the query works...

## Effect of galaxy mergers on star formation rates

### W. J. Pearson, L. Wang, M. Alpaslan et al.

Galaxy mergers and interactions are an integral part of our basic understanding of how galaxies grow and evolve over time. However, the effect that galaxy mergers have on star formation rates (SFR) is contested, with observations of galaxy mergers showing reduced, enhanced and highly enhanced star formation. We aim to determine the effect of galaxy mergers on the SFR of galaxies using statistically large samples of galaxies, totalling over 200 000, over a large redshift range, 0.0 to 4.0. We train and use convolutional neural networks to create binary merger identifications (merger or non-merger) in the SDSS, KiDS and CANDELS imaging surveys. We then compare the galaxy main sequence subtracted SFR of the merging and non-merging galaxies to determine what effect, if any, a galaxy merger has on SFR. We find that the SFR of merging galaxies are not significantly different from the SFR of non-merging systems. The changes in the average SFR seen in the star forming population when a galaxy is merging are small, of the order of a factor of 1.2. However, the higher the SFR above the galaxy main sequence, the higher the fraction of galaxy mergers. Galaxy mergers have little effect on the SFR of the majority of merging galaxies compared to the non-merging galaxies. The typical change in SFR is less than 0.1 dex in either direction. Larger changes in SFR can be seen but are less common. The increase in merger fraction as the distance above the galaxy main sequence increases demonstrates that galaxy mergers can induce starbursts.
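The abstract quotes the change in SFR both as a multiplicative factor (about 1.2) and in dex (less than 0.1). The two are related by a base-10 logarithm, and the short sketch below shows the conversion. The helper names `factor_to_dex` and `dex_to_factor` are made up for this example.

```python
import math


def factor_to_dex(factor):
    """Convert a multiplicative factor into dex (base-10 log units)."""
    return math.log10(factor)


def dex_to_factor(dex):
    """Convert a change in dex back into a multiplicative factor."""
    return 10.0 ** dex


print(f"{factor_to_dex(1.2):.3f}")  # 0.079
print(f"{dex_to_factor(0.1):.2f}")  # 1.26
```

So a factor of 1.2 is roughly 0.08 dex, consistent with the quoted typical change of less than 0.1 dex.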

## Identifying Galaxy Mergers in Observations and Simulations with Deep Learning

### W. J. Pearson, L. Wang, J. W. Trayford et al.

Mergers are an important aspect of galaxy formation and evolution. We aim to test whether deep learning techniques can be used to reproduce visual classification of observations, physical classification of simulations and highlight any differences between these two classifications. With one of the main difficulties of merger studies being the lack of a truth sample, we can use our method to test biases in visually identified merger catalogues. A convolutional neural network architecture was developed and trained in two ways: one with observations from SDSS and one with simulated galaxies from EAGLE, processed to mimic the SDSS observations. The SDSS images were also classified by the simulation trained network and the EAGLE images classified by the observation trained network. The observationally trained network achieves an accuracy of 91.5% while the simulation trained network achieves 65.2% on the visually classified SDSS and physically classified EAGLE images respectively. Classifying the SDSS images with the simulation trained network was less successful, only achieving an accuracy of 64.6%, while classifying the EAGLE images with the observation network was very poor, achieving an accuracy of only 53.0% with preferential assignment to the non-merger classification. This suggests that most of the simulated mergers do not have conspicuous merger features and visually identified merger catalogues from observations are incomplete and biased towards certain merger types. The networks trained and tested with the same data perform the best, with observations performing better than simulations, a result of the observational sample being biased towards conspicuous mergers. Classifying SDSS observations with the simulation trained network has proven to work, providing tantalizing prospects for using simulation trained networks for galaxy identification in large surveys.

## A multi-wavelength de-blended Herschel view of the statistical properties of dusty star-forming galaxies across cosmic time

### L. Wang, W. J. Pearson, W. Cowley et al.

We aim to study the statistical properties of dusty star-forming galaxies, such as their number counts, luminosity functions (LF) and dust-obscured star-formation rate density (SFRD). We use state-of-the-art de-blended Herschel catalogue in the COSMOS field, generated by combining the Bayesian source extraction tool XID+ and informative prior on the spectral energy distributions, to measure the number counts and LFs at far-infrared (FIR) and sub-millimetre (sub-mm) wavelengths. Thanks to our de-confusion technique and deep multi-wavelength photometry, we are able to achieve more accurate measurements while probing ten times below the confusion limit. Our number counts at 250 microns agree well with previous Herschel studies. However, our counts at 350 and 500 microns are considerably below previous Herschel results. This is due to previous studies suffering from source confusion which worsens towards longer wavelength. Our number counts at 450 and 870 microns show excellent agreement with previous determinations derived from single dish and interferometric observations. Our measurements of the monochromatic LF and the total IR LF agree well with previous results. The increased dynamic range of our measurements allows us to better measure the faint-end slope of the LF and measure the dust-obscured SFRD out to z~6. We find that the fraction of dust obscured star-formation activity is at its highest around z~1 which then decreases towards both low and high redshift. We do not find a shift of balance between z~3 and z~4 in the cosmic star-formation history from being dominated by unobscured star formation at higher redshift to obscured star formation at lower redshift. However, we do find 3

## Deep Learning for Galaxy Mergers in the Galaxy Main Sequence

### William J. Pearson, Lingyu Wang, James Trayford et al.

Starburst galaxies are often found to be the result of galaxy mergers. As a result, galaxy mergers are often believed to lie above the galaxy main sequence: the tight correlation between stellar mass and star formation rate. Here, we aim to test this claim. Deep learning techniques are applied to images from the Sloan Digital Sky Survey to provide visual-like classifications for over 340 000 objects between redshifts of 0.005 and 0.1. The aim of this classification is to split the galaxy population into merger and non-merger systems and we are currently achieving an accuracy of 91.5%. Stellar masses and star formation rates are also estimated using panchromatic data for the entire galaxy population. With these preliminary data, the mergers are placed onto the full galaxy main sequence, where we find that merging systems lie across the entire star formation rate - stellar mass plane.

## Main sequence of star forming galaxies beyond the Herschel confusion limit

Deep far-infrared (FIR) cosmological surveys are known to be affected by source confusion, causing issues when examining the main sequence (MS) of star forming galaxies. This has typically been partially tackled by the use of stacking. However, stacking only provides the average properties of the objects in the stack. This work aims to trace the MS over $0.2\leq z<6.0$ using the latest de-blended Herschel photometry, which reaches $\approx10$ times deeper than the 5$\sigma$ confusion limit in SPIRE. This provides more reliable star formation rates (SFRs), especially for the fainter galaxies, and hence a more reliable MS. We built a pipeline that uses the spectral energy distribution (SED) modelling and fitting tool CIGALE to generate flux density priors in the Herschel SPIRE bands. These priors were then fed into the de-blending tool XID+ to extract flux densities from the SPIRE maps. Multi-wavelength data were combined with the extracted SPIRE flux densities to constrain SEDs and provide stellar mass (M$_{\star}$) and SFRs. These M$_{\star}$ and SFRs were then used to populate the SFR-M$_{\star}$ plane over $0.2\leq z<6.0$. No significant evidence of a high-mass turn-over was found; the best fit is thus a simple two-parameter power law of the form log(SFR)$=\alpha[$log(M$_{\star})-10.5]+\beta$. The normalisation of the power law increases with redshift, rapidly at $z\lesssim1.8$, from $0.58\pm0.09$ at $z\approx0.37$ to $1.31\pm0.08$ at $z\approx1.8$. The slope is also found to increase with redshift, perhaps with an excess around $1.8\leq z<2.9$. The increasing slope indicates that galaxies become more self-similar as redshift increases, implying that the specific SFR of high-mass galaxies increases over $z=0.2$ to $z=6.0$, becoming closer to that of low-mass galaxies. The excess in the slope at $1.8\leq z<2.9$, if present, coincides with the peak of the cosmic star formation history.
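The two-parameter power law in the abstract is a straight line in log space, so a minimal sketch of fitting it can use ordinary least squares. This is an assumption made for illustration and is not necessarily the fitting method used in the paper; the function `fit_main_sequence` and the synthetic data points are hypothetical.

```python
def fit_main_sequence(log_mass, log_sfr, pivot=10.5):
    """Least-squares fit of log(SFR) = alpha * (log(M*) - pivot) + beta.

    alpha is the slope and beta is the normalisation, i.e. log(SFR) at
    log(M*) = pivot.
    """
    x = [m - pivot for m in log_mass]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(log_sfr) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_sfr))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Synthetic, noise-free points generated from a known slope and
# normalisation, so the fit should recover them.
true_alpha, true_beta = 0.8, 1.0
log_mass = [9.0, 9.5, 10.0, 10.5, 11.0, 11.5]
log_sfr = [true_alpha * (m - 10.5) + true_beta for m in log_mass]

alpha, beta = fit_main_sequence(log_mass, log_sfr)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

Subtracting the pivot of 10.5 from log(M*) before fitting means the normalisation is read off in the middle of the mass range rather than extrapolated to log(M*) = 0.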