Graphics with scientific data become clearer when the colours are chosen carefully. It is convenient to have good default schemes ready for each type of data, with colours that are:
My default colour scheme for qualitative data is the bright scheme in Fig. 1. Colour coordinates (R,G,B) are given in the RGB colour system (red R, green G and blue B), decimal at the top and hexadecimal below. One alternative is the vibrant scheme in Fig. 2, designed for the data visualization framework TensorBoard. Another alternative is the muted scheme in Fig. 3, which has more colours, but lacks a clear red or medium blue.
The bright, vibrant and muted schemes work well for plot lines and map regions, but the colours are too strong to use for backgrounds to mark (black) text, typically in a table. For that purpose, the pale scheme is designed (Fig. 4, top). The colours are inherently not very distinct from each other, but they are clear in a white area. The dark scheme (Fig. 4, bottom) is meant for text itself on a white background, for example to mark a large block of text. The idea is to use one dark colour for support, not all combined and not for just one word.
There are situations where a scheme is needed between the bright and pale schemes, for example (Fig. 7) for backgrounds in a table where more colours are needed than available in the pale scheme and where the coloured areas are small. For this purpose, the light scheme of Fig. 5 is designed.
Colour names have been added to the scheme definitions as mnemonics for the maker of a figure, not necessarily for use in text: a reader should not have to guess what olive looks like. Colours are identified uniquely by their names within the collective of the bright, pale, dark and light schemes, whereas the vibrant and muted schemes reuse some names for different colours.
The colours within a qualitative scheme are given in order of changing hue, but the colours can be picked at random. Often, a data type suggests an appropriate choice or similar data types can be grouped by giving them similar colours. If the colours have to be picked in a fixed sequence, a good order for each scheme is as follows.
Examples of the use of the qualitative schemes are given in Fig. 6 for lines of the Tokyo metro and in Fig. 7 for cell backgrounds and text blocks. The application to maps is shown in Fig. 8. It is a stylized variation of the diagnostic map used by the ColorBrewer website. In one area, all colours are shown in a random pattern. In other areas, basically one colour is shown, but with one small area of each other colour included. This indicates how well the colours within a scheme are identifiable when all of them are used.
The design of the qualitative schemes involved four types of calculations:
All colours on this site are defined in sRGB colour space, the default used by most software and displays. Printers work in a different colour space that also varies from model to model. When they conform to international standard ISO 12647-2 and the exact printing conditions are not known beforehand, it is recommended to assume the CMYK colour space provided by colour profile ISO Coated v2 300 %. All scheme colours are taken from the overlap between this and the sRGB colour spaces. Individual printers may deviate, probably not so much that colours become unrecognizable, but enough to push some colours closer together. However, it is not possible to take individual printers into account.
The Netherlands Standardization Institute NEN has issued a code of practice which includes a recommended scheme with eight colours, three greys and white. The colours are bright, but the scheme has three drawbacks: differences between the colours in colour-blind vision are often much smaller than the smallest difference in the bright, vibrant or muted schemes; two colours are not print-friendly; and the colours cannot be quoted without infringing copyright.
Diverging schemes are for ordered data between two extremes where the midpoint is important. Such schemes could be constructed simply by scaling the colour coordinates linearly, e.g. from blue to white to red. However, by including subtle hue changes, the colours are more distinct and the schemes more attractive. Figures 9, 10 and 11 show the sunset, BuRd and PRGn schemes, which are tweaked versions of schemes on the ColorBrewer website. The darkest shades of the original versions have been removed, because they are too dark and similar to be used in practice. The circled colour is meant for bad data, without drawing attention away from good data with a large deviation from zero. The sunset scheme was designed for situations where bad data have to be shown white. The three schemes look similar in colour-blind vision, so if more than one is used, do not reverse the direction in one of them. If more colours than shown are needed from a given scheme, use a continuous version of the scheme instead of the discrete colours, by linearly interpolating the colour coordinates. If fewer colours are needed, pick colours at equidistant points in the continuous version. Examples of the use of the diverging schemes for maps are given in Figs. 12 and 13.
Sequential schemes are for ordered data from low to high. The YlOrBr scheme given in Fig. 14 is a tweaked version of the ColorBrewer YlOrBr scheme. The most distinct grey is also given, useful for data gaps; it is not meant for extreme values. If more colours than shown are needed from this scheme, use a continuous version of the scheme instead of the discrete colours, by linearly interpolating the colour coordinates. If fewer colours are needed, pick colours at equidistant points in the continuous version.
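The interpolation described above for the diverging and sequential schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the anchor colours are assumed to be equally spaced along the scheme, and the three anchors used here (plain blue, white, red) are placeholders, not the colours of any scheme defined on this page.

```python
def hex_to_rgb(code):
    """Convert a hexadecimal colour such as '#FF8000' to an (R, G, B) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def continuous(colours, x):
    """Colour at position x in [0, 1], linearly interpolating the RGB
    coordinates between equally spaced anchor colours."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in the range 0..1")
    pos = x * (len(colours) - 1)
    i = min(int(pos), len(colours) - 2)  # index of the lower anchor
    f = pos - i                          # fraction towards the upper anchor
    return tuple(round(a + f * (b - a))
                 for a, b in zip(colours[i], colours[i + 1]))


def sample(colours, n):
    """Pick n colours at equidistant points of the continuous version."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [continuous(colours, k / (n - 1)) for k in range(n)]


# Placeholder anchors for illustration only (blue, white, red).
ANCHORS = [hex_to_rgb(c) for c in ("#0000FF", "#FFFFFF", "#FF0000")]
```

Calling `sample(ANCHORS, 5)` gives more colours than the three anchors, and `sample(scheme, 4)` on a longer list of scheme colours gives fewer; in both cases the first and last colours of the scheme are kept.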
There are many warnings that ordered data should not be shown with a rainbow scheme. The arguments are:
The discrete rainbow colour scheme is inspired by the temperature map of the weather forecast in the newspaper de Volkskrant: unconnected curves in CIELAB colour space for purples, blues, greens and oranges, each sampled three times, with the last curve sampled twice more for yellow and red, giving 14 colours in total. The curves were straightened, shifted and sampled equidistantly to make the colours more distinct, reasonably colour-blind safe and print-friendly. Later, the lines were resampled with smaller distances and the scheme was extended towards white and black, to get 23 colours. Figure 17 shows how the two sets can be combined to make a scheme with any number of colours up to 23.
People usually find out at an early age whether they are colour-blind. However, there are subtle variants of colour-vision deficiency. The two main types are:
To simulate green-blindness, all RGB colours in an image are converted to R′G′B′ colours with
$$R' = \left(4211 + 0.677\,G^{2.2} + 0.2802\,R^{2.2}\right)^{1/2.2},$$
$$G' = \left(4211 + 0.677\,G^{2.2} + 0.2802\,R^{2.2}\right)^{1/2.2},$$
$$B' = \left(4211 + 0.95724\,B^{2.2} + 0.02138\,G^{2.2} - 0.02138\,R^{2.2}\right)^{1/2.2},$$
with parameters R, G and B in the range 0–255 and the output values rounded. To simulate red-blindness, colours are shifted as follows:
$$R' = \left(782.7 + 0.8806\,G^{2.2} + 0.1115\,R^{2.2}\right)^{1/2.2},$$
$$G' = \left(782.7 + 0.8806\,G^{2.2} + 0.1115\,R^{2.2}\right)^{1/2.2},$$
$$B' = \left(782.7 + 0.992052\,B^{2.2} - 0.003974\,G^{2.2} + 0.003974\,R^{2.2}\right)^{1/2.2}.$$
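For a single colour, the two conversions above can be sketched in Python as follows. The function names are hypothetical. One detail is my own assumption and not part of the formulas: for a few saturated colours (pure red in the green-blind case, pure green in the red-blind case) the expression for B′ becomes very slightly negative before the root is taken, so the sketch clamps it to zero and limits the result to 0–255.

```python
GAMMA = 2.2


def _root(value):
    """Take the 1/2.2 power and round to an integer in 0..255.
    Clamping slightly negative inputs to zero is an assumption."""
    return max(0, min(255, round(max(0.0, value) ** (1 / GAMMA))))


def green_blind(r, g, b):
    """Simulate green-blindness for one sRGB colour (components 0..255)."""
    rg = 4211 + 0.677 * g**GAMMA + 0.2802 * r**GAMMA
    bb = 4211 + 0.95724 * b**GAMMA + 0.02138 * g**GAMMA - 0.02138 * r**GAMMA
    return (_root(rg), _root(rg), _root(bb))


def red_blind(r, g, b):
    """Simulate red-blindness for one sRGB colour (components 0..255)."""
    rg = 782.7 + 0.8806 * g**GAMMA + 0.1115 * r**GAMMA
    bb = 782.7 + 0.992052 * b**GAMMA - 0.003974 * g**GAMMA + 0.003974 * r**GAMMA
    return (_root(rg), _root(rg), _root(bb))
```

Because R′ and G′ are computed from the same expression, every simulated colour has equal red and green components; two scheme colours that differ mainly in their red-green balance therefore end up close together.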
These conversions should be applied in sRGB colour space, i.e. they work on a standard video display, but not necessarily on paper. The conversion can be performed with the free software suite ImageMagick. The following two commands make green-blind and red-blind versions of original image original.png, respectively:
(Table: each scheme shown in normal vision, green-blind vision and red-blind vision.)
The colour schemes have not been designed to work after conversion to grey scale. Subsets of qualitative schemes that work are (from light to dark):
There are sequential schemes that have been designed specifically to work also after grey-scale conversion. Figure 22 shows two of them: viridis and cubehelix, taken from the Python plotting library Matplotlib. However, schemes like viridis do not seem to have more discernible colours than YlOrBr. The ends of the cubehelix scheme have an almost constant colour for a relatively long value range, and the scheme is ugly. Please consider the environment and do not print at all.
Some data sets need a very specific colour scheme. An example is the global land cover classification, as generated by the University of Maryland Department of Geography from AVHRR data acquired between 1981 and 1994, available at a resolution of 1 km. There is a recommended colour scheme, but the colours are not distinct, some not even in normal vision. Figure 23 gives a more subtle and logical scheme where all colours are distinct in all visions. Figure 24 shows the world with a reduced resolution of 20 km using this scheme. Figure 25 shows only North America at a resolution of 5 km, using almost all classes.
The following figures show the true physical figure of the Earth using two colour schemes: the smooth sunset scheme defined here and a traditional rainbow scheme as defined by many programs (e.g. IDL). Most rainbow schemes contain bands of almost constant hue with sharp transitions between them, which are perceived as jumps in the data. In this example large areas are green without features, while for example the yellow line in northern Australia implies a sudden change that does not exist in reality. The oceanic trench at top right can be seen over the whole length with the sunset scheme, while it becomes difficult to see near Japan with the traditional scheme. This shows that it is more important that a scheme is smooth than that it contains many colours. In addition, colour-blind people have difficulty distinguishing some colours of the rainbow. With the traditional scheme, the yellow spot in the middle of the Sahara is not visible in red-blind vision, making the green areas effectively even larger than they are in normal vision. Last but not least, the sunset scheme emphasizes more clearly the average (light colours) and the extremes (dark colours of contrasting hues).
In this context the meaning of the figures is not really important, but if you're interested: they show the distance between the WGS84 ellipsoid and the geoid calculated with the EGM96 gravity model.
Currently I produce maps in two steps. First, I perform the analysis in whatever program and export the data as a particular type of ASCII table. Then, I apply a colour look-up table to the data and export the result as a PNG file. This way, I don't have to redo the analysis if I want a different colour scale, e.g. to emphasize a different value range. The second step is performed on the command line with the command "convert" of ImageMagick, a free, cross-platform and open-source program suite for image manipulation.
The data are exported to a plain portable graymap (PGM) file, which is an ASCII file starting with "P2 w h 65535 ", where w is the width and h the height of the map, followed by a list of integers in the range 0 to (in my case) 65535 with a space or newline between the values. Because the width and height are given, the spaces and newlines can be put wherever you like. The "P2" identifies the file as a PGM file. I scale the data so all values are in the range 0–65532 (but you don't have to fill this whole range). The other possible values are reserved: 65533 for bad data, 65534 for text and lines and 65535 for no data (e.g. outside the projection of the world).
The colour look-up table (clut) is exported to a plain portable pixmap (PPM) file, which is an ASCII file starting with "P3 1 n 255 ", where n is the number of colours, followed by a list of RGB coordinates of the colours as integers in the range 0–255 with a space or newline between the values. I don't use perfectly white, because that may become transparent later on.
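As an illustration of the two file formats described above, here is a minimal Python sketch that produces the text of a plain PGM data file and a plain PPM clut. The function and constant names are hypothetical, and treating `None` as "no data" is an assumption made for this example; the reserved values themselves are the ones listed above.

```python
MAX_DATA = 65532      # highest value used for real data
BAD_DATA = 65533      # reserved: bad data
TEXT_LINES = 65534    # reserved: text and lines
NO_DATA = 65535       # reserved: no data


def scale(value, vmin, vmax):
    """Map a value in [vmin, vmax] linearly onto 0..65532.
    None is written as NO_DATA (an assumption for this sketch)."""
    if value is None:
        return NO_DATA
    fraction = (value - vmin) / (vmax - vmin)
    fraction = max(0.0, min(1.0, fraction))  # clip out-of-range values
    return round(fraction * MAX_DATA)


def plain_pgm(rows, vmin, vmax):
    """Return the text of a plain PGM file for a list of equal-length rows."""
    height, width = len(rows), len(rows[0])
    lines = [f"P2 {width} {height} 65535"]
    for row in rows:
        lines.append(" ".join(str(scale(v, vmin, vmax)) for v in row))
    return "\n".join(lines) + "\n"


def plain_ppm(colours):
    """Return the text of a plain PPM clut: 1 pixel wide, one row per colour."""
    lines = [f"P3 1 {len(colours)} 255"]
    lines += [f"{r} {g} {b}" for r, g, b in colours]
    return "\n".join(lines) + "\n"
```

The returned strings can be written to `input.pgm` and `clut.ppm` with an ordinary text-mode `open(..., "w")`. One row per line is used only for readability; as noted above, the spaces and newlines between values can be placed anywhere.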
Now the image can be produced. The data are stored in input.pgm, the clut in clut.ppm, and the image is to be written to output.png. If none of the three reserved values is used, then the command is:
An example: given a map of the albedo at 750 nm including the legend (input.pgm) and a list of colours (clut.ppm), the output of the last two commands above (with "-level 0,9999" and without "-interpolate integer") is this image.
The SRON style for presentations has a dark grey background. I have produced a design template for PowerPoint 2003 or earlier and a theme for PowerPoint 2007 or later with a palette including the regular colours white (for titles), light yellow (for normal text) and orange (for stressed text), but also three extra colours that are suitable for use with projectors: the light blue of the footer and shades of green and red that work well.
The current official style of SRON reports, technotes, etc. can also be achieved in LaTeX and pdfLaTeX. Two files are needed: this. Under Debian, you may first need to run (once) the command "sudo apt-get install texlive-fonts-extra".
I am an Instrument Scientist with a PhD in atomic physics, working on the TROPOMI project in the Earth programme of SRON. I have normal colour vision, but many colleagues do not. My email address: email@example.com.