Discover the Queenslander

Discover The Queenslander is a web-based interface to a wonderful collection of high-resolution scans of “The Queenslander”, a weekly supplement to the Brisbane Courier newspaper published between 1866 and 1939. Mitchell Whitelaw and I conducted the project through Digital Treasures, a research program focused on developing new ways to represent, access and apply digital cultural collections. There is a lot to report on with regard to this project (data manipulation, interface design, and working with Angular.js, for starters) but for this entry I want to outline our investigation into colour. Colour is definitely one of the most striking characteristics of The Queenslander collection and we determined early in the project to use colour as a way to explore and classify the works – effectively employing colour as another form of metadata.

Massaging metadata

When working with image collections there is a heavy reliance on metadata. It’s the metadata that is most commonly used to describe and organise the items within a collection. Date, name, title, role, location, media, and format are typical meta keys that allow an audience to find specific items. These keys can also be used in concert to reveal connections and narratives enmeshed within a collection. For example, collect all images produced within a date range; or all images produced by a particular artist, in a specific location, within a date range.

An added advantage of working with this kind of metadata is that it is easily legible (being text) and also easy to work with computationally. For example, sorting and grouping by values such as date, title, author, role, media, or format is a cinch from a technical perspective but can provide valuable new ways of looking at a collection. In addition to such meta information there is also the data of the digital items themselves. The actual words of a text document can be employed for searching and organising a collection. When it comes to images there is no easily legible text, but there is the potential to work with pixels in much the same way.

Pixels as metadata

So while textual/numerical metadata is powerful and convenient, it’s also worth considering less legible, non-textual forms of metadata – the pixels in an image, for example. We are familiar with a pixel as a tiny tile in the mosaic that makes up a digital image, but in the context of digital collections pixels can serve as another valuable form of metadata. In the same way that we can filter a set of items based on their media type, or author, or publication date, we can filter items based on the values of their pixels.

Scale and fidelity

When working with colour as metadata two key problems immediately crop up. The first is scale. Searching through every pixel in every image in a collection quickly becomes untenable. For example, a collection of 10,000 images, each at 150 x 250 pixels = 375 million pixels. So each time you want to search or sort by a particular colour, you need to compare 375 million pixels. That’s an awful lot of data-crunching and would require serious computing power to return results in a timely fashion. Add more images to the collection and the problem worsens.

The second key issue that arises is that of fidelity. The standard 24-bit RGB colour space (8 bits per channel) supports up to 16.8 million colours, which is great when it comes to showing rich colours but impractical as a set of meta keys. Even amongst the 375 million pixels in our 10,000 image collection there will be very few exact colour matches – the pixels will almost always be slightly different.

Combining these issues, we have a process which is unreasonably slow (comparing 375 million pixels) producing a result which is unreasonably fussy (finding very few matches for any given colour).

Normalisation

One way to address both problems is with image normalisation. Normalisation reduces the fidelity of colour, making images visually coarse but more uniform. For example, instead of each pixel being one of 16.8 million possibilities we could restrict it to be one of the 139 prescribed colours in the CSS4 palette. 139 colours is much more manageable than 16.8 million and the incidence of colour matching in our 10,000 image collection is radically improved. You can see this approach in action at the Cooper Hewitt. A key advantage is that much of the colour matching can be pre-baked: the images can be processed offline and their CSS colours recorded just like any other metadata. So when a user selects CSS “crimson” (or #dc143c) the server can retrieve the relevant items just as it would with any other meta keyword (author, media, year, etc.). By comparison, in the live-computation example above the colours of 375 million pixels need to be compared every time a user selects a colour – there is no pre-baking and it is therefore much, much more computationally intensive (slow).
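To make the pre-bake idea concrete, here is a minimal TypeScript sketch of snapping a pixel to its nearest named colour by straight RGB distance (only a handful of palette entries are shown). It illustrates the general technique rather than the Cooper Hewitt implementation.

```ts
// Sketch only: snap an RGB pixel to the nearest entry in a small named palette.
// A real list would hold all 139 CSS named colours.
type RGB = [number, number, number];

const cssPalette: Record<string, RGB> = {
  crimson: [220, 20, 60],  // #dc143c
  khaki: [240, 230, 140],  // #f0e68c
  teal: [0, 128, 128],     // #008080
  // ...remaining named colours
};

// Squared Euclidean distance in RGB space (no square root needed when we
// only compare distances against each other).
const dist2 = (a: RGB, b: RGB) =>
  (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;

// Return the palette name closest to the given pixel.
function nearestCssColour(pixel: RGB): string {
  let best = "";
  let bestDist = Infinity;
  for (const [name, rgb] of Object.entries(cssPalette)) {
    const d = dist2(pixel, rgb);
    if (d < bestDist) {
      bestDist = d;
      best = name;
    }
  }
  return best;
}
```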

The Cooper Hewitt example shows how fast and effective the pre-bake approach can be. However, I think the main limitation is with regard to the subtlety of the colours – a lot of colour character is lost in the process of normalisation. This kind of loss was particularly concerning when considering the Queenslander collection because the subtlety of the colours is so integral to the character of the images and the collection as a whole. Intent on finding a solution that preserved that colour subtlety without compromising speed and utility, we decided to pursue a par-cook approach.

Par-cooked

Our work combines the pre-baked and live-computation approaches into a kind of par-cooked solution. As with the pre-baked approach, we prepare normalised palettes for each image, but unlike the pre-bake we don’t prescribe the palette. Instead of forcing the image colours into a predefined CSS palette (or similar), we reduce them down to a 12-colour palette. The 12 colours of each image are determined by the image itself, and as a result the individual local palettes are much truer to the character of each image.
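As a simplified sketch (not our production code, which may use a different quantisation method such as median cut or k-means), one way to derive a 12-colour local palette is to bucket the pixels into coarse RGB cells, keep the most populous cells, and average each back to a representative colour:

```ts
// Sketch only: popularity-based palette extraction.
type RGB = [number, number, number];

function localPalette(pixels: RGB[], size = 12, step = 16): RGB[] {
  // Each bucket accumulates the pixels whose channels fall in the same
  // coarse cell (multiples of `step`), so similar colours pool together.
  const counts = new Map<string, { sum: number[]; n: number }>();
  for (const [r, g, b] of pixels) {
    const key = `${Math.floor(r / step)},${Math.floor(g / step)},${Math.floor(b / step)}`;
    const cell = counts.get(key) ?? { sum: [0, 0, 0], n: 0 };
    cell.sum[0] += r; cell.sum[1] += g; cell.sum[2] += b; cell.n++;
    counts.set(key, cell);
  }
  // Keep the `size` most populous cells, averaged to a representative colour.
  return [...counts.values()]
    .sort((a, b) => b.n - a.n)
    .slice(0, size)
    .map(({ sum, n }) => [sum[0] / n, sum[1] / n, sum[2] / n] as RGB);
}
```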

Queenslander grid image
Grid image
Queenslander grid image - meta info
The image local palette

Because we tune our local palettes to each image, the process produces many more colours than the 139 of the CSS palette. As a result, unlike Cooper Hewitt, we cannot pre-bake colour matches and instead need to live-compute them. In the 10,000 image example above the live-compute approach did not scale – the more images/colours added, the slower it became. However, in the case of the Queenslander the live-compute approach is feasible because of the modest collection size: 989 images, each with a palette of 12 colours = a maximum of 11,868 swatches to compare for each colour sort – not an issue for contemporary computers (even the mobile varieties).
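A simplified sketch of the live-compute step (the data shapes here are illustrative, not our actual metadata structures): compare the selected colour against every swatch of every item and keep the items with at least one sufficiently close swatch.

```ts
// Sketch only: live colour matching across all local palettes.
type RGB = [number, number, number];

interface Item {
  id: string;
  palette: RGB[]; // the 12 local swatches for this image
}

const dist2 = (a: RGB, b: RGB) =>
  (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;

// 989 items x 12 swatches = at most 11,868 comparisons per query.
function matchColour(items: Item[], target: RGB, threshold = 60): Item[] {
  return items.filter(item =>
    item.palette.some(swatch => Math.sqrt(dist2(swatch, target)) < threshold)
  );
}
```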

When performing the live colour matching we also determine the colour “weight” of each matched image: items with a large quantity of the filter colour have a greater weighting than those with only small traces. This means we can sort the matched items by colour weight.

Queenslander colour sort
Items are sorted by colour weight – those with more red appear first.
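Continuing the sketch above, the weighting could be calculated as the share of an image’s pixels covered by the matching swatches – which assumes each swatch records how many pixels were assigned to it during palette extraction (an illustrative assumption, not necessarily how our metadata stores it).

```ts
// Sketch only: weight items by how much of the filter colour they contain,
// reusing RGB and dist2 from the matching sketch above.
interface WeightedSwatch {
  rgb: RGB;
  count: number; // pixels assigned to this swatch during palette extraction
}

interface WeightedItem {
  id: string;
  swatches: WeightedSwatch[];
}

// Weight = fraction of the image's pixels belonging to swatches near the target colour.
function colourWeight(item: WeightedItem, target: RGB, threshold = 60): number {
  const total = item.swatches.reduce((n, s) => n + s.count, 0);
  const matched = item.swatches
    .filter(s => Math.sqrt(dist2(s.rgb, target)) < threshold)
    .reduce((n, s) => n + s.count, 0);
  return matched / total;
}

// Sort matched items so the heaviest weights (e.g. the "reddest" images) come first.
const sortByWeight = (items: WeightedItem[], target: RGB) =>
  [...items].sort((a, b) => colourWeight(b, target) - colourWeight(a, target));
```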

In addition to colour matches (and weights) the live-compute approach allows us to generate global palettes dynamically. Instead of the predefined 139 colours of the CSS palette, we generate a global palette of 64 colours based on the colour swatches of the items in the current selection. The process is similar to how the local image palettes are produced and like them, the global palette offers a truer representation of a particular set of images.

Queenslander colour ribbon
Queenslander global colour ribbon for all 989 images
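As a final sketch (reusing the illustrative Item type and the bucket-and-average reducer from the earlier sketches, rather than our actual code), the dynamic global palette can be produced by pooling the local swatches of the current selection and reducing them to 64 representative colours:

```ts
// Sketch only: build a 64-colour global palette from the current selection by
// pooling every item's local swatches and reducing them with the same
// bucket-and-average routine used for the local palettes (run over swatches,
// not raw pixels).
function globalPalette(items: Item[], size = 64): RGB[] {
  const allSwatches = items.flatMap(item => item.palette);
  return localPalette(allSwatches, size);
}
```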