A Grammar for Graphics

A collection of terms and concepts to declare data visualizations systematically

Otho Mantegazza · Dataviz for Scientists · Part 2.2

Can you Describe a Graph?

If we find a way to describe graphs systematically, then we can design and develop them more easily.

Most technical graphs can be declared with a system of rules called the “Grammar of Graphics”.

This system of rules is the basis for many data visualization packages, such as ggplot2, Seaborn and Altair.

A Grammar for Graphics

The “Grammar of Graphics” was developed by Leland Wilkinson.

It was later extended by Hadley Wickham, who started encoding it in the R package ggplot2.

More recently, a new API in the style of ggplot2 was added to Seaborn, for Python.

A Grammar for Graphics

The layered grammar of graphics defines graphics as composed of:

  • A default dataset and set of mappings from variables to aesthetics.
  • One or more layers, with each layer having one geometric object, one statistical transformation, one position adjustment, and optionally, one dataset and set of aesthetic mappings.
  • One scale for each aesthetic mapping used.
  • A coordinate system.
  • The facet specification.
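
The components listed above can be written down as a small data structure. The sketch below is only an illustration: the class names `Layer` and `PlotSpec`, their fields, and the string values are hypothetical and are not part of ggplot2 or Seaborn. The example values mirror the penguin scatterplot built later in these slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """One layer: geometric object, statistical transformation, position adjustment."""
    geom: str
    stat: str = "identity"
    position: str = "identity"
    # Optional per-layer dataset (here just a name) and aesthetic mappings
    data: Optional[str] = None
    mappings: dict = field(default_factory=dict)


@dataclass
class PlotSpec:
    """A whole graphic, following the components of the layered grammar."""
    data: str                                    # default dataset (here just a name)
    mappings: dict                               # default mappings: aesthetic -> variable
    layers: list = field(default_factory=list)   # one or more layers
    scales: dict = field(default_factory=dict)   # one scale per mapped aesthetic
    coord: str = "cartesian"                     # coordinate system
    facet: Optional[dict] = None                 # facet specification


scatter = PlotSpec(
    data="penguins",
    mappings={"x": "bill_length_mm", "y": "bill_depth_mm", "color": "species"},
    layers=[
        Layer(geom="dot", mappings={"marker": "sex"}),
        Layer(geom="line", stat="polyfit"),
    ],
    scales={"x": "continuous", "y": "continuous", "color": "nominal"},
    facet={"row": "sex"},
)

print(len(scatter.layers), scatter.coord)
```

Writing the specification as data makes it clear that a graph is fully described once each component has a value, including the defaults (`identity`, `cartesian`) that are often left implicit.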

Aesthetics

The word aesthetic is derived from the Ancient Greek αἰσθητικός (aisthētikós, “perceptive, sensitive, pertaining to sensory perception”), which in turn comes from αἰσθάνομαι (aisthánomai, “I perceive, sense, learn”) and is related to αἴσθησις (aísthēsis, “perception, sensation”). [Wikipedia]

Let’s Describe Graphs

Let’s describe three historical graphs in terms of the Grammar of Graphics.

  1. How are data mapped to aesthetics?
  2. What statistical transformation is applied?
  3. Which geometric object is used?
  4. What is the coordinate system?
  5. Are the data split in facets?

Describe the weather history by Robert Plot.

  • Aesthetics Mapping:
    • x: atmospheric pressure
    • y: day of the month
  • Statistical Transformation:
    • none / identity
  • Geometric Object:
    • stepped line
  • Coordinate System:
    • cartesian
  • Facets:
    • by month
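
A description like the one above can also be kept as a plain record, which makes it easy to see which of the five questions are still open. This is a minimal sketch: the key names and the helper `missing_components` are hypothetical and chosen only for this illustration.

```python
# The five questions of the grammar, used as keys of a plain dictionary
COMPONENTS = ["aesthetics", "stat", "geom", "coord", "facets"]

# Answers copied from the description of Robert Plot's weather history
plot_weather = {
    "aesthetics": {"x": "atmospheric pressure", "y": "day of the month"},
    "stat": "identity",
    "geom": "stepped line",
    "coord": "cartesian",
    "facets": "by month",
}


def missing_components(description):
    """Return the grammar components that are still unanswered."""
    return [name for name in COMPONENTS if not description.get(name)]


print(missing_components(plot_weather))      # []
print(missing_components({"geom": "bar"}))   # ['aesthetics', 'stat', 'coord', 'facets']
```

The same template can be reused for the graphs that follow.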

Describe this semigraph by Lambert.

  • Aesthetics Mapping:
    • x: …
    • y: …
  • Statistical Transformation:
  • Geometric Object:
  • Coordinate System:
  • Facets:

Describe the radial histogram by Nightingale

  • Aesthetics Mapping:
    • x: …
    • y: …
  • Statistical Transformation:
  • Geometric Object:
  • Coordinate System:
  • Facets:

Let’s Describe Graphs

Just one more.

Let’s challenge ourselves a bit more.

Now describe the web-based data visualization on the next page. It is a weather map taken from the beautiful app Windy.

Can you do it with the Grammar of Graphics as before? How many layers of information can you notice?

Describe the weather map by the app Windy.

  • Aesthetics Mapping:
    • x: …
    • y: …
  • Statistical Transformation:
  • Geometric Object:
  • Coordinate System:
  • Facets:

For the main data visualization, how many layers of information do you notice?

Seaborn’s Interface

Seaborn is one of the main tools for declaring graphics in Python, together with Matplotlib’s pyplot and Altair.

Seaborn’s object interface is under active development and is based on the layered grammar of graphics.

It can be used both for exploratory analysis and for publication-ready graphs.

Packages

# Seaborn Object Interface
import seaborn.objects as so

# The Palmer Penguins dataset,
# which we are going to use for practice
from palmerpenguins import load_penguins
penguins = load_penguins()

Learn more about Palmer Penguins for R and Python.

A Scatterplot…

A default dataset…

(
  so.Plot(
      data = penguins
  )
  .layout(size=(5, 6))
)

A set of mappings from variables to aesthetics…

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      y = 'bill_depth_mm'
  )
  .layout(size=(5, 6))
)

One or more layers, each with a geometric object that represents the aesthetic mappings.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      y = 'bill_depth_mm'
  )
  .add(
    so.Dot()
  )
  .layout(size=(5, 6))
)

More variables mapped to aesthetics and represented by the geometric object.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      y = 'bill_depth_mm',
      color = 'species'
  )
  .add(
    so.Dot(),
    marker = 'sex'
  )
  .layout(size=(5, 6))
)

A layer with a different geometric object and a statistical transformation.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      y = 'bill_depth_mm',
      color = 'species'
  )
  .add(
    so.Dot(),
    marker = 'sex'
  )
  .add(
    so.Line(),
    so.PolyFit(order = 1)
  )
  .layout(size=(5, 6))
)
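
In this layer the statistical transformation replaces the raw points with fitted values, and the line geometry draws them. A polynomial fit of order 1 is a straight line. The sketch below shows the underlying least-squares computation in plain Python. It is not Seaborn's implementation, and both the function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```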

Aesthetics can be mapped to all layers, when declared in so.Plot(), or to just one layer, when declared in .add().

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      y = 'bill_depth_mm',
      color = 'species'
  )
  .add(
    so.Dot(),
    marker = 'sex'
  )
  .add(
    so.Line(),
    so.PolyFit(order = 1)
  )
  .layout(size=(5, 6))
)
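
One way to picture how plot-level and layer-level mappings combine is as a dictionary merge, where each layer starts from the plot's defaults and adds its own mappings. This is a conceptual sketch only: `resolve_mappings` is a hypothetical helper, not a Seaborn function.

```python
def resolve_mappings(plot_mappings, layer_mappings):
    """Combine plot-level defaults with layer-level mappings; the layer wins on conflicts."""
    return {**plot_mappings, **layer_mappings}


# Mappings declared once for the whole plot
plot_mappings = {"x": "bill_length_mm", "y": "bill_depth_mm", "color": "species"}

# The dot layer adds a marker mapping; the line layer adds nothing
dot_layer = resolve_mappings(plot_mappings, {"marker": "sex"})
line_layer = resolve_mappings(plot_mappings, {})

print(dot_layer)
print(line_layer)
```

Both layers share `x`, `y` and `color`, while `marker` exists only in the dot layer, which matches the plot above: markers vary by sex, lines do not.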

A facet specification.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      y = 'bill_depth_mm',
      color = 'species'
  )
  .add(
    so.Dot(),
    marker = 'sex'
  )
  .add(
    so.Line(),
    so.PolyFit(order = 1)
  )
  .facet(
    row = 'sex'
  )
  .layout(size=(5, 6))
)
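
Faceting splits the data into subsets, one per value of the facet variable, and draws the same plot for each subset. A minimal sketch of that split in plain Python follows; the helper `facet_rows` is hypothetical and the records are made up, not real measurements.

```python
from collections import defaultdict


def facet_rows(records, by):
    """Split a list of records into one subset per value of the facet variable."""
    panels = defaultdict(list)
    for record in records:
        panels[record[by]].append(record)
    return dict(panels)


# Made-up records, not real measurements
records = [
    {"sex": "female", "bill_length_mm": 39.5},
    {"sex": "male", "bill_length_mm": 41.1},
    {"sex": "female", "bill_length_mm": 46.2},
]

panels = facet_rows(records, by="sex")
for value, subset in panels.items():
    print(value, len(subset))
```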

A Histogram

A default dataset…

(
  so.Plot(
      data = penguins
  )
  .layout(size=(5, 6))
)

A set of mappings from variables to aesthetics…

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm'
  )
  .layout(size=(5, 6))
)

A layer including geometric objects and a statistical transformation.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm'
  )
  .add(
    so.Bar(),
    so.Hist()
  )
  .layout(size=(5, 6))
)
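
Here the statistical transformation does the counting: it turns a column of values into one count per bin, and the bar geometry draws those counts. The sketch below shows equal-width binning in plain Python. It is not how Seaborn chooses its bins; the function `histogram`, the fixed range and the values are assumptions made for illustration.

```python
def histogram(values, bins, low, high):
    """Count how many values fall into each of `bins` equal-width intervals on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if v < low or v > high:
            continue  # ignore values outside the range
        # The last bin is closed on the right, so v == high goes into it
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up bill lengths in millimetres, not real measurements
lengths = [36.0, 38.5, 39.0, 41.2, 44.9, 45.0, 49.5, 50.0]
counts = histogram(lengths, bins=3, low=35.0, high=50.0)
print(counts)  # [3, 2, 3]
```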

More aesthetic mappings, and a position adjustment.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      color = 'species'
  )
  .add(
    so.Bar(),
    so.Hist(),
    so.Stack()
  )
  .layout(size=(5, 6))
)
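
A position adjustment changes where each mark is drawn without changing the counts. Stacking can be sketched as a running baseline: each group's bar starts where the previous group's bar ended. The helper `stack` and the counts below are made up for illustration and do not come from Seaborn or from the real dataset.

```python
def stack(counts_by_group):
    """For each group, return (baseline, top) per bin so that bars sit on top of each other."""
    n_bins = len(next(iter(counts_by_group.values())))
    baseline = [0] * n_bins
    stacked = {}
    for group, counts in counts_by_group.items():
        tops = [b + c for b, c in zip(baseline, counts)]
        stacked[group] = list(zip(baseline, tops))
        baseline = tops  # the next group starts where this one ends
    return stacked


# Made-up counts per bin for each species, not real data
counts_by_group = {
    "Adelie": [3, 2, 0],
    "Chinstrap": [0, 1, 2],
    "Gentoo": [0, 2, 3],
}
stacked = stack(counts_by_group)
print(stacked["Gentoo"])  # [(3, 3), (3, 5), (2, 5)]
```

The top of the last group in each bin equals the total count for that bin, so the outline of a stacked histogram is the histogram of the whole dataset.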

The facet specification.

(
  so.Plot(
      data = penguins,
      x = 'bill_length_mm',
      color = 'species'
  )
  .add(
    so.Bar(),
    so.Hist(),
    so.Stack()
  )
  .facet(
    row='sex'
  )
  .layout(size=(5, 6))
)

A Horizontal Stacked Bar Chart

A default dataset…

(
  so.Plot(
      data = penguins
  )
  .layout(size=(5, 6))
)

A set of mappings from variables to aesthetics…

(
  so.Plot(
      data = penguins,
      y = 'species'
  )
  .layout(size=(5, 6))
)

A layer including geometric objects and a statistical transformation.

(
  so.Plot(
      data = penguins,
      y = 'species'
  )
  .add(
    so.Bar(),
    so.Hist()
  )
  .layout(size=(5, 6))
)

More aesthetic mappings, and a position adjustment.

(
  so.Plot(
      data = penguins,
      y = 'species',
      color = 'sex'
  )
  .add(
    so.Bar(),
    so.Hist(),
    so.Stack()
  )
  .layout(size=(5, 6))
)

A facet specification.

(
  so.Plot(
      data = penguins,
      y = 'species',
      color = 'sex'
  )
  .add(
    so.Bar(),
    so.Hist(),
    so.Stack()
  )
  .facet(
    row = 'island'
  )
  .layout(size=(5, 6))
)

Remove empty bars by giving each facet its own y axis.

(
  so.Plot(
      data = penguins,
      y = 'species',
      color = 'sex'
  )
  .add(
    so.Bar(),
    so.Hist(),
    so.Stack()
  )
  .facet(
    row = 'island'
  )
  .share(y = False)
  .layout(size=(5, 6))
)
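
The effect of an independent y axis per facet can be pictured as follows: with a shared axis every facet reserves a slot for every category, while with independent axes each facet lists only the categories found in its own subset. This is a conceptual sketch in plain Python, not Seaborn's implementation, and the observations are a made-up list of island and species combinations.

```python
# Made-up (island, species) observations; only the combinations matter here
observations = [
    ("Biscoe", "Adelie"), ("Biscoe", "Gentoo"),
    ("Dream", "Adelie"), ("Dream", "Chinstrap"),
    ("Torgersen", "Adelie"),
]

# Shared y scale: every facet shows the union of all categories
shared_levels = sorted({species for _, species in observations})

# Independent y scales: each facet shows only the categories present in its subset
levels_per_facet = {}
for island, species in observations:
    levels_per_facet.setdefault(island, set()).add(species)
levels_per_facet = {island: sorted(levels) for island, levels in levels_per_facet.items()}

for island, levels in levels_per_facet.items():
    empty = [s for s in shared_levels if s not in levels]
    print(island, "shows", levels, "and no longer reserves space for", empty)
```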

Change the position adjustment.

(
  so.Plot(
      data = penguins,
      y = 'species',
      color = 'sex'
  )
  .add(
    so.Bar(),
    so.Hist(),
    so.Dodge()
  )
  .facet(
    row = 'island'
  )
  .share(y = False)
  .layout(size=(5, 6))
)
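
Dodging places the bars of each group side by side instead of on top of each other. A minimal sketch of the position arithmetic follows, assuming each category occupies a band of total width 0.8 around its position. The helper `dodge` and that default width are assumptions for this illustration, not Seaborn's implementation.

```python
def dodge(center, groups, total_width=0.8):
    """Place one bar per group side by side around `center`; return {group: (position, width)}."""
    width = total_width / len(groups)
    left_edge = center - total_width / 2
    return {
        group: (left_edge + (i + 0.5) * width, width)
        for i, group in enumerate(groups)
    }


# A category drawn at position 0, split into two groups
bars = dodge(center=0.0, groups=["female", "male"])
print(bars)
```

With two groups, each bar is half as wide as the band and the two bars sit symmetrically on either side of the category position.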

Exercise

Learn about the visual models available in Seaborn and use them to explore the Palmer Penguins dataset, which you can import into Python using the palmerpenguins package.

For each visual model that you use:

  • Describe it in terms of the Grammar of Graphics.
  • Explain what it shows about the data: which patterns it highlights and what impression it gives of them.