This text is meant as an introduction to the subject of typography, both in general and, more importantly, as it relates to R. If you are more interested in how to use systemfonts to access fonts installed on your computer during plotting, please see the package introduction vignette.
The code examples in this vignette are based on fonts that may not be available on the user's machine. As such, you should not expect the provided examples to execute locally out of the box.
Digital typography
Many books could be, and have been, written about the subject of typography. This text is not meant to be an exhaustive deep dive into all areas of this vast subject. Rather, it is meant to give you just enough understanding of core concepts and terminology to appreciate how it all plays into using fonts in R.
Typeface or font?
There is a good chance that you, like 99% of the world, use “font” as the term describing “the look” of the letters you type. You may, perhaps, have heard the term “typeface” as well and thought it synonymous. This is in fact slightly wrong, and a great deal of typography snobbery has been dealt out on that account (much like the distinction between packages and libraries in R). It is a rather inconsequential mix-up for the most part, especially because 99% of the population wouldn’t bat an eye if you used them interchangeably. However, the distinction between the two serves as a good starting point to talk about other terms in digital typography as well as the nature of font files, so let’s dive in.
When most people use the word “font” or “font family”, what they are actually describing is a typeface. A typeface is a style of lettering that forms a cohesive whole. As an example, consider the well-known “Helvetica” typeface. This name embraces many different weights (bold, normal, light) as well as slanted (italic) and upright variants. However, all of these variations are as much Helvetica as the others; they are all part of the same typeface.
A font is a subset of a typeface, describing a particular variation of the typeface, i.e. the specific combination of weight, width, and slant that is used. We typically give a specific combination of these features a name, like “bold” or “medium” or “italic”, which we call the font style. In other words, a font is a particular style within a typeface.
In the rest of this document we will use the terms typeface and font with the meaning described above.
Font files
Next, we need to talk about how typefaces are represented for use by computers. Font files record information on how to draw the individual glyphs (characters), but also instructions about how to draw sequences of glyphs like distance adjustments (kerning) and substitution rules (ligatures). Font files typically encode a single font but can encode a full typeface:
typefaces <- systemfonts::system_fonts()[, c("path", "index", "family", "style")]
# Full typeface in one file
typefaces[typefaces$family == "Helvetica", ]
#> # A tibble: 6 × 4
#> path index family style
#> <chr> <int> <chr> <chr>
#> 1 /System/Library/Fonts/Helvetica.ttc 2 Helvetica Oblique
#> 2 /System/Library/Fonts/Helvetica.ttc 4 Helvetica Light
#> 3 /System/Library/Fonts/Helvetica.ttc 5 Helvetica Light Oblique
#> 4 /System/Library/Fonts/Helvetica.ttc 1 Helvetica Bold
#> 5 /System/Library/Fonts/Helvetica.ttc 3 Helvetica Bold Oblique
#> 6 /System/Library/Fonts/Helvetica.ttc 0 Helvetica Regular
# One font per font file
typefaces[typefaces$family == "Arial", ]
#> # A tibble: 4 × 4
#> path index family style
#> <chr> <int> <chr> <chr>
#> 1 /System/Library/Fonts/Supplemental/Arial.ttf 0 Arial Regular
#> 2 /System/Library/Fonts/Supplemental/Arial Bold.ttf 0 Arial Bold
#> 3 /System/Library/Fonts/Supplemental/Arial Bold Italic.ttf 0 Arial Bold Italic
#> 4 /System/Library/Fonts/Supplemental/Arial Italic.ttf 0 Arial Italic
Here, each row is a font, with family giving the name of the typeface, and style the font style.
It took a considerable number of tries before the world managed to nail the digital representation of fonts, leading to a proliferation of file types. As an R user, there are three formats that are particularly important:
TrueType (ttf/ttc). TrueType is the baseline format that all modern formats stand on top of. It was developed by Apple in the ’80s and became popular due to its great balance between size and quality. Fonts can be encoded either as scalable paths or as bitmaps of various sizes, the former generally being preferred as it allows for seamless scaling and small file size at the same time.
OpenType (otf/otc). OpenType was created by Microsoft and Adobe to improve upon TrueType. While TrueType was a great success, the number of glyphs it could contain was limited and so was its support for selecting different features during shaping. OpenType resolved these issues, so if you want access to advanced typography features you’ll need a font in OpenType format.
Web Open Font Format (woff/woff2). TrueType and OpenType tend to create large files. Since a large percentage of the text consumed today is delivered over the internet this creates a problem. WOFF resolves this problem by acting as a compression wrapper around TrueType/OpenType to reduce file sizes while also limiting the number of advanced features provided to those relevant to web fonts. The woff2 format is basically identical to woff except it uses the more efficient brotli compression algorithm. WOFF was designed specifically to be delivered over the internet and support is still a bit limited outside of browsers.
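The container format of a font file can usually be recognised from its first four bytes. The sketch below assumes the commonly documented signatures for these formats; `font_container()` is a hypothetical helper written for illustration and is not part of systemfonts.

```r
# Hypothetical helper: guess a font container format from its first four bytes
font_container <- function(path) {
  magic <- readBin(path, what = "raw", n = 4)
  tag <- paste(sprintf("%02x", as.integer(magic)), collapse = "")
  switch(
    tag,
    "00010000" = "TrueType outlines",
    "74727565" = "TrueType outlines",      # ASCII "true"
    "4f54544f" = "OpenType, CFF outlines", # ASCII "OTTO"
    "74746366" = "font collection",        # ASCII "ttcf"
    "774f4646" = "WOFF",                   # ASCII "wOFF"
    "774f4632" = "WOFF2",                  # ASCII "wOF2"
    "unknown"
  )
}

# Try it on a throwaway file that only contains a WOFF2 signature
demo_file <- tempfile()
writeBin(charToRaw("wOF2"), demo_file)
font_container(demo_file)
```

The file extension is only a hint, so checking the signature is a more reliable way to see what a file actually contains.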
While we have mainly talked about font files as containers for the shape of glyphs, they also carry a lot of other information needed for rendering text in a way that is pleasant to read. Font-level information records a lot of stylistic information about the typeface/font, statistics on the number of glyphs and how many different mappings between character encodings and glyphs it contains, and overall sizing information such as the maximum descent of the font, the position of an underline relative to the baseline, etc. systemfonts provides a convenient way to access this data from R:
systemfonts::font_info(family = "Helvetica")
#> # A tibble: 1 × 24
#> path index family style italic bold monospace weight width kerning color scalable vertical n_glyphs n_sizes
#> <chr> <int> <chr> <chr> <lgl> <lgl> <lgl> <ord> <ord> <lgl> <lgl> <lgl> <lgl> <int> <int>
#> 1 /Sys… 0 Helve… Regu… FALSE FALSE FALSE normal norm… FALSE FALSE TRUE FALSE 2252 0
#> # ℹ 9 more variables: n_charmaps <int>, bbox <list>, max_ascend <dbl>, max_descend <dbl>,
#> # max_advance_width <dbl>, max_advance_height <dbl>, lineheight <dbl>, underline_pos <dbl>,
#> # underline_size <dbl>
Further, for each glyph there is a range of information in addition to its shape:
systemfonts::glyph_info("j", family = "Helvetica", size = 30)
#> # A tibble: 1 × 9
#> glyph index width height x_bearing y_bearing x_advance y_advance bbox
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
#> 1 j 77 6 27 -1 21 7 0 <dbl [4]>
These terms are more easily understood with a diagram:
The x_advance in particular is important when rendering text because it tells you how far to move to the right before rendering the next glyph (ignoring for a bit the concept of kerning).
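As a minimal sketch of how advances turn into glyph positions, the pen position of each glyph is the running sum of the advances before it. The advance values below are made-up numbers for illustration, not measurements from a real font; real values would come from the x_advance column returned by systemfonts::glyph_info().

```r
# Hypothetical advances (in pixels) for the glyphs of "jam"; made-up values
advances <- c(j = 7, a = 17, m = 25)

# Each glyph starts where the previous one ended, so the pen position of a
# glyph is the sum of the advances of all glyphs before it
pen_x <- cumsum(c(0, head(advances, -1)))
names(pen_x) <- names(advances)

# Total horizontal space consumed by the run
run_width <- sum(advances)

pen_x
run_width
```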
Text shaping
The next important concept to understand is text shaping, which, in the simplest of terms, is to convert a succession of characters into a sequence of glyphs along with their locations. Important here is the distinction between characters, the things you think of as letters, and glyphs, which is what the font will draw. For example, think of the character “f”, which is often tricky to draw because the “hook” of the f can interfere with other characters. To solve this problem, many typefaces include ligatures, like “fi”, which are used for specific pairs of characters. Ligatures are extremely important for languages like Arabic.
A few of the challenges of text shaping include kerning, bidirectional text, and font substitution. Kerning is the adjustment of distance between specific pairs of characters. For example, you can put “VM” a little closer together but “OO” needs to be a little further apart. Kerning is an integral part of all modern text rendering and you will almost only notice it when it is absent (or worse, wrongly applied).
Not every language writes text in the same direction, but regardless of your native script, you are likely to use Arabic numerals, which are always written left-to-right. This gives rise to the challenge of bidirectional (or bidi) text, which mixes text flowing in different directions. This imposes a whole new range of challenges!
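Kerning can be thought of as a per-pair correction added to the advance of the first glyph in the pair. The sketch below illustrates that idea with made-up advances and kerning values; `place_glyphs()` is a hypothetical helper, and a real shaping engine reads these numbers from the font file instead.

```r
# Hypothetical advances and kerning pairs (in pixels); made-up values
advances <- c(V = 19, M = 25, O = 23)
kerning  <- c(VM = -2, OO = 1)  # pairs without an entry get no adjustment

place_glyphs <- function(glyphs, advances, kerning) {
  # All adjacent pairs, e.g. "VM", "MO", "OO"
  pairs <- paste0(head(glyphs, -1), tail(glyphs, -1))
  adjust <- unname(kerning[pairs])
  adjust[is.na(adjust)] <- 0
  # Move the pen by each advance plus the kerning towards the next glyph
  steps <- unname(advances[head(glyphs, -1)]) + adjust
  cumsum(c(0, steps))
}

positions <- place_glyphs(c("V", "M", "O", "O"), advances, kerning)
positions
```

Without the kerning table the second glyph would start at 19 rather than 17, which is exactly the kind of small shift a reader notices only when it is missing.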
Finally, you might request a character that a font doesn’t contain. One way to deal with this is to render a glyph representing a missing glyph, usually an empty box or a question mark. But it’s typically more useful to use the correct glyph from a different font. This is called font fallback and happens all the time for emojis, but can also happen when you suddenly change script without bothering to pick a new font. Font fallback is an imprecise science, typically relying on an operating system font that has a very large number of characters, but might look very different from your existing font.
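The fallback logic can be sketched as a search through an ordered list of fonts for the first one that covers the requested character. The font names and coverage tables below are invented for illustration, and `pick_font()` is a hypothetical helper; on a real system, systemfonts::font_fallback() asks the operating system to make this choice.

```r
# Hypothetical coverage tables: which characters each font has glyphs for
coverage <- list(
  "Body Font"     = c("a", "b", "c"),
  "Fallback Font" = c("a", "b", "c", "\u03b1", "\u03b2")  # adds Greek alpha, beta
)

# Return the first font in `fonts` that covers `char`, or NA if none does
pick_font <- function(char, fonts, coverage) {
  for (font in fonts) {
    if (char %in% coverage[[font]]) return(font)
  }
  NA_character_
}

chars <- c("a", "\u03b1", "\u2603")  # "a", Greek alpha, snowman
chosen <- vapply(chars, pick_font, character(1),
                 fonts = names(coverage), coverage = coverage,
                 USE.NAMES = FALSE)
chosen
```

A character for which no font is found (the `NA` case) is where a renderer would draw the missing-glyph box.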
Once you have determined the order and location of glyphs, you are still not done. Text often needs to be wrapped to fit into a specific width, it may need a specific justification, perhaps indentation or tracking must be applied, etc. Thankfully, all of this is generally a matter of (often gnarly) math that you just have to get right. That is, all except text wrapping, which should happen at the right boundaries and may need to break up a word and insert a hyphen, etc.
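To give a feel for the simplest form of wrapping, here is a sketch of greedy line breaking that only breaks at spaces and never hyphenates. `wrap_greedy()` is a hypothetical helper, and its default `measure` function assumes every character is 6 pixels wide, which is a deliberate simplification; a real implementation would measure the shaped text.

```r
# Greedy line breaking: add words to a line until the next one would overflow.
# `measure` is a stand-in for real text measurement; by default every
# character is assumed to be 6 pixels wide.
wrap_greedy <- function(text, max_width, measure = function(s) 6 * nchar(s)) {
  words <- strsplit(text, " +")[[1]]
  lines <- character()
  current <- ""
  for (word in words) {
    candidate <- if (nzchar(current)) paste(current, word) else word
    # A word that is too wide on its own still gets a line to itself
    if (measure(candidate) <= max_width || !nzchar(current)) {
      current <- candidate
    } else {
      lines <- c(lines, current)
      current <- word
    }
  }
  c(lines, current)
}

wrapped <- wrap_greedy("the pit of despair is bottomless", max_width = 60)
wrapped
```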
Like I said, the pit of despair is bottomless…
Font handling in R
You hopefully arrive at this section with an appreciation of the horrors that go into rendering text. If not, maybe this blog post will convince you.
Are you still here? Good.
Now that you understand the basics of what goes into handling fonts and text, we can discuss the details of fonts in R specifically.
Fonts and text from a user perspective
The user's perception of working with fonts in R is largely shaped by plots. This means using either base or grid graphics or one of the packages that have been built on top of them, like ggplot2. While the choice of tool will affect where you specify the font to use, they generally agree on how to specify it.
Graphic system | Typeface argument | Font argument | Size argument
---|---|---|---
Base Arguments are passed to | family | font | cra (pixels) or cin (inches) multiplied by cex
Grid Arguments are passed to the | fontfamily | fontface | fontsize (points) multiplied by cex
ggplot2 Arguments are set in | family | face (in element_text()) or fontface (in geoms) | size (points when used in element_text(), depends on the value of the size.unit argument when used in a geom)
From the table it is clear that in R fontfamily/family is used to describe the typeface and font/fontface/face is used to select a font from the typeface. Size settings are just a plain mess.
The major limitation in fontface (and friends) is that it takes a number, not a string, and you can only select from four options: 1: plain, 2: bold, 3: italic, and 4: bold-italic. This means, for example, that there’s no way to select Futura Condensed Extra Bold. Another limitation is that it’s not possible to specify any font variations such as using tabular numbers or stylistic ligatures.
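To make the argument names concrete, here is a small sketch that draws the same bold-italic text with base and with grid graphics. It assumes the generic "serif" family, which the pdf() device can resolve without any installed fonts, and it draws to a throwaway device so nothing is written to disk.

```r
library(grid)

# Open a throwaway device so the sketch runs without a screen or output file
pdf(NULL)

# Base graphics: typeface via `family`, font via `font` (4 = bold-italic)
plot.new()
text(0.5, 0.5, "Bold italic serif", family = "serif", font = 4, cex = 1.5)

# Grid graphics: typeface via `fontfamily`, font via `fontface`
grid.newpage()
grid.text(
  "Bold italic serif",
  gp = gpar(fontfamily = "serif", fontface = 4, fontsize = 18)
)

dev.off()
```

Note how the size is given as a multiplier (`cex`) in base but in points (`fontsize`) in grid, which is the inconsistency the table summarises.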
Fonts and text from a graphics device perspective
In R, a graphics device is the part responsible for doing the rendering you request and putting it on your screen or in a file. When you call png() or ragg::agg_png() you open up a graphics device that will receive all the plotting instructions from R. Both graphics devices will ultimately produce the same file type (PNG), but how they choose to handle and respond to the plotting instructions may differ (greatly). Nowhere is this difference more apparent than when it comes to text rendering.
After a user has made a call that renders some text, it is funneled through the graphic system (base or grid) and handed off to the graphics engine, which ultimately asks the graphics device to render the text. From the perspective of the graphics device, it is presented with much the same information that the user provided. The text() method of the device is given an array of characters, the typeface, the size in points, and an integer denoting if the style is regular, bold, italic, or bold-italic.
This means that it is up to the graphics device to find the appropriate font file (using the provided typeface and font style) and shape the text with all that that entails. This is a lot of work, which is why text is handled so inconsistently between graphics devices. Issues can range from not being able to find fonts installed on the computer, to not providing font fallback mechanisms, or even not handling right-to-left text. It may also be that certain font file formats are not well supported so that, for example, color emojis are not rendered correctly.
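The first step a device has to take can be sketched as follows: decode the integer font face into bold and italic flags, then look up a matching font file. `decode_face()` is a hypothetical helper written for this illustration. The lookup assumes a version of systemfonts that provides match_fonts() and is skipped otherwise; the file it finds depends entirely on the fonts installed on your machine.

```r
# Hypothetical helper: decode the integer font face a device receives
decode_face <- function(face) {
  stopifnot(face %in% 1:4)
  list(bold = face %in% c(2, 4), italic = face %in% c(3, 4))
}

style <- decode_face(4)  # 4 means bold-italic

# Only attempt the lookup when systemfonts and match_fonts() are available
if (requireNamespace("systemfonts", quietly = TRUE) &&
    exists("match_fonts", envir = asNamespace("systemfonts"))) {
  hit <- systemfonts::match_fonts(
    family = "sans",
    italic = style$italic,
    weight = if (style$bold) "bold" else "normal"
  )
  print(hit$path)   # font file chosen for the request
  print(hit$index)  # position of the font within that file
}
```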
There have been a number of efforts to resolve these problems over the years:
extrafont: Developed by Winston Chang, extrafont mainly sought to improve the situation for the pdf() device, which generally only had access to the PostScript fonts that come with R. The package allows the pdf() device to get access to TrueType fonts installed on the computer, and provides a means of embedding the font into the PDF so that it can be opened on systems where the font is not installed. (It also provides the capabilities to the Windows png() device.)
sysfonts and showtext: These packages are developed by Yixuan Qiu and provide support for system fonts to all graphics devices by hijacking the text() method of the graphics device to treat text as polygons or raster images. This guarantees your plots will look the same on every device, but it doesn’t do advanced text shaping, so there’s no support for ligatures or font substitution. Additionally, it produces large files with inaccessible text when used to produce pdf and svg outputs.
systemfonts and textshaping: These packages are developed by me to provide a soup-to-nuts solution to text rendering for graphics devices. systemfonts provides access to fonts installed on the system along with font fallback mechanisms, registration of non-system fonts, reading of font files, etc. textshaping builds on top of systemfonts and provides a fully modern engine for shaping text. The functionality is exposed both at the R level and at the C level, so that graphics devices can directly access font lookup and shaping.