```
library(ggiraph)
library(plotly)
library(highcharter)
library(dplyr)
library(purrr)
library(palmerpenguins)
penguins <- na.omit(palmerpenguins::penguins)
```

# Interactive Data Visualization with R

Interactive figures are an essential tool for communicating data insights, in particular in reports or dashboards. In this blog post, I compare different packages for dynamic data visualization in R. Before we dive into the comparison, here is a quick introduction to each contestant.

`ggiraph` is a package designed to enhance the interactivity of `ggplot2` visualizations. `ggiraph` allows users to create dynamic and interactive graphics that can include features such as tooltips, clickable elements, and JavaScript actions. This is particularly useful for web-based data visualizations and interactive reporting.

`plotly` is a powerful framework for creating interactive, web-based data visualizations directly from R. It serves as an interface to the `Plotly` JavaScript library, enabling R users to create a wide range of highly interactive and dynamic plots that can be viewed in any web browser. One of the key features of `plotly` is its ability to add interactivity to plots with minimal effort. Interactive features include tooltips, zooming, panning, and selection capabilities, allowing users to explore and interact with the data in depth. Furthermore, `plotly` integrates seamlessly with the `ggplot2` package, allowing users to convert `ggplot2` figures into interactive plotly charts using the `ggplotly()` function.

The `highcharter` package is a wrapper for the `Highcharts` JavaScript library and its modules. `Highcharts` is a very flexible and customizable JavaScript charting library with a powerful API. `highcharter` stands out for its emphasis on creating visually appealing, interactive charts.

I compare code to generate `ggiraph`, `plotly`, `ggplotly`, and `highcharter` output in the post below. The types of plots that I chose for the comparison heavily draw on the examples given in R for Data Science - an amazing resource if you want to get started with data visualization. Spoiler alert: I’m not always able to replicate the same figure with all approaches (yet).

## Loading packages and data

We start by loading the main packages of interest (`ggiraph`, `plotly`, `highcharter`), `dplyr` and `purrr` for data manipulation tools, and the popular `palmerpenguins` data. We then use the `penguins` data frame as the data to compare all functions and methods below. Note that I drop all rows with missing values because I don’t want to get into related messages in this post.

## A full-blown example

Let’s start with an advanced example that combines many different aesthetics at the same time: we plot two columns against each other, use color and shape aesthetics to differentiate species, include separate regression lines for each species, manually set nice labels, and use a theme. You can click through the results in the tabs below.

Unfortunately, I wasn’t able to add species-specific regression lines to the `plotly` output - do you have any idea? Feel free to drop a comment below. You can also see that adding regression lines to `highcharter` plots requires a lot of manual tinkering compared to `ggplot2`. Moreover, `plotly` does not support subtitles, while `plotly::ggplotly()` and `highcharter` don’t display the subtitles in my examples (for `highcharter`, possibly because `hc_subtitle()` is called without a named `text` argument).

```
fig_full <- penguins |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm,
             color = species, shape = species)) +
  geom_point_interactive(
    aes(tooltip = paste("bill_length_mm:", bill_length_mm, "<br>",
                        "bill_depth_mm:", bill_depth_mm)),
    size = 2
  ) +
  geom_smooth_interactive(method = "lm", formula = "y ~ x") +
  labs(x = "Bill length (mm)", y = "Bill width (mm)",
       title = "Bill length vs. bill width",
       subtitle = "Using the ggiraph package",
       color = "Species", shape = "Species") +
  theme_minimal()

girafe(ggobj = fig_full)
```

```
# Plot bill depth on the y-axis so that data, axis labels, and title agree
penguins |>
  plot_ly(x = ~bill_length_mm, y = ~bill_depth_mm,
          color = ~species, symbol = ~species,
          type = "scatter", mode = "markers", marker = list(size = 10)) |>
  layout(
    plot_bgcolor = "white",
    xaxis = list(title = "Bill length (mm)", zeroline = FALSE, ticklen = 5),
    yaxis = list(title = "Bill width (mm)", zeroline = FALSE, ticklen = 5),
    title = "Bill length vs. bill width"
  )
```

```
fig_full <- penguins |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm,
             color = species, shape = species)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", formula = "y ~ x") +
  labs(x = "Bill length (mm)", y = "Bill width (mm)",
       title = "Bill length vs. bill width",
       subtitle = "Using ggplot2 and ggplotly() from plotly",
       color = "Species", shape = "Species") +
  theme_minimal()

ggplotly(fig_full)
```

```
# Scatter plot; bill depth on the y-axis to match the axis label and title
fig_full <- penguins |>
  hchart(type = "scatter",
         hcaes(x = bill_length_mm, y = bill_depth_mm,
               color = species),
         marker = list(radius = 5)) |>
  hc_xAxis(title = list(text = "Bill length (mm)")) |>
  hc_yAxis(title = list(text = "Bill width (mm)")) |>
  hc_title(text = "Bill length vs. bill width") |>
  hc_subtitle("Using highcharter") |>
  hc_add_theme(hc_theme_ggplot2())

# Add one regression line per species by fitting the models manually
species_unique <- levels(penguins$species)
colors <- c("#440154", "#21908C", "#FDE725")

for (j in seq_along(species_unique)) {
  penguins_subset <- penguins |>
    filter(species == species_unique[j])

  regression <- lm(bill_depth_mm ~ bill_length_mm, data = penguins_subset)
  intercept <- regression$coefficients[1]
  slope <- regression$coefficients[2]

  # A straight line only needs its two end points
  values_range <- range(penguins_subset$bill_length_mm, na.rm = TRUE)

  line_data <- tibble(
    bill_length_mm = values_range,
    bill_depth_mm = intercept + slope * values_range
  )

  fig_full <- fig_full |>
    hc_add_series(data = list_parse2(line_data),
                  type = "line", marker = "none",
                  color = colors[j])
}

fig_full
```
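On the open question of species-specific regression lines in `plotly`: one possible approach, sketched here under the assumption that fitting the models yourself is acceptable, is to compute fitted values per species with `lm()` and draw them with `add_lines()`. The column name `fitted` and the object names `penguins_fitted` and `fig_plotly_lm` are my own choices for illustration, and I plot bill depth on the y-axis as in the `ggiraph` tab.

```r
library(plotly)
library(dplyr)
library(palmerpenguins)

penguins <- na.omit(palmerpenguins::penguins)

# Fit one linear model per species and store the fitted values
# in a new (hypothetical) column called `fitted`
penguins_fitted <- penguins |>
  split(penguins$species) |>
  lapply(function(d) {
    d$fitted <- unname(fitted(lm(bill_depth_mm ~ bill_length_mm, data = d)))
    d
  }) |>
  bind_rows()

# Markers for the raw data, one line trace per species for the fits
fig_plotly_lm <- plot_ly(penguins_fitted, x = ~bill_length_mm,
                         color = ~species) |>
  add_markers(y = ~bill_depth_mm, symbol = ~species) |>
  add_lines(y = ~fitted, showlegend = FALSE)

fig_plotly_lm
```

This is essentially the same manual tinkering as in the `highcharter` tab, so it does not change the overall picture: `ggplot2`-based approaches get regression lines with far less code.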

## Visualizing distributions

### A categorical variable

Let’s break down the differences in smaller steps by focusing on simpler examples. If you have a categorical variable and want to compare how often each category occurs in your data, then `ggiraph::geom_bar_interactive()`, `plotly::plot_ly(type = "bar")`, and `highcharter::hchart(type = "column")` are your friends. However, to show the counts, you have to manually prepare the data for `plot_ly()` and `hchart()` (as far as I know).

Notice how you have to manually specify the `tooltip` to show the counts on hover in `geom_bar_interactive()` and that `data_id` determines which bar is highlighted on hover.

```
fig_categorical <- penguins |>
  ggplot(aes(x = island)) +
  geom_bar_interactive(aes(tooltip = paste("count:", after_stat(count)),
                           data_id = island))

girafe(ggobj = fig_categorical)
```

```
penguins |>
  count(island) |>
  plot_ly(data = _, x = ~island, y = ~n, type = "bar") |>
  layout(barmode = "stack")
```

```
fig_categorical <- penguins |>
  ggplot(aes(x = island)) +
  geom_bar()

ggplotly(fig_categorical)
```

```
penguins |>
  count(island) |>
  hchart(type = "column",
         hcaes(x = island, y = n))
```
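Since `data_id` only decides which elements are highlighted together, the look of the highlight is a separate setting. Here is a minimal sketch of how the hover style could be adjusted via `opts_hover()` in `girafe()`. The CSS values (orange fill, black stroke) and the object name `widget` are arbitrary choices of mine for illustration.

```r
library(ggiraph)
library(ggplot2)
library(palmerpenguins)

penguins <- na.omit(palmerpenguins::penguins)

fig_categorical <- ggplot(penguins, aes(x = island)) +
  geom_bar_interactive(aes(tooltip = paste("count:", after_stat(count)),
                           data_id = island))

# Bars sharing the hovered data_id get the CSS below (values are assumptions)
widget <- girafe(
  ggobj = fig_categorical,
  options = list(opts_hover(css = "fill:orange;stroke:black;"))
)

widget
```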

### A numerical variable

If you have a numerical variable, histograms are usually a good starting point to get a better feeling for the distribution of your data. `ggiraph::geom_histogram_interactive()`, `plotly::plot_ly(type = "histogram")`, and `highcharter::hchart()` with options to control bin widths or number of bins are the functions for this task.

Note that the binning algorithms are different across the approaches: while `ggplot2` creates bins around a midpoint (e.g. 34), `plotly` and `highcharter` create bins across a range (e.g. between 34-35.9). This leads to seemingly different histograms, but none of them is wrong.

Moreover, note that the `data` property is not available for histograms in `highcharter`, unlike most other Highcharts series^{1}, so we need to pass `penguins$bill_length_mm` directly. This is a tidy anti-pattern and cost me quite some time to figure out.

```
fig_numerical <- penguins |>
  ggplot(aes(x = bill_length_mm)) +
  geom_histogram_interactive(
    # The tooltip shows the number of observations in each bin
    aes(tooltip = paste("count:", after_stat(count))),
    binwidth = 2
  )

girafe(ggobj = fig_numerical)
```

```
plot_ly(penguins, x = ~bill_length_mm, type = "histogram",
xbins = list(size = 2))
```

```
fig_numerical <- penguins |>
  ggplot(aes(x = bill_length_mm)) +
  geom_histogram(binwidth = 2)

ggplotly(fig_numerical)
```

```
hchart(penguins$bill_length_mm) |>
hc_plotOptions(series = list(binWidth = 2))
```
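To make the binning difference more tangible, here is a small sketch that inspects the bins `ggplot2` computes. By default, `geom_histogram(binwidth = 2)` centers bins on multiples of the bin width; setting `boundary = 0` moves the bin edges onto those multiples instead, which should bring the result closer to the range-based bins of `plotly` and `highcharter` (I have not checked that they match exactly). The object names are my own.

```r
library(ggplot2)
library(palmerpenguins)

penguins <- na.omit(palmerpenguins::penguins)

base_plot <- ggplot(penguins, aes(x = bill_length_mm))

# Default: bins of width 2 centered on multiples of the bin width
bins_centered <- layer_data(base_plot + geom_histogram(binwidth = 2))

# boundary = 0: bin edges fall on multiples of the bin width instead
bins_ranged <- layer_data(
  base_plot + geom_histogram(binwidth = 2, boundary = 0)
)

# Compare the bin limits and counts of both variants
head(bins_centered[, c("xmin", "xmax", "count")])
head(bins_ranged[, c("xmin", "xmax", "count")])
```

Both variants count every observation exactly once; only the bin limits move, which is why the histograms look different without either being wrong.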

## Visualizing relationships

### A numerical and a categorical variable

To visualize relationships, you need to have at least two columns. If you have a numerical and a categorical variable, then histograms or densities with groups are a good starting point. The next example illustrates the use of densities via `ggiraph::geom_density_interactive()` and `plotly::plot_ly(histnorm = "probability density")`. For `highcharter`, you need to compute the density estimates yourself and then add them as lines to a plot.

Note that, unlike `ggplot2`, `plotly` offers no out-of-the-box support for density curves, so we’d have to manually create densities and draw the curves. Also, note that it is currently not possible to use the `after_stat(density)` aesthetic in the tooltip.

```
fig_density <- penguins |>
  ggplot(aes(x = body_mass_g, color = species, fill = species)) +
  geom_density_interactive(
    aes(tooltip = paste("Species:", species)),
    linewidth = 0.75, alpha = 0.5
  )

girafe(ggobj = fig_density)
```

```
plot_ly(penguins, x = ~body_mass_g,
type = "histogram", histnorm = "probability density",
color = ~species, opacity = 0.5) |>
layout(barmode = "overlay")
```

```
fig_density <- penguins |>
  ggplot(aes(x = body_mass_g, color = species, fill = species)) +
  geom_density(linewidth = 0.75, alpha = 0.5)

ggplotly(fig_density)
```

```
# Compute one kernel density estimate per species
series <- map(levels(penguins$species), function(x) {
  penguins_subset <- penguins |>
    filter(species == x)

  # Keep only the x and y components of the density estimate
  data <- density(penguins_subset$body_mass_g)[1:2] |>
    as.data.frame() |>
    list_parse2()

  list(data = data, name = x)
})

highchart() |>
  hc_add_series_list(series)
```
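For completeness, here is a sketch of what manually created density curves could look like in `plotly`, mirroring the `highcharter` approach: estimate the density per species with `density()` and draw the result as filled lines. The names `density_data` and `fig_density_plotly` are my own, and `fill = "tozeroy"` is just one way to shade the area under each curve.

```r
library(plotly)
library(dplyr)
library(palmerpenguins)

penguins <- na.omit(palmerpenguins::penguins)

# One kernel density estimate per species, stacked into a long data frame
density_data <- penguins |>
  split(penguins$species) |>
  lapply(function(d) {
    estimate <- density(d$body_mass_g)
    data.frame(
      species = as.character(d$species[1]),
      body_mass_g = estimate$x,
      density = estimate$y
    )
  }) |>
  bind_rows()

# One line trace per species, shaded down to the x-axis
fig_density_plotly <- plot_ly(density_data, x = ~body_mass_g, y = ~density,
                              color = ~species, type = "scatter",
                              mode = "lines", fill = "tozeroy")

fig_density_plotly
```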

### Two categorical columns

Stacked bar plots are a good way to display the relationship between two categorical columns. `geom_bar_interactive()` with the `position` argument, `plotly::plot_ly(type = "bar")`, and `highcharter::hchart(type = "column")` are your functions of choice for this task. Note that in `ggplot2` you can easily switch from the relative frequencies in the example below to counts by using `position = "stack"` (the default) instead of `position = "fill"`. For `plotly` and `highcharter`, you have to manually prepare the data to funnel counts or percentages into the chart, while `ggplot2` handles these things automatically.

```
fig_two_categorical <- penguins |>
  ggplot(aes(x = species, fill = island)) +
  geom_bar_interactive(
    aes(tooltip = paste(fill, ":", after_stat(count)),
        data_id = island),
    position = "fill"
  )

girafe(ggobj = fig_two_categorical)
```

```
penguins |>
  count(species, island) |>
  group_by(species) |>
  mutate(percentage = n / sum(n)) |>
  plot_ly(x = ~species, y = ~percentage, type = "bar", color = ~island) |>
  layout(barmode = "stack")
```

```
fig_two_categorical <- penguins |>
  ggplot(aes(x = species, fill = island)) +
  geom_bar(position = "fill")

ggplotly(fig_two_categorical)
```

```
penguins |>
  count(species, island) |>
  group_by(species) |>
  mutate(percentage = n / sum(n)) |>
  hchart(type = "column",
         hcaes(x = species, y = percentage, group = island)) |>
  hc_plotOptions(series = list(stacking = "percent"))
```

### Two numerical columns

Scatter plots and regression lines are definitely the most common approach for visualizing the relationship between two numerical columns, and we focus on scatter plots for this example (see the first visualization example if you want to see again how to add a regression line). Here, the `size` parameter controls the size of the shapes that you use for the data points in `ggiraph::geom_point_interactive()` relative to the base size (i.e., it is not tied to any unit of measurement like pixels). For `plotly::plot_ly(type = "scatter")`, you can also use `size` to control point sizes manually through the `marker` options, where size is measured in pixels. For `highcharter`, you can specify point sizes via `radius` in the `marker` options, where it is also measured in pixels (so to get points with a diameter of 10 pixels, you need a radius of 5).

```
fig_two_columns <- penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point_interactive(
    aes(tooltip = paste("bill_length_mm:", bill_length_mm, "<br>",
                        "flipper_length_mm:", flipper_length_mm)),
    size = 2
  )

girafe(ggobj = fig_two_columns)
```