Creating Non-Parametric Violin Plots with ggstatsplot: A Step-by-Step Guide

Introduction to ggstatsplot and Non-Parametric Plots

ggstatsplot is an R package built on ggplot2 that produces plots annotated with the results of statistical tests. In this article, we use its ggbetweenstats() function to draw a non-parametric violin plot, which marks each group’s median, and then show how to display the mean on the same plot.

Setting Up R and Loading Required Packages

Before diving into the code, let’s make sure the R environment is set up. We need two packages: ggstatsplot, which provides ggbetweenstats(), and ggsci, which supplies the NEJM color palette used in the plots below.

# Install the packages if not already installed
install.packages(c("ggstatsplot", "ggsci"))

# Load necessary packages
library(ggstatsplot)
library(ggsci)
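
An unconditional install.packages() call reinstalls on every run of the script. A minimal sketch of a check-first approach follows. Here missing_packages() is a hypothetical helper name, not part of any package, and the package vector assumes the two packages this article relies on (ggstatsplot and ggsci).

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages assumed for this article
needed <- c("ggstatsplot", "ggsci")
to_install <- missing_packages(needed)

# Report what is missing; installing is left as an explicit manual step
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "),
          ". Install them with install.packages().")
} else {
  message("All required packages are installed.")
}
```

Reporting rather than installing keeps the script free of side effects; pass to_install to install.packages() once you have reviewed the list.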

Creating a Sample Dataset

To illustrate our points, we’ll create a sample dataset with five groups of 100 observations each, 500 rows in total. It has two variables: group, the group label, and X1, an integer score drawn at random from 0 to 10 with sample().

# Set seed for reproducibility
set.seed(1)

# Create sample dataset: 5 groups x 100 observations = 500 rows
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "F"), each = 100),
  X1 = sample(0:10, 500, replace = TRUE)
)
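
Before plotting, it helps to confirm that the data frame has the intended shape and to look at the two summaries the plots will mark. The sketch below uses only base R and rebuilds the data with a single sample() call of length 500, an assumption that matches five groups of 100 observations; group_sizes and group_summary are illustrative names.

```r
set.seed(1)

df <- data.frame(
  group = rep(c("A", "B", "C", "D", "F"), each = 100),
  X1 = sample(0:10, 500, replace = TRUE)
)

# Shape check: one row per observation, 100 observations per group
group_sizes <- table(df$group)
print(group_sizes)

# Per-group mean and median of X1, the two centrality measures of interest
group_summary <- data.frame(
  group  = names(group_sizes),
  mean   = as.numeric(tapply(df$X1, df$group, mean)),
  median = as.numeric(tapply(df$X1, df$group, median))
)
print(group_summary)
```

If the group sizes are not all 100, or data.frame() stops with an error about differing numbers of rows, the two columns were generated with different lengths.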

Creating a Non-Parametric Violin Plot

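Next, we’ll use ggbetweenstats() to create a non-parametric violin plot. Setting type = "nonparametric" selects a rank-based test and marks each group’s median, while plot.type = "violin" draws violins rather than boxplots. The package and palette arguments pick the NEJM colors from ggsci.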

# Create non-parametric violin plot
pmed <- ggbetweenstats(
  data = df,
  x = group,
  y = X1,
  xlab = "Group",
  ylab = "Support",
  plot.type = "violin",
  type = "nonparametric",
  conf.level = 0.95,
  violin.args = list(width = 0.9, alpha = 0.2),
  title = "",
  messages = FALSE,
  package = "ggsci",
  palette = "default_nejm"
)
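
To see the kind of test that sits behind a non-parametric comparison of more than two groups, the sketch below runs a Kruskal-Wallis rank-sum test in base R. Treat this as an assumption about what ggbetweenstats() reports for this design and as a cross-check, not as a description of the package internals. The data frame is rebuilt so the block runs on its own.

```r
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "F"), each = 100),
  X1 = sample(0:10, 500, replace = TRUE)
)

# Kruskal-Wallis rank-sum test of X1 across the five groups
kw <- kruskal.test(X1 ~ group, data = df)
print(kw)

# Degrees of freedom equal the number of groups minus one
kw$parameter
```

Because every group is drawn from the same distribution, a large p-value is the likely outcome here, though any single random draw can differ.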

Creating a Parametric Violin Plot

To get the mean, we’ll build a second violin plot from the same data with ggbetweenstats(), this time with type = "parametric", which marks each group’s mean instead of its median. The centrality.label.args argument shifts the mean labels to the left (nudge_x = -0.4) so that they do not sit on top of the median labels once the two plots are combined.

# Create parametric violin plot
pmean <- ggbetweenstats(
  data = df,
  x = group,
  y = X1,
  xlab = "Group",
  ylab = "Support",
  plot.type = "violin",
  type = "parametric",
  centrality.label.args = list(nudge_x = -0.4),
  conf.level = 0.95,
  violin.args = list(width = 0.9, alpha = 0.2),
  title = "",
  messages = FALSE,
  package = "ggsci",
  palette = "default_nejm"
)
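
The parametric counterpart can be cross-checked in base R as well. The sketch below runs a one-way ANOVA without assuming equal variances (Welch's test) through oneway.test(). That this matches the default parametric comparison in ggbetweenstats() is an assumption on our part, so compare the numbers with the plot subtitle instead of taking them as given.

```r
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "F"), each = 100),
  X1 = sample(0:10, 500, replace = TRUE)
)

# Welch's one-way ANOVA: compares group means without assuming equal variances
welch <- oneway.test(X1 ~ group, data = df, var.equal = FALSE)
print(welch)

# Numerator degrees of freedom equal the number of groups minus one
welch$parameter[1]
```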

Merging Layers

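To add the mean value to the non-parametric plot, we’ll copy specific layers from the parametric plot into the non-parametric one. The code below takes layers 4 and 5 of pmean, intended to be the mean point and its label, and places them at positions 6 and 7 of pmed. Layer positions depend on the ggstatsplot version and on the arguments used, so check pmean$layers and pmed$layers before relying on these indices.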

# Merge layers from parametric plot to non-parametric plot
pmed$layers[[6]] <- pmean$layers[[4]]
pmed$layers[[7]] <- pmean$layers[[5]]
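
Hard-coded layer positions are fragile. A sketch of selecting layers by geom class instead is shown below on two toy ggplot2 plots that stand in for pmed and pmean. Note the assumptions: layer_geoms() is a hypothetical helper, the toy plots use the built-in mtcars data, not the article's dataset, and the mean marker is drawn with stat_summary(), not taken from ggstatsplot.

```r
library(ggplot2)

# Hypothetical helper: geom class of every layer in a ggplot object
layer_geoms <- function(p) {
  unname(vapply(p$layers, function(l) class(l$geom)[1], character(1)))
}

# Toy stand-in for the plot that receives the extra layer
base_plot <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_violin() +
  geom_boxplot(width = 0.2)

# Toy stand-in for the plot that donates its mean marker
donor_plot <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_violin() +
  stat_summary(fun = mean, geom = "point", colour = "darkred", size = 3)

# Inspect both plots before copying anything
print(layer_geoms(base_plot))
print(layer_geoms(donor_plot))

# Select the donor's point layer by class, then append it to the base plot
idx <- which(layer_geoms(donor_plot) == "GeomPoint")
combined <- base_plot
combined$layers <- c(combined$layers, donor_plot$layers[idx])

print(layer_geoms(combined))
```

Applied to the article's plots, layer_geoms(pmean) and layer_geoms(pmed) would show which positions hold the point and label layers before any of them are copied.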

Displaying the Plot

Finally, we’ll display our resulting plot with both mean and median values.

# Print final plot
print(pmed)

Conclusion

In this article, we explored how to create a non-parametric violin plot using ggstatsplot and add the mean value to it. We used a sample dataset to illustrate our points and demonstrated how to merge specific layers from a parametric plot into a non-parametric one. By following these steps, you can easily display both the mean and median values in your plots.

Further Reading

For more information on ggstatsplot and statistical plotting in R, please refer to the official documentation and tutorials provided by the package authors.

# Additional resources
https://ggsci.tidytastic.net/index.html

Last modified on 2025-01-01