Seaborn Statistical Visualization Overview

Seaborn is a Python visualization library for creating publication-quality statistical graphics. Use this skill for dataset-oriented plotting, multivariate analysis, automatic statistical estimation, and complex multi-panel figures with minimal code.

Design Philosophy

Seaborn follows these core principles:

Dataset-oriented: Work directly with DataFrames and named variables rather than abstract coordinates
Semantic mapping: Automatically translate data values into visual properties (colors, sizes, styles)
Statistical awareness: Built-in aggregation, error estimation, and confidence intervals
Aesthetic defaults: Publication-ready themes and color palettes out of the box
Matplotlib integration: Full compatibility with matplotlib customization when needed

Quick Start

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

Load example dataset

df = sns.load_dataset('tips')

Create a simple visualization

sns.scatterplot(data=df, x='total_bill', y='tip', hue='day')
plt.show()

Core Plotting Interfaces

Function Interface (Traditional)

The function interface provides specialized plotting functions organized by visualization type. Each category has axes-level functions (plot to single axes) and figure-level functions (manage entire figure with faceting).

When to use:

Quick exploratory analysis
Single-purpose visualizations
When you need a specific plot type

Objects Interface (Modern)

The seaborn.objects interface provides a declarative, composable API similar to ggplot2. Build visualizations by chaining methods to specify data mappings, marks, transformations, and scales.

When to use:

Complex layered visualizations
When you need fine-grained control over transformations
Building custom plot types
Programmatic plot generation

from seaborn import objects as so

Declarative syntax

(
    so.Plot(data=df, x='total_bill', y='tip')
    .add(so.Dot(), color='day')
    .add(so.Line(), so.PolyFit())
)

Plotting Functions by Category

Relational Plots (Relationships Between Variables)

Use for: Exploring how two or more variables relate to each other

scatterplot() - Display individual observations as points
lineplot() - Show trends and changes (automatically aggregates and computes CI)
relplot() - Figure-level interface with automatic faceting

Key parameters:

x, y - Primary variables
hue - Color encoding for additional categorical/continuous variable
size - Point/line size encoding
style - Marker/line style encoding
col, row - Facet into multiple subplots (figure-level only)

Scatter with multiple semantic mappings

sns.scatterplot(data=df, x='total_bill', y='tip', hue='time', size='size', style='sex')

Line plot with confidence intervals

sns.lineplot(data=timeseries, x='date', y='value', hue='category')

Faceted relational plot

sns.relplot(data=df, x='total_bill', y='tip', col='time', row='sex', hue='smoker', kind='scatter')

Distribution Plots (Single and Bivariate Distributions)

Use for: Understanding data spread, shape, and probability density

histplot() - Bar-based frequency distributions with flexible binning
kdeplot() - Smooth density estimates using Gaussian kernels
ecdfplot() - Empirical cumulative distribution (no parameters to tune)
rugplot() - Individual observation tick marks
displot() - Figure-level interface for univariate and bivariate distributions
jointplot() - Bivariate plot with marginal distributions
pairplot() - Matrix of pairwise relationships across dataset

Key parameters:

x, y - Variables (y optional for univariate)
hue - Separate distributions by category
stat - Normalization: "count", "frequency", "probability", "density"
bins / binwidth - Histogram binning control
bw_adjust - KDE bandwidth multiplier (higher = smoother)
fill - Fill area under curve
multiple - How to handle hue: "layer", "stack", "dodge", "fill"

Histogram with density normalization

sns.histplot(data=df, x='total_bill', hue='time', stat='density', multiple='stack')

Bivariate KDE with contours

sns.kdeplot(data=df, x='total_bill', y='tip', fill=True, levels=5, thresh=0.1)

Joint plot with marginals

sns.jointplot(data=df, x='total_bill', y='tip', kind='scatter', hue='time')

Pairwise relationships

iris = sns.load_dataset('iris') # 'species' is a column of iris, not of tips
sns.pairplot(data=iris, hue='species', corner=True)

Categorical Plots (Comparisons Across Categories)

Use for: Comparing distributions or statistics across discrete categories

Categorical scatterplots:

stripplot() - Points with jitter to show all observations
swarmplot() - Non-overlapping points (beeswarm algorithm)

Distribution comparisons:

boxplot() - Quartiles and outliers
violinplot() - KDE + quartile information
boxenplot() - Enhanced boxplot for larger datasets

Statistical estimates:

barplot() - Mean/aggregate with confidence intervals
pointplot() - Point estimates with connecting lines
countplot() - Count of observations per category

Figure-level:

catplot() - Faceted categorical plots (set kind parameter)

Key parameters:

x, y - Variables (one typically categorical)
hue - Additional categorical grouping
order, hue_order - Control category ordering
dodge - Separate hue levels side-by-side
orient - "v" (vertical) or "h" (horizontal)
kind - Plot type for catplot: "strip", "swarm", "box", "violin", "bar", "point"

Swarm plot showing all points

sns.swarmplot(data=df, x='day', y='total_bill', hue='sex')

Violin plot with split for comparison

sns.violinplot(data=df, x='day', y='total_bill', hue='sex', split=True)

Bar plot with error bars

sns.barplot(data=df, x='day', y='total_bill', hue='sex', estimator='mean', errorbar='ci')

Faceted categorical plot

sns.catplot(data=df, x='day', y='total_bill', col='time', kind='box')

Regression Plots (Linear Relationships)

Use for: Visualizing linear regressions and residuals

regplot() - Axes-level regression plot with scatter + fit line
lmplot() - Figure-level with faceting support
residplot() - Residual plot for assessing model fit

Key parameters:

x, y - Variables to regress
order - Polynomial regression order
logistic - Fit logistic regression
robust - Use robust regression (less sensitive to outliers)
ci - Confidence interval width (default 95)
scatter_kws, line_kws - Customize scatter and line properties

Simple linear regression

sns.regplot(data=df, x='total_bill', y='tip')

Polynomial regression with faceting

sns.lmplot(data=df, x='total_bill', y='tip', col='time', order=2, ci=95)

Check residuals

sns.residplot(data=df, x='total_bill', y='tip')
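To make the order parameter concrete: to my understanding, regplot fits a polynomial with numpy.polyfit when order is greater than 1. The sketch below, on synthetic data, draws a quadratic fit with ci=None (which skips the bootstrap) and compares the plotted curve with a direct numpy.polyfit fit. Treat it as a sanity check under that assumption rather than a statement about seaborn internals.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with a known quadratic trend plus noise
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 1.0 + 0.5 * x - 0.08 * x**2 + rng.normal(scale=0.2, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

fig, ax = plt.subplots()
sns.regplot(data=df, x="x", y="y", order=2, ci=None, ax=ax)

# The scatter is a collection; the fitted curve is the only Line2D on the Axes
line_x, line_y = ax.lines[0].get_data()

# Fit the same quadratic directly and evaluate it on the plotted x grid
coefs = np.polyfit(df["x"], df["y"], deg=2)
direct_y = np.polyval(coefs, line_x)
```

If the numeric coefficients are needed, fit them separately as done here with numpy.polyfit; regplot draws the fit but does not return its parameters.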

Matrix Plots (Rectangular Data)

Use for: Visualizing matrices, correlations, and grid-structured data

heatmap() - Color-encoded matrix with annotations
clustermap() - Hierarchically-clustered heatmap

Key parameters:

data - 2D rectangular dataset (DataFrame or array)
annot - Display values in cells
fmt - Format string for annotations (e.g., ".2f")
cmap - Colormap name
center - Value at colormap center (for diverging colormaps)
vmin, vmax - Color scale limits
square - Force square cells
linewidths - Gap between cells

Correlation heatmap

corr = df.corr(numeric_only=True) # Skip non-numeric columns such as 'day' and 'sex'
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', center=0, square=True)

Clustered heatmap

sns.clustermap(data, cmap='viridis', standard_scale=1, figsize=(10, 10))
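As a sketch of the correlation heatmap on a DataFrame that mixes numeric and text columns, the example below uses a small made-up table. It assumes pandas 1.5 or later for the numeric_only argument, and pins vmin and vmax to the natural range of a correlation coefficient so colors are comparable across figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows with three numeric columns and one text column
df = pd.DataFrame({
    "total_bill": [16.9, 10.3, 21.0, 23.7, 24.6, 25.3],
    "tip": [1.0, 1.7, 3.5, 3.3, 3.6, 4.7],
    "size": [2, 3, 3, 2, 4, 4],
    "day": ["Sun", "Sun", "Sat", "Sat", "Fri", "Fri"],
})

# numeric_only=True drops 'day' instead of raising on the text column
corr = df.corr(numeric_only=True)

# One annotation is drawn per cell of the matrix
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                 center=0, vmin=-1, vmax=1, square=True)
n_annotations = len(ax.texts)
```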

Multi-Plot Grids

Seaborn provides grid objects for creating complex multi-panel figures:

FacetGrid

Create subplots based on categorical variables. Most useful when called through figure-level functions (relplot, displot, catplot), but can be used directly for custom plots.

g = sns.FacetGrid(df, col='time', row='sex', hue='smoker')
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()

PairGrid

Show pairwise relationships between all variables in a dataset.

iris = sns.load_dataset('iris') # 'species' is a column of iris, not of tips
g = sns.PairGrid(iris, hue='species')
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot)
g.add_legend()

JointGrid

Combine bivariate plot with marginal distributions.

g = sns.JointGrid(data=df, x='total_bill', y='tip')
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.histplot)

Figure-Level vs Axes-Level Functions

Understanding this distinction is crucial for effective seaborn usage:

Axes-Level Functions

Plot to a single matplotlib Axes object
Integrate easily into complex matplotlib figures
Accept ax= parameter for precise placement
Return Axes object
Examples: scatterplot, histplot, boxplot, regplot, heatmap

When to use:

Building custom multi-plot layouts
Combining different plot types
Need matplotlib-level control
Integrating with existing matplotlib code

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
sns.scatterplot(data=df, x='x', y='y', ax=axes[0, 0])
sns.histplot(data=df, x='x', ax=axes[0, 1])
sns.boxplot(data=df, x='cat', y='y', ax=axes[1, 0])
sns.kdeplot(data=df, x='x', y='y', ax=axes[1, 1])

Figure-Level Functions

Manage entire figure including all subplots
Built-in faceting via col and row parameters
Return FacetGrid, JointGrid, or PairGrid objects
Use height and aspect for sizing (per subplot)
Cannot be placed in existing figure
Examples: relplot, displot, catplot, lmplot, jointplot, pairplot

When to use:

Faceted visualizations (small multiples)
Quick exploratory analysis
Consistent multi-panel layouts
Don't need to combine with other plot types

Automatic faceting

sns.relplot(data=df, x='x', y='y', col='category', row='group', hue='type', height=3, aspect=1.2)

Data Structure Requirements

Long-Form Data (Preferred)

Each variable is a column, each observation is a row. This "tidy" format provides maximum flexibility:

Long-form structure

   subject  condition  measurement
0        1    control         10.5
1        1  treatment         12.3
2        2    control          9.8
3        2  treatment         13.1

Advantages:

Works with all seaborn functions
Easy to remap variables to visual properties
Supports arbitrary complexity
Natural for DataFrame operations

Wide-Form Data

Variables are spread across columns. Useful for simple rectangular data:

Wide-form structure

   control  treatment
0     10.5       12.3
1      9.8       13.1

Use cases:

Simple time series
Correlation matrices
Heatmaps
Quick plots of array data

Converting wide to long:

df_long = df.melt(var_name='condition', value_name='measurement')
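As a worked version of the conversion, the sketch below rebuilds the two example tables with pandas alone. It assumes the wide table carries an explicit subject column (an addition for illustration), which is passed as id_vars so the subject identity survives the reshape.

```python
import pandas as pd

# Wide form: one row per subject, one column per condition
wide = pd.DataFrame({
    "subject": [1, 2],
    "control": [10.5, 9.8],
    "treatment": [12.3, 13.1],
})

# Wide to long: condition names become values of a 'condition' column
df_long = wide.melt(id_vars="subject", var_name="condition",
                    value_name="measurement")
df_long = df_long.sort_values(["subject", "condition"]).reset_index(drop=True)

# Long back to wide, for functions that expect a rectangular matrix
wide_again = df_long.pivot(index="subject", columns="condition",
                           values="measurement")
```

Without id_vars, melt stacks every column and the link between a measurement and its subject is lost.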

Color Palettes

Seaborn provides carefully designed color palettes for different data types:

Qualitative Palettes (Categorical Data)

Distinguish categories through hue variation:

"deep" - Default, vivid colors
"muted" - Softer, less saturated
"pastel" - Light, desaturated
"bright" - Highly saturated
"dark" - Dark values
"colorblind" - Safe for color vision deficiency

sns.set_palette("colorblind")
sns.color_palette("Set2")

Sequential Palettes (Ordered Data)

Show progression from low to high values:

"rocket", "mako" - Wide luminance range (good for heatmaps)
"flare", "crest" - Restricted luminance (good for points/lines)
"viridis", "magma", "plasma" - Matplotlib perceptually uniform

sns.heatmap(data, cmap='rocket')
sns.kdeplot(data=df, x='x', y='y', cmap='mako', fill=True)

Diverging Palettes (Centered Data)

Emphasize deviations from a midpoint:

"vlag" - Blue to red
"icefire" - Blue to orange
"coolwarm" - Cool to warm
"Spectral" - Rainbow diverging

sns.heatmap(correlation_matrix, cmap='vlag', center=0)

Custom Palettes

Create custom palette

custom = sns.color_palette("husl", 8)

Light to dark gradient

palette = sns.light_palette("seagreen", as_cmap=True)

Diverging palette from hues

palette = sns.diverging_palette(250, 10, as_cmap=True)
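The palette helpers return two different kinds of object, which determines where each can be used. The sketch below inspects them: color_palette gives a list-like of RGB tuples suited to the palette= argument, while as_cmap=True gives a matplotlib colormap suited to cmap=.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from matplotlib.colors import Colormap
import seaborn as sns

# Discrete palette: 8 evenly spaced hues as (r, g, b) tuples in [0, 1]
custom = sns.color_palette("husl", 8)
hex_codes = custom.as_hex()  # same colors as '#rrggbb' strings

# Continuous colormaps: callables mapping a value in [0, 1] to an RGBA color
seq_cmap = sns.light_palette("seagreen", as_cmap=True)
div_cmap = sns.diverging_palette(250, 10, as_cmap=True)

lightest = seq_cmap(0.0)
darkest = seq_cmap(1.0)
```

A discrete palette fits categorical hue variables; a colormap fits heatmaps and numeric hue variables.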

Theming and Aesthetics

Set Theme

set_theme() controls overall appearance:

Set complete theme

sns.set_theme(style='whitegrid', palette='pastel', font='sans-serif')

Reset to defaults

sns.set_theme()

Styles

Control background and grid appearance:

"darkgrid" - Gray background with white grid (default)
"whitegrid" - White background with gray grid
"dark" - Gray background, no grid
"white" - White background, no grid
"ticks" - White background with axis ticks

sns.set_style("whitegrid")

Remove spines

sns.despine(left=False, bottom=False, offset=10, trim=True)

Temporary style

with sns.axes_style("white"):
    sns.scatterplot(data=df, x='x', y='y')

Contexts

Scale elements for different use cases:

"paper" - Smallest
"notebook" - Slightly larger (default)
"talk" - Presentation slides
"poster" - Large format

sns.set_context("talk", font_scale=1.2)

Temporary context

with sns.plotting_context("poster"):
    sns.barplot(data=df, x='category', y='value')
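Both plotting_context and axes_style return dictionaries of matplotlib rc parameters, so their effect can be inspected without drawing anything. The sketch below compares the base font size across the four contexts and confirms that a style applied through a with block is undone when the block exits.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Base font size grows from "paper" to "poster"
contexts = ["paper", "notebook", "talk", "poster"]
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in contexts}

# A style used as a context manager only applies inside the block
before = plt.rcParams["axes.grid"]
with sns.axes_style("whitegrid"):
    inside = plt.rcParams["axes.grid"]
after = plt.rcParams["axes.grid"]
```

By contrast, set_style and set_context change the global defaults until they are called again.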

Best Practices

Data Preparation

Always use well-structured DataFrames with meaningful column names:

Good: Named columns in DataFrame

df = pd.DataFrame({'bill': bills, 'tip': tips, 'day': days})
sns.scatterplot(data=df, x='bill', y='tip', hue='day')

Avoid: Unnamed arrays

sns.scatterplot(x=x_array, y=y_array) # Loses axis labels

Choose the Right Plot Type

Continuous x, continuous y: scatterplot, lineplot, kdeplot, regplot
Categorical x, continuous y: violinplot, boxplot, stripplot, swarmplot
One continuous variable: histplot, kdeplot, ecdfplot
Correlations/matrices: heatmap, clustermap
Pairwise relationships: pairplot, jointplot

Use Figure-Level Functions for Faceting

Instead of manual subplot creation

sns.relplot(data=df, x='x', y='y', col='category', col_wrap=3)

Not: Creating subplots manually for simple faceting

Leverage Semantic Mappings

Use hue, size, and style to encode additional dimensions:

sns.scatterplot(data=df, x='x', y='y',
                hue='category',     # Color by category
                size='importance',  # Size by continuous variable
                style='type')       # Marker style by type

Control Statistical Estimation

Many functions compute statistics automatically. Understand and customize:

Lineplot computes mean and 95% CI by default

sns.lineplot(data=df, x='time', y='value', errorbar='sd') # Use standard deviation instead

Barplot computes mean by default

sns.barplot(data=df, x='category', y='value',
            estimator='median',     # Use median instead
            errorbar=('ci', 95))    # Bootstrapped CI
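To show what a bootstrapped confidence interval means in the estimator and errorbar settings above, here is a plain NumPy sketch of a percentile bootstrap. The function bootstrap_ci and its parameters are hypothetical names for illustration; seaborn's own implementation may differ in detail.

```python
import numpy as np


def bootstrap_ci(values, estimator=np.mean, level=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for `estimator` (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and recompute the statistic each time
    stats = np.array([
        estimator(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_boot)
    ])
    # Keep the central `level` percent of the resampled statistics
    tail = (100 - level) / 2
    low, high = np.percentile(stats, [tail, 100 - tail])
    return low, high


# Made-up measurements for one category
sample = np.array([10.5, 12.3, 9.8, 13.1, 11.0, 14.2, 10.1, 12.8])

low95, high95 = bootstrap_ci(sample, estimator=np.median, level=95)
low50, high50 = bootstrap_ci(sample, estimator=np.median, level=50)
```

A lower level gives a narrower interval, which is why the level should be reported alongside any error bar.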

Combine with Matplotlib

Seaborn integrates seamlessly with matplotlib for fine-tuning:

ax = sns.scatterplot(data=df, x='x', y='y')
ax.set(xlabel='Custom X Label', ylabel='Custom Y Label', title='Custom Title')
ax.axhline(y=0, color='r', linestyle='--')
plt.tight_layout()

Save High-Quality Figures

g = sns.relplot(data=df, x='x', y='y', col='group') # relplot returns a FacetGrid, not a Figure
g.savefig('figure.png', dpi=300, bbox_inches='tight')
g.savefig('figure.pdf') # Vector format for publications

Common Patterns

Exploratory Data Analysis

Quick overview of all relationships

sns.pairplot(data=df, hue='target', corner=True)

Distribution exploration

sns.displot(data=df, x='variable', hue='group', kind='kde', fill=True, col='category')

Correlation analysis

corr = df.corr(numeric_only=True) # Skip non-numeric columns
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)

Publication-Quality Figures

sns.set_theme(style='ticks', context='paper', font_scale=1.1)

g = sns.catplot(data=df, x='treatment', y='response', col='cell_line', kind='box', height=3, aspect=1.2)
g.set_axis_labels('Treatment Condition', 'Response (μM)')
g.set_titles('{col_name}')
sns.despine(trim=True)

g.savefig('figure.pdf', dpi=300, bbox_inches='tight')

Complex Multi-Panel Figures

Using matplotlib subplots with seaborn

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

sns.scatterplot(data=df, x='x1', y='y', hue='group', ax=axes[0, 0])
sns.histplot(data=df, x='x1', hue='group', ax=axes[0, 1])
sns.violinplot(data=df, x='group', y='y', ax=axes[1, 0])
sns.heatmap(df.pivot_table(values='y', index='x1', columns='x2'), ax=axes[1, 1], cmap='viridis')

plt.tight_layout()

Time Series with Confidence Bands

Lineplot automatically aggregates and shows CI

sns.lineplot(data=timeseries, x='date', y='measurement', hue='sensor', style='location', errorbar='sd')

For more control

g = sns.relplot(data=timeseries, x='date', y='measurement', col='location', hue='sensor', kind='line', height=4, aspect=1.5, errorbar=('ci', 95))
g.set_axis_labels('Date', 'Measurement (units)')

Troubleshooting

Issue: Legend Outside Plot Area

Figure-level functions place legends outside by default. To move inside:

g = sns.relplot(data=df, x='x', y='y', hue='category')
sns.move_legend(g, 'center right', bbox_to_anchor=(0.9, 0.5)) # Adjust position

Issue: Overlapping Labels

plt.xticks(rotation=45, ha='right')
plt.tight_layout()

Issue: Figure Too Small

For figure-level functions:

sns.relplot(data=df, x='x', y='y', height=6, aspect=1.5)

For axes-level functions:

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=df, x='x', y='y', ax=ax)

Issue: Colors Not Distinct Enough

Use a different palette

sns.set_palette("bright")

Or specify number of colors

palette = sns.color_palette("husl", n_colors=df['category'].nunique())
sns.scatterplot(data=df, x='x', y='y', hue='category', palette=palette)

Issue: KDE Too Smooth or Jagged

Adjust bandwidth

sns.kdeplot(data=df, x='x', bw_adjust=0.5) # Less smooth
sns.kdeplot(data=df, x='x', bw_adjust=2) # More smooth

Resources

This skill includes reference materials for deeper exploration:

references/
  function_reference.md - Comprehensive listing of all seaborn functions with parameters and examples
  objects_interface.md - Detailed guide to the modern seaborn.objects API
  examples.md - Common use cases and code patterns for different analysis scenarios

Load reference files as needed for detailed function signatures, advanced parameters, or specific examples.
