Seaborn
This library builds on Matplotlib to provide a higher-level interface for quickly creating many kinds of statistical plots. We can use load_dataset to quickly load any of Seaborn's example datasets (downloaded from its online repository) to experiment with. Matplotlib colormap names can be passed to palette arguments to change the color schemes.
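As a minimal sketch of the palette argument, the snippet below passes a Matplotlib colormap name to a Seaborn plot. The dataframe and its column names are made up for illustration; load_dataset is left out of the runnable part because it needs an internet connection.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical columns); sns.load_dataset("tips")
# would fetch a real example dataset instead, but requires network access.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "group": np.repeat(["a", "b", "c"], 30),
})

# A Matplotlib colormap name is accepted wherever a palette is expected.
colors = sns.color_palette("viridis", n_colors=3)  # three RGB tuples
ax = sns.scatterplot(data=df, x="x", y="y", hue="group", palette="viridis")
```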
Distribution Plots
- distplot plots a histogram for a single quantitative variable and, by default (kde=True), also plots the kernel density estimate (KDE); deprecated since seaborn 0.11 in favor of histplot and displot
- jointplot plots a histogram of each of two quantitative variables along with their combined scatter plot; also supports linear regression (kind='reg')
- pairplot plots pairwise relationships across an entire dataframe for all quantitative variables; supports a hue argument for categorical variables
- rugplot plots a dash mark for every point in a univariate distribution; can be used to build the KDE (the sum of a normal distribution curve centered at every point in the rugplot)
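The relationship between the rug and the KDE can be sketched without Seaborn at all: place one Gaussian bump on every data point and add them up. The sample values and the bandwidth of 0.5 below are assumptions picked by hand; Seaborn chooses a bandwidth automatically.

```python
import math

def make_kde(points, bandwidth):
    """Return a function that evaluates the Gaussian KDE of `points` at x."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point (one per rug tick).
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

points = [1.0, 1.5, 2.0, 4.0, 4.2]      # made-up sample
kde = make_kde(points, bandwidth=0.5)   # bandwidth chosen by hand

# Riemann-sum check: a density should integrate to roughly 1.
step = 0.01
grid = [-5.0 + i * step for i in range(1501)]  # covers [-5, 10]
area = sum(kde(x) for x in grid) * step
```

The curve is highest where the rug ticks cluster (around 1.5 here) and dips in the gap between the two groups of points.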
Categorical Plots
- barplot aggregates a quantitative variable y for each value of a categorical variable x using some function, by default the mean
- countplot plots a bar plot of the number of occurrences of each value of a categorical variable x
- boxplot plots the distribution of a quantitative variable y for each value of a categorical variable x; each such distribution can be split by another categorical variable using hue=z; can also show the distribution of every quantitative variable in a dataframe by omitting x and y; the box spans the first to third quartiles with a line at the median, the whiskers by default extend to the most extreme points within 1.5 times the interquartile range of the box, and outliers are the points outside the whiskers
- violinplot same idea and usage as boxplot, but plots the kernel density estimate of the data’s underlying distribution; when using hue, can also use split=True to draw a single violin per x value, with one hue level on each half (rather than two symmetrical, side-by-side violins)
- stripplot a scatter plot where one variable is categorical (rather than both quantitative); can use jitter=True to spread out the stacked dots near the mode(s); often overlaid on a violin plot for more detail
- swarmplot similar to stripplot, but points are adjusted so they do not overlap, which gives a shape resembling a violin plot; doesn’t scale well to big datasets
- factorplot (renamed catplot in seaborn 0.9) the most general categorical plot; takes a kind parameter to build any of these plots: { point, bar, count, box, violin, strip }
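A rough sketch of what the categorical plots compute, using a tiny made-up dataframe (the day and bill columns are hypothetical). It shows the default barplot aggregation reproduced with pandas, a custom estimator, the quartile arithmetic behind a box plot, and catplot, the current name of factorplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Fri", "Sat", "Sat"],
    "bill": [10.0, 14.0, 20.0, 22.0, 30.0, 40.0],
})

# What barplot draws by default: the mean of y within each x category.
means = df.groupby("day")["bill"].mean()

# Same data, aggregated with the median instead of the mean.
ax = sns.barplot(data=df, x="day", y="bill", estimator=np.median)

# Box plot ingredients for the whole column: quartiles and outlier fences.
q1, median, q3 = np.percentile(df["bill"], [25, 50, 75])
iqr = q3 - q1
# Points outside [fence_low, fence_high] would be drawn as outliers.
fence_low, fence_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# The general entry point: choose the plot type with kind.
g = sns.catplot(data=df, x="day", y="bill", kind="box")
```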
Matrix Plots
- heatmap displays a color encoding of data already in matrix form; useful for displaying correlations:
  sns.heatmap(df.corr())
- clustermap a hierarchically clustered heatmap; computationally intensive
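A small sketch of the correlation heatmap, assuming a synthetic dataframe with three made-up numeric columns. The point is that df.corr() already returns a square matrix, which is the form heatmap expects.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2.0 * a + rng.normal(scale=0.5, size=200),  # strongly tied to a
    "c": rng.normal(size=200),                       # independent noise
})

corr = df.corr()  # square matrix, one row and column per variable
# Fix the color scale to [-1, 1] so colors are comparable across plots.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```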
Grids
- PairGrid the general case of pairplot; can use map, map_diag, map_upper, and map_lower to customize what kind of graph appears in each cell of the grid
- FacetGrid plots one quantitative variable (or several, as in a scatter plot) split by categorical “facets” to explore relationships between the categorical and quantitative variables; this plots the histograms (distributions) of ages for males and females:
  sns.FacetGrid(titanic, col='sex').map(plt.hist, "age")
Regression
- lmplot (linear model plot) plots a scatter plot with a linear regression fit; takes in x, y, data
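A minimal lmplot sketch on made-up data generated around the line y = 3x + 1. The numpy.polyfit call is added only for comparison, to show the kind of least-squares line being drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
df = pd.DataFrame({
    "x": x,
    "y": 3.0 * x + 1.0 + rng.normal(scale=0.5, size=50),  # noisy line
})

# Scatter plot plus fitted regression line with a confidence band.
grid = sns.lmplot(x="x", y="y", data=df)

# An ordinary least-squares line computed directly, for comparison.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
```

lmplot returns a FacetGrid, so the same call can be faceted by categorical columns.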