This library builds on Matplotlib to provide a higher-level interface for quickly creating statistical plots. load_dataset loads any of the library's example datasets (such as titanic) to experiment with. Matplotlib colormap names can be passed to palette arguments to change the color schemes.
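
As a rough illustration of the palette argument, the sketch below passes the Matplotlib colormap name viridis to a plot. The small dataframe and its column names are made up for the example (load_dataset is avoided only so the snippet runs without a network connection), and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for a dataset from load_dataset
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun", "Sat"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
})

# A Matplotlib colormap name resolves to a list of RGB tuples
colors = sns.color_palette("viridis", 2)

# The same name can be passed straight to a plotting function
ax = sns.countplot(data=df, x="day", hue="sex", palette="viridis")
```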

Distribution Plots

  • distplot plots a histogram of a single quantitative variable and, by default (kde=True), overlays its Kernel Density Estimate (KDE); deprecated since seaborn 0.11 in favor of histplot and displot

  • jointplot plots a histogram for each of two quantitative variables together with their joint scatter plot; also supports linear regression (kind='reg')

  • pairplot plots pairwise relationships across entire dataframe for all quantitative variables; supports hue argument for categorical variables

  • rugplot plots a dash mark for every point in a univariate distribution; these points are the building blocks of the KDE, which is the (normalized) sum of a normal distribution curve centered on every point in the rugplot
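
The KDE construction described for rugplot can be sketched in plain Python: a Gaussian bump is centered on every observation, and the bumps are averaged so the resulting curve integrates to 1. The observations and the bandwidth of 0.5 are made-up values; seaborn picks its own bandwidth, so this only illustrates the idea and is not the library's implementation.

```python
import math

def gaussian_kde_at(x, points, bandwidth):
    """Average of Gaussian curves centered on each observation, evaluated at x."""
    total = 0.0
    for p in points:
        z = (x - p) / bandwidth
        total += math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2 * math.pi))
    return total / len(points)

observations = [1.0, 1.5, 2.0, 4.0, 4.2]   # hypothetical data (the rug marks)
bandwidth = 0.5                            # hypothetical smoothing width

# Evaluate the density on a grid from -2.0 to 8.0 in steps of 0.1
step = 0.1
grid = [i * step for i in range(-20, 81)]
density = [gaussian_kde_at(x, observations, bandwidth) for x in grid]

# Riemann-sum check that the curve is a proper density
area = sum(density) * step
```

The density is highest where the rug marks cluster (around 1.5 here) and dips in the gap between the two groups.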

Categorical Plots

  • barplot aggregates a quantitative variable y for each value of a categorical variable x based on some function, by default the mean

  • countplot plots a bar for the number of occurrences of each value of a categorical variable x

  • boxplot plots the distribution of a quantitative variable y for each value of a categorical variable x; each such distribution can be split by another categorical variable using hue=z; omitting x and y shows the distribution of every quantitative variable in the dataframe; the box spans the first to third quartile with a line at the median, the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box (the default), and outliers are the points beyond the whiskers

  • violinplot same idea & usage as boxplot but plots the kernel density estimate of the data’s underlying distribution; when hue has two levels, split=True draws “a single violin” per x value with one half for each hue level (rather than two symmetrical, side-by-side violins)

  • stripplot a scatterplot where one variable is categorical (rather than both quantitative); jitter=True spreads out dots that would otherwise stack on top of each other near the mode(s); often used as an overlay on a violin plot for more detail

  • swarmplot similar to stripplot but points are shifted along the categorical axis so that none overlap, which makes the result resemble the shape of a violin plot; doesn’t scale well to big datasets

  • factorplot the most general categorical plot (renamed catplot in seaborn 0.9); takes in a kind parameter to build any of these plots: { point, bar, count, box, violin, strip }
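
The box-and-whisker geometry from the boxplot entry can be worked through by hand. This is a sketch that assumes the default whisker rule of 1.5 times the interquartile range; the values are made up, and statistics.quantiles with method="inclusive" is used as a stand-in for the quartile calculation, which the plotting library may interpolate slightly differently.

```python
import statistics

values = [2, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical sample with one extreme point

# Box: first quartile, median, third quartile
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = [v for v in values if low_fence <= v <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as an individual outlier point
outliers = [v for v in values if v < low_fence or v > high_fence]
```

Here the box runs from 4 to 8 with the median at 6, the whiskers reach 2 and 9, and 30 is plotted as an outlier.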

Matrix Plots

  • heatmap displays a color encoding of data already in matrix form; useful for displaying correlations, e.g. sns.heatmap(df.corr())

  • clustermap a hierarchically clustered heatmap; computationally intensive
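
A minimal sketch of the correlation heatmap mentioned above. The dataframe is synthetic (column names x, y, z are made up), and the annot, vmin, vmax, and cmap arguments are optional choices for readability rather than anything the notes prescribe.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: y depends on x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=100),
    "z": rng.normal(size=100),
})

# df.corr() is already in matrix form: one row and one column per variable
corr = df.corr()

# Each cell is colored by its correlation; annot=True also prints the number
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

The same matrix could be passed to sns.clustermap(corr), which reorders rows and columns by similarity and needs SciPy installed.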

Grids

  • PairGrid the general case of pairplot; can use map, map_diag, map_upper, and map_lower to customize what kind of graph appears in each cell in the grid

  • FacetGrid plots a single (or multiple, as in scatter) quantitative variable by categorical “facets” to explore relationships between the categorical variables and the quantitative variables; this plots the histograms (distributions) of ages for males and females:

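      # assumes sns (seaborn), plt (matplotlib.pyplot), and the titanic dataframe are already loaded
      g = sns.FacetGrid(titanic, col='sex')   # one panel per value of 'sex'
      g.map(plt.hist, "age")                  # histogram of 'age' in each panel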
    

Regression

  • lmplot (linear model plot) plots a scatter plot with a fitted linear regression line; takes x, y, and data
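
A minimal lmplot sketch with the x, y, and data arguments. The dataframe is synthetic, generated around a made-up line y = 3x + 1, and the np.polyfit call is included only as a rough check on the line one would expect to see drawn; it is not a statement of how lmplot computes its fit internally.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical noisy data around y = 3x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=60)
df = pd.DataFrame({"x": x, "y": 3 * x + 1 + rng.normal(scale=2, size=60)})

# Scatter plot plus fitted regression line; returns a FacetGrid
g = sns.lmplot(x="x", y="y", data=df)
ax = g.axes.flat[0]  # the single panel of the grid

# Least-squares line on the same data, for comparison with the plot
slope, intercept = np.polyfit(df["x"], df["y"], 1)
```

Because lmplot returns a FacetGrid, it also accepts col, row, and hue to fit a separate line per category.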