Seaborn Examples
Seaborn is a Python package built on top of `matplotlib` for plotting statistical data. It has great support for the pandas data structures commonly used to handle tabular data. It is more opinionated than `matplotlib`, which makes it easier to get started with, and it provides a higher-level plotting API that lets us create plots with fewer lines of code and less configuration than the same graphics would need in `matplotlib`.
Visualizing data with tools like `seaborn` helps us explore and understand the data. In this post we'll explore some of the available graphics and demonstrate how to use them to plot the different types of data we might have.
To keep this post clear, I've organized it into six sections. Following this introductory section, we'll go through the prerequisite setup and imports. In the setup section I'll also discuss the sample datasets that `seaborn` provides. The third and fourth sections cover `seaborn` plots for categorical and continuous data, respectively. Then I'll discuss comparative plots, followed by a section on utilities such as styling and saving plots.
%pip install seaborn
Imports
To get started with `seaborn` we need to import the following packages. We import `pyplot` from `matplotlib` so that we can use its lower-level APIs later for styling and customization. When working in Jupyter notebooks we might also want to suppress warnings in the output.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
Jupyter Configurations
Ask Jupyter to display plots within the notebook
%matplotlib inline
%reload_ext autoreload
%autoreload 2
Suppress warnings
import warnings
warnings.filterwarnings('ignore')
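Silencing every warning also hides deprecation notices that can be worth reading. As a narrower alternative, here is a minimal sketch that uses only the standard library to suppress a single warning category inside a `with` block. The function `noisy_call` is a hypothetical stand-in for any call that warns, and `FutureWarning` is chosen only for illustration.

```python
import warnings


def noisy_call():
    """Hypothetical stand-in for a library call that emits a FutureWarning."""
    warnings.warn("this API will change", FutureWarning)
    return 42


# Suppress only FutureWarning, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_call()

# In a fresh context that records everything, the same call warns again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_call()

print(result, len(caught))
```

The filter change is undone when the `with` block exits, so warnings elsewhere in the notebook stay visible.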
Data
Plotting only makes sense if we have some data to visualize. With `seaborn` we can plot any data stored in a pandas data structure. This lets us load and process data with pandas and use `seaborn` alongside it for visualization. For this post, I'll make use of the sample datasets that `seaborn` provides; they are fetched from seaborn's online data repository on first use and cached locally. To get the full list of available datasets, we can ask `seaborn` itself:
print(sns.get_dataset_names())
['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'dowjones', 'exercise', 'flights', 'fmri', 'geyser', 'glue', 'healthexp', 'iris', 'mpg', 'penguins', 'planets', 'seaice', 'taxis', 'tips', 'titanic']
crashes = sns.load_dataset('car_crashes')
titanic = sns.load_dataset('titanic')
Examine the content of the data
crashes.head()
| | total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses | abbrev |
|---|---|---|---|---|---|---|---|---|
| 0 | 18.8 | 7.332 | 5.640 | 18.048 | 15.040 | 784.55 | 145.08 | AL |
| 1 | 18.1 | 7.421 | 4.525 | 16.290 | 17.014 | 1053.48 | 133.93 | AK |
| 2 | 18.6 | 6.510 | 5.208 | 15.624 | 17.856 | 899.47 | 110.35 | AZ |
| 3 | 22.4 | 4.032 | 5.824 | 21.056 | 21.280 | 827.34 | 142.39 | AR |
| 4 | 12.0 | 4.200 | 3.360 | 10.920 | 10.680 | 878.41 | 165.63 | CA |
titanic.head()
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
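The sample datasets are ordinary pandas DataFrames, so the same plotting calls should work on data we build ourselves. Here is a minimal sketch under that assumption; the `orders` table and its column names are invented for illustration, and the bar plot used here is covered in a later section.

```python
import matplotlib

# Optional: headless backend so the sketch also runs outside a notebook.
# Drop this line in a notebook that already uses %matplotlib inline.
matplotlib.use("Agg")

import pandas as pd
import seaborn as sns

# Hypothetical data, invented for this sketch.
orders = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "amount": [10.0, 14.0, 7.0, 9.0],
})

# Column names are passed as strings and looked up in `data`.
ax = sns.barplot(data=orders, x="region", y="amount")
print(ax.get_xlabel(), ax.get_ylabel())
```

Seaborn takes the axis labels from the column names, which is one reason tidy, well-named DataFrames pay off.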
Categorical Data
Categorical data is a form of qualitative data that is stored and identified by distinct labels. Rather than being measured numerically, it is information that can be grouped into categories. For example, the `sex` column in our titanic dataset has two labels: male and female. Similarly, the `abbrev` column in the car crashes dataset holds US state abbreviations, with 51 labels in total (the 50 states plus the District of Columbia).
In this section, we will look at bar plots and count plots, two `seaborn` graphics for visualizing categorical data.
Bar Plots
Also referred to as a bar chart, a bar plot lets us visualize comparisons between the discrete labels or categories in our data. It represents each category with a horizontal or vertical rectangular bar whose length or height is proportional to the value it represents. One axis of the plot shows the categories being compared, while the other shows the measured values corresponding to those categories.
sns.barplot(data = titanic, x = 'sex', y='fare')
<Axes: xlabel='sex', ylabel='fare'>
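By default, the values of `y` are aggregated by their mean within each category. We can pass a different function through the `estimator` parameter, for example the median: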
sns.barplot(data = titanic, x = 'sex', y='fare', estimator=np.median)
<Axes: xlabel='sex', ylabel='fare'>
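The bar heights correspond to a per-category aggregate, which we can reproduce with a pandas `groupby`. The sketch below uses a small hypothetical table (the `fares` values are made up, not taken from the titanic data) to show how the mean and the median can differ when a category contains a large value.

```python
import pandas as pd

# Hypothetical fares, invented for this sketch.
fares = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female", "female"],
    "fare": [7.0, 8.0, 30.0, 10.0, 20.0, 90.0],
})

# What a bar plot shows with the default estimator (mean) ...
mean_fare = fares.groupby("sex")["fare"].mean()
# ... and with estimator=np.median.
median_fare = fares.groupby("sex")["fare"].median()

print(mean_fare)
print(median_fare)
```

When the data is skewed, as fares often are, the median can be the more representative bar height.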
Count Plots
A count plot is similar to a bar plot, but it uses the number of rows in each category as the estimator.
sns.countplot(data = titanic, x = 'sex')
<Axes: xlabel='sex', ylabel='count'>
We can also display the number of rows for each value in the `alive` column.
sns.countplot(data = titanic, x = 'alive')
<Axes: xlabel='alive', ylabel='count'>
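The numbers behind a count plot are the same ones pandas reports with `value_counts`. A minimal sketch, using a hypothetical five-row `passengers` table:

```python
import pandas as pd

# Hypothetical labels, invented for this sketch.
passengers = pd.DataFrame({"alive": ["no", "yes", "no", "no", "yes"]})

# One count per distinct label, like the bar heights in a count plot.
counts = passengers["alive"].value_counts()
print(counts)
```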
Distribution Plots
Distribution plots visualize continuous variables. In the past, `seaborn` had a `distplot` function that displayed a histogram with a KDE curve on top by default. `distplot` is now deprecated, and the recommendation is to use `displot`, or `histplot` for fine-grained control. `distplot` displays a histogram of a univariate distribution; its replacement `displot` can also show bivariate distributions.
sns.distplot(a=crashes['alcohol'])
<Axes: xlabel='alcohol', ylabel='Density'>
If we don't want the KDE curve to be visible, we can tell `seaborn` not to show it:
sns.distplot(a=crashes['alcohol'],kde=False)
<Axes: xlabel='alcohol'>
sns.displot(data=crashes['alcohol'])
<seaborn.axisgrid.FacetGrid at 0x1513a3010>
The equivalent plot can be produced with the newer `displot` function.
# If the dataset has more than one column, we can name the column to plot
sns.displot(data=crashes, x='alcohol')
<seaborn.axisgrid.FacetGrid at 0x15144ef50>
`displot` draws a histogram by default and can combine it with optional components, such as a kernel density estimate (KDE) curve or a rug plot. We can choose the type of plot with the `kind` parameter, which accepts `hist` (the default), `kde`, or `ecdf`.
sns.displot(data=crashes['alcohol'], kind='kde')
<seaborn.axisgrid.FacetGrid at 0x15165e890>
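We can also overlay other visualizations on top of the default histogram. With `kind='hist'`, passing `kde=True` adds a KDE curve, and `rug=True` adds a rug plot along the axis. The plot type itself (`hist`, `kde`, or `ecdf`) is chosen with `kind` rather than with boolean flags.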
sns.displot(data=crashes['alcohol'], kde=True, rug=True)
<seaborn.axisgrid.FacetGrid at 0x1515b2690>
sns.displot(data=titanic, x='age', hue='sex', kind='kde', multiple='stack')
<seaborn.axisgrid.FacetGrid at 0x151748250>
KDE Plot
We can also draw KDE plots directly with the `kdeplot` function.
sns.kdeplot(data=titanic, x='fare')
<Axes: xlabel='fare', ylabel='Density'>
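For intuition about what a KDE curve represents, here is a minimal NumPy sketch of a Gaussian kernel density estimate: one small bell curve is placed on every observation and the curves are averaged. The helper `gaussian_kde_1d` is hypothetical and is not seaborn's implementation, and the bandwidth follows Scott's rule of thumb, which is an assumption of this sketch.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian KDE of `samples` at each point in `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb in one dimension: h = sigma * n^(-1/5).
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated on the grid, then averaged.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)


# Hypothetical data, generated for this sketch.
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=1.5, size=200)
grid = np.linspace(samples.min() - 6, samples.max() + 6, 1000)
density = gaussian_kde_1d(samples, grid)

# A density should be non-negative and integrate to roughly 1.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a wigglier curve and a larger one a smoother curve, which is the main knob to think about when a KDE looks misleading.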
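Categorical Plots
Box Plot
A box plot lets us compare the distribution of a numeric variable across categories. Here we compare fares by survival status, split by sex.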
sns.boxplot(data = titanic, x = 'alive', y='fare', hue='sex')
<Axes: xlabel='alive', ylabel='fare'>
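To make the elements of a box plot concrete, the sketch below computes them by hand with NumPy: the box spans the first and third quartiles, the line inside is the median, and with the default setting the whiskers reach the most extreme points within 1.5 times the interquartile range of the box. The `fares` values are hypothetical.

```python
import numpy as np

# Hypothetical fares, invented for this sketch.
fares = np.array([7.0, 8.0, 8.0, 10.0, 13.0, 15.0, 26.0, 30.0, 80.0, 512.0])

# Box edges and the median line.
q1, median, q3 = np.percentile(fares, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = fares[(fares >= low_fence) & (fares <= high_fence)]
outliers = fares[(fares < low_fence) | (fares > high_fence)]

# Whiskers end at the most extreme points that are still inside the fences.
whisker_low, whisker_high = inside.min(), inside.max()
print(q1, median, q3, whisker_low, whisker_high, outliers)
```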
Violin Plot
A violin plot makes the same comparison with a different visualization, showing the shape of each group's distribution as a density estimate.
sns.violinplot(data = titanic, x = 'alive', y='fare', hue='sex')
<Axes: xlabel='alive', ylabel='fare'>
sns.violinplot(data = titanic, x = 'alive', y='fare', hue='sex', split=True)
<Axes: xlabel='alive', ylabel='fare'>
# titanic columns: survived, pclass, sex, age, sibsp, parch, fare, embarked, class, who, adult_male, deck, embark_town, alive, alone
sns.violinplot(data = titanic, x = 'alive', y='fare')
<Axes: xlabel='alive', ylabel='fare'>
Strip Plot
sns.stripplot(data = titanic, x = 'class', y='fare')
<Axes: xlabel='class', ylabel='fare'>
sns.stripplot(data = titanic, x = 'class', y='fare', hue='sex')
<Axes: xlabel='class', ylabel='fare'>
Swarm Plot
sns.swarmplot(data = titanic, x = 'alive', y='fare')
<Axes: xlabel='alive', ylabel='fare'>
Comparing Data
sns.displot(data=titanic, x='age', col='survived', kind='kde')
<seaborn.axisgrid.FacetGrid at 0x151881a50>
sns.displot(data=titanic, x='age', col='survived', hue='sex', kind='kde', multiple='stack')
<seaborn.axisgrid.FacetGrid at 0x1519cbf10>
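Joint Plot
A joint plot is used to examine the relationship between two variables together with their individual distributions. By default it draws a scatter plot with marginal histograms.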
sns.jointplot(data=crashes, x='speeding',y='alcohol')
<seaborn.axisgrid.JointGrid at 0x151b780d0>
sns.jointplot(data=crashes, x='speeding',y='alcohol', kind='kde')
<seaborn.axisgrid.JointGrid at 0x151c91350>
sns.jointplot(data=crashes, x='speeding',y='alcohol', kind='reg')
<seaborn.axisgrid.JointGrid at 0x151e06dd0>
sns.jointplot(data=titanic, x='fare',y='age')
<seaborn.axisgrid.JointGrid at 0x15207a6d0>