10 Pandas Plotting Functions for Quick Data Visualization

Hey everyone! As a Python-based data analyst, I rely heavily on the incredibly helpful Pandas package when working with data. Pandas makes it simple to explore and analyse data, and its built-in plotting is one of my favorite features. In this post I’ll go over the main charting features of Pandas, along with some tips on how to apply them to understand your data.

Why Visualize Data with Pandas

Before jumping into the different plotting options, let me briefly explain why Pandas is such a fantastic tool for data visualization. Here are some of the major benefits:

  • Tightly integrated with the DataFrame and Series data structures, making plotting just one method call away
  • Simple syntax that generates publication-quality graphs with just a few lines of code
  • Control over customization like colors, labels, styles, etc.
  • Support for visualization of time series data
  • Built-in plotting functions allow quick exploration as part of your EDA process

Now that you know why Pandas plotting capabilities are so useful for data scientists and analysts, let’s look at exactly what kind of plots you can create using some real examples.

1. Line Plots

One of the most common types of visualizations – line plots – are incredibly easy to generate using the .plot() DataFrame method.

df['Column1'].plot()

This basic syntax plots each value in the column on the y-axis against its index on the x-axis and connects the points with a line. Let’s see how we can customize this further:

df.plot(x ='Date', y='Value_to_Plot', linestyle ='--', marker ='o')

We can control:

  • x and y values
  • line style (solid, dashed, etc.)
  • data point markers
  • figure size, title, legends
  • colors
  • transparency (alpha)

And much more! This flexibility allows us to quickly visualize trends over time, track growth metrics, compare categories, and identify outliers.
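To make the options above concrete, here is a minimal sketch you can run as-is. The DataFrame is made up for illustration (30 days of a random-walk metric); only the column names Date and Value_to_Plot come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 daily values of a made-up metric (a random walk)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=30, freq="D"),
    "Value_to_Plot": rng.normal(0, 1, 30).cumsum(),
})

# One call sets the axes, line style, markers, color, transparency,
# figure size, title and legend
ax = df.plot(
    x="Date", y="Value_to_Plot",
    linestyle="--", marker="o", color="tab:blue", alpha=0.8,
    figsize=(8, 4), title="Daily value", legend=True,
)
ax.set_ylabel("Value")  # .plot() returns a Matplotlib Axes we can keep tweaking
```

Because .plot() hands back a regular Matplotlib Axes, anything Pandas doesn't expose directly can still be adjusted afterwards.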

2. Bar Plots

To understand category-based metrics like sales by product line or user signups by country, bar charts are extremely useful visualizations. Creating them with Pandas is just as straightforward:

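df['Category'].value_counts().plot(kind='bar')

A bar plot needs numbers for the bar heights, so we first count how often each category occurs with value_counts(). The kind='bar' parameter then tells Pandas we want a bar plot instead of the default line plot from before.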

Some key ways to customize bar plots:

df.plot(kind ='bar', figsize=(8, 5), color=['red', 'blue'], legend=True)

We added:

  • Larger figure size
  • Custom colors
  • A legend based on column names

And Pandas handled the rest! Bar charts shine when comparing metrics across discrete categories.
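Here is a small, self-contained sketch of both flavors: counts per category from a single column, and grouped bars from a DataFrame. The order data and the Online and Store columns are invented for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A"],
    "Online": [10, 20, 15, 5, 25, 30],
    "Store": [8, 12, 9, 7, 14, 11],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Count rows per category, then draw one bar per category
counts = df["Category"].value_counts().sort_index()
counts.plot(kind="bar", ax=ax1, title="Orders per category")

# Sum both numeric columns per category: grouped bars, one color per column
totals = df.groupby("Category")[["Online", "Store"]].sum()
totals.plot(kind="bar", ax=ax2, color=["red", "blue"], legend=True)
```

In the second chart the index (Category) supplies the x-axis labels and each column becomes its own colored set of bars, which is why the legend shows the column names.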

3. Histograms

Now let’s explore histograms – plots that allow us to visualize the distribution of numerical data. They can reveal skewness and show where values cluster.

Creating them is as easy as:

 df['Numeric_Column'].plot(kind ='hist', bins=25)

The kind='hist' parameter switches the plot type to the histogram. We also set the number of bins (x-axis divisions) to 25. Customizations like titles, labels, bin sizes, cumulative histograms, etc. can make histograms even more informative.

Histograms provide another invaluable visualization for EDA and analytics. Identifying the shape and spread of your metric’s distribution is extremely valuable!
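As a quick sketch of the customizations mentioned above, the example below draws a plain histogram next to a cumulative one. The data is randomly generated (a skewed exponential sample), so treat the numbers as placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical skewed data: 1000 draws from an exponential distribution
rng = np.random.default_rng(1)
df = pd.DataFrame({"Numeric_Column": rng.exponential(scale=2.0, size=1000)})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Plain histogram: number of rows falling into each of the 25 bins
df["Numeric_Column"].plot(kind="hist", bins=25, ax=ax_left,
                          title="Counts per bin")
ax_left.set_xlabel("Value")

# Cumulative, normalized histogram: share of rows at or below each bin
df["Numeric_Column"].plot(kind="hist", bins=25, cumulative=True, density=True,
                          ax=ax_right, title="Cumulative share")
ax_right.set_xlabel("Value")
```

The extra keyword arguments (cumulative, density) are passed through to Matplotlib's hist, so most of its options work here too.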

4. Scatter Plots

If we want to assess the relationship or correlation between two numeric values, scatter plots are the way to go. Pandas handles these effortlessly:

df.plot(kind ='scatter', x='Var1', y='Var2')

We specified one numeric column on the x-axis and another on the y-axis, and Pandas plotted each row as a data point. Now we can instantly see clustering, positive or negative correlation, and more. Some useful customizations:

df.plot(kind ='scatter', x='Var1', y='Var2', alpha = 0.3, s = df['SizeMetric']*10)

Here we added:

  • 30% data point transparency to prevent overlap
  • Data point size scaled by another numeric column

And Pandas took care of the plotting! Scatter plots provide immense analytical value in just one simple chart.
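Here is the same idea as a runnable sketch. The three columns are synthetic: Var2 is built from Var1 plus noise so that a positive relationship shows up, and SizeMetric is just random.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Var2 depends on Var1, SizeMetric is unrelated
rng = np.random.default_rng(2)
n = 200
var1 = rng.normal(size=n)
df = pd.DataFrame({
    "Var1": var1,
    "Var2": 0.8 * var1 + rng.normal(scale=0.5, size=n),
    "SizeMetric": rng.uniform(1, 10, size=n),
})

# alpha makes overlapping points visible; s sets each marker's area
ax = df.plot(kind="scatter", x="Var1", y="Var2",
             alpha=0.3, s=df["SizeMetric"] * 10)

# Back the visual impression with a number
print(df["Var1"].corr(df["Var2"]))
```

Pairing the chart with .corr() is a cheap sanity check: the plot shows the shape of the relationship, the coefficient shows its strength.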

5. Box Plots

For visualizing distributional statistics like quartiles while detecting outliers, box plots (or box-and-whisker plots) are invaluable. Pandas has us covered once again:

df['Metrics'].plot(kind = 'box')

With kind='box', Pandas draws the familiar box plot: a single box for one column, or one box per numeric column when called on a whole DataFrame. To split the boxes by group, use df.boxplot() with the by argument:

df.boxplot(column=['Metric1','Metric2'], by ='Category', rot=90)

Here we generated side-by-side box plots of groups defined by 'Category', with:

  • Multiple numeric columns plotted
  • X-axis tick labels rotated 90 degrees (rot=90) so long category names don’t overlap

The box plot is a powerful data visualization that exposes distribution statistics compactly. Pandas integration makes generating them a breeze!
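A short sketch with invented data shows the grouped version end to end, together with the quantiles the box is built from. One detail worth knowing: by default the whiskers extend to at most 1.5 times the interquartile range, and anything beyond that is drawn as an individual outlier point.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two metrics measured in three regions
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "Category": np.repeat(["North", "South", "West"], 50),
    "Metric1": rng.normal(10, 2, 150),
    "Metric2": rng.normal(20, 5, 150),
})

# Min, quartiles and max of one metric, for comparison with the plot
summary = df["Metric1"].quantile([0, 0.25, 0.5, 0.75, 1.0])
print(summary)

# One subplot per metric, one box per category inside each subplot
axes = df.boxplot(column=["Metric1", "Metric2"], by="Category",
                  rot=90, figsize=(8, 4))
```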

6. Area Plots

For plotting time-series data, area plots can reveal trends over time or each series’ contribution to a whole across periods. Pandas provides dedicated time-series handling coupled with easy area plot generation:

df['Sales'].plot(kind ='area', xlim=(0, 10), stacked=False,alpha=0.25, figsize=(20, 10))

Our time-series line plot instantly became an area plot with kind='area', and customizations like:

  • X-axis limits
  • Disable stacking
  • Transparent colored regions
  • Larger figure size for visibility

Pandas area plots make it easy to generate time-series visualizations that are ideal for showing trends!
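The single-column example above turns stacking off. To see the "contribution to a whole" use case, here is a sketch with stacking on. The three product lines and their monthly figures are made up; note that stacked area plots need every column to be all positive (or all negative).

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales for three product lines (all positive values)
rng = np.random.default_rng(4)
df = pd.DataFrame(
    rng.uniform(10, 50, size=(12, 3)),
    index=pd.date_range("2024-01-01", periods=12, freq="MS"),
    columns=["Line A", "Line B", "Line C"],
)

# stacked=True piles the series on top of each other, so the top edge
# of the chart is the monthly total
ax = df.plot(kind="area", stacked=True, alpha=0.6, figsize=(10, 5))
ax.set_ylabel("Sales")
```

With stacked=False the areas overlap instead, which is where a low alpha becomes important.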

7. Pie Charts

The pie chart is a classic choice for illustrating proportions such as market or sales share. Pandas supports pie charts with:

category_proportions.plot(kind='pie', autopct='%0.2f%%', radius=1.2, figsize=(8, 8), subplots=True)

We passed kind='pie' to get pie plots based on our proportion data, also customizing with:

  • Formatted category percentage labels
  • Radius / overall pie size
  • Square figure for visibility
  • Subplots, which draw one pie per column when the data is a DataFrame

Now visualizing part-to-whole relationships is a walk in the park with Pandas!
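Below is a minimal sketch that builds the proportions first and then plots them. The product names and sales figures are invented; category_proportions is the same name used in the snippet above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales per product
sales = pd.Series(
    {"Product A": 420, "Product B": 310, "Product C": 180, "Product D": 90},
    name="Sales",
)

# Convert raw values to shares of the total
category_proportions = sales / sales.sum()

fig, ax = plt.subplots(figsize=(6, 6))
category_proportions.plot(kind="pie", autopct="%0.2f%%", ax=ax)
ax.set_ylabel("")  # hide the Series name that Pandas puts on the y-axis
```

Strictly speaking the normalization step is optional, since the pie is scaled to the total either way, but having the shares as a Series is handy for printing them alongside the chart.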

8. Heatmaps

Correlation is another important relationship to visualize, especially with larger datasets having many variables. For this, heatmaps are my absolute favorite – using color shading to quickly signify patterns.

Pandas has no heatmap function of its own, but it does the heavy lifting by computing the correlation matrix with df.corr(). Seaborn, which works directly on DataFrames, then draws it:

import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(12, 12))
sns.heatmap(correlation_matrix, annot=True)

Calling plt.figure() with a larger figsize before sns.heatmap() gives the cells enough room to stay readable. The annot=True parameter prints the actual correlation coefficient in each cell, on top of shading based on its value.

A heatmap built from a Pandas correlation matrix is an invaluable data exploration tool! I can instantly identify strongly correlated variables worth investigating further.
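If you want to stay with just Pandas and Matplotlib, a correlation heatmap can also be sketched by hand. This is only an illustration: the columns a, b and c are synthetic, with b deliberately built from a so that one strong correlation shows up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: b is strongly related to a, c is independent noise
rng = np.random.default_rng(5)
base = rng.normal(size=300)
df = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.3, size=300),
    "c": rng.normal(size=300),
})
correlation_matrix = df.corr()

fig, ax = plt.subplots(figsize=(5, 5))
# Fix the color scale to [-1, 1] so colors mean the same across datasets
image = ax.imshow(correlation_matrix.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(image, ax=ax)

labels = list(correlation_matrix.columns)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

# Write each coefficient into its cell
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, f"{correlation_matrix.iloc[i, j]:.2f}",
                ha="center", va="center")
```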

9. Violin Plots

Now let’s explore violin plots, which combine a box plot’s summary statistics with a view of the distribution’s shape: the width of each violin shows how dense the data is at that value. Pandas has no violin plot method of its own, so this one comes from Seaborn, which takes the DataFrame directly:

import seaborn as sns
sns.violinplot(data=df, x='Category', y='Metric1')

We set up our violin plot with:

  • The selected numeric column on the y-axis
  • One violin per category along the x-axis

Figure size and tick label rotation are controlled through Matplotlib, for example plt.figure(figsize=(6, 8)) before the call and plt.xticks(rotation=90) after it.

Violin plots combine both distributional and summary statistics elegantly.
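For readers without Seaborn installed, a grouped violin plot can be put together from a Pandas groupby and Matplotlib's own violinplot. The sketch below uses invented data with three differently shaped groups so the violins look distinct.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three categories with different distribution shapes
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "Category": np.repeat(["A", "B", "C"], 100),
    "Metric1": np.concatenate([
        rng.normal(0, 1, 100),       # wide and symmetric
        rng.normal(2, 0.5, 100),     # narrow and symmetric
        rng.exponential(1.0, 100),   # skewed
    ]),
})

# Split the numeric column into one array per category
groups = {name: g["Metric1"].to_numpy() for name, g in df.groupby("Category")}

fig, ax = plt.subplots(figsize=(6, 4))
parts = ax.violinplot(list(groups.values()), showmedians=True)

# Violins are drawn at positions 1..n, so label those positions
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Metric1")
```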

10. Hexbin Plots

The last specialty plot I want to mention is the Hexbin plot – aggregating bivariate scatter plot data into hexagonal bins shaded by frequency. This reveals areas of data concentration clearly:

df.plot(kind ='hexbin', x='Var1', y='Var2', gridsize=20)

By passing kind='hexbin', we grouped nearby data points from two numeric columns into hexagons, each shaded according to how many points it contains. This exposes dense regions much better than a crowded scatter plot. The gridsize parameter sets the number of hexagons across the x-axis, and further customizations include color scaling, normalization, and more!

Hexbin plots let interesting patterns pop out visually that a crowded scatter plot would otherwise hide.
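To try this out, here is a small sketch with 5,000 synthetic points, far too many for a readable scatter plot. Var1 and Var2 are generated so that they are loosely related; the colormap choice is just a preference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5000 loosely correlated points
rng = np.random.default_rng(7)
n = 5000
df = pd.DataFrame({"Var1": rng.normal(size=n)})
df["Var2"] = 0.5 * df["Var1"] + rng.normal(size=n)

# gridsize controls how many hexagons span the x-axis;
# the color of each hexagon encodes how many points fall inside it
ax = df.plot(kind="hexbin", x="Var1", y="Var2",
             gridsize=20, colormap="viridis", figsize=(7, 6))

# The per-hexagon counts are available from the drawn collection
counts = ax.collections[0].get_array()
print(counts.max())
```

A smaller gridsize gives bigger hexagons and a smoother picture; a larger one shows more detail but gets noisy when the data is sparse.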

Recap and Summary

We’ve covered quite a bit of ground detailing Pandas’ various plotting capabilities! Let’s do a quick recap:

  • Line plots – ideal for visualizing trends over time
  • Bar plots – compare categorical data and metrics
  • Histograms – identify distributions of numeric data
  • Scatter plots – assess relationships between two numerical variables
  • Box plots – summarize distributions with a 5-number statistical summary
  • Area plots – stacked time-series data visualization
  • Pie charts – show categorical proportional splits
  • Heatmaps – correlation strength between variables through color encoding
  • Violin plots – a combination of statistical distribution and density information
  • Hexbin plots – scatter plots aggregated into shaded hexagonal bins

As you can see, Pandas comes fully stocked with data visualization capabilities essential for any analytics toolbox!

The simple syntax we explored fits right into intuitive data analysis workflows. Combined with the power to customize for publication-quality output, crafting meaningful charts is a breeze.

I hope this overview provides ample inspiration on how you can utilize these plotting functions for your own datasets and analytical needs. Visual data exploration is a key first step before diving deeper.

Please feel free to reach out with any questions or feedback on using these indispensable Pandas plotting tools! I’m always happy to help you better leverage the power within your data.

Thanks for reading.
