STAT 29000: Project 6 — Spring 2021

Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this project, we are going to dive into matplotlib with an open project.

Context: We’ve been working hard all semester and learning a lot about web scraping. In this project, we ask you to examine some plots, write a little bit, and use your creative energies to create good visualizations of the flight data using matplotlib, the go-to plotting library for many. In the next project, we will continue to learn about and become comfortable using matplotlib.

Scope: python, visualizing data

Learning objectives
  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Demonstrate the ability to customize a plot (color, shape/linetype).
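
As a rough illustration of these three objectives, here is a minimal sketch that draws the same series twice: once with default settings, and once with axis labels, a title, and a custom color, line type, and marker shape. The numbers are made up for illustration and are not taken from the flights dataset. The output filename is also just a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical monthly average departure delays (minutes); NOT real flight data.
months = list(range(1, 13))
avg_delay = [9.8, 10.4, 9.1, 7.2, 6.9, 11.5, 12.3, 10.0, 5.6, 5.9, 6.4, 13.1]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: a basic line plot with default settings.
axes[0].plot(months, avg_delay)

# Right panel: the same data with a custom color, line type, and marker shape,
# plus axis labels and a title.
axes[1].plot(months, avg_delay, color="darkorange", linestyle="--", marker="o")
axes[1].set_xlabel("Month")
axes[1].set_ylabel("Average departure delay (minutes)")
axes[1].set_title("Average departure delay by month (made-up data)")
axes[1].set_xticks(months)

fig.tight_layout()
fig.savefig("objectives_sketch.png", dpi=150)  # placeholder filename
```

Comparing the two panels side by side is one way to see how much the labels and styling contribute to readability.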

Make sure to read about and use the template found here, and read the important information about project submissions here.

Dataset

The following questions will use the dataset found in Scholar:

/class/datamine/data/flights/*.csv (all csv files)
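
One possible way to read several of these files into a single table is sketched below, assuming pandas is available. The helper name `load_flights` is hypothetical and not part of the course template. The column names in the commented usage line are an assumption about the files, so check the header of one file before relying on them. Because the full dataset is large, the sketch lets you limit both the columns and the number of rows read per file.

```python
import glob

import pandas as pd


def load_flights(pattern, usecols=None, nrows=None):
    """Read every CSV matching `pattern` and stack the rows into one DataFrame.

    Hypothetical helper for illustration only.
    `usecols` restricts which columns are read; `nrows` caps rows read PER FILE.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}")
    frames = [pd.read_csv(path, usecols=usecols, nrows=nrows) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Example call on Scholar (column names are an assumption about the files):
# flights = load_flights("/class/datamine/data/flights/*.csv",
#                        usecols=["Year", "Month", "DepDelay"], nrows=10000)
```

Reading only the columns a plot needs keeps memory use manageable, and a small `nrows` value is handy while you are still iterating on the look of a figure.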

Questions

Question 1

Here are the results from the 2009 Data Expo poster competition. The object of the competition was to visualize interesting information from the flights dataset. Examine all 8 posters and write a single sentence for each poster with your first impression(s). An example of an impression that will not get full credit would be: "My first impression is that this poster is bad and doesn’t look organized." An example of an impression that will get full credit would be: "My first impression is that the author had a good visualization-to-text ratio and it seems easy to follow along."

Items to submit
  • 8 bullets, each containing a sentence with your first impression of one of the 8 posters. Order the bullets from "first place" to "honourable mention", followed by the "other posters" in the given order. Alternatively, label which poster each sentence is about.

Question 2

Creating More Effective Graphs by Dr. Naomi Robbins and The Elements of Graphing Data by Dr. William Cleveland at Purdue University are two excellent books about data visualization. Read the following excerpts from the books (respectively), and list 2 things you learned or found interesting from each book.

Items to submit
  • Two bullets for each book with items you learned or found interesting.

Question 3

Of the 7 posters with at least 3 plots and/or maps, choose 1 poster that you think you could improve upon or "out plot". Create 4 plots/maps that either:

  1. Improve upon a plot from the poster you chose, or

  2. Show a completely different plot that does a good job of getting an idea or observation across, or

  3. Ruin a plot. Purposefully break the best practices you’ve learned about in order to make the visualization misleading. (limited to 1 of the 4 plots)

For each plot/map where you choose to do (1), include 1-2 sentences explaining what exactly you improved upon and how. Point out some of the best practices from the 2 provided texts that you followed. For each plot/map where you choose to do (2), include 1-2 sentences explaining your graphic and outlining the best practices from the 2 texts that you followed. For each plot/map where you choose to do (3), include 1-2 sentences explaining what you changed, what principle it broke, and how it made the plot misleading or worse.
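
For option (3), one commonly cited way to make a bar chart misleading is to truncate the y-axis so that small differences look large. The sketch below shows the idea with made-up numbers (three hypothetical carriers, not real flight data). Whether this particular pitfall is covered in the provided excerpts is an assumption you should verify before citing it in your write-up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical on-time percentages for three made-up carriers; NOT real data.
carriers = ["A", "B", "C"]
on_time_pct = [78.0, 80.0, 82.0]

fig, (ax_fair, ax_ruined) = plt.subplots(1, 2, figsize=(9, 4))

# Fair version: bars start at zero, so bar heights are proportional to values.
ax_fair.bar(carriers, on_time_pct, color="steelblue")
ax_fair.set_ylim(0, 100)
ax_fair.set_ylabel("On-time flights (%)")
ax_fair.set_title("Axis starts at zero")

# Ruined version: the y-axis is cut off at 77, exaggerating the differences.
ax_ruined.bar(carriers, on_time_pct, color="steelblue")
ax_ruined.set_ylim(77, 83)
ax_ruined.set_ylabel("On-time flights (%)")
ax_ruined.set_title("Truncated axis (misleading)")

fig.tight_layout()
fig.savefig("ruined_plot_sketch.png", dpi=150)  # placeholder filename
```

In the truncated panel, carrier C's visible bar is five times as tall as carrier A's, even though the underlying values differ by only about 5%. That gap between what the eye reads and what the data say is the kind of thing your explanation for a ruined plot should point out.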

While we are not asking you to create a poster, please use RMarkdown to keep your plots, code, and text nicely formatted and organized. The more like a story your project reads, the better. In this project, we are restricting you to matplotlib in Python. While there are many interesting plotting packages like plotly and plotnine, we really want you to take the time to dig into matplotlib and learn as much as you can.

Items to submit
  • All associated Python code you used to wrangle the data and create your graphics.

  • 4 plots, with at least 4 associated RMarkdown code chunks.

  • 1-2 sentences per plot explaining what exactly you improved upon, what best practices from the texts you used, and how. If it is a brand new visualization, describe and explain your graphic, outlining the best practices from the 2 texts that you followed. If it is the ruined plot you chose, explain what you changed, what principle it broke, and how it made the plot misleading or worse.

Question 4

Now that you’ve been exploring data visualization, copy and paste your first impressions from question (1) and revise them to reflect your updated impressions. Which impression changed the most, and why?

Items to submit
  • 8 bullets with updated impressions (still just a sentence or two) from question (1).

  • A sentence explaining which impression changed the most and why.