Assignment 3: EDA and Data Visualization

This assignment is quite a bit different from the first two. Now that we’ve built a solid foundation of spreadsheet modeling/engineering skills and done some pretty structured assignments, you are going to get a chance to be more creative. In this assignment, you are going to do some exploratory data analysis (EDA) and create data visualizations and dashboards. We spent two weekly modules exploring these topics, and our Moodle site contains numerous resources for you to draw upon. In particular, our BA textbook does a nice job of reviewing some Excel EDA specifics in Chapters 2, 3, and 17.2. You should also read through the series of whitepapers I posted in the first module on this topic.

Cycle Share data analysis

Many cities have started bike sharing programs. These programs often capture very detailed data on bike usage, and such data can be quite helpful in managing these systems. In this assignment, we’ll be using data from Seattle’s Cycle Share program as well as weather-related data from the National Climatic Data Center. You can find the three datasets in the data folder within the zip file. Here’s a description of the files:

Context

The Pronto Cycle Share system consists of 500 bikes and 54 stations located in Seattle. Pronto provides open data on individual trips, stations, and daily weather.

Content

There are 3 datasets that provide data on the stations, trips, and weather from 2014-2016.

  • Station dataset
    • station_id: station ID number
    • name: name of station
    • lat: station latitude
    • long: station longitude
    • install_date: date that station was placed in service
    • install_dockcount: number of docks at each station on the installation date
    • modification_date: date that station was modified, resulting in a change in location or dock count
    • current_dockcount: number of docks at each station on 8/31/2016
    • decommission_date: date that station was placed out of service
  • Trip dataset
    • trip_id: numeric ID of bike trip taken
    • starttime: day and time trip started, in PST
    • stoptime: day and time trip ended, in PST
    • bikeid: ID attached to each bike
    • tripduration: time of trip in seconds
    • from_station_name: name of station where trip originated
    • to_station_name: name of station where trip terminated
    • from_station_id: ID of station where trip originated
    • to_station_id: ID of station where trip terminated
    • usertype: “Short-Term Pass Holder” is a rider who purchased a 24-Hour or 3-Day Pass; “Member” is a rider who purchased a Monthly or an Annual Membership
    • gender: gender of rider
    • birthyear: birth year of rider
  • Weather dataset contains daily weather information in the service area

I want you to take the perspective of an analyst who has been asked to provide a thorough analysis of the usage of this bike share program based on the data provided. Pretend a new operations manager of the Cycle Share (CS) has just started and wants you to provide her with summary statistics and data visualizations to help her understand how various location, weather, demographic, and time-related variables affect bike share use. Some examples of basic questions she has:

How many people use CS every month? Is ridership increasing, decreasing, staying the same? Do these trends differ by different rider types, age, gender or other variables?

What are popular stations to rent from? Popular destinations? Popular trips?

How long are trips? How is ride duration impacted by rider, station, or weather characteristics?

How does ridership vary by day of week and time of day?

How does weather affect ridership?

Of course, these basic questions lend themselves to more nuanced ones, such as whether rain or temperature has a greater effect on ridership, and how weather and temporal factors interact to affect ridership. She doesn’t even know all the questions she has and is counting on you to enlighten her. Be creative, be analytical, and wow her with your insightful analysis.

So, your job is to create a series of tables, graphs, dashboards, infographics, supporting text, or whatever else you think is appropriate to give your new manager the information and insight she needs to chart a good future course for the CS program. Your visualizations and analysis should be woven together coherently so that they tell a story to the new manager.

You could do the whole thing in Excel but are welcome to also use other tools like Access, Tableau, or PowerPivot. You can definitely use PowerPoint or Word or similar tools as a “container” to help structure your story. You could also use the Story feature in Tableau to structure your presentation. For example, you might use Excel to do your data prep and analysis and then paste visualizations into a PowerPoint presentation which includes your analytical commentary and summarizations. Other people might choose to use some combination of Excel and Tableau for the analysis and use the Tableau Story feature to put it all together (or just copy and paste Tableau visualizations into a PowerPoint presentation). Use your creativity and imagination to combine tools as you see fit.

You CANNOT use any programmatic analysis tools like R or Python.

Make sure you use principles of graphical excellence, solid Excel graph and table designs, and, most importantly, that you tell a coherent and compelling story based on the bike share data. Don’t just create a hodgepodge of unrelated graphs and tables. Weave your visualizations together in a coherent and logical way. I’ve given you numerous resources on our course website to help you.

In addition to the actual analysis deliverables, you’ll also be turning in a short supporting technical document (use Word) that describes the various steps you took in creating the deliverables. For example, this supporting document will describe what you did in the data prep phase, which tools you used, and how you went about creating the analytical outputs (e.g., did you use Pivot Tables, Tableau, or something else?). So, as you are doing the assignment, take some notes in a Word doc that you can then turn into this supporting technical document.

Getting Started and Data Prep

You might want to start by importing the CSV files into Excel and just browsing through the three tables and the descriptions above to become familiar with the data.

Then, think about creating any new computed columns that will facilitate your analysis. For example, there is a datetime field called starttime in the trip table. I’d recommend creating an additional field called something like tripdate and computing the date (with no time) based on the starttime field. We learned all about Excel dates and times, so this should be easy for you. Why might you want to do this? Well, if you look in the weather table you’ll see that it contains one row per date. So, if you want to create one big master table containing data from the trip and weather tables, you can use the date to look up values from the weather table as needed. Feels like a VLOOKUP, eh?
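
As a rough sketch of those two steps, here is one way the formulas might look. The cell references are assumptions for illustration only, not taken from the data files: starttime is assumed to sit in column B of the trip sheet and to have been imported as a true Excel date-time (not text), the new tripdate column is assumed to be column N, and the weather data is assumed to be on a sheet named weather with its date in column A. The first formula computes tripdate (format the cell as a Date afterward); the second pulls the value in the second column of the weather sheet for that date.

```
=INT(B2)
=VLOOKUP(N2, weather!$A:$U, 2, FALSE)
```

INT works here because Excel stores a date-time as a number whose whole part is the date and whose fractional part is the time. The FALSE argument forces an exact match on the date, so a trip date missing from the weather sheet returns #N/A rather than a silently wrong row. If starttime came in as text, INT will return #VALUE! and the field needs to be converted to a real date-time first.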

STRONG SUGGESTION: If you do a bunch of VLOOKUPs to create a master data table, when you are all done, go ahead and do a Copy – Paste Special – Values into a new workbook sheet. This will make subsequent analysis much quicker since Excel won’t have to recompute a bunch of formulas.

Of course, if you use Tableau or PowerPivot, you might be able to simply join the tables and avoid creating a big master table. Either or both approaches are fine. In fact, here’s a link to a short article by the Tableau folks about this very issue:

http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#multiple_connections.html

The primary goal for the data preparation step is to make it easy to do the types of analysis you want to do. You may end up creating a few different data tables for analysis. That’s up to you. Just document what you do.

Analysis Suggestions

When doing a comprehensive analysis for an upper-level manager, it’s almost always a good idea to start with high-level, important, overall statistics and visualizations and then dive into the details to explore further. For example, do NOT start by showing some detailed analysis of how windy Tuesdays impact ridership during the spring. Instead, you could start with overall volume trends and rider demographics.

Get Ideas and Inspiration

A few years ago, the state of Colorado held a data visualization contest focused on the state of Colorado’s public school system. All the winning entries used to be posted – however, now just the following one is still available. Again, this is just to provide some ideas and inspiration.

http://infogr.am/Colorado-school-equality

The Tableau Public site has some nice examples of specific visualizations done in Tableau.

Remember, the goal isn’t to be “fancy”, it’s to concisely convey to the new manager what is going on with the CS program in Seattle.

Deliverables

Obviously, the nature of your deliverables is affected by the tools you use. There are two main parts to your deliverables:

  • The analysis products themselves – i.e. what you’d deliver to your manager. For example, this might be a Tableau Story, or a PowerPoint presentation or Word document with the graphs/tables created in either Excel or Tableau. It could even just be a nicely structured Excel file with navigational aids, graphs (possibly interactive), and summary text in text boxes or something similar. Use your imagination and ingenuity.

    IMPORTANT: If you end up copying and pasting graphs/tables from Excel into PowerPoint or Word, I also want the Excel files containing the actual data and graphs/tables so that I can see how you created these things.

  • A technical background document in MS Word that describes how you did what you did. This doesn’t need to be super detailed but should make it clear to me how:
    • You did your data prep
    • You constructed the various visualizations
    • You constructed the overall story

IMPORTANT: You MUST put all of your deliverables into a folder, zip the folder and then upload the zip file. Give your zip file a good filename.
