Final Project: Proposal

Projects should look much like any of our modules, except that you identify the dataset and the questions rather than working from a template. You may choose to replicate a figure from an existing study, but you are also encouraged to use this opportunity to perform exploratory analysis on data you may be using in your DS421 team projects or your own dissertation research. Remember that, as with the modules, final projects will be graded on workflow and presentation rather than on particular conclusions. Statistical analysis is not necessary, but every project should demonstrate a suite of skills selected from the list in the final project rubric.


Please prepare a short proposal on your final project idea. The proposal should include:

  • Title & description of the project
  • Your name & partner’s name
  • A description of the data required, and how it will be obtained (e.g. URL/DOI to data source)
  • Three questions or analysis tasks you will address with the data, in the spirit of the assignments we have been doing

You may choose to work with your partner or independently on the final project. Please indicate which clearly in your proposal.

Please create your proposal in a markdown file called in the root directory of the final project repo.

Preliminary Rubric (additional areas will be added)

Project questions must illustrate all of the following tasks:

  • Some form of data access / reading into R
  • Data tidying / preparation
  • Initial data visualization
  • Use of GitHub
  • Reproducible execution with use of Travis
  • RMarkdown writeup, with final submission as a nicely formatted PDF document that includes code and results.
  • Overall clean and clear presentation of repository, code, and explanations.

and at least three of the following skills (this list may be modified/extended):

  • Use of at least 5 dplyr verbs / functions
  • Writing / working with custom R functions
  • Creating an R package for functions used in the analysis
  • Interaction with an API
  • Use of regular expressions
  • Use of an external relational database
  • Preparing processed data for archiving / publication
  • Parsing extensible data formats (JSON, XML)
  • Use of spatial vector data (sf package) and visualization of spatial data
  • Use of purrr package functions for iteration
  • Manipulation of dates or strings
  • Unique challenges you encounter, such as particularly messy data or non-standard formats, or a special emphasis on visualization or presentation of results
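
To make the rubric more concrete, here is a minimal sketch of how a single analysis question might touch several of the items above: reading data into R, tidying it with more than five dplyr verbs, a small custom function, a regular expression, date handling, and an initial plot. The dataset is invented and inlined purely for illustration, and every name in it (`raw_csv`, `c_to_f`, `site_group`, the column names) is hypothetical. A real project would read from the data source named in the proposal, and this layout is only one possible approach, not a required template.

```r
library(dplyr)
library(ggplot2)

# Hypothetical raw data, inlined so the sketch runs without a download.
# In a real project this would come from the URL/DOI given in the proposal.
raw_csv <- "site,date,temp_C,temp_note
A-01,2017-03-01,12.5,ok
A-01,2017-03-02,13.1,ok
B-02,2017-03-01,9.8,ok
B-02,2017-03-02,NA,sensor error
"

# Data access: read the CSV text into a data frame
raw <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Custom function (hypothetical helper): convert Celsius to Fahrenheit
c_to_f <- function(temp_c) {
  temp_c * 9 / 5 + 32
}

# Tidying: drop missing readings, parse dates, derive new columns
tidy <- raw %>%
  filter(!is.na(temp_C)) %>%
  mutate(
    date = as.Date(date),                      # date manipulation
    site_group = sub("-[0-9]+$", "", site),    # regex: strip the "-NN" suffix
    temp_F = c_to_f(temp_C)
  ) %>%
  select(site_group, date, temp_C, temp_F) %>%
  arrange(date)

# Summary per site group
summary_tbl <- tidy %>%
  group_by(site_group) %>%
  summarise(mean_temp_C = mean(temp_C), n_obs = n())

# Initial visualization: temperature over time, coloured by site group
p <- ggplot(tidy, aes(x = date, y = temp_C, colour = site_group)) +
  geom_point() +
  labs(x = "Date", y = "Temperature (C)", colour = "Site group")
```

In an RMarkdown writeup, a chunk like this would sit alongside a short explanation of the question being asked, with `summary_tbl` and `p` printed so that both the code and the results appear in the final PDF.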