Problem Statement

In 2001, the U.S. national on-time high school graduation rate was only 71.7%. As a result of progress over the last decade, that figure is now 81.4%! To put that in perspective, 1.8 million more students graduated from high school in 2013 than in 2004. This is good news, but to reach the GradNation campaign goal of 90% on-time completion by 2020, the class of 2020 will need over 300,000 more graduates than the class of 2013.

Recent progress is due not just to changing demographic and economic trends, but also to the efforts of leaders in schools, districts, communities, and states who are working hard to drive change. New analysis of district-level data shows that some school districts are making tremendous progress, while others are lagging or even regressing. We need to find out more about the root causes so that we can take action.

We already understand some of the reasons students don’t make it through high school. Income, ethnicity, family dynamics, and pregnancy (just to name a few) all play a significant part. But what are the other factors affecting graduation rates that we don’t know about? Does bullying play a part? Crime? What about local gas prices & transportation, or the layout of a city? We need your help to find the answers.

Source: 2015 Building a GradNation Report


  1. Analyze - Using one or more of the provided public datasets, incorporate data science, data visualization, and ideation into a compelling argument that will help increase high school graduation rates.
  2. Create - Highlight your insights in a static or interactive data visualization, OR a data-centric, functional software application.

You may use other datasets in addition to one or more of the required datasets, as long as you have the right (or license) to do so.


Required Datasets

There are two distinct approaches for tackling this challenge:

  1. The first is to use our pre-built dataset, which includes graduation data joined with the maximum overlapping Census data. [Download the pre-built dataset.]
  2. The second is to take a more in-depth look at our graduation problem. We’ve provided the individual datasets that we used to join the graduation data and Census data together. In addition, we’ve provided the mapping logic that includes information about every Census tract that overlaps each school district. We highly encourage you to take all of these into account. [Download the individual datasets.]
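If you take the second approach, one way to use the tract-to-district mapping is to roll tract-level Census values up to the district level. The sketch below is only an illustration: it assumes the mapping can be read as (district ID, tract ID, overlap weight) triples and that an overlap-weighted average is a sensible aggregate. The function name, the IDs, and the weights are all hypothetical and are not taken from the provided files.

```python
from collections import defaultdict


def aggregate_tracts_to_districts(mapping, tract_values):
    """Overlap-weighted average of a tract-level value for each district.

    mapping      -- iterable of (district_id, tract_id, overlap_weight) triples
                    (hypothetical layout; check the real mapping file)
    tract_values -- dict of tract_id -> numeric Census value
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for district_id, tract_id, weight in mapping:
        value = tract_values.get(tract_id)
        # Skip tracts with no value or a non-positive overlap weight.
        if value is None or weight <= 0:
            continue
        weighted_sum[district_id] += weight * value
        weight_total[district_id] += weight
    return {d: weighted_sum[d] / weight_total[d] for d in weighted_sum}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    toy_mapping = [("D1", "T1", 0.5), ("D1", "T2", 0.5), ("D2", "T3", 1.0)]
    toy_income = {"T1": 40000.0, "T2": 60000.0, "T3": 30000.0}
    print(aggregate_tracts_to_districts(toy_mapping, toy_income))
```

Counts (such as total households) would usually be summed rather than averaged, so the right aggregate depends on the variable you are rolling up.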

Not sure how to attack these datasets? Have no fear, the Data for Diplomas data wrangling team has put together some sample benchmark analyses to get you started.

Thank you to the Everyone Graduates Center at Johns Hopkins for their partnership with the data wrangling for the Data for Diplomas competition.


Additional Data

Want even more data? You may want to check out the data below. These datasets won’t fulfill the required-dataset condition (you need to use one of the datasets listed above), but you’re welcome to use them in addition, either in your submission or for your own learning!

Civic data sources:

Existing data applications: 

  • DataLook is a repository of reusable data-for-social-good projects.


Explaining the Judging Criteria

Your submission will be judged based on how well it meets the criteria below.

Predictive Capability (20%)
Includes the extent to which the submission can account for variation in graduation rates.

Explanation: In our first attempt at accounting for variation in high school graduation rates (Benchmark I), we were able to improve our predictive power. After adding a few parameters to our model (state name and PLI, the percentage of low-income students), our R-squared value was nearly 25%. Because there are too many states to include in the Benchmark I output sample, we have trimmed our results down to show only the last few states for conceptualization. Note that some states have a positive regression coefficient and some have a negative one. This supports the hypothesis that high school graduation rates vary by state. *PLI = ECD_COHORT_1112 / ALL_COHORT_1112
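To make the PLI and R-squared ideas concrete, here is a minimal sketch in plain Python. It computes PLI from the two cohort columns named above and fits a one-variable least-squares line of graduation rate on PLI. This is a simplification and not the Benchmark I model itself: it leaves out the state indicators, and the graduation-rate values and cohort counts in the example are made up for illustration.

```python
def pli(ecd_cohort, all_cohort):
    """Percentage low income: ECD_COHORT_1112 / ALL_COHORT_1112."""
    if all_cohort <= 0:
        raise ValueError("ALL_COHORT_1112 must be positive")
    return ecd_cohort / all_cohort


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up districts: (ECD cohort, ALL cohort, graduation rate in percent).
    toy = [(40, 200, 88.0), (90, 300, 84.0), (120, 240, 75.0), (150, 250, 71.0)]
    x = [pli(ecd, total) for ecd, total, _ in toy]
    y = [rate for _, _, rate in toy]
    slope, intercept, r2 = fit_line(x, y)
    print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

R-squared here is the share of the variation in graduation rates that the fitted line accounts for, which is the quantity this criterion asks you to improve.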

In our second attempt to explain variation in graduation rates (Benchmark II), our efforts were much more fruitful, with a new R-squared value of 38.12%. We engineered several variables from our graduation dataset to try to extract more information, starting with the demographic characteristics of the school districts. For example, the DISABILITY_PERC variable in the Benchmark II output was derived from our original dataset. It was calculated by taking the total number of disabled students within a school district and dividing by the total number of students in that district. This gave us the fraction of disabled students per district. Not only was the parameter statistically significant, but it also had a negative effect on our graduation rate.
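A derived variable like DISABILITY_PERC can be sketched in a few lines. The version below assumes each district is a dictionary with a count of disabled students and a count of all students. The key names disabled_students and total_students are hypothetical placeholders, so substitute the matching columns from the graduation dataset.

```python
def add_disability_perc(rows, disabled_key="disabled_students",
                        total_key="total_students"):
    """Return copies of rows with a DISABILITY_PERC field added.

    DISABILITY_PERC = disabled students / total students in the district.
    Districts with a missing or zero total get None instead of a fraction.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        total = row.get(total_key)
        disabled = row.get(disabled_key)
        if not total or disabled is None:
            new_row["DISABILITY_PERC"] = None
        else:
            new_row["DISABILITY_PERC"] = disabled / total
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Made-up districts for illustration only.
    toy = [
        {"district": "A", "disabled_students": 25, "total_students": 200},
        {"district": "B", "disabled_students": 10, "total_students": 0},
    ]
    for row in add_disability_perc(toy):
        print(row["district"], row["DISABILITY_PERC"])
```

Returning None for districts without a usable total keeps suppressed or missing counts from turning into misleading zeros in a later model.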

By incorporating more information, we hope that your team can help explain the causes of variation within districts. The graduation dataset has valuable information, but we believe there is large, uncharted territory to explore in the Census data alongside it.


Actionable Insights Gathered (40%)
Includes the extent to which the data-driven insights can be actionable and work towards our common goal of increasing high school graduation rates.

Explanation: Not everything you find useful for predicting high school dropout rates will also be useful in helping to reduce them. Our judges are looking for data-driven insights that can be actionable and work towards our common goal of social good. It’s in your best interest to provide, as much as possible, proof of your findings from the data as well as support for why your insights can make a difference. Yes, things like household income might influence graduation rates, but is there anything we can really do about them?


Data Visualization / Presentation (40%)
Includes the creative appeal and potential usefulness of the data visualization(s) or the data-centric software application.

Explanation: Get creative! The opportunities to exhibit your imagination and UX abilities in this section are boundless. Just don’t forget to give us access to experience your hard work!


Additional Ideas

Here are a few ideas to get your brain going.

  • What potentially actionable insights were you able to find in the housing unit, household, or other percent variables from the Census data? What insight does the school district financial data provide?
  • Were you able to find and merge any other datasets (e.g. bullying, crime, and safety factor data), and do they add any predictive power for dropout rates beyond the required datasets?
  • How do bullying, crime, and safety factor into dropout rates? Create a data visualization using these factors versus other known dropout reasons like low income, attendance, and geography.
  • Determine how after-school programs play an important part in successful graduation rates using data visualizations or an app.
  • Compare dropout rates with a community's climate patterns and create a data visualization about your findings.
  • Create apps that let parents and community members see which schools have the highest and lowest graduation rates in their area.
  • Consider how urban planning (or the setup of a city) might impact graduation rates, and make a data visualization about your findings.


About AT&T Aspire

AT&T Aspire is AT&T’s signature education initiative that drives innovation in education by bringing diverse resources to bear on the issues of high school success and career readiness including funding, technology, employee volunteerism, and mentoring. Aspire invests in innovative education organizations, tools, and solutions to prepare students for success tomorrow.


About GradNation

The GradNation campaign, launched in 2010, included the creation of a Civic Marshall Plan, to bring together policymakers, educators, business leaders, community allies, parents and students around the goal of raising the national graduation rate to 90% by the Class of 2020. The campaign addresses the dropout epidemic by raising awareness and inspiring action through the annual Building a Grad Nation report, a national summit, and community networks and summits.


Additional Resources

Use of these resources is not required, but you may want to review them to round out your understanding of the problem.


  • AT&T Aspire
  • GradNation 
  • Civic Enterprises 
  • Everyone Graduates Center at Johns Hopkins
  • New Tech Network is a national non-profit school development organization that partners with public school districts and charter schools to implement an innovative model so that all students graduate college and career ready.
  • BlueLabs is an analytics and technology company dedicated to innovation for social good. Our projects range from helping create tools to explore healthcare data in real time, to helping provide cell phones to homeless LGBTQ youth. 




More Questions? 

For questions about the Data for Diplomas competition, email us or post them to the Discussion Board.