Sunith Suresh • over 4 years ago
Missing values on graduation cohort data
I have a few questions on missing values in your dataset. I am especially interested in the missing values in "MAM_COHORT_1112", "MAS_COHORT_1112", "MBL_COHORT_1112", "MHI_COHORT_1112", "MTR_COHORT_1112", "MWH_COHORT_1112".
In the data set, features such as "MAM_COHORT_1011" (the "Number of Native American students in the graduation cohort") have over 60% NA values. Is that because the values here are actually 0?
If the values are not 0, I could impute them. However, with such a high share of missing values, imputation is difficult.
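One way to probe the "NA means 0" question is to check whether the column ever records an explicit 0. If it does, NA probably means something else (suppressed or not reported, for example). The following is only a minimal sketch: the rows are toy values, not taken from the real dataset, and the helpers `is_missing` and `missing_summary` are hypothetical names introduced for illustration.

```python
import math

def is_missing(value):
    """Treat None, float NaN, empty strings and the string 'NA' as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() in {"", "NA"}

def missing_summary(rows, column):
    """Return (share of missing values, count of explicit zeros) for one column."""
    values = [row.get(column) for row in rows]
    n_missing = sum(is_missing(v) for v in values)
    n_zero = sum((not is_missing(v)) and float(v) == 0 for v in values)
    share = n_missing / len(values) if values else 0.0
    return share, n_zero

# Toy rows (assumption): stand-ins for values read from the data file.
rows = [
    {"MAM_COHORT_1112": "NA"},
    {"MAM_COHORT_1112": "0"},
    {"MAM_COHORT_1112": "12"},
    {"MAM_COHORT_1112": ""},
    {"MAM_COHORT_1112": "NA"},
]

share, zeros = missing_summary(rows, "MAM_COHORT_1112")
print(f"missing share: {share:.0%}, explicit zeros: {zeros}")
```

If explicit zeros turn up alongside the NA values, treating NA as 0 would be hard to justify without confirmation from the data provider.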
I could estimate the missing values from the census data; however, I noticed some discrepancies between the two sources. For instance, observation 89 (ALABAMA - Mobile County) records MAM_COHORT_1011 = 45, but the census data for the same observation shows NH_AIAN_alone_CEN_2010 = 2. NH_AIAN_alone_CEN_2010 is defined as "Number of people who indicate no Hispanic origin and their only race as 'American Indian or Alaska Native' or report entries such as Navajo, Blackfeet, Inupiat, Yup'ik, or Central/South American Indian groups in the 2010 Census population."
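To see how widespread this kind of discrepancy is, one could flag every observation where the cohort count exceeds the census count. This is a rough sketch only: the first row uses the two values quoted above, the other rows are made up, and the `obs` key and the function `flag_discrepancies` are hypothetical names.

```python
def flag_discrepancies(rows, cohort_col, census_col):
    """Return the ids of rows where the cohort count exceeds the census count."""
    flagged = []
    for row in rows:
        cohort = row.get(cohort_col)
        census = row.get(census_col)
        if cohort is None or census is None:
            continue  # cannot compare when either side is missing
        if cohort > census:
            flagged.append(row["obs"])
    return flagged

rows = [
    # Values quoted in this post for observation 89.
    {"obs": 89, "MAM_COHORT_1011": 45, "NH_AIAN_alone_CEN_2010": 2},
    # Made-up rows for illustration only.
    {"obs": 1, "MAM_COHORT_1011": 3, "NH_AIAN_alone_CEN_2010": 40},
    {"obs": 2, "MAM_COHORT_1011": None, "NH_AIAN_alone_CEN_2010": 15},
]

print(flag_discrepancies(rows, "MAM_COHORT_1011", "NH_AIAN_alone_CEN_2010"))
```

A long list of flagged observations would suggest the two variables are not directly comparable, which would make the census counts a poor basis for imputation.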
Thank you for your help,