DTSA 5304 - Master's Degrees
Post Secondary Degrees Awarded in Colorado
The following data visualizations use the Degrees Awarded to Post-Secondary Graduates in Colorado data set, available on the Colorado Information Marketplace (https://data.colorado.gov/Higher-Education/Degrees-Awarded-to-Post-Secondary-Graduates-in-Col/hxf8-ab6k). The data covers all degrees awarded by the Colorado Board of Higher Education from 2001 through 2017, including certificates, associates, undergraduate, and graduate degrees. It is broken down further by institution, gender, ethnicity, age, and program.
Some key questions we can seek to answer with this data include:
- What are the most popular programs?
- What are the demographics of people completing certificates, associates, undergraduate, and graduate degrees?
- Are there noticeable shifts in enrollment in certain programs over time?
- Are there noticeable demographic shifts in enrollment over time?
As the dataset is fairly large, I am choosing to focus only on Master's Degrees awarded. This will help further refine our goals and tasks:
- Task: View metrics on Master's degrees awarded in Colorado to discern relationships among degrees earned, gender breakdown, institution breakdown, and more
- Means: This task will be
- Characteristics: This task seeks to learn how Master's degrees earned in Colorado change over time in conjunction with the additional metrics provided in the data set
- Target data: The data will need to be pre-aggregated by metric, as the dataset is quite large
- Workflow: A user will be able to select a field of study to update multiple graphs that show Master's degrees awarded over time
- Roles: The potential users are people interested in how Master's degrees awarded in Colorado have changed over time, which could include those who work in higher education, prospective Master's degree students, and more
# Libraries
import pandas as pd
import numpy as np
from dateutil import parser
from datetime import timezone,datetime,timedelta
from dateutil.relativedelta import relativedelta
import math
import altair as alt
from sodapy import Socrata
# Loading Data
# Sending request (can take a bit of time)
client = Socrata("data.colorado.gov", None)
results = client.get_all("hxf8-ab6k")
# Convert to pandas DataFrame (can take a bit of time)
deg = pd.DataFrame.from_records(results)
deg = deg.astype({'programname':'str', 'institutionname':'str', 'recordcount':'float', 'cip':'str', 'cip2':'str'})
# Cleaning Data
deg = deg[deg['institutionname'].notna()]
deg = deg[deg['recordcount'].notna()]
# Filtering to specific data frames
masters_deg = deg[(deg['degreelevel'] == 'Masters')]
# Copy the filtered frame so the column assignments below act on it directly
masters_deg = masters_deg[masters_deg['programname'].notna() & (masters_deg['programname'] != 'nan')].copy()
# Convert the year only after masters_deg has been created
masters_deg['year'] = masters_deg['year'].apply(lambda x: pd.to_datetime(x, format='%Y').date())
masters_deg.loc[masters_deg['ethnicity'] == 'Unknown', 'ethnicity'] = 'Unknown Ethnicity'
# Select recordcount explicitly; the argument to sum() is not a column name
all_deg = masters_deg.groupby(['year','programname'], as_index=False)['recordcount'].sum()
deg_program = masters_deg.groupby(['year','programname','institutionname'], as_index=False)['recordcount'].sum()
age_desc = masters_deg.groupby(['year','programname','agedesc'], as_index=False)['recordcount'].sum()
gender = masters_deg.groupby(['year','programname','gender'], as_index=False)['recordcount'].sum()
ethnicity = masters_deg.groupby(['year','programname','ethnicity'], as_index=False)['recordcount'].sum()
residency = masters_deg.groupby(['year','programname','residency'], as_index=False)['recordcount'].sum()
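The pre-aggregation step can be sanity-checked on a small frame before running it against the full download. The sketch below is only an illustration: the `toy` rows are invented and merely mimic the column names used above (`year`, `programname`, `gender`, `recordcount`).

```python
import pandas as pd

# Invented rows standing in for masters_deg; not real data.
toy = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017],
    "programname": ["Nursing", "Nursing", "History", "Nursing"],
    "gender": ["Female", "Male", "Female", "Female"],
    "recordcount": [3.0, 2.0, 1.0, 4.0],
})

# Selecting the column before summing keeps only the group keys and the total.
toy_all = toy.groupby(["year", "programname"], as_index=False)["recordcount"].sum()
toy_gender = toy.groupby(["year", "programname", "gender"], as_index=False)["recordcount"].sum()

print(toy_all)
print(toy_gender)
```

Each aggregated frame keeps one row per combination of its grouping keys, which is what lets every chart filter on `programname` without carrying the full record-level data.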
# Program Name Dropdown
program_dropdown = alt.binding_select(options = np.unique(masters_deg['programname']), name='Degree Program: ')
selection = alt.selection_point(fields = ['programname'], bind=program_dropdown)
# Individual Legend Selection
deg_selection = alt.selection_point(fields = ['institutionname'], bind='legend')
age_selection = alt.selection_point(fields = ['agedesc'], bind='legend')
# gender_selection = alt.selection_point(fields = ['gender'], bind='legend')
ethnicity_selection = alt.selection_point(fields = ['ethnicity'], bind='legend')
# residency_selection = alt.selection_point(fields = ['residency'], bind='legend')
all_deg_chart = alt.Chart(all_deg).mark_bar(width = 15).encode(
    alt.X('year:T', axis=alt.Axis(format='%Y')).title('Year'),
    alt.Y('sum(recordcount):Q').title('Degrees Awarded'),
    alt.Color('sum(recordcount):Q').title('Degrees Awarded').scale(scheme='viridis'),
    tooltip = [alt.Tooltip('sum(recordcount):Q',title='Degrees')]
).add_params(selection).transform_filter(selection).properties(title='Degrees Over Time')
deg_chart = alt.Chart(deg_program).mark_bar(width = 15).encode(
    alt.X('year:T', axis=alt.Axis(format='%Y')).title('Year'),
    alt.Y('sum(recordcount):Q').title('Degrees Awarded'),
    alt.Color('institutionname:N').title('Institution'),
    tooltip = [alt.Tooltip('institutionname:N',title='Institution'), alt.Tooltip('sum(recordcount):Q',title='Degrees')]
).add_params(selection,deg_selection).transform_filter(selection).transform_filter(deg_selection).properties(title='Institution')
age_chart = alt.Chart(age_desc).mark_bar(width = 15).encode(
    alt.X('year:T', axis=alt.Axis(format='%Y')).title('Year'),
    alt.Y('sum(recordcount):Q').title('Degrees Awarded'),
    alt.Color('agedesc:N').title('Age Range').scale(scheme='yellowgreenblue'),
    tooltip = [alt.Tooltip('agedesc:N',title='Age'), alt.Tooltip('sum(recordcount):Q',title='Degrees')]
).add_params(selection,age_selection).transform_filter(selection).transform_filter(age_selection).properties(title='Age')
gender_chart = alt.Chart(gender).mark_bar(width = 15).encode(
    alt.X('year:T', axis=alt.Axis(format='%Y')).title('Year'),
    alt.Y('sum(recordcount):Q').title('Percent Degrees Awarded').stack("normalize"),
    alt.Color('gender:N').title('Gender').scale(scheme='pastel1'),
    tooltip = [alt.Tooltip('gender:N',title='Gender'), alt.Tooltip('sum(recordcount):Q',title='Degrees')]
).add_params(selection).transform_filter(selection).properties(title='Gender')
ethnicity_chart = alt.Chart(ethnicity).mark_bar(width = 15).encode(
    alt.X('year:T', axis=alt.Axis(format='%Y')).title('Year'),
    alt.Y('sum(recordcount):Q').title('Degrees Awarded'),
    alt.Color('ethnicity:N').title('Ethnicity').scale(scheme='set2'),
    tooltip = [alt.Tooltip('ethnicity:N',title='Ethnicity'), alt.Tooltip('sum(recordcount):Q',title='Degrees')]
).add_params(selection,ethnicity_selection).transform_filter(selection).transform_filter(ethnicity_selection).properties(title='Ethnicity')
residency_chart = alt.Chart(residency).mark_bar(width = 15).encode(
    alt.X('year:T', axis=alt.Axis(format='%Y')).title('Year'),
    alt.Y('sum(recordcount):Q').title('Percent Degrees Awarded').stack("normalize"),
    alt.Color('residency:N').title('Residency').scale(scheme='purples'),
    tooltip = [alt.Tooltip('residency:N',title='Residency'), alt.Tooltip('sum(recordcount):Q',title='Degrees')]
).add_params(selection).transform_filter(selection).properties(title='Residency')
# deg_chart | age_chart | gender_chart | ethnicity_chart | residency_chart
full_chart = alt.vconcat(all_deg_chart, deg_chart, age_chart, gender_chart, ethnicity_chart, residency_chart) \
.resolve_scale(color='independent').resolve_legend(color='independent').properties(title='Masters Degrees Awarded Over Time')
full_chart
The interactive visual is below. Scroll down to the bottom of the visualization to change the Degree Program in a dropdown menu. Select different legend values on certain charts to further filter by legend value.
# extract chart
full_chart.save('DataVisFinal_ChartOnly.html')
The key elements of the final product are the following:
- Charts:
  - Degrees Awarded Over Time: This is the only visual that directly tracks the discrete number of Master's Degrees awarded over time.
  - Institution Breakdown Over Time: Institution is one of the most important facets when describing a Master's Degree.
  - Age Range Breakdown Over Time: Age Range is interesting to look at over time and across different degree programs. While the ranges are large, the breakdown can still provide valuable insight.
  - Gender Breakdown Over Time: Gender is also interesting to look at over time and across different degree programs, especially when looking at the gender breakdown of the cohort over time.
  - Ethnicity Breakdown Over Time: Ethnicity is also interesting to look at over time and across different degree programs.
  - Residency Breakdown Over Time: Residency is less important, but can still be very interesting when looking across different degree programs.
- Interactions:
  - Degree Program Filter: Degree program must be in a filter, as there are too many degrees to visualize on one chart.
  - Individual Legend Filters: For almost all charts, the legend can also work as a filter on the specific legend values. This was important to implement, as some legend values make up a very small subset of the overall data displayed, and thus could be difficult to view.
I specifically made the Gender Breakdown and Residency Breakdown graphs normalized bar charts. For these two charts, the most important relationship is the overall percent breakdown of the cohort, which a normalized bar chart displays more clearly than raw counts.
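To make the normalization concrete: a normalized stack divides each segment by the total of its bar, so every bar spans 0 to 1. The sketch below reproduces that arithmetic in pandas on invented numbers; the `share` column is a hypothetical name used only for this illustration and does not appear in the charts above.

```python
import pandas as pd

# Invented counts for two years; not taken from the Colorado data.
toy = pd.DataFrame({
    "year": [2016, 2016, 2017, 2017],
    "gender": ["Female", "Male", "Female", "Male"],
    "recordcount": [6.0, 4.0, 9.0, 3.0],
})

# Total per year, aligned row by row with the original frame.
year_total = toy.groupby("year")["recordcount"].transform("sum")

# Each segment's height in a normalized bar is its share of the year's total.
toy["share"] = toy["recordcount"] / year_total
print(toy)
```

In this toy frame the totals differ between the two years (10 and 12), yet both bars would have the same height once normalized, which is why the percent breakdown is easier to compare across years.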
In a perfect world, my preliminary evaluation would be a survey that is directly linked to the dashboard/visualization. A survey evaluation would also allow feedback from the largest and likely widest subset of users. I would like to design the survey to collect qualitative data on the overall design, visualizations, and ease of use. In the survey, there would be a section of general questions, like "how easy was this dashboard to navigate" and "how easy was each chart to read." Another section would focus more on individual visual marks and representation, like "did you find the colors distinct enough" and "were the axes easy to read and interpret." The last section would ask specific questions on overall insights a user would be able to gain from the graph. Asking these questions would help determine if valuable and correct insights were being gained from looking at the data. Lastly, there would be a free text option to provide any additional questions or comments. The dashboard would be successful if most responses were positive on the dashboard's features and visual representation, and if the user correctly identified overall trends within the data.
However, as I am likely not able to collect that type of feedback on my personal project website, I have instead chosen to ask my friends and family for direct feedback on this iteration of the visualization.
In future versions, I would like to further refine the filtering and display of the data. The individual Degree Program filter is fine, but I would like to have the overall list searchable and allow for multi-selection to look at multiple degree programs in aggregate. I would also like to find out why there doesn't seem to be data for 2011, which makes the overall visualization look unreliable as it is missing data.
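A first step toward investigating the 2011 gap could be to list which years are absent at each stage of the pipeline. The helper below is a minimal sketch: `find_missing_years` is a hypothetical function, and the sample years are invented. It only detects gaps and does not explain them.

```python
def find_missing_years(years):
    """Return the years absent between the first and last year observed."""
    observed = {int(y) for y in years}
    return [y for y in range(min(observed), max(observed) + 1) if y not in observed]


# Invented sample; in the notebook the input could be the distinct years
# of the raw or filtered frame (an assumption about how it would be used).
sample_years = [2009, 2010, 2012, 2013]
print(find_missing_years(sample_years))
```

Running the same check on the raw download and again on the filtered Master's frame would help show whether the year is missing from the source or is dropped by the cleaning steps.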