📜 MS-DS Projects
Masters in Data Science
University of Colorado Boulder Spring 2023 - Current
Here is a repository of all projects completed for my MS-Data Science! Projects are grouped by class, and the main programming language used for each class is listed.
DTSA 5301 - Introduction to Data Science (🦏 R)
NYPD Shooting Data Analysis
Short project analyzing NYPD open access shooting data.
COVID-19 Data Analysis
Project analyzing Johns Hopkins University global COVID data. The first part of the analysis, labeled “DSTA In Class Analysis”, was part of the lectures for the class. The “Additional Analysis” section is my own analysis of Colorado county-level COVID data. Interestingly, El Paso County seems to be the worst-faring county in Colorado regarding COVID.
DTSA 5304 - Data Visualization (🐍 Python)
Master’s Degrees Awarded in Colorado
Project visualizing Master’s degrees awarded in the state of Colorado from 2001 to 2017. The visualization was built in Altair and is interactive, both through the degree program filter at the bottom and through each individual plot legend. This page includes an interactive visualization!
DTSA 5800 - Network Analysis (🐍 Python)
Building Network Graphs for Twitter Data
Using tweets (from a JSON file), mentions, and users to create various user networks and semantic networks. Final project for the Network Analysis class.
DTSA 5506 - Data Mining Project (🐍 Python)
Ask A Manager Salary Survey Data Pipeline Project
Data mining pipeline project that goes through the cleaning and tidying of data, visualization and analysis, clustering, and prediction modeling. Major packages used include scapy, scikit-learn, and Plotly.
DTSA 5509 - Supervised Learning (🐍 Python)
Predicting Natural Gas Spot Prices
Using Supervised Learning to predict Natural Gas Spot Prices (Henry Hub) 5 weeks in advance. Uses scikit-learn and matplotlib to model and visualize results. Final project for the Supervised Learning class.
DTSA 5510 - Unsupervised Learning (🐍 Python)
Non-negative Matrix Factorization to Cluster BBC News Article Topics (Kaggle)
Using Non-negative Matrix Factorization techniques to cluster articles by topic. Utilizes spaCy and TF-IDF vectorization to transform the article text to a matrix. Also compares NMF techniques to supervised classification techniques. Part of this Kaggle competition.
Clustering Mushrooms Based on Edibility
Using Unsupervised Learning to cluster mushrooms by their features in an attempt to classify them as poisonous or edible to humans. Uses scikit-learn and matplotlib to model and visualize results. Final project for the Unsupervised Learning class.
DTSA 5511 - Deep Learning (🐍 Python)
Cancer Detection (Kaggle)
Detecting cancer from images of cells using a Convolutional Neural Network. Part of this Kaggle competition.
Classifying Disaster Tweets (Kaggle)
Classifying whether tweets are about a disaster event or not using a Sequential Neural Network. Part of this Kaggle competition.
Monet Art Generation (Kaggle)
Building a CycleGAN to train a model that outputs Monet-like paintings from real images. Part of this Kaggle competition.
AI Generated Essay Classification
Using a Deep Learning Network to classify essays (multi-paragraph text of around 250+ words) as AI-generated or not AI-generated. Uses TensorFlow and Keras, as well as scikit-learn, to model and visualize results. Final project for the Deep Learning class.