Honors Thesis

Generating a Dataset for Comparing Linear vs. Non-Linear Prediction Methods in Education Research

Date of Completion

5-4-2022

Degree Type

Honors Thesis

Discipline

Mathematics (MATH)

First Advisor

Dr. Anna Bargagliotti

Abstract

Machine learning is often used to build predictive models by extracting patterns from large data sets. Such techniques are increasingly used to predict outcomes in the social sciences. One such application is education, where machine learning can be used to predict student acceptance and success in academia. Using these tools for education-related data analysis may enable the evaluation of programs, resources, and curricula. Currently, research is needed to examine application, admissions, and retention data in order to address equity in college computer science programs. However, most student-level data sets contain sensitive data that cannot be made public. To help facilitate research and the application of machine learning models in this field, we generate an artificial student-level data set of 50,000 students that simulates college admissions data. Because the data are artificial, the data set can be made publicly accessible without privacy concerns. We then analyze the generated data using logistic regression, k-nearest neighbors, random forest, neural network, and XGBoost models to demonstrate and compare the types of analyses that can be conducted on data sets of this kind. Finally, we analyze whether the predictive gains of machine learning models outweigh the potential loss of interpretability relative to classical statistical methods.

Recommended Citation

Mauro, Jack; Martinez, Elena; and Bargagliotti, Anna, "Generating a Dataset for Comparing Linear vs. Non-Linear Prediction Methods in Education Research" (2022). Honors Thesis. 446.
https://digitalcommons.lmu.edu/honors-thesis/446

Included in

Applied Statistics Commons, Data Science Commons, Other Applied Mathematics Commons
