Date of Award


Access Restriction


Degree Name

Master of Science


Computer Science

School or College

Seaver College of Science and Engineering

First Advisor

Mandy Korpusik

Second Advisor

Lei Huang

Third Advisor

Junyuan Lin


Self-assessment of food intake is important for preventing and treating obesity, but current self-assessment methods are inaccurate and hard to use. In this thesis, we explore ways to improve machine learning (ML) food classification, which is the core technical problem in food intake self-assessment. We present a food detection system built on a state-of-the-art multi-modal architecture, the Vision-and-Language Transformer (ViLT). This architecture combines food appearance (the image modality) with food description (the text modality) to improve classification accuracy. To further enhance performance, we incorporate other improvements, such as curating a dataset of branded food items. We also apply transfer learning, an ML method that reuses a model pre-trained on a related high-resource task as the starting point for a low-resource task such as ours. This approach reduces cost and time compared with building a model from scratch. In addition, we compare our approach with Visual ChatGPT, a combination of a vision foundation model and a large language model, and find that our approach to food intake assessment is both accurate and cost-effective.
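
The transfer-learning idea described above can be illustrated with a toy sketch. This is not the thesis implementation: ViLT processes image patches and text tokens jointly in a single transformer, whereas the sketch below swaps in a much simpler scheme, concatenating the outputs of two frozen encoders and training only a small softmax head on top. The encoder weights, word vectors, sample data, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical frozen "pre-trained" encoders: these weights are never updated.
IMAGE_WEIGHTS = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
WORD_VECTORS = {"apple": [1.0, 0.0], "red": [0.5, 0.0],
                "banana": [0.0, 1.0], "yellow": [0.0, 0.5]}

def encode_image(pixels):
    """Frozen image encoder: fixed linear projection followed by tanh."""
    return [math.tanh(sum(w * p for w, p in zip(row, pixels)))
            for row in IMAGE_WEIGHTS]

def encode_text(description):
    """Frozen text encoder: mean of the known word vectors."""
    vecs = [WORD_VECTORS[w] for w in description.lower().split()
            if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def fuse(pixels, description):
    """Concatenate both modalities (a simplification; ViLT fuses inside one transformer)."""
    return encode_image(pixels) + encode_text(description)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_head(features, labels, n_classes, lr=0.1, epochs=200):
    """Train only the new classification head (no bias term) with plain SGD."""
    dim = len(features[0])
    head = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in head])
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                for j in range(dim):
                    head[c][j] -= lr * grad * x[j]
    return head

def predict(head, x):
    scores = [sum(w * v for w, v in zip(row, x)) for row in head]
    return scores.index(max(scores))

# Tiny made-up "low-resource" dataset: (image vector, description, class id).
samples = [
    ([0.9, 0.1, 0.1], "red apple", 0),
    ([0.1, 0.9, 0.1], "yellow banana", 1),
    ([0.8, 0.2, 0.1], "apple", 0),
    ([0.2, 0.8, 0.1], "banana", 1),
]
features = [fuse(img, txt) for img, txt, _ in samples]
labels = [y for _, _, y in samples]
head = train_head(features, labels, n_classes=2)

query = fuse([0.85, 0.15, 0.1], "red apple")
print("predicted class:", predict(head, query))
```

Only the head's weights change during training; the encoders stay fixed, which is what keeps the cost low relative to training every parameter from scratch. In a real system the frozen encoders would be replaced by a pre-trained model, and some or all of its layers might be fine-tuned as well.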