Date of Award

5-2-2023

Access Restriction

Thesis

Degree Name

Master of Science

Department

Computer Science

School or College

Seaver College of Science and Engineering

First Advisor

Mandy Korpusik

Second Advisor

Lei Huang

Third Advisor

Junyuan Lin

Abstract

Self-assessment of food intake is important for preventing and treating obesity. Current self-assessment methods are inaccurate and hard to use. In this thesis, we explore ways to improve machine learning (ML) methods for food classification, the core technical problem of food intake self-assessment. We present a food detection system built on a state-of-the-art multi-modal architecture, the Vision-and-Language Transformer (ViLT). This architecture combines food appearance, via the image modality, with food description, via the textual modality, to improve classification accuracy. To further enhance performance, we incorporate other improvements such as curating a dataset of branded food items. We apply transfer learning, an ML method that reuses a model pre-trained on a related high-resource task as the starting point for a low-resource task such as ours. This approach reduces the cost and time required compared to building a model from scratch. In addition, we compare our approach with Visual ChatGPT, a combination of a vision foundation model and a large language model, and find that our approach to food intake assessment is both accurate and cost-effective.
