ScienceTutor - A VLM-Based Educational Application For Children

Highlight

Category: DS/ML
Year: 2023
Keywords: Large Vision-Language Model (LVLM), Application Development, Cloud Computing, Python

Description

Visual Question Answering (VQA) generates natural-language answers to questions about images. In science education, VQA systems could help meet the growing demand for personalized learning and mitigate teacher shortages in the U.S., where vacancies rose from 15 to 26 per 10,000 students by 2023. ScienceTutor is an interactive educational application tailored to children that provides instant, expert-level responses to science questions across natural science, social science, and language science. The application uses the ScienceQA dataset and the state-of-the-art LLaVA model to perform VQA in the science domain. The application includes the following components:

A LLaVA-7b model fine-tuned on the post-processed ScienceQA dataset to adapt it to the science domain
A user-friendly front-end React application to interact with the ScienceTutor chatbot in the web browser
A robust back-end API service implemented using Flask to expose the fine-tuned LLaVA-7b model functionality to the front-end
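
To illustrate the first component, the sketch below shows one way a ScienceQA-style multiple-choice record could be turned into a prompt/target pair for fine-tuning. The project's actual post-processing is not described here, so the prompt template, the `ScienceQARecord` class, and the `to_training_pair` helper are all hypothetical; only the field names (`question`, `choices`, `answer`, `hint`) follow the public ScienceQA dataset. The image that accompanies each question is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

LETTERS = "ABCDEFGHIJ"  # option labels; assumes at most 10 choices


@dataclass
class ScienceQARecord:
    """Hypothetical container for the text fields of one ScienceQA item."""
    question: str
    choices: List[str]
    answer: int      # index into `choices`
    hint: str = ""   # optional textual context


def to_training_pair(record: ScienceQARecord) -> Tuple[str, str]:
    """Build an (assumed) prompt and target string from one record."""
    options = " ".join(
        f"({LETTERS[i]}) {choice}" for i, choice in enumerate(record.choices)
    )
    parts = []
    if record.hint:
        parts.append(f"Context: {record.hint}")
    parts.append(f"Question: {record.question}")
    parts.append(f"Options: {options}")
    prompt = "\n".join(parts)
    target = f"The answer is {LETTERS[record.answer]}."
    return prompt, target


record = ScienceQARecord(
    question="Which of these is a mammal?",
    choices=["shark", "dolphin"],
    answer=1,
)
prompt, target = to_training_pair(record)
print(prompt)
print(target)
```

In a deployed setup, the back-end service would presumably apply the same template to a child's question before passing it, together with the uploaded image, to the fine-tuned model.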