Project 03 - 35 Points
Date Assigned: Thursday, November 13, 2025.
Initial Proposal Due Date: Monday, December 1, 2025.
Due Date: Friday, December 12, 2025 at 5 PM CST (The day of the Final Exam for this class)
Use of AI. You are allowed to use AI tools like ChatGPT to help you with the coding portion of this project, but you must include a document called “Use of AI” with your project. For each time you used the output of an AI tool, add an entry into your “Use of AI” document with the following fields:
Tool: The tool used, such as ChatGPT.
Prompt: The prompt you provided to the tool.
Output: The output from the tool that you used.
Number each entry in the document like [1], [2], [3], etc. For example:
Use of AI
---------
[1]. Tool: ChatGPT
Prompt: Write python code to read a csv file into a pandas dataframe
Output:
import pandas as pd
# Replace 'your_file.csv' with the path to your CSV file
df = pd.read_csv("your_file.csv")
[2]. *Additional entries here*...
Then, within the code (i.e., the Jupyter notebook), add a comment referencing the corresponding entry in the “Use of AI” document. For example:
# The code below was generated by AI; see [1].
df = pd.read_csv("cars.csv")
Please do not use AI to generate the report for part 3. Learning to communicate effectively is an important life skill. If you do not use AI, please include the document with a comment “No use of AI.”
Group Assignment: Students can work individually or in groups of two on this assignment. When working in groups, we expect both students to contribute equally to all aspects of the project. You are allowed to talk to students in other groups about the project, but please do not copy any code for the notebook or text for the report.
If you use ChatGPT, please state exactly how you used it. For example, state which parts of the code it helped you generate, which errors it helped you debug, etc. Please do not use ChatGPT to generate the report for part 3.
Late Policy: This project is due during the final exam period for our class. As a result, we will not be able to accept late projects.
Project Description: Open-ended ML
The last project is open-ended, allowing you to propose an idea for a project that is of interest to you. You may build upon any of the previous projects, and you may incorporate any of the techniques or ideas presented in the lectures. You may also choose to study and incorporate a published paper or a new ML technology.
Initial Proposal
Begin by writing an initial proposal for your project. The initial proposal is limited to 2 pages, though you can include additional figures, data samples, etc., beyond the two pages as necessary.
Include the following in your proposal:
Introduction and topic and/or problem statement – A short introduction and summary of the goals of the project
Data sources that will be used – A reference to any datasets utilized in the project
List of high-level methods, techniques and/or technologies that you are considering using.
Products to be delivered – what are the primary deliverables for the project? This is what we will be grading
The initial proposal will be worth 4 points. Groups or individuals that do not submit the proposals by the due date will lose the 4 points and will still be required to get a proposal approved in order to submit the final project. Commit your initial proposal to your project 3 git repository.
We will do our best to review all initial proposals by end of day Tuesday, December 2nd. For proposals that need modifications, we will notify you before class on Tuesday, December 2nd, and you should schedule time with us soon thereafter to understand any changes that need to be made. We will make the following times available for people to discuss:
Tuesday, December 2nd: 12:30-2 pm and 3:30-4:30 pm
Thursday, December 4th: 12:30-2 pm and 3:30-4:30 pm (after class)
Git Repository
The project products should be saved into an organized git repository, similar to the way we have done the other projects. The initial proposal should be among the documents included in your git repository, and it should be there by the initial proposal due date (Monday, December 1, 2025).
Final Report and Video
Submit a written report of your project. The following sections should be covered:
Introduction and project statement
Data sources and technologies used
Methods employed
Results
References
The final report should be a maximum of 10 pages.
Also, create a short video, no more than 10 minutes long. The video should be a presentation of your report and cover the primary aspects of your work. Record your video using Zoom (use the “record to the cloud” option) and email a shareable link to the Zoom recording to the instructors and TA.
We will watch all of the videos during the final exam week. (Attendance will be optional).
Grading
The final project will be graded as follows:
- Initial Proposal – 4 points (no partial credit for late proposals)
First draft submitted on time
Any required updates made within 1 week.
- Project Concept – 6 points
Does the project concept involve relevant machine learning topics, ideas and/or technologies? (3 pts)
Is the project useful and/or interesting? (3 pts)
Is the project unique? (Bonus points)
- Project Products – 15 points
Do the products achieve the described goals? (5 pts)
Are the products available (e.g., in a code repository)? (5 pts)
Are the products well-documented? (5 pts)
- Final Report & Video – 10 points (6 points report, 4 points video)
Does the final report/video cover all sections? (2 pts report, 2 pts video)
Is the writing/video easy to follow? (i.e., there is a logical progression of the presentation, important details are not missing but we are not drowned in minutiae either, etc.) (3 pts report, 2 pts video)
Are all sources referenced? (1 pt, report only)
Project Ideas
Here is a list of potential project ideas, but this is just a list to help you get started brainstorming. We encourage you to come up with your own project idea based on a dataset/topic/technology, etc., that is of interest to you.
Investigate advanced classical algorithms such as XGBoost or Support Vector Machines for tabular data. One option would be to use the dataset from Project 2; alternatively, you could find a separate dataset of interest to you. Compare the performance of these methods to those we studied in Unit 2.
Perform sentiment analysis, text summarization or other classical NLP tasks on commonly available datasets such as social media postings, product reviews, articles or papers, etc. What model(s)/techniques will you use? You might consider using LLMs/transformers as part of this project. How will you evaluate how well your model performs?
Investigate methods for utilizing transformer models/LLMs for structured/tabular data. How does few-shot or even zero-shot learning perform on structured datasets we have looked at? Compare the results to methods we looked at in class.
Explore a search space of neural network architectures to find the optimal architecture or explore other hyperparameters. What search technique will you use? Consider investigating the Keras Tuner package to explore hyperparameters associated with a Keras model. The package includes different search strategies you can try.
Model Chaining and Serving – Create multiple models that can be chained together and serve them as part of an inference server deployment. For example, a first model could do image to text and a second model could do sentiment analysis on the text produced by the first.
Truthfulness of LLMs – Run the TruthfulQA benchmark on a number of LLMs from Hugging Face and report the results.
LLM fine-tuning – Fine-tune a language model on a specific task of interest to you. Think about a problem that will allow you to build a dataset that can be used for fine-tuning. Evaluate the model on the task both before and after fine-tuning. Also, evaluate the model on a different task, both before and after the fine-tuning. Does the fine-tuning process cause the model to “forget” (i.e., get worse at) the task it was not fine-tuned on?
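As one illustration of the before/after evaluation described in the fine-tuning idea, here is a minimal sketch of how the "forgetting" question could be quantified. It assumes exact-match accuracy is a reasonable metric for both tasks. The function names (accuracy, forgetting_report) and the toy predictions are hypothetical placeholders, not part of any library or of the assignment; in a real project the predictions would come from your model before and after fine-tuning.

```python
from typing import Dict, Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def forgetting_report(
    target_before: Sequence[str],
    target_after: Sequence[str],
    target_labels: Sequence[str],
    other_before: Sequence[str],
    other_after: Sequence[str],
    other_labels: Sequence[str],
) -> Dict[str, float]:
    """Compare accuracy before and after fine-tuning on two tasks.

    "target" is the task the model was fine-tuned on; "other" is a
    different task used to check for forgetting. A positive
    "other_drop" means the model got worse on the other task.
    """
    tb = accuracy(target_before, target_labels)
    ta = accuracy(target_after, target_labels)
    ob = accuracy(other_before, other_labels)
    oa = accuracy(other_after, other_labels)
    return {
        "target_before": tb,
        "target_after": ta,
        "target_gain": ta - tb,
        "other_before": ob,
        "other_after": oa,
        "other_drop": ob - oa,
    }


# Toy placeholder data (hypothetical; replace with real model outputs).
target_labels = ["pos", "neg", "pos", "neg"]
target_before = ["pos", "pos", "pos", "pos"]
target_after = ["pos", "neg", "pos", "pos"]

other_labels = ["a", "b", "c", "d"]
other_before = ["a", "b", "c", "d"]
other_after = ["a", "b", "x", "d"]

report = forgetting_report(
    target_before, target_after, target_labels,
    other_before, other_after, other_labels,
)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Exact-match accuracy is only one possible choice; generative tasks may call for a different metric, and a small evaluation set will make the reported differences noisy.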