Project 02 - 20 Points (with 2 possible bonus points)

Date Assigned: Thursday, October 16, 2025.

Due Date: Thursday, November 13, 2025, 5 pm CST.

Group Assignment: Students can work individually or in groups of two on this assignment. When working in groups, we expect both students to contribute equally to all aspects of the project. You are allowed to talk to students in other groups about the project, but please do not copy any code for the notebook or text for the report.

Use of AI: You are allowed to use AI tools like ChatGPT to help you with the coding portion of this project, but you must include a document called “Use of AI” with your project. Each time you use the output of an AI tool, add an entry to your “Use of AI” document with the following fields:

Tool: The tool used, such as ChatGPT.
Prompt: The prompt you provided to the tool.
Output: The output from the tool that you used.

Number each entry in the document like [1], [2], [3], etc. For example:

Use of AI
---------

[1]. Tool: ChatGPT
     Prompt: Write python code to read a csv file into a pandas dataframe
     Output:
       # Replace 'your_file.csv' with the path to your CSV file
       df = pd.read_csv("your_file.csv")

[2]. *Additional entries here*...

Then, within the code (i.e., the Jupyter notebook), add a comment referencing the corresponding entry in the “Use of AI” document. For example:

# The code below was generated by AI; see [1].
df = pd.read_csv("cars.csv")

Please do not use AI to generate the report for Part 4. Learning to communicate effectively is an important life skill. If you do not use AI, please include the document with a comment “No use of AI.”

Late Policy: Late projects will be accepted at a penalty of 1 point per day late, up to five days late. After the fifth late day, we will no longer be able to accept late submissions. In extreme cases (e.g., severe illness or a death in the family), special accommodations can be made. Please notify us as soon as possible if you have such a situation.

Project Description: You are given a dataset which contains satellite images from Texas after Hurricane Harvey. There are damaged and non-damaged building images organized into respective folders. You can find the project 3 dataset on the course GitHub repository here.

Your goal is to build multiple neural networks based on different architectures to classify images as containing buildings that are either damaged or not damaged. You will evaluate each of the networks you develop and select the “best” network to “deploy”. Note that this is a binary classification problem, where the goal is to classify whether the structure in the image has damage or does not have damage.

Part 1: (3 points) Data preprocessing and visualization

You will need to perform data analysis and pre-processing to prepare the images for training. At a minimum, you should:

Write code to load the data into Python data structures
Investigate the datasets to determine basic attributes of the images
Ensure data is split for training, validation and testing and perform any additional preprocessing (e.g., rescaling, normalization, etc.) so that it can be used for training/evaluation of the neural networks you will build in Part 2.
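The split and rescaling steps above can be sketched as follows. This is a minimal illustration using only the standard library, not a required approach: the function names `split_dataset` and `rescale_pixels`, the 70/15/15 split, the fixed seed, and the file names are all assumptions made for the example, and pixel values are assumed to be 8-bit (0 to 255).

```python
import random

def split_dataset(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle (item, label) pairs and split them into train/val/test lists.

    The fractions and the seed are illustrative assumptions, not requirements.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def rescale_pixels(pixels):
    """Map 8-bit pixel values (assumed range 0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]

# Hypothetical file names standing in for the real image paths.
samples = [(f"damage/img_{i}.jpeg", 1) for i in range(50)]
samples += [(f"no_damage/img_{i}.jpeg", 0) for i in range(50)]

train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Shuffling before splitting matters here because the images are organized by class folder; splitting an unshuffled list would put only one class in the test set.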

Part 2: (7 points) Model design, training and evaluation

You will explore different model architectures that we have seen in class, including:

A dense (i.e., fully connected) ANN
The LeNet-5 CNN architecture
The Alternate-LeNet-5 CNN architecture described in Table 1, page 12 of the research paper https://arxiv.org/pdf/1807.01688.pdf (note that our dataset is not the same as the one analyzed in the paper)

You are free to experiment with different variants on all three architectures above. For example, for the fully connected ANN, feel free to experiment with different numbers of layers and perceptrons. Train and evaluate each model you build, and select the “best” performing model.
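One possible way to compare models is to compute the same metrics for each on held-out data and pick the best by a chosen metric. The sketch below assumes labels are encoded as 1 for damage and 0 for no damage, and the helper names `binary_metrics` and `select_best` are hypothetical; which metric defines “best” is a decision you should make and justify yourself.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def select_best(results, metric="f1"):
    """Return the name of the model with the highest value of `metric`."""
    return max(results, key=lambda name: results[name][metric])

# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 0, 0]
results = {
    "dense_ann": binary_metrics(y_true, [1, 0, 0, 0]),
    "lenet5": binary_metrics(y_true, [1, 1, 0, 0]),
}
print(select_best(results))
```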

Note that the input and output dimensions are fixed, as the inputs (images) and the outputs (labels) have been given. These have important implications for your architecture. Make sure you understand the constraints these impose before beginning to design and implement your networks. Failure to implement these correctly will lead to incorrect architectures and a significant penalty on the project grade.
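To see how the input size constrains a CNN, it can help to trace the feature-map size layer by layer before building the model. The sketch below applies the standard formula for unpadded ("valid") convolution and pooling to a LeNet-5-style stack of two 5x5 convolutions (6 and 16 filters), each followed by 2x2 pooling. The 128x128 input size is only an assumption for illustration; check the actual image dimensions in Part 1. On the output side, a binary classifier commonly ends in either one sigmoid unit or two softmax units, and the choice must match how the labels are encoded.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool, stride=None):
    """Spatial size after pooling; stride defaults to the pool size."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1

def lenet5_feature_shape(side, filters=(6, 16), kernel=5, pool=2):
    """Trace a square input through conv -> pool blocks, one per filter count."""
    for _ in filters:
        side = conv_output_size(side, kernel)
        side = pool_output_size(side, pool)
    return side, side, filters[-1]

# Classic LeNet-5 input size.
h, w, c = lenet5_feature_shape(32)
print((h, w, c), h * w * c)   # (5, 5, 16) 400

# Assumed 128x128 input; verify against the real dataset.
h, w, c = lenet5_feature_shape(128)
print((h, w, c), h * w * c)   # (29, 29, 16) 13456
```

The flattened size (the product of the three numbers) is the input width of the first dense layer, which is why changing the image size changes the parameter count so sharply.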

Note: You can also try to run the VGG-16 architecture from class; however, you may run into long runtimes and/or memory limits on the VM. Depending on the architecture that you choose, you could also run into memory constraints with any of the other architectures. If you are hitting memory issues, you can try to decrease the batch_size parameter in the .fit() function, as described in the notes.

Part 3: (7 points) Model inference server and deployment

For the best model built in part 2, persist the trained model to disk so that it can be reconstituted easily. Develop a simple inference server to serve your trained model over HTTP. There should be at least two endpoints:

A model summary endpoint GET /summary providing metadata about the model.

Note: This endpoint must accept requests to GET /summary, and it must return a JSON response.
An inference endpoint POST /inference that can perform classification on an image.

Note: This endpoint must accept requests to POST /inference. It must accept a binary message payload containing the image to classify, without any preprocessing, and it must return a JSON response containing the results of the inference. The JSON response must be a JSON object (not a list) and include a top-level attribute, prediction, with values damage or no_damage. For example: { "prediction": "damage"}. The grader is automated and will be looking for these exact values, so be sure to review your code carefully and make sure it conforms to the requirements (and use the grader code, see below).
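The two endpoints can be sketched with Python's built-in http.server module as shown below. This is only an illustration of the request/response shape, not a reference implementation: you may prefer a web framework, `predict_prob` is a hypothetical callable standing in for your own image decoding, preprocessing, and model call, the 0.5 threshold and port 5000 are assumptions, and the sketch assumes the image arrives as the raw request body. Always confirm the behavior against the provided grader code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_inference_response(prob_damage, threshold=0.5):
    """Turn a damage probability into the JSON object the spec requires."""
    label = "damage" if prob_damage >= threshold else "no_damage"
    return {"prediction": label}

def make_handler(predict_prob, summary):
    """Create a handler class bound to a prediction function and summary dict.

    predict_prob: hypothetical callable taking raw image bytes and returning
                  the probability of damage as a float.
    summary:      dict of model metadata returned by GET /summary.
    """
    class InferenceHandler(BaseHTTPRequestHandler):
        def _send_json(self, status, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            if self.path == "/summary":
                self._send_json(200, summary)
            else:
                self._send_json(404, {"error": "not found"})

        def do_POST(self):
            if self.path != "/inference":
                self._send_json(404, {"error": "not found"})
                return
            # Read the raw image bytes from the request body.
            length = int(self.headers.get("Content-Length", 0))
            image_bytes = self.rfile.read(length)
            self._send_json(200, build_inference_response(predict_prob(image_bytes)))

    return InferenceHandler

def run(predict_prob, summary, host="0.0.0.0", port=5000):
    """Serve until interrupted; host and port are assumed defaults."""
    HTTPServer((host, port), make_handler(predict_prob, summary)).serve_forever()
```

Calling `run(my_predict_fn, {"name": "my-model"})` would start the server, where `my_predict_fn` is your own function wrapping the persisted model.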

Note: We are providing you with test code that will call your server (the GET /summary and the POST /inference endpoints) and evaluate the responses to make sure they are in the correct format and the outputs can be processed by our grader. You can find the test grader code with instructions on how to run the code here.

Failure to conform to the correct specification for the inference server will lead to a significant penalty on the project grade.

Package your model inference server in a Docker container image and push the image to the Docker Hub. Be sure that the image is an x86 architecture image. Provide instructions for starting and stopping your inference server using the docker-compose file. Provide examples of how to make requests to your inference server. Note: We strongly recommend that you build the image on your class VM, as this is x86 architecture. If you build your image on a Mac, you will create an image in the ARM architecture which will not meet the requirements.
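Example requests for a README might look like the following sketch, which uses only the standard library. The base URL http://localhost:5000 is an assumption (use whatever port your docker-compose file publishes), the helper names are hypothetical, and the image is assumed to be sent as the raw request body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumed; match the port you publish

def build_summary_request(base_url=BASE_URL):
    """Build the GET /summary request."""
    return urllib.request.Request(f"{base_url}/summary", method="GET")

def build_inference_request(image_bytes, base_url=BASE_URL):
    """Build the POST /inference request with the raw image as the body."""
    return urllib.request.Request(
        f"{base_url}/inference",
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

def send(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage, once the container is running (the file name is hypothetical):
#   with open("example.jpeg", "rb") as f:
#       print(send(build_inference_request(f.read())))
#   print(send(build_summary_request()))
```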

Bonus: We will evaluate each of the model inference servers submitted against a reserved dataset. The top two models will get bonus points as follows:

1st place: 2 points
2nd place: 1 point

Part 4: (5 points) Write a 3-page report summarizing your work. Please keep the report to a maximum of 3 pages; otherwise we will have to deduct points. Be sure to include something about the following:

Data preparation: what did you do? (1 pt)
Model design: which architectures did you explore and what decisions did you make for each? (2 pts)
Model evaluation: what model performed the best? How confident are you in your model? (1 pt)
Model deployment and inference: a brief description of how to deploy/serve your model and how to use the model for inference (this material should also be in the README with examples) (1 pt)

Submission guidelines: Part 1 and Part 2 should be submitted as one notebook file. Part 3 should include a Dockerfile, a docker image (prebuilt and pushed to Docker Hub) and a docker-compose.yml file for starting the container. It should also include a README with instructions for using the container image, docker-compose file and example requests. Part 4 should be submitted as a PDF file.

In-class Project Checkpoint: Thursday, November 6. We will devote the first portion of Thursday’s class to checking in on the project and answering questions.