Project 2 Summary

Grades

Overall, the class did an excellent job!

28 of 34 students scored 19 or higher!

20+ (14 projects): 21.5, 21, 21, (20, 20), (20, 20), 20, 20, 20, (20, 20), 20, 20, (20, 20), 20, 20
19+ (7 projects): (19.5, 19.5), (19.5, 19.5), (19.5, 19.5), 19.5, 19.5, 19, 19
18+ (2 projects): (18.5, 18.5), 18
17+ (1 project): 17

Leader Board

1st place, +2 bonus: 1.00 accuracy
2nd place, +1 bonus: 0.9942 accuracy (two groups!)
3rd place: 0.9883 accuracy
4th place: 0.9766 accuracy (three groups!)
5th place: 0.9707 accuracy (two groups)

General Comments

Insufficient number of epochs
Docker image was not public on Docker Hub
README did not provide sufficient or clear instructions
Late submission significantly affected the score

Overview of the 1st-Place Model

For the 1st-place model, the group explored the models required by the assignment and also tested a Swin Transformer (Swin-T), a vision transformer architecture. CNNs build up context gradually by stacking local convolutions, whereas Swin-T computes self-attention inside local windows, shifts those windows between layers, and merges patches across stages into a feature hierarchy. This lets it model both local and longer-range image dependencies, which likely contributed to its higher accuracy.
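
To illustrate the windowing idea behind Swin-T, here is a minimal sketch in plain Python. It is not the group's model or any library's implementation. It assumes an 8x8 grid of patches, a window size of 4, and a shift of 2; the helper names window_ids and attends are hypothetical. In a Swin-style layer, a position attends only to positions in its own window, and alternate layers cyclically shift the grid so that neighbours split by a window border end up in the same window. The sketch leaves out the attention mask that a real implementation applies to positions brought together by wrap-around.

```python
def window_ids(height, width, window, shift=0):
    """Map each (row, col) position to the index of the window containing it.

    With shift > 0 the grid is cyclically rolled first, mimicking the
    regrouping done by shifted-window attention in alternate layers.
    Simplification: no mask is applied to wrap-around neighbours.
    """
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    windows_per_row = width // window
    ids = {}
    for r in range(height):
        for c in range(width):
            # Cyclic shift: (r, c) lands at ((r - shift) % H, (c - shift) % W).
            rr = (r - shift) % height
            cc = (c - shift) % width
            ids[(r, c)] = (rr // window) * windows_per_row + (cc // window)
    return ids


def attends(ids, a, b):
    """Two positions can attend to each other only if they share a window."""
    return ids[a] == ids[b]


regular = window_ids(8, 8, window=4)           # plain window partition
shifted = window_ids(8, 8, window=4, shift=2)  # partition after a shift of 2

# (0, 3) and (0, 4) are adjacent but fall in different regular windows.
print(attends(regular, (0, 3), (0, 4)))  # False
# After the shift they share a window, so information can cross the border.
print(attends(shifted, (0, 3), (0, 4)))  # True
```

Alternating the two partitions across layers is what lets information spread beyond a single window while each attention step stays local.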