Project 2 Summary
Grades
Overall the class did an excellent job!
28 of 34 students scored 19 or higher!
20+ (14 projects): 21.5, 21, 21, (20, 20), (20, 20), 20, 20, 20, (20, 20), 20, 20, (20, 20), 20, 20
19+ (7 projects): (19.5, 19.5), (19.5, 19.5), (19.5, 19.5), 19.5, 19.5, 19, 19
18+ (2 projects): (18.5, 18.5), 18
17+ (1 project): 17
Leader Board
1st place, +2 bonus: 1.00 accuracy
2nd place, +1 bonus: 0.9942 accuracy (two groups)
3rd place: 0.9883 accuracy
4th place: 0.9766 accuracy (three groups)
5th place: 0.9707 accuracy (two groups)
General Comments:
Insufficient number of epochs
Docker image was not public on Docker Hub
README did not provide sufficient or clear instructions
Late submission significantly affected the score.
Overview of the 1st-Place Model
The 1st-place group explored the models required by the assignment and also tested a Swin Transformer (Swin-T), a hierarchical vision transformer architecture. Whereas a CNN's convolutions capture mainly local spatial patterns within each layer, Swin-T computes self-attention inside local windows that shift between layers and merges patches across stages. This lets it learn both local and global image dependencies, enabling stronger feature representations and higher accuracy.
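To make the windowing idea concrete, here is a minimal pure-Python sketch of how a grid of image patches can be split into attention windows, and how a cyclic shift changes which patches share a window. It is an illustration only, not the 1st-place group's code; the helper names window_partition and cyclic_shift, the 4x4 grid, and the window size of 2 are all assumptions chosen for readability.

```python
def window_partition(grid, window):
    """Split a square 2D grid into non-overlapping window x window blocks."""
    size = len(grid)
    windows = []
    for top in range(0, size, window):
        for left in range(0, size, window):
            block = [grid[r][left:left + window] for r in range(top, top + window)]
            windows.append(block)
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    size = len(grid)
    return [
        [grid[(r + shift) % size][(c + shift) % size] for c in range(size)]
        for r in range(size)
    ]


# Hypothetical 4x4 grid of patch indices 0..15 (row-major order).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular layer: attention would be computed inside each 2x2 window.
regular = window_partition(grid, 2)

# Shifted layer: shift by half the window size, then partition again.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[0])  # [[5, 6], [9, 10]]
```

In the regular partition, patches 5, 6, 9, and 10 fall into four different windows; after the shift they share one window, which is how alternating layers can pass information across window boundaries. A full implementation would also mask attention between patches that become neighbors only because of the wrap-around, which this sketch omits.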