Class Introduction
Instructors:
Joe Stubbs, jstubbs@tacc.utexas.edu, ACB 2.246 (on Pickle Research Campus), ASE 5.224 (Main Campus)
Anagha Jamthe, ajamthe@tacc.utexas.edu
Time: Tues/Thur 11:00am - 12:30pm
Location: UTC 1.118
Teaching Assistant: Shukai Cai (shukai.cai@utexas.edu)
Important Links:
Lecture Notes: https://coe379l-fa25.readthedocs.io/en/latest/
Slack: https://tacc-learn.slack.com/ #coe-379l-fa25 (channel)
From the Syllabus
Catalog Description:
Covers design, implementation, operation and assurance of intelligent software systems based on data intensive computing and machine learning techniques. Course materials and assignments will utilize real-world datasets from engineering disciplines.
Prerequisites:
Computational Engineering 332 with a grade of at least C-.
Knowledge, Skills, and Abilities Students Should Have Before Entering This Course:
This course assumes knowledge of the Python programming language (data structures, conditionals, loops, and functions), software design, distributed systems, and asynchronous programming (concurrent programming, task queues, microservice architectures, etc.). We also assume knowledge of Multivariate Calculus and Linear Algebra as well as a basic, working knowledge of the Linux command line. We will not introduce/cover basic programming skills or debugging. Ultimately, each student is expected to be able to write and debug working Python programs and to have experience designing larger systems as a composition of smaller components. This is not an introductory programming class nor an introductory design class; we will not have time to help students debug their programs.
Knowledge, Skills, and Abilities Students Gain from this Course (Learning Outcomes):
The objective of this course is to introduce students to scalable data analysis and machine learning techniques for designing, implementing, validating and operating responsible intelligent systems. The course covers algorithms, techniques and tools for applying data analytics and machine learning to real-world problems. Through a series of projects spanning the course of the semester, students design and implement responsible, intelligent computational systems. The course strikes a balance between foundational and state-of-the-art techniques.
Class Format
The class will be delivered in-person or following the guidance of the University of Texas. Class meetings will consist of lectures/demonstrations and hands-on labs. Students are expected to attend every lecture and actively participate in the hands-on labs during the class. In some cases, the hands-on portions will provide partial solutions to project assignments. Lecture materials with worked examples will be posted to the class website right before the class meeting. Additionally, there will be a class Slack channel for discussing ideas about the course with your fellow students.
Computer:
The entire course will be computer based. The instructor will provide remote servers for students to work on. Students are expected to have access to a personal / lab computer with a web browser and a terminal and SSH/SCP client. On Thursday we will assign each of you a student VM.
Text: No textbook will be used for this course. However, we will provide materials for all lectures on the class website, and we will often supplement them with additional reading materials available online.
Office Hours:
Office hours will be for 1 hour immediately following the class and/or by appointment. We plan to use Slack for general communications and to help with the materials. https://tacc-learn.slack.com/
Project Assignments
Three multi-week projects assigned throughout the semester will cover the topics presented in the course. On the first two projects, each student must work alone. On the last project, students will be allowed to work alone or with one other student; however, each student group must write up and submit their own solution, including coding their own programs and writing their own analysis. Students are not allowed to submit duplicates of other students’ work on any of the projects.
Details about the Projects
The first two projects will be centered around a dataset. You will be given a dataset and asked to use the techniques from class to analyze it, build one or more models using the data and assess the model(s) you build. You may also be asked to package your model as an inference server. You will host the code you develop for the projects in a git repository. In addition to the code, you will also be asked to provide a written description of the work. Both projects are likely to be classification projects.
The third and last project will be open ended. You will be free to design an idea for a project that is of interest to you, building upon any of the previous projects. You may incorporate any of the techniques or ideas presented in lecture, and you may also choose to study and incorporate outside ideas, such as ideas from published papers or a new ML technology. You will initially “pitch” your project by writing up a short project proposal. The final project will include a final report (max of 10 pages) and a video presentation (less than 10 minutes) in addition to any “products” to be developed, as described in the proposal.
We will provide more details about each of the projects as the assignment date approaches.
Exam
There will be one in-class exam, to be held on Tuesday, November 4th. All students are required to take the exam at the scheduled time. If you have a conflict with this time, please discuss with us immediately. Unfortunately, we will not be able to offer a makeup exam.
You will not be able to reference any notes or online materials during the exam. However, the exam will be “conceptual”, focusing on your understanding of the foundational concepts. It will not require you to memorize code library APIs, etc. Prior to the exam, we will provide a comprehensive study guide to help you prepare.
Grading
The grade for the course will be based on the attendance, exam, and project grades, as follows:
Attendance - 5%
Project 1 - 20% (Individual project)
Project 2 - 20% (Individual project)
Exam - 25% (Individual, “closed book”)
Project 3 - 30% (Individual or group-of-two project)
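As a quick check on the arithmetic, the weighting above can be sketched in a few lines of Python. The weights come from the list above; the function name `course_grade` and the example scores are hypothetical and only illustrate how the components combine.

```python
# Weights taken from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "attendance": 5,
    "project_1": 20,
    "project_2": 20,
    "exam": 25,
    "project_3": 30,
}


def course_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name to a score between 0 and 100.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "attendance": 100,
    "project_1": 90,
    "project_2": 80,
    "exam": 70,
    "project_3": 85,
}
print(course_grade(example_scores))  # 82.0
```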
Attendance
Regular attendance is expected. We will conduct mini “quizzes” during the class via UT Canvas. These will be simple multiple choice questions based on the content we are covering that lecture. No “make up” quizzes will be offered, but the lowest 4 attendance grades will be dropped. For extenuating circumstances, such as a serious illness requiring the student to miss multiple weeks of classes, we will work out an arrangement on a case by case basis.
Other Administrative Matters
DISABILITY & ACCESS (D&A) The university is committed to creating an accessible and inclusive learning environment consistent with university policy and federal and state law. Please let me know if you experience any barriers to learning so I can work with you to ensure you have equal opportunity to participate fully in this course. If you are a student with a disability, or think you may have a disability, and need accommodations please contact Disability & Access (D&A). Please refer to the D&A website for more information: http://diversity.utexas.edu/disability/. If you are already registered with D&A, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations and needs in this course.
Special Notes: The University of Texas at Austin provides upon request appropriate academic adjustments for qualified students with disabilities. For more information, contact the Office of the Dean of Students at 471-6259, 471-4641 TDD or the Cockrell School of Engineering Director of Students with Disabilities at 471-4321.
Evaluation: Note that the Measurement and Evaluation Center forms for the Cockrell School of Engineering will be used during the last week of class to evaluate the course and the instructor. They will be conducted in an electronic format for Fall 2025.
Artificial Intelligence
The creation of artificial intelligence tools for widespread use is an exciting innovation. These tools have both appropriate and inappropriate uses in classwork just as in the “real world”. Learning how best to use these tools is very important. At the same time, learning foundational concepts and ideas remains essential, not least because AI tools are fallible, and identifying their mistakes often requires expertise.
For these reasons, the use of artificial intelligence tools in this class shall be permitted but with some limits. On each assignment, you will be informed exactly how AI may be utilized. In general, we will ask that you do not use AI tools for any written portions of assignments (e.g., the written reports that are required for the projects). Learning to write is an important skill in life. On the other hand, using AI tools to help you debug code, brainstorm ideas for projects, or study for an exam are excellent uses. We highly recommend you use the UT AI Hub for pre-approved tools, specifically:
UT Spark: UT’s all-in-one AI platform, available for free to all current faculty, staff and students.
UT Sage: A safe, secure AI-powered tutor endorsed for use in UT classrooms.
For each project, we will require a separate “AI Usage” Document that will specify all use of AI that led to any code used in the project. The AI Usage document will be similar to a bibliography, with each entry including:
The prompt used to generate the AI response.
The AI response used in the project.
Within the code (e.g., Jupyter notebook file), we’ll ask that you reference the AI Usage entry number next to the location where the code was added (similar to a bibliography).
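One possible way to reference an AI Usage entry inside code is sketched below. The `[AI-1]` tag format and the `normalize` function are hypothetical, chosen only for illustration; the exact citation format expected for each project will be given with the assignment.

```python
# Hypothetical illustration of citing an AI Usage entry inside project code.
# The "[AI-1]" tag format is an assumption, not a course requirement.

def normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    # [AI-1] Min-max scaling logic adapted from an AI response;
    # see entry 1 in the AI Usage document for the prompt and response.
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


print(normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```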
Note that using AI tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.
Note also that you will still need to master fundamental concepts as these will be tested on the exam.
Software Design for Responsible Intelligent Systems
In COE 332, we cover software system design concepts for systems that can perform non-trivial data analysis, but we barely scratch the surface of the subject of data analysis itself.
In this course, we are going to cover techniques and technologies for building applications utilizing data analysis and machine learning, specifically.
We will focus more on applications of machine learning, applying the techniques to real datasets, and less on the theoretical basis for the algorithms. However, we will introduce the ideas involved with most of the algorithms we cover, so that you can get a feel for the flavor.
As with COE 332, we will emphasize applications written in the Python programming language. We will make use of a number of open source libraries, including numpy, pandas, matplotlib, seaborn, scikit-learn, tensorflow and keras. This year, we’ll be doing more with transformers and large language models (LLMs) than previously, and we’ll introduce libraries such as transformers and LangChain for those topics.
We’ll assume you know the topics we covered in COE 332, for example:
Python programming and best practices with respect to code organization within a repo.
How to commit and work with code in a git repository.
How to install a package; how to build a Docker image with a package installed.
How to read the documentation for a package and use it in your code.
The basics of HTTP, Docker, and Flask (for building web APIs).
If you have not taken COE 332, we encourage you to use the COE 332 materials to learn any of the above topics that you are not familiar with.
What is Artificial Intelligence and Machine Learning?
Some people consider the birth of the term “artificial intelligence” to be a summer workshop held at Dartmouth College in 1956, the “Dartmouth Summer Research Project on Artificial Intelligence”. Others say the origins date back to as early as 1940, with efforts at places such as MIT and CMU.
Many definitions have been given; for example, the proposal for the 1956 Dartmouth Workshop states:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.
If we look back even just a couple of decades, we see that the field of Artificial Intelligence had already grown into a huge field and encompassed techniques from logic, probability, perception, reasoning, and learning.
Many consider Artificial Intelligence: A Modern Approach by Stuart Russell of UC Berkeley and Peter Norvig, Director of Research at Google, to be the definitive book on AI. Its topics include:
Search Algorithms
Intelligent Agents
Logical Agents and First Order Logic
Knowledge Representation (ontologies)
Automated Planning
Uncertainty, Probabilistic Reasoning, and Probabilistic Programming
Multi-agent Decision Making
Machine Learning
Deep Learning
Robotics
Cover of the textbook Artificial Intelligence: A Modern Approach [1]; considered by many to be the definitive resource. The first edition was published in 1995.
Over the last decade or so, the Machine Learning and Deep Learning subfields of AI have taken off. Some say ML is the dominant subfield of AI. These topics will be the focus of our course.
What is Machine Learning?
Machine Learning (ML) is the subfield of AI that develops algorithms to analyze and infer patterns in data.
Here, data is the key word. Instead of using logic, or a search technique, or a formal knowledge representation, ML looks for patterns in existing data sets and attempts to apply those patterns to future data.
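To make the idea of applying patterns from existing data to future data concrete, here is a minimal sketch using a one-nearest-neighbor rule in plain Python. The technique is chosen only for brevity, and the feature names and data values are made up for illustration.

```python
import math

# Toy "existing" data: (hours_studied, hours_slept) -> outcome. Values are made up.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def predict(point):
    """Label a new point with the label of its closest training example."""
    nearest_features, nearest_label = min(
        training_data, key=lambda example: math.dist(example[0], point)
    )
    return nearest_label


print(predict((7.0, 6.5)))  # pass
print(predict((1.5, 4.5)))  # fail
```

No rule about passing or failing is written by hand here; the prediction for a new point comes entirely from the labeled examples.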
Why is Machine Learning having so much success now? Two primary reasons:
There is an abundance of data, thanks to the internet, automation and IoT devices.
Computing power has continued to increase so that algorithms that were not tractable a decade ago can now complete in a relatively short amount of time.
And as a result, we are seeing applications of ML to virtually all fields. In this class we will explore datasets and applications from fields including:
Computational Biology and health informatics (e.g., predicting diabetes)
Structural/Civil Engineering (e.g., classifying damage to buildings)
Traditional IT (e.g., spam email classification)
And many more.
With Power Comes Responsibility
While this is undeniably an exciting time for the field, the power to create models that accurately predict outcomes in various fields comes with significant responsibilities. In this class, we will try to highlight some of the important aspects of these responsibilities. We will ask questions such as
How do we use data in a responsible way? Do we just throw a bunch of ML algorithms at the data and see what gives us the result we are looking for?
As we train our models, how do we ensure our results are reproducible?
How do we build trust in our models? How do we develop confidence in our models? Is accuracy the only important measure? (Hint: no.)
How do you update an existing model once a version is running?
What about bias in models? If models reflect patterns in data, and data have bias, won’t our models have bias too?
We’ll look at many of these topics throughout the semester.
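As a small illustration of why accuracy alone can mislead, the sketch below scores a "model" that always predicts the negative class on an imbalanced dataset. The labels are made up, and the helper functions `accuracy` and `recall` are written out by hand here rather than taken from any library.

```python
# Made-up labels: 1 = positive (e.g., has the condition), 0 = negative.
# Only 5 of 100 cases are positive, so the classes are heavily imbalanced.
y_true = [1] * 5 + [0] * 95

# A "model" that ignores its input and always predicts negative.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that the model identified."""
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_positives = sum(t == 1 for t in y_true)
    return true_positives / actual_positives


print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The model is right 95% of the time yet never finds a single positive case, which is why other measures of model quality matter.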
Class Schedule
Class Schedule (approximate, subject to change)
Week 1 (Aug 26, 28): Syllabus, Introduction to the course; TACC accounts and Onboarding
Week 2 (Sep 2, 4): Using the Class VM, Jupyter notebooks; Introduction to data analysis, Numpy
Week 3 (Sep 9, 11): Pandas, Matplotlib, Seaborn, Exploratory Data Analysis
Week 4 (Sep 16, 18): Introduction to machine learning, Linear Regression, Linear Classification. Assign Project 1
Week 5 (Sep 23, 25): Linear Classification Cont, Metrics for Model Quality
Week 6 (Sep 30, Oct 2): K-nearest neighbor, cross-validation; Improving Specific Classification Metrics, Decision Trees
Week 7 (Oct 7, 9): Random Forests, Ensemble methods, Project 1 Due; Boosting & Stacking, Model Pipelines
Week 8 (Oct 14, 16): Introduction to Neural Networks and Deep Learning; Introduction to Convolutional Neural Networks (CNNs), Assign Project 2
Week 9 (Oct 21, 23): CNNs Cont; MLOps
Week 10 (Oct 28, 30): MLOps Cont; Catch up and Exam Review; Project 2 Due
Week 11 (Nov 4, 6): In-class Exam; Introduction to Transformers; Assign Project 3
Week 12 (Nov 11, 13): Introduction to Transformers Cont; Hands-on Transformers
Week 13 (Nov 18, 20): Fine-tuning Transformers; Linear Workflows with LLMs and LangChain; Project 3 Proposal Due
Week 14 (Nov 25, 27): Introduction to Retrieval Augmented Generation (RAG); Thanksgiving Break
Week 15 (Dec 2, 4): Special Topics (e.g., Graph Databases; Non-linear Workflows with LLMs; Agentic Architectures; LLM Benchmarks)
Final projects (Project 3) will be due on the Final Exam day for our class (date TBD).
Before We Leave Class
1. Make sure you have an active TACC account and MFA pairing. You can check the status of your account by logging into the TACC User portal: https://portal.tacc.utexas.edu/
Go to the Account Profile (https://tacc.utexas.edu/portal/account)
If you need help with your account you can submit a ticket: https://tacc.utexas.edu/portal/tickets
2. Add your TACC account username to the Google doc spreadsheet shared in class.
3. Send an email to me and Anagha (jstubbs AND ajamthe AT tacc DOT utexas DOT edu). Include the following:
To: jstubbs, ajamthe @ tacc.utexas.edu
Subject: COE 379L
Body:
Please include the following:
1) Name
2) TACC username
3) EID
4) What do you want to get out of this class?
We will have VMs created for each person enrolled.
Future Classes
Bring your laptop computer to class for each lecture. Next time, we will make sure everyone can connect to their student VM.
Student Responses: Goals for the Course
We’ll update this section with responses from the class.
References and Additional Resources
[1] Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach (4th edition). Pearson, 2020. ISBN 9780134610993.