New Generative AI Course Brings the Cutting Edge to the Caltech Classroom

Class in session. Courtesy Georgia Gkioxari

This year, social media channels were suddenly flush with artistic images generated by the Midjourney software, which builds artworks based on nothing more than a text prompt. Text-generating software such as ChatGPT, meanwhile, can write resumes, compose poetry, and even help doctors communicate with their patients. These kinds of generative artificial intelligence have produced headlines ranging from the anxious (“How Could AI Destroy Humanity”) to the breathlessly enthusiastic (“Why AI Will Save the World”).

But while the world grapples with generative AI as a mainstream phenomenon, that phenomenon is already changing the way researchers do science and develop technology. In recognition of that reality, Caltech has begun offering a unique course on the leading edge of generative AI technology, EE/CNS/CS 148: Large Language and Vision Models. This spring, the course let students peer behind the curtain of generative AI to see how it really works and learn how to make it work for their needs.

“We wanted to bring that technology to Caltech students,” says Georgia Gkioxari, Caltech assistant professor of computing and mathematical sciences and electrical engineering and a William Hurt Scholar, “to teach our students how to think about all these new advances beyond what they're being used for right now to think about how we can extend them for all the scientific applications that people work on here at Caltech.”

Gkioxari taught the course this spring and says she will continue teaching it each spring alongside Professor of Electrical Engineering Pietro Perona, who had long taught an introductory computer vision course and decided to revamp that offering when Gkioxari joined the Caltech faculty in January, bringing with her industry experience accumulated during a six-year tenure at Facebook/Meta AI.

“I was feeling like I was falling behind and not keeping up with all of these new techniques [in generative AI],” Perona says. “When Georgia came, I said, ‘OK, she’s smart, very energetic, she knows more than I do. If I teach a class with her, I will learn something.’”

What is Generative AI?

AI is an umbrella term and a somewhat colloquial one, according to Gkioxari. She, Perona, and their colleagues think of these technologies as forms of machine learning: software that teaches itself from large amounts of data, rather than artificially intelligent software that in any way “thinks.”

“We would like to achieve true artificial intelligence,” she says. “But in the context of what we’re teaching and what we’re working on as a community today, I would call it machine learning or data-driven statistics.”

Perona agrees: “AI will come when our machines are able to draw inferences and reason in ways that go beyond the training data. We see very little of this happening right now.”

While specialized programs such as ChatGPT and DALL-E2 deal with different kinds of data, text in one case and images in the other, the technology underlying generative AI is the same: all are based on deep neural networks.

Although these networks are algorithms running on high-powered computers, they are conceptually inspired by the nerve cells, or neurons, in our brains; each of our neurons receives electrochemical inputs from other neurons. Depending on the number and frequency of these inputs, the electrical charge of the nerve cell can be reversed, almost like flipping a switch, which sends an electrochemical signal to other neurons down the line.

In a deep neural network, there are multiple layers of neuron-like computational units that receive numerical inputs. Each unit multiplies every input by a numerical constant called a weight, sums the results, and passes the total on to the units in the next layer.

On a basic level, these are simple calculations done on a massive scale, Perona says. “And it turns out that these simple calculations, when you combine them or composite them, achieve very interesting properties.”
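
To make that description concrete, here is a minimal, illustrative sketch of those layered weighted sums in Python. It is not code from the course; the layer sizes and random weights are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights):
    """One layer of units: each unit multiplies every input by one of its
    weights, sums the results, and passes the total on to the next layer."""
    # Real networks also apply a simple nonlinearity to each sum; a common
    # choice (keep positive values, zero out negatives) is shown here.
    return np.maximum(0.0, weights @ inputs)

# Arbitrary placeholder sizes: 4 inputs -> 8 hidden units -> 3 outputs.
w1 = rng.standard_normal((8, 4))
w2 = rng.standard_normal((3, 8))

x = rng.standard_normal(4)     # numerical inputs to the network
hidden = layer(x, w1)          # first layer of units
output = layer(hidden, w2)     # second layer; stack many more for a "deep" net
print(output)
```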

Researchers train deep neural networks by presenting them with a vast number of examples, such as images of different types of plants, foods, or animals. The network itself adjusts the weights that help determine the calculations in each layer, with the shifting weights representing the self-learning process of the system.  

Once a network has been trained and users set it upon a task, the weights no longer change, according to Perona. “You input some data and obtain some meaning as an output. For instance, it labels this as a croissant or that as a frog.”

Unlike the human brain, Perona adds, deep neural networks don’t learn and perform at the same time. “Our brain constantly learns,” he says. “But for these machines, there are two phases: One is training or learning, and that's painful, takes a long time, lots of data. Then, once I've trained the network, I use it for a task.”
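
As a rough illustration of those two phases, the toy sketch below (again, not course material) trains a single neuron-like unit on made-up data by repeatedly nudging its weights, then freezes the weights and uses the unit on a new input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up training data: inputs X and targets y that follow y = 2*x1 - 3*x2.
X = rng.standard_normal((100, 2))
y = X @ np.array([2.0, -3.0])

w = np.zeros(2)                       # weights start out uninformative
lr = 0.1                              # step size for each adjustment

# Phase 1: training. The weights shift to reduce the error on the examples.
for _ in range(200):
    pred = X @ w
    grad = X.T @ (pred - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad

# Phase 2: use. The weights no longer change; new inputs just produce outputs.
new_input = np.array([1.0, 1.0])
print(w, new_input @ w)               # w ends up close to [2, -3]
```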

Training a neural network to recognize objects in an image is something machine learning has gotten good at over the past decade. Generative AI goes a step further: It employs more layers than earlier artificial neural networks and can, for example, recognize the terms in a string of text and then produce an original output in response. Tell DALL-E2 to paint Albert Einstein riding a dinosaur, and it will render an original image even though it was never trained on any image combining dinosaurs and Einstein.

“What you’re asking about was not seen in the training data, and yet it can answer,” Gkioxari says. “That is something that has taken everyone by surprise and has created this new excitement about this technology.”

Caltech’s Leading-Edge AI Curriculum

While the curriculum for EE/CNS/CS 148 is at the graduate level, the class is open to undergraduates, although Perona and Gkioxari tell those students that prior experience with neural networks and programming is beneficial.

“This class is open to all Caltech students insofar as they are curious and they want to learn something really useful,” Perona says. “We did our best to make it not too difficult to qualify for the class, although the class is fairly advanced.”

That was the experience of Chase Blagden (BS ’23), who studied information and data science while at the Institute. He says he found the pace of the class brisk and the material challenging but totally fascinating.

“I never personally skipped a lecture because it was actually one of the classes I was most looking forward to going to,” Blagden says. “It’s one of the best courses I’ve taken at Caltech.”

The course is organized into roughly four sections. It begins with a review of the basics of neural networks and the programming frameworks needed to work with them. Students then dive into a type of neural network architecture called a transformer, the building block of large language models: deep neural networks trained to recognize patterns in natural language, which power programs like ChatGPT.
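
The key operation inside a transformer is attention, which lets every position in a sequence weigh every other position when computing its next representation. Below is a minimal sketch of scaled dot-product attention in numpy; the token vectors and projection matrices are random placeholders, not learned weights from any real model.

```python
import numpy as np

rng = np.random.default_rng(2)

def attention(Q, K, V):
    """Scaled dot-product attention: each position mixes the values V
    according to how well its query matches every key."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                              # weighted blend of values

# A toy "sentence" of 5 tokens, each represented by an 8-dimensional vector.
tokens = rng.standard_normal((5, 8))

# In a real transformer, Q, K, and V come from learned projections of the
# tokens; here the projection matrices are random placeholders.
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(out.shape)   # (5, 8): a new representation for each token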

In the next section of the class, students use a transformer to build their own small version of a ChatGPT-like program. The final section of the class then pivots to focus on diffusion models, the technology behind the DALL-E2 and Midjourney applications.
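
Diffusion models work by learning to undo noise. The toy sketch below, a rough illustration rather than anything from the course, shows only the forward “noising” step applied to a made-up data vector; real systems train a network to run this process in reverse, starting from pure noise, to generate images.

```python
import numpy as np

rng = np.random.default_rng(3)

x0 = rng.standard_normal(8)            # stand-in for a clean data sample
T = 1000                               # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)     # noise schedule (a common choice)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factors

def noise_to_step(x0, t):
    """Sample x_t from x_0: sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x_mid = noise_to_step(x0, 500)    # partially noised
x_end = noise_to_step(x0, T - 1)  # nearly pure noise
print(np.std(x_mid), np.std(x_end))
# Training teaches a network to predict the added noise from x_t; generation
# then reverses the process step by step, starting from random noise.
```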

During the term, Perona also leads a discussion on ethics in AI, considering questions such as how human racial biases can become embedded in algorithms and who is responsible for the behavior of AI systems, as well as a more general discussion of how to think about new and disruptive technologies.

“Agriculture is an extreme example. The farmers took over the world and the hunter-gatherers basically got wiped out,” Perona says. “Something that is very good for some may not be good for everyone, so you have to think about it.”

Perona remains an optimist and sees a proper understanding of both social issues and AI technology as an opportunity to improve areas in which purely human institutions have fallen short. “It's much easier to fix algorithms than people,” he says. “The principle is: think ahead about what could go right and what could go wrong, what to watch for. Know how to measure effects and then be proactive and be involved, because it can have a big positive impact in society.”

The course’s mix of theory and hands-on experimentation is intended not simply to teach students how to implement a ChatGPT-type program but to help them understand how it works and develop new ideas based on it, according to Suzanne Stathatos, a second-year PhD student studying with Perona who served as a teaching assistant for the class.

“There’s a value in practical application, and then there’s a value in teaching how to think, and I think that this class does both of those,” Stathatos says. “Students start with a vague understanding, and they end in a place where they can read something in a paper on machine learning and implement it themselves.”

Beyond the technical skills, Stathatos says, students can learn practical lessons based on Gkioxari’s industry experience.

“One thing that I really liked about the class is Georgia feeds in little quizzes throughout the lecture,” Stathatos says. “She said those are the kinds of questions that appear in interviews for machine learning engineering jobs, so you’re also getting prepared to have this information at the tip of your fingers to do well for an interview for that kind of field.”

Gkioxari and Perona also invited guest lecturers such as Ross Girshick, a research scientist at Facebook AI Research, who led a team that recently developed Segment Anything, a vision model that can be used to identify and segment objects in images.

 “[Segment Anything] is state of the art, and it came out, I think, in the first two weeks of class,” Blagden says. “It was super exciting that they could get someone like Girshick to lecture to the class about this essentially brand-new, weeks-old thing.”

Ultimately, Gkioxari and Perona hope the class will help to disseminate knowledge of machine learning and generative AI techniques throughout the sciences as students go on to pursue different fields of research in their careers.

“It will help our colleagues in science and engineering be more effective, more efficient,” Perona says. “Slowly but surely, AI, machine learning, is becoming a foundational topic in the sciences.”

And that’s a foundation Perona expects graduating students will begin to build upon right away as they go on to academic posts, pursue further studies in the sciences, or enter industry, as Blagden plans to.

“I'm going to be working as a machine learning research engineer at Aurora Flight Sciences, which is a Boeing subsidiary,” he says. “But this class has inspired me. I kind of want to go back to grad school.”

The class culminates in a final project proposal, a paper in which each student describes an AI research project they would like to undertake. Perona and Gkioxari plan to help the students with the most interesting proposals as they begin work on their projects over the summer.

“These students will be able to use what they learned in class to change the world yet again,” Perona says.