Stanford CS 25 | Transformers United

Content

Since their introduction in 2017, Transformers have revolutionized Natural Language Processing (NLP). Now, Transformers are finding applications all over Deep Learning, be it Computer Vision (CV), Reinforcement Learning (RL), Generative Adversarial Networks (GANs), Speech, or even Biology. Among other things, Transformers have enabled the creation of powerful language models like ChatGPT and GPT-4 and significantly elevated the capabilities and impact of artificial intelligence.

In this seminar, we examine the details of how Transformers work, and dive deep into the different kinds of Transformers and how they're applied in various fields, with a focus on LLMs. We do this through a combination of instructor lectures, guest lectures, and classroom discussions. We will invite people at the forefront of Transformers research across different domains for guest lectures. Prerequisites: Basic knowledge of Deep Learning (should understand attention) or CS224N/CS231N/CS230. Please enroll for the course on Axess.

Logistics

Lectures are on Tuesdays from 10:30AM - 11:50AM Pacific Time in McCullough 115
Zoom: Link (Password: 252525; Note: Only works for those with Stanford email addresses)
Attendance: Following each lecture, submit a response to our Google Form.
Auditing: To audit, please join lectures using the Zoom link (if you are a Stanford student/staff/faculty). No need to email or contact us beforehand.
Discord: Join the class Discord server here for general discussion on Transformers and related topics!
Contact: If you have any questions about the course, contact us at cs25-aut2324-staff@lists.stanford.edu.

Recordings

Talks from previous quarters can be found here.

Previous Iterations

V1 (Fall 2021)
V2 (Winter 2023)

Instructors

Faculty Advisor

Schedule

The current class schedule is below (subject to change):

Date	Title	Description
Sep 26	CANCELLED	Seminars will begin on October 3rd!
Oct 3	Llama 2: Open Foundation and Fine-Tuned Chat Models Speaker: Sharan Narang, Meta AI	Meta AI recently released Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. These Llama 2 models outperform open-source chat models on most benchmarks we tested and, based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. In this talk, Sharan will describe the process for creating the improved Llama 2 family of LLMs and the techniques we used to convert these to high-quality, safe dialogue agents.
Oct 10	Low-level Embodied Intelligence with Foundation Models Speaker: Fei Xia, Google DeepMind	This talk introduces two novel approaches to low-level embodied intelligence through integrating large language models (LLMs) with robotics, focusing on "Language to Reward" and "Robotics Transformer-2". The former employs LLMs to generate reward code, creating a bridge between high-level language instructions and low-level robotic actions. This method allows for real-time user interaction, efficiently controlling robotic arms for various tasks and outperforming baseline methodologies. "Robotics Transformer-2" integrates advanced vision-language models with robotic control by co-fine-tuning on robotic trajectory data and extensive web-based vision-language tasks, resulting in the robust RT-2 model, which exhibits strong generalization capabilities. This approach allows robots to execute untrained commands and efficiently perform multi-stage semantic reasoning tasks, exemplifying significant advancements in contextual understanding and response to user commands. These projects demonstrate that language models can extend beyond their conventional domain of high-level reasoning tasks, playing a crucial role not only in interpreting and generating instructions but also in the nuanced generation of low-level robotic actions. Recommended Reading: "Language to Rewards for Robotic Skill Synthesis"; "RT-2: Vision-Language-Action Models"
Oct 17	CANCELLED
Oct 24	Generalist Agents in Open-Ended Worlds Speaker: Jim Fan, NVIDIA AI	Everything that moves will eventually be autonomous. In this talk, we will discuss the principles of how to build generalist agents, combine the power of LLMs with low-level control, and apply them to open-ended tasks in Minecraft and robotics.
Oct 31	Recipe for Training Helpful Chatbots Speaker: Nazneen Rajani, HuggingFace	There has been a slew of work on training helpful conversational agents using large language models (LLMs). These models draw upon diverse datasets, including open-source repositories, private data, and even synthetic data generated from LLMs like GPT-4. However, curating datasets for supervised fine-tuning involves critical decisions, such as defining task distributions, data volume, prompt length, and more. While prior research underscores the importance of data quality, the nuanced impact of these various dataset factors on model performance remains unclear. In this talk, I’ll present our approach to data curation for supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF) in the context of training helpful chatbots. Next, I will delve into the results of experiments that illuminate the nuanced effects of different dataset attributes on training chatbots for helpfulness. Finally, I will provide an overview of the current state of chatbot evaluation methodologies and highlight the existing challenges that shape this evolving field.
Nov 7	How I Learned to Stop Worrying and Love the Transformer Speaker: Ashish Vaswani	Ashish will present the motivations behind the Transformer and how it's evolved over the years. He will conclude with a few useful research directions. Ashish Vaswani is a computer scientist working in deep learning, who is known for his significant contributions to the field of artificial intelligence (AI) and natural language processing (NLP). He is one of the co-authors of the seminal paper "Attention is All You Need" which introduced the Transformer model. He was also a co-founder of Adept AI Labs and a former staff research scientist at Google Brain.
Nov 14	No Language Left Behind: Scaling Human-Centered Machine Translation Speaker: Angela Fan, Meta AI	Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200-language barrier while ensuring safe, high-quality results and keeping ethical considerations in mind? In this talk, I introduce No Language Left Behind, an initiative to break language barriers for low-resource languages. In No Language Left Behind, we took on the low-resource language translation challenge by first contextualizing the need for translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low- and high-resource languages. We proposed multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system in an open-source manner.
Nov 21	CANCELLED: Thanksgiving!	Enjoy the break!
Nov 28	Going Beyond LLMs: Agents, Emergent Abilities, Intermediate-Guided Reasoning, BabyLM Speaker: Instructors	In this talk, we will explore cutting-edge topics in AI, focusing on moving beyond a single monolithic Large Language Model (LLM) to autonomous agentic AI systems, and on the emergent abilities of LLMs as they scale up. We will also discuss different approaches to LLM intermediate-guided reasoning: methods of breaking down the reasoning process for text generation to arrive at a final answer (e.g. into a series of steps, such as chain-of-thought). Additionally, the talk will delve into BabyLM, an effort aimed at creating small yet highly efficient language models that can learn from amounts of training data similar to what human children receive. This talk will not only highlight the technical aspects of these developments but also discuss the ethical implications and future prospects of AI in our increasingly digital world.
Dec 5	Retrieval Augmented Language Models Speaker: Douwe Kiela, Contextual AI	Language models have led to amazing progress, but they also have important shortcomings. One solution for many of these shortcomings is retrieval augmentation. I will introduce the topic, survey recent literature on retrieval augmented language models and finish with some of the main open questions.