Alignment Research Engineer Accelerator.
An ML engineering program that provides the skills, tools, and environment for upskilling in technical AI safety.
What?
ARENA (Alignment Research Engineer Accelerator) is an ML boot camp focused on AI safety. The goal of the program is to develop skills in ML engineering and general software engineering practices. Participants work through ML programming exercises spanning deep learning fundamentals, transformers, mechanistic interpretability, reinforcement learning, and LLM evaluations.
Since admission to the official program is very limited, we are planning to run a (reduced) local instance of the ARENA program in Zürich this semester! In particular, we will hold biweekly meetings (in person and remotely) on Saturdays from 10 am until 5 pm, where participants work through the material autonomously and can discuss it and any questions that come up.
Who?
We welcome applicants who care about AI safety and have some background in math and some coding experience (particularly in Python). The program is open to individuals at various experience levels, from students to graduates. It is particularly suitable for those who want to level up their coding/ML skills in preparation for research engineering roles at prominent AI safety organizations.
When & Where?
During the spring semester of 2025, we plan to run biweekly meetings (in person and remotely) on Saturdays from 10 am until 5 pm. The exact dates will be announced soon.
Curriculum
Chapter 0 - Fundamentals
Before embarking on this curriculum, it is necessary to understand the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them.
In this chapter, you’ll learn about some coding best practices, become familiar with the PyTorch library, and build & train your own neural networks (CNNs and ResNets).
You can find the actual content at this page.
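To give a flavour of what this looks like in practice, here is a minimal sketch (an illustrative example, not one of the official exercises) of a PyTorch training step for a small CNN, using a dummy batch in place of real image data:

```python
# A minimal sketch of the kind of PyTorch workflow covered in Chapter 0:
# define a small CNN and run a single training step on dummy data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, n_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 14x14 -> 7x7
        return self.fc(x.flatten(start_dim=1))

model = SmallCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for MNIST-style images: (batch, channels, height, width).
images, labels = torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))

logits = model(images)
loss = F.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```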
Chapter 1 - Transformers & Mechanistic Interpretability
The transformer is an important neural network architecture used for language modeling, and it has made headlines with the introduction of models like ChatGPT.
In this chapter, you will learn all about transformers, and build and train your own. You'll also learn about Mechanistic Interpretability of transformers, a field which has been advanced by Anthropic's Transformer Circuits sequence and the work of Neel Nanda.
You can find the actual content at this page.
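As a small taste of what building a transformer involves, here is a simplified sketch of single-head causal self-attention in PyTorch (an illustrative example, not the ARENA reference implementation):

```python
# A simplified single-head causal self-attention layer, as a taste of the
# transformer internals studied in Chapter 1.
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.W_Q = nn.Linear(d_model, d_head, bias=False)
        self.W_K = nn.Linear(d_model, d_head, bias=False)
        self.W_V = nn.Linear(d_model, d_head, bias=False)
        self.W_O = nn.Linear(d_head, d_model, bias=False)
        self.d_head = d_head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq, d_model)
        q, k, v = self.W_Q(x), self.W_K(x), self.W_V(x)
        scores = q @ k.transpose(-2, -1) / self.d_head**0.5  # (batch, seq, seq)
        # Causal mask: each position may only attend to itself and earlier positions.
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        pattern = scores.softmax(dim=-1)
        return self.W_O(pattern @ v)

attn = SelfAttention(d_model=64, d_head=16)
out = attn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```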
Chapter 2 - Reinforcement Learning
Reinforcement learning is an important field of machine learning. It works by teaching agents to take actions in an environment to maximize their accumulated reward.
In this chapter, you will learn about some of the fundamentals of RL and work with OpenAI's Gym library to run your own experiments. You'll also learn about Reinforcement Learning from Human Feedback (RLHF) and apply it to the transformers you trained in the previous chapter.
You can find the actual content at this page.
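For a flavour of the RL exercises, here is a minimal random-agent episode loop. The sketch uses the Gymnasium package (the maintained successor to OpenAI Gym); older Gym versions use a slightly different reset/step signature:

```python
# A minimal random-agent loop in a Gym-style environment, as a taste of the
# RL experiments in Chapter 2. A real agent would replace the random action
# with one sampled from a learned policy.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random action instead of a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"episode return: {total_reward}")
env.close()
```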
Chapter 3 - LLM Evaluations
In this chapter, you will learn how to evaluate LLMs. We'll take you through the process of building a multiple-choice benchmark from scratch and using it to evaluate current models. We'll then move on to LM agents: how to build them and how to evaluate them.
You can find the actual content at this page.
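As a rough illustration of what scoring a multiple-choice benchmark involves, here is a toy sketch; `ask_model` is a hypothetical stand-in for whatever LLM API you would call, and the real exercises build both the benchmark and the querying/parsing logic from scratch:

```python
# A toy sketch of scoring a model on a multiple-choice benchmark (Chapter 3).
from dataclasses import dataclass

@dataclass
class MCQuestion:
    question: str
    choices: list[str]   # e.g. ["A) 4", "B) 5"]
    correct_letter: str  # e.g. "A"

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM API; here it just guesses 'A'."""
    return "A"

def evaluate(questions: list[MCQuestion]) -> float:
    correct = 0
    for q in questions:
        prompt = q.question + "\n" + "\n".join(q.choices) + "\nAnswer with a single letter."
        answer = ask_model(prompt).strip().upper()[:1]
        correct += answer == q.correct_letter
    return correct / len(questions)

questions = [MCQuestion("What is 2 + 2?", ["A) 4", "B) 5"], "A")]
print(evaluate(questions))  # 1.0 for this dummy 'model'
```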