The AI Safety Fundamentals Programme

A 6-week programme that introduces you to the fundamentals of AI safety and current issues.

What?

The AI Safety Fundamentals programme is designed to make the space of AI alignment and the governance of risks from advanced AI more accessible. Because the field of AI safety is so new, there are no textbooks or widespread university courses; we are looking to help fill this gap.

In this programme, we bring participants together with experts and knowledgeable facilitators to discuss 6 weeks of curated readings that together introduce the field. The programme involves introductory lectures, talks from guest speakers in the alignment space, and active weekly discussions with your group (4-5 people) and an experienced facilitator. Our curriculum is largely adapted from analogous programmes run at Cambridge University and Harvard University.


Who?

Anyone with an interest in ensuring that AI is developed safely, whether from a technical or a policy perspective, is welcome to apply. Please note that our curriculum is targeted at technical AI safety, so a basic knowledge of machine learning techniques is useful, but not strictly necessary! Several participants have successfully completed the programme with no prior background. You will have the opportunity to indicate your level of knowledge on the application form so that we can match you with people of similar backgrounds.


When?

Introduction evening: Wednesday, March 27, 18:30. Sign up now!

Application period: March 27 - April 5 (Apply here!)

Programme duration: 6 weeks, from April 15 to May 24. Each group has one meeting per week; the exact dates are determined for each group individually.


Curriculum

The programme curriculum consists of 6 weeks of readings and facilitated discussions. Participants are divided into groups of 4-6 people, matched based on their prior knowledge of ML and safety. (No machine learning background is strictly required, but participants will be expected to have some fluency in basic statistics and mathematical notation.)

Each week, each group and their discussion facilitator will meet for 1.5 hours to discuss the readings and exercises. Broadly speaking, the first half of the course explores the motivations and arguments underpinning the field of AGI safety, while the second half focuses on proposals for technical solutions. After the last week, participants will have the chance to explore a topic of their choice in depth if they wish, and/or join the ZAIA biweekly reading group.

Overview

The main focus each week will be on the core readings and one exercise of your choice from those listed, for which you should allocate around 2-3 hours of preparation time. Most people find some concepts from the readings confusing, but that's totally fine: resolving those uncertainties is what the discussion groups are for. The approximate time to read each piece in depth is listed next to it. Note that in some cases only a small section of the linked reading is assigned. In several cases, blog posts about machine learning papers are listed instead of the papers themselves; you're only expected to read the blog posts, but for those with strong ML backgrounds, reading the papers may be worthwhile.

If you've already read some of the core readings, or want to learn more about a topic, then the further readings are recommended; see the notes for descriptions of them. However, none of them are compulsory. Also, you don't need to think about the discussion prompts in advance; they're just for reference during the discussion session.


Key Topics

00

How do ML models today produce outputs?

This week focuses on the foundational machine learning concepts you'll need in order to understand the rest of the course. Those already experienced with machine learning may be able to just skim this week's exercises to check they're familiar with the content.
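Purely as a taster, and not part of the curriculum: the sketch below shows, with made-up weights and dimensions, roughly what "producing an output" means mechanically for a simple neural network, i.e. a forward pass from an input vector to class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network; in practice the weights would be learned from data.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # input dim 4 -> hidden dim 3
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # hidden dim 3 -> 2 output classes

def forward(x):
    """One forward pass: map an input vector to class probabilities."""
    hidden = np.maximum(0, x @ W1 + b1)           # linear layer + ReLU non-linearity
    logits = hidden @ W2 + b2                     # linear readout
    return np.exp(logits) / np.exp(logits).sum()  # softmax turns scores into probabilities

print(forward(rng.normal(size=4)))                # two probabilities that sum to 1
```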

01

How could AI transform society over the next few decades?

AI capabilities have advanced rapidly in recent years, across all types of models, and this trend looks likely to continue. This might bring about transformative AI: systems that would bring us into a new, qualitatively different future. In the shorter term, it also looks likely that existing AI systems will be deployed much more broadly.

02

What do we need to do to ensure AI systems do what we want, and why is this difficult?

In the last session we explored the potential impacts of transformative AI. In this session, we will explore the challenges involved in ensuring those impacts are positive, or at least in avoiding the most negative impacts of powerful AI systems.

03

Why do AI systems today mostly do what we want?

Last week we looked at the difficulties of getting very powerful AI systems to do what we want. Despite this, many AI systems today do seem to try to do what we want.

In this session, we'll dive into the main way today's AI systems achieve this: Reinforcement Learning from Human Feedback (RLHF).
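To give a flavour of what this involves, here is a heavily simplified, illustrative sketch of the first stage of RLHF as it is usually described: training a reward model on human preference comparisons. The model, data and dimensions below are all invented for illustration; in practice the reward model then serves as the training signal for reinforcement learning on the language model itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for a reward model: maps a response representation
# to a scalar score for "how much would a human prefer this response?".
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend data: representations of a preferred and a rejected response to the same prompts.
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)

# Bradley-Terry style preference loss: push the chosen response's score above the rejected one's.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```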

04

How might we scale human feedback for more powerful and complex models?

In the previous sessions, we explored the challenge of AI alignment and how RLHF is used to tame today's language models. However, we also learnt that RLHF has many limitations; one of the biggest obstacles is that it can be hard for humans to accurately judge complex tasks. This contributes to problems like sycophancy, deception and hallucinations.

05

How might we understand what’s going on inside an AI model?

As modern AI systems grow more capable, there is an increasing need to make their internal reasoning and decision-making more interpretable and transparent. This week we'll look at how people are analysing models' learned representations and weights through methods like circuit analysis, and teasing apart phenomena like superposition with dictionary learning.
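As a rough illustration of the dictionary-learning idea (not taken from the curriculum), the sketch below trains a small sparse autoencoder on made-up activation vectors: the dictionary has more features than the activation space, and an L1 penalty encourages each activation to be explained by only a few of them.

```python
import torch
import torch.nn as nn

# Illustrative sparse autoencoder on fake activations (real work would record
# activations from an actual model and train for many steps).
d_model, d_dict = 64, 256                       # dictionary is larger than the activation space
encoder = nn.Linear(d_model, d_dict)
decoder = nn.Linear(d_dict, d_model, bias=False)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

activations = torch.randn(128, d_model)         # stand-in for recorded model activations

codes = torch.relu(encoder(activations))        # sparse, non-negative feature activations
reconstruction = decoder(codes)
loss = ((reconstruction - activations) ** 2).mean() + 1e-3 * codes.abs().mean()  # reconstruction + L1 sparsity

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```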

06

How might we measure and mitigate the risks of deploying AI models?

If we are able to create extremely powerful AI systems without solving the alignment problem, rigorous technical governance approaches will be critical for mitigating the risks posed by these systems. Pre-deployment testing, AI control techniques, and global coordination approaches (such as pausing) could all play a role in limiting dangerous AI capabilities.

Resource

The full curriculum we will be following is given here:

Note that this is the standard baseline curriculum; your group and facilitator may wish to skip and/or supplement the suggested readings depending on your skill level and interests.


Interested in Facilitating?

Are you interested in leading discussions on AI alignment and helping to build the AI safety community? We’d love to have you join our team of facilitators. In general, our facilitators:

  • Have a solid background in a STEM-related field, or have done independent research/projects in ML

  • Are often studying for an MSc or PhD, though this is not a requirement

  • Have solid communication skills, and are able to motivate discussions and get participants to engage

  • Are approachable, friendly, and able to encourage participants to ask questions

Please fill out the form below and we will get in touch regarding facilitating opportunities.