Stanford CS521 - AI Safety Seminar
Jacob Steinhardt, UC Berkeley
May 18, 2022
Modern ML systems sometimes undergo qualitative shifts in behavior simply by “scaling up” the number of parameters and training examples. Given this, how can we extrapolate the behavior of future ML systems and ensure that they behave safely and are aligned with humans? I’ll argue that we can often study (potential) capabilities of future ML systems through well-controlled experiments run on current systems, and use this as a laboratory for designing alignment techniques. I’ll also discuss some recent work on “medium-term” AI forecasting.
More about the speaker: https://jsteinhardt.stat.berkeley.edu/
For more information about Stanford's Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai
0:00 Introduction
1:07 Rest of Talk
2:53 Reward Hacking: Motivation
3:16 Reward Hacking: Example
4:30 Reward Hacking: Example
10:00 Summary of Full Results
16:21 Reward Hacking: Summary
18:10 Making NLP Models Truthful
23:29 Contrastive Representation Clustering
31:19 Results on Unified QA
35:09 Caveat: True Answers Work Too
40:21 Forecasting: Motivation
42:21 Forecasting Competition
44:35 Forecasting Questions
46:36 Summary of Benchmark Forecasts
48:19 Results So Far
49:54 Forecasting: Lessons Learned
51:06 Forecasting Class
#artificialintelligence