Full Transcript

OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions

58:47 · 13,171 words · ~66 min read · English · Transcribed Apr 11, 2026
0:00

I definitely agree that continual

0:01

learning is really the thing. It's

0:03

really the thing that we're building.

0:04

But I don't really think this is like a

0:05

problem that's ignored and and off the

0:07

path of what we're doing currently. I

0:08

think it is what we're working toward.

0:09

>> What are like the other research areas

0:11

within alignment that you're paying

0:12

attention to or that you think are

0:13

promising?

0:13

>> A lot of the like longer-term challenge

0:15

with alignment is about generalization.

0:18

What are the values that the model falls

0:19

back on?

0:20

>> What are the things that you need to

0:21

figure out to be able to really make

0:23

models work well in some of these other

0:25

spaces?

0:25

>> I come back to this.

0:26

>> Jakub Pachocki is the chief scientist of OpenAI.

0:29

I think literally one of the most

0:30

important people on the planet. And

0:32

today on Unsupervised Learning, I got to

0:34

ask him literally everything that I've

0:36

been thinking about and I know a bunch

0:38

of people in the ecosystem have too. We

0:40

talked a lot about model progress,

0:42

what's required to make long-running

0:43

agents work, as well as the really

0:45

interesting work OpenAI has done in the

0:46

AI for science world and the progress he

0:49

sees in that over the next years. We

0:51

talked a lot about how companies should

0:52

be thinking about model building in this

0:53

moment, when they should be doing

0:54

reinforcement learning, how they should

0:56

be thinking about the evolution of

0:57

harnesses and the impact that will have.

0:59

We hit on a lot of his really

1:00

interesting research, including the work

1:02

he's done around alignment, the work

1:04

that OpenAI broadly has done around math

1:06

competitions. And we also talked about

1:07

this focusing moment in OpenAI and what

1:09

it means for the research organization

1:10

and how he runs his team. Literally just

1:13

such an awesome opportunity to talk to

1:15

someone who is driving so much of the

1:18

change that has revolutionized this

1:20

space in the world. I hope folks enjoy

1:21

this wide-ranging conversation as much

1:23

as I did.

1:26

I feel like you are the perfect person

1:29

to talk to about all the questions

1:30

everyone has in the ecosystem. Uh what's

1:32

you know happening with model progress.

1:34

A lot of companies are thinking about

1:36

how they should be building things based

1:37

on what's happening with the models. A

1:39

lot of people at a societal level are

1:40

thinking about the impact AI is going to

1:42

have on science and broader society. Uh

1:44

and you've been at the forefront of the

1:46

space for pretty much every generation

1:48

of uh of improvement uh these past years

1:50

and so really excited to have you on the

1:52

podcast.

1:52

>> Happy to be here.

1:53

>> I think I'll start with one of the mo

1:54

the juiciest things you said which is

1:56

you know four months ago I think you and

1:58

the OpenAI team talked about aiming for a

2:00

system with research level intern

2:02

capabilities by September of this year.

2:05

So coming up uh I think that's uh what 6

2:07

months from now. and then a more fully

2:09

automated AI researcher by March 2028.

2:12

And so I guess you know checking in four

2:13

months later, how are you feeling about

2:15

those timelines?

2:16

>> Yeah, I think you know over I think over

2:18

over the last months I think like the

2:21

change that really happened is we've

2:22

seen this explosive growth of coding

2:25

tools.

2:26

>> Yeah.

2:26

>> Um

2:27

>> it's an understatement. Yeah, we've

2:29

definitely like really kind of gone um

2:32

to a place uh in OpenAI where we use

2:34

Codex for the um for the majority of um

2:38

you know actual coding. Um and so I

2:42

think I think for most people like the

2:43

kind of the act of programming has has

2:45

has changed quite a bit. Um

2:49

so I definitely see this as a signal

2:50

that like you know something here is on

2:52

track. The other kind of like very

2:54

interesting update over the last few

2:55

months to me has been the progress on

2:57

the math research capabilities. Uh also

3:01

the results we've kind of seen in

3:03

physics in other fields. I think I think

3:05

this kind of level of capability this

3:07

level of like ability to provide insight

3:09

when combined with

3:11

ability to access infrastructure ability

3:13

to use maybe uh more compute at test time

3:16

that's something that Codex is using

3:17

currently uh and very strong improvement

3:21

in general intelligence which I also

3:23

expect over over the next couple of

3:24

months. Yeah, it's something we're still

3:27

very much planning for and very focused

3:29

on.

3:29

>> And how do you like know when you've

3:31

you've gotten there? like what's like a

3:32

a workflow you might look to to say hey

3:34

okay I think we've got these you know

3:36

research intern level capabilities

3:38

>> the the way I would distinguish you know

3:39

a research intern from from full

3:42

automated researcher uh is um the kind

3:45

of span of time that that we would have

3:48

it work um mostly autonomously or the

3:51

kind of like specificity of the task

3:53

that has to be given so I don't expect

3:55

uh you know we'll have systems where you

3:56

kind of just tell them oh like you know

3:58

go improve your model capability go

4:00

solve alignment uh and you know and

4:03

they will do it not this year you know I

4:05

think we might get there at some point

4:07

uh but I think for like more specific

4:09

technical ideas like I I have this

4:11

particular idea how to improve the

4:12

models how to like you know run this

4:13

evaluation differently I think I think

4:16

we have the pieces that we mostly just

4:18

need to put together. Karpathy released

4:20

you know a pretty viral version of of uh

4:23

using some of these models to you know

4:24

improve some of his uh you know

4:26

obviously way less complex models than

4:28

what you guys are building here but did

4:30

that feel like generally in this uh you

4:32

know in the spirit of of uh some of what

4:34

these tools might look like.

4:36

>> Yeah, I think it's in the spirit. Yeah,

4:37

I mean I I I expect it to look like a

4:39

pretty continual evolution uh from kind

4:43

of where Codex is now. I think towards a

4:46

bit more autonomy uh running for a

4:48

longer time. Um but yeah, I I think I

4:52

think we'll see a lot of this sort of

4:53

application. I think in general we'll

4:54

see we'll see like more autonomous and

4:57

higher compute use of these models for

4:58

different things. you mentioned kind of

5:00

like the math and physics side and

5:01

obviously you've had these really

5:03

impressive breakthroughs uh in math on

5:05

you know uh some interesting like

5:06

different kinds of competition uh you

5:08

know problems maybe you know I think for

5:11

our listeners it like intuitively makes

5:12

sense how progress in coding directly

5:14

translates to something like you know

5:16

helping with AI research how does like

5:18

math and physics progress like also tie

5:20

into this

5:21

>> the the biggest role that like u you

5:23

know focusing on these math benchmarks

5:25

has played for us as as a general yeah

5:28

like benchmark and and and and a

5:31

north star for like how to improve this

5:32

technology. Like math is very

5:34

measurable, right? It's much easier to

5:36

tell whether you've actually solved the

5:38

math problem than whether you've even

5:39

like produced a good uh you know piece

5:41

of software and also it can get very

5:43

hard right so you can have things where

5:44

like it's very definite whether you've

5:46

solved them but it can be like

5:47

arbitrarily pretty much hard to to

5:48

actually solve them. You know, I would

5:50

say like up until not too long ago like

5:53

um you know, my perspective has been

5:54

like well okay like we you know our

5:56

models are not you know maybe able to

5:58

solve like simple math problems. Okay,

5:59

our models are able to solve simple

6:00

enough problems but are not able to

6:02

solve like IMO level problem. So clearly

6:03

there is just like a gap in just like

6:05

this uh you know intelligence of these

6:07

models that like that that is very

6:09

measurable very

6:11

you know very easy to run at. It's very

6:13

clear what we need to do and you know

6:16

and this has been kind of our north star

6:17

for like reasoning models and so forth.

6:19

Now of course um that is changing quite

6:22

a bit right and we are um you know we

6:24

have kind of reached these milestones

6:26

that we've been working towards of like

6:29

yeah IMO gold level, solving IMO problem

6:31

six and you know and making forays into

6:34

research level mathematics

6:36

um and you know from this point I think

6:38

I think there still is uh you know there

6:41

definitely still is utility like

6:43

continuing to measure progress on this I

6:45

think there's also like you know there's

6:47

definitely like transfer that that you

6:49

can get from like getting better at

6:50

mathematical reasoning to getting better

6:52

at AI research. You know, a lot of our

6:55

uh best researchers uh are uh you know

6:58

mathematicians by training or from

6:59

other kind of theoretical fields. But

7:01

definitely we are uh you know we are

7:03

very much uh changing how we think about

7:09

you know these north stars and we are

7:10

very focused on how the models the next

7:14

models that we're producing are actually

7:16

useful in the real world you know useful

7:19

you know especially for AI research but

7:20

also for other kind of economically

7:22

valuable activities and for other uh

7:24

fields of science uh and especially

7:27

maybe more applied sciences. And the

7:29

reason for this shift is because we

7:30

believe the models are now capable

7:32

enough, not as smart as people in all

7:34

ways, but capable enough to actually

7:37

materially change the economy, change

7:39

how things are done. And so, uh, yeah,

7:41

we feel a lot of urgency about that.

7:43

>> In the early days, uh, picking a domain

7:45

like math that is so, uh, hard to solve,

7:48

but then easy to verify whether you

7:49

did it, like it's kind of the the

7:50

perfect place to get started. And I

7:51

think code obviously shares a lot of

7:53

attributes to that. You know, uh

7:55

possible to check uh and verify and

7:57

great for reinforcement learning. I

7:59

think one question that a lot of people

8:00

are are thinking about is okay, we've

8:02

seen reinforcement learning work

8:04

incredibly well in these domains where

8:05

you can verify it rather easily. A lot

8:07

of, you know, valuable tasks in the

8:09

world, medicine, law, finance, you know,

8:12

there's some level of of the ability to

8:13

do that, but it's certainly not to the

8:14

same extent that math and code are. And

8:17

so I think a lot of people are trying to

8:19

figure out, you know, are we going to

8:20

see similar improvements? You know,

8:22

obviously code and math the the rates of

8:24

improvement have been so astronomical

8:25

and shocking.

8:26

>> Yeah, I definitely expect so. Um I think

8:29

an interesting duality that we think

8:31

about a lot is um you know for this more

8:35

general task for these tasks are kind of

8:37

harder to evaluate. They share a lot

8:39

of common uh commonalities with um just

8:43

longer horizon tasks, right? Because if

8:45

you think about even like a very well

8:46

specified math or coding problem again

8:48

like if it's it's something that you

8:50

need to work on for like a year then uh

8:52

you know even it's very clear what the

8:54

criteria of success are in the long term

8:56

like what to do on your first day of

8:58

working on it is a pretty open-ended

9:00

problem. Yeah. And so I I kind of

9:02

believe this these difficulties coincide

9:04

and they're very clearly the next the

9:06

next frontier

9:08

uh for for how these systems develop.

9:10

And I think we've definitely seen very

9:12

encouraging signs both on just like our

9:13

ability to scale RL on these more

9:15

general domains. And I I think also like

9:18

we can we can scale um

9:21

efforts that that that that that's a lot

9:23

of promise.

9:24

>> In these other domains, it feels like

9:27

one of the hardest things to know is

9:28

just what was success in a task, right?

9:30

And you can imagine you know there's

9:32

going to be you know whatever the

9:34

problems you are that are facing code or

9:35

math that are short-term tasks and then

9:37

longer-term tasks feels it will be

9:39

amplified in the space that is you know

9:41

outside of those right where a

9:42

short-term uh legal task or medical task

9:44

may be harder to run thousands of

9:46

iterations on right and figure out you

9:48

know was that done correctly and then

9:50

those longer term tasks like even harder

9:52

I'm curious like how you even

9:53

conceptualize that research challenge

9:55

like what are the things that need to be

9:57

that you need to figure out to be able

9:58

to to really make models work well in

10:01

some of these other spaces.

10:02

>> Yeah, I think I think I I come back to

10:04

this reality of like how do we make the

10:06

models work for a very long time and how

10:09

do we teach them to evaluate kind of

10:11

partial progress.

10:12

>> Yeah. I mean I think if if you look at

10:14

like even outside of RL like like where

10:17

that sort of progress on longer horizons

10:19

is coming from right like I mean as the

10:21

models kind of become more consistent

10:24

from just like pure supervision in

10:25

pre-training

10:27

um they uh they gain some idea of like

10:30

you know oh what what does like a good

10:32

partial artifact here look like and so I

10:33

think I think even if we weren't like

10:36

scaling RL very meaningfully we would

10:38

see an elongation of these horizons over

10:40

time yeah it's definitely um you know a

10:44

research challenge to like to figure out

10:46

how to like leverage this new ideas from

10:48

RL and so forth to to apply this to

10:50

general domains. But I'm quite

10:51

optimistic about that.

10:52

>> Yeah. And it's interesting. It sounds

10:53

like part of your mental model is like

10:55

the models themselves being able to

10:56

check progress with some at some sort of

10:58

cadence that is, you know, reliable

11:00

enough from the outside at least. It's

11:01

not totally clear if we've seen like

11:03

generalization in RL yet. feels like we

11:05

yeah clearly you seem to have some

11:06

techniques that really optimize models

11:08

around whatever we choose to focus on

11:10

but it's like almost feels like an older

11:12

school version of of ML of like one one

11:14

thing at a time is that like you know I

11:17

guess would you agree with that

11:18

characterization and like you know how

11:19

do you kind of see this this current

11:21

climate

11:21

>> well we are buying a lot of compute

11:25

right because we we don't I mean we

11:29

still believe the bitter lesson and we believe

11:31

you know more than ever to some degree

11:33

yeah we've seen you know, new techniques

11:35

and I think new ways to scale, but like

11:38

that that is kind of the the lens

11:39

through which we've been viewing things.

11:41

Yeah, I think there is a certain amount

11:44

of

11:46

complexity that

11:49

we need to grapple with and kind of

11:51

everyone needs to grapple with because,

11:53

you know, we're no longer really like

11:56

purely building like um um you know,

12:00

brain in the sky that's completely isolated

12:02

from the real world, right? Like if you

12:04

actually you know if you want this model

12:06

to do like medical research if you want

12:08

it to cure cancer at some point it needs

12:10

to like learn about the real world in a

12:12

meaningful way you know maybe conduct

12:13

some experiment and learn from its

12:15

results and for that you you need to

12:17

figure out how to actually connect it

12:19

right and that is going to involve

12:20

something that is yeah that that goes in

12:22

the direction you described but I I

12:24

don't think that goes counter to

12:26

actually scaling the the like finding

12:29

and scaling the simple algorithms that

12:31

that we've been developing. I feel like

12:32

I talk to a lot of companies and like I

12:34

one of the main questions everyone seems

12:36

to be asking these days is like should

12:38

we be doing you know our own

12:40

reinforcement learning like take an open

12:42

source model and like we have some data

12:44

on a task that people do. um we have

12:46

evals cuz we know our domain pretty well

12:48

like is this something that makes sense

12:50

for us to do or like should we just wait

12:52

for the models to continue to get better

12:53

at at some of these things. you know

12:54

what advice would you guess would you

12:56

give for like the many builders that

12:57

listen to the podcast as they think

12:59

through you know uh the extent to which

13:00

they invest on the on the reinforcement

13:02

learning side reinforcement learning

13:04

definitely can be a very data efficient

13:06

way to like really improve the model at

13:08

some sort of task right there is a much

13:10

more data efficient way of learning that

13:12

we know right which is like learning in

13:14

context right and this is maybe the most

13:16

fundamental way that people you know

13:17

teach these models you just prompt them

13:19

with like examples with with with

13:20

instructions for what you want I expect

13:22

that learning is going to get much

13:25

better over time. And so I think it

13:27

definitely really matters that the

13:28

models can adapt to your context. They

13:31

can adapt adapt to kind of the the kind

13:33

of tasks you care about. So I think that

13:34

will be very important. I'm not sure if

13:36

like you know replicating the kind of

13:38

current RL pipeline is going to be like

13:41

the right way to go about it. But yeah,

13:43

it's definitely a problem that that

13:45

we're thinking about.

13:46

>> Yeah. So it's almost like yeah you still

13:47

have to do the work like you still

13:48

should you know figure out what the evals

13:49

are that matter gather the data the

13:51

examples but like it may just turn out

13:52

in the future you're far better off just

13:54

feeding that into this context than

13:55

trying to like do anything on on you

13:57

know your own model. Yeah, I think I

13:59

think that's quite plausible. And I

14:01

think that like you know obviously

14:03

people have seen the success of of tools

14:04

like Codex which I know you know you've

14:06

obviously been a key part of and um and

14:08

wondered like you know hey do we need to

14:10

build like our own kind of you know

14:12

should we build our own harnesses or our

14:13

own ways of of using these things or you

14:16

know uh for for our own domains whether

14:18

it's like you know uh legal or finance

14:21

or or healthcare or do we kind of just

14:22

like take the harnesses that the large

14:24

models do um and and kind of use them

14:27

within you know with with the context

14:29

that we have. uh any any thoughts around

14:32

like that

14:32

>> like the implementation of the harness

14:34

shouldn't really be a limitation for a

14:35

very long time. I think we'll be able to

14:37

get like much more general harnesses

14:39

that people can use for uh for all sorts

14:42

of other domains. I mean I think Codex

14:43

is pretty good actually if you try using

14:45

it for things beyond coding.

14:46

>> That's so interesting. Like a much more

14:47

general harness being something that's

14:49

almost like uh adaptive to or like just

14:51

works across whatever the you know

14:53

specific set of tools you have in your

14:55

domain or specific set of things you

14:56

want to expose to the model.

14:59

>> Yeah. I mean I I think and you know I

15:01

think it's also worth thinking about

15:02

like you know why like you know what

15:04

what what is kind of the kind of

15:07

ultimate interface that we want to

15:08

interact to the model with. So, so the

15:09

model gives some the models gives some

15:11

UI hard forensicness, right? They can

15:12

build their own UIs. They can kind of do

15:14

things that uh you know people would

15:17

find very time-consuming. Um but I yeah I

15:20

definitely think there is also just like

15:22

a lot of space to kind of enable the

15:24

models to access like the current

15:26

interfaces that we use for for people

15:27

right. So I think like we want to have

15:30

um

15:32

um you know AIs on Slack for example or

15:35

that that are kind of plugged into our

15:36

our context and uh and yeah and are able

15:40

to to learn from it and a able to kind

15:42

of yeah to realize this existing things

15:44

right so definitely like there is some

15:48

meet in the middle here but definitely I

15:49

believe like long-term like uh you know

15:52

like by default the AI should kind of

15:54

meet you where where you are uh and if

15:57

Not that would be because it kind of it

15:59

has new abilities, not because it has

16:00

limitations.

16:01

>> Yeah, it's an interesting point that

16:02

basically today it feels like these

16:04

harnesses are so bespoke to certain

16:05

environments, but like over time as you

16:07

add more and more skills and tools and

16:09

models can navigate uh across those

16:11

effectively, it's like there just be a

16:13

general like you know the way humans

16:14

have uh that that makes a tremendous

16:17

amount of sense. I guess I'm curious

16:19

like you know you uh obviously I'm I'm

16:22

sure like every day you see kind of

16:23

crazy stuff on the research side at this

16:25

point like what are the milestones that

16:27

are like still meaningful to you as you

16:29

think about like it would be pretty

16:30

crazy if I you know uh did a run one day

16:33

and saw like X or Y like what are the

16:35

things you're paying most attention to?

16:38

>> Yeah. Um I mean at this point it really

16:41

is about um

16:44

research right like is it about it is

16:46

about can the model discover new things

16:49

can it execute on like a longer horizon

16:53

um research problem.

16:54

>> It's almost like looking for some sort

16:56

of insight that you're like oh someone

16:57

on my team had come up with that that

16:58

would I've been pretty intrigued by

17:00

Yeah, we we've actually had like some

17:02

minor uh um but I think I think quite

17:05

impactful ideas uh come from uh even

17:08

like GPT 5.2 Pro uh that that we're

17:11

using internally. But you know, I think

17:13

it's still very very small compared to

17:14

where I expect it to be.

17:16

>> Yeah, I mean it seems like almost

17:17

inevitably like these models are going

17:18

to get better. They will be used in

17:20

research. They'll be used in science

17:21

more generally. You're like one of the

17:23

first people interacting directly with

17:24

these models as like research partners

17:26

almost at this stage. anything like

17:28

you've learned around the right way to

17:29

do that or do you think about like what

17:30

a research organization you know as

17:33

these models continue to get better

17:34

might look like? Yeah, I I I think we're

17:37

definitely kind of at um at a transition

17:39

point where kind of the short-term

17:42

immediate quality of the model uh is

17:46

about to be a quite determining factor

17:48

for the pace of our research progress

17:50

because the models are going to drive a

17:52

lot of that. And so that definitely

17:54

requires um you know rewiring some

17:56

intuitions about how to um run a

17:59

research organization. Uh you know

18:00

normally you kind of try to not be too

18:02

focused on like immediate quality. you

18:04

try to be much more focused on like the

18:06

longer term. I think we have like a lot

18:08

of very exciting uh stuff queued up that

18:11

we are kind of working towards but I

18:13

feel a lot of urgency to kind of yes to

18:16

actually

18:17

>> uh execute on it and to actually use this

18:19

advances in model intelligence to um

18:22

accelerate research on the AI and

18:24

especially AI alignment. Yeah, it's such

18:26

a fascinating point because I've heard

18:27

you talk before about running a research

18:28

organization and I feel like in the past

18:30

it was like giving people the space to,

18:32

you know, pursue a lot of things that

18:33

weren't like directly, you know, hey,

18:35

this is for a month or two months of

18:37

progress, but it's like what are the

18:38

ideas that are really going to drive

18:39

things forward, but it makes total sense

18:41

that we're in a time now where uh you're

18:43

like, look, everything we do will be so

18:45

much better if we just focus on this in

18:47

the in the short term and make it

18:48

better. It must be like fascinating to

18:50

navigate uh that and like these maybe

18:53

further off research ideas at the same

18:54

time and like running an organization.

18:56

>> Yeah. Yeah. It's definitely Yeah, it's

18:58

definitely something we we spend a lot

18:59

of time on with Mark nowadays. Yeah.

19:02

>> Right now you have um you know a a ton

19:05

of compute as a company, but you

19:06

obviously you have great scaling laws on

19:07

the pre-training side, you have great

19:09

scaling on the RL side, you have

19:11

probably lots of experiments going on

19:12

that have nothing to do with either of

19:13

those vectors, but are like interesting

19:15

new ways. How do you even think about

19:17

like allocating compute across all of

19:19

this stuff?

19:20

>> Yeah, it can get very complicated,

19:21

right? Because there's so many things

19:23

that we need to do. One thing we've been

19:25

one kind of discipline we've started

19:27

keeping is we um we try to make sure we

19:30

just like explicitly budget like a large

19:32

chunk of our compute to the most

19:33

scalable methods to the things that we

19:35

believe are the most responsible for

19:36

driving general model intelligence. And

19:38

you know even if it's not the most

19:40

efficient allocation of compute at all

19:41

times because you know if you're

19:43

allocating so much compute to like one

19:44

experiment or like one set of

19:45

experiments you know there's so many

19:47

things you can accelerate a little bit

19:48

of that compute elsewhere. Uh but you

19:52

know but I think it's easy to kind of

19:54

like with all the all all the all the

19:56

interesting and important things that

19:57

we're doing I think it'll be very easy

19:58

to kind of partner all of it and like

20:00

not not really end up doing the things

20:02

that we believe are most important. You

20:04

definitely want to like understand the

20:05

kind of empirical evidence. You

20:06

definitely want to make sure your

20:08

evaluations are in order and the kind of

20:10

experimental rigor is there. And then

20:11

you also want to apply some

20:13

regularization based on like okay do we

20:14

understand this method? Do we actually

20:15

expect it will scale? Do we expect this

20:17

is something you can actually build on

20:19

in the future? Is this kind of a

20:20

one-off? Right. And I think and based on

20:22

that uh determine the priority.

20:24

>> Yeah, it's so interesting. probably find

20:25

all the yeah ways that you like know you

20:27

could improve things but they feel maybe

20:28

like uh off off a little bit to the side

20:30

of where you think the overall arc of

20:32

progress is and so you end up leaving

20:34

some of these like low-hanging fruits to

20:36

some extent because really the most

20:37

important thing is finding the future

20:38

direction and then the scaling within

20:40

that and uh devoting compute toward that

20:42

obviously the the place where we talked

20:44

about Codex a lot and and the success

20:46

of coding and it feels like you know

20:47

last year was like the year of just

20:49

incredible hill climbing on on coding

20:51

I'm curious you know obviously Codex has

20:53

been a super successful product in many

20:55

ways like Anthropic was kind of first to

20:57

this market you know Claude Code you

20:59

know was it was a dominant product there

21:01

what do you kind of like you know

21:02

reflecting on that I guess like what do

21:04

you make of the success Anthropic's had

21:05

in this space

21:07

>> yeah I think I think it's a matter of

21:09

you know really focusing your product

21:10

direction or on where where you believe

21:13

the kind of the the next application of

21:15

the technology is right and um you know

21:19

if you look at the kind of prioritization

21:21

we've had on the on our product right I

21:24

mean we have been right like working on

21:26

on coding products but they have kind

21:28

of been like a secondary thing right

21:30

compared to like our main priorities and

21:33

the interesting thing is that is not

21:34

very reflective of like the priorities

21:36

of the research organization within OpenAI

21:37

uh I think you know given that like

21:41

we've kind of had this you know

21:43

explosive success of ChatGPT you know

21:45

ChatGPT as it was you know I I think

21:47

ChatGPT

21:49

quite a bit and it's going to evolve

21:50

quite a bit but as it was in 23 right is

21:52

this particular you know product that's

21:53

maybe not, you know, I think it's

21:55

definitely quite aligned with our vision

21:58

of like where AI is going, but but like

21:59

it's not really like the like

22:01

representative of like everything that

22:03

that that that it enables. And so the

22:06

majority of like our work in research

22:08

has been focused on like that that

22:09

future thing. And I think increasingly

22:12

it has decoupled from our our our kind

22:14

of like short-term product strategies,

22:16

right? Yeah. I'm very kind of um

22:19

confident about um the things we've been

22:23

building and the things we we we are

22:24

building on on on the research on the

22:26

model intelligence side. You know, a lot

22:28

of our reprioritization and increased

22:31

focus on the on the product side is

22:33

about actually kind of getting to deploy

22:34

them and the belief that actually they

22:35

are uh the thing that really matters

22:37

now.

22:38

>> Yeah. And now it feels like you know the

22:39

uh clearly the whole company priority

22:42

you know is so locked in and focused on

22:43

this and you've seen just incredible

22:45

improvement in Codex in recent months

22:47

for all the developers that listen to

22:48

the podcast like if again it's almost

22:51

like hard to comprehend like what the

22:53

world looks like as these models keep

22:54

hill climbing on longer and longer tasks

22:55

like what do you think will look

22:57

different in their lives or like how

22:58

will they be using Codex in you know

23:00

three six months. I realize 3 months and

23:03

six months are very different timelines

23:04

in this world, but take whichever uh

23:06

whatever in between point you'd like.

23:08

>> I would expect um just a a gradual

23:12

increase in just the level of autonomy

23:16

uh you feel comfortable uh affording the

23:18

model, just the vagueness of

23:20

description it can work with, you know

23:21

the level of supervision it needs. I

23:23

think we're not very far from models that

23:25

can work autonomously for a couple days.

23:27

Um maybe use quite a bit more compute

23:29

than they're using now and produce much

23:30

higher quality artifacts on their own.

23:32

>> Do you have a gut instinct on like what

23:33

like you know there's always been this

23:34

question of like will the world you know

23:36

do you need that software engineering

23:37

skill set to supervise these models

23:39

running for a few days or like hey does

23:40

it turn out at some point of like being

23:42

able to run for a while you know anybody

23:44

can can use coding agents and supervise

23:46

them to to some sort of output. I mean I

23:49

think definitely for like a lot of

23:50

outputs you already don't need much

23:53

experience right I think I think still

23:54

the distinction I would draw between

23:56

like you know an intern here and like

23:58

really an autonomous researcher software

24:00

engineer would be that like if you want

24:02

to build something bigger like you know

24:05

you probably still want to apply

24:06

supervision you still kind of want to

24:07

have like an overarching thing you want

24:08

to recognize like what what what

24:10

building blocks fit in and which

24:12

don't but yeah I definitely expect that

24:14

like that desired skill set uh to shift

24:17

quite a bit over

24:18

Yeah,

24:19

>> towards towards this like more general

24:21

uh vision setting.

24:23

>> You know, I guess on on the on the

24:24

research side, I feel like there's been

24:26

uh you know, maybe maybe like a month

24:27

ago, I feel like all anyone could talk

24:29

about was continual learning and there's

24:30

just you know, it was in the Zeitgeist.

24:32

There's all these neolabs starting to go

24:33

focus on continual learning. Some folks

24:35

left OpenAI to go focus on that. Um I'm

24:38

curious like you know I think it part

24:40

maybe part behind that is a belief that

24:42

like you know uh RL alone you know

24:45

either won't get us there or will get us

24:47

to like some level of very inefficient

24:49

scaling and it's kind of different than

24:50

the way you know humans learn. I think

24:51

even I've heard you say before like that

24:53

you know RL is still very different

24:55

today than the way that humans learn.

24:57

What's your take on on like that you

24:59

know that whole movement?

25:02

Yeah, I I am a little bit confused by it

25:04

because you know in my mind like

25:08

the whole kind of like excitement that

25:10

like we've had I mean even even if you

25:12

look at the titles of like the GPT uh

25:14

you know three paper right like it is

25:17

that like oh you know this class of

25:19

models is actually capable of continual

25:21

learning right it's capable of like

25:23

learning uh um learning to learn in

25:26

context right that has been really you

25:29

know the driving force behind the kind

25:31

of excitement to like scale these GPT

25:33

models further. That has been like the

25:35

premise for why we really need to teach

25:38

them with RL to like learn in context more

25:40

efficiently. And so I definitely agree

25:42

that continual learning is really the

25:45

thing, right? Like it's really the thing

25:46

that we're building, but I I don't

25:47

really think this is like a problem

25:49

that's like, oh, you know, it's kind of

25:51

ignored and off the path of what we're

25:52

doing currently. I think it is what

25:53

we're working towards.

25:54

>> Yeah. Like in your mind, this is like

25:55

the single best path to get there is to

25:57

continue to kind of scale uh the

25:59

pre-training and RL. I think that is kind

26:01

of how we've made the most progress on

26:02

this problem so far and you know I think

26:04

there are I think that there definitely

26:07

are like more ideas more steps um I

26:09

think also a lot of improvements that

26:11

will just come from scale

26:12

>> yeah and I guess like you know we have a

26:14

lot of folks listening that maybe have

26:16

you know have been able to do a lot of

26:17

simpler things with these models and

26:18

then they try to do like some of these

26:20

more complex you know I don't know call

26:21

it 100 step or longer term tasks and

26:23

they're like oh you know the the models

26:25

don't work for this yet and I think it's

26:26

harder you on the inside constantly feel

26:28

this improvement but for them it feels

26:30

like hey this is like night and day away

26:32

from you know being able to do this much

26:34

longer thing. How do you kind of

26:35

articulate to them I guess the set of

26:38

things that need to be true for these

26:39

like much longer steps to happen. Is it

26:41

around kind of checking in more often as

26:43

you were talking about before or I feel

26:45

like there's just this belief uh among

26:47

the research community of like oh all of

26:49

these tasks will be solved in the next

26:50

year or two and then in the wild a lot

26:52

of people maybe not totally grokking that

26:54

like improvement line that we've been

26:56

seeing.

26:58

>> Yeah. I mean I think a lot of that

27:00

prediction comes from just looking at

27:02

like historical improvement lines,

27:04

right? And but I think increasingly we

27:06

can we can roughly see the the the the

27:10

shape here. I do think a lot of this is

27:11

about just the models becoming

27:12

intelligent enough to recognize like

27:14

whether you know they're making

27:15

progress. Um I think some of this is

27:19

like yeah this very kind of pragmatic

27:20

work of like are the models actually

27:24

you know can they actually access you

27:26

know all the context all the files all

27:27

the infrastructure they need to do the

27:29

work you want them to do which yeah I

27:31

remember like in the past when we were

27:32

discussing you know the kind of the the

27:35

road map uh that we're taking with RL

27:38

you know I definitely view like okay we

27:39

just need to teach the model to kind of

27:41

reason with its own tokens as kind of

27:43

the priority and then of course we'll

27:44

need it to use tools like the

27:46

environment, you know, at some point we

27:48

definitely need to teach it to see,

27:50

right? At some point, we need to teach

27:51

it to use a physical body, right? Like,

27:53

but like uh yeah, I mean, I think we're

27:55

definitely like well into the stage

27:56

where, you know, it really needs to like

27:57

interact with the environment and it

27:58

really needs to see uh and you know,

28:00

someday soon we'll really care

28:01

about robots, but yeah.

28:02

>> Yeah. I mean, it does feel like a lot of

28:03

the times when I hear people complain

28:05

about, oh, a model can't do X or Y, it's

28:06

like literally just because you haven't

28:08

fed, you know, or connected it to

28:09

systems or fed enough context into it.

28:10

Actually, I do wonder if like context

28:12

was universally applicable and able to

28:14

flow into these things. Like I feel like

28:16

a lot of these problems would actually

28:17

just be solved with today's models. You

28:19

know, I want to talk about some of the

28:20

AI for science stuff um that you guys

28:21

have been working on. And one thing in

28:23

particular, you know, I feel like the

28:25

coding stuff is something that everyone

28:26

feels very viscerally um you know, in

28:28

every company they're using these tools

28:30

and getting tons of productivity. You

28:31

know, on the math side, not all of us

28:33

competed in in in IMO competitions and

28:36

uh necessarily have as much of like an

28:37

intuitive feel for some of these

28:38

breakthroughs. And so one of them I know

28:40

that was really interesting that you

28:41

guys did is you did some compelling work

28:43

around like First Proof, right? And I

28:45

think these are like very different

28:46

problems than kind of traditional

28:47

competition math. I wonder if you could

28:49

just speak a little bit to that because

28:50

I think it's just a space that our

28:51

listeners might be less familiar with

28:52

and kind of less familiar with

28:54

understanding the implications of models

28:55

being able to do pretty cool work here.

28:58

Yeah, I mean you know I think yeah I I

29:01

was very excited with the First Proof

29:02

challenge and you know again like I I

29:04

kind of, this particular one is kind of a

29:06

benchmark right it's like a couple you

29:08

know respected mathematicians

29:09

theoretical computer scientists

29:10

releasing problems that like they

29:11

believe are like representative of their

29:13

day-to-day work but haven't been

29:15

published anywhere so that we can really

29:16

have our models take a crack. We were so

29:18

excited about this challenge, but you

29:19

know, it was kind of dropped um without

29:22

any any any

29:25

advance warning um with like a week-long

29:28

deadline to actually execute. Um we

29:31

had a we had a very exciting model

29:32

training uh at the time. And so uh um uh

29:38

um one of the people in charge of

29:39

training James Lee kind of started

29:42

prompting the uh that model just um by

29:46

hand and and and and

29:49

uh and yeah and actually kind of seeing

29:51

oh okay it's actually solving these

29:52

problems was really a fascinating thing

29:54

to see. uh you know one of these problems

29:57

actually is from a domain that I I I I

29:59

did my PhD in and yeah seeing the model

30:02

kind of come up with these ideas which I

30:03

would be, you know, quite proud to come up

30:05

with like in a in a week or or two uh

30:08

seeing it come up with them in like an

30:09

hour or so that was very uh yeah it's a

30:13

very weird feeling right like like yeah

30:15

I think like in the past the when I felt

30:19

like that was like when watching our

30:20

Dota bot like play just like very

30:22

interesting Dota games infinitely right

30:24

and it feels like just there's some sort

30:25

of magic happening because like you know

30:28

interesting things should not be like

30:31

>> indefinite.

30:32

>> Yeah. And so seeing that happened for

30:34

math right for something that I believe

30:35

like you know is actually like quite

30:38

representative of of of our our or you

30:41

know a precursor to a lot of the work

30:42

that we're doing and a lot of the work

30:44

that like really matters in the world.

30:45

Um yeah definitely really increased my

30:48

feeling of urgency. One thing that's

30:49

fascinating too is the idea that you're

30:51

you're training these models and it's

30:52

like you know you pro you throw these

30:54

problems in and it's like nobody knows

30:55

whether you know how good will they be

30:56

at solving them and and I think just

30:58

like it must just be fascinating to see

31:00

uh something that you know so well and

31:02

and a space that you spend so much time

31:03

in and and realizing hey probably the

31:05

previous generation of models wouldn't

31:06

have been able to do that and you

31:08

wouldn't have even thought necessarily that

31:09

this was like the the benchmark to do

31:10

but it's like just generally showing the

31:12

the general purpose capabilities and and

31:14

improvements of the models. I mean it it

31:16

is at a stage where like you know we

31:17

needed to like seek out experts in the

31:20

in the particular domains to be to be

31:22

able to tell us whether these particular

31:23

proofs are correct or not but you know

31:25

it's still much easier to like tell

31:28

whether you've you've actually made

31:29

progress than you know than for

31:31

something like uh even coding right like

31:33

because sure like competitive

31:34

programming you can evaluate but most

31:35

programming is not competitive

31:36

programming and it's you know it's about

31:38

like are the abstractions right are

31:39

handling all the all the cases and yeah

31:41

>> yeah I guess like you know I feel like

31:43

there was this maybe common criticism

31:44

a year ago and I don't know if

31:46

it's as strident now that like okay these

31:48

models are like pattern matchers but

31:49

like you really want AI for science like

31:51

we're not going to get new ideas or like

31:53

you know entirely novel things out of

31:55

out of pattern matching feels like we

31:57

continue to like chip away at that

31:58

narrative are we getting closer to kind

32:00

of fundamentally disproving that

32:02

>> I believe so yeah I mean I think kind of

32:04

on schedule we're starting to see like

32:07

minor advancements right like not huge

32:10

things right like a small idea here or

32:12

there I mean maybe maybe some like

32:13

bigger papers in collaboration with with

32:15

scientists, right? But, you know, was

32:19

AlphaZero a pattern matcher, AlphaGo a

32:21

pattern matcher? You know, our Dota

32:25

bots, like they did kind of come up with

32:27

new strategies for the respective games.

32:29

>> Yeah.

32:29

>> Um,

32:30

>> it's funny that there's counter examples

32:31

to it all the way back to, you know,

32:32

2016, 2017.

32:33

>> Right. Right. And and, you know, and you

32:35

can say like, well, I guess you can

32:37

always fall to flaws in that which I

32:39

think is interesting like AlphaGo can be

32:40

beaten with some strategy. Our Dota bots

32:43

could have been beaten with some

32:44

with some strategy. I think I think

32:46

there will be a lot of deficiencies for

32:48

a while of of like these models, right?

32:50

But but I think also like they they are

32:53

able to discover new things because they

32:55

have a lot of these capabilities and

32:56

like the way you know yeah I mean it's

33:00

you know taken a couple years to like

33:02

get go from like this like very tiny

33:04

game environments to like this much more

33:07

um general scientific research. it

33:09

required kind of going through um you

33:12

know like a decent approximation of like

33:14

all human knowledge in the meantime and

33:17

you know learning all the human

33:18

languages and so forth but but um but I

33:20

think the basic principle is is is very

33:22

similar.

33:23

>> Yeah. You know, it's funny. I think like

33:24

when you guys had these First Proof

33:26

results, um I remember like the

33:28

organizers said, you know, they were

33:29

commenting on these AI solutions and

33:30

they were like this feels like, you

33:32

know, 19th century mathematics of like

33:34

brute force, you know, computation-heavy

33:36

approaches rather than these like

33:37

elegant modern techniques. Um which I'm

33:39

not sure is a feature or bug of of you

33:41

know, obviously the the way these models

33:42

work, but like you know, hearing that I

33:45

mean does that like does that concern

33:46

you, excite you?

33:47

>> It doesn't concern me. I mean I think

33:50

it's expected that like I I'm sure I I

33:53

thought for at least one of the problems

33:54

like actually our model produced

33:56

a pretty nice proof that was quite a

33:58

bit shorter than like the intended one

33:59

you know but I think in general you

34:00

would expect like yeah these models kind

34:02

of you know they can produce so much

34:04

more reasoning in a short time than like

34:06

a person can right just like in terms of

34:07

just raw number of like tokens or

34:09

thoughts I don't expect that to be like

34:11

kind of a long-term feature

34:13

>> it feels like there's so much momentum

34:14

behind AI for science right now and you

34:16

mentioned obviously like you know at

34:17

some point you do have to connect these

34:19

these models to the physical world and

34:20

you guys released some cool stuff with

34:22

Ginkgo and like some of these other things

34:23

you've been experimenting with. I'm sure

34:25

you've thought a lot about like AI for a

34:27

bunch of different areas of science. You

34:29

know, as you've kind of dug into some of

34:31

this stuff, have you developed any

34:32

intuition for as you think about like 3

34:34

years from now, the spaces of

34:36

science where you're like, "Oh, that

34:37

there's going to be crazy progress there

34:39

versus the ones that might prove like a

34:40

little more resistant to immediate

34:42

change." You know, a tempting answer

34:44

would be that like oh, you know, it's

34:45

really about like um you know, do you

34:49

uh you know, what are the things that

34:51

kind of require some some you know,

34:53

manual work like where the models are

34:55

not like not not quite plugged in the

34:57

ecosystem or you know like the that the

35:00

the different laboratories will also

35:01

kind of evolve pretty quickly to adapt

35:03

to like these new technologies

35:04

>> within those STEM fields. Obviously, you

35:06

know, I feel like there's a question of

35:07

is it like an LLM with access to the

35:10

physical world or you've obviously had

35:12

companies that are have been started

35:13

specifically around these domains,

35:14

right? Like Isomorphic in biology or

35:16

Periodic in materials science or

35:19

Physical Intelligence in robotics.

35:21

What's your kind of gut instinct on the

35:22

extent to which it makes sense to pursue

35:24

some of these things like independent

35:26

with different model architectures

35:27

versus like all within the context of

35:29

one place?

35:30

Yeah, I think it's kind of similar to

35:32

you know my answer about like the um UI

35:35

for you know for Codex which like I

35:37

would build around the capabilities of a

35:38

technology and not around its limitations

35:40

so much. Um so you know you definitely

35:44

like if you have something that like can

35:46

suddenly design like a huge amount of

35:48

like interesting like chemical or

35:49

biological experiments like yeah I mean

35:51

it makes sense to uh you know build labs

35:54

that enable that. You know, I think if

35:56

we if we did get to a place where like

35:58

the model is like very capable of

35:59

designing high quality experiments. It

36:00

also makes sense to like have it work

36:02

with humans in a loop, right? Like we

36:03

shouldn't think of it as like oh it's

36:04

either you kind of automate it fully and

36:06

you have this like fun thing using some

36:08

tools on the side. Like we will get to a

36:10

world where like it's just very natural

36:12

to be collaborating with um you know AI

36:14

scientists that are that are working

36:16

hard on a problem.

36:16

>> Yeah, it's so interesting. It's almost

36:17

like a different vision. It's like one

36:19

world where this works is like hey you

36:20

just train a model you know to basically

36:22

run these end-to-end tasks and like be

36:24

the automated like you know uh biologist

36:27

or you know chemist or whatever it is

36:29

and there's another one which is like

36:30

well you're building really tools to you

36:32

know both propose run kind of work in

36:35

tandem with a bunch of human researchers

36:37

>> I mean you know I wouldn't necessarily

36:38

categorize it as I mean you know of

36:40

course they are tools in some sense but

36:41

I think like you know we will get to a

36:42

point where they're driving a lot of the

36:44

like design and and ideation for the

36:45

whole process. Yeah, with with like an

36:47

LLM architecture, but just like you know

36:49

being able to figure out the right way,

36:50

the right kinds of experiments to run

36:52

and and then actually design it. And

36:53

yeah, when it comes to like different

36:54

architectures and you know, I mean, you

36:57

know, for sure like you know like

36:59

natural language reasoning like the kind

37:02

of the kind of things u that that we're

37:04

prioritizing that gives you a lot of

37:06

generality like there there are things

37:08

that are that you know you kind of want

37:10

to train it you want to train a

37:11

different model to to model right you

37:13

know I think even like yeah if if you

37:15

want to create a very good you know Go

37:17

model I I don't think like large

37:19

language models are like the most

37:20

efficient way to go about this although

37:22

they might result in the best model

37:23

eventually but uh you know I think it's

37:25

similar for like uh you know protein

37:27

folding or other tasks of this kind.

37:29

>> Yeah. So you think it makes sense to

37:30

have like some independent efforts

37:31

around that but obviously the like you

37:33

know that will end up being paired with

37:35

like a core really good researcher large

37:37

language model that is you know helping

37:38

drive a bunch of this stuff.

37:40

>> Yeah. I want to also make sure just to

37:41

talk about AI safety because I think

37:42

that's an area that you've done a lot of

37:44

really pioneering work on. Um and you

37:46

know I'm not sure all our listeners will

37:48

be familiar with uh you actually did

37:49

some really interesting work across the

37:50

labs right uh and and were focused on

37:53

you know chain of thought monitoring and

37:55

so maybe to start just talk tell us a

37:57

little bit about that work and and you

37:59

know uh you know what you found.

38:00

>> Yeah so this is um a realization that

38:04

actually we had um around the time we

38:07

actually saw like the first um reasoning

38:11

models of kind of the current crop. We

38:14

realized that like okay like well this

38:15

works right and we were pretty uh you

38:18

know we were thinking a lot about what

38:19

this means we kind of were like okay

38:21

like probably the world really changes

38:22

over the next I don't know year or two

38:24

or three you know we were thinking what

38:26

this means for for safety and for for

38:28

our ability to kind of understand what

38:29

these models are doing and we realize

38:30

that because of the way we train these

38:32

models that because we don't supervise

38:35

the reasoning process directly right

38:36

it's not like you know ChatGPT is trained

38:38

to kind of um you know be be polite and

38:41

nice and like Um, and

38:43

>> it always tells me I have great ideas.

38:45

>> Yeah. Well, you know, that's a separate

38:48

issue, right? Like, but but you know,

38:50

but but like even assuming it's like

38:52

aligned exactly in the way we would want

38:54

it to, which is definitely not, you

38:55

know, uh, sycophantic, like it's still kind

38:57

of not going to be uh, you know, there

39:00

are just still still some things it's

39:02

not going to reveal about its

39:03

motivations and time because, you know,

39:05

maybe it would be unsafe or maybe it

39:07

would be unkind. um um or you know or

39:10

maybe because it's not maybe it's

39:12

actually not aligned the way we think

39:13

but it wants to hide that right and uh

39:17

and the way we train the reasoning

39:19

models like the chain of thought

39:21

doesn't have any of that it's not

39:22

optimized to uh to be in any particular

39:26

way because it's just not not directly

39:28

great it's only great in how it relates

39:30

to like producing a high quality output

39:34

um and we realized this is actually a very

39:36

powerful

39:38

paradigm for being able to

39:40

interpret what the model is doing,

39:41

right? It's actually not a very

39:43

different idea from uh um mechanistic

39:45

interpretability, right? Because in

39:46

mechanistic like the idea is again like

39:48

you kind of have this model, you have

39:50

these activations of the model um that

39:53

you know are not directly supervised to

39:55

predict any label. they're they're kind

39:57

of like indirectly supervised but you

39:59

know the model kind of has never been

40:01

trained with like any sort of like uh

40:03

you know inspection of the of these

40:04

activations and so these activations

40:05

might reveal something about its

40:07

inner workings but the big advantage of

40:09

the chains of thought is that you know

40:10

by default they are in English right and

40:12

so it's so much easier to understand

40:13

what is going on especially you know as

40:16

the concepts get more advanced u and the

40:19

other interesting thing is um you know

40:22

we were just talking about how probably

40:25

you know how how we believe in in the

40:26

future where we go uh well these models

40:29

work for a very long time they work

40:30

autonomously right and so there there is

40:32

much more of this reasoning uh and so

40:34

you know if this is a big axis of how

40:37

the capability of these models increases

40:40

um that the sort of our ability to

40:42

supervise them will will scale uh uh

40:45

commensurately. Yeah, this really comes down

40:47

to this

40:49

principle though that like you know

40:51

you're not supposed to supervise the

40:52

chain of thought and so this is actually

40:53

something uh when we originally you know

40:56

we're releasing the preview model like

40:58

we made this decision to like hide the

41:00

chains of thought and

41:01

>> yeah I remember

41:01

>> and um you know for me that was the

41:04

primary motivation that was the reason

41:06

like I didn't really even want to

41:09

consider releasing it in different ways

41:11

you know there definitely was a bit of

41:12

internal discussion about this but like

41:14

the reason I felt very strongly like we

41:15

should we should just hide it is because

41:17

of this. Uh then there was this other

41:19

concern that like I didn't initially

41:20

think about but I think was also like

41:21

very valid of like well you know like

41:24

this model is going to be distilled to

41:25

some extent blah blah uh and you know

41:26

and that's definitely also been like a

41:28

big factor here. Uh but but yeah but I

41:31

actually think that like this uh you

41:34

know allowing the models some sort of

41:36

private space uh oh and by the way like

41:38

why do I think it's important that we

41:40

don't like you know show these chains of

41:41

thought in product you know um if if if

41:45

I'm saying like the important thing is

41:46

not to supervise them during training

41:47

well I think if we did show in if we

41:49

like established a paradigm where like

41:51

oh you just show these chains of thought

41:52

in product uh eventually you kind of

41:55

have to train them right like you'll

41:56

have to train them for the same reasons

41:58

you have to train like whatever models

41:59

you ship. Um and I just think that

42:02

>> we might not all want to know what the

42:03

chain of thought our model has that gets

42:04

to a response for

42:05

>> right I mean you know I think I think

42:07

it'll be useful to some extent and we

42:09

are trying to capture most of that value

42:11

you know either with like chain-of-thought

42:12

summaries uh which I think are kind of

42:15

like a little bit of a stop gap. I think

42:16

the longer term solution here is having

42:18

the model actually talk to you in real

42:19

time which you know the later the latest

42:21

version of Codex kind of does, latest

42:23

version of the reasoning GPT models

42:25

kind of do but I think I think that will

42:26

get much better um

42:30

yeah but but yeah I think there's

42:32

something very exciting here about just

42:34

like not u not having the training

42:38

signal fight against us right and not

42:42

not Yes because yeah I think if you

42:46

If you want to be able to understand

42:47

what the model does in the long term,

42:49

but you know you're scaling a method

42:51

that is like kind of going directly

42:52

against that, it's you're probably not

42:54

going to have a good time, right? That's

42:55

the other side of the bitter lesson. Uh

42:58

and so this decoupling I think is a very

43:00

it's an idea that gives me a lot of hope

43:02

for our ability to at least understand

43:05

um you know how these models' motivations

43:07

and generalization evolve as they get

43:09

better as they as they work for longer.

43:11

Um yeah, I don't think it's a complete

43:14

solution to AI alignment by a long

43:16

shot. I think it's just another tool in

43:18

our in our toolbox. Uh but I am hopeful

43:21

that building our toolbox with technical

43:23

tools like this, we can actually

43:25

continue chipping away at the

43:26

fundamental problems here.

43:27

>> Yeah, it seems like almost like over

43:28

the, you know, medium term, it's like

43:30

something that's going to be incredibly

43:31

helpful. Probably not the catchall

43:33

solution for for long-term alignment.

43:35

>> Yeah, I mean I think it's a tool that

43:36

can help us understand like I think it's

43:38

actually very useful to like build

43:40

understanding of long-term alignment,

43:41

right? For example, there has been this

43:43

very exciting work um

43:48

from Apollo in collaboration with other

43:50

labs uh on uh model scheming where they

43:54

investigate uh you know depending on

43:56

kind of what environment you pro you put

43:58

the model in, how you train it like is

43:59

it is it prone to like start kind of

44:01

like having hidden objectives that it

44:03

pursues and you know what enables that

44:05

that whole line of work is chain-of-thought

44:07

monitoring right is this notion of like

44:09

oh you can actually inspect what the

44:10

model's motivations are uh so you know and

44:14

I think from that like that might take

44:16

us in a completely different in terms of

44:18

mitigations right like maybe the right

44:19

way is like changing the pre-training

44:20

data of the model or maybe it's

44:22

something like uh you know the

44:23

inoculation prompting from Anthropic like

44:25

I think I think those are very

44:26

interesting ideas but I think like

44:27

having this ability to like understand

44:28

is very helpful to to evaluate these

44:30

>> yeah it's almost like foundational for

44:31

any further uh area of research what are

44:34

like the other research areas within

44:35

alignment that you're paying attention

44:36

to or that you think are promising you

44:38

know areas to focus on Um yeah, I think

44:42

I think a lot of the

44:44

a lot of the like longer term challenge

44:46

with alignment is about generalization,

44:49

right? Like we can train our models to

44:51

do well, or, you know, at

44:54

least mostly to some extent like we we

44:56

can mostly kind of control their

44:58

behavior in the things that,

45:01

you know, are in distribution, that

45:02

we train for. Um, but you know the

45:05

things that are worrisome is like well

45:07

what happens when a model is asked to do

45:08

something very very different or it

45:10

finds itself in a very different

45:11

situation or it's like much smarter than

45:13

it ever was before and and and you know

45:14

it has all these capabilities. It's like

45:16

we haven't really kind of thought about

45:18

how to train for and so yeah so so I

45:20

think I think you know the study of like

45:22

this kind of longer term value alignment

45:24

is really a study of generalization like

45:26

what are the values that the model falls

45:28

back on. Um like one line of research

45:31

I'm very excited about here and

45:33

something that we're uh investing in

45:36

quite a bit is uh understanding like how

45:39

that um how the generalization falls

45:42

back onto the pre-training data. Um

45:46

um yeah and yeah I I I think there's

45:50

quite a lot there. I guess over like you

45:53

know the last six months have your

45:54

concerns around alignment increased

45:56

decreased like how do you you know where

45:57

are we kind of trending overall uh you

46:00

know with this work

46:01

>> I will speak to like the

46:03

longer term challenges of like alignment,

46:05

right or like what happens when you have

46:06

very smart models the the way my

46:08

thinking about the problem has evolved

46:09

over the past few years is definitely

46:11

kind of gone from

46:13

you know oh is this like very nebulous

46:15

problem that like is just like very hard

46:17

to even grapple with or define uh to

46:19

like oh you know I think we can actually

46:21

make progress at it by very

46:23

concrete technical solutions and

46:24

technical insights. And this is why

46:26

we've really been uh

46:29

viewing alignment as like just a core

46:32

part of of research and really uh you

46:34

know making sure that like we are you

46:36

know designing our reasoning models uh

46:39

thinking about this and we are you know

46:40

and we are kind of like conducting our

46:42

alignment research with like these

46:43

reasoning models in mind and so forth.

46:45

Um

46:46

so I think my general kind of uh belief

46:51

that there's like a research path here

46:53

that actually gets us to an extremely

46:55

happy world uh has increased quite a

46:57

lot. Um,

46:59

at the same time, right, I think

47:02

uh my timelines to very capable models

47:05

have definitely decreased a lot, right?

47:06

I think we're we're not that far, right?

47:08

Again, I don't think these are models

47:09

that are smarter than us in all the ways, but

47:10

I think these are models that are just

47:11

very transformative. And so, I'm quite

47:14

optimistic like we can keep a good grip

47:16

on like how we're doing on the alignment

47:19

problem, how to roughly evaluate the

47:22

risks of

47:25

our models or or the problems with them.

47:26

you know, but I do think we have to be,

47:28

you know, as an industry, really

47:30

prepared to like take trade-offs and,

47:31

you know, and possibly, you know, slow

47:33

down development uh um depending on what

47:36

we see.

47:37

>> It's really interesting to see a lot of

47:38

this work happening across the major

47:39

labs. You know, the fact that you did

47:40

this in collaboration with I think

47:41

Anthropic and DeepMind and, you know, it

47:43

seems like uh has that just come up

47:46

organically or imagine like is there a

47:47

lot of like alignment talk between you

47:49

know, the the major players, you know,

47:51

uh given I guess the three of you are

47:52

really at the forefront of all this?

47:54

>> There's definitely some. I mean, there's

47:55

definitely like shared interest in these

47:57

topics. Yeah.

47:57

>> I want to shift a little bit to going

47:59

inside OpenAI. I feel like no company

48:01

probably or the world has been more

48:03

interested in over the last two, three

48:05

years and you know I think particularly

48:06

what it's like to run a research

48:08

organization. You know we talked a

48:09

little bit about this uh previously but

48:11

you talked before about how it's you

48:13

know important part of your job is

48:15

giving researchers you know uh to to

48:18

kind of have comfort and space to you

48:19

know almost be cave dwellers right and

48:21

think about what the models will look

48:22

like in a few years. Um, you know, we

48:24

were kind of alluding to it earlier.

48:25

We're also in a time where it feels like

48:27

there's just massive competitive race

48:30

and, you know, uh, it's

48:31

certainly, you know, everyone's going

48:33

really gung-ho on these coding models.

48:35

I'm wondering like how do you actually

48:36

operationalize this balance today and

48:38

and you know, anything you've kind of

48:40

changed in your thinking, you know,

48:42

overseeing this organization around the

48:43

right way to do this? you know I focus

48:45

on on just high quality experiments

48:48

recognizing you know are we actually

48:50

making progress being honest with

48:51

ourselves and you know and promoting

48:53

honesty about about the results um I

48:55

don't think that has changed right and

48:57

and uh you know even though our work

48:59

will evolve a lot I believe we still

49:01

have quite a lot of work left to do and

49:04

so I don't think it's like oh you know

49:05

we need to wrap up all our projects uh

49:07

um you know very very quickly so yeah I

49:10

don't think those fundamentals change I

49:11

think what what does change is uh you

49:13

know a level of urgency to really kind

49:15

of bring some of these things that we

49:16

think are most promising uh to fruition

49:19

>> and then obviously you know I feel like

49:20

there's been um you know some very

49:22

public internal moments of OpenAI over

49:24

over the years you've been here for a

49:26

long time as you kind of reflect back

49:28

like what were some of the difficult

49:30

decisions that you guys made that maybe

49:31

were like 51/49 that really, you know,

49:34

defined the company or any any any as

49:36

you think back of the movie of the last

49:37

you know seven eight years of your life

49:39

um you know the key moments that kind of

49:41

stick out to you. Well, yeah. I mean,

49:42

there's certainly a number of, you know,

49:44

dramatic moments, uh, like this. Um, you

49:47

know, I think the ways the company

49:48

underwent the most change is not really

49:50

this like snap changes, snap decisions,

49:53

but more like just like shifts and and

49:56

how it operates, right? I would say like

49:58

OpenAI has gone through a couple phases.

50:00

you know when I joined at the start of

50:01

2017. 2017 very much kind of felt like a

50:05

very academic lab pursuing like a lot of

50:08

different ideas not so you know scaling

50:10

pilled in practice. Uh, and I think that was

50:12

like the first like big change with the

50:15

Dota project, with GPT, we've kind of

50:17

moved to okay like we actually are going

50:19

to have to buy big computers we're

50:20

actually going to have to um scale

50:22

things, we're going to have to develop the

50:24

science of scaling we'll have to develop

50:25

the infrastructure for it um and so that

50:29

kind of started the second phase of of

50:31

okay now we're scaling right like we're

50:33

we're we're still going to pursue like a

50:35

lot of these basic research ideas but we

50:37

are going to evaluate them like for,

50:38

are they scalable? Um,

50:44

then yeah then there was this

50:45

interesting period I talked about

50:46

earlier right where you kind of have

50:48

>> ChatGPT is this big thing.

50:52

yeah I mean I thought it would look a

50:55

little bit differently right like I

50:56

think I I was actually surprised that

50:57

like text models

51:00

I was pleasantly surprised like text

51:02

models are actually kind of the first

51:03

thing. I thought we would be in a world

51:04

where like it's more the kind of like

51:07

you know video style uh uses of

51:09

generative AI are kind of like the first

51:12

>> uh the first big thing to take off and

51:13

like and we'll have to like trade off

51:15

like pursuing the kind of longer

51:17

term text based research. Uh so yeah so

51:21

so so but yeah but I think definitely

51:24

like we anticipated that like this sort

51:25

of tension would arise right where like

51:27

you have a thing that is kind of like

51:29

popular now but it's like you know you

51:30

believe it's going to evolve quite a lot

51:32

before you get to where you're going and

51:34

so I think that's kind of the phase

51:35

we've been in for a while um and yeah I

51:39

think now we're we're like uh

51:43

um well yeah I mean we believe we are

51:46

kind of like starting to be in this

51:47

phase where yeah we're actually

51:48

deploying AGI or you know deploying

51:50

models that are actually very economically

51:52

transformative.

51:53

>> No, it's uh it certainly seems that way.

51:55

Well, I guess we always like to end

51:57

interviews with a standard set of

51:58

quickfire questions which are basically

52:00

me just stuffing all my overly broad

52:01

questions I couldn't fit anywhere else.

52:03

Uh, so if you'll shamelessly indulge

52:04

me uh you know I guess to kick it off

52:07

would love what's one thing you've

52:08

changed your mind on in the AI world in

52:10

the last year? Yeah, I mean I I think I

52:12

think it's really, you know, starting to

52:14

reconcile this tension between, you

52:18

know, the AI that you build ultimately

52:20

is something that affects the world,

52:21

but, you know, until you kind

52:23

of get pretty close, it's like a pretty

52:25

theoretical thing that you're just kind

52:27

of, you know, training and developing

52:29

algorithms for. And so, you know,

52:32

recognizing that okay, now we actually

52:35

need um we really need to um

52:40

you know, make a lot of progress and

52:42

focus on like how actually we're

52:43

deploying this technology and um in a

52:46

while. This is definitely something I've

52:48

been I've been thinking about a lot

52:49

lately.

52:50

>> Yeah, it's so interesting. basically

52:51

like you know uh outside of chat it was

52:54

almost like more in the in the abstract

52:56

or research hill climbing you know with

52:58

some usage in the real world and then in

52:59

this last year we've obviously seen you

53:01

primarily via coding agents just you

53:03

know it it trickle in you know in in a

53:05

pretty massive way.

53:06

>> Yeah, I think, I believe it's kind

53:09

of going in the same direction as like

53:10

the coding models where like it's

53:12

actually going to be something um you

53:14

know very useful it's going to be

53:16

something that's like a meaningful part

53:18

of of of people's lives. when you say

53:20

going in the same way you mean just like

53:21

executing longer term tasks or more like

53:23

you know the

53:24

>> I feel that's part of it right but also

53:26

just um you know coming to become like a

53:29

dependable trustworthy assistant or

53:31

companion.

53:32

>> yeah it's amazing to watch the way

53:33

younger people use ChatGPT. I'd argue it's

53:35

already pretty much there for, uh,

53:37

the way a lot of folks in in high school

53:39

and college and you know uh seem

53:41

increasingly you know comfortable using

53:42

it um you know I wouldn't be a shameless

53:45

podcaster if I didn't ask a top

53:46

researcher you know timelines for a few

53:48

things I think particularly interesting

53:49

is the stuff outside of the core LLM

53:51

world and so think there's a lot of buzz

53:53

around robotics these days. Do you have

53:55

any like in I mean obviously it's hard

53:56

to pinpoint like a moment robotics quote

53:59

works but I think you know whether it's

54:00

finding scaling laws or finding some

54:02

sort of like ChatGPT-esque moment for

54:04

robotics.

54:05

>> Yeah. I mean I definitely think there

54:06

are like very promising algorithmic

54:08

ideas there that I I believe are going

54:10

to work that are you know not too

54:12

dissimilar from the space of ideas. So

54:14

I'm I'm quite optimistic about about

54:17

timelines there. Uh although I do think

54:19

they're longer than like the kind of the

54:21

virtual um AI.

54:23

>> Obviously I'm sure you think a lot about

54:24

you know cuz you're always thinking

54:25

about the next frontier for what these

54:27

models can do. Um you know just the

54:29

impact on on society as a whole as you

54:31

think about this kind of pace of

54:32

continued model improvement. You know

54:33

what's maybe one thing that you think

54:34

we're underthinking right now as a

54:36

society in terms of the impact of these

54:38

models? Yeah, I I I think getting to a

54:42

point where so much intellectual work um

54:45

can be automated I think comes with

54:49

pretty big problems that I don't think

54:51

have obvious solutions. One natural one is a

54:54

question of jobs and you know

54:56

concentration of wealth and I suspect

54:59

this requires like real policy maker

55:02

involvement. Yeah, I've heard some kind

55:04

of optimistic takes on how is this

55:06

resolved, but I think at a

55:09

fundamental level it does seem like you

55:10

know some things that like used to be

55:13

very valuable used to kind of cost a lot

55:15

and used to provide something like now

55:16

can be done pretty cheaply and you know

55:19

in the long term it should be a good

55:20

thing but I think it does lead like I

55:22

think it can happen quite quickly.

55:25

Um

55:27

and there is a related question of

55:30

you know you really can like if you

55:32

actually have you know an automated

55:36

research laboratory an automated company

55:37

that can do so many things like it can

55:39

be controlled by a very small number of

55:40

people right it can be it can do a lot

55:42

right and this gets this gets you know

55:45

even more crazy when you have robots but

55:46

but you don't need to have robots and

55:48

you know I think figuring out like what

55:50

does governance of such things look

55:51

like, right? Like, what are these

55:53

like organizations that are like so powerful

55:55

and yet maybe made of like only a couple

55:58

of people like what how to think about

55:59

these things I think is uh it's a new

56:01

question we have to grapple with as a

56:02

society. Speaking of other new

56:03

questions one thing that's very top of

56:05

mind for me I I recently had a kid and

56:06

I've been thinking a lot about like you

56:08

know what is his life going to look like

56:10

in in 10 years um you're really close to

56:12

this stuff how has your work on on on AI

56:15

changed the way you think about like the

56:17

way in in which you know this next

56:19

generation should be raised

56:21

>> a task for all of us right is to build

56:24

the AI right build a world in a way

56:26

where uh you know at the end of the day

56:28

humans have the agency right humans set

56:30

the the direction right and you know

56:32

maybe a lot of the

56:34

the technical challenges that we cherish

56:36

right now will become more of a you know

56:39

pastime than something that we

56:40

really kind of like need to do in order

56:42

to make progress and and the challenges

56:44

will be more in like figuring out like

56:45

what are the things that are important

56:47

what are the things we should go do you

56:48

know I think that that will still be you

56:51

know I think I think you know in that

56:54

world like people can end up with you

56:56

know more things to do and definitely

56:58

more more exciting things to to do and

57:00

you know I think I think you still want

57:01

like to have an understanding of you

57:04

know of like uh you know some

57:06

understanding of like you know

57:07

technology like all all the kind of like

57:10

uh basic you know education however you

57:12

want to acquire it for the sake of being

57:14

able to think about these problems.

57:15

>> Well this has been fascinating man I

57:16

really appreciate you sitting down and

57:17

and talking about so many different

57:19

things. Um, I want to make sure to leave

57:20

the last word to you. Like anything you

57:23

uh want to point our listeners to,

57:24

whether it's research you're doing or

57:26

products you're excited about or really

57:28

anything you'd like to uh to plug uh the

57:30

floor is yours. Um, you know, anything

57:32

I'm sure there's tons of threads people

57:33

want to uh pull out of this

57:35

conversation.

57:35

>> I think the set of problems we just

57:37

discussed, right, and also the questions

57:40

around alignment, monitorability, I

57:43

think those are growing to be

57:45

very urgent challenges. And I don't

57:47

think they are challenges only for AI

57:49

researchers, right? I think they are

57:50

challenges for policy makers,

57:53

but also also just things we have to

57:55

think through as a society and uh yeah,

58:00

I I'm you know, I'm happy to see some

58:02

discourse starting to arise and I I

58:04

think we need more of it.

58:05

>> Yeah. Well, I thought I could talk to

58:06

you for hours more, but I'd be doing the

58:08

world a great disservice by keeping you

58:09

from your actual work of continuing to

58:10

improve these models. Thank you so much

58:12

for doing this. This was a ton of fun.

58:14

>> Thank you. I'm Jacob Efron and this has

58:15

been Unsupervised Learning, a podcast

58:17

where I get to talk to the smartest

58:19

people in AI and ask them tons of

58:21

questions about what's happening with

58:23

models and what it means for businesses

58:24

in the world. As I hope is clear, I have

58:26

a ton of fun doing this. It's a nights

58:28

and weekends project in addition to my

58:30

day job as an investor at Redpoint. But

58:32

our ability to get these incredible

58:33

guests on really comes from folks like

58:35

you subscribing to the podcast, sharing

58:37

it with friends. It's really what

58:39

ultimately makes this whole thing work.

58:40

And so, please consider doing that. And

58:42

thank you so much for your support and

58:43

listening. We'll see you next episode.
