Full Transcript

The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil

18:23 · 3,657 words · ~18 min read · English · Transcribed Apr 19, 2026
0:15

Morning. Thanks for having us. Today I want to talk with Cristina a little bit about friction.

0:23

This is a social preview that came up automatically when someone submitted an issue: a forum post about a security incident that was deployed accidentally. It was a configuration change that caused a problem, and the social preview carried that company's marketing tagline, which said "ship without friction."

0:49

We want to encourage you to add a little bit of friction back, and I'll tell you why. So, who are we?

0:59

I've been doing software development for 20 years, most of it in the open-source space. I created Flask, a Python framework, which ironically is so much in the weights that a lot of people are learning about it now because the machines are producing it. I left my previous company, Sentry, in April last year, which coincided perfectly with me having time and, obviously, with Claude Code. So I fell deep into a hole of agentic engineering and started writing on my blog, and a lot of people reached out to me over the last year, all excited about this. Then in October I started a company with a friend, Earendil, where we are trying to make sense of all the AI things.

1:46

>> Yeah, and my name is Cristina. I work with Armin at this company called Earendil. But importantly, I am what I like to call a native AI engineer. What that basically means is that these tools have been around for longer than I have been an engineer, so they've been super foundational in how I became a software engineer: not just because I obviously use them at work, but because they are the means by which I learned to do what I do. Before Earendil, I was working at Bending Spoons.

2:16

>> So we want to share a little bit from practice, not just theory, but I will readily admit that I don't think we have all the solutions. We have been building with and on agents for a good 12 months. We've had huge leverage and great disappointment, and we keep running into two types of problems.

2:34

I think, especially if you listened to some of the earlier talks at this conference, you will have heard a lot about how you should keep using your brain. For some reason that's really, really hard, so there's a psychological problem. The other one is the engineering challenge: the agents seem to produce worse code for some people and better code for others, so what is it that actually makes it work? So this is really not a solution so much as part of our journey: how we think we have managed so far.

3:01

Problem number one is the psychology part. Why is it that, even though everybody has told you many times over that you should be using your brain and slowing down, it's actually incredibly hard? It's just one more prompt, and we don't sleep that much. What is it that makes it so hard? And would it be that hard if the machines were actually writing perfect code and we didn't have to think quite as much? Is there something we can do to make this a little bit better?

3:29

>> So I'll begin by introducing the first of these problems, the psychology problem, and what I want to talk about first is the shift. I'm sure a lot of us here who have been playing with these tools for a while experienced this at some point. We were prompting and prompting without much success, and then at some point it suddenly clicked and the tools became really, really useful. It was fun in the beginning, and they gave us a lot of extra time, because not everyone was using them. They were tools that made us more productive and made our jobs more fun. But very quickly, because they were so useful and got us so hooked, everyone was using them. That had the opposite effect: suddenly the baseline expectation was that everyone uses them and you have to use them too. So the fun and the free time turned into pressure. Now we all have to ship faster and produce more code, and it is just not sustainable to review all of it and still have time to think.

4:25

And so this leads us to the trap, and I think there are two parts to it. One of them a lot of engineers have spoken about: these tools are super addictive. You never know whether the next prompt is going to be the one that makes your product work and adds a new feature, or whether it's going to be the last drop of slop that brings your product crashing down. So it's very addictive, and we keep doing what we're doing. It's not a great solution.

4:49

But also, and most importantly, something I don't think we realize as much: because we produce a lot of output very fast, we are tricked into thinking that we're being more efficient and doing more work. It's quite the opposite, because now we don't have as much time to actually stop, think, and design what we're doing, and to ask ourselves: is this the best way to implement this, or could I be doing something better? When you're in this flow, it's very difficult to stop yourself, and it's definitely very difficult for your agent to stop, because it's running around reading files it should never even have read. So we are the ones who need to have the agency to be in control here.

5:29

>> One thing that took me quite a while to realize, once you scale this from one person to an engineering team, is that it really changes the composition of the team. We used to be supply-constrained on the creation of code, so the balance between writing code and reviewing code in engineering teams was usually quite decent. Now every engineer has many times more producing power than reviewing power. So obviously pull requests are piling up, but we are also slowly expanding the total number of people in an organization who participate in the engineering process.

6:04

I talked to a lot of engineers over the last year, and increasingly one of the things that came up was: now I have marketing people shipping code; I have CEOs who used to be engineers shipping code again. The roles those people have in their companies don't put the responsibility on them; the responsibility still rests with the engineering team. So the total number of entities, humans and machines, participating in the code-creation process now outnumbers the ones that can carry responsibility. We're not at the point where the machine can be responsible for code changes. That has led to more and more code reviews being skipped or rubber-stamped. Even with the goal of small PRs, which we want to see again so that the review process keeps working, this amplification is something we at the very least need to recognize.

6:59

And so when you get a pull request that looks really daunting and has 5,000 lines of code in it, that is exactly when you should be thinking, and it's also exactly when it's most overwhelming. Increasingly, we're tapping out of this.

7:13

On the engineering side, what we're doing is creating larger pull requests, massive changes, because it's free now, right?

7:23

If you think about how the agents work, they're really optimized for creating code that runs. Their main objective is: write some code, run the tests, make some progress. Reinforcement learning bakes this in. So the agents write the kind of code that you, as a human software engineer learning how to write code, wouldn't necessarily write. For instance, you see quite a bit of code that tries to read a config file and, if it can't, loads some defaults. As an engineer, you know that's actually not great, because I might not notice that I'm running on the default config. I might only discover that I have a massive problem two hours later, when I've already written database records with the wrong data.

8:05

So these machines optimize for making progress, for shipping stuff, for unblocking themselves. As a result, they create many more failure conditions than human-written code normally would. In part that's because you, as a human, feel bad when you write code like this; something builds up emotionally in you. The agent has no reason for that; it doesn't feel anything. And if you create services that hobble along and are willing to recover from local failures, you actually create very, very brittle systems.

8:38

This also means you very quickly create a codebase of a size and complexity that the agent itself can no longer dig itself out of. It stops reading all the files it should. It creates code in a new file that already exists somewhere else. Over time, this entire machinery creates much more entropy in the source code than you would normally have if humans were working on it. A big part of this is that humans feel bad, and agents don't really have any emotions that they communicate to you.
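
The config-file fallback described above is easy to picture in code. The sketch below is a hypothetical illustration, not code from the talk: `AppConfig`, its fields, `DEFAULTS`, and both loader functions are invented names, and the file reader is passed in as a function so the example runs without touching the filesystem. The lenient loader silently swaps in defaults on any failure; the strict one lets the failure surface.

```typescript
// Hypothetical config shape; the field names are illustrative only.
interface AppConfig {
  databaseUrl: string;
  batchSize: number;
}

const DEFAULTS: AppConfig = { databaseUrl: "postgres://localhost/dev", batchSize: 100 };

// The pattern described in the talk: any failure (missing file, bad JSON)
// silently falls back to defaults, so nobody notices.
function loadConfigLenient(read: () => string): AppConfig {
  try {
    return JSON.parse(read()) as AppConfig;
  } catch {
    return DEFAULTS;
  }
}

// Fail-loud alternative: read and parse errors propagate, and a config
// with missing or mistyped fields is rejected with a message saying why.
function loadConfigStrict(read: () => string): AppConfig {
  const parsed: unknown = JSON.parse(read());
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config: expected a JSON object");
  }
  const { databaseUrl, batchSize } = parsed as Record<string, unknown>;
  if (typeof databaseUrl !== "string") {
    throw new Error("config: databaseUrl must be a string");
  }
  if (typeof batchSize !== "number") {
    throw new Error("config: batchSize must be a number");
  }
  return { databaseUrl, batchSize };
}

// Usage: a reader that fails, as a missing file would.
const missingFile = (): string => {
  throw new Error("ENOENT: config.json not found");
};
console.log(loadConfigLenient(missingFile).databaseUrl); // prints postgres://localhost/dev
try {
  loadConfigStrict(missingFile);
} catch (err) {
  console.log((err as Error).message); // prints ENOENT: config.json not found
}
```

The strict version is more annoying in the moment, which is the point: the failure shows up at startup rather than hours later as bad rows in the database.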

9:11

>> But as Armin likes to say, don't worry, not all is lost. We have found some correlation between what the agents really excel at and the types of codebases we actually put them to work in. The main example here is libraries versus products. What we found is that agents tend to excel a lot more at libraries. This makes sense, because when you're building a library you intrinsically tend to have a very clearly defined problem to solve. Most of the time you can even map the set of features you want to build to the API surface, and it has very tight constraints. And because a library is something you probably want to build on top of, or make accessible to other people, it is likely to have a very simple core that you can then plug into.

9:52

Products, on the other hand, are much harder, and perhaps that's a bit unlucky for the rest of us, because most of us are probably building products. There are so many interacting concerns and components: for example, you have your UI and your API responses, you have different permissions depending on feature flags, billing, and so on. So there's very heavy intertwining between components. What this means is that the agent can't fit all of this into its context window; it has no way to understand the entire global structure. So locally the agent tends to be very reasonable, but at the global scale it becomes a bit demented.

10:34

So what we're proposing is that, just as with any kind of system design in the past, your codebase has now become infrastructure, and as such you have to design it so that it is also legible to the agent and the agent can make the most of it.

10:51

So what we're proposing is an agent-legible codebase. One of the main points, which I'm sure is very clear to all of us, is modularization: we have different components, which makes it easy for the agent to add a feature in one spot without corrupting everything else. But importantly, this also means modularizing the code flow itself.

11:09

For example, I've been working on a refactor; we're building something of an AI assistant. For me it was super important to understand which steps of my code are the main points: say, you get a user message, then I pass the message to the agent loop, and then I have to deal with the output. Those points were very clearly defined for me, so the code there was not as messy. But it happens that between these points, between these steps, is where the agent tends to add the most fuzz. It will be parsing between different types, or adding things to state that shouldn't be in state. So you end up with behaviors that you didn't want to support, that are unexpected, and that can be quite dangerous.

11:48

Another point is to try to follow all the known patterns, because I think we all know by now that there's no point in fighting the RL, the reinforcement learning. The more we can lean into it, the better our output is going to be, and the more scalable it is down the line. Then, as mentioned with libraries, if you have a simple core and push the complexity into other abstraction layers, it's going to be easier for both you and the agent to read your codebase. And no hidden magic.
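
One way to read the advice about modularizing the code flow is to give each step (user message, agent loop, output) its own type and to validate data once, at the boundary, so there is no room for ad-hoc parsing or extra state in between. The sketch below assumes that reading; every type and function name in it is hypothetical, and the model is stubbed as a plain function.

```typescript
// Hypothetical boundary types for an assistant pipeline; names are illustrative.
interface UserMessage {
  text: string;
}
interface AgentResult {
  reply: string;
}
interface RenderedOutput {
  body: string;
}

// A model is stubbed as a function from prompt to completion.
type Model = (prompt: string) => string;

// Step 1: raw input is validated exactly once, here.
function receiveUserMessage(raw: unknown): UserMessage {
  if (typeof raw !== "string" || raw.trim() === "") {
    throw new Error("receiveUserMessage: expected a non-empty string");
  }
  return { text: raw.trim() };
}

// Step 2: the agent loop (stubbed here as a single model call)
// only ever sees a validated UserMessage.
function runAgentLoop(message: UserMessage, model: Model): AgentResult {
  return { reply: model(message.text) };
}

// Step 3: output handling only consumes an AgentResult.
function handleOutput(result: AgentResult): RenderedOutput {
  return { body: result.reply };
}

// The whole flow is one explicit composition of the three steps.
function handleTurn(raw: unknown, model: Model): RenderedOutput {
  return handleOutput(runAgentLoop(receiveUserMessage(raw), model));
}

const echoModel: Model = (prompt) => `You said: ${prompt}`;
console.log(handleTurn("  hello  ", echoModel).body); // prints You said: hello
```

Because each step's input and output types are fixed, an agent that wants to smuggle in extra state or a type conversion has to change a signature, which is much easier to spot in review.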

12:18

So, for example, using React server actions, or an ORM instead of raw SQL, hides intent from the agent, and if the agent can't see something, it surely can't respect it.
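
Keeping raw SQL visible, and keeping all of it behind one query interface, might look roughly like the sketch below. It assumes a generic executor function rather than any particular database driver; `Executor`, `createQueries`, the table and column names, and the PostgreSQL-style `$1` placeholders are all assumptions for illustration. Every statement the application runs is a named method on one object, so an agent searching for SQL finds it in a single module.

```typescript
// Hypothetical minimal driver interface; a real project would wrap its
// actual database client here.
type Row = Record<string, unknown>;
type Executor = (sql: string, params: unknown[]) => Row[];

// The one query interface: every SQL statement the app uses lives here,
// written out in full rather than generated by an ORM.
function createQueries(exec: Executor) {
  return {
    userById(id: number): Row | undefined {
      return exec("SELECT id, email, active FROM users WHERE id = $1", [id])[0];
    },
    deactivateUser(id: number): void {
      exec("UPDATE users SET active = false WHERE id = $1", [id]);
    },
  };
}

// Usage with a fake executor that records what would be sent to the database.
const sent: string[] = [];
const fakeExec: Executor = (sql, params) => {
  sent.push(`${sql} -- ${JSON.stringify(params)}`);
  return [{ id: params[0], email: "a@example.com", active: true }];
};
const queries = createQueries(fakeExec);
queries.deactivateUser(7);
console.log(sent[0]); // prints UPDATE users SET active = false WHERE id = $1 -- [7]
```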

12:32

So, to be more precise, these are examples of mechanical enforcement we have been using at the company, and most of them we actually achieve with linting rules. The main example would be no bare catch-alls. Great. Imagine that there's an example here.
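
Since the slide's example is missing, here is a stand-in. The talk does not name the lint rule it uses, so this is only a sketch of the code such a rule pushes toward: instead of a catch-all that turns every failure into a default, the handler recognizes the one failure it expects and rethrows everything else. The `NOT_FOUND` error-code convention, `findUser`, and `isNotFound` are hypothetical.

```typescript
// What a "no bare catch-all" rule targets (shown as a comment, not run):
//   try { return lookup(id); } catch { return null; }
// Every failure, including bugs and outages, becomes "not found".

type LookupError = Error & { code?: string };

// The only failure this caller knows how to handle.
function isNotFound(err: unknown): boolean {
  return err instanceof Error && (err as LookupError).code === "NOT_FOUND";
}

function findUser(id: string, lookup: (id: string) => string): string | null {
  try {
    return lookup(id);
  } catch (err) {
    if (isNotFound(err)) {
      return null; // expected: the user does not exist
    }
    throw err; // unexpected: let it surface instead of hiding it
  }
}

const notFound = (): string => {
  throw Object.assign(new Error("no such user"), { code: "NOT_FOUND" });
};
const brokenConnection = (): string => {
  throw new Error("connection reset");
};

console.log(findUser("u1", notFound)); // prints null
try {
  findUser("u2", brokenConnection);
} catch (err) {
  console.log((err as Error).message); // prints connection reset
}
```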

12:50

The agent found the very catch all and

12:51

was like, "Oh no, this is bad. Edited

12:54

it." But yeah, so we also try to have

12:58

our SQL uh always in one query interface

13:01

so that the agent doesn't have to go

13:02

hunting around the codebase finding all

13:04

of the different places because if it

13:06

misses one then you can have breaking

13:07

behaviors and again that's dangerous. We

13:10

try to have one primitives components

13:12

library for the UI and not have any raw

13:14

for example input uh input boxes. Uh so

13:17

that it's we always have one type of

13:19

styling. It's very consistent one kind

13:21

of behavior. We don't have any dynamic

13:23

imports. And this may not sound as

13:26

important but actually we enforce unique

13:28

function names. And the reason for this

13:30

is not just more legibility for you and

13:31

the agent, but it's actually also the

13:33

token efficiency. So if your agent is

13:35

gripping for a specific feature or

13:37

something in your codebase, if it only

13:38

gets one output, it's going to be much

13:40

better at continuing with the loop. And

13:43

we've started exploring something

13:45

recently called erasable syntax only

13:47

TypeScript mode. And what this does is

13:49

that your code is basically JavaScript

13:51

and it has the type annotations on top.

13:54

And this means that there's no

13:55

transpiling direction because there's

13:57

one source of truth between your actual

13:59

code and the compiler. And so when the

14:02

agent is looking for errors, it doesn't

14:03

have to have this like confusion of oh

14:06

my god, where am I looking at? It is

14:08

much better at finding them.
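
For reference, this mode corresponds to the `erasableSyntaxOnly` compiler option added in TypeScript 5.8, set under `compilerOptions` in `tsconfig.json`. With it enabled, the compiler rejects constructs that need code generation rather than simple type removal, such as `enum`, namespaces containing runtime code, and constructor parameter properties. The sketch below shows erasable replacements for two of them; the `Status` and `User` names are invented for illustration.

```typescript
// Rejected under erasableSyntaxOnly (shown as comments):
//   enum Status { Active = "active", Disabled = "disabled" }
//   class User { constructor(public name: string) {} }

// Erasable replacement for the enum: a plain object plus a derived type.
const Status = {
  Active: "active",
  Disabled: "disabled",
} as const;
type Status = (typeof Status)[keyof typeof Status];

// Erasable replacement for the parameter property: explicit fields.
class User {
  name: string;
  status: Status;
  constructor(name: string, status: Status = Status.Active) {
    this.name = name;
    this.status = status;
  }
}

// Deleting the type annotations from this file leaves valid JavaScript,
// so the code that runs matches the code the agent reads.
const user = new User("ada");
console.log(user.name, user.status); // prints ada active
```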

14:11

>> And so the goal really is to get into this loop somehow: get the agent to produce code that is as good as it can be. But you really need to find a way to feel the pain that the agent doesn't feel, and you need to be woken up, in a way, when you should be looking at something. One of the things we have been doing is building a Pi extension for our review needs, where we separate out the kind of input that would normally go back to the agent: mechanical bugs, places where it clearly violated the AGENTS.md. But then we specifically call out the kinds of changes where the human's brain should reactivate. For instance, we don't think a database migration should ever go in without a human making a judgment call on it, because it very much depends on the locks and the size of the data in production. If there are permissioning changes, you'd better think about them yourself rather than leave them to the agent, because they can be underdocumented.

15:04

Those are just some examples where we learned that if we miss it, we regret it. And you will miss it. But these machines can help you find it, and then you see it and actually get a little bit of a hit: oh, now I have to kick into gear and do something here. This is what it looks like in Pi. At the bottom you have the human callouts; at the top you have what would basically happen if you ended this review and said "fix the issues": the agent would go back and automatically act on the first two. But this is the moment where I now go and check: is this a dependency I actually want to have in this codebase? Do I like the maintainers? Does this work for me?
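
The split between mechanical findings and human callouts can be sketched independently of any particular tool. The example below is not the Pi extension's code or API: it only classifies changed file paths into the three categories named in the talk (database migrations, permission changes, dependencies), and the path patterns are hypothetical rules a team would tune to its own repository.

```typescript
type Callout = "database-migration" | "permissions" | "dependency";

interface Rule {
  callout: Callout;
  matches: (path: string) => boolean;
}

// Hypothetical path patterns; adjust to the repository's own layout.
// A manifest change is only a rough proxy for a dependency change.
const HUMAN_REVIEW_RULES: Rule[] = [
  { callout: "database-migration", matches: (p) => /(^|\/)migrations\//.test(p) },
  { callout: "permissions", matches: (p) => /permission|authz|policy/i.test(p) },
  {
    callout: "dependency",
    matches: (p) => /(^|\/)(package\.json|Cargo\.toml|requirements\.txt)$/.test(p),
  },
];

// Returns, per category, the changed paths a human should look at.
// Anything not listed is left to the normal agent-driven fix loop.
function humanCallouts(changedPaths: string[]): Record<Callout, string[]> {
  const result: Record<Callout, string[]> = {
    "database-migration": [],
    permissions: [],
    dependency: [],
  };
  for (const path of changedPaths) {
    for (const rule of HUMAN_REVIEW_RULES) {
      if (rule.matches(path)) result[rule.callout].push(path);
    }
  }
  return result;
}

const callouts = humanCallouts([
  "db/migrations/0042_add_orders_index.sql",
  "src/auth/permissions.ts",
  "package.json",
  "src/ui/Button.tsx",
]);
console.log(callouts);
```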

15:48

And we obviously like the speed. It's addictive, it's great, and we feel there's a lot of productivity in it. But it is so devious if you start relying on its speed where you really shouldn't. So I can only encourage you to find the areas where you have the feeling that it is actually net positive. For me, a lot of this is reproduction cases: when a customer reports an issue, I can have the agent reproduce it perfectly, and I have a really good starting point. Another is exploring different types of product directions, as long as you commit yourself to doing this with the code that it generates.

16:20

All of this is great, but on the other hand, system architecture and creating reliability in a system are things they're just not very good at, and there we really still have to go slow. So much mess can appear in a codebase in so little time. Mario was already talking about this earlier, but we forget that we're producing months and months of technical debt in a matter of weeks, sometimes days, and it becomes so much harder to actually understand what's going on in the codebase. When the understanding of your own code drops, it's really, really hard, and it's also psychologically hard. I've found pieces of code that didn't work in production, and I was frustrated to learn that I was the one who committed them with the agent and just didn't really see it. It's a very disappointing experience when that happens and you realize that you were actually the one who screwed up.

17:18

And so it is psychologically incredibly hard to really judge the state of the codebase objectively. The only way right now is to really slow down a little on that front, and to have this friction. I know that every engineering team I've ever worked at said we need to get rid of the friction in shipping, and that is true: there's a lot of stuff that is very, very annoying and shouldn't be there. But if you have worked on large enough engineering projects, you know that SLOs are a great system that is intentionally designed to put friction into the engineering process, to make you think: Do I need this reliability? Do I need this level of criticality for the service? Am I sufficiently staffed to run it? With agents, we have now gotten this idea that we should get rid of all of this, when in reality we need it.

17:56

Because friction, in many ways, is what's physically necessary to steer. Without friction there's no steering, and that is really necessary. So you should give this idea of friction a somewhat more positive association, because this is really where your judgment is. This is where your experience is, and you should be inserting it and start feeling it.

18:19

Thank you.

18:20

>> Thank you.
