Full Transcript

YouTLDR

The Multi-Agent Architecture That Actually Ships — Luke Alvoeiro, Factory

18:20 · 1,434 summary words · ~7 min read · English · Transcribed Jun 24, 2026
Summary

To overcome the bottleneck of human attention, software engineering must transition to multi-agent ecosystems, such as Factory's 'Missions', that manage long-running tasks asynchronously through structured state-sharing, serial execution, and rigorous adversarial validation.

Standard single-agent code assistants quickly drift and generate technical debt, whereas a highly structured multi-agent architecture with pre-code validation contracts allows systems to run autonomously for days or weeks while actually improving codebase quality.

Section summaries

0:00-1:00

Introduction & Background of Goose and Factory

optional

Luke Alvoeiro introduces himself and his background in developer tools, specifically highlighting his work at Block, where he started the project that became the open-source coding agent Goose (now donated to the Agentic AI Foundation). He then describes his current role leading the core agent harness at Factory and states Factory's mission: bringing autonomy to the entire software development life cycle (SDLC).

  • Luke Alvoeiro created Goose, a leading open-source coding agent.
  • Factory's core focus is building highly autonomous agentic SDLC pipelines.

It provides useful speaker context and project background but does not contain the core architectural patterns.

1:00-2:00

The Human Attention Bottleneck in Software Engineering

watch

Luke argues that human attention, rather than LLM intelligence, is the primary bottleneck in contemporary software engineering. Even the strongest developers can only drive a few tasks forward at a time, because every task requires their attention and every commit needs their review. He proposes a vision where humans decide the 'what' and an agent ecosystem handles the multi-day 'how' autonomously.

  • Modern models are smart enough to implement large backlogs, but humans lack the bandwidth to supervise all of that implementation.
  • The software engineering bottleneck has shifted from raw intelligence to human attention limits.

Crucial for understanding why multi-agent orchestration must evolve past simple chat setups.

2:00-4:00

A Five-Part Taxonomy of Multi-Agent Interaction

watch

Luke presents a clean, five-part taxonomy of multi-agent patterns: delegation, creator-verifier, direct communication, negotiation, and broadcast. He breaks down the pros and cons of each, noting that while delegation is simplest, direct communication often suffers from fragmented state without a central coordinator. He highlights creator-verifier as crucial for removing implementation bias, and broadcast for maintaining long-term state coherence.

  • The five pillars of multi-agent systems are Delegation, Creator-Verifier, Direct, Negotiation, and Broadcast.
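  • Direct peer-to-peer communication between agents without a central coordinator easily fragments state and leaves no single source of truth.
  • Adversarial creator-verifier loops are essential because a creator carries a built-in bias (Luke calls it "cost bias") toward wanting its own code to work, while a verifier with fresh context is far more likely to find issues.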

Provides the structural building blocks used to compose the broader 'Missions' architecture.
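The creator-verifier pattern can be sketched in a few lines. This is a minimal illustration, not Factory's implementation: the `create` and `verify` callables stand in for two separately prompted models, and the names `Artifact`, `Verdict`, and `creator_verifier` are hypothetical. The property it shows is that the verifier only ever receives the spec and the produced code, never the creator's own reasoning, mirroring the "fresh agent with fresh context" point.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Artifact:
    code: str           # what the creator produced
    creator_notes: str  # creator's own reasoning; deliberately never shown to the verifier


@dataclass
class Verdict:
    passed: bool
    issues: list[str]


def creator_verifier(
    spec: str,
    create: Callable[[str], Artifact],
    verify: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> tuple[Artifact, Verdict]:
    """Loop create -> verify; the verifier sees only (spec, code), i.e. a fresh context."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    for _ in range(max_rounds):
        artifact = create(spec + feedback)
        verdict = verify(spec, artifact.code)  # creator_notes are not passed along
        if verdict.passed:
            break
        feedback = "\nFix: " + "; ".join(verdict.issues)
    return artifact, verdict


# Toy stand-ins for two models (hypothetical; real ones would be LLM calls).
def toy_create(prompt: str) -> Artifact:
    body = "a + b" if "Fix:" in prompt else "a - b"  # first attempt is buggy
    return Artifact(code=f"def add(a, b):\n    return {body}\n", creator_notes="looks right to me")


def toy_verify(spec: str, code: str) -> Verdict:
    namespace: dict = {}
    exec(code, namespace)  # toy only; never exec untrusted model output like this
    ok = namespace["add"](2, 3) == 5
    return Verdict(ok, [] if ok else ["add(2, 3) should return 5"])


artifact, verdict = creator_verifier("add(a, b) returns a + b", toy_create, toy_verify)
print(verdict.passed)  # True
```

The first round fails because the toy creator subtracts; the verifier's issue list becomes feedback, and the second round passes.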

4:00-6:00

The 'Missions' Architecture and Three-Role System

watch

Luke introduces 'Missions,' Factory's ecosystem for long-running agent tasks that combines delegation, creator-verifier, broadcast, and negotiation. It operates asynchronously via structured handoffs, utilizing a clean, three-role agent architecture: Orchestrators, Workers, and Validators. The Orchestrator plans and outputs a validation contract, the Worker implements code in a completely isolated context, and the Validator verifies behavior.

  • A Mission is an ecosystem of agents executing via structured handoffs and shared state rather than a single session.
  • The three-role architecture splits responsibilities among Orchestrator (planning), Worker (clean-slate code implementation), and Validator (verification).
  • Workers start with clean context on every feature, eliminating accumulated baggage and degraded attention.

This is the core architectural blueprint of the entire presentation.
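To make the three-role split concrete, here is a rough sketch of what a plan might carry (features grouped into milestones, plus a validation contract) and the order in which the roles run. All class and function names are hypothetical, and `implement`/`validate` are placeholders for worker and validator agents; it only illustrates the sequencing described in the talk: one fresh worker per feature, with validators at each milestone boundary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feature:
    name: str
    assertions: list[str]  # IDs of validation-contract assertions this feature must satisfy


@dataclass
class Milestone:
    name: str
    features: list[Feature]


@dataclass
class Plan:
    milestones: list[Milestone]
    contract: dict[str, str]  # assertion ID -> statement of what "done" means


def run_mission(
    plan: Plan,
    implement: Callable[[Feature], str],
    validate: Callable[[Milestone, dict[str, str]], str],
) -> list[str]:
    log = []
    for milestone in plan.milestones:
        for feature in milestone.features:
            # Each call stands for a fresh worker that sees only its own feature spec.
            log.append(implement(feature))
        # Validators run once per milestone, after all of its features are committed.
        log.append(validate(milestone, plan.contract))
    return log


plan = Plan(
    milestones=[
        Milestone("auth", [Feature("signup", ["A1"]), Feature("login", ["A2"])]),
        Milestone("chat", [Feature("send message", ["A3"])]),
    ],
    contract={
        "A1": "a new user can sign up",
        "A2": "a user can log in",
        "A3": "a sent message appears in the channel",
    },
)
log = run_mission(plan, lambda f: f"worker:{f.name}", lambda m, c: f"validate:{m.name}")
print(log)
# ['worker:signup', 'worker:login', 'validate:auth', 'worker:send message', 'validate:chat']
```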

6:00-8:00

Rethinking Adversarial Validation and User-Testing Loops

watch

This section explains how to prevent drift in multi-day agent execution. Instead of relying on post-hoc testing, which merely confirms the implementer's decisions, Factory writes validation contracts before any code. Two kinds of validators then run at milestone boundaries: scrutiny validators (test suites, type checks, linters, and dedicated code-review agents) and user-testing validators. The user-testing validator acts like a QA engineer, spawning the application and driving it through computer use (filling out forms, clicking buttons, checking that pages render) to verify complete end-to-end user flows.

  • Validation contracts must be written during the planning stage, completely independent of implementation, to prevent drift.
  • Scrutiny validators run test suites and spin up isolated code-review agents for each completed feature.
  • User-testing validators interact with live applications using computer use, which consumes the majority of a mission's wall-clock time.

Crucial for engineers looking to design robust, self-correcting execution harnesses that prevent agent drift.
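The rule that every contract assertion must be covered by at least one feature can be checked mechanically before any code is written. A minimal sketch, assuming assertions are identified by string IDs and each feature lists the IDs it owns (both assumptions; the talk does not specify a format):

```python
def check_contract_coverage(
    contract: dict[str, str], features: dict[str, list[str]]
) -> tuple[set[str], set[str]]:
    """Return (assertions no feature covers, IDs referenced by features but absent from the contract)."""
    claimed = {aid for ids in features.values() for aid in ids}
    uncovered = set(contract) - claimed
    unknown = claimed - set(contract)
    return uncovered, unknown


contract = {
    "A1": "signing up with a valid email creates an account",
    "A2": "logging in with a wrong password shows an error",
    "A3": "a posted message is visible to other channel members",
}
features = {"signup": ["A1"], "login": ["A2"], "typo-feature": ["A9"]}

uncovered, unknown = check_contract_coverage(contract, features)
print(sorted(uncovered), sorted(unknown))  # ['A3'] ['A9']
```

In this sketch a plan would only be approved once both sets are empty. Because the assertions exist before implementation, the checks cannot simply be reshaped afterwards to match whatever the worker wrote.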

8:00-9:00

Structured Handoffs and System Self-Healing

watch

To prevent agents from losing context during multi-day tasks (with runs lasting up to 16 days), Factory mandates structured handoffs. When a worker completes a feature, it documents its actions, command exit codes, issues, and deviations. This data is fed directly to the orchestrator at milestone boundaries, allowing the system to catch errors and programmatically spin up corrective sub-tasks.

  • Structured handoffs force agents to explicitly write down state, avoiding reliance on native context memory.
  • Errors are programmatically caught and corrected at milestone boundaries, creating a systematic self-healing loop.

Explains the concrete state-management pattern that enables multi-week autonomous runs.
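A structured handoff is essentially a typed record the worker must fill in, plus an orchestrator-side review that turns it into follow-up work. The sketch below uses the fields listed in the talk (what was completed, what was left undone, commands run with their exit codes, issues discovered, and whether the orchestrator's procedures were followed); the class names, field names, example commands, and corrective-task wording are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CommandRun:
    command: str
    exit_code: int


@dataclass
class Handoff:
    feature: str
    completed: list[str]
    left_undone: list[str]
    commands: list[CommandRun]
    issues: list[str]
    followed_procedures: bool
    deviations: list[str] = field(default_factory=list)


def review_handoffs(handoffs: list[Handoff]) -> list[str]:
    """Milestone-boundary review: turn handoff problems into corrective features."""
    corrective = []
    for h in handoffs:
        for run in h.commands:
            if run.exit_code != 0:
                corrective.append(f"{h.feature}: fix failing `{run.command}` (exit {run.exit_code})")
        corrective += [f"{h.feature}: finish '{item}'" for item in h.left_undone]
        corrective += [f"{h.feature}: resolve '{issue}'" for issue in h.issues]
        if not h.followed_procedures:
            reason = "; ".join(h.deviations) or "unspecified deviation"
            corrective.append(f"{h.feature}: redo following required procedures ({reason})")
    return corrective


handoff = Handoff(
    feature="login",
    completed=["login form", "session cookie"],
    left_undone=["rate limiting"],
    commands=[CommandRun("npm test", 1), CommandRun("npm run lint", 0)],  # hypothetical commands
    issues=[],
    followed_procedures=True,
)
for task in review_handoffs([handoff]):
    print(task)
# login: fix failing `npm test` (exit 1)
# login: finish 'rate limiting'
```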

9:00-11:00

Serial Execution vs Parallel Pitfalls & Mission Control

watch

Luke explains why running writing agents in parallel fails in software development due to git conflicts, duplicate work, and divergent architectural paths. To solve this, Missions execute worker and validator tasks serially (one writer at a time) but leverage internal parallelization for read-only operations like repository searching, API research, and code-reviews. He introduces Mission Control, an asynchronous dashboard that allows human project managers to monitor progress, track budgets, and read handoffs.

  • Parallelizing active write operations leads to high coordination overhead, git conflicts, and inconsistent codebase architecture.
  • Safe parallelization in agentic systems should be isolated to read-only tasks like searching, code reviews, and API research.
  • Mission Control provides an asynchronous interface for human oversight of budgets, agent state, and validator outputs.

Counters a common industry assumption about agent parallelization with clear, empirical engineering constraints.
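The scheduling rule (one writer at a time, parallelism only for read-only work) maps onto a plain loop plus a thread pool. This is a minimal sketch with hypothetical names: read tasks are zero-argument callables standing in for things like code search or API research, and `write` stands in for the single worker that edits the repository.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

ReadTask = Callable[[], str]
WriteStep = Callable[[list[str]], str]


def run_feature(read_tasks: list[ReadTask], write: WriteStep) -> str:
    # Read-only work (searching, researching APIs) fans out in parallel;
    # pool.map returns results in task order, regardless of completion order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        findings = list(pool.map(lambda task: task(), read_tasks))
    # The single write step starts only after every read has finished.
    return write(findings)


def run_features_serially(features: list[tuple[list[ReadTask], WriteStep]]) -> list[str]:
    # Features never overlap: each writer inherits the previous one's committed state.
    return [run_feature(reads, write) for reads, write in features]


codebase: list[str] = []  # toy stand-in for the shared repository


def make_write(name: str) -> WriteStep:
    def write(findings: list[str]) -> str:
        codebase.append(f"{name} (using {', '.join(findings)})")
        return name
    return write


features = [
    ([lambda: "schema search", lambda: "API docs"], make_write("feature-1")),
    ([lambda: "grep results"], make_write("feature-2")),
]
print(run_features_serially(features))  # ['feature-1', 'feature-2']
print(codebase[0])  # feature-1 (using schema search, API docs)
```

The trade-off matches the talk: total wall-clock time is higher than running writers concurrently, but no two writers ever touch the repository at once, so there are no conflicting edits to reconcile.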

11:00-13:00

Droid Whispering: Composing Heterogeneous LLM Teams

watch

Luke introduces 'droid whispering,' the discipline of choosing different model families for specific roles based on their performance characteristics. Since planning, implementing, and validating require vastly different cognitive behaviors (e.g., slow reasoning vs fast code fluency vs precise instruction following), matching the right model to the right seat is a major advantage. This model-agnostic approach also prevents a validation loop from sharing the same training biases as the implementation worker.

  • No single model provider is the best across planning, implementation, and validation roles.
  • Using a distinct model family for validation prevents the validator from inheriting the same training-data biases as the implementing agent.
  • Rigorous verification structures and milestone checkpoints allow even open-weight models to run missions successfully.

Directly addresses how platform engineers should select and mix model APIs for optimal system performance.
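Role-to-model assignment is ultimately configuration. The sketch below is hypothetical: provider and model names are placeholders (not real model IDs or recommendations), and `check_seats` simply encodes two ideas from the talk, that every role needs a deliberately chosen model and that the validator should ideally come from a different provider than the implementing worker.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Seat:
    provider: str
    model: str
    strength: str  # why this model was picked for the seat


# Placeholder names; substitute the models you actually use.
SEATS = {
    "orchestrator": Seat("provider-a", "slow-reasoning-model", "careful planning"),
    "worker": Seat("provider-b", "fast-coding-model", "code fluency"),
    "validator": Seat("provider-c", "instruction-following-model", "precise checking"),
}


def check_seats(seats: dict[str, Seat]) -> list[str]:
    problems = [f"missing seat: {role}" for role in ("orchestrator", "worker", "validator") if role not in seats]
    worker, validator = seats.get("worker"), seats.get("validator")
    if worker and validator and worker.provider == validator.provider:
        problems.append("validator shares a provider with the worker; their training-data biases may overlap")
    return problems


print(check_seats(SEATS))  # []
same_family = {**SEATS, "validator": replace(SEATS["validator"], provider="provider-b")}
print(check_seats(same_family))
# ['validator shares a provider with the worker; their training-data biases may overlap']
```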

13:00-15:00

Production Metrics & Building for the Bitter Lesson

watch

Luke shares empirical data from building a Slack clone using the Missions framework. He notes that 60% of execution time and tokens are spent on implementation, and validation rarely passes on the first try. To ensure the framework scales with model updates, Factory defines almost all of its orchestration rules inside system prompts and skills (about 700 lines of text) rather than hardcoding complex state machines.

  • Empirical runs show validation almost never passes on the first try, underscoring the absolute necessity of automated QA loops.
  • Defining orchestration rules in declarative prompts and skills future-proofs the framework for next-generation models.

Provides real-world validation data and critical advice on decoupling system logic from code.
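The split between prompt-defined orchestration and a thin deterministic layer can be pictured like this. It is an illustrative sketch under stated assumptions: the rule text is invented for the example (the talk says the real logic is roughly 700 lines of prompts and skills, which are not shown), and `may_advance` encodes only the bookkeeping the talk attributes to code, namely running validation and blocking progress while handoff issues remain unaddressed.

```python
from dataclasses import dataclass

# Hypothetical excerpt; in the described design, rules like these live in prompts/skills,
# so changing the execution strategy means editing text, not code.
ORCHESTRATOR_RULES = """\
Decompose the goal into milestones of small, independently testable features.
Assign every validation-contract assertion to at least one feature.
After each milestone, read every handoff and scope follow-up features for open issues.
"""


def build_orchestrator_prompt(goal: str) -> str:
    return f"{ORCHESTRATOR_RULES}\nGoal: {goal}\n"


@dataclass
class Issue:
    text: str
    addressed: bool = False


def may_advance(validation_passed: bool, open_issues: list[Issue]) -> bool:
    """The only hard-coded gate: validation must pass and every handoff issue must be addressed."""
    return validation_passed and all(issue.addressed for issue in open_issues)


issues = [Issue("login test is flaky"), Issue("missing DB index", addressed=True)]
print(may_advance(True, issues))  # False: one issue is still open
issues[0].addressed = True
print(may_advance(True, issues))  # True
print(build_orchestrator_prompt("Build a Slack clone").splitlines()[-1])  # Goal: Build a Slack clone
```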

15:00-18:00

The Changing Economics of Software Teams & Conclusion

optional

Luke discusses how autonomous agent architectures shift software team economics: where a team of five engineers might previously run about 10 workstreams at once, he suggests Missions could bring that to around 30. He highlights that because the agent ecosystem enforces validation contracts, structured handoffs, and clean git commits, the codebase actually ends up cleaner, with more end-to-end and unit tests, than when it started. He closes by encouraging viewers to open Droid and try running /missions.

  • Multi-agent architectures shift the human developer's role from manual execution to strategic architecture and scoping.
  • Rigorous multi-agent frameworks can leave a codebase cleaner and better tested than it was before the work started.

Mainly focuses on high-level economic impacts and the final call to action.

Key points

  • The Validation Contract as an Anti-Drift Guardrail: To prevent agents from writing self-confirming tests that mask bugs, a 'validation contract' of assertions must be defined during the planning stage, before any implementation code is generated.
  • Serial Execution Prevents Architectural Fragmentation: Running multiple coding agents in parallel leads to merge conflicts, duplicated effort, and architectural drift; running workers serially, with targeted parallelization only for read-only tasks, looks slower on paper but dramatically lowers the error rate.
  • Droid Whispering and Heterogeneous Model Composition: No single model provider dominates in planning, implementation, and validation, so engineers must intentionally map specific models (e.g., reasoning-heavy models for planning, fast fluent models for coding) to matching roles.
  • Prompt-Driven Orchestration Future-Proofs the SDLC: Hardcoded state machines risk being made obsolete by the next model release, so multi-agent orchestration is best defined in declarative prompts and skills (about 700 lines of text in Factory's case) that let the system improve with each model upgrade.

"The bottleneck in software engineering nowadays is not intelligence. It's now limited by human attention." (Luke Alvoeiro)
"Tests written after implementation don't catch bugs. They confirm decisions." (Luke Alvoeiro)

AI-generated from the transcript. May contain errors.

0:07

[music]

0:15

>> Hi everyone. My name is Luke and my goal

0:18

is that 20 minutes from now you'll be

0:20

able to assemble agent teams that can

0:22

complete tasks orders of magnitude

0:24

harder than what you can complete with a

0:25

single agent today.

0:27

A little bit about me. So

0:30

I come from a background in dev tools.

0:32

About 2 and 1/2 years ago I started a

0:34

project at Block which is where I was

0:36

working at the time. And that project

0:38

evolved into Goose.

0:40

Goose is now one of the leading coding

0:43

agents. It's open source,

0:45

and it was recently donated to the

0:47

Agentic

0:48

AI Foundation. So it's been

0:51

really cool to see.

0:52

Nowadays I work at Factory, where I

0:55

lead our core agent harness and

0:57

Factory's mission is to

0:59

bring autonomy to the entire software

1:01

development life cycle.

1:04

So I want to start off with a claim.

1:06

The bottleneck in software engineering

1:07

nowadays is not intelligence. It's now

1:10

limited by human attention.

1:12

Even the best engineers can only

1:14

complete a couple of tasks at a time.

1:17

They may have a backlog of 50 features

1:19

but they can only drive a few forward

1:21

per day because every task requires

1:23

their attention. Every commit needs

1:25

their review.

1:26

Today's models are smart enough to

1:28

figure out all 50 of these tasks but

1:30

there's just not enough bandwidth to

1:33

supervise their implementation.

1:36

So we kept asking ourselves: what if a

1:39

human decides what to build and then a

1:41

system figures out how to do so. Right?

1:43

An agent could just work for hours for

1:45

days, and you come back to finished work.

1:47

So that's what I'm here to talk about.

1:50

When you start researching multi-agent

1:52

frameworks and systems you quickly

1:54

realize that the field's a bit of a

1:55

mess. Everyone has their own framework,

1:58

their own terminology, their own

2:00

opinions of what works and doesn't work.

2:02

And so I want to propose a simple

2:04

taxonomy. There are five frontier

2:06

multi-agent frameworks.

2:07

One is delegation. Right? This is where

2:09

one agent spawns another agent and the

2:12

parent agent may say, "Go figure out the

2:14

database schema," and then gets a response

2:16

back.

2:17

This is the simplest form of multi-agent

2:19

communication and what most people

2:21

implement first. Sub-agents in

2:24

coding tools are the most

2:26

common example.

2:28

The next one is creator-verifier.

2:30

Right? Where one agent builds something

2:32

and then you have another agent that

2:33

checks that work.

2:35

And the key here is like a separation of

2:37

concerns. The agent that

2:39

implemented the code has some

2:42

cost bias. Right? It wants that code to

2:43

work.

2:45

A fresh agent with fresh context is way

2:46

more likely to find issues and this is

2:48

why we do code review as humans as well.

2:52

Another one is direct communication.

2:54

This is when agents communicate without

2:56

a central coordinator. Right? It's

2:57

kind of like DMing each other.

3:00

It's hard to get right though because

3:02

state fragments across conversations

3:04

without that coordinator and there's no

3:07

single source of truth.

3:09

The next one is negotiation. Right?

3:11

Negotiation is when agents communicate

3:15

over a shared resource. So that may

3:17

be, you know, they want to use the same

3:18

API. They want to modify the same

3:21

portion of the code base.

3:23

But negotiation doesn't need to be

3:24

adversarial. In fact the best use case

3:26

is when there is

3:28

net-positive-sum trading. Right? And

3:30

that's

3:32

when agents have a potential

3:34

win-win situation while interacting. And

3:37

then the last one is broadcast and that

3:39

is when one agent sends information to

3:40

many.

3:41

Think of it like you know status

3:43

updates, new context that applies to

3:46

everyone, shared constraints.

3:48

It's a bit less flashy than the other

3:51

ones but it's critical for maintaining

3:53

coherence over long-running tasks.

3:56

And so when you have all of these

3:57

different building blocks how do you

4:00

assemble that into a system that can run

4:02

for many days?

4:03

So missions is our answer. It's a system

4:06

that combines four of those. Delegation,

4:08

creator verifier,

4:10

broadcast and negotiation

4:12

into a single workflow. You describe a

4:15

goal.

4:16

You scope that through a conversation.

4:18

You approve a plan and then the system

4:20

handles execution for hours or days and

4:23

that enables you to focus on something

4:25

else.

4:27

Notably a mission is not a single agent

4:29

session. It's an ecosystem of agents

4:31

that communicate through structured

4:33

handoffs and shared state.

4:36

It uses a three-role architecture.

4:38

There's the orchestrator, there are workers

4:40

and then there are validators.

4:42

The orchestrator handles planning. When

4:44

you describe what you want, the

4:45

orchestrator is kind of like your

4:46

sounding board. It asks you the right

4:48

strategic questions. It

4:51

checks whether there are

4:52

any unclear

4:54

requirements in the problem space, and

4:56

then it eventually produces a plan that

4:58

includes features, milestones and then

5:00

something that's called a validation

5:01

contract. And that validation contract

5:04

defines what done means before

5:07

any coding is done.

5:09

And I'll come back to why that matters

5:10

because it turns out to be really

5:11

important to the system.

5:13

Next are the workers. They handle

5:16

implementation.

5:17

When a feature is assigned to a worker

5:20

that worker has clean context, no

5:22

accumulated baggage, no degraded

5:24

attention. Right? The worker reads its

5:26

spec. It implements the feature and then

5:28

commits

5:30

via Git, allowing the next worker to

5:32

inherit a clean slate and a working code

5:34

base. And then the last role belongs to the

5:35

validators. They handle verification.

5:38

And so most systems validate by maybe

5:40

running linters, type checks, tests. Maybe

5:43

they do code review.

5:45

Missions does all of that but we also

5:47

validate behavior. Instead of just

5:49

asking, does the code look

5:51

right? We ask, does this work end to

5:53

end? That's the difference that

5:56

lets missions run for many hours, many

5:58

days in a row without drifting. And

6:00

making it work meant

6:03

rethinking validation entirely.

6:06

So

6:07

when you've worked with coding agents

6:09

before you've probably seen this pattern

6:11

where an agent builds a feature.

6:13

It writes some tests. The tests pass.

6:15

There's full coverage.

6:17

But the tests were shaped by the

6:19

code, not by what the code was attempting

6:21

to actually do.

6:23

Tests written after implementation don't

6:25

catch bugs. They confirm decisions. So

6:28

if you rely on validation like that your

6:31

system will eventually drift.

6:34

That's why this validation contract

6:35

exists. It's written during planning

6:38

before any code and it defines

6:40

correctness independently of

6:41

implementation. So for a complex project

6:44

this can be hundreds of assertions and

6:47

each feature is assigned one or more

6:48

assertions that it must satisfy.

6:50

Taken together, the features must

6:53

cover every assertion.

6:57

After each milestone of

6:59

features we have two types of validators

7:02

that run.

7:03

So you have the scrutiny validator and

7:05

the user testing validator. The first

7:07

one

7:08

is more traditional. It runs the test

7:09

suite, type checking, lints and

7:11

critically it spawns

7:13

dedicated code review agents for each

7:15

completed feature within the milestone.

7:17

And then the second one which is the

7:19

user testing validator is more

7:21

interesting. It kind of acts like a QA

7:22

engineer. It spawns the application. It

7:25

interacts with it through computer use

7:27

or something similar to that. It fills

7:30

out forms,

7:32

checks that pages render correctly,

7:34

clicks buttons and ensures that

7:36

functional flows work holistically.

7:38

So this step takes significantly longer

7:41

than the previous one, the scrutiny

7:43

validator

7:44

because the system is interacting

7:46

with a live application. And what we've

7:48

noticed is that most of a

7:50

mission's wall-clock time is actually

7:52

spent here, waiting for real-world

7:54

execution to occur instead of

7:56

generating tokens.

7:59

Critically neither validator has seen

8:01

the code before.

8:03

They're not invested in the

8:04

implementation and so validation is

8:06

adversarial by design.

8:09

Okay. So then validation catches bugs.

8:11

Right? But for a system that runs for

8:14

many days you also need to make sure

8:15

that context isn't lost between the

8:18

agents.

8:19

When a worker finishes a feature it

8:21

doesn't just say I'm done.

8:23

It fills out a structured handoff

8:24

detailing what was completed, what was

8:27

left undone, what commands were run

8:29

throughout that agent loop, and what

8:32

were the exit codes of those

8:33

commands.

8:35

What issues were discovered and did it

8:37

abide by the procedures that the

8:39

orchestrator defined for that worker?

8:43

That's how we catch issues and how the

8:45

system self-heals.

8:47

The errors get caught at milestone

8:49

boundaries. Corrective work gets scoped

8:51

and the mission pulls

8:53

itself back on track. Not by hoping that

8:55

agents remember what happened but by

8:57

forcing them to write it down and then

9:00

actually address issues, and I'll

9:03

present on that in just a sec.

9:06

Our longest mission ran for 16 days

9:08

which is much longer than a full sprint

9:10

and we believe that they can run for 30.

9:13

That's only possible because of the

9:14

structure.

9:17

So once we had this architecture the

9:18

next question became how do we

9:21

actually run it? Right?

9:23

The most obvious choice is

9:25

parallelism. If you have 10 agents

9:27

running at one point in time then you

9:29

have 10 times the throughput. But we

9:32

tried that and it doesn't really work

9:33

for tasks in the software dev

9:35

domain because agents conflict. They

9:37

step on each other's changes. They

9:39

duplicate work. They make inconsistent

9:41

architectural decisions. And so the

9:44

coordination overhead ends up

9:46

eating up the speed gains all the while

9:48

you're burning tokens.

9:50

The difference with missions is that we

9:51

run features serially.

9:53

So there's only one worker or validator

9:56

running at any given point in time.

9:58

Within a feature, we allow for

10:00

parallelization on read-only operations.

10:03

So, you have something like

10:05

searching through the code base or

10:06

researching APIs, all that gets

10:08

parallelized. Within validators, we also

10:11

parallelize read-only operations such as

10:13

code review.

10:15

This is serial execution with

10:17

targeted internal parallelization. It

10:19

seems slower on paper, but the error

10:21

rate drops dramatically, and when you

10:23

have tasks that run for many days, this

10:25

sort of correctness compounds.

10:29

Now,

10:30

your standard chat interface

10:32

doesn't really work for something that

10:34

lasts many days. At a quick glance, you

10:36

need to be able to see how

10:37

much of the project you've completed

10:39

and what amount of the budget

10:41

that you originally set off with

10:43

you've burned through.

10:45

So, using a mission, actually, we built

10:47

Mission Control, which is a dedicated

10:49

view for this. You can see what the

10:51

active worker is doing right now,

10:53

read handoff summaries in

10:55

detail, see what the worker or the

10:56

validator discovered,

10:58

and how it's going to alter

11:00

its course moving forward.

11:03

Or,

11:04

you could just

11:06

check out and

11:08

go hang out with your friends that

11:09

night. This entire view lets you just

11:11

run missions asynchronously, and you

11:13

could be plugged in as a project manager

11:15

overseeing implementation, or you could

11:17

just go and hang out

11:20

with your friends.

11:22

Okay. So, the right model in each role.

11:24

Um

11:26

everything here sort of assumes one

11:28

thing, and that is that you're using the

11:30

right model in each role. Planning

11:32

benefits from slow, careful reasoning,

11:35

implementation from fast code fluency

11:37

and creativity, validation benefits from

11:40

precise instruction following, right?

11:42

And so, no single model or model

11:44

provider is best at all three of these.

11:47

Using systems like missions requires the

11:49

development of a new skill, which

11:51

internally we've been calling droid

11:52

whispering,

11:53

but it's this idea that you need to be

11:54

able to mentally model how different

11:57

LLMs interact, where they fail, how

11:59

those failures compound over a multi-day

12:01

run,

12:02

and then you need to make a deliberate

12:03

choice as to which model sits in which

12:05

seat.

12:06

Theo, the engineer who built our

12:08

missions prototype, came up with our

12:10

model defaults, but we really encourage

12:12

people to make these their own and

12:14

customize them to the needs of their

12:15

project.

12:17

So, for example, validation might use a

12:19

different model provider entirely to

12:21

make sure that it's not biased by the

12:22

same training data.

12:24

This is a structural advantage of a

12:26

model-agnostic architecture.

12:28

You're only as strong as your weakest

12:30

link. And if you're locked into one

12:31

model provider, then you're constrained

12:34

by that family's weakest capability.

12:36

As models continue to specialize,

12:39

the ability to put the right model in

12:40

the right seat becomes a compounding

12:42

advantage.

12:44

It works in the other direction, too. If

12:45

you're using missions, the structure of

12:48

that can compensate for models that are

12:50

not quite at frontier-level

12:52

performance. So, the validation

12:54

contracts, the milestone checkpoints,

12:57

they allow you to run missions very very

12:59

successfully even using open-weight

13:01

models.

13:04

Now, this all sounds quite theoretical.

13:06

What does it actually look like in

13:07

production?

13:08

I've got an example of building a clone

13:10

of Slack right here. This slide has a

13:12

ton of info, but I'll walk you through

13:14

just a few things that I want to call

13:15

out.

13:16

60% of our time is spent on

13:19

implementation,

13:20

and 60% of our tokens as well.

13:23

Notice how validation never succeeds on

13:25

the first go. That's in the mission

13:28

What's it?

13:29

The one on the bottom left. We almost

13:32

always have to create follow-up

13:33

features. So, it really demonstrates

13:35

the value of a system that does

13:37

this QA loop.

13:38

You end up with 50% of your lines

13:41

of code at the very end, in the bottom

13:42

right, being tests, and 90% of your

13:46

code is covered by those tests.

13:49

And lastly, we take advantage of prompt

13:51

caching heavily to make sure that we're

13:53

offsetting

13:54

the price

13:55

of running such a long

13:57

task.

14:00

People have really taken to missions,

14:01

and it's been awesome to see what folks

14:03

have been building with them. Some

14:06

examples I've included in this slide,

14:07

but ones that I want to call out are

14:09

specifically in the enterprise setting,

14:11

which is where Factory really shines.

14:13

They've been used to prototype new ideas

14:15

and features overnight, to

14:18

make sure that people can build

14:20

internal tools at increasingly rapid

14:22

rates, to run huge refactors and

14:24

migrations, for ML research,

14:27

and to modernize codebases so

14:30

that agents are more productive in them.

14:33

One thing that I wanted to talk about

14:35

was also this concept of the Bitter

14:38

Lesson, because every person building

14:40

multi-agent systems has this fear of the

14:43

next model release making

14:45

their architecture obsolete

14:47

overnight.

14:48

Um so,

14:50

when we were building missions, we

14:51

decided we had to make this system get

14:53

better with every model improvement.

14:56

This means that almost all of the

14:58

orchestration logic is defined in

14:59

prompts and skills,

15:01

instead of a hard-coded state

15:03

machine.

15:04

How it decomposes

15:07

features and handles

15:08

failures is all in about 700 lines

15:11

of text, and four sentences of this can

15:14

alter the execution strategy pretty

15:16

dramatically.

15:17

Worker behavior is driven by skills that

15:19

the orchestrator defines per mission, so

15:21

you get very customized behavior,

15:24

and the only deterministic logic is very

15:26

thin, and it's focused on enabling

15:28

models to do what they do best while the

15:30

system handles the bookkeeping,

15:32

right? Stuff like running validation and

15:34

ensuring that progress is blocked when

15:36

there are some handoff issues that are

15:37

not addressed.

15:39

So, missions ensure the

15:40

discipline, and the models provide the

15:43

intelligence, using primitives that

15:45

they're already familiar with, like

15:47

agents.md, skills, etc.

15:51

So, what does this unlock?

15:53

Remember the bottleneck that I started

15:54

off with? Human attention.

15:56

The economics are sort of changing.

15:58

Before, a team of five engineers might

15:59

be able to

16:00

work on 10 workstreams at any given

16:03

point in time.

16:04

Now, maybe with missions, we can bring

16:06

that up to 30.

16:07

The team can focus on interesting

16:09

problems such as

16:11

the architecture and product decisions,

16:13

instead of worrying about the

16:15

execution per se.

16:17

And the important thing is the codebase

16:20

ends up cleaner than when you started.

16:22

The end-to-end tests, the unit tests,

16:24

the skills, the structure that missions

16:26

provide mean that agents and humans

16:29

are more productive in that environment

16:31

moving forward.

16:33

So, now that you understand how missions

16:35

are structured and how they actually

16:36

work, you can see that they're really a

16:38

composition of those original

16:41

strategies, right? Delegation shows up

16:43

everywhere in how the orchestrator

16:45

spawns workers and how we spawn research

16:48

sub-agents. Creator-verifier is

16:50

fundamental in that validation and

16:51

implementation are always separate

16:53

agents with separate context. Broadcast

16:55

runs through the shared mission state

16:57

that every agent references, and

16:59

negotiation shows up at milestone

17:01

boundaries, where the orchestrator

17:02

asks: does this

17:04

handoff summary look

17:06

correct? Do we need to create follow-up

17:08

features, rescope, etc.

17:11

But strategies aren't enough. You need

17:13

the connective tissue. You need these

17:15

structured handoffs so that agents don't

17:17

lose context, you need the right model

17:19

in each role, and you need an

17:20

architecture that will improve with each

17:22

model improvement.

17:24

So,

17:25

what I like to think about is that

17:27

people in this room who are thinking in

17:28

terms of agent ecosystems, who develop

17:31

an intuition for how different models

17:32

compose under pressure, that those

17:35

folks are going to be really shipping

17:36

the next generation of innovation.

17:38

There are a lot of open questions

17:40

still, right? How do we further

17:42

parallelize the workload of missions so

17:44

that they run faster? How do we start

17:46

orchestrating missions themselves into

17:48

even more complex workflows?

17:50

But the data from production missions

17:51

is clear. This works on real projects at

17:54

scale today.

17:56

So,

17:57

this is what I'll leave you with. Open

17:59

Droid,

18:00

try running /missions,

18:03

argue with the orchestrator about the

18:04

scope,

18:05

approve the plan, and then go do

18:07

something else.

18:08

I'm excited to see what you guys build,

18:10

and I'll be around to answer any

18:11

questions for the rest of the day.

18:13

Thanks.

18:14

>> [applause]

18:18

[music]
