Full Transcript

Agentic Engineering: Working With AI, Not Just Using It — Brendan O'Leary

27:04 · 5,126 words · ~26 min read · English · Transcribed Apr 19, 2026
0:00

Let's talk a little bit about what I

0:01

mean by agentic engineering.

0:04

And let's maybe start with a question.

0:07

If I were to ask you right now, how are

0:09

you using AI in your work? Could you

0:11

actually really explain it?

0:13

Not just, you know, it helps me code

0:15

faster. It can write code really fast,

0:17

but like the real workflow.

0:19

What you hand off, what you keep, how

0:22

you decide in between.

0:26

Most engineers can't and that's a little

0:27

wild to me because 90% of engineers are

0:30

already using AI tools or have used

0:32

them. Maybe only half of them are using

0:34

them on a regular basis, but that's a

0:36

number that's definitely growing all the

0:38

time.

0:39

And that's the current state.

0:41

So, the question isn't whether your team

0:43

is using AI, they are. The question is

0:45

whether you're getting the most out of

0:47

it or you're just kind of

0:49

autocompleting your way through the day.

0:52

That gap between using AI and being able

0:56

to articulate how you work with it,

0:59

that's what this talk is all about.

1:01

And really, I think it represents a

1:03

paradigm shift of how we think about AI.

1:07

And you know, the history of AI and

1:09

software engineering is moving

1:11

very fast. It's also very

1:13

surprisingly short, right? In the

1:16

early 2020s, we got tools that could

1:18

finish the lines for you. You type, you

1:20

know, half of a function signature and

1:22

the model would guess the rest of it.

1:24

You know, kind of like autocomplete on

1:26

steroids. It's a neat trick.

1:28

And then in 2022,

1:30

models started to be able to suggest

1:32

entire functions, right? You could

1:34

describe what you wanted and chat with a

1:36

model and maybe get a working

1:38

implementation back. And this is where

1:40

GitHub Copilot first came on the scene

1:42

and broke through and millions of

1:44

developers started using it. And for the

1:46

first time it was starting to seem like

1:47

maybe AI wasn't a novelty, maybe it was

1:50

generally useful.

1:52

But then in 2025, something really

1:55

broke. It's, you know, what we're living

1:57

in now in 2026. The models don't

2:00

just suggest, they can execute. They can

2:03

take a task and break it down and figure

2:06

out which files need to be touched and

2:07

make the changes and run the tests

2:10

themselves and then come back with an

2:11

actual pull request.

2:13

And so, that's not just fancy

2:15

autocomplete. It's not just a faster

2:17

horse. It's a collaborator. It's a

2:19

different way of working.

2:21

And Armin, the creator of Flask for

2:23

those Python folks here, put it, I

2:26

think, perfectly.

2:27

We're no longer just using machines.

2:30

We're now working with them.

2:32

And that framing, I think, captures this

2:34

real shift.

2:36

Right? Tools are things that you pick up

2:37

and put down. You use a hammer. You

2:40

don't work with a hammer.

2:42

But the AI coding agents we have today,

2:44

they're kind of somewhere more in

2:46

between and they're maybe a little bit

2:48

more like working with another engineer.

2:50

Now, it just happens to be an engineer

2:52

who's read every Stack Overflow answer

2:54

ever written.

2:57

And I think that needs a mental model

2:59

shift. And this is the mental model I

3:01

want you to carry through the rest of

3:03

this video and honestly through the rest

3:04

of your, you know, next couple years of

3:06

your career in working with these tools.

3:08

I do think they're still tools, but we

3:10

have to think about them differently.

3:12

You kind of have to think about your AI

3:14

agent as an energetic, enthusiastic,

3:18

extremely well-read, often confidently

3:21

wrong junior developer.

3:25

That junior developer is incredibly

3:27

fast. They don't easily get tired. They

3:30

don't have any ego about their code.

3:32

They'll happily rewrite something six

3:33

times if you ask them to.

3:35

And they have an astonishing breadth of

3:37

knowledge. They've seen lots of

3:38

languages. They've seen lots of

3:40

frameworks. They've seen lots of

3:42

patterns.

3:43

But, and this is critical, what they

3:45

don't have is judgment. They don't know

3:48

your business context. They don't

3:50

understand the reasons why you made that

3:51

very specific architectural decision 3

3:53

months ago.

3:55

And they'll confidently write code that

3:56

is technically correct and contextually

3:59

wrong.

4:01

Armin also said that he's gained more

4:03

than 30% of time in his day because the

4:05

machine is doing a lot of the work.

4:07

That's a real gain.

4:09

But he's getting that 30% because he

4:11

knows what he can hand off and what he

4:13

has to keep for himself.

4:15

He's not just blindly accepting every

4:16

suggestion. He's directing the work.

4:19

And that's the difference between using

4:20

AI and working with AI. And that's what

4:23

agentic engineering actually means.

4:28

And so, let's get tactical. If you're an

4:29

engineer, how do we really

4:31

get good at this?

4:33

I think the number one thing to think

4:35

about is context engineering.

4:37

And here Karpathy says, you know,

4:39

context engineering is a delicate art

4:42

and science of, you know, filling the

4:45

context window with just what needs to

4:47

happen for the agent to have the right

4:50

context for the right iteration for the

4:52

next step.

4:54

I think that's really critical for a

4:55

couple of reasons. First, context is

4:58

expensive, right? Every token you add

5:00

into the context is going to add cost

5:02

because all of those things, that whole

5:05

chat history, is sent back in as input

5:09

tokens every time that you send it.

5:11

And

5:12

that, you know, can add up pretty

5:15

quickly.
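
To make that cost concrete, here is a rough sketch of the arithmetic. It assumes every request resends the full history, and it ignores prompt caching and the model's own output tokens. The function name and the turn sizes are made up for illustration.

```python
def cumulative_input_tokens(turn_sizes):
    """Total input tokens sent when the whole history is resent on every turn."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new message joins the history
        total += history  # the whole history goes in as input for this request
    return total


# Hypothetical session: ten turns that each add 1,000 tokens.
turns = [1_000] * 10
print(cumulative_input_tokens(turns))  # 55000
```

Ten turns of 1,000 tokens add only 10,000 tokens of new material, but under these assumptions 55,000 input tokens get sent in total, because the early turns are paid for again on every later request.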

5:16

And the other key is that more context

5:18

doesn't always mean better results. And

5:20

in fact,

5:21

it can actually make the model

5:23

dumber.

5:25

Right? It's not just about the money.

5:27

The quality can degrade as the window gets over

5:30

about 50% full.

5:32

And there's lots of things that can trap

5:33

you here. And not the least of which

5:35

is, you know, the fact that

5:36

MCP servers became so popular

5:39

that we have a lot of these enabled all

5:41

the time now. Well, each one of those

5:42

loads more and more context. Uh you

5:45

know, more and more input tokens in

5:47

the context.

5:48

And that can be a real problem if

5:50

you start getting into this dumb zone

5:52

around 50% context.

5:54

And

5:55

that also isn't the only problem because

5:58

not only can more context be a problem,

6:00

but bad context can be a problem and can

6:02

poison everything.

6:04

Right? So, this happens when you're

6:06

maybe mixing two different tasks

6:08

that didn't really overlap. Or you've

6:10

kind of got some outdated comments

6:12

either in the code or that you've made

6:13

to the agent. Or even worse, what I've

6:17

seen a lot of people do is they start

6:18

walking down the road with an agent and

6:20

then realize,

6:22

"Hey,

6:23

we're down the wrong path. We've made a

6:24

lot of wrong decisions." And they try to

6:26

steer the agent back.

6:28

But the problem is again, the agent is

6:29

not doing real reasoning like you and I

6:31

do as humans. Right? It's taking in all that

6:34

context every time.

6:36

And it may get lost in the middle or

6:38

even see some of those negative things

6:41

that you had before as still part of the

6:43

context.

6:44

And you see those negative patterns

6:46

creeping back in if you're not careful.

6:49

That's why it's better, you know, to not

6:50

let these things kind of compound.

6:53

But also, you know, always start a new

6:55

session once you realize things are kind

6:57

of off the rails.

6:58

Right? Because not only is context

7:00

expensive,

7:01

the more we have doesn't always mean

7:03

better quality. In fact, at a certain

7:05

point there's a tipping point where it

7:06

means worse quality.

7:08

And bad context can corrupt the output.

7:11

So, the real critical thing for

7:12

engineers is to manage the context. And

7:15

what does that mean?

7:16

Well, one, I think it means persisting a

7:18

lot of information outside of the

7:20

context window so that we can bring it

7:22

in, right? So, this is things like

7:24

scratch pads for things we're working

7:25

on, memory files, the agents.md,

7:29

those kinds of files that help the

7:31

agents have context to what you're

7:32

working on.

7:35

We also need to be very selective when

7:36

we're

7:37

selecting that context. So, that means

7:40

only pull in what's relevant for this

7:42

step of the problem, right? Don't just

7:44

pull in everything that might be useful.

7:46

And so, that could mean,

7:48

you know, things like bringing in the

7:49

right @-mentions for files that we're

7:51

referencing. That could mean making sure

7:53

we don't have unnecessary MCP servers

7:56

enabled. And it means, you know,

7:58

making sure that the agent has the right

8:00

data and that we as a human have curated

8:02

that data for the agent.

8:05

And then, as it's getting bigger and

8:06

that window gets bigger, we want to

8:08

summarize and trim and compress that

8:11

context, right? If we've gone through a

8:12

whole big deep dive and debugging

8:15

session with the agent and now we think

8:17

we have the problem and the solution,

8:19

well, that's great. It might be time to

8:21

compress that context and just focus the

8:24

agent back in on, "Okay, now we

8:25

understand this problem. We're going to

8:27

go fix it."

8:28

Uh and then the other most important

8:30

thing is to isolate context. And I think

8:32

this is why we've seen this huge rise in

8:34

the past six or eight months of parallel

8:37

agents because splitting work across

8:39

several agents or several sessions can

8:42

help things not accumulate. And really

8:45

drive this kind of task separation.

8:48

And again, if you think about it, aren't

8:50

these all of the same things that I

8:51

would tell a brand new engineering

8:54

manager about managing a junior

8:56

engineer?

8:58

Like, the story I tell here is that when I

9:00

was early in my career, I spent a lot of

9:02

time as an engineering manager and

9:04

product manager before I

9:06

went into the dark arts of developer

9:08

relations.

9:10

And in my first job ever as an engineering

9:12

manager, I was at a healthcare software

9:15

company.

9:16

And there was this new thing coming out

9:17

called an iPad. And that dates me a

9:19

little bit. But it was

9:21

released in the market and we thought

9:23

this could be a great place to collect

9:25

patient history, you know, that form you

9:26

have to fill out every year at the

9:27

doctor. It's very critical to assessing

9:30

a lot of your, you know, risk of

9:32

disease.

9:33

But having to fill it out from

9:35

scratch every time is not fun.

9:38

And so, I designed in this other archaic

9:41

tool that some people may have heard of

9:42

called Balsamiq, basically a wireframing

9:44

tool, a wireframe of what this would

9:46

look like.

9:48

Now, that wireframing tool used things

9:50

like Comic Sans and like silly smiley

9:53

face icons as placeholders.

9:55

And a lot of other stuff like that that

9:57

you'd expect from just a wireframe.

9:59

And I handed that to a set of interns

10:01

that we had working for us that summer

10:03

thinking this is a great greenfield

10:04

project for them to take some time on.

10:07

And you know, a few weeks later I got

10:09

back a working prototype

10:12

and the font was Comic Sans and there

10:14

were silly emoji placeholders.

10:17

And that's because that's what the spec

10:19

had in it.

10:20

And so whose fault was that?

10:22

Obviously it was not the interns' fault.

10:24

It was my fault as an engineering

10:25

manager not giving the right context to

10:29

those junior engineers as to what's

10:31

important, what's not, and what we

10:33

really need to focus on and what problem

10:35

we're solving.

10:37

And so I think the habits that can tie

10:38

all of that together

10:40

are you don't need to think about all

10:41

four of those things for every task, you

10:43

just need to think about doing one task

10:45

per session,

10:47

keep an eye on your context meter, and

10:49

if you're in doubt and it feels like

10:51

things are off the rails, you're

10:53

probably right.
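
As a minimal sketch of "keep an eye on your context meter": the check below turns the roughly 50% rule of thumb from earlier in the talk into a threshold. The function names and the 200,000-token window are illustrative assumptions, not any particular agent's API, and the real tipping point will vary by model and task.

```python
def context_fill(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window currently in use."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens


def should_start_fresh(used_tokens: int, window_tokens: int, threshold: float = 0.5) -> bool:
    """True once the window is at or past the threshold (0.5 is a rule of thumb)."""
    return context_fill(used_tokens, window_tokens) >= threshold


# Hypothetical 200,000-token window.
print(should_start_fresh(120_000, 200_000))  # True: 60% full
print(should_start_fresh(40_000, 200_000))   # False: 20% full
```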

10:54

So start a new session, ask it to

10:56

summarize the session for a new agent.

10:59

Turns out that AI is really great at

11:01

writing prompts for AI. So if you've

11:03

worked on something with an agent for a

11:04

while,

11:05

have that agent summarize where you're

11:07

at,

11:08

you can now read it, make sure it

11:10

matches with your understanding and then

11:11

start a new

11:13

session with just the right context.

11:15

Again,

11:16

it's a little bit of art and a little

11:17

bit of science.

11:19

So how do we put this into practice?

11:21

Well, I think there's a lot of

11:22

workflows, there's lots of things

11:23

written out there that you can read.

11:25

I've even compiled a lot of them at

11:27

path.kilo.ai.

11:29

It's where you can find all of

11:31

these kinds of trends and ideas and

11:33

workflow patterns that have been talked

11:35

about.

11:36

But what I think I keep coming back to

11:38

is maybe one of the simpler ones

11:41

and that's the research, plan, implement

11:43

loop.

11:45

Right? And I think this really helps us

11:47

solve for a lot of like classic mistakes

11:49

that people make when they pick up agentic

11:52

engineering for the first time or pick

11:54

up AI to help

11:55

try to do some engineering.

11:57

And what most people do is say, "Hey,

11:59

help me implement this feature. I want

12:01

it to do X and Y."

12:02

And you know, these large language

12:05

models are very good at outputting lots

12:07

of code. In fact, when I joined Kilo

12:09

Code over a year ago,

12:11

I made a pronouncement that we would

12:14

never have our website be

12:16

just a prompt and a whole lot of code

12:18

flying by.

12:19

Makes for a great demo and you've seen

12:21

lots and lots of coding agents that

12:23

maybe that's how they show it off.

12:25

But I think the reality is jumping

12:28

straight into code like that can cause a

12:30

lot of wrong assumptions, it can waste

12:32

even more time rather than saving time,

12:34

and just create a lot of frustration.

12:37

And it really creates that kind of

12:39

paradigm that we've seen where people

12:40

are kind of anti-AI or think that AI is

12:43

not a useful tool because they've jumped

12:45

right in and, you know, put

12:47

garbage in and gotten garbage out. Or

12:50

maybe it's been a while since they've

12:51

used it, right? I mean, if you think of

12:52

the Will Smith eating spaghetti when

12:55

it comes to AI videos, that's come a

12:57

long way in just the past two, three,

12:58

four years.

13:00

You know, the same is true of the AI

13:01

coding models, but you have to do what

13:03

works to give them the best chance at

13:06

getting a great result. And what that is

13:09

is first understanding the problem

13:10

really well and making sure you and the

13:12

AI agent can understand the problem

13:13

really well.

13:15

Then laying out explicit steps for

13:17

implementing

13:18

those changes or fixing that

13:20

problem.

13:21

And only then do we jump to the

13:23

implementation phase where we're writing

13:25

code.

13:26

And Dex Horthy has a great phrase

13:29

that he says here, which is a bad line

13:31

of research can potentially be hundreds

13:33

of lines of bad code.

13:34

And so we're really going to focus in on

13:36

how do we get the research and the plan

13:38

in place

13:39

in order to give ourselves

13:41

the best chance of having great code

13:44

come out.

13:45

So in that first phase, we're going to

13:46

use a tool that is only going to be

13:49

focused in on research. And so for Kilo,

13:51

we call that ask mode.

13:53

And the reason we call it that is

13:55

because the ask mode can't actually do

13:57

anything. It can only chat. It can't

13:59

write files. It can maybe read files if

14:01

you let it,

14:02

but it can't, you know, start trying to

14:04

code a solution.

14:06

And so instead of trying to code a

14:07

solution from the beginning, we're going

14:08

to first try to understand the system.

14:11

You know, how does it actually work

14:12

today? Where are the right files that

14:14

are going to be involved? What are the

14:16

right paradigms that we want to mirror

14:18

or how does this differ from something

14:20

that we have already?

14:22

And you know, just kind of learn where

14:24

in the codebase this is going to

14:26

go and you know, how the data is going

14:28

to flow through the system and how it's

14:30

going to change with our change as well

14:33

as any edge cases we need to

14:35

consider, right? AI is really great at

14:37

brainstorming and so it can help you

14:39

kind of brainstorm those things and make

14:40

sure you've really covered all of your

14:42

bases.

14:44

And then once you've done that research,

14:45

what's going to come out of that is an

14:47

actual output document that shows the

14:51

details of that research that you

14:54

can then read and basically agree with

14:56

and understand, "Hey, this matches

14:58

my understanding of the problem.

15:00

I think we're ready to move on to the

15:01

plan."

15:04

And so then once we've reviewed that as

15:06

a human, now we can say, "Okay, let's

15:08

outline the next steps. What kind of you

15:10

know,

15:11

files are we going to create or

15:13

change? Maybe there's some code

15:15

snippets, but not always is it a good

15:16

idea to have a code snippet in the plan.

15:18

We are definitely going to include like

15:20

how are we going to verify and

15:22

know this change is correct? What are

15:23

the test changes or additions

15:26

that we're going to make to know that?

15:28

And we're also going to be really

15:29

explicit at the planning phase

15:31

about what is in and out of scope, what

15:33

is going to change, what isn't going to

15:34

change.

15:36

And again, the output of that is going

15:37

to be a very clear plan file, right?

15:39

You'll see a lot of repositories

15:40

nowadays have a folder called plans.

15:43

Right? And we want to have that plan

15:45

file be step-by-step instructions with

15:48

specific changes that we're going to

15:49

make that have test commands to verify

15:52

it, that has a strategy for

15:53

understanding how it's going to change

15:54

the system. And it's going to be very

15:56

clear so that we can even use maybe a

15:59

smaller, faster, or cheaper model to

16:01

implement it because we've spent the

16:03

time in the research and plan phase to

16:06

really understand what we're going to be

16:07

doing once we get to

16:09

implementing the change.
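
One way to picture such a plan file is the sketch below, which models the pieces just described (scope, step-by-step changes, a verification command per step) and renders them as text. The class names, the example feature, the file paths, and the pytest commands are all hypothetical. Real plan files are usually written by hand or by the agent, not generated from code.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    change: str  # the specific change to make
    verify: str  # command that checks the change worked


@dataclass
class Plan:
    title: str
    in_scope: list[str]
    out_of_scope: list[str]
    steps: list[Step] = field(default_factory=list)

    def render(self) -> str:
        """Render the plan as a short text file an implementing agent can follow."""
        lines = [f"# Plan: {self.title}", "", "## In scope"]
        lines += [f"- {item}" for item in self.in_scope]
        lines += ["", "## Out of scope"]
        lines += [f"- {item}" for item in self.out_of_scope]
        lines += ["", "## Steps"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.change}")
            lines.append(f"   Verify: `{step.verify}`")
        return "\n".join(lines)


# Hypothetical plan, e.g. saved as plans/login-rate-limit.md
plan = Plan(
    title="Add rate limiting to the login endpoint",
    in_scope=["login handler", "login unit tests"],
    out_of_scope=["signup flow", "database schema"],
    steps=[
        Step("Add a limiter check to the login handler", "pytest tests/test_login.py"),
        Step("Return HTTP 429 when the limit is exceeded", "pytest tests/test_login.py -k limit"),
    ],
)
print(plan.render())
```

Keeping a verification command next to every step is what makes it plausible to hand the plan to a smaller or cheaper model: each step carries its own definition of done.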

16:11

And when we come to implementing the

16:12

change, we now can start over a new

16:14

session and give it just the plan

16:17

execution.

16:18

It allows us to keep the context in that

16:20

session very low. It allows us to

16:22

carefully review each change and I think

16:25

commit very frequently. Now, I used to

16:27

work at a company called GitLab for

16:28

many, many years. So maybe I'm a

16:30

little biased towards Git, but I think

16:32

Git can be a huge helper here when it

16:35

comes to helping you slowly iterate and

16:38

understand the changes that the agents

16:39

are making.

16:41

I treat Git on my local machine kind of

16:44

like my own first pull request review

16:47

with my agents before I maybe put up an

16:49

actual pull request for my

16:52

colleagues to look

16:54

at.

16:55

But I think again, it's critical to

16:57

understand here that

16:59

human time at the

17:01

planning and research phases

17:04

is really the highest-leverage

17:06

use of your time.

17:08

By the time you're implementing, you

17:09

want to have all that hard thinking

17:11

done.

17:12

And that's really critical because, again,

17:13

going back to Dex Horthy, who's

17:15

spoken a lot on the subject, and I

17:18

highly recommend you check out

17:19

videos of him on YouTube

17:21

talking about this.

17:22

He says very aptly that AI can't replace

17:25

thinking. It can only amplify the

17:26

thinking you've done

17:28

or the lack of thinking,

17:30

or you know,

17:31

the fact that you haven't thought it

17:32

through.

17:34

And so let's talk about how we can

17:36

configure our agents, kind of like one more

17:38

step down from this

17:39

paradigm of research, plan,

17:41

implement to really make sure we do

17:43

this.

17:44

So first we talked about modes and

17:46

customizations. We already talked about

17:48

these modes, ask, code, architect. These

17:51

modes that are specialized and focused

17:53

on the thing that we're trying to get

17:54

done. Right? Architect is maybe for

17:56

planning. Ask mode is for research. Code

17:59

mode is for actually implementing.
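
The mapping between phases and modes can be sketched as a small permission table. The talk only says that Ask mode can chat and maybe read files but not write them. The capability names below, and the guess that Architect can write plan files but not code, are assumptions for illustration and not Kilo's actual configuration.

```python
from enum import Enum


class Mode(Enum):
    ASK = "ask"
    ARCHITECT = "architect"
    CODE = "code"


# Assumed capability sets; only "Ask cannot write files" comes from the talk.
PERMISSIONS = {
    Mode.ASK: {"read"},
    Mode.ARCHITECT: {"read", "write_plan"},
    Mode.CODE: {"read", "write_plan", "write_code", "run_commands"},
}

PHASE_TO_MODE = {
    "research": Mode.ASK,
    "plan": Mode.ARCHITECT,
    "implement": Mode.CODE,
}


def allowed(phase: str, capability: str) -> bool:
    """Check whether the mode used for a phase has a given capability."""
    return capability in PERMISSIONS[PHASE_TO_MODE[phase]]


print(allowed("research", "write_code"))   # False
print(allowed("implement", "write_code"))  # True
```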

18:01

Then we also want to have, you know,

18:03

a set of rules that make sense for our

18:06

workspace, right? For the repository

18:09

we're in.

18:10

Uh or maybe globally on our machine so

18:13

that we understand, you know, that we

18:14

have a certain set of rules that we

18:16

always want to adhere to.

18:18

Uh and agents are pretty good at loading

18:20

in and understanding those rules.

18:23

Uh but we have to have them written down

18:25

for them to have those in their context,

18:26

right?

18:28

And so I think a lot of the agent

18:30

behavior then

18:32

is something that we want to tweak as

18:34

we're learning, right? Do we

18:36

want to do multiple agents at a time? Do

18:37

we want those agents to use worktrees

18:40

so that we can then again, merge them

18:42

back into our local repository

18:46

before committing them to a

18:48

pull request?

18:50

Uh how much do we want to auto-approve,

18:52

right? So most agents have the ability

18:54

to tune, you know, what are the things

18:56

that it can do independently? What are

18:58

the tools it can use independently? Can

18:59

it read files? Can it read files inside

19:01

or outside of the workspace? Uh can it

19:04

run tests? You know, what can the agent

19:06

do autonomously without your

19:08

intervention versus what do you need to

19:09

approve?
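
A minimal sketch of an auto-approve policy along those lines: reads inside the workspace and test runs go through on their own, and everything else waits for a human. The action names and the policy itself are hypothetical. Each agent exposes these settings in its own way.

```python
from pathlib import Path

# Hypothetical action names; anything not listed here always needs approval.
AUTO_APPROVED = {"read_file", "run_tests"}


def needs_approval(action: str, target: str, workspace: str) -> bool:
    """Return True when a human should confirm the action first."""
    if action not in AUTO_APPROVED:
        return True
    if action == "read_file":
        root = Path(workspace).resolve()
        path = (root / target).resolve()
        # Reads outside the workspace still need a human to sign off.
        return not path.is_relative_to(root)
    return False


print(needs_approval("write_file", "src/app.py", "/repo"))     # True
print(needs_approval("read_file", "src/app.py", "/repo"))      # False
print(needs_approval("read_file", "../secrets.txt", "/repo"))  # True
```

Starting with a short allow-list and widening it as trust builds matches the advice to set this up conservatively and then adjust.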

19:10

Yeah, I think this is something that you

19:11

have to set up to be comfortable with in

19:13

the beginning and then also you need to

19:15

be comfortable changing as you learn how

19:17

to use these tools.

19:21

And then I think a good mental model

19:23

for this agent configuration is maybe

19:25

kind of three distinct buckets, right?

19:27

We talked about modes, right? This is

19:29

that role-based configuration, you

19:32

know, a behavior of an agent that we

19:34

want.

19:35

Uh but there's two other really key

19:37

things and that is the agents.md and

19:39

then skills.md that you'll hear about.

19:42

And so what's the

19:43

difference between the two?

19:45

Well, the agents.md is now quickly

19:47

becoming the de facto standard for where

19:50

all agents go kind of for their readme,

19:52

for the like always-on rules and details

19:55

about the project. Uh so I think it's

19:58

critical that your project has an

19:59

agents.md with a minimal amount of

20:01

information that an agent needs to know

20:03

about, you know, what are the

20:04

conventions that we're using, what are

20:06

the commands that we're using to get it

20:08

built or tested, and like what are the

20:10

requirements around testing,

20:12

or requirements that we need to be

20:14

sure to check off before committing.

20:17

And then skills are kind of more of a

20:18

specific workflow, right? So they're

20:21

reusable kind of playbooks for agents.

20:24

So if there's something that you're

20:25

doing a lot, you're making motion

20:28

graphics with Remotion often, or

20:30

you're

20:31

you know, doing some sort of

20:34

daily or weekly or monthly changelog

20:37

compiling,

20:38

those kinds of things

20:40

are great to put in as skills that an

20:43

agent can then pick up when it needs it

20:45

to do those specific kinds of workflows.

20:48

And so typically those are on demand and

20:50

you say, "Hey, let's use this skill for

20:52

this task." Versus the agents.md is almost

20:55

always loaded into the context for the

20:56

agent, so it knows what's going on.

20:59

And then of course, I work at

21:01

Kilo Code, and so I've got some power

21:03

user tips there,

21:04

but I think many of

21:06

these apply, you know, regardless of

21:08

which agent you're using, but I think

21:09

they're critical as you kind of get

21:11

comfortable with those first kinds of

21:13

paradigms. How do I now customize this

21:16

and make it work for me? And one is

21:18

@-mentioning for context. So mentioning

21:20

files or commits or, you know,

21:24

output from the terminal.

21:27

Those kinds of things and bringing them

21:28

into the context quickly are really

21:29

helpful. Uh using slash commands to do

21:32

things like starting a new task when we

21:33

need to, or condensing the context when

21:36

it's getting too full.

21:38

Uh those kind of quick commands can help

21:39

us move a lot faster.

21:41

We also can, if we're working in

21:43

VS Code with Kilo Code, select

21:47

a section of code and right-click

21:49

and say add to Kilo Code, and then that

21:50

context is brought right in there, and I

21:53

can then talk or ask

21:55

questions about that code, or ask

21:57

the agent to change a certain part about

21:59

that code. Uh and then of course, we

22:01

have autocomplete built in as well,

22:03

which I think is still useful,

22:06

especially because we also have it not

22:07

just in code, but as you're prompting.

22:11

And then kind of beyond the IDE, I think

22:13

we're seeing, you know, also this shift

22:15

this year in, you know, where else do I

22:19

want to be able to use this? In the CLI,

22:20

from my mobile phone, in a cloud agent,

22:23

directly in Slack. Right? The ability to

22:25

kind of use these agents wherever you

22:27

are is something that's becoming more

22:30

expected

22:31

of everyone and everyone's agents.

22:34

And I think that's a good thing. I think

22:35

that means that we're starting to learn

22:36

how we can use these agents, again,

22:40

more like a collaborator that's

22:42

everywhere that we need to be.

22:46

And then one other thing that I want to

22:48

talk about is getting other

22:50

context in. First of all, Model

22:52

Context Protocol, right? Context is

22:55

right in the name.

22:56

Um

22:57

the idea of this is, you know,

22:59

fundamentally these models originally

23:01

can only

23:02

receive input tokens and create

23:04

output tokens, right?

23:06

And slowly but surely we've been

23:08

enabling them to use tools where they

23:10

can, you know, make tool calls out

23:13

and affect things in the environment,

23:14

like running tests.

23:16

The concept of MCP basically

23:19

expands this to say, "Hey, I want to

23:21

give it other tools." Right? For instance,

23:23

the GitHub MCP gives the agent a lot of

23:26

tools to interact with the GitHub API,

23:29

look up pull requests,

23:30

look up comments, look up issues, and

23:33

understand a lot more about your

23:35

GitHub environment, right?

23:37

Or Context7 helps it look for

23:41

up-to-date framework documentation,

23:43

because of course, as you know, the LLMs

23:45

kind of have a cutoff date where their

23:46

knowledge cuts off, and then

23:48

anything that's improved since then they

23:49

don't know about.

23:51

Um

23:52

so these MCP servers can be very

23:54

helpful, and there's thousands

23:56

of them out there.

23:57

Uh but the concern is that every one of

23:59

them is going to add at least some

24:00

information, right? Details about those

24:02

tools that it has to the system prompt

24:04

that gets sent every time you're

24:07

having an interaction with an agent. And

24:09

so you want to make sure, if you're not

24:10

actually using that, to disable it,

24:12

right? Let's say I have a Postgres MCP

24:14

that connects to my database, and I'm

24:16

doing a whole bunch of front-end work

24:18

that doesn't involve the database at

24:19

all. Well, that Postgres MCP is just

24:21

going to be wasted tokens, and maybe

24:23

even worse,

24:24

tokens that help, you know, kind of

24:26

confuse the agent, so it doesn't understand

24:28

that it's not supposed to touch the

24:29

database right now.
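
A rough way to see what an idle server costs: the sketch below sums the size of the tool descriptions that every enabled server adds to each request. The four-characters-per-token estimate is a crude assumption, and the server names and description lengths are made up. Real numbers depend on the tokenizer and on how each MCP server describes its tools.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def per_request_overhead(servers: dict[str, list[str]], enabled: set[str]) -> int:
    """Tokens of tool descriptions sent with every request for the enabled servers."""
    return sum(
        estimate_tokens(description)
        for name, descriptions in servers.items()
        if name in enabled
        for description in descriptions
    )


# Hypothetical servers; placeholder strings stand in for real tool descriptions.
servers = {
    "postgres": ["x" * 400, "y" * 200],
    "github": ["z" * 800],
}

print(per_request_overhead(servers, {"postgres", "github"}))  # 350
print(per_request_overhead(servers, {"github"}))              # 200
```

In this made-up case, turning off the unused Postgres server removes 150 tokens from every single request for the rest of the front-end session, before counting any effect on output quality.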

24:31

Uh so we want to be really careful to

24:32

not like overuse MCPs.

24:36

And then another thing we hear from

24:37

um

24:38

enterprises a lot is how do we work with

24:40

internal platform APIs?

24:42

Uh and I think that, you know,

24:45

there's kind of four different ways of

24:46

doing that. One, if there's already an

24:48

OpenAPI spec for it, or Swagger

24:51

spec, use that.

24:53

If there's not, then convert the docs to

24:54

markdown so that you can save that

24:55

markdown, you know, in the agents.md or

24:57

somewhere else in the repository to

24:59

reference it.

25:01

Uh and if it's something that changes a

25:02

little bit more frequently, maybe you do

25:04

need to have like a reference URL that

25:06

you can pull in

25:08

uh and have the agent go pull every time

25:10

to see the latest and greatest.

25:12

Uh and then we've seen some customers

25:14

who, you know, have complex multi-step,

25:15

multi-system workflows, where building

25:18

their own MCP server might be the right

25:20

choice.

25:22

But, you know, one way or another, I

25:24

think the key is to, when working

25:26

alongside Kilo or any of these agents,

25:29

you know, isolate your work from the

25:31

agent's work, and then review that

25:33

agent's work as a pull request, right?

25:35

That helps you understand, you know, how

25:38

can I

25:39

um

25:41

best review the code just like I would

25:44

review a junior engineer's code.

25:48

And so that's really the presentation

25:50

that I have on Kilo. We've got some

25:51

exciting new features coming up. We've

25:53

got, you know,

25:55

expanded across all these surfaces.

25:58

We also have a big focus on OpenClaw

26:00

and KiloClaw and making a very safe way

26:02

to use OpenClaw agents.

26:06

Uh and so if you haven't taken a look at

26:07

Kilo, just a little plug at the end

26:10

here, visit kilo.ai,

26:12

uh and we'd love to get your feedback on

26:14

what we're building.

26:16

And you know, just kind of to give you,

26:18

you know, where do we go from here?

26:19

Again, I think you've kind of got to

26:20

pick a tool and get lots of reps, right?

26:24

We said earlier on that, you know, it's

26:26

part art and part science, and I think

26:28

that just means you need a lot of reps,

26:30

right? To kind of get the feel for what

26:32

can I trust the models to do, and what

26:33

can't I trust the models to do.

26:36

Uh and then try this research, plan,

26:38

implement, feedback loop. See how that

26:40

works for you.

26:41

Um and I think maybe you'll end up like

26:43

some of these other senior engineers who

26:45

have said, "Hey, look, I'm having more

26:47

fun programming now than I've had in

26:49

years and years." Uh as we, you know,

26:53

farm out some of this tedious work to AI

26:56

agents and let our brains work on the

26:58

harder engineering problems.

27:01

Thanks.
