Thoughts on Agent workflows
Things are moving fast in this space. I must admit it has been rather fun watching everyone trying to figure out the same things at the same time with regard to how best to work with Agents in programming.
There's still definitely a ton to learn, and the hype phases for me have been intense (to wit: blathering on endlessly about how great they are one day, then unpicking a large amount of mess the next.) It hasn't felt too terrible to make massive mistakes with LLMs yet though, since the input energy is much lower.
Small tweaks
This is the simplest and most approachable workflow. For some tasks (most tasks?) it's probably the most similar to how one would work on a problem manually (perish the thought!)
The problem with this is that it doesn't scale very well. If you are working through a larger set of changes or a long-term project, you're likely going to come unstuck with the cognitive overhead. It's just too hard to keep all the things you need to do in your head whilst also making low-level progress on the tasks that keep a project ticking along.
Sure, you can keep a TODO list or Jira tasks or whatever. But it's not like those are overhead-free. A lot of the time, when I've a long TODO list, I end up either burning out from the towering list of things to do or simply ignoring it and carrying on stream-of-consciousness style.
I think of it like the case where you comment out some code in case you might need it later. Except you basically never do need it later. So it's sat there, sucking up attention and space when the vast majority of the time you don't need it, and it's actively adding to your cognitive load.
Spec-driven
This one seemed exceedingly alluring at first. You hash out a few bullet points, get the agent to write a PRD with a checklist and then set it off on its merry way: don't come back until the checklist is complete!
The problem I found with this is that it gives far too much control to the LLM. Or perhaps, it takes away far too much control from the human.
- These things grow in scope, and when they do, you lack oversight. This begets Priorities.
- It's easy for the agent to go down a rabbit hole. This begets Architecture.
- Design evolves without your input - this usually leads to a lot of bugs. This begets Constraints.
Great things can be achieved with a strict conformance suite. If the code passes it, you're all good, right? The trouble is, with sufficient scope (not even complexity, mind) it's actually surprisingly hard to direct the agent to that goal.
There are two points I've drawn from this approach:
- Concision goes a long way
- Documents are probably better written for-people, by-people.
Conciseness in prompting is a bit of a bigger topic I'd like to get into at some point. But basically my gut feeling is less is more (the same applies to code, although I'm not sure if they apply in the same way.) Being concise drives the agent to a more narrowly scoped workflow because there tends to be less room for interpretation. Using rich language (get your thesauruses out, people) and noun/verb domain terminology can help you be much more precise in specifying what you want.
A tangential point is simplicity. A lot has changed, but I remember reading a few months ago about the importance of writing fresh prompts for each task. It forces you to think through the state of affairs as it stands (this can prevent drift) and also tailor the instructions to a more specific set of outputs (probably, this is my gut feeling.) I don't think this has changed too much, so I purposely avoid prompt templates or prompt engineering hacks (and I try to keep the AGENTS.md small.)
This probably has some parallel with teaching: if you can explain something in simple terms, you probably understand it better. The other angle: think of the time you copied a vim or emacs config file from someone else. You inevitably reach a point where you need to make a tweak but have no idea what is going on. If you build it up yourself from scratch, it is usually a lot better for your understanding and long-term maintenance.
Tests
Tests are basically the cornerstone of success in agent-assisted programming, I think. They give verifiable output and a deterministic process to achieve it. As someone who always liked TDD but could never be bothered to actually do it, I'm happy with this state of affairs now. There are some pitfalls though:
- Ensure you specify what should be tested. Otherwise you end up with a lot of (typically verbose) code that will be hard to audit. Discipline, as with the architecture angle, is key. You can't let the agent drift here, even though having to state specific test cases is quite a pain.
- Avoid mocks where possible. This is pretty much just good testing practice, but I've noticed that (at least with Go) the agents like to write a lot of mocks. Integration tests should be preferred, with as few mocks as you can get away with.
- Gold standard tests are good: the wonderful conformance suite made manifest. Just make sure not to let the gold standard fixtures grow into large blobs. These can also be hard to audit. If you can, it'd be better to split them into logical parts and run integration tests on each.
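To make the last point concrete, here is a minimal Go sketch of a gold standard check split into named logical parts rather than one large fixture blob. The names (`checkPart`, `parts`) and the inline fixtures are illustrative assumptions, not from any particular library; in a real suite the golden values would live in small files under `testdata/`.

```go
// Sketch: compare each logical slice of output against its own small
// golden fixture, so a failure points at a named part, not a giant diff.
package main

import (
	"fmt"
	"strings"
)

// checkPart compares one named part of the output to its golden value.
// Trimming whitespace keeps the comparison tolerant of trailing newlines.
func checkPart(name, got, golden string) error {
	if strings.TrimSpace(got) != strings.TrimSpace(golden) {
		return fmt.Errorf("part %q: got %q, want %q", name, got, golden)
	}
	return nil
}

func main() {
	// Hypothetical fixtures; in practice each would be read from
	// testdata/<name>.golden so the parts stay small and auditable.
	parts := map[string][2]string{
		"header": {"id,name", "id,name"},
		"row":    {"1,alice", "1,alice"},
	}
	for name, p := range parts {
		if err := checkPart(name, p[0], p[1]); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("all parts match")
}
```

The design point is that each part fails independently with a name attached, which is far easier for both you and the agent to audit than a single sprawling fixture diff.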
Where am I now?
I think the key point at this stage is to try and find the sweet spot between the low-level and high-level workflows. I think if you had to err on one side, it'd be towards the low-level as it gives you more control and less scope for things to go off the rails. We will see how this changes as people experiment more and agent harnesses get better though.
In short:
- Keep it simple
- Keep it short
- Try to avoid templates
Ask yourself if the following are evident or can be answered by your workflow:
- What's the goal?
- What are the constraints?
- How can we prove the implementation was successful?