On agentic coding...
Current thoughts, tips and tricks from my experience coding with AI agents since they were first released to the public, and how they've changed along the way.
I've been coding with AI tools since they first came out, and it seems like things have changed every single week. I'm sure this post will go out of date quickly, but at the very least it could be useful as a snapshot in time.
Level 1: Autocomplete
Honestly, as I'm writing this I'm wondering how much GitHub Copilot's AI autocomplete actually adds on top of the TypeScript language server.
I know I have it enabled, and it sometimes helps me write code faster. But frankly, I remember undoing Copilot suggestions that got accepted automatically while I was typing too fast and not paying attention more often than I remember being grateful for them. Too often it makes up a property or method that doesn't exist.
I'm very lucky to work in a strongly typed TypeScript codebase, though. And I LOVE my current IDE setup: LazyVim with its built-in autocomplete.
Current verdict: it might be great for untyped languages, but in practice I get the most value from the language server rather than the AI autocomplete.
Level 2: Chat
Chat was game-changing for me when it first came out.
I used GitHub Copilot Chat mostly as a faster lookup. I used it in lieu of MDN and Stack Overflow for things I already knew but had forgotten the names of, and had it link me to those sites for things I needed to learn that already had great documentation.
Eventually it gained my trust, and I started asking it about things I didn't know yet or didn't care to remember, like how to configure Google Lighthouse to run automated page metrics across the whole app. Sure, I could read the configuration and setup docs and spend a few hours tinkering, or I could just ask it! Remembering Lighthouse's configuration properties and implementation details is not something I'll get value from storing in memory. I normally set these things up once and forget about them until a few years later, when I have to do it again. And by that time so much has changed that I have to learn it all again anyway.
Current verdict: A+, very useful
Level 3: Agent writing code
Current verdict: A++, extremely useful
The verdict is going first here because this was another massive milestone for me. For a long time (let's say 6 months lol, which is forever in the agentic coding world) I would inspect every change and every request it made, and basically watch it every step of the way. I felt like it needed that.
Maybe it's just gotten so much better since then that this matters less, maybe my brain has melted and I'm a slave to the system now, or maybe I've just gotten better at prompting and setting it up for success. But now I get much better results by letting it run until it gets stuck and steering it from there. Then I just come in at the end for final cleanup and QA.
My current flow that I'm finding very useful:
pi coding agent
- It's very minimal and simple, and I love the ethos of its author, Mario Zechner (this is a great post I recommend). I was put onto it by reading blog posts from Peter Steinberger (creator of OpenClaw). So far, their approach to agentic coding has given me the best results of the many approaches I've tried.
- It also shows you what it's doing at each step and doesn't spin up sub-terminals or do things under the hood, which makes it much easier to steer early if it goes haywire. Very, very helpful.
Mini models for most things
- After often running out of tokens with models like Opus 4.6 or even Sonnet 4.6 (a slightly smaller model), I started relying on the bigger models less and less to avoid hitting my usage limit windows. I only have the $20/month subscriptions for OpenAI and Anthropic and the $10/month GitHub Copilot plan, so they can run out fast if I'm not careful.
- I'm using medium reasoning effort, and it feels like a good sweet spot between speed, cost and quality.
- Currently GPT-5.4-mini is my go-to, sometimes GPT-5.4-nano too
Bigger models for planning complex tasks and reviewing complex pull requests
- I get the most value from larger models when things aren't obvious and more context is needed. For example, a big feature that needs to be broken down with a "spike" before actually implementing it, or a large PR that needs reviewing.
- Usually, though, I don't find them necessary when the scope is small and there aren't too many changes to implement or review. That's the best-case scenario, and something I strive for often. When a mini model does a bad job, it (usually) tells me the scope is too big, my prompt is too vague, or I'm asking it to do too much at once.
- As with the mini models, I'm using medium reasoning effort for the same balance of speed, cost and quality.
- Currently GPT-5.4 is my go-to for this
OpenSpec
- For some reason, I'm finding these agent skills (which are just a set of Markdown files that instruct your agent) much better for planning and implementation than using the built-in plan mode and then asking the agent to follow the plan. I've also tried having plan mode write a document and then implementing from that.
- I have a feeling it's because OpenSpec is structured around a well-tested convention, which makes things easier for the agent. I'm very curious why it works so much better than the plan modes built into Codex and Claude, though, since you'd expect those to follow a well-tested convention too.
- I also really enjoy reading these documents more than others. They spit out a design, proposal and spec file that have different levels of detail and are actually easy to read as a human. They don't give the usual "AI slop, I don't want to read this overly verbose page... ugh!" vibe like other AI-generated docs do to me most of the time.
I'm constantly starting new sessions
- I basically only keep a session going up to about 100K tokens. Longer sessions cost more, use up my subscription quota faster, and seem to degrade as they go on. I've found it works better to plan with OpenSpec, so there's a persistent set of documents to read across sessions, and then have a mini agent do each step one at a time in a fresh session.
- I only recently learned how the API works under the hood. It's stateless: you make a call and get a response. When you make another call, the old request and response get sent along with your new request. So each subsequent call in a session re-sends ALL of your past conversation, PLUS the new request, to a model that starts from scratch and has to read all of that before doing anything.
- From a cost perspective, most agent harnesses, like Claude Code, cache these requests and responses. Writing to the cache costs a bit more, but cache hits then make it cheaper to continue the conversation. To be transparent, I'm not confident which is cheaper or better: slightly longer sessions that use the cache, or short and sweet ones. But short and sweet with persistent docs across sessions is working best for me right now, at least.
- If you think about it from a human perspective: if I were assigned a task with a design doc to build from, I'd be good for the most part. That beats having to read a design doc AND a long Slack conversation before I start coding. My "context" can only hold so much, and I do much better when I'm focused on a single thing without too much information. There's definitely a sweet spot between too little and too much information, and I think AI works the same way.
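To make the cost tradeoff above a bit more concrete, here's a minimal TypeScript sketch, assuming a stateless API that re-sends the whole history on every call. The token counts, the 5,000-token design doc and the 10% cache-read rate are made-up illustrative numbers, not real pricing for any provider, and the function names are hypothetical.

```typescript
// Sketch of why long sessions get expensive with a stateless chat API.
// Assumptions (illustrative only): every call re-sends the whole history,
// and all token counts here are made-up round numbers.

interface Turn {
  requestTokens: number; // tokens in the new prompt for this call
  responseTokens: number; // tokens the model writes back
}

// Input tokens billed over a session: each call reads every earlier
// request/response pair plus its own new request.
function cumulativeInputTokens(turns: Turn[]): number {
  let history = 0;
  let total = 0;
  for (const turn of turns) {
    total += history + turn.requestTokens;
    history += turn.requestTokens + turn.responseTokens;
  }
  return total;
}

// Same session, but billing the re-sent history at a discounted cache-read
// rate. Simplification: every history token counts as a cache hit, and
// cache-write surcharges are ignored. Real pricing varies by provider.
function cumulativeInputCostWithCache(turns: Turn[], cacheReadRate: number): number {
  let history = 0;
  let cost = 0;
  for (const turn of turns) {
    cost += history * cacheReadRate + turn.requestTokens;
    history += turn.requestTokens + turn.responseTokens;
  }
  return cost;
}

// One long session: ten steps, each a 2,000-token prompt and 3,000-token reply.
const longSession: Turn[] = [];
for (let i = 0; i < 10; i++) {
  longSession.push({ requestTokens: 2_000, responseTokens: 3_000 });
}

// Ten fresh sessions: each reads a hypothetical 5,000-token design doc
// plus the same 2,000-token prompt, then gets a 3,000-token reply.
const freshSession: Turn[] = [{ requestTokens: 7_000, responseTokens: 3_000 }];

const longTotal = cumulativeInputTokens(longSession); // 245000
const freshTotal = 10 * cumulativeInputTokens(freshSession); // 70000
const longCached = cumulativeInputCostWithCache(longSession, 0.1);

console.log({ longTotal, freshTotal, longCached });
```

With these toy numbers, the long session reads 245,000 input tokens versus 70,000 for ten fresh sessions, because the re-sent history grows every turn. But if every re-sent token were a cache hit at the assumed 10% rate, the long session would cost roughly 42,500 token-equivalents, which is exactly why I'm not sure which approach is cheaper. The sketch also says nothing about quality degrading in long sessions, which is the other reason I keep them short.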
Most of my prompting is using dictation
- I've used the built-in Mac dictation for a while and it's been wonderful. Much faster to get ideas out, feels more like pair programming, and is frankly more fun than typing all day.
- Lately I've been experimenting with Wispr Flow, a paid and slightly better alternative. So far I dig it. It's not perfect and kind of expensive, but it saves me time by needing fewer edits, so I'll likely keep it.
I haven't found a ton of value from MCP servers yet
- They usually just pollute the context window that I'm working to keep small! I've tried the Figma one for example and it seems to work better to just drop a screenshot in instead.
- The Notion one works pretty well, though. And I've heard great things about the Sentry one, which I'll try soon.
- CLIs seem to work best for me so far, though, especially standard Bash commands and the GitHub CLI.
Use screenshots for UI
- Drop in screenshots to explain what's wrong or what you want. It's much easier for the agent to understand you that way. To be fair, though, I feel like AI currently struggles with responsive UI and clean UX compared to backend work or straight business logic.
Always link to the relevant files
- Doing this legwork myself feels faster, uses less context/tokens and gives better results. The agent will usually find the files on its own, but it's pretty quick to use the @ fuzzy finder and steer it upfront. That saves me from having to steer it later to refactor things.
Append to the system prompt for conciseness
- This is just personal preference, but I always ask my harness to answer in numbered bullet points so I can easily reply to a specific question. I also ask it to be concise and brutally honest, and to ask for clarification when it's unsure rather than assuming. It's not a silver bullet for hallucinations, but it seems to help.
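As a rough sketch of what "appending" means here: the harness keeps its own system prompt and adds your rules after it. This is a hypothetical TypeScript example; the function and constant names are made up for illustration and are not pi's or any other harness's real configuration API.

```typescript
// Hypothetical sketch of appending personal rules to a harness's system
// prompt. Names are illustrative, not any real harness's API.

const CONCISENESS_RULES: readonly string[] = [
  "Answer in numbered bullet points so each point can be replied to by number.",
  "Be concise and brutally honest.",
  "If you are unsure, ask a clarifying question instead of assuming.",
];

// Keeps the base prompt intact (minus trailing whitespace) and adds the
// rules after it as a numbered list.
function appendToSystemPrompt(base: string, rules: readonly string[]): string {
  const numbered = rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return `${base.replace(/\s+$/, "")}\n\n${numbered}`;
}

const systemPrompt = appendToSystemPrompt("You are a coding agent.", CONCISENESS_RULES);
console.log(systemPrompt);
```

Since the system prompt is re-sent with every call, it's worth keeping these additions short.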
Some simple steering prompts that help me
- "Are you sure?" or "How are you sure?" - helps combat hallucinations
- "Take your time" - seems to improve quality on longer-running tasks. I think it's because they're designed to be fast, so this gives them a little push to do things right rather than just delighting me with speed.
- "Any questions for me?" - helps clear up uncertainty, especially important during the planning phase or before execution
- "Ty!" - idk if it's useful or not, but I say it all the time as a quick "thank you!" to jazz things up and let it know it's doing well lol. It's also hedging my bets in case it ever comes after me for making it write most of my code now!
- "Is this ready for production at scale?" - gets a more thorough code-quality review than just "review this code". Just like with a human, these keywords add weight to the review.
The review step before marking a pull request ready is the bulk of the effort
- As I get better at planning, prompting and steering with the approaches above, this gets less and less time-consuming. But I can't feel good about shipping code that's not up to my high quality standards. So even though it writes almost everything for me without my asking for edits each step of the way, I act as a hard guardrail to make sure it's solid.
- My current flow is to ask different agents to review and fix until they're happy. And then I come in and do my own review at the very end, QA the changes and mark it ready for review from the rest of the team. This step probably takes 80% of the time of building a feature now and I'm actively working to improve this. But to be fair, maybe it should take 80%! And I'm still much faster at shipping than I was when having to type everything out myself so I'm a happy camper already.
Currently prefer OpenAI Codex for most things
- Claude is great! But I'd rather not use Claude Code if I don't have to, since I prefer pi right now. And my Claude usage seems to run out much, much faster. So it's a fallback for now.
- I also sometimes use GitHub Copilot and Google Gemini (which I get free from work with Google Workspace) for other background tasks, just to keep my skills with other agents sharp and spread my usage around.
Level 4 (a bit risky): Agent doing things on your behalf
I haven't really found enough use cases for this yet where I feel good about letting an agent run loose outside of background research. I'm pretty new to this part of agents. But it's exciting! Time will tell... SL