# modern agentic software engineering
OpenAI’s `codex` has demonstrated something I’ve struggled to articulate for a while: we’ve been doing AI software engineering wrong. Or at least, there’s a better way, with CLIs and files as the interface. What’s old is new again.
In this post, I demonstrate the core workflow for modern agentic software engineering that can be applied to pair-programming or automation via CI/CD:

- (don’t skip) have a well-structured, well-documented code repository
- (don’t skip) have a code repository with good, fast linting and tests
- generate a `task.md` file with all relevant out-of-repo context (pro tip: `gh issue view NUMBER > task.md` and such…)
- `ai "open the task.md and work on it"`
- further wrap this workflow with scripts for automation when appropriate
- if working interactively, supervise and iterate
- if working in CI/CD, make a PR and review (noting you can loop agentic reviews in here…)
- consolidate commands for linting and testing in a `Makefile` or similar (I like `justfile`) and instruct your `ai` on using them
- ideate, learn, do, iterate, automate
You can then leverage standard software development lifecycle (SDLC) best practices and GitOps for modern agentic software engineering.
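The core loop above can be sketched as a few lines of shell (a hedged sketch: `ai` stands in for whatever agent CLI you use, and the `task.md` contents are illustrative):

```shell
# build a task.md from out-of-repo context before invoking the agent
{
  echo "# Task"
  echo "Fix the failing preview build for the latest post."
  # e.g. gh issue view NUMBER >> task.md, pasted Slack context, etc.
} > task.md

# then, interactively or in CI/CD:
# ai "open the task.md and work on it"
cat task.md
```

The point is that assembling the string is the work; handing it off is one command.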
## a toy example
In writing this post, I wanted to preview this website locally. It wasn’t working for only this post and I was very confused (I had just rewritten this website in a new framework today, or rather yesterday…). In my terminal, side-by-side with this post open in my editor, I typed `ai "for some reason modern-agentic-software-engineering.md in posts/ isn't showing up in my build, any ideas?"` and got:
(`codex` session transcript condensed; it explored the repo with shell commands and read files like the `Justfile`:)

```text
# Justfile
# load environment variables
set dotenv-load
... (40 more lines)
```
Summarizing:
- I was writing the post on April 20, 2025
- The post was dated April 21, 2025
- Hugo by default doesn’t include posts in the future in its preview or build
Instead of going down some rabbit hole on some ad-riddled website, I just “asked” my terminal, right where the problematic file was. You may have noticed that before reading any files it noted:
> Inspecting Hugo structure
>
> The user is working with a Hugo-based personal website...
Where did this information come from? And what makes this pattern so powerful?
## a couple of 3 things
There are 3 things that are revolutionary here for agentic software engineering:
- Dynamic in-repo context
- Using CLIs directly
- Files (strings) as the interface
### dynamic in-repo context
`codex` (uniquely as of release?) works off `codex.md` files per-directory. You can also set global instructions in `$HOME/.codex/instructions.md` by default (use `codex --config`/`-c` to quickly open global instructions in `$EDITOR`). This means as your agent navigates down the file tree you give it access to, it’s dynamically loading relevant context per-directory. This solves a lot of issues with supplying relevant context to agents, in a way that’s a lot more natural than the alternatives.
You can structure repos and directories like:

```text
./
├── repoA/
│   ├── subdirA/
│   │   └── codex.md
│   ├── subdirB/
│   │   └── codex.md
│   └── codex.md
├── repoB/
│   └── codex.md
└── repoC/
    └── codex.md
```
This explains how `codex` knew details about my website’s codebase earlier, and has directory-specific context for other repositories I work in. Your `task.md` just needs any context that doesn’t already exist in the filetree.
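For example, seeding a per-directory `codex.md` might look like this (the contents are illustrative; there’s no prescribed schema that I know of, it’s just text the agent reads):

```shell
# hypothetical: give the agent directory-local conventions to load on entry
mkdir -p repoA/subdirA
cat > repoA/subdirA/codex.md <<'EOF'
This directory holds Hugo content for blog posts.
Posts use front matter; future-dated posts are excluded from builds by default.
EOF
cat repoA/subdirA/codex.md
```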
### using CLI commands directly
SQL is one of the most popular “programming languages” in the world, despite software engineers often not thinking of it as such. Why? Largely because it’s declarative and reads like English. `SELECT * FROM people WHERE age > 21` kinda makes sense even if you don’t code. It’s easy to learn and (relatively) limited in scope while allowing users to accomplish a lot.
Similarly, `gh pr list` is a CLI command that probably just makes sense (assuming you know what `gh` and `pr` are, to be fair). And if you don’t, just `ai "explain <whatever>"`! `codex` uses CLI tools directly, allowing you to use the same tools and automate repeated actions for human or agentic use. To demonstrate, early `codex` didn’t use a special tool to read a file, just `sed` like:
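(a representative sketch, not the actual session output, demonstrated on a throwaway file:)

```shell
# make a throwaway file, then print a line range from it,
# the way the agent "reads" a slice of a file
printf '%s\n' alpha beta gamma delta epsilon > /tmp/demo.txt
sed -n '2,4p' /tmp/demo.txt   # prints lines 2 through 4: beta, gamma, delta
```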
or `grep` to search files:
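(again representative, not the actual session:)

```shell
# search a file (or a whole tree, with -r) for a pattern, with line numbers
printf 'TODO: fix build\nall done\n' > /tmp/notes.txt
grep -n 'TODO' /tmp/notes.txt   # prints: 1:TODO: fix build
```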
The broader point here is that LLMs are trained on human text and work best interacting with text the way humans do. Working via CLIs (which also solve a lot of client/server interaction problems) is a natural fit. And CLIs have good built-in documentation (for humans or agents) via `--help`/`-h` flags. It’s easy to use the exact same tools your agent uses and write any deterministic workflows into reusable scripts that humans, agents, or CI/CD can run again and again.
### files (strings) as the interface
Working with LLMs is easiest when you think of them as autoregressive text processors. This explains a lot of their behavior: why they struggle to generate ASCII art, why they can’t reliably do math, etc. If this doesn’t make sense, you should learn more about neural networks and modern LLMs until it does.
Once you do, you understand the text (string) data that goes into the LLM is essentially all that matters: text in, text out. Getting that input text data fast and efficiently is critical to modern agentic software engineering. The Unix philosophy largely solves this and many more problems with effective use.
I’m standardizing on a `task.md` convention for loading all relevant context before my `ai` call (`ai "open the task.md and work on it"`).
## how I use `git`, `gh`, `nvim`, `tmux`, and `codex`
I have a love/hate relationship with typing. I use a lot of bash/zsh aliases and functions like:
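(a hypothetical reconstruction of the pattern, not the real `ghprc`; the `gh` flags are real, but the body is illustrative:)

```shell
# hypothetical sketch: dump a PR's description, comments, and diff
# into task.md so the agent has the full discussion as context
ghprc() {
  gh pr view "$1" --comments > task.md  # PR description + review comments
  gh pr diff "$1" >> task.md            # the diff itself
}
```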
(as an exercise for the reader, without copy/pasting from your browser, pipe the code above for `ghprc` into `codex` to explain it)
I like to think of this as a form of automation. If I can automatically build a `task.md` quickly, I can `ai "open the task.md and work on it"` for any task. It’s just about quickly building up the `task.md` that will accomplish what I want.
Git is critical for managing strings in the filesystem. Vim keybindings make editing `task.md` files to feed into `codex` fast. Tmux lets me have my editor and `ai` open side-by-side, and open new windows as needed for more sessions in parallel. And the GitHub CLI plus my bash functions help pull information from the software engineers I work with into the `task.md` file. If your GitHub issues are well-written, there’s no reason `gh issue view NUMBER | codex --full-auto` shouldn’t be your first attempt at solving a problem. Though until you and your team adopt this new paradigm, you might want to start with `gh issue view NUMBER >> task.md` and edit manually. Do what works!
## BUT WHAT ABOUT
But what about Claude Code?? They even released first! And actually, there’s Aider! And that’s already open source, and they’ve been doing it forever!1!!
Yes. But `claude` isn’t open source (and thus doesn’t have a `--config`/`-c` flag added by me, making it inferior) and `aider` is…just not quite the same level of agentic, in my admittedly inexperienced opinion. OpenAI’s `codex` is the first where the workflow has clicked for me, though I’ve been using `claude` more side-by-side and it’s also fine. I’m more concerned with the workflow than the provider/specific tool.
Claude Code, as of this post, does not support `CLAUDE.md` files in subdirectories the way `codex.md` files are used (as far as I can tell from usage and their documentation). I’m less sure, but I believe loading context into Aider is a more manual process as well. These tools will converge on doing whatever works best, and I suspect both will soon do this as `codex` does.
Oh and what about RAG! Using `grep`? So limiting! Vector search!!!
This last one might be a fair point – and my counterpoint is if semantic vector search is so useful on its own, it should be its own CLI tool alongside `grep` for filesystem search. Why limit it to just agents? Modern agentic pair programming implies using the best tools for humans and agents alike, which tend to be open source standards and CLIs following the Unix philosophy.
## results
But where are the results!?!?!? In this post, I’ve only answered one (somewhat) simple question with `codex`.
While I can’t share the code, in the past 2 weeks using agentic software engineering (`claude`, VSCode Copilot agent mode, `codex`) I’ve made eight meaningful contributions to our frontend codebase that, as a product manager and JavaScript hater, I would never have made otherwise – my prior contributions were all updating strings in an `en-US.json` file. There’s still reason to be skeptical, but the evidence is mounting and the usefulness is getting easier to scale. I’ve also redone my personal website, refactored stuff into Go and Rust, and more with agentic assistance. Learning things yourself is important (and critical for effective use of agents), but the results are here now.
## predictions
I predict:

- We standardize on `README`/`README.md`/`readme.md`/etc. instead of LLM-specific filenames for dynamic context across tooling. Why should the instructions for humans versus agents differ anyway? It should be assumed everyone working in the codebase read the `README` (crazy stuff).
- Specialized domain-specific wrappers around standard software engineering agents emerge. Organizations and teams develop their own business logic as usual around these.
- More people will realize the power of terminal-driven, file-driven agentic pair programming.
- More people will realize the power of the above for agentic CI/CD.
- Automation, automation, automation…