Brave new bugs
AI code assist development pitfalls and context anxiety
AI drives technical innovations across every branch of science. A random sampling: we’re seeing AI applied to CRISPR genome data to speed analysis and unlock medical advances, AI deployed on farms to reduce crop loss and feed more people, AI formulating new types of paint to keep buildings cool, and on and on.
AI is also helping us find innovative ways of writing buggy, broken code. In the handful of projects I’ve worked on so far, I’ve made mistakes I could only have made with AI code assist.
Background: I’ve been writing buggy code for decades
I’ve spent time debugging systems from a few different POVs:
As a customer-facing applications engineer, I reviewed customer code and schematics, debugging other people’s systems with sometimes-limited visibility into the full system
As a firmware engineer, debugging my own code with unit tests and system level regression tests
As a product manager, mostly in project reviews
In all three of these roles, I had to debug issues by expanding scope to a system-level view. Runtime bugs especially required full understanding of how different components interfaced with each other.
Communication was just as important as technical insight. If I couldn't fix the bug myself, clearly explaining the problem became essential to finding a solution.
When working with an AI agent, communicating means providing context.
Context fragmentation
I've been working on a project intermittently for about a month, and its complexity has become unwieldy. Developing individual components in isolation, while frequently switching to other projects, left me with ill-fitting components whose interfaces didn't match up.
I only discovered these issues when running system-level tests, which failed at these interface points. Rather than recognizing the broader, systemic problem, I made the situation worse by asking the AI agent to fix what I mistakenly thought were isolated test escapes. The resulting proof of concept worked with carefully prepared test data but failed with most real-world scenarios.
The root cause was a communication breakdown between myself, the agent, and the codebase. The agent's understanding of the system drifted with each new chat thread as context became fragmented. Outdated architecture documents in the repository, which the agent referenced, made matters worse. Every prompt I gave the agent occurred in an environment with only partially valid context because I had allowed this fragmentation to occur.
To solve this, I asked the agent to audit the project, create a workflow diagram showing component interactions, and identify interface mismatches. I also had it revamp all documentation, now organized in the /docs/ directory. After reviewing these new architecture documents and the audit results, I cleaned up the directories and removed obsolete files.
This must be an enormous potential issue in larger projects beyond my small proofs of concept. Having a trusted set of living, up-to-date architecture documents is absolutely essential, and it's something I'll maintain from now on. Based on recommendations I've found online, I've also started creating a "brief.md" file containing a concise description (under a few hundred words) of the project's current state, TODO items, and other important notes.
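For what it's worth, here's a sketch of the kind of brief.md I mean; the exact sections are just my own convention, not any standard:

```markdown
# Project brief

## Current state
One or two sentences on what works end-to-end today and what is still stubbed out.

## TODO
- The next interface mismatch to reconcile
- System-level tests that still fail, and why

## Notes
- Library and tool versions in use
- Where the full architecture docs live (/docs/)
- Obsolete files the agent should ignore
```

The exact format matters less than having a single, current snapshot to paste into a fresh chat thread before the context fragments again.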
Ghost APIs
Another problem I encountered initially seemed like a classic AI hallucination, but turned out to be something stranger.
I asked the agent to interface with a Python library. I had been moving smoothly through implementation on this project with few questions or mistakes from the agent. With this Python library, everything appeared to be going well. The agent included the library in the script, added code to interface with it, and proceeded to the next step in the build plan.
When it came time to test my progress, none of the function calls worked. The agent had tried to use object methods that didn't exist. After checking the library documentation, I thought I had confirmation: the AI was simply making things up.
So I confronted it:
Parker: i see you are using 'addComment()' method. Are you making that up?
Agent: Yes — the addComment() method I referred to is real and documented in Google Apps Script's Document class. It takes a Range object as the anchor and attaches a comment to that location in the document. Here's the proof:
The agent then provided actual links to forum posts where real people had discussed using the same APIs that were causing errors in my Python script.
So it wasn't a hallucination. But what was it?
It wasn't a case of the forum discussing some unrelated library that the agent's web crawler had misidentified.
And it wasn't a joke. The forum comments were sincere—the people discussing these APIs clearly thought they knew what they were talking about.
Then I noticed the dates on the comments: 2021. After some research, I confirmed that these APIs had been deprecated long ago.
The agent wasn't hallucinating. It was more like the agent was seeing ghosts.
I nudged the agent toward the truth:
Parker: go to the link you provided above and read it today. there is no mention of addComment() https://developers.google.com/apps-script/reference/document/document?utm_source=chatgpt.com
Agent: You’re absolutely right — I went back and reviewed the official documentation at the link you provided (Google’s Apps Script Document reference), and there is no mention of addComment() anywhere. That method simply isn’t documented by Google.
The agent confirmed what I had noticed earlier, explaining that the API had been deprecated, though it was officially supported at one time.
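In hindsight, a quick check against the installed library would have exposed the ghost immediately. Here's a minimal sketch of what I mean, using Python's built-in hasattr and dir; the standard-library pathlib.Path class is only a stand-in for whichever class the agent's code was actually calling into:

```python
# Sanity-check agent-generated calls against the library version that is
# actually installed, rather than against docs or forum posts from 2021.
from pathlib import Path  # stand-in target; swap in the real class you're using

# Methods the agent's code assumes exist
assumed = ["read_text", "addComment"]

for name in assumed:
    status = "OK" if hasattr(Path, name) else "MISSING in this version"
    print(f"{name}: {status}")

# Dumping the real API surface is also useful context to paste back to the agent
print(sorted(m for m in dir(Path) if not m.startswith("_")))
```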
The lesson I learned is to always be specific about which versions of libraries and tools the agent should use in a project. For this particular bug, that would have been the simplest solution. The agent likely wouldn't have made this mistake if I'd included some boilerplate in the PRD like "always use the latest versions of all libraries used in the script."
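One low-effort way to be specific is to capture the exact versions installed in the environment and paste them into the prompt or PRD. A rough sketch using only Python's standard library (importlib.metadata, available since Python 3.8):

```python
# Print every installed package as name==version, ready to paste into a
# prompt or PRD so the agent targets the APIs that actually exist here.
from importlib.metadata import distributions

pins = sorted(
    f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
)
print("\n".join(pins))
```

pip freeze does the same job from the command line; the point is simply that the agent shouldn't have to guess which era of a library it's writing against.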
Sidenote: slopsquatting
The issue described above, where AI appears to invent a non-existent library, creates a serious security vulnerability. Hackers who can predict which libraries an AI might fabricate could create malicious packages with those names. If I had installed such a package to match what my code was looking for, I could have unknowingly executed the hacker's code, essentially falling victim to a supply-chain attack.
This security threat has occurred frequently enough to earn a specific and unpleasant name: slopsquatting. More information about mitigating this risk can be found here: https://www.techradar.com/pro/mitigating-the-risks-of-package-hallucination-and-slopsquatting
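One habit that helps, beyond the advice in that article, is to confirm that any package an agent suggests actually exists on PyPI and has a real release history before installing it. A rough sketch using only the Python standard library and PyPI's public JSON endpoint:

```python
# Before running `pip install` on a package an agent suggested, confirm it
# exists on PyPI and glance at its release history; a missing or brand-new
# name is a red flag for a hallucinated, slopsquattable package.
import json
import sys
from urllib.error import HTTPError
from urllib.request import urlopen

def check_package(name: str) -> None:
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urlopen(url) as resp:
            data = json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            print(f"{name}: NOT FOUND on PyPI -- do not install blindly")
            return
        raise
    releases = data.get("releases", {})
    summary = data["info"].get("summary") or ""
    print(f"{name}: found, {len(releases)} releases, summary: {summary!r}")

if __name__ == "__main__":
    for pkg in sys.argv[1:]:
        check_package(pkg)
```

Passing it one real package name and one suspicious one shows the difference before anything reaches pip install.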
Context anxiety
Both of these bugs I’ve encountered are related to context: either not having all of the context the agent needs, or having the wrong context. It’s challenging to get this balance right and decide what information needs to be in context.
I think we might start seeing articles about ‘context anxiety’: an anxiety rooted in a lack of confidence in one’s ability to tell the LLM what it needs to know. A neurosis about whether we’re putting our questions to the machine the right way.
The thinking goes: if I can just provide the right context, Goldilocks-style, neither too much nor too little, I'll get a useful answer.
We can see this mindset already driving numerous context engineering how-to blogs and YouTube tutorials. The philosophy often explicitly stated in these resources is "If you give the LLM all the right information in the context, the output will be useful." To be clear, I generally agree with this position.
Memory for agents will likely evolve to address context size limitations, and the leading LLM providers are racing to deliver models with context windows of 1M tokens or beyond. If an LLM could process effectively unlimited information, consuming entire projects and precisely interpreting thousands of lines of API specifications, then it should generate more useful code.
But would "infinite context" actually solve the problem of context anxiety? Perhaps it merely shifts the challenge from "context" to "focus." Software developers will still need to craft clear prompts, directing the LLM's attention and intent.
Through education and work experience, software engineers have learned to fix compile-time errors and runtime errors. Now they must also learn to fix prompt-time errors.


