Fake it 'til you make it
Using LLMs for spoofed data in development and testing
When you’re a small team, or maybe just one developer, project planning boils down to:
What component of the project should I work on first?
When should I move away from that component and work on another bit?
I’ve faced this challenge as a solo firmware engineer balancing the development of different code components and test infrastructure. To test one component, I need a base level of functionality in other components. Creating stub functions and fake data to establish a clean data flow has traditionally been my solution.
Recently, I’ve found another approach: generating test inputs and spoofed data using an LLM. This method has surprising benefits alongside significant drawbacks. I believe that LLMs’ tendency to “tell us what we want to hear,” which is a common and valid criticism, can actually be harnessed for good, at least in this specific use case.
Problem: I needed human text input for testing and development
I’ve been working on a project that’s centered around a text editor. The project’s LLM-driven agent makes observations on text in the editor and interacts with the user. Testing for this project has been a challenge because I needed text in the editor window that was relatively long (1K-10K words) and somewhat coherent. There are lots of ways of running tests with this system, but I had issues with all of them:
Use dummy text, like the lorem ipsum placeholder text used in website or graphic design tools. This wasn’t viable because I needed to test system responses based on realistic English language inputs that reflected real-world content.
Write text input manually and use that for all testing. This approach wouldn’t work because I needed to test the system with a variety of writing styles and subject matter.
Scrape text from web sources for internal testing. I avoided this approach because finding different text sources with the specific writing styles I needed was tedious. I felt constrained by available content and worried I might miss critical test cases.
Give up on the 1K-10K word requirement and just test with snippets. This had potential, since writing a paragraph is definitely easier than writing an entire story or article. I actually started with snippets as test input, until I realized that the tests I wanted to run required inputs longer than I had the bandwidth to write by hand.
Not seeing a path forward with any of these options, I settled on using an LLM to generate the text inputs based on parameters and guardrails. This system worked really well and led to a broader set of testing and feature development that I hope to write about in a future post.
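Here’s a rough sketch of the idea, assuming the OpenAI Python SDK. The model name, prompt wording, and parameter choices are illustrative, not my project’s actual prompts:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_test_text(style: str, topic: str, target_words: int) -> str:
    """Generate a coherent block of English prose for editor tests."""
    prompt = (
        f"Write roughly {target_words} words of {style} prose about {topic}. "
        "Plain paragraphs only: no headings, no lists, no markdown."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works; this is an example
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# e.g. a ~2,000-word personal essay to load into the editor under test
sample = generate_test_text("reflective personal essay", "learning to sail", 2000)
```

The useful part is that style, topic, and length become parameters you can sweep across test cases instead of documents you have to write yourself.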
Can we call this fuzz testing with AI?
While I’m stretching the definition a bit, using LLMs to spoof data streams resembles fuzz testing in security work. Fuzz testing bombards a system with high volumes of data or specially crafted inputs designed to probe edge cases. These tests can reveal bugs when buffers overflow or reads and writes stray out of bounds. Such bugs may become security vulnerabilities, potentially exposing data or allowing attackers to force the system into undefined states.
In my case, I wasn’t hunting for security flaws. I simply wanted to stress test my project with diverse, valid inputs. Creating this test data manually would have taken days rather than minutes.
Other cases where LLM-spoofed data might help
While my example comes from a specific use case, these principles apply broadly. My system generated paragraphs of text, but LLMs could just as easily format data as JSON or structure it in any way needed for testing. Developers could use this approach to:
Generate data payloads for stubbed API returns (see the sketch after this list)
Populate test databases with realistic data
Simulate stateful responses in sequential test scenarios
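For the first of these, here’s a hedged sketch of generating a stubbed API payload, again assuming the OpenAI Python SDK. The schema, key names, and model choice are invented for illustration:

```python
import json

from openai import OpenAI

client = OpenAI()

SCHEMA_HINT = (
    'Return a JSON object with a single key "users" whose value is an array '
    'of 5 records, each with keys: "id" (int), "name" (string), '
    '"email" (string), and "signup_date" (ISO 8601 string).'
)

def fake_users_payload() -> dict:
    """Ask the model for realistic records matching the stub's schema."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": SCHEMA_HINT}],
        response_format={"type": "json_object"},  # constrain output to valid JSON
    )
    return json.loads(response.choices[0].message.content)
```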
The last item, simulating stateful responses, is where LLMs can provide unique value. You could develop an agent with specific prompts and memory that creates a responsive persona, one that dynamically interacts with a system under test.
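A minimal sketch of that persona idea, with the conversation history doubling as the agent’s memory. The class, prompts, and model name are all illustrative:

```python
from openai import OpenAI

client = OpenAI()

class TestPersona:
    """An LLM-backed persona that answers the system under test in character."""

    def __init__(self, persona_prompt: str):
        # The running message history is the persona's memory between turns.
        self.messages = [{"role": "system", "content": persona_prompt}]

    def respond(self, stimulus: str) -> str:
        self.messages.append({"role": "user", "content": stimulus})
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative
            messages=self.messages,
        ).choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# e.g. a persona that stays consistent across a scripted test sequence
persona = TestPersona(
    "You are an impatient novelist testing a text editor. "
    "Answer tersely and stay in character."
)
```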
Limitations on the concept
The main limitation of this testing approach is that LLM-generated content lacks the determinism of traditional test inputs. Even with strict system prompts and robust guardrails, the output may still require human oversight or post-processing to validate its suitability.
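Cheap, deterministic checks layered on top can catch the obvious failures before a human ever looks at the output. A sketch, with arbitrary example thresholds:

```python
def validate_generated_text(text: str, min_words: int = 1_000,
                            max_words: int = 10_000) -> str:
    """Reject generated text that is the wrong length or shape for a test."""
    words = text.split()
    if not min_words <= len(words) <= max_words:
        raise ValueError(f"generated text is {len(words)} words, "
                         f"expected {min_words}-{max_words}")
    if text.lstrip().startswith(("#", "-", "*")):
        raise ValueError("output looks like markdown; expected plain prose")
    return text
```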
Cost presents another challenge when using LLMs for data generation, though several strategies can mitigate it. Selecting a cost-effective model that still delivers adequate quality and coherence is crucial. Additionally, running a local LLM within your development environment eliminates per-token API costs entirely.
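Because many local servers expose an OpenAI-compatible API, switching can be a small configuration change. A sketch that assumes an Ollama server running locally; the model name is whatever you have pulled:

```python
from openai import OpenAI

local_client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # placeholder; local servers typically ignore the key
)

response = local_client.chat.completions.create(
    model="llama3",  # illustrative; use any locally available model
    messages=[{"role": "user",
               "content": "Write 200 words of plain prose about tide pools."}],
)
print(response.choices[0].message.content)
```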
Data spoofing itself isn’t new. Software and embedded engineers have long used simulated inputs during system development. What LLMs offer here is the ability to create data with controlled randomness that can effectively stress-test your system, challenge assumptions, and reveal hidden bugs.
