Almost everything LLMs produce is mediocre. Not wrong, exactly. Just not good enough to use. You prompt. You get a response. You throw most of it away. You rephrase, re-prompt, cherry-pick a sentence or two, rewrite the rest. It’s less an oracle and more a conversation. You’re selecting fragments, not accepting output. Occasionally something genuinely good comes out. But if you’re honest about the hit rate on “output I would actually use as-is,” it’s disturbingly low. Something like 1% of all LLM tokens generated today create real economic value. The other 99% is slop, or answers to the wrong question.
I think this is the most underappreciated observation you can make about the economics of AI right now.
The entire industry prices LLMs by the token. Cost per input token, cost per output token, tokens per second. Infrastructure is optimized for throughput. But if only 1% of output is usable, then the effective cost per useful token is 100x the sticker price. A model that costs 10x more per token but produces usable output half the time isn’t expensive. Per useful token, it’s five times cheaper. This is why people pay a premium for the best available model even when cheap alternatives exist. They’ve done the math intuitively, even if they haven’t done it on paper. The binding constraint on AI’s economic value is not compute. It’s output quality.
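To make that arithmetic concrete, here’s a minimal sketch. The prices and hit rates are the illustrative numbers from this paragraph, not real market data.

```python
# Effective cost per useful token: sticker price divided by hit rate.
# All numbers are illustrative assumptions, not real pricing.

def effective_cost(price_per_token: float, hit_rate: float) -> float:
    """Cost per token you actually keep, given the fraction that's usable."""
    return price_per_token / hit_rate

cheap = effective_cost(price_per_token=1.0, hit_rate=0.01)     # 1% usable
premium = effective_cost(price_per_token=10.0, hit_rate=0.50)  # 50% usable

print(f"cheap model:   {cheap:.0f} per useful token")    # 100
print(f"premium model: {premium:.0f} per useful token")  # 20, i.e. 5x cheaper
```

The 10x sticker premium vanishes the moment you divide by the hit rate.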
The natural instinct is to think: if 99% is waste, just redirect those tokens toward useful work. Scale the good stuff. But this doesn’t work for a simple reason. The tokens are wasted because the use case is low-value, the prompt is vague, or the output doesn’t connect to a real decision. More compute doesn’t fix any of those things. The tempting workaround is to brute-force it: generate more, filter more. But this just moves the bottleneck from generation to evaluation. You still need to identify which 1% is good, and that judgment doesn’t scale with compute. The filter is the hard problem. The curl project (one of the most widely used pieces of open-source software on earth) now reports that 20% of its security vulnerability submissions are AI-generated fabrications. They look legitimate. They’re well-formatted. And they’re completely made up. Disproving a single fake report takes 4.5 hours of human expert time. Cheap generation weaponized against expensive human evaluation. That’s the slop ratio in action.
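The asymmetry is easy to put numbers on. In the sketch below, only the 4.5-hour figure comes from the curl example; the inference cost and the expert’s hourly rate are assumptions for illustration.

```python
# Generation vs. evaluation cost for one fabricated vulnerability report.
# Only HOURS_TO_DISPROVE comes from the curl example above; the other
# two numbers are assumed for illustration.

GENERATION_COST_USD = 0.01  # assumed: roughly a cent of inference
HOURS_TO_DISPROVE = 4.5     # from the curl figure above
EXPERT_RATE_USD = 150.0     # assumed: loaded hourly cost of a security expert

evaluation_cost = HOURS_TO_DISPROVE * EXPERT_RATE_USD  # $675.00

print(f"attacker pays ~${GENERATION_COST_USD:.2f} per report")
print(f"defender pays ${evaluation_cost:.2f} to disprove it")
print(f"asymmetry: {evaluation_cost / GENERATION_COST_USD:,.0f}x")  # 67,500x
```

Under those assumptions, every dollar an attacker spends generating slop burns tens of thousands of dollars of defender time.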
And here’s the part nobody wants to hear: the slop ratio is as much on the user as on the model. Garbage in, garbage out is the oldest law in computing, and it applies to LLMs with a vengeance. Most people type vague, half-formed prompts and then blame the model when they get vague, half-formed answers. They give it no context, no constraints, no clear picture of what good looks like, and then wonder why the output is generic. The model is a mirror. It reflects the quality of thought you put into it. A precise question with real constraints and enough context will produce dramatically better output from the exact same model, at the exact same cost, on the exact same hardware. This is why an entire ecosystem has sprung up around compensating for bad input (system prompts, skill files, prompt libraries, custom instructions). The companies building the models have to ship a cheat sheet alongside them explaining how to talk to them. That should tell you where the real bottleneck is. The fix has to come from both sides. Better models, yes, but also better inputs. Better tokens, in both directions.
This 1% rule isn’t unique to LLMs. It’s true of people, too.
Think about your own work week. How much of it actually matters? Most of it is noise. Emails, meetings, context-switching, busywork, drafts that get thrown away, problems that didn’t need solving. Then maybe for a few hours (sometimes a few minutes) you do the thing that actually changes something. You make the decision. You see the connection. You say no to the wrong project. One conversation redirects the entire trajectory of a quarter. The person who works 80 hours a week doesn’t create 2x the high-value moments of the person who works 40. They produce roughly the same number, buried in twice as much noise. The people who seem disproportionately effective aren’t working harder. They’ve gotten better at recognizing which moments are the 1% and structuring their lives around them. That’s what experience is. That’s what taste is. A learned filter for what matters.
When a human produces a mediocre first draft over three days, we call it “work.” When an LLM does it in three seconds, we call it slop. But it’s the same thing. The slop ratio isn’t a flaw in the technology. It might be a fundamental property of how value gets created: large amounts of exploration producing small amounts of signal. The LLM might just be compressing what humans already do.
Now suppose the models get good, really good. Usable output goes from 1% to 50 or 80%. Something counterintuitive happens. The moment the answer becomes reliable, it also becomes cheap. If every model can produce the right answer, the answer is a commodity. And when the answer is a commodity, the value migrates to the question.
But it’s not just any question. When you can build anything, the most important thing is choosing what to build. And when you can build it any way you want, building it the right way comes down to taste. The scarce person isn’t the one who can execute; the model handles that. The scarce person is the systems thinker. The architect. The one with an opinionated view of what should exist and why. Someone who looks at a problem and sees not just a solution but the right solution, sometimes for a world that doesn’t exist yet but should. This has always been the most valuable skill in engineering, in product, in design. But it was muddied by the fact that execution was hard, so we conflated having good taste with being able to ship. AI strips that conflation away. Taste stands alone.

This matters because of what happens when taste is absent. LLMs are trained on the sum of human output, so they naturally regress to the statistical average of what’s been done before. Ask one to design a SaaS product and you’ll get tiered pricing and a React frontend. Ask for a go-to-market strategy and you’ll get outbound SDRs. Every company using the same model gets the same probabilistically average answer. The bear case for AI isn’t that it fails. It’s that it works, and everyone uses it to build the same thing. Differentiation collapses and competition becomes purely about price. This is the Fermi paradox of AI: if building is nearly free, where is the explosion of amazing products? The answer is that building was never the bottleneck. Knowing what to build was.
You can teach someone to use an AI tool in an afternoon. You cannot teach them, in an afternoon, to know what’s worth building. That judgment comes from domain expertise, experience, curiosity. It’s the accumulated scar tissue of years of building things and getting it wrong.
Right now, this gap is hidden. The output is unreliable, so asking the wrong question and asking the right question often look the same. You get mediocre output either way and fix it by hand. The slop masks the difference between the people who know what to ask and the people who don’t. A 100x model removes the mask. With perfect output, a mediocre question gets a perfect answer to the wrong question.
If this plays out, the primary axis of economic inequality in knowledge work shifts. It moves from “can you do the work” to “do you know what work is worth doing.” Everyone will have access to the 100x model. Everyone will be able to generate code, analysis, writing, strategy. The execution layer compresses to near-zero cost. What remains is the ability to see what others don’t, ask what others won’t, and frame a question so precisely that the answer becomes obvious. Socrates built an entire philosophy around this idea. But it takes on new urgency when the answering machine becomes essentially perfect.
Most of the current AI tooling stack (prompt engineering, chain-of-thought, retrieval pipelines, RL loops, etc.) exists to compensate for bad input and output. These are workarounds. A quality breakthrough makes them irrelevant overnight. And domain expertise becomes more valuable, not less. AI doesn’t replace the expert. It replaces the generic worker who executes known tasks. The expert, the person who knows which question to ask, becomes the bottleneck. Their leverage goes up because they now have a perfect execution engine at their disposal.
If you’re building AI products, the winning strategy is not more inference at lower cost. It’s reliability. The company that moves usable output from 1% to 10% creates more value than the company that makes tokens 10x cheaper.
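Here’s the arithmetic behind that claim, in the same back-of-the-envelope terms as before. Once a human has to review every candidate, review cost dominates the total, so a 10x price cut barely moves it while a 10x reliability gain cuts it tenfold. The specific costs below are assumptions; the structure is the point.

```python
# Expected cost to obtain ONE usable result when every attempt needs
# human review: on average 1 / hit_rate attempts are required.
# All numbers are illustrative assumptions.

def cost_per_useful_output(token_cost: float, review_cost: float,
                           hit_rate: float) -> float:
    return (token_cost + review_cost) / hit_rate

baseline = cost_per_useful_output(1.0, 10.0, 0.01)        # 1100
cheaper_tokens = cost_per_useful_output(0.1, 10.0, 0.01)  # 1010 -- barely moves
more_reliable = cost_per_useful_output(1.0, 10.0, 0.10)   # 110 -- 10x better

print(f"baseline:         {baseline:.0f}")
print(f"10x cheaper:      {cheaper_tokens:.0f}")
print(f"10x more usable:  {more_reliable:.0f}")
```

This is what we’re focused on at Prescene.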