Prompt Mass: Characterizing What Makes a Good Prompt When Writing with AI
Everyone has access to the same AI writing models, yet the quality of AI-assisted writing ranges from sharp to slop. This post argues that the gap comes down to how much of the final text is determined by the model's training versus your prompt. When a prompt is light—little constraint, lots of room to drift—the model leans on its defaults and produces generic content. When a prompt is heavy—tight constraints, not much wiggle room—the output is pulled toward your intent. I call this prompt mass: a way to think about what makes a good prompt when using an LLM for writing.
Slop and the Gap in AI Writing Quality
When we prompt a model to write, the result is determined by two sources:
- The model’s parameters (what it learned during training)
- The prompt (the information and constraints we provide)
If our prompt does not place many conditions on the output, then most of what determines the text will come from the model's training. That training is shared across users, and different models often have similar defaults.[1] So when the prompt has little influence, the output tends to converge toward that shared default, which is generic by definition.
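One loose way to picture this (an informal framing, not a formal result): the model samples from a conditional distribution p(y | x, θ), where θ is what it learned in training and x is the prompt. When x pins down few decisions, samples stay close to what θ alone prefers, i.e. the same high-probability continuations for everyone; the more decisions x fixes, the further the output can move from that shared default.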
Even if we had a model with the writing ability of a Nobel laureate, a light prompt would still push it toward broadly similar prose for everyone. The problem is not only model capability. It is also how much direction we give.
But what makes one prompt better than another? That is what I will try to pin down here.
A Demonstration
Let’s say I ask an AI (Gemini 3 Pro) to write a paragraph explaining that the heavier the conditioning you include in your prompt, the better the result. What would that look like?
Each option below asks for the same thing (two sentences), but I progressively add different kinds of constraints.
Option 1: Bare request
Option 2: Adding the core idea
Option 3: Adding purpose and mechanism
Option 4: Adding context and metaphor priming
[The first part of this post pasted here]
I believe that language affects how we think, so in this blog post I want to convince the community that we should come up with better terms to define the spectrum of information a person provides in the instruction when writing with AI. I also aim to propose a metric, which I hope people will use, that quantifies this with the concept of instruction information—a measure of how much the prompt conditions the model's output.
In the second part of the post, I want to demonstrate in a convincing manner that such prompts lead to better outputs. For that, I need 2 sentences that convince the reader that prompts with more conditioning over the output are better...
Prime the reader for a later "mass/weight" concept by using a subtle physics metaphor (weight/scale/gravity) for how constraints pull the output toward the author's intent—but don't name the concept yet.
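If you want to reproduce this kind of comparison, here is a minimal sketch using the google-genai Python SDK. The model id is a placeholder, and the option strings are shortened stand-ins for the prompts above, not the exact text.

```python
# Minimal sketch: send each prompt variant to the same model and compare.
# Assumes the google-genai SDK and a GEMINI_API_KEY in the environment.
from google import genai

client = genai.Client()

# Shortened stand-ins for the four options above (not the exact prompts).
options = {
    "1_bare": "Write two sentences about prompting AI for writing.",
    "2_core_idea": "Write two sentences: constraints in a prompt act as "
                   "conditioning on the output.",
    "3_purpose": "Write two sentences on how output balances the model's "
                 "parameters against the prompt, for a blog post.",
    "4_primed": "As above, plus: use a subtle weight/scale/gravity metaphor "
                "without naming it.",
}

for name, prompt in options.items():
    response = client.models.generate_content(
        model="gemini-3-pro",  # placeholder id; use whatever model you test
        contents=prompt,
    )
    print(f"--- {name} ---\n{response.text}\n")
```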
Analysis: How Input Shapes Output
The basic request is the same in all four options, yet the outcomes are noticeably different.
Option 1 is so light that the model invents its own answer: “few-shot examples,” “tone, audience, and format”—none of which I asked for.
Option 2 says “constraints… as conditioning,” and the output echoes that directly: “explicit constraints that serve as conditioning mechanisms.” But the phrasing (“high-utility result,” “sculpt the raw output”) is still generic AI-speak.
Option 3 introduces “a balance between the model’s parameters and the prompt,” and the output picks it up almost verbatim: “a negotiation between the model’s static parameters and your specific prompt.” It even extends the metaphor on its own with “tip the scales”—but I didn’t ask for that phrasing.
Option 4 explicitly primes “weight/scale/gravity,” and the output now threads that vocabulary throughout—“a delicate balance on a scale,” “gravity,” “weight,” “tip that scale,” “drift into the orbit”—setting up the reader for what comes next.
Different levels of effort went into the instructions, and that affected the output. But “effort” is vague. How should we think about this spectrum more precisely?
Defining Prompt Mass
I suggest prompt mass as a useful term for this spectrum.
In the same way that an object with a lot of mass pulls a balance to its side, a prompt with more constraints has more control over what the output looks like. In the same way that big objects often have more mass, longer prompts often have more mass. In the same way that a small object can sometimes have more mass than a big one, a short prompt can sometimes have more mass than a long one.
A light prompt gives the model latitude to drift toward its defaults. A heavy prompt constrains that space, pulling the output toward your intent.
Length is only a proxy. A prompt gets heavier when it makes more of the important decisions in advance. A prompt stays light when it leaves those decisions to the model.
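I don't have a precise formula for mass. But to make "decisions made in advance" concrete, here is a toy heuristic that counts explicit constraints rather than tokens. Every rule in it is invented for illustration; it is not a validated metric.

```python
# Toy heuristic: estimate "prompt mass" by counting decisions the prompt
# makes in advance, rather than raw length. Every rule here is
# illustrative, not a validated metric.
import re

CONSTRAINT_MARKERS = [
    r"\baudience\b", r"\btone\b", r"\bformat\b", r"\bstyle\b",
    r"\bmust\b", r"\bavoid\b", r"\bexactly\b", r"\bdon'?t\b",
    r"\b\d+\s+(?:sentences?|words?|paragraphs?)\b",
]

def prompt_mass(prompt: str) -> int:
    """Rough count of explicit decisions a prompt makes in advance."""
    text = prompt.lower()
    score = sum(len(re.findall(m, text)) for m in CONSTRAINT_MARKERS)
    score += text.count('"') // 2  # quoted phrasing the output must keep
    score += len(re.findall(r"^\s*[-*] ", text, flags=re.MULTILINE))  # bulleted requirements
    return score

print(prompt_mass("Write something about prompting."))         # light: 0
print(prompt_mass('Audience: 10-year-olds. Exactly 2 sentences. '
                  'Avoid jargon and use a "scale" metaphor.'))  # heavier: 5
```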
Why This Framing Matters
With this concept, a few things become easier to see:
- Why we get slop. When someone uses a light prompt, the model defaults to generic patterns. Through this lens, slop is not an inherent property of AI writing. It is what you get when the prompt does not carry enough mass.
- Why autocomplete feels bland. When you accept an autocomplete suggestion, your only input is the text that came before. You add no constraints on what the next text should do. You get the average continuation.
- “I used AI” stops being a binary. “Used AI” hides the important question: how much of the final text was determined by the model’s defaults versus your own constraints and judgment.
What the Guidelines Have in Common
There’s no shortage of advice on how to prompt well: write a draft first, specify your audience, use frameworks like CO-STAR, iterate. But these tips can feel disconnected. If prompt mass is a useful concept, it should tie them together.
Start with your own content
Many guidelines emphasize that you shouldn’t begin with a blank page. User studies found that people who injected their own content produced significantly higher-quality outputs than those who asked the AI to write from scratch.[2]
Through the lens of prompt mass, a draft adds weight because it imports your actual claims and structure.
Specify your audience, tone, and purpose
A draft carries your arguments but typically lacks the implicit context: who the piece is for, how it should sound, what it is meant to do. Frameworks like CO-STAR (Context, Objective, Style, Tone, Audience, Response) remind you to make these elements explicit.[3]
Even just the audience matters. “Audience: PhD researchers” invites caveats. “Audience: 10-year-olds” forces simplification. Same draft, same instruction, different outputs.
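As a trivial sketch (the draft and field names are made up for illustration), this is all it takes to turn one draft into two differently weighted prompts:

```python
# Same draft, same instruction, different audience line. The audience
# field alone changes where the output can land. All strings are
# illustrative.
DRAFT = "Heavier prompts constrain the model more, so outputs drift less."

def build_prompt(audience: str) -> str:
    return (
        f"Audience: {audience}\n"
        "Task: rewrite the draft below in two sentences.\n"
        f"Draft: {DRAFT}"
    )

print(build_prompt("PhD researchers"))  # invites caveats and precision
print(build_prompt("10-year-olds"))     # forces simple words and analogies
```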
Refine over multiple rounds
Rarely does a single prompt contain everything. This is what Mollick calls “Cyborg” behavior: moving back and forth between human judgment and AI generation.[4]
Each round of feedback adds mass. “Too formal” constrains the space. “Focus on the cost argument” constrains it further. The weight can build up across a conversation even if no single prompt is heavy on its own.
Put differently, these are just ways of moving decisions from the model back into the input. Once you see that, a lot of prompting advice starts to look the same.
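Seen as code, the Cyborg pattern is just constraints accumulating across rounds. A minimal sketch, again with the google-genai SDK and a placeholder model id; the feedback strings are the examples above:

```python
# Minimal sketch of mass accumulating over a conversation: each round of
# feedback is appended to the input, narrowing the output space even
# though no single instruction is heavy on its own.
from google import genai

client = genai.Client()
draft = "Heavier prompts constrain the model more, so outputs drift less."
constraints = [f"Rewrite this draft in two sentences: {draft}"]

for feedback in ["Too formal. Make it conversational.",
                 "Focus on the cost argument."]:
    constraints.append(feedback)
    response = client.models.generate_content(
        model="gemini-3-pro",  # placeholder id
        contents="\n".join(constraints),
    )
    print(f"After '{feedback}':\n{response.text}\n")
```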
Closing
Prompt mass won’t tell you exactly what to write in your prompts—that depends on your specific goals and context. But it might serve as a more abstract guideline, one that captures at a conceptual level what you’re trying to do when prompting an LLM for writing.
1. Wu, Black & Chandrasekaran (2024). “Generative Monoculture in Large Language Models.” Explores how LLMs narrow output diversity across users.
2. Lee et al. (2024). “Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild.”
3. Teo, S. (2023). “How I Won Singapore’s GPT-4 Prompt Engineering Competition.” Towards Data Science.
4. Mollick, E. (2024). “I, Cyborg: Using Co-Intelligence.” One Useful Thing.