Authors Are Not a Field.
Isolate a concern into its own schema when it has identity, a lifecycle, or gets reused — and leave it inline when it doesn't.
- An author or a category is an entity — identity, lifecycle, reuse — not a field. Model it as a reference.
- The rule: isolate a concern when it has identity, a lifecycle, or gets reused; otherwise leave it inline.
- The bill is real — a join, an authoring click, possible orphans, and the temptation to over-model. Pay it on purpose.
- The typed, referenced version is also the one a system, or an agent, can resolve instead of guess at.
The fastest way to add an author to a blog post is to type their name into a field. Add authorName, authorBio, and authorAvatar to the post type, ship it, move on. It is also the decision you will be unwinding eighteen months later, across every document that field ever touched.
Most content-modeling advice stops at "model your content first" and leaves it there. The harder question, the one most CMS schema-design writing skips, is where the seams go: which concerns deserve to be their own thing, and which are just fields. This post is about one seam in particular, the one teams get wrong most often, because the wrong choice is the fast one.
Same author, two shapes. The first treats the author as a property of the post. The second treats the post as something that points at an author. The difference looks cosmetic on day one. It decides everything that happens on day five hundred.
An author exists independently of any one post. They get renamed when they marry. They want a profile page at /authors/maciej. They write under a second content type next quarter, and a third the quarter after. None of that is a fact about a post — it is a fact about a person, and a person has a lifecycle of their own. The moment a concern outlives the thing it is attached to, it has stopped being an attribute.
Here is the seam, in the stack I reach for. The inlined version puts the author's data on the post; the isolated version makes the author its own document and the post a reference to it.[1]
```typescript
import { defineType, defineField } from "sanity";

// The fast path: the author is three fields on the post.
export const post = defineType({
  name: "post",
  type: "document",
  fields: [
    defineField({ name: "title", type: "string" }),
    defineField({ name: "body", type: "array", of: [{ type: "block" }] }),

    // Inlined author — copied onto every post that this person writes.
    defineField({ name: "authorName", type: "string" }),
    defineField({ name: "authorBio", type: "text" }),
    defineField({ name: "authorAvatar", type: "image" }),
  ],
});
```

```typescript
import { defineType, defineField } from "sanity";

// The author is its own document, with its own lifecycle.
export const author = defineType({
  name: "author",
  type: "document",
  fields: [
    defineField({ name: "name", type: "string" }),
    defineField({ name: "bio", type: "text" }),
    defineField({ name: "avatar", type: "image" }),
    defineField({ name: "slug", type: "slug" }), // now it can have a URL
  ],
});

// The post points at the author instead of carrying their data.
export const post = defineType({
  name: "post",
  type: "document",
  fields: [
    defineField({ name: "title", type: "string" }),
    defineField({ name: "body", type: "array", of: [{ type: "block" }] }),
    defineField({
      name: "author",
      type: "reference",
      to: [{ type: "author" }],
    }),
  ],
});
```

Taxonomies are the same shape, one level up.
Categories and tags fail the same way authors do, just more quietly. Inlined as a free-text array, a taxonomy drifts into near-duplicates within a month: "AI", "ai", and "Artificial Intelligence" all become separate buckets, and the archive page that filters on one of them silently misses the other two. Nobody decided that; the schema allowed it, so it happened.
```typescript
import { defineType, defineField } from "sanity";

// Inline: every post invents its own spelling of every tag.
// (This field would sit in the post type's `fields` array.)
defineField({ name: "tags", type: "array", of: [{ type: "string" }] });

// Isolated: a category is a document; posts reference the canonical one.
export const category = defineType({
  name: "category",
  type: "document",
  fields: [
    defineField({ name: "title", type: "string" }),
    defineField({ name: "slug", type: "slug" }),
  ],
});

// On the post type, `categories` replaces `tags`: an array of references.
defineField({
  name: "categories",
  type: "array",
  of: [{ type: "reference", to: [{ type: "category" }] }],
});
```

Once the taxonomy is its own module, the reuse payoff arrives for free. The same category document feeds posts, guides, and case studies: define it once, and three content types consume the one canonical list. Rename a category, and every surface that references it updates at once, because there is only one of it. That is the whole argument for isolation, stated in a sentence: one source of truth instead of three hundred copies that have to agree.[2]
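To make the drift and the rename concrete, here is a minimal in-memory sketch. The shapes (`InlinePost`, `RefPost`, `Category`), the ids, and the sample data are all hypothetical and simplified; this is not Sanity's document format or query API, only an illustration of exact-string filtering versus filtering by a stable id.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Category = { _id: string; title: string };
type InlinePost = { title: string; tags: string[] };
type RefPost = { title: string; categoryIds: string[] };

// Inline: the filter is an exact string match, so near-duplicates are missed.
function postsWithTag(posts: InlinePost[], tag: string): string[] {
  return posts
    .filter((p) => p.tags.some((t) => t === tag))
    .map((p) => p.title);
}

// Isolated: the filter is on the category id; the title is looked up separately.
function postsInCategory(posts: RefPost[], categoryId: string): string[] {
  return posts
    .filter((p) => p.categoryIds.some((id) => id === categoryId))
    .map((p) => p.title);
}

const inlinePosts: InlinePost[] = [
  { title: "Post A", tags: ["AI"] },
  { title: "Post B", tags: ["ai"] },
  { title: "Post C", tags: ["Artificial Intelligence"] },
];

const aiCategory: Category = { _id: "category-ai", title: "AI" };
const refPosts: RefPost[] = [
  { title: "Post A", categoryIds: [aiCategory._id] },
  { title: "Post B", categoryIds: [aiCategory._id] },
  { title: "Post C", categoryIds: [aiCategory._id] },
];

// Three spellings, three buckets: the "AI" archive finds one post of three.
const inlineMatches = postsWithTag(inlinePosts, "AI");

// Renaming the category is one edit; the posts still match by id.
aiCategory.title = "Artificial Intelligence";
const refMatches = postsInCategory(refPosts, aiCategory._id);
```

In the inline version the archive for "AI" finds only Post A. In the referenced version the rename touches one object and all three posts still resolve to it.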
What isolation actually buys you.
The references-as-API-contract framing is worth one sentence here and no more, because it has its own post: an isolated entity is a contract every surface depends on, the same way your schema is the product. The concrete version is a table. Every row is a later question that an inlined field turns into a content-wide migration and an isolated entity turns into a single edit.
| Question | Inlined field | Isolated entity |
|---|---|---|
| Rename it | Find-and-replace across every document | Edit one document |
| Reuse across types | Copy the fields, hope they match | Reference the same module |
| Referential integrity | None — strings drift apart | Enforced by the reference |
| Give it a page (/authors/x) | Nothing to hang a route on | The entity is the route |
| Dedupe | Manual reconciliation | There is only one |
| Type the UI consumes | string blob, guessed at | A known, typed shape |
This is also the kind of change you can make after the fact. Extracting an inlined entity from a live content model, turning a copied-everywhere string into a referenced document while editors keep shipping and without freezing the CMS, can be done in-flight. It is more work than getting the seam right on day one, but it is not a rewrite. The point of the table is to help you get it right the first time so you never have to.
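The planning half of such an extraction can be sketched as a pure function: group posts by a normalized author name, emit one author document per group, and emit one patch per post that swaps the string for a reference. Everything here is an assumption for illustration: `planAuthorExtraction`, `authorIdFor`, the id scheme, and the `PostPatch` shape are hypothetical, and the step that actually writes to the content store is deliberately left out. Only the `{ _type: "reference", _ref }` value mirrors how a Sanity reference is stored.

```typescript
// Hypothetical shapes for a migration plan; writing the plan is out of scope.
type InlinedPost = { _id: string; authorName: string };
type AuthorDoc = { _id: string; _type: "author"; name: string };
type PostPatch = {
  postId: string;
  author: { _type: "reference"; _ref: string };
};

// Assumed id scheme: a stable id derived from the normalized name.
// ASCII-only normalization is a simplification; real names need more care.
function authorIdFor(name: string): string {
  const key = name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `author-${key}`;
}

function planAuthorExtraction(posts: InlinedPost[]): {
  authors: AuthorDoc[];
  patches: PostPatch[];
} {
  const authorsById: Record<string, AuthorDoc> = {};
  const patches: PostPatch[] = [];

  for (const post of posts) {
    const name = post.authorName.trim();
    if (name === "") continue; // no inlined author, nothing to extract

    const id = authorIdFor(name);
    if (!(id in authorsById)) {
      // The first spelling seen becomes the canonical name.
      authorsById[id] = { _id: id, _type: "author", name };
    }
    patches.push({ postId: post._id, author: { _type: "reference", _ref: id } });
  }

  const authors = Object.keys(authorsById).map((id) => authorsById[id]);
  return { authors, patches };
}

const plan = planAuthorExtraction([
  { _id: "post-1", authorName: "Maciej" },
  { _id: "post-2", authorName: "  maciej " },
  { _id: "post-3", authorName: "Ada" },
]);
```

Because the plan is plain data, it can be reviewed, diffed, and applied in batches while editors keep working, which is what makes the in-flight version practical.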
And what it costs, so you isolate on purpose.
Isolation is not free, and a post that pretends it is would be selling you something. The bill has three line items. There is the join: the post no longer carries the author's name, so every query that wants it has to resolve the reference, and your render layer has to handle the moment before it resolves. There is the authoring click: editors now pick an author from a list instead of typing one, which is better for consistency and worse for the editor in a hurry who just wants to type a name. And there is the orphan: delete an author who still has published posts, and those posts now point at nothing — referential integrity you have to design for, not assume.
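Two of those line items, the join and the orphan, are easy to sketch. The query string below uses GROQ's `->` operator to follow the reference in a projection; the result types, the `byline` fallback text, and the `findDanglingAuthorRefs` helper are hypothetical names for illustration, and the dangling-reference check only matters in setups where the store does not enforce the reference for you.

```typescript
// The join: a GROQ projection that follows the `author` reference.
const postWithAuthorQuery = `*[_type == "post"]{
  title,
  "author": author->{ name, "slug": slug.current }
}`;

// Assumed result shape: the author is null when the reference is unset
// or does not resolve to a document.
type ResolvedAuthor = { name: string; slug: string };
type PostResult = { title: string; author: ResolvedAuthor | null };

// The render layer has to decide what an unresolved author looks like.
function byline(post: PostResult): string {
  return post.author ? `By ${post.author.name}` : "By an unknown author";
}

// The orphan: list posts whose author reference points at no known author.
type StoredPost = { _id: string; authorRef?: string };

function findDanglingAuthorRefs(
  posts: StoredPost[],
  authorIds: string[],
): string[] {
  return posts
    .filter(
      (p) =>
        p.authorRef !== undefined &&
        !authorIds.some((id) => id === p.authorRef),
    )
    .map((p) => p._id);
}

const dangling = findDanglingAuthorRefs(
  [
    { _id: "post-1", authorRef: "author-maciej" },
    { _id: "post-2", authorRef: "author-deleted" },
    { _id: "post-3" }, // no author set at all; not counted as dangling
  ],
  ["author-maciej"],
);
```

A check like this could run in CI or before a delete, so an orphaned post is a decision rather than a surprise.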
The fourth cost is the one that catches people who have just learned the rule: over-modeling. Not every string is an entity. A one-off CTA label, a hero subtitle, a footnote — these have no identity, no lifecycle, and no reuse, and wrapping each one in its own schema document buys you nothing but clicks. The rule cuts both ways.
Why a typed entity is the version a system can read.
One last reason, and it is the one the AI-ready content conversation usually skips straight to without doing the work underneath. An inlined name is a string an agent can parse but has nothing to resolve it to — no entity to count posts against, to dedupe against, or to link. A referenced, typed author is a stable referent a system can resolve, count, and generate against. Clean isolation is not a separate AI project; it is the quiet prerequisite that makes one possible. The craft is the point, and the agent is just the most demanding consumer of it.
This blog dogfoods half of that today and is honest about the other half. The body of every post you are reading is a typed, discriminated union of content blocks — each block a known shape the renderer switches on, not a freeform blob. But the BlogPost model has no author entity at all yet; the byline lives in presentation, not in the content model. By its own rule, that is the seam I would add next — an author document and a reference — the day this blog has a second author to point at. Identity, lifecycle, reuse: it would finally have all three. Until then, an attribute is the honest shape.
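For readers who have not met the pattern, a discriminated union of content blocks might look like the sketch below. The block kinds (`paragraph`, `code`, `callout`) and their fields are hypothetical, since the blog's real block types are not shown here, and the renderer skips HTML escaping to stay short.

```typescript
// Hypothetical block shapes; each carries a literal `kind` discriminant.
type ParagraphBlock = { kind: "paragraph"; text: string };
type CodeBlock = { kind: "code"; language: string; source: string };
type CalloutBlock = { kind: "callout"; tone: "info" | "warning"; text: string };

type ContentBlock = ParagraphBlock | CodeBlock | CalloutBlock;

// The renderer switches on `kind`; TypeScript narrows the type in each case.
// Note: no HTML escaping here, which a real renderer would need.
function renderBlock(block: ContentBlock): string {
  switch (block.kind) {
    case "paragraph":
      return `<p>${block.text}</p>`;
    case "code":
      return `<pre data-lang="${block.language}"><code>${block.source}</code></pre>`;
    case "callout":
      return `<aside class="${block.tone}">${block.text}</aside>`;
    default: {
      // Exhaustiveness check: adding a new block kind without a case
      // makes this assignment a compile-time error.
      const unreachable: never = block;
      return unreachable;
    }
  }
}

const rendered = [
  renderBlock({ kind: "paragraph", text: "Hello" }),
  renderBlock({ kind: "callout", tone: "info", text: "Note" }),
];
```

The `never` branch is the part that pays off over time: a new block type cannot ship without the renderer learning how to draw it.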