The legal-AI reference corpus.
Legal AI work isn't really about prompts. The useful layer is the reference material the model leans on, the validation gates the output has to pass, and the loop that feeds reviewer edits back as signal. That is what keeps the drafting grounded.
- Where it came from: The LL.M thesis on copyright and generative AI in Sri Lanka, then the work of running an AI legal-tech division.
- What it covers: The reference layer that sits underneath the correspondence and deal-desk systems.
- What it doesn't contain: No privileged content. Only reviewer-approved patterns and the reasons behind reviewer edits.
- Hardest open problem: Measuring judgment, not just style.
Why generic output fails
Legal drafting is full of small conventions that matter: firm voice, British English, capitalised Clause references, fixed sign-offs, defined terms, standing positions on indemnity language, and disciplined caveats around forward-looking statements. A generic model can produce fluent legal prose and still miss the house style or the review position that actually matters.
A reference corpus is what stops a model from sounding like every other firm's model.
The fix is not a longer system prompt. It is a structured body of approved reference material that the model retrieves from, plus validation that the model used it.
Source selection
A useful corpus should contain approved templates, neutral clause patterns, drafting rules, document-type structures, cleared examples, and reviewer-approved corrections. It should exclude material that belongs to a live matter or a private client instruction unless that material has been properly cleared for reuse.
The hard work is curation, not collection. Most internal drafting archives contain a lot of material that should not be reused: half-finished drafts, single-client positions, time-sensitive carve-outs, and language that was approved under particular pressure. The corpus has to be a deliberate library, not an attic.
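The admission rule described above can be sketched as a small filter. This is a minimal illustration, not the actual curation process: the `Candidate` fields, the `SourceKind` names, and the `admit` function are all hypothetical, chosen only to mirror the categories and exclusions listed in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class SourceKind(Enum):
    """Admitted categories, mirroring the list in the text (names are illustrative)."""
    APPROVED_TEMPLATE = "approved_template"
    NEUTRAL_CLAUSE_PATTERN = "neutral_clause_pattern"
    DRAFTING_RULE = "drafting_rule"
    DOCUMENT_TYPE_STRUCTURE = "document_type_structure"
    CLEARED_EXAMPLE = "cleared_example"
    REVIEWER_CORRECTION = "reviewer_correction"


@dataclass(frozen=True)
class Candidate:
    """A document being considered for the corpus (hypothetical record shape)."""
    title: str
    kind: Optional[SourceKind]  # None means it fits no admitted category
    from_live_matter: bool
    client_specific: bool
    cleared_for_reuse: bool


def admit(candidate: Candidate) -> Tuple[bool, str]:
    """Return (admitted, reason). Exclusion is the default for anything uncleared."""
    if candidate.kind is None:
        return False, "not an admitted source category"
    sensitive = candidate.from_live_matter or candidate.client_specific
    if sensitive and not candidate.cleared_for_reuse:
        return False, "matter or client material not cleared for reuse"
    return True, "admitted"
```

Returning the reason alongside the decision keeps the curation log readable: a rejected item records why it stayed out of the library.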
Clause taxonomy
Clause material is more useful when it is organised by function rather than by document title. A jurisdiction clause and a confidentiality clause demand different review questions. Building the taxonomy around those questions, rather than around the documents the clauses appear in, makes retrieval and review both stricter and more reusable across document types.
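One way to picture a function-first taxonomy is a lookup keyed by clause function, with the review questions attached to the function and not to any document title. The sketch below is illustrative only: the `ClauseFunction` type, the `TAXONOMY` table, and the sample questions are assumptions made for the example, not entries from the actual corpus.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClauseFunction:
    """A clause category defined by what the clause does (hypothetical shape)."""
    name: str
    review_questions: Tuple[str, ...]


# Illustrative entries only; the questions are placeholders, not house positions.
TAXONOMY: Dict[str, ClauseFunction] = {
    "jurisdiction": ClauseFunction(
        name="jurisdiction",
        review_questions=(
            "Which courts or tribunal does the clause name?",
            "Is it consistent with the governing-law clause?",
        ),
    ),
    "confidentiality": ClauseFunction(
        name="confidentiality",
        review_questions=(
            "What information is covered, and for how long?",
            "Which disclosures are permitted?",
        ),
    ),
}


def review_questions(function: str) -> Tuple[str, ...]:
    """Look up questions by clause function, whatever document the clause sits in."""
    try:
        return TAXONOMY[function].review_questions
    except KeyError:
        raise KeyError(f"no taxonomy entry for clause function {function!r}") from None
```

Because the key is the function, the same questions apply whether the clause turns up in a letter, a term sheet, or a full agreement.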
Validation gates
Validation should test whether the output followed the source record, used the correct document type, surfaced unknowns rather than inventing them, applied standing checks, and reached an execution-ready standard only when the record actually supports it. The gates are simple to describe and hard to enforce: they need to fail loudly, not generously.
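A minimal sketch of gates that fail loudly, assuming a hypothetical `Draft` record: each gate raises an exception and never returns a score, so a draft cannot pass on partial credit. The field names, the `GateFailure` exception, and the `run_gates` helper are assumptions for illustration, and the standing checks are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


class GateFailure(Exception):
    """Raised when a draft fails a validation gate."""


@dataclass
class Draft:
    """Hypothetical summary of a generated draft and the record behind it."""
    document_type: str
    expected_document_type: str
    cited_source_ids: List[str]
    fields_missing_from_record: List[str]
    flagged_unknowns: List[str]
    marked_execution_ready: bool


def gate_followed_sources(draft: Draft) -> None:
    if not draft.cited_source_ids:
        raise GateFailure("draft cites no reference material")


def gate_document_type(draft: Draft) -> None:
    if draft.document_type != draft.expected_document_type:
        raise GateFailure(
            f"expected {draft.expected_document_type!r}, got {draft.document_type!r}"
        )


def gate_unknowns_surfaced(draft: Draft) -> None:
    # Anything the record does not supply must be flagged, not filled in.
    hidden = set(draft.fields_missing_from_record) - set(draft.flagged_unknowns)
    if hidden:
        raise GateFailure(f"unknowns not surfaced: {sorted(hidden)}")


def gate_readiness(draft: Draft) -> None:
    if draft.marked_execution_ready and draft.fields_missing_from_record:
        raise GateFailure("marked execution-ready but the record is incomplete")


GATES: List[Callable[[Draft], None]] = [
    gate_followed_sources,
    gate_document_type,
    gate_unknowns_surfaced,
    gate_readiness,
]


def run_gates(draft: Draft) -> None:
    """Run every gate in order; the first failure stops the draft."""
    for gate in GATES:
        gate(draft)
```

Raising on the first failure is a deliberate choice in this sketch: there is no aggregate score for a reviewer to talk themselves past.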
The feedback loop
Rejected drafts and reviewer edits are high-value data, but only after they are stripped of confidential matter detail. The reusable signal is the reason for the edit, not the edit itself. Categorised edit reasons feed back into the corpus as drafting guidance, not as copied text.
Edit reasons travel; client text does not. That is the discipline that lets the corpus get better over time without leaking.
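That discipline can be made structural: the record that feeds the corpus simply has no field for the edited text. The sketch below is a hypothetical shape, not the actual pipeline; the `EditReason` categories, the `EditSignal` type, and the keys of the raw edit dictionary are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class EditReason(Enum):
    """Illustrative reason categories; a real list would come from reviewers."""
    HOUSE_STYLE = "house_style"
    DEFINED_TERM = "defined_term"
    STANDING_POSITION = "standing_position"
    MISSING_CAVEAT = "missing_caveat"


@dataclass(frozen=True)
class EditSignal:
    """What is allowed to travel into the corpus: the reason, never the text."""
    clause_function: str
    reason: EditReason
    guidance: str  # general drafting guidance written by the reviewer


def to_signal(raw_edit: Dict[str, str]) -> EditSignal:
    """Keep only whitelisted keys; before/after text in raw_edit is never read."""
    return EditSignal(
        clause_function=raw_edit["clause_function"],
        reason=EditReason(raw_edit["reason"]),
        guidance=raw_edit["guidance"],
    )


def reason_counts(signals: Iterable[EditSignal]) -> Counter:
    """Tally edit reasons so recurring ones can become drafting guidance."""
    return Counter(signal.reason for signal in signals)
```

The guidance field still needs a human check for stray matter detail; the type only guarantees that the draft text itself has nowhere to go.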
Open problem: measuring judgment
The hard part is measuring judgment. A system can pass style checks and still miss the legal or commercial point. The next useful benchmark is reviewer disagreement: which edits changed legal position, which changed tone, and which changed document readiness. Tone and readiness can be measured; legal-position changes are the ones that actually carry stakes.
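As a rough sketch of what such a benchmark could record, assume each reviewer edit is labelled with the kind of change it made, and that two reviewers label the same edits independently. Both assumptions go beyond what the text specifies, and the names `EditImpact`, `impact_counts`, and `disagreement_rate` are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Sequence


class EditImpact(Enum):
    """The three kinds of change named in the text."""
    LEGAL_POSITION = "legal_position"
    TONE = "tone"
    READINESS = "readiness"


def impact_counts(labels: Iterable[EditImpact]) -> Dict[EditImpact, int]:
    """Count edits per impact type, including types with zero edits."""
    counts = Counter(labels)
    return {impact: counts[impact] for impact in EditImpact}


def disagreement_rate(
    reviewer_a: Sequence[EditImpact], reviewer_b: Sequence[EditImpact]
) -> float:
    """Fraction of edits the two reviewers labelled differently."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("both reviewers must label the same edits")
    if not reviewer_a:
        return 0.0
    differing = sum(a is not b for a, b in zip(reviewer_a, reviewer_b))
    return differing / len(reviewer_a)
```

A raw disagreement rate says nothing about which reviewer is right; it only shows where judgment diverges, which is where the legal-position edits deserve a closer look.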
This note describes the method, not the contents. The reusable material and the review examples remain private. The public point is the discipline of building a reference layer, not the layer itself.
Next note
Regulation, AI accountability, and the human layer