The craft · manual 01
For operators who want to run AI like an instrument.
Pairs with /comparison/advisor-vs-ai

Manual 01 · AI as a tool

Use AI well: prompt, flow, fact-check, stress-test.

An LLM is the most useful general-purpose instrument an operator has been handed in twenty years. Used badly, it produces confident, fluent, slightly wrong work that no one notices until the cost shows up. This is the eight-step process for using it well, and the second-brain stack Stan runs in the background.

Audience: Founder, COO, GM
Time to first run: ~90 minutes
Tools: ChatGPT, Claude, Gemini

What this work actually is

Using AI well is a different skill from using AI.

Using AI well is the discipline of treating an LLM as a high-bandwidth, low-context collaborator that needs the question framed, the brief loaded, the output stress-tested, and the receipts kept somewhere durable.

It is closer to managing a brilliant junior who shows up to every meeting with no memory of the last one. Most of the work is on your side: framing, context, verification, and the second brain that holds the file.

The mistake most operators make is treating an LLM like a search engine that writes. They type, they read, they paste. The output is fluent, the cost of being wrong is delayed, and the loop closes without anyone running the check.

An operator who runs AI well does five things differently: they know how the model actually works, they draw the full process before they prompt, they design a flow with explicit checkpoints, they stress-test every output against a fixed adversary prompt, and they store the inputs, outputs, and verdicts somewhere they can re-open in a month. That last one is the second brain. It is what turns AI from a productivity hit into compounding edge.

What you need before you start

Four prerequisites. Skip one and the process breaks at step five.

01 · Mental model

Know how an LLM actually works.

Token prediction over a frozen training set, with a context window that resets on every new chat. It does not retrieve facts. It generates plausible continuations. Internalize that and most prompting mistakes stop happening on their own.
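A toy sketch makes the point concrete. Nothing below resembles a production model; it is a minimal illustration, over an invented frozen table, of what "generates plausible continuations" means:

```python
import random

# Toy frozen "training set": next-token counts for a handful of words.
# A real LLM makes the same move at vastly larger scale: score every
# candidate next token given the context, then sample one.
FROZEN_WEIGHTS = {
    "the": {"market": 3, "model": 5, "answer": 2},
    "model": {"predicts": 6, "is": 4},
}

def next_token(context, weights=FROZEN_WEIGHTS, seed=0):
    """Continue from the last token. No retrieval happens anywhere:
    an unknown context still yields a confident continuation."""
    choices = weights.get(context[-1])
    if choices is None:
        return "plausible"  # fabrication, delivered fluently
    rng = random.Random(seed)
    return rng.choices(list(choices), weights=list(choices.values()), k=1)[0]
```

Note what the fallback branch does: an unknown context does not produce "I don't know," it produces output. That behavior is what the rest of this manual is defending against.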

02 · Three model accounts

One reasoning model, one writing model, one cheap model.

Stan runs Claude for reading and structural work, ChatGPT for drafting and ideation, and a smaller model for bulk operations. Three accounts, paid, no shared chat. The cost of all three is less than one bad hire-week.

03 · A second-brain store

One place to keep prompts, outputs, and verdicts.

Notion, Obsidian, Apple Notes, plain text. The format is irrelevant. The discipline is non-negotiable: every prompt that produced a useful output gets named, dated, and filed. Without it, you re-derive the same thing four times.

04 · A no-paste list

Decide in writing what never goes in.

Client material under NDA, board-confidential numbers, employee personal data, legal-privileged correspondence. Write the list before you start. The first leak happens when the list lives in your head.
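The list works better as a gate than as a memory. A minimal sketch, assuming a flat keyword list; the terms below are placeholders, not a complete policy:

```python
# The written no-paste list, checked before anything is pasted.
# These terms are illustrative placeholders; write your own list.
NO_PASTE = ["nda", "board-confidential", "privileged", "passport", "salary"]

def safe_to_paste(text: str) -> tuple[bool, list[str]]:
    """Return (ok, hits). Any hit means: stop and redact first."""
    lowered = text.lower()
    hits = [term for term in NO_PASTE if term in lowered]
    return (not hits, hits)
```

A keyword check is crude; it catches the careless paste, not the determined one. The careless paste is still most of the leaks.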

The full process

Eight steps. Each one has a stress-test you can run before you ship.

  Step 01

    Draw the process before you open a chat.

    On paper, in a single page. Three boxes: input, transformation, output. What is going in, what the model is being asked to do with it, and what shape the answer needs to be in to be useful. Most failed AI runs failed at this step, before the first character was typed.

      Stress-test
    • If a smart colleague read the three boxes alone, could they predict what good output looks like?
    • Is the output shape something you can act on, or only something you can read?
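The three boxes can live as a record as easily as on paper. A sketch; the field names are this page's invention, not a standard:

```python
from dataclasses import dataclass

@dataclass
class ProcessSketch:
    """The one-page, three-box drawing from step 01, as a record."""
    inputs: str          # what goes in, and where it comes from
    transformation: str  # what the model is asked to do with it
    output_shape: str    # the form the answer must take to be usable

    def ready(self) -> bool:
        # A colleague should be able to predict good output
        # from these three boxes alone; none may be blank.
        return all(v.strip() for v in
                   (self.inputs, self.transformation, self.output_shape))
```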
  Step 02

    Load the brief like you would for a sharp junior.

    Role, audience, constraints, what good looks like, what to avoid, format. Write it once, save it as a snippet, reuse it. A brief that took twenty minutes to write becomes a brief that takes ten seconds to load.

    Stan keeps these as named blocks: "strategy memo brief," "board pre-read brief," "deal stress-test brief." Each one already contains audience, voice rules, output format, and the refusal rules.

      Stress-test
    • Could a different model run the same brief and produce something usable?
    • Is the format specified, or are you trusting the model to guess?
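The named blocks reduce to a dictionary keyed by name, loaded as the first message of a chat. The wording below is illustrative, not Stan's actual briefs:

```python
# Named brief blocks, kept as reusable snippets. Each one already
# carries role, audience, constraints, format, and refusal rules.
BRIEFS = {
    "strategy memo brief": (
        "Role: strategy analyst. Audience: COO. "
        "Format: one-page memo, claims before evidence. "
        "Refuse: invented numbers; flag any unsourced figure."
    ),
    "board pre-read brief": (
        "Role: chief of staff. Audience: board. "
        "Format: three sections, 400 words max. "
        "Refuse: speculation presented as fact."
    ),
}

def load_brief(name: str, task: str) -> str:
    """First message of the chat: the saved brief plus today's task."""
    return f"{BRIEFS[name]}\n\nTask: {task}"
```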
  Step 03

    Add the source material yourself. Do not let the model fetch.

    Paste the actual numbers, the actual emails, the actual transcript. Tools that fetch in the background are useful for some workflows and dangerous for decision work because the model will fill gaps with confident plausible content. If you control the input, you control the failure surface.

      Stress-test
    • Is every claim in the input something you can point to a source for?
    • If you removed the input, would the model still answer? If yes, you are getting a generated answer, not a read of your situation.
  Step 04

    Build a flow, not a single prompt.

    For anything that matters, the right shape is two or three prompts in sequence. Prompt one drafts. Prompt two stress-tests the draft against an explicit adversary. Prompt three reframes the original question and asks whether the draft is answering the right one.

    Same chat is fine. Different chats with copy-paste between them is fine. The discipline is the sequence, not the tool.

      Stress-test
    • Does the second prompt take an actual position against the first, or does it agree?
    • Does the third prompt question the framing, or only the wording?
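The flow is a plain sequence, whatever tool carries it. A sketch where `ask` stands in for any model call; here it is left as a parameter so the shape, not the vendor, is the point:

```python
def run_flow(ask, question: str, source_material: str) -> dict:
    """Three prompts in sequence: draft, adversary, reframe.
    `ask` is any callable that takes a prompt and returns text."""
    draft = ask(f"{question}\n\nSource material:\n{source_material}")
    critique = ask(
        "Read the draft below as the strongest possible critic. "
        "Where is the reasoning thinnest?\n\n" + draft
    )
    reframe = ask(
        "Restate the original question. Is the draft answering "
        f"the right one?\n\nQuestion: {question}\n\nDraft: {draft}"
    )
    # All three stages are kept: the verdict needs the receipts.
    return {"draft": draft, "critique": critique, "reframe": reframe}
```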
  Step 05

    Fact-check every load-bearing claim before you act.

    Numbers, names, citations, regulations, prices, dates. The model is fluent and confident and will get specifics wrong without flagging it. Three checks: is the source real, does the source say what the model claims, is the source current. If any answer is no, the line gets cut or rewritten with the real source.

    Stan runs a single rule: nothing leaves the chat that has a number or a name in it without one round of human verification. The rule has caught a fabricated case citation, two wrong tax thresholds, and one entirely invented competitor in the last six months.

      Stress-test
    • For each named entity: opened the source and confirmed?
    • For each number: traced back to the original document and dated?
    • For each "studies show": named study, named author, named year, or cut.
  Step 06

    Run the adversary prompt against every output.

    One prompt, written once, used forever. "Read the above as the strongest possible critic. Where is the reasoning thinnest? Which claim breaks first under pressure? What is the question this answer is wrong about?" Run it on the output of step four. Edit until it survives.

      Stress-test
    • Did the adversary find anything? If no, the adversary prompt is too weak. Strengthen it.
    • Did you edit the draft, or did you defend it? Editing is the only signal that the adversary pass worked.
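Kept as a fixed string, the prompt is trivially reusable. The wording is the step's own; the wrapper function is illustrative:

```python
# The adversary prompt as a fixed file: written once, versioned,
# edited only when it lets something weak through.
ADVERSARY_V1 = (
    "Read the above as the strongest possible critic. "
    "Where is the reasoning thinnest? Which claim breaks first "
    "under pressure? What is the question this answer is wrong about?"
)

def adversary_pass(ask, draft: str) -> str:
    """Run the fixed adversary against a draft; returns the critique.
    `ask` is any callable that takes a prompt and returns text."""
    return ask(draft + "\n\n" + ADVERSARY_V1)
```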
  Step 07

    File the prompt, the output, and the verdict.

    Same place every time. Title, date, brief used, model, what was good, what was wrong, what would change next time. Sounds slow. Pays back from week three onward, because every future prompt starts from a stronger base.

    Six months in, the second brain becomes the actual asset. The model can be swapped. The library cannot.

      Stress-test
    • Could you find this output in three months by searching one keyword you would actually remember?
    • Is the verdict line specific enough that future-you will trust or distrust it correctly?
  Step 08

    Decide who in the room sees it and how it is labeled.

    AI-assisted output that is not labeled creates two failure modes: the reader trusts it more than they should, or the reader trusts you less when they find out. Label the level of human edit on anything that goes outside the room. "Drafted with Claude, edited and verified by SK." One line. Saves a relationship later.

      Stress-test
    • If a recipient asked tomorrow whether this was AI-drafted, is the answer in the document?
    • Does the labeling match the actual amount of human work done? Inflating either direction is its own problem.

How to know your output is wrong

Six tells that what came back is fluent and false.

Tell 01

The output is suspiciously balanced.

Two sides, three considerations, neat conclusion. Reality is rarely balanced. If the answer is a perfectly weighted "on the one hand, on the other hand," the model is hedging because it does not know your situation. Re-prompt with a constraint that forces a position.

Tell 02

Every paragraph ends with a soft generalization.

"Ultimately, success depends on context." "It is important to consider all stakeholders." That is the model running out of specifics and reaching for filler. Cut the filler line. If the paragraph collapses without it, the paragraph was filler too.

Tell 03

A specific number appears with no source line.

Percentages, dollar figures, dates, market sizes. Always demand the source. The model fabricates with confidence. Most of the embarrassing AI failures in business documents are exactly this category.

Tell 04

The output uses your industry vocabulary perfectly but says nothing your industry would find new.

The model has learned to sound like your category. It has not done the thinking. Run step six again with a sharper adversary.

Tell 05

A named person is described doing something you cannot verify.

Quotes, attendance, statements, decisions. The model invents these confidently. Treat any human in the output as a fact-check obligation before you act on the line.

Tell 06

You read it twice and felt smarter, but cannot summarize the new claim.

That is fluency without information. The model produced an articulate restatement of what you already knew. Useful sometimes, but if you needed a new read, you did not get one. Reframe the question before re-running.

Tools and tactics

The second brain is the asset. The model is the tool.

Models will keep changing. The library you build around them is what carries year over year.

Tactic 01 · The second-brain stack

Stan's running stack

The system Stan uses to keep AI work compounding instead of evaporating. Every prompt is a named block. Every output is dated and filed against the decision it served. Every verdict is one sentence written within forty-eight hours.

  • Single source of truth for prompts, in plain markdown.
  • Named brief blocks: strategy memo, board pre-read, stress-test, comp scan, term-sheet read.
  • Adversary prompt kept as a fixed file, edited only when it lets something weak through.
  • Decision log that ties the AI session to the decision that came after it.
  • Quarterly prune: anything not re-used in twelve weeks gets archived.
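The quarterly prune is one comparison per entry. A sketch, assuming the store can report each prompt's last-use date:

```python
from datetime import date, timedelta

def quarterly_prune(entries: dict[str, date],
                    today: date, weeks: int = 12) -> list[str]:
    """Names of prompts not re-used inside the window: archive these.
    `entries` maps prompt name -> date of last use. The twelve-week
    window is the rule above; the function shape is this sketch's."""
    cutoff = today - timedelta(weeks=weeks)
    return sorted(name for name, last in entries.items() if last < cutoff)
```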

Documented in full inside the engagement.

Tactic 02

The named brief library

Five to ten reusable brief blocks covering the work you actually do. Loaded as the first message in every relevant chat. Stops the daily re-explanation tax that quietly costs founders an hour a week.

  • One brief per recurring output type.
  • Audience, voice, format, refusal rules, named in each.
  • Versioned: v1, v2. Note what changed and why.

Tactic 03

The adversary prompt

One paragraph, used after every load-bearing draft. Stan's version is roughly: "Read this as the strongest critic. Name the weakest claim, the wrong question, and the failure mode the author is not seeing." Used on every memo before it lands.

  • Refined when it lets something soft through.
  • Used on your own writing too, not only AI output.
  • Pairs with a fact-check pass, never replaces it.

Tactic 04

Verification cadence

A weekly thirty-minute slot where the prompts that produced load-bearing output get re-run, the outputs get checked against what actually happened, and the verdict line gets written. The cadence is what makes the brain compound.

  • Same time, same day, recurring calendar block.
  • Re-runs find drift early.
  • Verdict lines feed the prompt library.

Coming soon

Three rooms held open inside this manual.

These are the products that grow out of this page. Built when the engagement load justifies it. Listed here so the reader can see the road, and so the section has the placeholders the rest of the site expects.

In build

The Second Brain Framework

The full operator manual for the prompt and verdict library Stan runs. Notion template, plain-text fallback, weekly cadence, decision-log integration. Released when the version Stan uses has been stable for three months.

In build

The Prompt System

The named brief blocks and adversary prompts as a versioned, downloadable library. Strategy memo, board pre-read, term-sheet stress-test, hire-letter draft, exit conversation prep. Each one with the brief, the example output, and the verdict pattern.

Scoped

The AI-in-the-decision-room playbook

The narrow case where AI belongs inside a real decision and where it does not. Built from the engagements where the answer mattered. Released as a paid asset when the case library is large enough to be worth the price.

What this work is not

Where the manual stops and the comparison begins.

An LLM does not name the question your answer is wrong about.

This page makes you better at running AI. It does not turn AI into a private advisor. The structural read on a decision belongs to a human in the room, with skin in it, who has watched a thousand framings drift. The comparison page names that line in detail.

Read advisor vs. AI →
Escalate beyond the manual when
  • The question itself is the thing you are uncertain about.
  • The decision is irreversible inside one quarter.
  • The room around it has politics the model cannot see.
  • Wrong answer costs equity, talent, or relationships, not minutes.

When the manual is no longer enough

Run the eight steps on the question you have right now.
If the answer still does not land, bring the situation.

Application-gated. Personal reply within 48 hours. The first conversation names what the room has not been able to name.

Apply for advisory

Tier 01 from $2,500 · Tier 02 from $4,500 / month · All three tiers