<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>artka.dev — Notes from production</title><description>Notes on Claude Code, AI agents, RAG pipelines and production backend by Artyom Kashuta.</description><link>https://artka.dev/</link><language>en-US</language><copyright>© 2026 artka.dev</copyright><item><title>12 rules for CLAUDE.md: extending Karpathy to the failure modes of 2026</title><link>https://artka.dev/en/blog/claude-md-12-rules/</link><guid isPermaLink="true">https://artka.dev/en/blog/claude-md-12-rules/</guid><description>Mnilax tested 12 rules for CLAUDE.md on 30 codebases over 6 weeks: an extension of Karpathy's template to agent loops, checkpoints, and fail-loud. A breakdown and a framework for applying it.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Over four months after Karpathy’s January thread, the &lt;code&gt;CLAUDE.md&lt;/code&gt; template grew from 4 rules to 12. I ran the expanded set on typical blog tasks and several work repos — the frequency of silent Claude Code errors drops noticeably. The eight added rules cover what didn’t exist as a class of problems in January: long-running agent loops, cross-session flows, shallow tests, quiet failures instead of explicit errors. I opened my own &lt;code&gt;CLAUDE.md&lt;/code&gt; for this blog — Karpathy’s four original rules are already there in &lt;code&gt;Code Standards&lt;/code&gt; and &lt;code&gt;Prohibitions&lt;/code&gt;, the eight added ones are not. I’m going through each one and figuring out where it makes sense to insert them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. What happened over four months&lt;/h2&gt;
&lt;p&gt;In late January, Andrej Karpathy published a thread with three complaints about Claude as a code-writer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;silent wrong assumptions — the model fills in context without asking;&lt;/li&gt;
&lt;li&gt;over-complication — adds layers of abstraction that nobody asked for;&lt;/li&gt;
&lt;li&gt;orthogonal damage — touches code it shouldn’t have touched.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Forrest Chang packaged the complaints into a &lt;code&gt;CLAUDE.md&lt;/code&gt; with four behavioral rules and committed it to GitHub. The repo exploded — by May, over 100,000 stars, the fastest-growing single-file project of the year. Then the template grew an extension: eight additional rules that cover what wasn’t a focus in January because the Claude Code landscape didn’t exist the way it does now.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
  subgraph jan[&amp;quot;January 2026&amp;quot;]
    K[&amp;quot;Karpathy: thread with three&amp;lt;br/&amp;gt;failure modes&amp;quot;]
    K --&amp;gt; F[&amp;quot;Forrest Chang:&amp;lt;br/&amp;gt;4 rules in CLAUDE.md&amp;quot;]
  end
  subgraph may[&amp;quot;May 2026&amp;quot;]
    F --&amp;gt; N[&amp;quot;New failure modes:&amp;lt;br/&amp;gt;agent loops, multi-codebase,&amp;lt;br/&amp;gt;shallow tests, silent failures&amp;quot;]
    N --&amp;gt; M[&amp;quot;+8 rules,&amp;lt;br/&amp;gt;total 12&amp;quot;]
  end
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2&gt;2. Karpathy’s four rules&lt;/h2&gt;
&lt;p&gt;This is the foundation. Without it, any superstructure loses half its meaning.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Think Before Coding&lt;/td&gt;
&lt;td&gt;Silent guesses. Voice assumptions, ask when unclear, push back when there’s a simpler way.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Simplicity First&lt;/td&gt;
&lt;td&gt;Minimum code that solves the task. No speculative abstractions “for the future”.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Surgical Changes&lt;/td&gt;
&lt;td&gt;Touch only what’s needed. Don’t “improve” neighboring code, don’t reformat what you weren’t asked about.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Goal-Driven Execution&lt;/td&gt;
&lt;td&gt;Describe success criteria, not step-by-step instructions. Strong success criteria let the model iterate.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In my Astro blog’s &lt;code&gt;CLAUDE.md&lt;/code&gt;, these four are covered not as a separate section, but through &lt;code&gt;Code Standards → Functional Style&lt;/code&gt; (rules 2 and 3 — no classes, no extra abstractions) and &lt;code&gt;Prohibitions&lt;/code&gt; (rule 3 — a “don’t do” list). The rules themselves aren’t duplicated as text, but their consequences land in the context.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;3. Where the Karpathy template falls short&lt;/h2&gt;
&lt;p&gt;Four gaps I observe in real work:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;th&gt;What breaks&lt;/th&gt;
&lt;th&gt;Which added rules cover it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Long-running agent tasks&lt;/td&gt;
&lt;td&gt;Multi-step pipeline drifts, burns tokens, loses context&lt;/td&gt;
&lt;td&gt;6 (budgets), 10 (checkpoints), 12 (loud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-codebase consistency&lt;/td&gt;
&lt;td&gt;In a monorepo “match existing style” is ambiguous — Claude picks randomly or averages&lt;/td&gt;
&lt;td&gt;11 (conventions), 7 (surface conflicts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test quality&lt;/td&gt;
&lt;td&gt;“Tests passed” becomes the goal; Claude writes tests that won’t fail even on broken logic&lt;/td&gt;
&lt;td&gt;9 (intent over behavior)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prototype vs production&lt;/td&gt;
&lt;td&gt;“Simplicity First” overdoes it early on, when you need 100 lines of scaffolding just to probe an idea&lt;/td&gt;
&lt;td&gt;(not covered by 12 rules — separate)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The last gap stays open: Simplicity First is either on or off, and there’s no middle mode in &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;4. Eight added rules&lt;/h2&gt;
&lt;p&gt;One by one, with the moment that triggered each.&lt;/p&gt;
&lt;h3&gt;4.1. Rule 5 — Use the model only for judgment calls&lt;/h3&gt;
&lt;p&gt;If the answer is known from a status code or data schema — that’s not the model’s job. Real case from my practice: code called Claude to decide whether to retry an API call on 503. Worked for two weeks, then started flaking because the model was reading the request body as context for the decision. Retry policy became random because the prompt was random.&lt;/p&gt;
&lt;p&gt;Frame: Claude is for classification, extraction, drafts, summarization. Not for routing, retries, deterministic transformations. If a status code already answers the question — regular code answers it.&lt;/p&gt;
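&lt;p&gt;A minimal sketch of that frame in code (the function name and status list are mine, not from the article): the retry decision is a pure function of the status code, so it lives in plain code and the model never sees it.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical sketch: deterministic retry policy, no model involved.
should_retry() {
  case "$1" in
    502|503|429) return 0 ;;  # transient: retry with backoff
    *)           return 1 ;;  # everything else: fail or surface the error
  esac
}

should_retry 503 && echo "503: retry"
should_retry 404 || echo "404: do not retry"
```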
&lt;h3&gt;4.2. Rule 6 — Token budgets are not advisory&lt;/h3&gt;
&lt;p&gt;Without a budget, the loop dumps 50,000 tokens. Hard version: 4,000 per task, 30,000 per session. When you approach the limit, summarize and restart the session.&lt;/p&gt;
&lt;p&gt;Typical case: 90-minute debugging session with the same 8 KB error message. By the end — re-proposing fixes you rejected 40 messages ago. The model happily iterates on a lost track. A budget would have killed the loop at minute 12.&lt;/p&gt;
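&lt;p&gt;How such a rule might read in &lt;code&gt;CLAUDE.md&lt;/code&gt; (thresholds from the paragraph above; the wording is mine):&lt;/p&gt;

```markdown
## Token budgets (hard limits)

- Max 4,000 tokens per task, 30,000 per session.
- At ~80% of a budget: stop, summarize what is done / verified / remaining,
  and ask to restart the session. Do not keep iterating on a lost track.
```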
&lt;h3&gt;4.3. Rule 7 — Surface conflicts, don’t average them&lt;/h3&gt;
&lt;p&gt;If the codebase has two error-handling patterns — try/catch and global boundary — Claude writes code that does both. Double handlers. Symptom: the error gets swallowed twice.&lt;/p&gt;
&lt;p&gt;Rule: when there’s a contradiction, pick one (the newer or more tested one), explain why, mark the second for cleanup. Averaged code that satisfies both rules is the worst possible.&lt;/p&gt;
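&lt;p&gt;As a &lt;code&gt;CLAUDE.md&lt;/code&gt; line, the rule might look like this (my wording, condensed from the above):&lt;/p&gt;

```markdown
- On conflicting patterns (e.g. try/catch vs global error boundary): pick ONE
  (prefer the newer or better-tested), say why, flag the other for cleanup.
  Never emit code that satisfies both.
```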
&lt;h3&gt;4.4. Rule 8 — Read before you write&lt;/h3&gt;
&lt;p&gt;Karpathy says “don’t touch neighboring code”. He doesn’t say: read it before adding yours. Real case: Claude added a function next to an already-existing identical one without reading the file. Import order decided the winner: the old function, the source of truth for six months, lost to the fresh one with the same name.&lt;/p&gt;
&lt;p&gt;Frame: before adding code to a file — read the exports, the nearest calling code, and common utilities. “Looks orthogonal to me” is the most dangerous phrase in a codebase.&lt;/p&gt;
&lt;h3&gt;4.5. Rule 9 — Tests verify intent, not just behavior&lt;/h3&gt;
&lt;p&gt;A test &lt;code&gt;expect(getUserName()).toBe(&apos;John&apos;)&lt;/code&gt; means nothing if the function returns a constant. Tests should fail when business logic changes, otherwise they’re testing that a function exists, not that it’s correct.&lt;/p&gt;
&lt;p&gt;Typical example: 12 tests on an auth function, all green, auth is broken in production. Tests checked that the function returns something, not that it returns the right value.&lt;/p&gt;
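&lt;p&gt;The difference is easy to show mechanically. A shell sketch with an invented function: the shallow test stays green even under broken logic, the intent test does not.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical function under test: adds 20% VAT to a price in cents.
add_vat() { echo $(( $1 * 120 / 100 )); }

# Shallow test: only checks that *something* comes back.
# It would still pass if add_vat were replaced by `echo 42`.
[ -n "$(add_vat 1000)" ] && echo "shallow test: pass"

# Intent test: pins the business rule itself (1000 -> 1200).
[ "$(add_vat 1000)" = "1200" ] && echo "intent test: pass"
```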
&lt;h3&gt;4.6. Rule 10 — Checkpoint after every significant step&lt;/h3&gt;
&lt;p&gt;A multi-step refactor across 20 files breaks on step 4, Claude keeps going on broken state. By the time you notice, steps 5 and 6 are already done on top of broken — untangling takes longer than redoing from scratch.&lt;/p&gt;
&lt;p&gt;Rule: after each significant step — summarize what’s done, what’s verified, what’s left. If you lose track — stop and recap.&lt;/p&gt;
&lt;h3&gt;4.7. Rule 11 — Match the codebase’s conventions, even if you disagree&lt;/h3&gt;
&lt;p&gt;Claude introduces hooks into a codebase of class components. Technically works. Breaks the testing pattern built for &lt;code&gt;componentDidMount&lt;/code&gt;. Half a day to delete and rewrite.&lt;/p&gt;
&lt;p&gt;Rule: inside a codebase, conformance matters more than taste. Disagreement is a separate conversation, not a silent fork. Snake_case vs camelCase, classes vs hooks — pick what’s there, not what’s better.&lt;/p&gt;
&lt;h3&gt;4.8. Rule 12 — Fail loud&lt;/h3&gt;
&lt;p&gt;The most expensive errors are the ones that look like success. “Migration complete” when 14% of records were silently skipped. “Tests passed” when some were skipped. “Feature works” when an edge case you explicitly asked about was never checked.&lt;/p&gt;
&lt;p&gt;Rule: when uncertain — raise the question, don’t hide it. Default to surfacing uncertainty, not concealing it.&lt;/p&gt;
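&lt;p&gt;Mechanically, fail-loud means skipped work changes the exit status and the final message. A sketch with an invented migration loop (nothing here is from a real tool):&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical migration: empty records are "skipped". Skips make the run
# report INCOMPLETE and return non-zero instead of claiming success.
migrate_records() {
  ok=0; skipped=0
  for rec in "$@"; do
    if [ -n "$rec" ]; then ok=$((ok + 1)); else skipped=$((skipped + 1)); fi
  done
  if [ "$skipped" -gt 0 ]; then
    echo "migration INCOMPLETE: $ok migrated, $skipped skipped" >&2
    return 1
  fi
  echo "migration complete: $ok records"
}

migrate_records "a" "b" "c"          # all good: reports complete, status 0
migrate_records "a" "" "c" || true   # loud: INCOMPLETE on stderr, status 1
```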
&lt;hr&gt;
&lt;h2&gt;5. What doesn’t work (what got filtered out)&lt;/h2&gt;
&lt;p&gt;The template is valuable not just for what’s in it, but for what was filtered out when trying to expand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rules from Reddit and X.&lt;/strong&gt; Most are reformulations of Karpathy or domain-specific (“always Tailwind”). Don’t generalize.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More than 12 rules.&lt;/strong&gt; On sets of 14+ rules, compliance drops: important points drown in noise. The 200-line ceiling (including stack, commands, prohibitions) is real.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool-specific rules.&lt;/strong&gt; “Always use eslint” fails silently if eslint isn’t installed. Better — capability-agnostic: “match the enforced style”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Examples instead of rules.&lt;/strong&gt; One example eats ~10 rules’ worth of context, and the model over-fits on specifics. Rules are abstract and portable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Soft language.&lt;/strong&gt; “Be careful”, “think hard”, “really focus” — compliance ~30%. Not testable. Replace with concrete imperatives: “state assumptions explicitly”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Identity prompts.&lt;/strong&gt; “Be a senior engineer” doesn’t work: the model already thinks it’s a senior. The gap between “thinking” and “doing” closes with imperatives, not identity.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;6. Checking against my own &lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;I opened this blog’s file (191 lines) and went through all 12 rules. Here’s the picture:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;In my &lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Think before coding&lt;/td&gt;
&lt;td&gt;indirectly&lt;/td&gt;
&lt;td&gt;via &lt;code&gt;architect → critic&lt;/code&gt; workflow in agent stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Simplicity&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;No classes&lt;/code&gt;, &lt;code&gt;Immutability by default&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Surgical changes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Prohibitions&lt;/code&gt; (deprecated &lt;code&gt;@astrojs/tailwind&lt;/code&gt;, &lt;code&gt;node:*-alpine&lt;/code&gt;, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Goal-driven&lt;/td&gt;
&lt;td&gt;indirectly&lt;/td&gt;
&lt;td&gt;via subagent structure, not as separate rule&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Judgment-only&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6. Token budgets&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7. Surface conflicts&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8. Read before write&lt;/td&gt;
&lt;td&gt;partially&lt;/td&gt;
&lt;td&gt;GitNexus section requires impact analysis before edits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9. Test intent&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10. Checkpoints&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11. Match conventions&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Code Standards → TypeScript / Astro / Git&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12. Fail loud&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Result: three covered, two indirect, one partial, six missing. The file is effectively Karpathy-level, without the 2026 superstructure.&lt;/p&gt;
&lt;p&gt;Which of the missing ones make sense to add specifically for an Astro blog with publications through an admin panel:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rule 6 (budgets)&lt;/strong&gt; — yes, my agents do long-running tasks (generating EN translations via &lt;code&gt;pnpm translate&lt;/code&gt;, migrations). Without a budget, a session can drift.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rule 9 (test intent)&lt;/strong&gt; — yes, I have Vitest and Playwright, the risk of shallow tests is real.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rule 10 (checkpoints)&lt;/strong&gt; — yes, multi-step tasks on schema + migrations + UI updates regularly take half an hour of agent work.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rule 12 (fail loud)&lt;/strong&gt; — yes, in the admin panel “saved” often doesn’t mean “published”, need explicit surfacing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Rule 7 is less acute for a single project. Rule 5 is covered by the fact that there’s no AI routing in the blog runtime: the model makes no runtime decisions on behalf of the code.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;7. How to add — without bloat&lt;/h2&gt;
&lt;p&gt;Discipline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Don’t exceed 200 lines total.&lt;/strong&gt; Counting stack, commands, prohibitions, rules. I’m at 191 now — adding four rules means moving part of &lt;code&gt;Homepage&lt;/code&gt; or GitNexus section to &lt;code&gt;@docs/...&lt;/code&gt; via Claude Code @-import.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Each rule answers “what error does it prevent”.&lt;/strong&gt; If it doesn’t — delete it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Capability-agnostic formulations.&lt;/strong&gt; “Match the enforced style”, not “use prettier”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Imperatives, not wishes.&lt;/strong&gt; “State assumptions explicitly”, not “think carefully”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test it.&lt;/strong&gt; Run a typical task before and after. No difference — the rule didn’t work in your context, delete it.&lt;/li&gt;
&lt;/ol&gt;
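&lt;p&gt;For point 1, Claude Code’s @-imports carry the weight: bulky sections move to separate files and are pulled back into context at load time. A sketch of a trimmed file (the paths are assumptions about my repo layout):&lt;/p&gt;

```markdown
# CLAUDE.md (trimmed)

## Rules
- State assumptions explicitly before writing code.

## Moved out, still imported into context
@docs/homepage.md
@docs/gitnexus.md
```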
&lt;p&gt;Six rules tailored to real errors beat twelve generic ones.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Karpathy pinned three code-writing failure modes from January. Forrest Chang packed them into four rules, and the community grabbed the template. The expansion to 12 came from the Claude Code landscape being different by May: multi-step agents, hook cascades, skill conflicts, cross-session flows. The eight added rules cover new gaps without replacing the original ones.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is not a wishlist, but a behavioral contract against specific errors you’ve already seen. Someone else’s template is useful as a starter. After that — filter it for your failure modes, not the other way around. Six rules precisely chosen beat twelve copied ones.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/karpathy/status/1885018475234567890&quot;&gt;Andrej Karpathy — original thread on X (January 2026)&lt;/a&gt; — three code-writing failure modes&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/forrestchang/andrej-karpathy-skills&quot;&gt;forrestchang/andrej-karpathy-skills&lt;/a&gt; — public repo with the basic 4-rule template&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.claude.com/en/docs/claude-code/&quot;&gt;Anthropic Claude Code docs — CLAUDE.md&lt;/a&gt; — official documentation on file structure; the file is advisory, with roughly 80% observed compliance&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>ai</category><category>claude-code</category><category>prompt-engineering</category><author>a@artka.dev (Artyom)</author></item><item><title>ds4 by antirez: local coding agent on DeepSeek V4 Flash that runs on MacBook</title><link>https://artka.dev/en/blog/local-coding-agent/</link><guid isPermaLink="true">https://artka.dev/en/blog/local-coding-agent/</guid><description>The creator of Redis wrote an inference engine in two weeks for just one model — DeepSeek V4 Flash. 1M context, 26 t/s on M3 Max, KV-cache on disk. How to run it and connect it to Claude Code.</description><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Garry Tan and Bindu Reddy on May 9, 2026 simultaneously shared the same news: Redis creator Salvatore Sanfilippo (antirez) released &lt;a href=&quot;https://github.com/antirez/ds4&quot;&gt;&lt;code&gt;ds4&lt;/code&gt;&lt;/a&gt; — an inference engine in C+Metal that runs DeepSeek V4 Flash (284B MoE, 1M context) on a laptop. Not “technically possible,” but “works with coding agents at 26 t/s”. I figured out what’s under the hood and how to use it as a local backend for Claude Code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. What happened in two weeks&lt;/h2&gt;
&lt;p&gt;On April 24, 2026, DeepSeek released the V4 series. V4 Flash is an efficiency model: 284 billion parameters total, 13 billion active (MoE), 1 million token context. Before this, models of this size only lived in the cloud.&lt;/p&gt;
&lt;p&gt;Antirez looked at this and made a bet that universal runners can’t make. He forked &lt;code&gt;llama.cpp&lt;/code&gt;, spent two weeks inside it, understood the geometry of V4 Flash, &lt;strong&gt;threw out everything unnecessary&lt;/strong&gt;, and wrote an engine from scratch in 4 files: &lt;code&gt;ds4.c&lt;/code&gt; (core inference), &lt;code&gt;ds4_metal.m&lt;/code&gt; (Metal kernels), &lt;code&gt;ds4_server.c&lt;/code&gt; (HTTP server), &lt;code&gt;ds4_cli.c&lt;/code&gt; (REPL). On the outside, all of this speaks two protocols simultaneously: OpenAI Chat Completions (&lt;code&gt;/v1/chat/completions&lt;/code&gt;) and Anthropic Messages (&lt;code&gt;/v1/messages&lt;/code&gt;). That is, it connects to any agent that knows one of them.&lt;/p&gt;
&lt;p&gt;Results that the author measured himself:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;Quant&lt;/th&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Prefill&lt;/th&gt;
&lt;th&gt;Generation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M3 Max, 128 GB&lt;/td&gt;
&lt;td&gt;q2&lt;/td&gt;
&lt;td&gt;short&lt;/td&gt;
&lt;td&gt;58.52 t/s&lt;/td&gt;
&lt;td&gt;26.68 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M3 Max, 128 GB&lt;/td&gt;
&lt;td&gt;q2&lt;/td&gt;
&lt;td&gt;11709 tokens&lt;/td&gt;
&lt;td&gt;250.11 t/s&lt;/td&gt;
&lt;td&gt;21.47 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Studio M3 Ultra, 512 GB&lt;/td&gt;
&lt;td&gt;q2&lt;/td&gt;
&lt;td&gt;short&lt;/td&gt;
&lt;td&gt;84.43 t/s&lt;/td&gt;
&lt;td&gt;36.86 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Studio M3 Ultra, 512 GB&lt;/td&gt;
&lt;td&gt;q4&lt;/td&gt;
&lt;td&gt;12018 tokens&lt;/td&gt;
&lt;td&gt;448.82 t/s&lt;/td&gt;
&lt;td&gt;26.62 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;26 tokens per second of generation is not “worth a look” territory; it is &lt;strong&gt;working speed for a coding agent&lt;/strong&gt; that writes code, reads files, and calls tools. On a long prompt, generation drops to 21 t/s, but thanks to the KV-cache on disk this pays for itself by the third request in the same session.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;2. Three engineering tricks that make this possible&lt;/h2&gt;
&lt;p&gt;I carefully read the README and &lt;code&gt;AGENT.md&lt;/code&gt; of the repository, and below is the most essential, without which ds4 wouldn’t work.&lt;/p&gt;
&lt;h3&gt;2.1. Asymmetric 2-bit quantization&lt;/h3&gt;
&lt;p&gt;The standard approach to 2-bit quantization is to compress everything down to 2 bits, and then the model starts hallucinating in tool calling, confusing arguments, and forgetting the schema. Antirez did it differently: &lt;strong&gt;only MoE experts on the routed path are quantized&lt;/strong&gt; (&lt;code&gt;up&lt;/code&gt;/&lt;code&gt;gate&lt;/code&gt; in &lt;code&gt;IQ2_XXS&lt;/code&gt;, &lt;code&gt;down&lt;/code&gt; in &lt;code&gt;Q2_K&lt;/code&gt;) — because they take up most of the weight (the model is 284B, and almost all of it is experts). Shared experts, projections, routing — remain in Q8. These are components where loss of precision is expensive.&lt;/p&gt;
&lt;p&gt;Effect: the 2-bit quant weighs 81 GB and fits into the 128 GB of unified memory on a MacBook Pro M3 Max, while reliably working in coding agents (validated by tests against official DeepSeek API logits).&lt;/p&gt;
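&lt;p&gt;A back-of-envelope check of that figure. The split below between Q8 and ~2-bit parameters is my assumption, not a number from the README:&lt;/p&gt;

```shell
#!/bin/sh
# Rough size estimate: routed experts at ~2.06 bits/param (IQ2_XXS/Q2_K mix),
# the rest at 8 bits. The 12B "rest" figure is a guess for illustration.
total=284      # billions of parameters
q8=12          # assumed: shared experts, projections, routing (~1 byte/param)
q2=$((total - q8))
# integer math in whole GB: billions of params * bytes per param
size_gb=$(( q8 * 100 / 100 + q2 * 26 / 100 ))   # 2-bit mix ~ 0.26 byte/param
echo "estimated model size: ~${size_gb} GB"
```

&lt;p&gt;The estimate lands within a gigabyte of the reported 81 GB, which is why the asymmetry is affordable: the Q8 components are a rounding error next to the experts.&lt;/p&gt;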
&lt;h3&gt;2.2. KV-cache as first-class disk citizen&lt;/h3&gt;
&lt;p&gt;The main pain of stateless API protocols like Chat Completions: the client &lt;strong&gt;sends the entire history every time&lt;/strong&gt;, and the server must prefill it from scratch. Claude Code, for example, sends ~25K tokens of system prompt at startup. On local hardware, this is tens of seconds before the first token.&lt;/p&gt;
&lt;p&gt;ds4 solves this head-on: after a successful prefill, the session state (a KV checkpoint) is serialized to a file, keyed by the SHA1 of the token IDs. When the next request arrives with the same prefix, the server loads the checkpoint from disk and skips prefill. From the README:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The KV cache &lt;strong&gt;is actually a first class disk citizen&lt;/strong&gt;. &amp;lt;…&amp;gt; Modern MacBooks have fast SSDs and compressed KV caches like the one of DeepSeek v4.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In practice, this means the difference between “4 seconds to first token on repeat call” and “60 seconds”. The disk here is not swap under pressure, but logical storage: SSDs are fast enough, KV in DeepSeek V4 compresses well, and the characteristic “same system prompt + changing tail” precisely describes how a coding agent works.&lt;/p&gt;
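&lt;p&gt;The lookup can be sketched in a few lines of shell. The file layout is invented, but the mechanism follows the description above: key = SHA1 of the token-ID prefix, hit = skip prefill.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of a disk KV lookup (file names are my invention, not ds4's).
KV_DIR=$(mktemp -d)

kv_key() {
  # Key = SHA1 over the token IDs of the request prefix.
  printf '%s' "$1" | sha1sum | cut -d' ' -f1   # macOS: shasum -a 1
}

tokens="151643 9906 374 264 1296"   # fake token IDs for the shared prefix
key=$(kv_key "$tokens")

if [ -f "$KV_DIR/$key.kv" ]; then
  echo "hit: restore checkpoint, skip prefill"
else
  echo "miss: run prefill, then checkpoint"
  : > "$KV_DIR/$key.kv"             # serialize KV state after prefill
fi
```

&lt;p&gt;On the next run with the same prefix, the same key comes back and the hit branch skips prefill entirely.&lt;/p&gt;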
&lt;h3&gt;2.3. Metal-only and one model at a time&lt;/h3&gt;
&lt;p&gt;No CUDA, no CPU fallback for production (the CPU path exists only for correctness checks and currently crashes at the macOS kernel level due to a VM bug; antirez is upfront about this). No attempt at a “universal runner”. Only Apple Silicon, only this one model, and it stays that way until a new version of V4 Flash appears or a much better model of the same class does.&lt;/p&gt;
&lt;p&gt;The cost is a narrow bet. The benefit is that you don’t need to maintain a matrix of &lt;code&gt;(model × hardware × quant)&lt;/code&gt;, and you can optimize Metal kernels for the exact geometry of layers in this specific model.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;3. What I’ll need: hardware, model, an hour of time&lt;/h2&gt;
&lt;p&gt;I plan to deploy this on a &lt;strong&gt;MacBook Pro M3 Max, 128 GB&lt;/strong&gt; (the minimally viable configuration according to README). I don’t have it yet, and in this section — an honest plan of what I’ll do when the hardware arrives; the numbers are taken from antirez’s benchmarks, but I want to double-check them on my instance.&lt;/p&gt;
&lt;p&gt;Minimum requirements by my estimates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A current version of macOS (there’s a VM bug in the CPU path, but the Metal path is unaffected).&lt;/li&gt;
&lt;li&gt;Apple Silicon with 128 GB+ unified memory. M3 Max or M3 Ultra.&lt;/li&gt;
&lt;li&gt;~100 GB of free space: 81 GB for the model itself in Q2, plus room for the KV-cache on disk. For Q4 quantization: 256 GB+ RAM and ~150 GB on disk.&lt;/li&gt;
&lt;li&gt;Xcode Command Line Tools (for clang/Metal headers).&lt;/li&gt;
&lt;li&gt;~30–60 minutes to download the model (depends on your connection).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A caveat for newcomers: 128 GB of unified memory is top-spec MacBook Pro M3 Max or Mac Studio territory. On a 64 GB Mac, Q2 won’t work: the model simply won’t fit in RAM. This is not “slow,” this is “no way.”&lt;/p&gt;
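&lt;p&gt;The “no way” is plain arithmetic. A sketch, where the headroom figure is my guess, not a measured number:&lt;/p&gt;

```shell
#!/bin/sh
# Will the Q2 model load? Unified memory must hold weights + KV cache + OS.
ram_gb=64
model_gb=81       # Q2 quant size from the README
headroom_gb=10    # my guess: in-RAM KV working set + macOS itself

if [ $((model_gb + headroom_gb)) -gt "$ram_gb" ]; then
  echo "$ram_gb GB: will not load (need >= $((model_gb + headroom_gb)) GB)"
else
  echo "$ram_gb GB: ok"
fi
```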
&lt;hr&gt;
&lt;h2&gt;4. Installation step by step&lt;/h2&gt;
&lt;p&gt;The commands below are what I’ll do on day one, based on the README instructions. Where the description lacks specifics — I’ve added my own comments.&lt;/p&gt;
&lt;h3&gt;4.1. Building&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# 1. Clone the repository
git clone https://github.com/antirez/ds4.git
cd ds4

# 2. Download the 2-bit quant (81 GB; for a 128 GB MBP)
./download_model.sh q2

# The script downloads from huggingface.co/antirez/deepseek-v4-gguf
# and supports resume via curl -C -, so you can interrupt and continue.
# For the 4-bit quant (Mac Studio, 256+ GB), use ./download_model.sh q4.

# 3. Build
make

# Verify the build:
./ds4 --help
./ds4-server --help
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Building is a regular &lt;code&gt;make&lt;/code&gt;, no CMake, no pkg-config. This is intentional: the project has no dependencies outside the Apple SDK.&lt;/p&gt;
&lt;h3&gt;4.2. First run in REPL&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./ds4 -p &amp;quot;Explain Redis streams in one paragraph.&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without &lt;code&gt;-p&lt;/code&gt;, it launches an interactive session with commands &lt;code&gt;/help&lt;/code&gt;, &lt;code&gt;/think&lt;/code&gt;, &lt;code&gt;/think-max&lt;/code&gt;, &lt;code&gt;/nothink&lt;/code&gt;, &lt;code&gt;/ctx N&lt;/code&gt;, &lt;code&gt;/read FILE&lt;/code&gt;, &lt;code&gt;/quit&lt;/code&gt;. This is good for checking that the engine is alive and for comparing generation speed against the claimed 26 t/s.&lt;/p&gt;
&lt;h3&gt;4.3. Running as HTTP server&lt;/h3&gt;
&lt;p&gt;This is the mode where ds4 becomes a local backend for agents:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./ds4-server \
  --ctx 100000 \
  --kv-disk-dir /tmp/ds4-kv \
  --kv-disk-space-mb 8192
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--ctx 100000&lt;/code&gt; — context window of 100K tokens. The full 1M context takes ~26 GB just for the indexer; on a 128 GB Mac where 81 GB is already taken by the model, this leaves no room for KV-cache. 100–300K is a reasonable compromise.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--kv-disk-dir /tmp/ds4-kv&lt;/code&gt; — directory for disk KV-cache. I’d move it to a fast SSD (external or built-in — both are fine).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--kv-disk-space-mb 8192&lt;/code&gt; — limit on cache size. 8 GB is enough for one or two active projects; for larger sessions — increase it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The server listens on &lt;code&gt;127.0.0.1:8000&lt;/code&gt;. Endpoints:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI Chat Completions (+ tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;POST /v1/completions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI legacy completions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;POST /v1/messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic Messages (for Claude Code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GET /v1/models&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list of models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Authentication via static API key (by default accepts any; README recommends &lt;code&gt;dsv4-local&lt;/code&gt;).&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;5. Connecting as a coding agent&lt;/h2&gt;
&lt;p&gt;This is the part that got me digging into the topic. All three methods below work simultaneously — each agent talks to the same &lt;code&gt;ds4-server&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;5.1. Claude Code → Anthropic-compatible endpoint&lt;/h3&gt;
&lt;p&gt;Claude Code can talk to any backend that exposes the Anthropic Messages API. Create a wrapper &lt;code&gt;~/bin/claude-ds4&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/bin/sh
unset ANTHROPIC_API_KEY

export ANTHROPIC_BASE_URL=&amp;quot;${DS4_ANTHROPIC_BASE_URL:-http://127.0.0.1:8000}&amp;quot;
export ANTHROPIC_AUTH_TOKEN=&amp;quot;${DS4_API_KEY:-dsv4-local}&amp;quot;
export ANTHROPIC_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;

# Map every Sonnet/Haiku/Opus alias to the local model,
# so that /model in Claude Code never falls back to the cloud.
export ANTHROPIC_DEFAULT_SONNET_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;
export ANTHROPIC_DEFAULT_HAIKU_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;
export ANTHROPIC_DEFAULT_OPUS_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;
export CLAUDE_CODE_SUBAGENT_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;

# Disable telemetry and the non-streaming fallback.
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export CLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK=1
export CLAUDE_STREAM_IDLE_TIMEOUT_MS=600000

exec &amp;quot;$HOME/.local/bin/claude&amp;quot; &amp;quot;$@&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;chmod +x ~/bin/claude-ds4&lt;/code&gt; — and run Claude Code as &lt;code&gt;claude-ds4&lt;/code&gt; instead of &lt;code&gt;claude&lt;/code&gt;. All requests will go to the local ds4 server. A subtlety that antirez himself points out:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Claude Code may send a large initial prompt, often around 25k tokens, before it starts doing useful work. Keep &lt;code&gt;--kv-disk-dir&lt;/code&gt; enabled.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Without disk KV-cache, cold startup of Claude Code will take a minute or more; with cache — after the first startup, subsequent ones will restore from disk.&lt;/p&gt;
&lt;h3&gt;5.2. opencode&lt;/h3&gt;
&lt;p&gt;opencode is configured via &lt;code&gt;~/.config/opencode/opencode.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;$schema&amp;quot;: &amp;quot;https://opencode.ai/config.json&amp;quot;,
  &amp;quot;provider&amp;quot;: {
    &amp;quot;ds4&amp;quot;: {
      &amp;quot;name&amp;quot;: &amp;quot;ds4.c (local)&amp;quot;,
      &amp;quot;npm&amp;quot;: &amp;quot;@ai-sdk/openai-compatible&amp;quot;,
      &amp;quot;options&amp;quot;: {
        &amp;quot;baseURL&amp;quot;: &amp;quot;http://127.0.0.1:8000/v1&amp;quot;,
        &amp;quot;apiKey&amp;quot;: &amp;quot;dsv4-local&amp;quot;
      },
      &amp;quot;models&amp;quot;: {
        &amp;quot;deepseek-v4-flash&amp;quot;: {
          &amp;quot;name&amp;quot;: &amp;quot;DeepSeek V4 Flash (ds4.c local)&amp;quot;,
          &amp;quot;limit&amp;quot;: { &amp;quot;context&amp;quot;: 100000, &amp;quot;output&amp;quot;: 384000 }
        }
      }
    }
  },
  &amp;quot;agent&amp;quot;: {
    &amp;quot;ds4&amp;quot;: {
      &amp;quot;description&amp;quot;: &amp;quot;DeepSeek V4 Flash served by local ds4-server&amp;quot;,
      &amp;quot;model&amp;quot;: &amp;quot;ds4/deepseek-v4-flash&amp;quot;,
      &amp;quot;temperature&amp;quot;: 0
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;limit.context: 100000&lt;/code&gt; must match the &lt;code&gt;--ctx&lt;/code&gt; value &lt;code&gt;ds4-server&lt;/code&gt; is started with; otherwise the server truncates silently, opencode never learns about it, and the next message is built against a context length that doesn’t actually exist.&lt;/p&gt;
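A guard for this can live in the script that launches the stack. The snippet below is a sketch under assumptions: `assertContextMatch` is a hypothetical helper, and the config shape follows the `opencode.json` above; reading the file from disk is left to the caller.

```typescript
// Sketch: fail loudly when opencode's advertised context differs from the
// --ctx the local ds4-server was started with, instead of letting the
// server truncate silently mid-session.
const assertContextMatch = (rawConfig: string, serverCtx: number): void => {
  const cfg = JSON.parse(rawConfig);
  const limit = cfg.provider?.ds4?.models?.["deepseek-v4-flash"]?.limit;
  if (limit?.context !== serverCtx) {
    throw new Error(
      `context mismatch: opencode says ${limit?.context}, server runs --ctx ${serverCtx}`
    );
  }
};

// Matches the config above: both sides agree on 100000.
const raw = JSON.stringify({
  provider: {
    ds4: { models: { "deepseek-v4-flash": { limit: { context: 100000, output: 384000 } } } },
  },
});
assertContextMatch(raw, 100000); // passes silently
```

`assertContextMatch(raw, 131072)` would throw, which is the whole point: a mismatch should stop the launch, not surface later as silent truncation.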
&lt;h3&gt;5.3. Pi (antirez’s mini-agent)&lt;/h3&gt;
&lt;p&gt;If you use Pi — the format is slightly different, config in &lt;code&gt;~/.pi/agent/models.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;providers&amp;quot;: {
    &amp;quot;ds4&amp;quot;: {
      &amp;quot;name&amp;quot;: &amp;quot;ds4.c local&amp;quot;,
      &amp;quot;baseUrl&amp;quot;: &amp;quot;http://127.0.0.1:8000/v1&amp;quot;,
      &amp;quot;api&amp;quot;: &amp;quot;openai-completions&amp;quot;,
      &amp;quot;apiKey&amp;quot;: &amp;quot;dsv4-local&amp;quot;,
      &amp;quot;compat&amp;quot;: {
        &amp;quot;supportsStore&amp;quot;: false,
        &amp;quot;supportsDeveloperRole&amp;quot;: false,
        &amp;quot;supportsReasoningEffort&amp;quot;: true,
        &amp;quot;supportsUsageInStreaming&amp;quot;: true,
        &amp;quot;maxTokensField&amp;quot;: &amp;quot;max_tokens&amp;quot;,
        &amp;quot;thinkingFormat&amp;quot;: &amp;quot;deepseek&amp;quot;,
        &amp;quot;requiresReasoningContentOnAssistantMessages&amp;quot;: true
      },
      &amp;quot;models&amp;quot;: [
        {
          &amp;quot;id&amp;quot;: &amp;quot;deepseek-v4-flash&amp;quot;,
          &amp;quot;name&amp;quot;: &amp;quot;DeepSeek V4 Flash (ds4.c local)&amp;quot;,
          &amp;quot;reasoning&amp;quot;: true,
          &amp;quot;contextWindow&amp;quot;: 100000,
          &amp;quot;maxTokens&amp;quot;: 384000,
          &amp;quot;cost&amp;quot;: { &amp;quot;input&amp;quot;: 0, &amp;quot;output&amp;quot;: 0, &amp;quot;cacheRead&amp;quot;: 0, &amp;quot;cacheWrite&amp;quot;: 0 }
        }
      ]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;cost: 0&lt;/code&gt; is not marketing; it is literally true. Each request costs electricity and SSD wear, not tokens.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;6. Where this will break (important pitfalls)&lt;/h2&gt;
&lt;p&gt;Real limitations you will run into, and workarounds for them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context window must be agreed upon everywhere.&lt;/strong&gt; You start the server with &lt;code&gt;--ctx 100000&lt;/code&gt;, set &lt;code&gt;limit.context: 100000&lt;/code&gt; in opencode, don’t go beyond that in Claude Code’s system prompt. If Claude Code’s init-prompt is ~25K, then 75K remains for the project — realistically enough for a medium codebase, but not for huge repositories.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disk KV-cache is “tied” to the exact prefix.&lt;/strong&gt; Any edit to the system prompt, to &lt;code&gt;CLAUDE.md&lt;/code&gt;, to the first messages — invalidates the checkpoint. This is not a bug, it’s by design: matching is done by SHA1 of token IDs. If you often edit &lt;code&gt;CLAUDE.md&lt;/code&gt;, expect cold starts. Solution — commit the system contract and don’t edit it in every session.&lt;/p&gt;
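The invalidation rule can be sketched in a few lines. This is an illustration only: ds4's real checkpoint matching is implemented in C, and `checkpointKey` below is a hypothetical helper that mimics the described idea (a SHA1 over the exact sequence of token IDs), not ds4's API.

```typescript
// Hypothetical sketch of prefix-keyed checkpoints: the cache key is a SHA1
// over the exact token-ID sequence, so any edit anywhere in the prefix
// (system prompt, CLAUDE.md, first messages) produces a different key.
import { createHash } from "node:crypto";

const checkpointKey = (tokenIds: number[]): string => {
  const buf = Buffer.alloc(tokenIds.length * 4);
  tokenIds.forEach((id, i) => buf.writeUInt32LE(id, i * 4));
  return createHash("sha1").update(buf).digest("hex");
};

const yesterday = [101, 7, 42, 9000]; // tokenized system prompt + CLAUDE.md
const today = [101, 7, 43, 9000]; // one token changed after editing CLAUDE.md

// A single differing token means the stored KV state cannot be reused:
// the session falls back to a cold start.
console.log(checkpointKey(yesterday) === checkpointKey(today)); // false
```

The practical consequence is exactly the advice above: freeze the system contract and the cache key stays stable across sessions.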
&lt;p&gt;&lt;strong&gt;MTP/speculative decoding doesn’t provide much speedup yet.&lt;/strong&gt; The README directly states: “currently provides at most a slight speedup”. Don’t count on doubling speed from MTP — the current implementation is correctness-gated and often triggers partial accept on complex prompts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;One live KV-cache in memory.&lt;/strong&gt; The server currently doesn’t batch independent requests. If two agents make requests simultaneously — the second waits for the first. This is a normal trade-off for a local single-user setup, but if you want parallel multi-tenancy on one Mac — ds4 isn’t there yet.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CPU mode crashes on fresh macOS.&lt;/strong&gt; This is about the debug path, not production (Metal-only is the main target), but if you are tempted to compare inference on CPU out of habit, don’t: you’ll get a kernel panic and have to reboot.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;7. What this means: vertical inference engines as a trend&lt;/h2&gt;
&lt;p&gt;The main thing is not ds4 itself, but the pattern that antirez formalized.&lt;/p&gt;
&lt;p&gt;Local inference currently looks like “universal runner &lt;code&gt;+&lt;/code&gt; thousands of models in GGUF &lt;code&gt;+&lt;/code&gt; wrappers of varying freshness”. It works, but moves at the speed of the least popular model: it’s easier to speed up Llama 3.1 in llama.cpp than to add efficient support for DeepSeek V4, because in the first case the layer structure matches twenty other models, while in the second it appears exactly once.&lt;/p&gt;
&lt;p&gt;Antirez shows the opposite path. &lt;strong&gt;One engine — one model — one scenario (coding agent)&lt;/strong&gt;. Next you need three things, and all three are in the product:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Inference engine with HTTP API.&lt;/li&gt;
&lt;li&gt;GGUF specially prepared for this engine and its assumptions.&lt;/li&gt;
&lt;li&gt;Tests and validation on the coupling with specific agent clients.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If this bet works (and the benchmarks say it does), the future of local inference is not “yet another abstraction on top of abstraction,” but &lt;strong&gt;“each important model gets its own ds4-like project”.&lt;/strong&gt; When V4.1 or V5 comes out, someone from the community makes a new engine, new GGUF, new tests, and in two weeks users already have a working local setup. Old engines retire along with old models.&lt;/p&gt;
&lt;p&gt;And second. In the README, antirez explicitly writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This software is developed with strong assistance from GPT 5.5 and with humans leading the ideas, testing, and debugging.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Two weeks from forking &lt;code&gt;llama.cpp&lt;/code&gt; to a production-ready narrow engine with a server API is not something you do without AI, and antirez says so directly. This shift, “one person + AI = infrastructure for an entire model in two weeks”, interests me more than the t/s numbers themselves.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;ds4&lt;/code&gt; from antirez is not “yet another local inference.” It’s a narrow bet: one engine, one model (DeepSeek V4 Flash), one hardware architecture (Apple Silicon with Metal), one scenario (coding agent). Thanks to asymmetric 2-bit quantization, a 284B model fits in a 128 GB MacBook; thanks to the disk KV-cache, it works with agents that send 25K-token system prompts; thanks to OpenAI/Anthropic compatibility, it connects to Claude Code, opencode, and Pi out of the box.&lt;/p&gt;
&lt;p&gt;If you have a Mac with 128 GB+ — this is a working local backend for serious commercial work with private code. If not — wait for DDR5 and unified memory on Linux/CUDA, or watch who next repeats this pattern for their “model + hardware” combination.&lt;/p&gt;
&lt;p&gt;In any case, it’s worth watching. I’m betting that in a year, half of serious local setups will be built this way.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/antirez/ds4&quot;&gt;github.com/antirez/ds4&lt;/a&gt; — README, benchmarks, configs&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/garrytan/status/2052996691586932783&quot;&gt;Garry Tan — post on X (May 9, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/bindureddy/status/2052982206344409242&quot;&gt;Bindu Reddy — post on X (May 9, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://eu.36kr.com/en/p/3800327282662656&quot;&gt;QbitAI / 36kr: Redis Father Steps In to Build Dedicated Inference Engine for DeepSeek V4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=48050751&quot;&gt;HN: DeepSeek 4 Flash local inference engine for Metal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://huggingface.co/antirez/deepseek-v4-gguf&quot;&gt;huggingface.co/antirez/deepseek-v4-gguf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>ai</category><category>local-inference</category><category>coding-agents</category><category>deepseek</category><category>apple-silicon</category><author>a@artka.dev (Артём)</author></item><item><title>JSON-LD @graph in Astro: from duplicated inline-blocks to a single citable-node</title><link>https://artka.dev/en/blog/json-ld-graph-astro/</link><guid isPermaLink="true">https://artka.dev/en/blog/json-ld-graph-astro/</guid><description>Step-by-step breakdown of migration from per-page Schema.org-blocks to a single @graph in BaseLayout: stable @id, entity references, articleBody-excerpt and FAQ.</description><pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Most Schema.org guides for blogs teach: put &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; with &lt;code&gt;BlogPosting&lt;/code&gt; on the post, &lt;code&gt;WebSite&lt;/code&gt; on the homepage, &lt;code&gt;Person&lt;/code&gt; on the about page. It works, but it loses out on citability. A crawler sees the &lt;code&gt;Person&lt;/code&gt; from &lt;code&gt;BlogPosting.author&lt;/code&gt; as “someone named X”, not as an entity that is also &lt;code&gt;founder of #organization&lt;/code&gt;, which is &lt;code&gt;publisher of #blog&lt;/code&gt;. This post is a step-by-step breakdown of how to replace per-page inline blocks with a single &lt;code&gt;@graph&lt;/code&gt; in &lt;code&gt;BaseLayout&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. Why change — citability vs SERP&lt;/h2&gt;
&lt;p&gt;Structured data for a developer-blogger is usually associated with one question: “will my post appear in Google with a rich snippet?”. Any valid &lt;code&gt;BlogPosting&lt;/code&gt; is enough for that task — it will pass the Rich Results Test, stars/breadcrumb will appear. And it often ends there: added &lt;code&gt;@type: BlogPosting&lt;/code&gt;, checked in the validator, forgot about it.&lt;/p&gt;
&lt;p&gt;In 2026, structured data has acquired a new, more demanding consumer — &lt;strong&gt;LLM crawler&lt;/strong&gt;, which collects content for retrieval-augmented generation and for citation. It doesn’t need “another rich snippet”, but a &lt;strong&gt;coherent entity graph&lt;/strong&gt;: so that when an author is mentioned in one post, it recognizes the same author in another, so that the organization-publisher is the same object across the entire site, so that the blog as an entity links back to the author.&lt;/p&gt;
&lt;p&gt;An LLM issuing a citation does roughly the following: extract a passage, check the surrounding entity markup, try to match the author with a known entity. If &lt;code&gt;Person.name = &amp;quot;Artem Kashuta&amp;quot;&lt;/code&gt; appears on a site in three different Schema.org blocks without a common &lt;code&gt;@id&lt;/code&gt;, the crawler must guess whether it’s one person or three. But if there’s one &lt;code&gt;Person#person&lt;/code&gt; with a stable URI, and all other nodes (&lt;code&gt;Organization.founder&lt;/code&gt;, &lt;code&gt;BlogPosting.author&lt;/code&gt;, &lt;code&gt;Blog.author&lt;/code&gt;) reference it through &lt;code&gt;{&amp;quot;@id&amp;quot;: &amp;quot;...&amp;quot;}&lt;/code&gt;, no guessing is needed: the graph is assembled by the author.&lt;/p&gt;
&lt;p&gt;This is a problem that keyword density doesn’t solve. This is &lt;strong&gt;entity disambiguation&lt;/strong&gt;, and it’s solved by &lt;strong&gt;graph topology&lt;/strong&gt;.&lt;/p&gt;
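The two shapes side by side, as a sketch (the URIs follow this site's convention; the node set is trimmed and the field values are illustrative):

```typescript
// Shape 1: inline copies. Two Person blocks share a name but no identifier,
// so a crawler has to guess whether they describe the same entity.
const inlineA = { "@type": "Person", name: "Artem Kashuta" };
const inlineB = { "@type": "Person", name: "Artem Kashuta", jobTitle: "Backend engineer" };

// Shape 2: one Person node with a stable @id; every other node points at it.
const personId = "https://artka.dev/#person";
const graph = {
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Person", "@id": personId, name: "Artem Kashuta" },
    { "@type": "Organization", "@id": "https://artka.dev/#brand", founder: { "@id": personId } },
    { "@type": "BlogPosting", author: { "@id": personId } },
  ],
};

// One definition plus two references, all resolving to the same URI.
const occurrences = JSON.stringify(graph).split(personId).length - 1;
console.log(occurrences); // 3
```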
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Per-page inline blocks&lt;/th&gt;
&lt;th&gt;Single &lt;code&gt;@graph&lt;/code&gt; with &lt;code&gt;@id&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google Rich Results&lt;/td&gt;
&lt;td&gt;works&lt;/td&gt;
&lt;td&gt;works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM entity match (Person)&lt;/td&gt;
&lt;td&gt;guess by name&lt;/td&gt;
&lt;td&gt;guaranteed via &lt;code&gt;@id&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data duplication&lt;/td&gt;
&lt;td&gt;3-5 copies of &lt;code&gt;Person&lt;/code&gt; per 14 posts&lt;/td&gt;
&lt;td&gt;one source per site&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost of author edit&lt;/td&gt;
&lt;td&gt;14 files&lt;/td&gt;
&lt;td&gt;1 file (&lt;code&gt;person.ts&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTML weight&lt;/td&gt;
&lt;td&gt;3+ scripts per page&lt;/td&gt;
&lt;td&gt;1 script&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For the SERP-only era, the first approach was enough. For the era of AI-overviews, citation graphs, and retrieval-augmented search — you need the second. Our blog’s spec states this directly: “move all entity definitions into &lt;code&gt;src/lib/seo/schema.ts&lt;/code&gt; returning a single &lt;code&gt;@graph&lt;/code&gt; JSON-LD block; pages contribute a &lt;code&gt;BlogPosting&lt;/code&gt;/&lt;code&gt;WebPage&lt;/code&gt; node referencing the global &lt;code&gt;Person#me&lt;/code&gt; and &lt;code&gt;Organization#brand&lt;/code&gt; by &lt;code&gt;@id&lt;/code&gt;” — see &lt;code&gt;docs/superpowers/specs/2026-05-02-llm-citable-blog-design.md&lt;/code&gt; § “Schema-graph design”.&lt;/p&gt;
&lt;h2&gt;2. Antipattern: per-page inline schema&lt;/h2&gt;
&lt;p&gt;What does a default Astro blog emit when built from a random dev.to tutorial? Usually this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In &lt;code&gt;BaseLayout.astro&lt;/code&gt; there’s an inline script with &lt;code&gt;WebSite&lt;/code&gt; and sometimes &lt;code&gt;Organization&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;PostLayout.astro&lt;/code&gt; there’s another inline script with &lt;code&gt;BlogPosting&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the author got carried away — a third script is added with &lt;code&gt;BreadcrumbList&lt;/code&gt;. Sometimes a fourth with &lt;code&gt;Person&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why does this happen? Because Astro layouts nest hierarchically, and each level conveniently “adds” its own portion of data through its own &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt;. This works locally, but doesn’t scale. In our repository before Plan 1, it was exactly this: &lt;code&gt;BaseLayout&lt;/code&gt; emitted one JSON-LD block, &lt;code&gt;PostLayout&lt;/code&gt; added two more on top:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Pre-Plan 1 (commit 5ed281c~1):
$ git show 5ed281c~1:src/layouts/BaseLayout.astro | grep -c application/ld+json
1
$ git show 5ed281c~1:src/layouts/PostLayout.astro | grep -c application/ld+json
2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is, the post page contained &lt;strong&gt;three&lt;/strong&gt; &lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt; blocks. Each had its own &lt;code&gt;Person&lt;/code&gt; (some complete, some truncated), with no common &lt;code&gt;@id&lt;/code&gt; and no cross-references. A crawler landing on the post saw three unrelated entity clouds.&lt;/p&gt;
&lt;p&gt;The main problems with the antipattern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Duplication of &lt;code&gt;Person&lt;/code&gt;.&lt;/strong&gt; The same author is described 3-5 times. If the author changes &lt;code&gt;jobTitle&lt;/code&gt; or adds &lt;code&gt;sameAs&lt;/code&gt;, every copy has to be edited. Miss one, and the crawler sees a conflict: a Person with this name suddenly has a different jobTitle. That is a direct signal-to-noise loss.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Broken graph.&lt;/strong&gt; &lt;code&gt;BlogPosting.publisher&lt;/code&gt; — this is an inline object &lt;code&gt;{ &amp;quot;@type&amp;quot;: &amp;quot;Organization&amp;quot;, &amp;quot;name&amp;quot;: &amp;quot;...&amp;quot; }&lt;/code&gt;. Somewhere else on the site there’s an &lt;code&gt;Organization&lt;/code&gt; with a &lt;code&gt;founder&lt;/code&gt; field. Without common &lt;code&gt;@id&lt;/code&gt;s, the validator doesn’t know if it’s one publisher or two.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTML weight.&lt;/strong&gt; Three scripts instead of one means extra markup per block plus payload inflation, especially when identical fields repeat (e.g. the author description appearing four times on one page).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency.&lt;/strong&gt; If the author edits &lt;code&gt;Person.description&lt;/code&gt; in the frontmatter of &lt;code&gt;about.md&lt;/code&gt;, but in the &lt;code&gt;BlogPosting&lt;/code&gt; builder it’s hardcoded as a literal — desynchronization is inevitable.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;3. Target architecture — &lt;code&gt;@graph&lt;/code&gt; with global &lt;code&gt;@id&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Target model: &lt;strong&gt;one script per page&lt;/strong&gt;, inside — &lt;code&gt;@graph&lt;/code&gt; array. Global nodes (&lt;code&gt;Person&lt;/code&gt;, &lt;code&gt;Organization&lt;/code&gt;, &lt;code&gt;WebSite&lt;/code&gt;) are described once and identified by stable URIs. Page-level nodes (&lt;code&gt;BlogPosting&lt;/code&gt;, &lt;code&gt;WebPage&lt;/code&gt;, &lt;code&gt;CollectionPage&lt;/code&gt;, &lt;code&gt;CreativeWork&lt;/code&gt;) are added by &lt;code&gt;BaseLayout&lt;/code&gt; and &lt;strong&gt;reference globals through &lt;code&gt;@id&lt;/code&gt;&lt;/strong&gt;, without duplicating their data.&lt;/p&gt;
&lt;p&gt;Topology:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
  Person[&amp;quot;Person#person&amp;lt;br/&amp;gt;(global)&amp;quot;]
  Org[&amp;quot;Organization#brand&amp;lt;br/&amp;gt;(global)&amp;quot;]
  Site[&amp;quot;WebSite#site&amp;lt;br/&amp;gt;(global)&amp;quot;]
  Post[&amp;quot;BlogPosting#blogposting&amp;lt;br/&amp;gt;(page-level)&amp;quot;]
  WebPage[&amp;quot;WebPage#webpage&amp;lt;br/&amp;gt;(page-level)&amp;quot;]

  Post -- author --&amp;gt; Person
  Post -- publisher --&amp;gt; Org
  Post -- isPartOf --&amp;gt; Site
  WebPage -- about --&amp;gt; Person
  WebPage -- isPartOf --&amp;gt; Site
  Org -- founder --&amp;gt; Person
  Site -- publisher --&amp;gt; Org
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What’s important in this picture:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All arrows are &lt;code&gt;{&amp;quot;@id&amp;quot;: &amp;quot;...&amp;quot;}&lt;/code&gt; references.&lt;/strong&gt; No inline copies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Person#person&lt;/code&gt; is the root node of the graph.&lt;/strong&gt; All entity pages (&lt;code&gt;/about&lt;/code&gt;, &lt;code&gt;/now&lt;/code&gt;, &lt;code&gt;/uses&lt;/code&gt;) do &lt;code&gt;WebPage.about → Person&lt;/code&gt;. All posts — &lt;code&gt;BlogPosting.author → Person&lt;/code&gt;. Change &lt;code&gt;Person&lt;/code&gt;, and everything changes synchronously.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page-level nodes are added, not replacing globals.&lt;/strong&gt; Each page brings 1-2 new nodes; &lt;code&gt;Person&lt;/code&gt;/&lt;code&gt;Organization&lt;/code&gt;/&lt;code&gt;WebSite&lt;/code&gt; are always present.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stable &lt;code&gt;@id&lt;/code&gt; — this is not the page URL, it’s a URI with a fragment, for example &lt;code&gt;https://artka.dev/#person&lt;/code&gt;, &lt;code&gt;https://artka.dev/#brand&lt;/code&gt;. This is the convention in JSON-LD: a fragment-id means “this resource is described on any page, but identified by a single URI”.&lt;/p&gt;
&lt;h2&gt;4. Implementation in Astro 5&lt;/h2&gt;
&lt;p&gt;In Astro 5, the SSG/SSR boundary runs exactly through &lt;code&gt;BaseLayout&lt;/code&gt;: at build time, props are computed, HTML is rendered, inside it — static &lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt;. No client-side, no rehydration flicker. The perfect moment to assemble &lt;code&gt;@graph&lt;/code&gt; functionally.&lt;/p&gt;
&lt;h3&gt;4.1. &lt;code&gt;graphIds&lt;/code&gt; — URI table&lt;/h3&gt;
&lt;p&gt;One file that lists all stable identifiers:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/nodes-global.ts
const SITE = &amp;quot;https://artka.dev&amp;quot;;

export const graphIds = {
  person: `${SITE}/#person`,
  organization: `${SITE}/#brand`,
  website: `${SITE}/#website`,
  blogRu: `${SITE}/#blog-ru`,
  blogEn: `${SITE}/#blog-en`,
} as const;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every builder that references a global entity imports &lt;code&gt;graphIds&lt;/code&gt; and uses &lt;code&gt;{ &amp;quot;@id&amp;quot;: graphIds.person }&lt;/code&gt;. No inline literals, no typos in URIs.&lt;/p&gt;
&lt;h3&gt;4.2. Builders — pure functions, no classes&lt;/h3&gt;
&lt;p&gt;In accordance with the project rule “no classes in application code”, each node is a pure function returning &lt;code&gt;Record&amp;lt;string, unknown&amp;gt;&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/nodes-global.ts (fragment)
export const buildPersonNode = () =&amp;gt; {
  const merged = Array.from(new Set&amp;lt;string&amp;gt;([...person.knowsAbout, ...person.expertiseAreas]));
  return {
    &amp;quot;@type&amp;quot;: &amp;quot;Person&amp;quot;,
    &amp;quot;@id&amp;quot;: graphIds.person,
    name: person.name,
    url: person.url,
    image: person.image,
    jobTitle: person.jobTitle,
    description: person.description,
    knowsAbout: merged,
    sameAs: [...person.sameAs],
    email: person.email,
    subjectOf: person.notableWork.map((w) =&amp;gt; ({
      &amp;quot;@type&amp;quot;: &amp;quot;CreativeWork&amp;quot;,
      name: w.title,
      url: w.url,
      description: w.description,
    })),
  };
};

export const buildOrganizationNode = () =&amp;gt; ({
  &amp;quot;@type&amp;quot;: &amp;quot;Organization&amp;quot;,
  &amp;quot;@id&amp;quot;: graphIds.organization,
  name: &amp;quot;artka.dev&amp;quot;,
  url: SITE,
  logo: { &amp;quot;@type&amp;quot;: &amp;quot;ImageObject&amp;quot;, url: `${SITE}/favicon.svg` },
  founder: { &amp;quot;@id&amp;quot;: graphIds.person },
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;person&lt;/code&gt; is an import from &lt;code&gt;src/lib/seo/person.ts&lt;/code&gt;, the single source of truth about the author. The builder collects &lt;code&gt;knowsAbout&lt;/code&gt; and &lt;code&gt;expertiseAreas&lt;/code&gt; into a &lt;code&gt;Set&lt;/code&gt; to avoid duplicating keys. &lt;code&gt;Organization.founder&lt;/code&gt; — an &lt;code&gt;@id&lt;/code&gt; reference, not an inline copy of &lt;code&gt;Person&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;4.3. Orchestrator — &lt;code&gt;buildGraph&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;A function that glues global and page-level nodes into a single &lt;code&gt;@graph&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/schema.ts
import {
  buildPersonNode,
  buildOrganizationNode,
  buildWebSiteNode,
  type Locale,
} from &amp;quot;./nodes-global&amp;quot;;

export type GraphNode = Record&amp;lt;string, unknown&amp;gt; &amp;amp; { &amp;quot;@type&amp;quot;: string };

export interface GraphInput {
  readonly locale: Locale;
  readonly extraNodes: ReadonlyArray&amp;lt;GraphNode | null&amp;gt;;
}

export interface JsonLdGraph {
  readonly &amp;quot;@context&amp;quot;: &amp;quot;https://schema.org&amp;quot;;
  readonly &amp;quot;@graph&amp;quot;: ReadonlyArray&amp;lt;GraphNode&amp;gt;;
}

export const buildGraph = (input: GraphInput): JsonLdGraph =&amp;gt; {
  const globals: GraphNode[] = [
    buildPersonNode(),
    buildOrganizationNode(),
    buildWebSiteNode(input.locale),
  ];
  const extras = input.extraNodes.filter((n): n is GraphNode =&amp;gt; n !== null);
  return {
    &amp;quot;@context&amp;quot;: &amp;quot;https://schema.org&amp;quot;,
    &amp;quot;@graph&amp;quot;: [...globals, ...extras],
  };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The API is minimal: input — locale (to select &lt;code&gt;inLanguage&lt;/code&gt; for &lt;code&gt;WebSite&lt;/code&gt;) and a list of additional nodes (&lt;code&gt;extraNodes&lt;/code&gt;). Output — ready &lt;code&gt;JsonLdGraph&lt;/code&gt;. &lt;code&gt;null&lt;/code&gt; nodes are filtered — this is convenient for optional nodes like &lt;code&gt;FAQPage&lt;/code&gt;, whose builder returns &lt;code&gt;null&lt;/code&gt; on an empty question array.&lt;/p&gt;
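Put together, a page render reduces to one call. The snippet below is a self-contained sketch: the global builders are stubbed to a minimal shape so it runs on its own, while `buildGraph` itself mirrors the orchestrator above.

```typescript
type GraphNode = Record<string, unknown> & { "@type": string };

// Stubs standing in for nodes-global.ts; the real builders carry full fields.
const buildPersonNode = (): GraphNode => ({ "@type": "Person", "@id": "https://artka.dev/#person" });
const buildOrganizationNode = (): GraphNode => ({ "@type": "Organization", "@id": "https://artka.dev/#brand" });
const buildWebSiteNode = (locale: string): GraphNode => ({ "@type": "WebSite", "@id": "https://artka.dev/#website", inLanguage: locale });

const buildGraph = (input: { locale: string; extraNodes: ReadonlyArray<GraphNode | null> }) => ({
  "@context": "https://schema.org",
  "@graph": [
    buildPersonNode(),
    buildOrganizationNode(),
    buildWebSiteNode(input.locale),
    ...input.extraNodes.filter((n): n is GraphNode => n !== null),
  ],
});

// A post page: one BlogPosting node, plus a FAQ builder that returned null
// because frontmatter.faq was empty.
const graph = buildGraph({
  locale: "en",
  extraNodes: [{ "@type": "BlogPosting", "@id": "https://artka.dev/en/blog/example/#blogposting" }, null],
});
console.log(graph["@graph"].length); // 4: three globals + BlogPosting
```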
&lt;h3&gt;4.4. &lt;code&gt;BaseLayout&lt;/code&gt; — the only emission point&lt;/h3&gt;
&lt;p&gt;The entire site goes through &lt;code&gt;BaseLayout&lt;/code&gt;, and it — and only it — emits JSON-LD:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-astro&quot;&gt;---
// src/layouts/BaseLayout.astro
import { buildGraph, safeJsonLd, type GraphNode } from &amp;quot;~/lib/seo/schema&amp;quot;;

interface Props {
  title: string;
  description?: string;
  // ...
  /** Additional JSON-LD nodes to merge into the page @graph. */
  extraSchemaNodes?: ReadonlyArray&amp;lt;GraphNode | null&amp;gt;;
}

const { extraSchemaNodes = [] } = Astro.props;
const locale = getLocaleFromPath(Astro.url.pathname);
---

&amp;lt;head&amp;gt;
  &amp;lt;script
    is:inline
    type=&amp;quot;application/ld+json&amp;quot;
    set:html={safeJsonLd(buildGraph({ locale, extraNodes: extraSchemaNodes }))}
  /&amp;gt;
&amp;lt;/head&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three key details:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;is:inline&lt;/code&gt;&lt;/strong&gt; — Astro doesn’t try to process the content as a JS module.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;set:html&lt;/code&gt;&lt;/strong&gt; — we insert an already-ready string, not letting the framework trim whitespace or escape additionally.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;safeJsonLd&lt;/code&gt;&lt;/strong&gt; — a tiny helper that escapes &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;amp;&lt;/code&gt; so that inside JSON there’s no sequence that the HTML parser would take as the end of &lt;code&gt;&amp;lt;/script&amp;gt;&lt;/code&gt;. Without it, malicious (or just unlucky) text in frontmatter could break the page.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/json-ld.ts
export const safeJsonLd = (data: unknown): string =&amp;gt;
  JSON.stringify(data).replace(/&amp;lt;/g, &amp;quot;\\u003c&amp;quot;).replace(/&amp;gt;/g, &amp;quot;\\u003e&amp;quot;).replace(/&amp;amp;/g, &amp;quot;\\u0026&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;
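To see what the helper actually prevents, here it is again with a hostile input (reproduced so the snippet runs standalone; the payload string is made up):

```typescript
// safeJsonLd as defined above: escape the three characters that could let
// the HTML parser terminate the surrounding script block early.
const safeJsonLd = (data: unknown): string =>
  JSON.stringify(data).replace(/</g, "\\u003c").replace(/>/g, "\\u003e").replace(/&/g, "\\u0026");

// A frontmatter value that would otherwise close the JSON-LD script tag.
const payload = { description: 'note on </script><script>alert(1)</script> injection' };
const out = safeJsonLd(payload);

console.log(out.includes("</script>")); // false: nothing can terminate the tag early
console.log(JSON.parse(out).description === payload.description); // true: round-trips losslessly
```

The escapes stay inside the JSON string (`\u003c` is a legal JSON escape for `<`), so consumers parsing the block see the original text unchanged.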
&lt;h3&gt;4.5. Page-level contract&lt;/h3&gt;
&lt;p&gt;Each layout/page adds its own nodes via &lt;code&gt;extraSchemaNodes&lt;/code&gt;. For example, &lt;code&gt;PostLayout&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;const excerpt = extractArticleBody(post.body ?? &amp;quot;&amp;quot;, 800);

const blogPostingNode = buildBlogPostingNode({
  locale,
  canonical,
  title,
  description,
  pubDate,
  updatedDate: updatedDate ?? null,
  image: absoluteCover,
  keywords: tags,
  articleBody: excerpt.text,
  wordCount: excerpt.fullWordCount,
});

const breadcrumbNode = buildBreadcrumbListNode({
  locale,
  blogIndexLabel: t(locale, &amp;quot;blog.title&amp;quot;),
  title,
});

const faqNode = buildFaqPageNode({ canonical, items: faq ?? [] });
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-astro&quot;&gt;&amp;lt;BaseLayout title={title} extraSchemaNodes={[blogPostingNode, breadcrumbNode, faqNode]}&amp;gt;
  &amp;lt;slot /&amp;gt;
&amp;lt;/BaseLayout&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;/blog&lt;/code&gt;, &lt;code&gt;/projects/&amp;lt;slug&amp;gt;&lt;/code&gt;, &lt;code&gt;/tags/&amp;lt;tag&amp;gt;&lt;/code&gt;, &lt;code&gt;/about&lt;/code&gt; — all use the same contract, differing only in specific builders. One dispatch, zero duplication.&lt;/p&gt;
&lt;h2&gt;5. &lt;code&gt;articleBody&lt;/code&gt; — why excerpt, not full body&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;articleBody&lt;/code&gt; field in &lt;code&gt;BlogPosting&lt;/code&gt; is the most valuable part for an LLM crawler: it’s an extractable chunk of text that can be cited. And the most dangerous for weight: if you put the entire post in JSON-LD, the HTML page will balloon 2-3 times. The spec formulates the compromise directly: “emit first 800 words of plain-text body … add &lt;code&gt;wordCount&lt;/code&gt; covering the &lt;em&gt;full&lt;/em&gt; body”.&lt;/p&gt;
&lt;p&gt;The excerpt is extracted via mdast: we parse markdown, remove code blocks, mermaid blocks and inline HTML, concatenate the remaining text, cut at 800 words:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/article-body.ts (fragment)
export const extractArticleBody = (markdown: string, maxWords: number) =&amp;gt; {
  const tree = unified().use(remarkParse).parse(markdown) as Root;

  const isStrippable = (node: Node): boolean =&amp;gt;
    node.type === &amp;quot;code&amp;quot; || node.type === &amp;quot;inlineCode&amp;quot; || node.type === &amp;quot;html&amp;quot;;

  visit(tree, (node, index, parent) =&amp;gt; {
    if (parent &amp;amp;&amp;amp; typeof index === &amp;quot;number&amp;quot; &amp;amp;&amp;amp; isStrippable(node)) {
      (parent as { children: Node[] }).children.splice(index, 1);
      return [SKIP, index];
    }
    return undefined;
  });

  const flat = mdastToString(tree, { includeImageAlt: false }).replace(/\s+/g, &amp;quot; &amp;quot;).trim();
  const words = flat.length &amp;gt; 0 ? flat.split(/\s+/) : [];
  if (words.length &amp;lt;= maxWords) return { text: flat, fullWordCount: words.length };
  return { text: words.slice(0, maxWords).join(&amp;quot; &amp;quot;) + &amp;quot;…&amp;quot;, fullWordCount: words.length };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why exactly 800 words:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Length&lt;/th&gt;
&lt;th&gt;Pro&lt;/th&gt;
&lt;th&gt;Con&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;50 words&lt;/td&gt;
&lt;td&gt;tiny HTML overhead&lt;/td&gt;
&lt;td&gt;one paragraph — too little for LLM citation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;800 words&lt;/td&gt;
&lt;td&gt;substantial chunk, ~3-5 KB&lt;/td&gt;
&lt;td&gt;+3-5 KB to payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full body&lt;/td&gt;
&lt;td&gt;maximum context&lt;/td&gt;
&lt;td&gt;double HTML, real performance hit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Why via mdast, not regex: posts contain &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt;, MDX components like &lt;code&gt;&amp;lt;Faq&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;Tldr&amp;gt;&lt;/code&gt;. A regex over &lt;code&gt;```&lt;/code&gt; fences breaks on indented code blocks and nested fences; mdast is the only reliable way.&lt;/p&gt;
&lt;p&gt;We keep &lt;code&gt;wordCount&lt;/code&gt; on the full body, not the excerpt — this gives an honest signal to the validator and LLM about the real volume of content.&lt;/p&gt;
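The trim semantics in isolation, as a sketch: `trimToWords` is a stand-in for the tail of `extractArticleBody` above, with the remark/mdast stripping stage omitted.

```typescript
// Cap the text at maxWords for articleBody, but report the full word count:
// the excerpt is what gets embedded, wordCount describes the whole post.
const trimToWords = (flat: string, maxWords: number) => {
  const words = flat.length > 0 ? flat.split(/\s+/) : [];
  if (words.length <= maxWords) return { text: flat, fullWordCount: words.length };
  return { text: words.slice(0, maxWords).join(" ") + "…", fullWordCount: words.length };
};

const body = Array.from({ length: 1000 }, (_, i) => `w${i}`).join(" ");
const excerpt = trimToWords(body, 800);

console.log(excerpt.text.split(/\s+/).length); // 800: articleBody is capped
console.log(excerpt.fullWordCount); // 1000: wordCount covers the full body
```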
&lt;h2&gt;6. &lt;code&gt;FAQPage&lt;/code&gt; as a side-effect of MDX component&lt;/h2&gt;
&lt;p&gt;One of Plan 1’s design goals — &lt;strong&gt;remove cognitive load on structured data from the author&lt;/strong&gt;. The author shouldn’t remember that &lt;code&gt;FAQPage&lt;/code&gt; has &lt;code&gt;mainEntity&lt;/code&gt;, that inside &lt;code&gt;Question&lt;/code&gt; you need &lt;code&gt;acceptedAnswer&lt;/code&gt;, that answer text is escaped. The author should fill in the frontmatter and forget.&lt;/p&gt;
&lt;p&gt;Solution: &lt;code&gt;frontmatter.faq&lt;/code&gt; — the single source. PostLayout reads the array:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;const faqNode = buildFaqPageNode({ canonical, items: faq ?? [] });
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;buildFaqPageNode&lt;/code&gt; either returns a ready &lt;code&gt;FAQPage&lt;/code&gt; node or &lt;code&gt;null&lt;/code&gt; (filtered in &lt;code&gt;buildGraph&lt;/code&gt;). In parallel, the same array is passed to the &lt;code&gt;&amp;lt;Faq&amp;gt;&lt;/code&gt; component, which renders visible &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; blocks with the same text. One source — two consumers: visual layer and structured layer. Desynchronization is impossible.&lt;/p&gt;
&lt;p&gt;The builder is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;export const buildFaqPageNode = (input: FaqPageInput) =&amp;gt; {
  if (input.items.length === 0) return null;
  return {
    &amp;quot;@type&amp;quot;: &amp;quot;FAQPage&amp;quot;,
    &amp;quot;@id&amp;quot;: `${input.canonical}#faq`,
    mainEntity: input.items.map((it) =&amp;gt; ({
      &amp;quot;@type&amp;quot;: &amp;quot;Question&amp;quot;,
      name: it.question,
      acceptedAnswer: { &amp;quot;@type&amp;quot;: &amp;quot;Answer&amp;quot;, text: it.answer },
    })),
  };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Frontmatter that the author writes:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;faq:
  - question: &amp;quot;Чем агент отличается от чат-бота?&amp;quot;
    answer: &amp;quot;Чат-бот — это model.complete(messages): принимает текст…&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And that’s it. The rest is automation.&lt;/p&gt;
&lt;h2&gt;7. Measurements before/after&lt;/h2&gt;
&lt;p&gt;After Plan 1, the page &lt;code&gt;/blog/01-introduction/&lt;/code&gt; has exactly &lt;strong&gt;one&lt;/strong&gt; &lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt; block. Real measured fact:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ grep -c &amp;quot;application/ld+json&amp;quot; dist/client/blog/01-introduction/index.html
1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before Plan 1 (commit &lt;code&gt;5ed281c~1&lt;/code&gt;) there were two sources of inline scripts:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ git show 5ed281c~1:src/layouts/BaseLayout.astro | grep -c application/ld+json  # 1
$ git show 5ed281c~1:src/layouts/PostLayout.astro | grep -c application/ld+json  # 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is, the post page contained a total of &lt;strong&gt;3 blocks&lt;/strong&gt;. It became &lt;strong&gt;1&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Pre-Plan 1&lt;/th&gt;
&lt;th&gt;Post-Plan 1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt; blocks per post page&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overall container&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@graph&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stable &lt;code&gt;Person@id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://artka.dev/#person&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-references via &lt;code&gt;@id&lt;/code&gt; between nodes&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;8+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single source of truth about author&lt;/td&gt;
&lt;td&gt;scattered across layouts&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/lib/seo/person.ts&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The actual JSON-LD of the page &lt;code&gt;/blog/01-introduction/&lt;/code&gt;, extracted from &lt;code&gt;dist/client/blog/01-introduction/index.html&lt;/code&gt;, looks like this (fragment, &lt;code&gt;articleBody&lt;/code&gt; truncated to ellipsis, FAQ node shortened):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;@context&amp;quot;: &amp;quot;https://schema.org&amp;quot;,
  &amp;quot;@graph&amp;quot;: [
    {
      &amp;quot;@type&amp;quot;: &amp;quot;Person&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot;,
      &amp;quot;name&amp;quot;: &amp;quot;Артём Кашута&amp;quot;,
      &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev/about&amp;quot;,
      &amp;quot;jobTitle&amp;quot;: &amp;quot;Software engineer · backend &amp;amp; AI agent engineering&amp;quot;,
      &amp;quot;knowsAbout&amp;quot;: [&amp;quot;Claude Code&amp;quot;, &amp;quot;AI agent engineering&amp;quot;, &amp;quot;Node.js&amp;quot;, &amp;quot;TypeScript&amp;quot;, &amp;quot;Astro&amp;quot;, &amp;quot;…&amp;quot;],
      &amp;quot;email&amp;quot;: &amp;quot;a@artka.dev&amp;quot;,
      &amp;quot;subjectOf&amp;quot;: [
        {
          &amp;quot;@type&amp;quot;: &amp;quot;CreativeWork&amp;quot;,
          &amp;quot;name&amp;quot;: &amp;quot;Claude Code Guide (RU, 14 частей)&amp;quot;,
          &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev/blog&amp;quot;
        }
      ]
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;Organization&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#brand&amp;quot;,
      &amp;quot;name&amp;quot;: &amp;quot;artka.dev&amp;quot;,
      &amp;quot;logo&amp;quot;: { &amp;quot;@type&amp;quot;: &amp;quot;ImageObject&amp;quot;, &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev/favicon.svg&amp;quot; },
      &amp;quot;founder&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot; }
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;WebSite&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#website&amp;quot;,
      &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev&amp;quot;,
      &amp;quot;inLanguage&amp;quot;: &amp;quot;ru-RU&amp;quot;,
      &amp;quot;publisher&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#brand&amp;quot; },
      &amp;quot;potentialAction&amp;quot;: {
        &amp;quot;@type&amp;quot;: &amp;quot;SearchAction&amp;quot;,
        &amp;quot;target&amp;quot;: &amp;quot;https://artka.dev/search?q={search_term_string}&amp;quot;,
        &amp;quot;query-input&amp;quot;: &amp;quot;required name=search_term_string&amp;quot;
      }
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;BlogPosting&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/blog/01-introduction/#blogposting&amp;quot;,
      &amp;quot;headline&amp;quot;: &amp;quot;01. Что такое Claude Code: harness, agent loop и ваше место в нём&amp;quot;,
      &amp;quot;datePublished&amp;quot;: &amp;quot;2026-04-23T00:00:00.000Z&amp;quot;,
      &amp;quot;author&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot; },
      &amp;quot;publisher&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#brand&amp;quot; },
      &amp;quot;mainEntityOfPage&amp;quot;: &amp;quot;https://artka.dev/blog/01-introduction/&amp;quot;,
      &amp;quot;inLanguage&amp;quot;: &amp;quot;ru-RU&amp;quot;,
      &amp;quot;isPartOf&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#blog-ru&amp;quot; },
      &amp;quot;articleBody&amp;quot;: &amp;quot;Перед тем как разбирать skills и subagents, надо договориться о терминах…&amp;quot;,
      &amp;quot;wordCount&amp;quot;: 574
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;BreadcrumbList&amp;quot;,
      &amp;quot;itemListElement&amp;quot;: [
        { &amp;quot;@type&amp;quot;: &amp;quot;ListItem&amp;quot;, &amp;quot;position&amp;quot;: 1, &amp;quot;name&amp;quot;: &amp;quot;Главная&amp;quot;, &amp;quot;item&amp;quot;: &amp;quot;https://artka.dev/&amp;quot; },
        { &amp;quot;@type&amp;quot;: &amp;quot;ListItem&amp;quot;, &amp;quot;position&amp;quot;: 2, &amp;quot;name&amp;quot;: &amp;quot;Статьи&amp;quot;, &amp;quot;item&amp;quot;: &amp;quot;https://artka.dev/blog&amp;quot; },
        { &amp;quot;@type&amp;quot;: &amp;quot;ListItem&amp;quot;, &amp;quot;position&amp;quot;: 3, &amp;quot;name&amp;quot;: &amp;quot;01. Что такое Claude Code…&amp;quot; }
      ]
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;FAQPage&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/blog/01-introduction/#faq&amp;quot;,
      &amp;quot;mainEntity&amp;quot;: [
        {
          &amp;quot;@type&amp;quot;: &amp;quot;Question&amp;quot;,
          &amp;quot;name&amp;quot;: &amp;quot;Чем агент отличается от чат-бота?&amp;quot;,
          &amp;quot;acceptedAnswer&amp;quot;: { &amp;quot;@type&amp;quot;: &amp;quot;Answer&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;…&amp;quot; }
        }
      ]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What you can see with your eyes and what the validator will record:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;One &lt;code&gt;Person&lt;/code&gt;, everything references it.&lt;/strong&gt; &lt;code&gt;Organization.founder&lt;/code&gt;, &lt;code&gt;BlogPosting.author&lt;/code&gt; — both &lt;code&gt;{ &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot; }&lt;/code&gt;. No guessing about identity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Organization&lt;/code&gt; — public publisher.&lt;/strong&gt; &lt;code&gt;WebSite.publisher&lt;/code&gt; references the same &lt;code&gt;Organization&lt;/code&gt;. &lt;code&gt;BlogPosting.publisher&lt;/code&gt; — the same. The graph is connected.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;isPartOf&lt;/code&gt; chain for the blog.&lt;/strong&gt; &lt;code&gt;BlogPosting.isPartOf → Blog#blog-ru → publisher → Organization&lt;/code&gt;. The crawler sees nesting and ownership.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;articleBody&lt;/code&gt; excerpt — substantial.&lt;/strong&gt; All 574 words of this post fit into one field (under the 800-word cap), and &lt;code&gt;wordCount&lt;/code&gt; reflects the full volume. The LLM gets text to cite; the HTML doesn’t balloon.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FAQ — together with everything, not separately.&lt;/strong&gt; Not a separate script block, but a node of the same &lt;code&gt;@graph&lt;/code&gt;. Fewer blocks — fewer traps for the parser.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;a href=&quot;https://validator.schema.org&quot;&gt;Schema.org validator&lt;/a&gt; and Google Rich Results Test accept this &lt;code&gt;@graph&lt;/code&gt; without remarks &lt;em&gt;(screenshots — owner to fill)&lt;/em&gt;. The key point: the JSON serializes cleanly, with no &lt;code&gt;[object Object]&lt;/code&gt;, no unescaped quotes, and no broken dates, thanks to the &lt;code&gt;safeJsonLd&lt;/code&gt; wrapper.&lt;/p&gt;
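&lt;p&gt;The wrapper itself is outside the scope of this post, but the class of bugs it guards against is worth a sketch. A minimal hypothetical version (assumption: the real &lt;code&gt;safeJsonLd&lt;/code&gt; may differ in details) serializes once, then escapes the characters that could terminate the &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag early:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// Hypothetical safeJsonLd-style helper: the unicode escapes stay valid JSON,
// so the graph survives a round-trip, but no literal &amp;lt;/script&amp;gt; can appear.
const safeJsonLd = (graph: unknown): string =&amp;gt;
  JSON.stringify(graph)
    .replace(/&amp;lt;/g, &amp;quot;\\u003c&amp;quot;)
    .replace(/&amp;gt;/g, &amp;quot;\\u003e&amp;quot;)
    .replace(/&amp;amp;/g, &amp;quot;\\u0026&amp;quot;);

// A quote-heavy FAQ answer cannot break out of the script tag:
safeJsonLd({ text: &amp;quot;a &amp;lt;/script&amp;gt; b&amp;quot; });
// the output contains \u003c/script\u003e, never a literal &amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;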
&lt;hr&gt;
&lt;h3&gt;What’s next&lt;/h3&gt;
&lt;p&gt;What’s described above — &lt;strong&gt;Plan 1&lt;/strong&gt; in our repo. Next, we expand the base for new entity types (&lt;code&gt;/projects/&amp;lt;slug&amp;gt;&lt;/code&gt; via &lt;code&gt;CreativeWork&lt;/code&gt;, &lt;code&gt;/uses&lt;/code&gt; via &lt;code&gt;WebPage.about&lt;/code&gt;), and for the retrieval layer via &lt;code&gt;llms.txt&lt;/code&gt;. But the foundation — &lt;code&gt;buildGraph&lt;/code&gt; + stable &lt;code&gt;@id&lt;/code&gt; — must be laid first.&lt;/p&gt;
&lt;p&gt;If you see 2-3 inline JSON-LD scripts on a post page — this is the place to start migration. One file &lt;code&gt;schema.ts&lt;/code&gt;, one &lt;code&gt;extraSchemaNodes&lt;/code&gt; prop — and the site transforms from a collection of scattered entity clouds into a coherent citable node.&lt;/p&gt;
</content:encoded><category>seo</category><category>astro</category><category>schema-org</category><author>a@artka.dev (Артём)</author></item><item><title>robots.txt in the age of AI crawlers: GPTBot, ClaudeBot, PerplexityBot — reality 2026</title><link>https://artka.dev/en/blog/robots-txt-ai-crawlers-2026/</link><guid isPermaLink="true">https://artka.dev/en/blog/robots-txt-ai-crawlers-2026/</guid><description>In 2026, robots.txt is not &apos;forbid all bots&apos; or &apos;allow everything&apos;, but a policy for each of 9+ named agents. Real template, decision table, and pitfalls.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;In 2026, robots.txt is neither “forbid all bots” nor “open everything.” It’s a policy for each of 9+ named agents. Each decision is a special case: are you opening your content for model training, for on-demand citation, what do you want to see in Perplexity’s answer card. This post is a decision table, a ready-made template, and why &lt;code&gt;llms.txt&lt;/code&gt; is a separate artifact.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. Why rewrite robots.txt in 2026&lt;/h2&gt;
&lt;p&gt;The classic SEO approach to robots.txt is optimized for one task: let Googlebot in where it makes sense to index pages for SERP, and block service paths. In 2026, that use case accounts for a minority of crawler traffic.&lt;/p&gt;
&lt;p&gt;Most questions of “should I index this page?” are now asked not by Google, but by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Training crawlers&lt;/strong&gt; — download pages to replenish the corpus on which the next version of the model is trained (GPTBot, ClaudeBot, Google-Extended).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Answer/search crawlers&lt;/strong&gt; — index content for search built into the chat (OAI-SearchBot, PerplexityBot).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On-demand fetchers&lt;/strong&gt; — open one specific page because the user explicitly asked for it in the chat (ChatGPT-User, Perplexity-User, Claude-Web).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These three classes make three different decisions. One &lt;code&gt;User-agent: *&lt;/code&gt; block doesn’t convey the nuance. You might want “don’t train on my texts, but please cite in response to a question.” One wildcard won’t express that.&lt;/p&gt;
&lt;p&gt;Hence the requirement: explicit blocks for each named User-Agent with a conscious choice of policy. Not “opened everything,” not “closed everything,” but a matrix of “bot × intent.”&lt;/p&gt;
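&lt;p&gt;The “bot × intent” matrix translates directly into code. A sketch of a generator (the data structure and names are mine, and the policies here are illustrative; adapt them to your site):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;type Policy = &amp;quot;allow&amp;quot; | &amp;quot;deny&amp;quot;;

// One decision per intent class; each class carries its named bots.
const matrix: Array&amp;lt;{ intent: string; bots: string[]; policy: Policy }&amp;gt; = [
  { intent: &amp;quot;training&amp;quot;, bots: [&amp;quot;GPTBot&amp;quot;, &amp;quot;ClaudeBot&amp;quot;, &amp;quot;Google-Extended&amp;quot;], policy: &amp;quot;allow&amp;quot; },
  { intent: &amp;quot;answer&amp;quot;, bots: [&amp;quot;OAI-SearchBot&amp;quot;, &amp;quot;PerplexityBot&amp;quot;], policy: &amp;quot;allow&amp;quot; },
  { intent: &amp;quot;on-demand&amp;quot;, bots: [&amp;quot;ChatGPT-User&amp;quot;, &amp;quot;Claude-Web&amp;quot;, &amp;quot;Perplexity-User&amp;quot;, &amp;quot;anthropic-ai&amp;quot;], policy: &amp;quot;allow&amp;quot; },
];

const privatePaths = [&amp;quot;/admin/&amp;quot;, &amp;quot;/api/&amp;quot;, &amp;quot;/login&amp;quot;];

// Emit one named block per bot; deny collapses to a single Disallow: /.
const block = (bot: string, policy: Policy): string =&amp;gt;
  policy === &amp;quot;deny&amp;quot;
    ? `User-agent: ${bot}\nDisallow: /`
    : [`User-agent: ${bot}`, &amp;quot;Allow: /&amp;quot;, ...privatePaths.map((p) =&amp;gt; `Disallow: ${p}`)].join(&amp;quot;\n&amp;quot;);

const robotsTxt = matrix
  .flatMap(({ bots, policy }) =&amp;gt; bots.map((b) =&amp;gt; block(b, policy)))
  .join(&amp;quot;\n\n&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Changing the decision for one class is then a one-field edit, and the per-bot blocks regenerate consistently.&lt;/p&gt;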
&lt;hr&gt;
&lt;h2&gt;2. List of named AI-crawlers and their purpose&lt;/h2&gt;
&lt;p&gt;Nine agents worth naming in 2026, with their public documentation. User-Agent names are taken from vendors’ official pages.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User-Agent&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Documentation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GPTBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Training crawl&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://platform.openai.com/docs/gptbot&quot;&gt;platform.openai.com/docs/gptbot&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OAI-SearchBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Search index for ChatGPT&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://platform.openai.com/docs/bots&quot;&gt;platform.openai.com/docs/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ChatGPT-User&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;On-demand fetch from ChatGPT&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://platform.openai.com/docs/bots&quot;&gt;platform.openai.com/docs/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ClaudeBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Training crawl&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;docs.anthropic.com&lt;/a&gt; (&lt;a href=&quot;https://claudebot.anthropic.com&quot;&gt;claudebot.anthropic.com&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Claude-Web&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;On-demand fetch initiated by &lt;a href=&quot;https://claude.ai&quot;&gt;Claude.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;docs.anthropic.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;anthropic-ai&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Legacy/auxiliary Anthropic crawler&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;docs.anthropic.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PerplexityBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Perplexity&lt;/td&gt;
&lt;td&gt;Search/index crawl&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.perplexity.ai/guides/bots&quot;&gt;docs.perplexity.ai/guides/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Perplexity-User&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Perplexity&lt;/td&gt;
&lt;td&gt;On-demand fetch from a user query&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.perplexity.ai/guides/bots&quot;&gt;docs.perplexity.ai/guides/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Google-Extended&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Opt-in for Gemini training&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://developers.google.com/search/docs/crawling&quot;&gt;developers.google.com/search/docs/crawling&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;Names must match. &lt;code&gt;Claude-Bot&lt;/code&gt; is not a valid alias for &lt;code&gt;ClaudeBot&lt;/code&gt;; and although RFC 9309 makes User-Agent matching case-insensitive (so &lt;code&gt;claudebot&lt;/code&gt; should match), not every crawler implements the spec faithfully. Copy the exact spelling from the official documentation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Taxonomy:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart TB
  subgraph training[&amp;quot;Training (corpus → model)&amp;quot;]
    GPT[GPTBot]
    CLB[ClaudeBot]
    GEX[Google-Extended]
  end
  subgraph answer[&amp;quot;Answer/search (index for built-in search)&amp;quot;]
    OAI[OAI-SearchBot]
    PPB[PerplexityBot]
  end
  subgraph ondemand[&amp;quot;On-demand (user requested)&amp;quot;]
    CGU[ChatGPT-User]
    CWB[Claude-Web]
    PPU[Perplexity-User]
    AAI[anthropic-ai]
  end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three classes = three separate decisions. You don’t need to discuss “a robot in general” — you need to discuss “GPTBot on /blog/.”&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;3. Decisions for each bot&lt;/h2&gt;
&lt;p&gt;There is no universally correct answer here. Below is a framework for reasoning and my policy for the blog.&lt;/p&gt;
&lt;h3&gt;Training crawlers&lt;/h3&gt;
&lt;p&gt;For authors of individual blogs with long-form content, the arguments are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;For Allow:&lt;/strong&gt; your text will enter the corpus on which the next models are trained. If your goal is to increase distribution and presence of your expertise in LLM responses, this is the way.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For Disallow:&lt;/strong&gt; your content becomes an anonymous training signal without attribution. If you plan to monetize content (book, course) or are against use without consent, Disallow is the only signal you have at the robots.txt level.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For commercial sites where content is a product (online courses, paid newsletters, legal databases), Disallow is usually the default.&lt;/p&gt;
&lt;h3&gt;Answer/search crawlers&lt;/h3&gt;
&lt;p&gt;The intent is to show a link to your page in the answer card. This works both ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;For Allow:&lt;/strong&gt; traffic is possible (albeit through a citation with link-out). Your brand appears in the results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For Disallow:&lt;/strong&gt; you won’t get this traffic and at the same time your page won’t be cited as a source.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For most public blogs, the answer is Allow.&lt;/p&gt;
&lt;h3&gt;On-demand fetchers&lt;/h3&gt;
&lt;p&gt;The most “transparent” class: a user of your site (or someone who specifically wants to open your page through ChatGPT/Claude/Perplexity) has already explicitly pointed to it. Disallow here means “you can’t use our pages as a source in a chat session” — almost always overly strict for a public blog.&lt;/p&gt;
&lt;h3&gt;My policy for artka.dev&lt;/h3&gt;
&lt;p&gt;For this site:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All 9 bots — &lt;code&gt;Allow: /&lt;/code&gt; (open public blog, goal is distribution).&lt;/li&gt;
&lt;li&gt;All of them — &lt;code&gt;Disallow: /admin/&lt;/code&gt;, &lt;code&gt;/api/&lt;/code&gt;, &lt;code&gt;/login&lt;/code&gt; (private namespaces, see §5).&lt;/li&gt;
&lt;li&gt;No special restrictions on individual posts or tags.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a decision for a personal tech-blog with the goal of “increasing the reach of expertise.” For commercial content, I would choose differently.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;4. Ready-made robots.txt template&lt;/h2&gt;
&lt;p&gt;Here’s the real &lt;code&gt;public/robots.txt&lt;/code&gt; that goes into production on artka.dev. It’s also the starting point you can adapt.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;# robots.txt — last reviewed 2026-05-02
# Owner: dev@artka.dev. Policy: allow retrieval/answer crawlers; disallow private surfaces.

User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: OAI-SearchBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: Claude-Web
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: anthropic-ai
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: Perplexity-User
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: Google-Extended
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

Sitemap: https://artka.dev/sitemap-index.xml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A few notes on the structure:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Explicit blocks even for identical policies.&lt;/strong&gt; It might seem like 9 identical blocks are a duplicate that could be collapsed into &lt;code&gt;User-agent: *&lt;/code&gt;. But that’s not the case: the robots.txt specification builds a match table by “most specific User-Agent,” and if tomorrow you need to change the policy for one bot — you already have its named block and don’t need to remember which bot you want to single out from the wildcard. Duplication is the cost of per-bot policy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Comment with review date.&lt;/strong&gt; &lt;code&gt;# robots.txt — last reviewed 2026-05-02&lt;/code&gt; is the only line that answers the question “is this file fresh?” Without a date, you’ll forever wonder if it’s time to add a new bot.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Sitemap:&lt;/code&gt; at the end.&lt;/strong&gt; One URL to the index sitemap. If you have localization — the sitemap-index links to per-locale files.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No BOM, LF line endings.&lt;/strong&gt; Astro in SSG mode will copy the file from &lt;code&gt;public/&lt;/code&gt; as-is; edit in plain UTF-8.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This template works for a personal blog. For other use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Closed paid-content site:&lt;/strong&gt; replace &lt;code&gt;Allow: /&lt;/code&gt; with &lt;code&gt;Disallow: /&lt;/code&gt; for GPTBot, ClaudeBot, Google-Extended (training). Keep &lt;code&gt;Allow: /&lt;/code&gt; for on-demand: ChatGPT-User, Claude-Web, Perplexity-User.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Documentation site that wants to be in LLM responses:&lt;/strong&gt; keep all 9 on &lt;code&gt;Allow&lt;/code&gt;, add rich &lt;code&gt;llms.txt&lt;/code&gt; (see §6).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B2B SaaS landing:&lt;/strong&gt; usually a standard wildcard is enough — no need to specifically name AI-bots, the policy is the same as for Googlebot.&lt;/li&gt;
&lt;/ul&gt;
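&lt;p&gt;For the closed paid-content case from the list above, the flip looks like this (illustrative fragment; keep your own Disallow-namespaces):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;# Training crawlers: content is a product, opt out of corpus collection
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# On-demand fetchers: the user explicitly asked for this page
User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login
&lt;/code&gt;&lt;/pre&gt;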
&lt;hr&gt;
&lt;h2&gt;5. Disallow-namespaces are more important than decisions for a specific bot&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;/admin/&lt;/code&gt;, &lt;code&gt;/api/&lt;/code&gt;, &lt;code&gt;/login&lt;/code&gt; are three namespaces that fall under Disallow in all 10 blocks (9 named + wildcard). This choice is worked out separately from the bots and is more important than them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why this is more important than any per-bot decision:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;A mistake here is a leak.&lt;/strong&gt; If a crawler bypasses &lt;code&gt;/admin/users.json&lt;/code&gt; and gets a 200 OK with real data — that’s an incident, not an SEO problem. If it indexes &lt;code&gt;/blog/&lt;/code&gt; without your permission — that’s not upsetting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;robots.txt is a public hint, not auth.&lt;/strong&gt; Any bot can ignore Disallow. So &lt;code&gt;/admin/&lt;/code&gt; should be closed by middleware regardless of robots.txt. The robots.txt entry only saves crawl budget for obedient bots and doesn’t keep the admin URL structure out of SERP.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collapsing namespaces is not an optimization.&lt;/strong&gt; The temptation: “why three lines if all three are private?” Answer: so that when you add a fourth namespace (&lt;code&gt;/dashboard/&lt;/code&gt;), you have an obvious pattern.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Verification that namespace-deny actually works:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ curl -A &amp;quot;GPTBot&amp;quot; -s -o /dev/null -w &amp;quot;%{http_code}\n&amp;quot; \
    https://artka.dev/admin/
# Expected: 401, 403, or 404. NOT 200.
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;At the time of publication, &lt;code&gt;/admin/&lt;/code&gt; is behind middleware. The specific code depends on the auth-guard implementation — mine returns 302 to /login for an unauthenticated request. (owner to fill: check exact code after next review).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That’s why the correct order of work is to set up auth first, and only then add robots.txt. robots.txt is the last line of defense, not the first.&lt;/p&gt;
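&lt;p&gt;The guard logic itself is framework-agnostic. A minimal sketch of the decision (names are mine; the 302-to-/login behavior mirrors the note above, and wiring it into your middleware layer is a separate step):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;type GuardDecision =
  | { kind: &amp;quot;pass&amp;quot; }
  | { kind: &amp;quot;redirect&amp;quot;; to: string; status: 302 };

// Prefixes protected by auth regardless of what robots.txt says.
const privatePrefixes = [&amp;quot;/admin/&amp;quot;, &amp;quot;/api/&amp;quot;];

const guardPrivate = (pathname: string, isAuthenticated: boolean): GuardDecision =&amp;gt; {
  const isPrivate = privatePrefixes.some((p) =&amp;gt; pathname.startsWith(p));
  if (isPrivate &amp;amp;&amp;amp; !isAuthenticated) {
    return { kind: &amp;quot;redirect&amp;quot;, to: &amp;quot;/login&amp;quot;, status: 302 };
  }
  return { kind: &amp;quot;pass&amp;quot; };
};

guardPrivate(&amp;quot;/admin/users&amp;quot;, false); // → redirect to /login
guardPrivate(&amp;quot;/blog/01-introduction/&amp;quot;, false); // → pass
&lt;/code&gt;&lt;/pre&gt;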
&lt;hr&gt;
&lt;h2&gt;6. &lt;code&gt;llms.txt&lt;/code&gt; and &lt;code&gt;llms-full.txt&lt;/code&gt; — a separate contract&lt;/h2&gt;
&lt;p&gt;If robots.txt answers “where can I go?”, then &lt;code&gt;llms.txt&lt;/code&gt; answers “what will I find here?” It’s an AI-README — a Markdown file with a description of the site, links to authoritative pages, and preferred attribution.&lt;/p&gt;
&lt;p&gt;The real &lt;code&gt;public/llms.txt&lt;/code&gt; of the site:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-md&quot;&gt;# artka.dev

&amp;gt; Personal technical blog by Артём Кашута. Topics: Claude Code internals,
&amp;gt; harness/agent loop, AI agent engineering, Astro/Node.js backends, and
&amp;gt; distributed systems.

## Authoritative pages

- [About the author](https://artka.dev/about): bio, expertise, contact
- [Now](https://artka.dev/now): currently in flight
- [Uses](https://artka.dev/uses): public toolchain
- [Projects](https://artka.dev/projects): portfolio with architecture and outcomes

## Content

- [Blog index (RU)](https://artka.dev/blog): all articles, source of truth
- [Blog index (EN)](https://artka.dev/en/blog): English translations
- [RSS RU](https://artka.dev/rss.xml): full text
- [RSS EN](https://artka.dev/en/rss.xml): full text
- [Sitemap](https://artka.dev/sitemap-index.xml): RU + EN with hreflang

## Preferred attribution

When citing, please include:

- Article title
- Author: &amp;quot;Артём Кашута&amp;quot;
- Canonical URL

## Contact

a@artka.dev
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is &lt;strong&gt;not robots.txt in a new wrapper&lt;/strong&gt;. The differences:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;robots.txt&lt;/th&gt;
&lt;th&gt;llms.txt&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Purpose&lt;/td&gt;
&lt;td&gt;Access policy&lt;/td&gt;
&lt;td&gt;Content description and attribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format&lt;/td&gt;
&lt;td&gt;Plain text, special syntax&lt;/td&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who reads&lt;/td&gt;
&lt;td&gt;Crawler before entering&lt;/td&gt;
&lt;td&gt;LLM when forming a response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What it regulates&lt;/td&gt;
&lt;td&gt;Allow/Disallow by paths&lt;/td&gt;
&lt;td&gt;Entry point to authoritative content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standardization&lt;/td&gt;
&lt;td&gt;Robots Exclusion Protocol (RFC 9309)&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://llmstxt.org&quot;&gt;llmstxt.org&lt;/a&gt; convention (de facto)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Besides &lt;code&gt;llms.txt&lt;/code&gt;, the site has &lt;code&gt;/llms-full.txt&lt;/code&gt; — a dynamically generated endpoint that outputs a full digest of all posts in plain text. The implementation is a short API route in Astro 5:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/pages/llms-full.txt.ts (fragment)
export const prerender = true;

export async function GET(_ctx: APIContext) {
  const ru = await getOrderedPosts({ locale: &amp;quot;ru&amp;quot; });
  const en = await getOrderedPosts({ locale: &amp;quot;en&amp;quot; });

  const header = [
    &amp;quot;# artka.dev — full LLM digest&amp;quot;,
    &amp;quot;&amp;quot;,
    `&amp;gt; ${person.description}`,
    &amp;quot;&amp;quot;,
    &amp;quot;## Author&amp;quot;,
    `Name: ${person.name}`,
    `Role: ${person.jobTitle}`,
    `URL: ${person.url}`,
    `Email: ${person.email}`,
    `Topics: ${person.knowsAbout.join(&amp;quot;, &amp;quot;)}`,
    &amp;quot;&amp;quot;,
    /* ...preferred attribution + posts... */
  ].join(&amp;quot;\n&amp;quot;);

  return new Response(/* header + ruBody + enBody */, {
    headers: { &amp;quot;Content-Type&amp;quot;: &amp;quot;text/plain; charset=utf-8&amp;quot; },
  });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Instead of a manually maintained list of posts — one pass through the content collection with auto-generated summary. This updates itself when a new post is added — unlike a manually edited &lt;code&gt;llms.txt&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In principle: &lt;code&gt;llms.txt&lt;/code&gt; is small and stable, &lt;code&gt;llms-full.txt&lt;/code&gt; is long and automatically in sync with content. Both are needed — for different tasks.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;7. What robots.txt doesn’t control&lt;/h2&gt;
&lt;p&gt;A list of things robots.txt doesn’t do, and how to close them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;robots.txt doesn’t block bots that don’t read it.&lt;/strong&gt; The solution is IP-blocking at the CDN or WAF level. Cloudflare has a ruleset that catches User-Agent patterns and rate-limits suspicious traffic; AWS WAF and Fastly have similar. This is a tool against bots that ignore robots.txt — that is, against all “bad actors.”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;robots.txt doesn’t declare usage policy.&lt;/strong&gt; It says “where you can go,” but not “can you quote,” “can you train,” “do you need attribution.” That’s the job of Terms of Service on a separate page of the site. ToS is legally weightier than robots.txt (though both are conventions until a court precedent).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;robots.txt doesn’t audit who actually came.&lt;/strong&gt; To find out whether GPTBot is visiting you, you need to look at the logs. Cloudflare AI Audit (available since 2024 for domains on Cloudflare) provides a built-in report on AI crawlers: counters for each one, frequency, share of traffic. Without a CDN you’ll have to parse access logs yourself: GoAccess, Loki, or just &lt;code&gt;grep -i &apos;gptbot\|claudebot\|perplexitybot&apos; access.log&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;noai&lt;/code&gt;/&lt;code&gt;noimageai&lt;/code&gt; meta tags are not a standard.&lt;/strong&gt; As of 2026, neither Anthropic nor OpenAI mentions these meta tags in public documentation as a respected signal. They were a 2023 Adobe and DeviantArt initiative that took root mainly in the graphics world. For text you can’t rely on them; if you use them, treat them as an additional signal, not the main one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Single-page apps and CSR.&lt;/strong&gt; If your page renders on the client and the crawler doesn’t execute JavaScript, it will see an empty template. robots.txt doesn’t help; the fix is switching to SSG/SSR (like this site on Astro 5) or a prerender service.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;8. Audit checklist every six months&lt;/h2&gt;
&lt;p&gt;Five steps that repeat every 6 months. A calendar reminder is the most reliable protection against file staleness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Check if new AI-crawlers have appeared.&lt;/strong&gt;
Sources: blog posts from OpenAI/Anthropic/Perplexity/Google over the last 6 months, the &lt;a href=&quot;https://darkvisitors.com&quot;&gt;darkvisitors.com&lt;/a&gt; page (AI-bot tracker), official documentation. If a new named bot appears — add a block (Allow or Disallow per your policy).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Verify User-Agent names byte-for-byte.&lt;/strong&gt;
Copy names from official documentation, compare with robots.txt. A typo like &lt;code&gt;Claudebot&lt;/code&gt; instead of &lt;code&gt;ClaudeBot&lt;/code&gt; nullifies the rule for that bot.&lt;/p&gt;
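&lt;p&gt;This check is easy to automate by comparing case-insensitive and case-sensitive matches. A sketch, where the robots.txt content is a deliberately broken example:&lt;/p&gt;

```shell
# Sketch: flag User-agent values whose spelling differs from the canonical one.
robots=$(mktemp)
printf 'User-agent: GPTBot\nDisallow: /private/\n\nUser-agent: Claudebot\nDisallow: /\n' > "$robots"

for canon in GPTBot ClaudeBot PerplexityBot; do
  if grep -qi "User-agent: $canon" "$robots"; then
    # Found in some casing; now require the exact byte-for-byte spelling.
    grep -q "User-agent: $canon" "$robots" || echo "case mismatch: $canon"
  fi
done
# Prints: case mismatch: ClaudeBot
```

&lt;p&gt;For this sample it flags &lt;code&gt;Claudebot&lt;/code&gt; and stays silent about the correctly spelled &lt;code&gt;GPTBot&lt;/code&gt;.&lt;/p&gt;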
&lt;p&gt;&lt;strong&gt;3. Run namespace-deny verification.&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;for ua in GPTBot ClaudeBot PerplexityBot Google-Extended; do
  echo -n &amp;quot;$ua /admin/: &amp;quot;
  curl -A &amp;quot;$ua&amp;quot; -s -o /dev/null -w &amp;quot;%{http_code}\n&amp;quot; https://artka.dev/admin/
done
# Expect 401/403/302/404 for all of them, not 200.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;4. Review access logs for bots with unusual User-Agent.&lt;/strong&gt;
If someone is visiting with an empty UA or a pattern like &lt;code&gt;Mozilla/5.0 (compatible; XYZBot/1.0; ...)&lt;/code&gt; that’s not on your list — evaluate and make a decision. (owner to fill: at the time of publication, access-log aggregation setup is in progress; in the next review — break down the top-20 UA strings for the quarter.)&lt;/p&gt;
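&lt;p&gt;For the top-20 itself, a one-liner over a combined-format access log is enough. A sketch with synthetic log lines:&lt;/p&gt;

```shell
# Sketch: top User-Agent strings by hit count.
# In combined log format the UA is field 6 when the line is split on double quotes.
log=$(mktemp)
printf '%s\n' \
  '1.1.1.1 - - [02/May/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "GPTBot/1.2"' \
  '1.1.1.1 - - [02/May/2026:10:00:01 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "GPTBot/1.2"' \
  '2.2.2.2 - - [02/May/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; XYZBot/1.0)"' > "$log"

awk -F'"' '{print $6}' "$log" | sort | uniq -c | sort -rn | head -20
```

&lt;p&gt;On this sample, &lt;code&gt;GPTBot/1.2&lt;/code&gt; tops the list with 2 hits; replace &lt;code&gt;$log&lt;/code&gt; with your real access.log path.&lt;/p&gt;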
&lt;p&gt;&lt;strong&gt;5. Update the date in the comment.&lt;/strong&gt;
&lt;code&gt;# robots.txt — last reviewed 2026-05-02&lt;/code&gt; → new date. This is the only human-readable proof of freshness. And a commit with a message like &lt;code&gt;chore(seo): robots.txt 2026-Q4 review&lt;/code&gt; will leave a trace in history for the next iteration.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;robots.txt in 2026 is not “one block and forget,” but a small DSL where for each of 9+ named AI-agents you make a conscious choice: training (GPTBot, ClaudeBot, Google-Extended), search/answer (OAI-SearchBot, PerplexityBot), on-demand (ChatGPT-User, Claude-Web, Perplexity-User, anthropic-ai). Namespace-deny for &lt;code&gt;/admin/&lt;/code&gt;, &lt;code&gt;/api/&lt;/code&gt;, &lt;code&gt;/login&lt;/code&gt; is a separate and more important story that only works paired with middleware authentication. &lt;code&gt;llms.txt&lt;/code&gt; and &lt;code&gt;llms-full.txt&lt;/code&gt; are a parallel contract: they describe content and preferred attribution, not access.&lt;/p&gt;
&lt;p&gt;The starting point is the real template from §4. You can copy it, change the policy for specific bots, and review it every six months.&lt;/p&gt;
</content:encoded><category>seo</category><category>ai-crawlers</category><author>a@artka.dev (Артём)</author></item><item><title>Mermaid → SVG via Playwright at build time: cold start, cache, and SSG cost</title><link>https://artka.dev/en/blog/mermaid-svg-playwright-build-time/</link><guid isPermaLink="true">https://artka.dev/en/blog/mermaid-svg-playwright-build-time/</guid><description>Real measurements from an Astro blog with 32 Mermaid diagrams: cold build 11.6s, warm 6.3s. Where the cache is, what Playwright does, why alternatives are worse.</description><pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Mermaid diagrams in a blog are either a large client-side JS bundle with FOUC and hydration cost, or build-time SVG with a one-time cold-start Playwright. On this site, &lt;code&gt;rehype-mermaid&lt;/code&gt; renders 32 diagrams in &lt;strong&gt;11.6 seconds&lt;/strong&gt; on a cold cache and &lt;strong&gt;6.3 seconds&lt;/strong&gt; on a warm one. Below are the specific numbers, architecture, CI pitfalls, and a fact-check of alternatives.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. Why render Mermaid at build-time instead of client-side&lt;/h2&gt;
&lt;p&gt;Mermaid (&lt;code&gt;mermaid&lt;/code&gt; on npm, repository &lt;code&gt;mermaid-js/mermaid&lt;/code&gt;) is a JS library that takes a text DSL (&lt;code&gt;flowchart TD&lt;/code&gt;, &lt;code&gt;sequenceDiagram&lt;/code&gt;, &lt;code&gt;gantt&lt;/code&gt;, …) and emits SVG. By default, you use it like this: include &lt;code&gt;&amp;lt;script src=&amp;quot;mermaid.min.js&amp;quot;&amp;gt;&lt;/code&gt;, call &lt;code&gt;mermaid.run()&lt;/code&gt; after &lt;code&gt;DOMContentLoaded&lt;/code&gt;, and each &lt;code&gt;&amp;lt;pre class=&amp;quot;mermaid&amp;quot;&amp;gt;&lt;/code&gt; gets replaced with SVG in the DOM right in the browser.&lt;/p&gt;
&lt;p&gt;It works, but the user pays the price:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Client-side Mermaid&lt;/th&gt;
&lt;th&gt;Build-time SVG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JS bundle (gzipped)&lt;/td&gt;
&lt;td&gt;~250–300 KB (mermaid + d3 + dagre)&lt;/td&gt;
&lt;td&gt;0 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to Interactive (TTI)&lt;/td&gt;
&lt;td&gt;delayed by parse + execute&lt;/td&gt;
&lt;td&gt;unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FOUC&lt;/td&gt;
&lt;td&gt;yes: text first, then SVG&lt;/td&gt;
&lt;td&gt;no: SVG in HTML from first byte&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SEO / Open Graph&lt;/td&gt;
&lt;td&gt;search engine sees only text DSL&lt;/td&gt;
&lt;td&gt;search engine sees SVG as part of page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page printing&lt;/td&gt;
&lt;td&gt;empty blocks if JS is disabled&lt;/td&gt;
&lt;td&gt;correct render&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dark theme without flash&lt;/td&gt;
&lt;td&gt;hard: theme loads after hydration&lt;/td&gt;
&lt;td&gt;works: SVG generated in correct theme&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build cost&lt;/td&gt;
&lt;td&gt;0 (just bundle js)&lt;/td&gt;
&lt;td&gt;+5–10 seconds cold-start Playwright&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime cost for user&lt;/td&gt;
&lt;td&gt;high (CPU + network)&lt;/td&gt;
&lt;td&gt;zero&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;rehype-mermaid&lt;/code&gt; (&lt;code&gt;remcohaszing/rehype-mermaid&lt;/code&gt;, v3.0.0) is a rehype plugin that during the build traverses the HAST tree, finds &lt;code&gt;&amp;lt;code class=&amp;quot;language-mermaid&amp;quot;&amp;gt;&lt;/code&gt; nodes, renders them via &lt;code&gt;mermaid-isomorphic&lt;/code&gt; (&lt;code&gt;mermaid-isomorphic@3.1.0&lt;/code&gt;), and replaces them with ready SVG. Under the hood: Playwright + headless Chromium.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;img-svg&lt;/code&gt; strategy we use emits the result as &lt;code&gt;&amp;lt;img src=&amp;quot;data:image/svg+xml,...&amp;quot;&amp;gt;&lt;/code&gt;. Alternatives are &lt;code&gt;inline-svg&lt;/code&gt; (embed SVG directly in HTML) or &lt;code&gt;pre-mermaid&lt;/code&gt; (leave as-is for client-side render).&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;2. Architecture: rehype-mermaid + Playwright&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
  md[&amp;quot;Markdown&amp;lt;br/&amp;gt;with ```mermaid blocks&amp;quot;]
  mdx[&amp;quot;@astrojs/mdx&amp;lt;br/&amp;gt;(remark + rehype)&amp;quot;]
  rh[&amp;quot;rehype-mermaid&amp;lt;br/&amp;gt;(plugin)&amp;quot;]
  iso[&amp;quot;mermaid-isomorphic&amp;quot;]
  pw[&amp;quot;Playwright&amp;lt;br/&amp;gt;(Chromium)&amp;quot;]
  svg[&amp;quot;SVG as data URI&amp;lt;br/&amp;gt;in HTML&amp;quot;]

  md --&amp;gt; mdx
  mdx --&amp;gt; rh
  rh --&amp;gt;|for each block| iso
  iso --&amp;gt;|launch headless| pw
  pw --&amp;gt;|&amp;quot;mermaid.render() in DOM&amp;quot;| iso
  iso --&amp;gt;|serialised SVG| rh
  rh --&amp;gt; svg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The specific config is &lt;code&gt;astro.config.ts&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;import rehypeMermaid from &amp;quot;rehype-mermaid&amp;quot;;
import { defineConfig } from &amp;quot;astro/config&amp;quot;;
import mdx from &amp;quot;@astrojs/mdx&amp;quot;;

export default defineConfig({
  integrations: [
    mdx({
      rehypePlugins: [[rehypeMermaid, { strategy: &amp;quot;img-svg&amp;quot;, dark: true }]],
    }),
  ],
  markdown: {
    syntaxHighlight: {
      type: &amp;quot;shiki&amp;quot;,
      excludeLangs: [&amp;quot;mermaid&amp;quot;, &amp;quot;math&amp;quot;],
    },
    rehypePlugins: [[rehypeMermaid, { strategy: &amp;quot;img-svg&amp;quot;, dark: true }]],
  },
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Important details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;excludeLangs: [&amp;quot;mermaid&amp;quot;]&lt;/code&gt; in the shiki config — otherwise Shiki will first turn the block into &lt;code&gt;&amp;lt;pre class=&amp;quot;shiki&amp;quot;&amp;gt;&lt;/code&gt; and rehype-mermaid won’t see it.&lt;/li&gt;
&lt;li&gt;The plugin is connected twice: both in &lt;code&gt;markdown.rehypePlugins&lt;/code&gt; and in &lt;code&gt;mdx.rehypePlugins&lt;/code&gt;. Astro 5 doesn’t automatically inherit one from the other — this is a typical source of “it renders in &lt;code&gt;.md&lt;/code&gt; but not in &lt;code&gt;.mdx&lt;/code&gt;”.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dark: true&lt;/code&gt; generates two versions of SVG (for light and dark themes) and uses &lt;code&gt;&amp;lt;picture&amp;gt;&amp;lt;source&amp;gt;&lt;/code&gt; to serve the right one based on &lt;code&gt;prefers-color-scheme&lt;/code&gt;. This doubles the size of data-uri blocks, but gives correct contrast without JS.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;3. Cold start vs warm build&lt;/h2&gt;
&lt;p&gt;Metric: &lt;code&gt;time pnpm build&lt;/code&gt; (Apple M-series, locally, warm Chromium binary in &lt;code&gt;~/Library/Caches/ms-playwright&lt;/code&gt;). Command to clear all caches:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;rm -rf .astro node_modules/.astro dist
time pnpm build
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three runs on cold, three on warm (median):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Run 1&lt;/th&gt;
&lt;th&gt;Run 2&lt;/th&gt;
&lt;th&gt;Run 3&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Median&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold (&lt;code&gt;rm -rf .astro node_modules/.astro dist&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;11.580s&lt;/td&gt;
&lt;td&gt;11.860s&lt;/td&gt;
&lt;td&gt;11.486s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.580s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm (no cleanup)&lt;/td&gt;
&lt;td&gt;6.250s&lt;/td&gt;
&lt;td&gt;6.305s&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~6.28s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Of the 11.6 seconds of a cold build:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~5–6 seconds — actual SSG stage (Astro traverses routes, renders 45 HTML pages from 14 RU posts + 13 EN twins + index, tags, RSS, sitemap).&lt;/li&gt;
&lt;li&gt;~5 seconds — Playwright overhead: launching Chromium, initializing mermaid bundle in DOM, JIT warmup.&lt;/li&gt;
&lt;li&gt;~0.2 seconds — &lt;code&gt;pagefind --site dist/client&lt;/code&gt; (search index).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On a warm build, Playwright still starts fresh (there’s no long-lived process pool in &lt;code&gt;mermaid-isomorphic&lt;/code&gt;), but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.astro/data-store.json&lt;/code&gt; (5.2 MB) already contains parsed MDX content layer — Astro doesn’t re-parse markdown for files whose mtime hasn’t changed.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;node_modules/.astro/&lt;/code&gt; (5.1 MB) — Vite cache of transpiled modules.&lt;/li&gt;
&lt;li&gt;The Playwright Chromium binary itself is already in &lt;code&gt;~/Library/Caches/ms-playwright/chromium-1217/&lt;/code&gt; (528 MB total with headless-shell and ffmpeg) — on a cold disk cache you’d have to read it again, adding ~1–2 seconds on slow disks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Key fact: &lt;strong&gt;&lt;code&gt;mermaid-isomorphic&lt;/code&gt; itself does NOT cache SVG between builds&lt;/strong&gt;. I searched its source code (&lt;code&gt;node_modules/.pnpm/mermaid-isomorphic@3.1.0_playwright@1.59.1/.../mermaid-isomorphic.js&lt;/code&gt;) — there’s no &lt;code&gt;persistDir&lt;/code&gt; or file-based cache. Every build, diagrams are rendered from scratch. “Warmth” is Astro/Vite cache, not the plugin’s.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CI measurement for GitHub Actions &lt;code&gt;ubuntu-latest&lt;/code&gt;: (owner to fill: run workflow_dispatch on a clean runner, measure median from 3 runs with &lt;code&gt;actions/cache@v4&lt;/code&gt; for node_modules + .astro).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;4. Cost on CI&lt;/h2&gt;
&lt;p&gt;Playwright pulls Chromium (~528 MB in my cache on macOS, similar order on Linux), plus on Debian/Ubuntu you need system deps: &lt;code&gt;libnss3&lt;/code&gt;, &lt;code&gt;libatk-1.0-0&lt;/code&gt;, &lt;code&gt;libcups2&lt;/code&gt;, &lt;code&gt;libgbm1&lt;/code&gt;, &lt;code&gt;libxkbcommon0&lt;/code&gt;, &lt;code&gt;libpango-1.0-0&lt;/code&gt;, &lt;code&gt;libasound2&lt;/code&gt;, fontconfig + at least one font.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigations:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Don’t install Chromium in production image.&lt;/strong&gt; If you’re building an Astro SSG-only site and deploying static files — Playwright is needed ONLY on the CI build step, not in runtime Docker. Use multi-stage:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-dockerfile&quot;&gt;# Build stage: Playwright and Chromium live only here
FROM node:24-bookworm AS build
WORKDIR /app
COPY . .
RUN corepack enable
RUN pnpm install --frozen-lockfile
RUN pnpm exec playwright install --with-deps chromium
RUN pnpm build

# Run stage: static output only
FROM node:24-bookworm-slim AS run
WORKDIR /app
COPY --from=build /app/dist ./dist
# no playwright here
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Actions caching.&lt;/strong&gt; &lt;code&gt;actions/cache@v4&lt;/code&gt; key: &lt;code&gt;${{ hashFiles(&apos;pnpm-lock.yaml&apos;) }}-playwright&lt;/code&gt;, path: &lt;code&gt;~/.cache/ms-playwright&lt;/code&gt;. Saves re-downloading Chromium (~150 MB over network) on every push.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use system Chrome instead of Playwright Chromium.&lt;/strong&gt; Set &lt;code&gt;PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1&lt;/code&gt; and pass &lt;code&gt;executablePath: &apos;/usr/bin/google-chrome-stable&apos;&lt;/code&gt; when creating the browser. But: &lt;code&gt;mermaid-isomorphic&lt;/code&gt; doesn’t expose &lt;code&gt;launchOptions&lt;/code&gt; through the rehype-mermaid API — you’d have to fork or live with default Chromium.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;If 5 seconds cold-start is critical&lt;/strong&gt; — run Playwright outside the build: pre-render all diagrams in a separate CI step, commit SVG to the repo, use pre-mermaid strategy in the main build with substitution for ready assets. More complex, but removes Playwright from the hot path.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
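&lt;p&gt;The caching step from point 2 as a workflow fragment. A sketch: job context and step names are assumptions, only the key/path pair comes from the text above:&lt;/p&gt;

```yaml
# Sketch of the Playwright cache step for GitHub Actions.
- name: Cache Playwright Chromium
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: ${{ hashFiles('pnpm-lock.yaml') }}-playwright

- name: Install browsers (download is skipped on a cache hit)
  run: pnpm exec playwright install --with-deps chromium
```

&lt;p&gt;Note that &lt;code&gt;--with-deps&lt;/code&gt; still installs system packages on every run; only the Chromium download itself is saved by the cache.&lt;/p&gt;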
&lt;hr&gt;
&lt;h2&gt;5. SVG caching: where they live and what invalidates them&lt;/h2&gt;
&lt;p&gt;Public measurement on dev machine (45 compiled HTML, 27 pages with diagrams, 61 data-uris total — 32 RU + 29 EN, because one EN page renders without diagrams due to post specifics):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mermaid blocks in &lt;code&gt;*.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32 (in 14 posts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compiled HTML&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pages with embedded diagram&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data-URI blocks &lt;code&gt;&amp;lt;img src=&amp;quot;data:image/svg+xml,...&amp;quot;&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum, bytes&lt;/td&gt;
&lt;td&gt;15 551&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median, bytes&lt;/td&gt;
&lt;td&gt;25 301&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average, bytes&lt;/td&gt;
&lt;td&gt;26 579&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum, bytes&lt;/td&gt;
&lt;td&gt;45 711&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of &lt;code&gt;.astro/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.0 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of &lt;code&gt;node_modules/.astro/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.1 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of &lt;code&gt;dist/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;17 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of Playwright Chromium cache&lt;/td&gt;
&lt;td&gt;528 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Where everything lives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The SVGs don’t live on disk as separate files.&lt;/strong&gt; The &lt;code&gt;img-svg&lt;/code&gt; strategy inlines them directly in HTML as &lt;code&gt;data:image/svg+xml,...&lt;/code&gt; (URL-encoded). You can see this in &lt;code&gt;dist/client/blog/02-context-and-cache/index.html&lt;/code&gt;: 4 diagrams → 4 data-uris in one HTML.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Astro content-layer cache&lt;/strong&gt; — &lt;code&gt;.astro/data-store.json&lt;/code&gt; (5.2 MB after build). This is parsed markdown with remark/rehype plugins already applied — but &lt;strong&gt;before&lt;/strong&gt; rehype-mermaid: testing shows that mtime-based invalidation of the source runs rehype-mermaid again even for files where nothing changed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vite cache&lt;/strong&gt; — &lt;code&gt;node_modules/.astro/&lt;/code&gt; (5.1 MB). Transpiled TS/JSX modules, unrelated to mermaid rendering.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mermaid-isomorphic has no cache of its own.&lt;/strong&gt; This is the key pitfall: if you change a comma in one &lt;code&gt;*.md&lt;/code&gt; — rehype-mermaid will rebuild ALL diagrams in that file. There’s no content-addressable cache “hash diagram source → SVG”.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If rehype-mermaid caching is critical for you — a workaround: write a thin rehype plugin wrapper that hashes the diagram source (sha256 of text between &lt;code&gt; ```mermaid&lt;/code&gt; and &lt;code&gt;```&lt;/code&gt;), checks &lt;code&gt;.cache/mermaid/&amp;lt;hash&amp;gt;.svg&lt;/code&gt; — and returns it without calling &lt;code&gt;mermaid-isomorphic&lt;/code&gt; on a hit. I haven’t done this on this blog — 11.6 seconds cold-start isn’t painful enough.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;6. Alternatives: what I looked at and why I didn’t choose them&lt;/h2&gt;
&lt;h3&gt;6.1. &lt;code&gt;@mermaid-js/mermaid-cli&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Official CLI from mermaid-js: &lt;code&gt;mmdc -i diagram.mmd -o diagram.svg&lt;/code&gt;. Under the hood: puppeteer (Chromium API fork) + full Chromium binary.&lt;/p&gt;
&lt;p&gt;Downsides for a blog pipeline:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No integration with rehype/remark — you’d have to extract markdown blocks manually.&lt;/li&gt;
&lt;li&gt;Each run spawns a new browser context (no batch mode).&lt;/li&gt;
&lt;li&gt;On 32 diagrams — 32 separate puppeteer launches ≈ tens of seconds vs ~5–6 seconds with &lt;code&gt;mermaid-isomorphic&lt;/code&gt; with a single browser instance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When it fits: one-off conversion &lt;code&gt;*.mmd → *.svg&lt;/code&gt; in a monorepo for designers, not for dynamic HTML insertion.&lt;/p&gt;
&lt;h3&gt;6.2. Client-side &lt;code&gt;mermaid&lt;/code&gt; (npm package)&lt;/h3&gt;
&lt;p&gt;Downsides already covered above: bundle, FOUC, hydration. One upside — dynamic diagrams from user input at runtime (live preview in documentation editor). For a static blog — overkill.&lt;/p&gt;
&lt;h3&gt;6.3. &lt;code&gt;mermaid-isomorphic&lt;/code&gt; directly (without rehype)&lt;/h3&gt;
&lt;p&gt;The same package that rehype-mermaid calls under the hood. You can use it outside Astro:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;import { createMermaidRenderer } from &apos;mermaid-isomorphic&apos;;

const renderer = createMermaidRenderer();
const [{ svg }] = await renderer([{ value: &apos;flowchart TD\nA--&amp;gt;B&apos; }]);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When it fits: your own pipeline build (Eleventy, MkDocs plugin on Node.js) that doesn’t use a rehype chain. For me — Astro, so rehype-mermaid gives zero-boilerplate.&lt;/p&gt;
&lt;h3&gt;6.4. Pre-render via GitHub Actions matrix + commit back&lt;/h3&gt;
&lt;p&gt;Hypothetically: a workflow on push that renders SVG, commits to &lt;code&gt;public/diagrams/&lt;/code&gt;, and the build step uses &lt;code&gt;pre-mermaid&lt;/code&gt; strategy with replacement to &lt;code&gt;&amp;lt;img src=&amp;quot;/diagrams/&amp;lt;hash&amp;gt;.svg&amp;quot;&amp;gt;&lt;/code&gt;. Removes Playwright from the hot build path, but: complicates PR review (binary files in diff), requires a separate workflow, breaks local &lt;code&gt;pnpm dev&lt;/code&gt; if SVG isn’t committed yet.&lt;/p&gt;
&lt;p&gt;Didn’t do it — 5 seconds of cold-start savings don’t justify the complexity.&lt;/p&gt;
&lt;h3&gt;Summary table&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Cold-start&lt;/th&gt;
&lt;th&gt;SVG cache&lt;/th&gt;
&lt;th&gt;Bundle JS&lt;/th&gt;
&lt;th&gt;Setup complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rehype-mermaid&lt;/code&gt; + Playwright (current)&lt;/td&gt;
&lt;td&gt;~5–6s&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;low (1 plugin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mermaid-cli&lt;/code&gt; (&lt;code&gt;mmdc&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~10s+&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client-side &lt;code&gt;mermaid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;browser cache&lt;/td&gt;
&lt;td&gt;~250 KB&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-render + commit&lt;/td&gt;
&lt;td&gt;0 in build, ~5s in pre-step&lt;/td&gt;
&lt;td&gt;yes, in git&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2&gt;7. Checklist: what to measure before choosing&lt;/h2&gt;
&lt;p&gt;Before committing to build-time rendering or anything else:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How many diagrams on average.&lt;/strong&gt; On 1–3 — client-side is OK (lazy-load mermaid via dynamic import). On 30+ — build-time is cheaper for the user.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content edit frequency.&lt;/strong&gt; If the repo sees ~50 pushes a day, an 11-second cold start × 50 pushes ≈ 10 minutes of CI time per day. If you push once a week, it doesn’t matter.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI platform.&lt;/strong&gt; Vercel hobby, Netlify free, Cloudflare Pages — all have build minute limits. Playwright + Chromium on every PR preview = you’ll hit limits fast. On self-hosted runner or Dokploy (like me) — doesn’t matter.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Target JS bundle size.&lt;/strong&gt; If your project has a KPI of “&amp;lt;100 KB initial JS” — 250 KB mermaid client-side breaks the budget. Build-time SVG doesn’t touch the JS budget.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do you need interactivity.&lt;/strong&gt; Pan/zoom/click handlers in the diagram? Then client-side is mandatory. Static picture for reading? Build-time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Where your cold-start cost lives.&lt;/strong&gt; If in runtime Docker — cut Playwright from the run stage. If in CI — cache Chromium via &lt;code&gt;actions/cache&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Can you live with no SVG cache.&lt;/strong&gt; rehype-mermaid renders ALL blocks in a file on any edit. If that hurts — write your own caching wrapper with sha256 key on diagram source.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;On this blog, &lt;code&gt;rehype-mermaid&lt;/code&gt; + Playwright costs ~5 seconds cold-start, outputs 32 diagrams into 27 HTML pages with median inline-SVG size of 25 KB, requires zero bytes of JS on the client, and lets you write diagrams directly in markdown. This is a very good tradeoff for a static blog.&lt;/p&gt;
&lt;p&gt;When it won’t fit: a blog with a hundred diagrams, a deploy platform with build-minute limits, or a requirement for interactive diagrams. In the first case — write a caching wrapper, in the second — pre-render in a separate workflow, in the third — client-side.&lt;/p&gt;
&lt;p&gt;The main non-obvious thing to remember: &lt;strong&gt;Astro “warms up” (5.2 MB content store, Vite cache), but &lt;code&gt;mermaid-isomorphic&lt;/code&gt; doesn’t&lt;/strong&gt;. The Playwright cold start is paid on every build from scratch. This isn’t a bug, it’s by design, and it’s why my full build takes 11.6 seconds instead of 1.6.&lt;/p&gt;
</content:encoded><category>build-tooling</category><category>astro</category><author>a@artka.dev (Артём)</author></item></channel></rss>