<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>artka.dev — Notes from production</title><description>Notes on Claude Code, AI agents, RAG pipelines and production backend by Artyom Kashuta.</description><link>https://artka.dev/</link><language>en-US</language><copyright>© 2026 artka.dev</copyright><item><title>12 rules for CLAUDE.md: extending Karpathy to the failure modes of 2026</title><link>https://artka.dev/en/blog/claude-md-12-rules/</link><guid isPermaLink="true">https://artka.dev/en/blog/claude-md-12-rules/</guid><description>Mnilax tested 12 rules for CLAUDE.md on 30 codebases over 6 weeks: an extension of Karpathy's template to agent loops, checkpoints, and fail-loud. A breakdown and a framework for applying it.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Over four months after Karpathy’s January thread, the &lt;code&gt;CLAUDE.md&lt;/code&gt; template grew from 4 rules to 12. I ran the expanded set on typical blog tasks and several work repos — the frequency of silent Claude Code errors drops noticeably. The eight added rules cover what didn’t exist as a class of problems in January: long-running agent loops, cross-session flows, shallow tests, quiet failures instead of explicit errors. I opened my own &lt;code&gt;CLAUDE.md&lt;/code&gt; for this blog — Karpathy’s four original rules are already there in &lt;code&gt;Code Standards&lt;/code&gt; and &lt;code&gt;Prohibitions&lt;/code&gt;, the eight added ones are not. I’m going through each one and figuring out where it makes sense to insert them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. What happened over four months&lt;/h2&gt;
&lt;p&gt;In late January, Andrej Karpathy published a thread with three complaints about Claude as a code-writer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;silent wrong assumptions — the model fills in context without asking;&lt;/li&gt;
&lt;li&gt;over-complication — adds layers of abstraction that nobody asked for;&lt;/li&gt;
&lt;li&gt;orthogonal damage — touches code it shouldn’t have touched.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Forrest Chang packaged the complaints into a &lt;code&gt;CLAUDE.md&lt;/code&gt; with four behavioral rules and committed it to GitHub. The repo exploded — by May, over 100,000 stars, the fastest-growing single-file project of the year. Then the template grew an extension: eight additional rules that cover what wasn’t a focus in January because the Claude Code landscape didn’t exist the way it does now.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
  subgraph jan[&amp;quot;January 2026&amp;quot;]
    K[&amp;quot;Karpathy: thread with three&amp;lt;br/&amp;gt;failure modes&amp;quot;]
    K --&amp;gt; F[&amp;quot;Forrest Chang:&amp;lt;br/&amp;gt;4 rules in CLAUDE.md&amp;quot;]
  end
  subgraph may[&amp;quot;May 2026&amp;quot;]
    F --&amp;gt; N[&amp;quot;New failure modes:&amp;lt;br/&amp;gt;agent loops, multi-codebase,&amp;lt;br/&amp;gt;shallow tests, silent failures&amp;quot;]
    N --&amp;gt; M[&amp;quot;+8 rules,&amp;lt;br/&amp;gt;total 12&amp;quot;]
  end
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2&gt;2. Karpathy’s four rules&lt;/h2&gt;
&lt;p&gt;This is the foundation. Without it, any superstructure loses half its meaning.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Think Before Coding&lt;/td&gt;
&lt;td&gt;Silent guesses. Voice assumptions, ask when unclear, push back when there’s a simpler way.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Simplicity First&lt;/td&gt;
&lt;td&gt;Minimum code that solves the task. No speculative abstractions “for the future”.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Surgical Changes&lt;/td&gt;
&lt;td&gt;Touch only what’s needed. Don’t “improve” neighboring code, don’t reformat what you weren’t asked about.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Goal-Driven Execution&lt;/td&gt;
&lt;td&gt;Describe success criteria, not step-by-step instructions. Strong success criteria let the model iterate.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In my Astro blog’s &lt;code&gt;CLAUDE.md&lt;/code&gt;, these four are covered not as a separate section, but through &lt;code&gt;Code Standards → Functional Style&lt;/code&gt; (rules 2 and 3 — no classes, no extra abstractions) and &lt;code&gt;Prohibitions&lt;/code&gt; (rule 3 — a “don’t do” list). The rules themselves aren’t duplicated as text, but their consequences land in the context.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;3. Where the Karpathy template falls short&lt;/h2&gt;
&lt;p&gt;Four gaps I observe in real work:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;th&gt;What breaks&lt;/th&gt;
&lt;th&gt;Which added rules cover it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Long-running agent tasks&lt;/td&gt;
&lt;td&gt;Multi-step pipeline drifts, burns tokens, loses context&lt;/td&gt;
&lt;td&gt;6 (budgets), 10 (checkpoints), 12 (loud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-codebase consistency&lt;/td&gt;
&lt;td&gt;In a monorepo “match existing style” is ambiguous — Claude picks randomly or averages&lt;/td&gt;
&lt;td&gt;11 (conventions), 7 (surface conflicts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test quality&lt;/td&gt;
&lt;td&gt;“Tests passed” becomes the goal; Claude writes tests that won’t fail even on broken logic&lt;/td&gt;
&lt;td&gt;9 (intent over behavior)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prototype vs production&lt;/td&gt;
&lt;td&gt;“Simplicity First” overdoes it early on, when you need 100 lines of scaffolding just to probe an idea&lt;/td&gt;
&lt;td&gt;(not covered by 12 rules — separate)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The last gap stays open: Simplicity First is either on or off, and there’s no middle mode in &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;4. Eight added rules&lt;/h2&gt;
&lt;p&gt;One by one, with the moment that triggered each.&lt;/p&gt;
&lt;h3&gt;4.1. Rule 5 — Use the model only for judgment calls&lt;/h3&gt;
&lt;p&gt;If the answer is known from a status code or data schema — that’s not the model’s job. Real case from my practice: code called Claude to decide whether to retry an API call on 503. Worked for two weeks, then started flaking because the model was reading the request body as context for the decision. Retry policy became random because the prompt was random.&lt;/p&gt;
&lt;p&gt;Frame: Claude is for classification, extraction, drafts, summarization. Not for routing, retries, deterministic transformations. If a status code already answers the question — regular code answers it.&lt;/p&gt;
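&lt;p&gt;A minimal sketch of that frame in code (the function name and status list are mine, not from the article): the retry decision is a pure function of the status code, so it lives in plain code and the model never sees it.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical sketch: deterministic retry policy, no model involved.
should_retry() {
  case "$1" in
    502|503|429) return 0 ;;  # transient: retry with backoff
    *)           return 1 ;;  # everything else: fail or surface the error
  esac
}

should_retry 503 && echo "503: retry"
should_retry 404 || echo "404: do not retry"
```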
&lt;h3&gt;4.2. Rule 6 — Token budgets are not advisory&lt;/h3&gt;
&lt;p&gt;Without a budget, the loop dumps 50,000 tokens. Hard version: 4,000 per task, 30,000 per session. When you approach the limit, summarize and restart the session.&lt;/p&gt;
&lt;p&gt;Typical case: 90-minute debugging session with the same 8 KB error message. By the end — re-proposing fixes you rejected 40 messages ago. The model happily iterates on a lost track. A budget would have killed the loop at minute 12.&lt;/p&gt;
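&lt;p&gt;How such a rule might read in &lt;code&gt;CLAUDE.md&lt;/code&gt; (thresholds from the paragraph above; the wording is mine):&lt;/p&gt;

```markdown
## Token budgets (hard limits)

- Max 4,000 tokens per task, 30,000 per session.
- At ~80% of a budget: stop, summarize what is done / verified / remaining,
  and ask to restart the session. Do not keep iterating on a lost track.
```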
&lt;h3&gt;4.3. Rule 7 — Surface conflicts, don’t average them&lt;/h3&gt;
&lt;p&gt;If the codebase has two error-handling patterns — try/catch and global boundary — Claude writes code that does both. Double handlers. Symptom: the error gets swallowed twice.&lt;/p&gt;
&lt;p&gt;Rule: when there’s a contradiction, pick one (the newer or more tested one), explain why, mark the second for cleanup. Averaged code that satisfies both rules is the worst possible.&lt;/p&gt;
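&lt;p&gt;As a &lt;code&gt;CLAUDE.md&lt;/code&gt; line, the rule might look like this (my wording, condensed from the above):&lt;/p&gt;

```markdown
- On conflicting patterns (e.g. try/catch vs global error boundary): pick ONE
  (prefer the newer or better-tested), say why, flag the other for cleanup.
  Never emit code that satisfies both.
```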
&lt;h3&gt;4.4. Rule 8 — Read before you write&lt;/h3&gt;
&lt;p&gt;Karpathy says “don’t touch neighboring code”. He doesn’t say: read it before adding yours. Real case: Claude added a function next to an already-existing identical one without reading the file. Import order decided the winner: the old function, the source of truth for six months, lost to the fresh one with the same name.&lt;/p&gt;
&lt;p&gt;Frame: before adding code to a file — read the exports, the nearest calling code, and common utilities. “Looks orthogonal to me” is the most dangerous phrase in a codebase.&lt;/p&gt;
&lt;h3&gt;4.5. Rule 9 — Tests verify intent, not just behavior&lt;/h3&gt;
&lt;p&gt;A test &lt;code&gt;expect(getUserName()).toBe(&apos;John&apos;)&lt;/code&gt; means nothing if the function returns a constant. Tests should fail when business logic changes, otherwise they’re testing that a function exists, not that it’s correct.&lt;/p&gt;
&lt;p&gt;Typical example: 12 tests on an auth function, all green, auth is broken in production. Tests checked that the function returns something, not that it returns the right value.&lt;/p&gt;
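&lt;p&gt;The difference is easy to show mechanically. A shell sketch with an invented function: the shallow test stays green even under broken logic, the intent test does not.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical function under test: adds 20% VAT to a price in cents.
add_vat() { echo $(( $1 * 120 / 100 )); }

# Shallow test: only checks that *something* comes back.
# It would still pass if add_vat were replaced by `echo 42`.
[ -n "$(add_vat 1000)" ] && echo "shallow test: pass"

# Intent test: pins the business rule itself (1000 -> 1200).
[ "$(add_vat 1000)" = "1200" ] && echo "intent test: pass"
```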
&lt;h3&gt;4.6. Rule 10 — Checkpoint after every significant step&lt;/h3&gt;
&lt;p&gt;A multi-step refactor across 20 files breaks on step 4, Claude keeps going on broken state. By the time you notice, steps 5 and 6 are already done on top of broken — untangling takes longer than redoing from scratch.&lt;/p&gt;
&lt;p&gt;Rule: after each significant step — summarize what’s done, what’s verified, what’s left. If you lose track — stop and recap.&lt;/p&gt;
&lt;h3&gt;4.7. Rule 11 — Match the codebase’s conventions, even if you disagree&lt;/h3&gt;
&lt;p&gt;Claude introduces hooks into a codebase of class components. Technically works. Breaks the testing pattern built for &lt;code&gt;componentDidMount&lt;/code&gt;. Half a day to delete and rewrite.&lt;/p&gt;
&lt;p&gt;Rule: inside a codebase, conformance matters more than taste. Disagreement is a separate conversation, not a silent fork. Snake_case vs camelCase, classes vs hooks — pick what’s there, not what’s better.&lt;/p&gt;
&lt;h3&gt;4.8. Rule 12 — Fail loud&lt;/h3&gt;
&lt;p&gt;The most expensive errors are the ones that look like success. “Migration complete” when 14% of records were silently skipped. “Tests passed” when some were skipped. “Feature works” when an edge case you explicitly asked about was never checked.&lt;/p&gt;
&lt;p&gt;Rule: when uncertain — raise the question, don’t hide it. Default to surfacing uncertainty, not concealing it.&lt;/p&gt;
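&lt;p&gt;Mechanically, fail-loud means skipped work changes the exit status and the final message. A sketch with an invented migration loop (nothing here is from a real tool):&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical migration: empty records are "skipped". Skips make the run
# report INCOMPLETE and return non-zero instead of claiming success.
migrate_records() {
  ok=0; skipped=0
  for rec in "$@"; do
    if [ -n "$rec" ]; then ok=$((ok + 1)); else skipped=$((skipped + 1)); fi
  done
  if [ "$skipped" -gt 0 ]; then
    echo "migration INCOMPLETE: $ok migrated, $skipped skipped" >&2
    return 1
  fi
  echo "migration complete: $ok records"
}

migrate_records "a" "b" "c"          # all good: reports complete, status 0
migrate_records "a" "" "c" || true   # loud: INCOMPLETE on stderr, status 1
```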
&lt;hr&gt;
&lt;h2&gt;5. What doesn’t work (what got filtered out)&lt;/h2&gt;
&lt;p&gt;The template is valuable not just for what’s in it, but for what was filtered out when trying to expand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rules from Reddit and X.&lt;/strong&gt; Most are reformulations of Karpathy or domain-specific (“always Tailwind”). Don’t generalize.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More than 12 rules.&lt;/strong&gt; On sets of 14+ rules, compliance drops: important points drown in noise. The 200-line ceiling (including stack, commands, prohibitions) is real.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool-specific rules.&lt;/strong&gt; “Always use eslint” fails silently if eslint isn’t installed. Better — capability-agnostic: “match the enforced style”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Examples instead of rules.&lt;/strong&gt; One example eats ~10 rules’ worth of context, and the model over-fits on specifics. Rules are abstract and portable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Soft language.&lt;/strong&gt; “Be careful”, “think hard”, “really focus” — compliance ~30%. Not testable. Replace with concrete imperatives: “state assumptions explicitly”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Identity prompts.&lt;/strong&gt; “Be a senior engineer” doesn’t work: the model already thinks it’s a senior. The gap between “thinking” and “doing” closes with imperatives, not identity.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;6. Checking against my own &lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;I opened this blog’s file (191 lines) and went through all 12 rules. Here’s the picture:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;In my &lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Think before coding&lt;/td&gt;
&lt;td&gt;indirectly&lt;/td&gt;
&lt;td&gt;via &lt;code&gt;architect → critic&lt;/code&gt; workflow in agent stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Simplicity&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;No classes&lt;/code&gt;, &lt;code&gt;Immutability by default&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Surgical changes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Prohibitions&lt;/code&gt; (deprecated &lt;code&gt;@astrojs/tailwind&lt;/code&gt;, &lt;code&gt;node:*-alpine&lt;/code&gt;, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Goal-driven&lt;/td&gt;
&lt;td&gt;indirectly&lt;/td&gt;
&lt;td&gt;via subagent structure, not as separate rule&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Judgment-only&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6. Token budgets&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7. Surface conflicts&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8. Read before write&lt;/td&gt;
&lt;td&gt;partially&lt;/td&gt;
&lt;td&gt;GitNexus section requires impact analysis before edits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9. Test intent&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10. Checkpoints&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11. Match conventions&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Code Standards → TypeScript / Astro / Git&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12. Fail loud&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Result: three covered, two indirect, one partial, six missing. The file is effectively Karpathy-level, without the 2026 superstructure.&lt;/p&gt;
&lt;p&gt;Which of the missing ones make sense to add specifically for an Astro blog with publications through an admin panel:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rule 6 (budgets)&lt;/strong&gt; — yes, my agents do long-running tasks (generating EN translations via &lt;code&gt;pnpm translate&lt;/code&gt;, migrations). Without a budget, a session can drift.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rule 9 (test intent)&lt;/strong&gt; — yes, I have Vitest and Playwright, the risk of shallow tests is real.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rule 10 (checkpoints)&lt;/strong&gt; — yes, multi-step tasks on schema + migrations + UI updates regularly take half an hour of agent work.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rule 12 (fail loud)&lt;/strong&gt; — yes, in the admin panel “saved” often doesn’t mean “published”, need explicit surfacing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Rule 7 is less acute for a single project. Rule 5 is covered by the fact that there’s no AI routing in the blog runtime: the model makes no runtime decisions on behalf of the code.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;7. How to add — without bloat&lt;/h2&gt;
&lt;p&gt;Discipline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Don’t exceed 200 lines total.&lt;/strong&gt; Counting stack, commands, prohibitions, rules. I’m at 191 now — adding four rules means moving part of &lt;code&gt;Homepage&lt;/code&gt; or GitNexus section to &lt;code&gt;@docs/...&lt;/code&gt; via Claude Code @-import.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Each rule answers “what error does it prevent”.&lt;/strong&gt; If it doesn’t — delete it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Capability-agnostic formulations.&lt;/strong&gt; “Match the enforced style”, not “use prettier”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Imperatives, not wishes.&lt;/strong&gt; “State assumptions explicitly”, not “think carefully”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test it.&lt;/strong&gt; Run a typical task before and after. No difference — the rule didn’t work in your context, delete it.&lt;/li&gt;
&lt;/ol&gt;
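&lt;p&gt;For point 1, Claude Code’s @-imports carry the weight: bulky sections move to separate files and are pulled back into context at load time. A sketch of a trimmed file (the paths are assumptions about my repo layout):&lt;/p&gt;

```markdown
# CLAUDE.md (trimmed)

## Rules
- State assumptions explicitly before writing code.

## Moved out, still imported into context
@docs/homepage.md
@docs/gitnexus.md
```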
&lt;p&gt;Six rules tailored to real errors beat twelve generic ones.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Karpathy pinned three code-writing failure modes from January. Forrest Chang packed them into four rules, and the community grabbed the template. The expansion to 12 came from the Claude Code landscape being different by May: multi-step agents, hook cascades, skill conflicts, cross-session flows. The eight added rules cover new gaps without replacing the original ones.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is not a wishlist, but a behavioral contract against specific errors you’ve already seen. Someone else’s template is useful as a starter. After that — filter it for your failure modes, not the other way around. Six rules precisely chosen beat twelve copied ones.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/karpathy/status/1885018475234567890&quot;&gt;Andrej Karpathy — original thread on X (January 2026)&lt;/a&gt; — three code-writing failure modes&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/forrestchang/andrej-karpathy-skills&quot;&gt;forrestchang/andrej-karpathy-skills&lt;/a&gt; — public repo with the basic 4-rule template&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.claude.com/en/docs/claude-code/&quot;&gt;Anthropic Claude Code docs — CLAUDE.md&lt;/a&gt; — official documentation on file structure; the file is advisory, with roughly 80% observed compliance&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>ai</category><category>claude-code</category><category>prompt-engineering</category><author>a@artka.dev (Artyom)</author></item><item><title>ds4 by antirez: local coding agent on DeepSeek V4 Flash that runs on MacBook</title><link>https://artka.dev/en/blog/local-coding-agent/</link><guid isPermaLink="true">https://artka.dev/en/blog/local-coding-agent/</guid><description>The creator of Redis wrote an inference engine in two weeks for just one model — DeepSeek V4 Flash. 1M context, 26 t/s on M3 Max, KV-cache on disk. How to run it and connect it to Claude Code.</description><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Garry Tan and Bindu Reddy on May 9, 2026 simultaneously shared the same news: Redis creator Salvatore Sanfilippo (antirez) released &lt;a href=&quot;https://github.com/antirez/ds4&quot;&gt;&lt;code&gt;ds4&lt;/code&gt;&lt;/a&gt; — an inference engine in C+Metal that runs DeepSeek V4 Flash (284B MoE, 1M context) on a laptop. Not “technically possible,” but “works with coding agents at 26 t/s”. I figured out what’s under the hood and how to use it as a local backend for Claude Code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. What happened in two weeks&lt;/h2&gt;
&lt;p&gt;On April 24, 2026, DeepSeek released the V4 series. V4 Flash is an efficiency model: 284 billion parameters total, 13 billion active (MoE), 1 million token context. Before this, models of this size only lived in the cloud.&lt;/p&gt;
&lt;p&gt;Antirez looked at this and made a bet that universal runners can’t make. He forked &lt;code&gt;llama.cpp&lt;/code&gt;, spent two weeks inside it, understood the geometry of V4 Flash, &lt;strong&gt;threw out everything unnecessary&lt;/strong&gt;, and wrote an engine from scratch in 4 files: &lt;code&gt;ds4.c&lt;/code&gt; (core inference), &lt;code&gt;ds4_metal.m&lt;/code&gt; (Metal kernels), &lt;code&gt;ds4_server.c&lt;/code&gt; (HTTP server), &lt;code&gt;ds4_cli.c&lt;/code&gt; (REPL). On the outside, all of this speaks two protocols simultaneously: OpenAI Chat Completions (&lt;code&gt;/v1/chat/completions&lt;/code&gt;) and Anthropic Messages (&lt;code&gt;/v1/messages&lt;/code&gt;). That is, it connects to any agent that knows one of them.&lt;/p&gt;
&lt;p&gt;Results that the author measured himself:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;Quant&lt;/th&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Prefill&lt;/th&gt;
&lt;th&gt;Generation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M3 Max, 128 GB&lt;/td&gt;
&lt;td&gt;q2&lt;/td&gt;
&lt;td&gt;short&lt;/td&gt;
&lt;td&gt;58.52 t/s&lt;/td&gt;
&lt;td&gt;26.68 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro M3 Max, 128 GB&lt;/td&gt;
&lt;td&gt;q2&lt;/td&gt;
&lt;td&gt;11709 tokens&lt;/td&gt;
&lt;td&gt;250.11 t/s&lt;/td&gt;
&lt;td&gt;21.47 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Studio M3 Ultra, 512 GB&lt;/td&gt;
&lt;td&gt;q2&lt;/td&gt;
&lt;td&gt;short&lt;/td&gt;
&lt;td&gt;84.43 t/s&lt;/td&gt;
&lt;td&gt;36.86 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Studio M3 Ultra, 512 GB&lt;/td&gt;
&lt;td&gt;q4&lt;/td&gt;
&lt;td&gt;12018 tokens&lt;/td&gt;
&lt;td&gt;448.82 t/s&lt;/td&gt;
&lt;td&gt;26.62 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;26 tokens per second of generation is not “worth a look” territory; it is &lt;strong&gt;working speed for a coding agent&lt;/strong&gt; that writes code, reads files, and calls tools. On a long prompt, generation drops to 21 t/s, but thanks to the KV-cache on disk this pays for itself by the third request in the same session.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;2. Three engineering tricks that make this possible&lt;/h2&gt;
&lt;p&gt;I carefully read the README and &lt;code&gt;AGENT.md&lt;/code&gt; of the repository, and below is the most essential, without which ds4 wouldn’t work.&lt;/p&gt;
&lt;h3&gt;2.1. Asymmetric 2-bit quantization&lt;/h3&gt;
&lt;p&gt;The standard approach to 2-bit quantization is to compress everything down to 2 bits, and then the model starts hallucinating in tool calling, confusing arguments, and forgetting the schema. Antirez did it differently: &lt;strong&gt;only MoE experts on the routed path are quantized&lt;/strong&gt; (&lt;code&gt;up&lt;/code&gt;/&lt;code&gt;gate&lt;/code&gt; in &lt;code&gt;IQ2_XXS&lt;/code&gt;, &lt;code&gt;down&lt;/code&gt; in &lt;code&gt;Q2_K&lt;/code&gt;) — because they take up most of the weight (the model is 284B, and almost all of it is experts). Shared experts, projections, routing — remain in Q8. These are components where loss of precision is expensive.&lt;/p&gt;
&lt;p&gt;Effect: the 2-bit quant weighs 81 GB and fits into the 128 GB of unified memory on a MacBook Pro M3 Max, while reliably working in coding agents (validated by tests against official DeepSeek API logits).&lt;/p&gt;
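&lt;p&gt;A back-of-envelope check of that figure. The split below between Q8 and ~2-bit parameters is my assumption, not a number from the README:&lt;/p&gt;

```shell
#!/bin/sh
# Rough size estimate: routed experts at ~2.06 bits/param (IQ2_XXS/Q2_K mix),
# the rest at 8 bits. The 12B "rest" figure is a guess for illustration.
total=284      # billions of parameters
q8=12          # assumed: shared experts, projections, routing (~1 byte/param)
q2=$((total - q8))
# integer math in whole GB: billions of params * bytes per param
size_gb=$(( q8 * 100 / 100 + q2 * 26 / 100 ))   # 2-bit mix ~ 0.26 byte/param
echo "estimated model size: ~${size_gb} GB"
```

&lt;p&gt;The estimate lands within a gigabyte of the reported 81 GB, which is why the asymmetry is affordable: the Q8 components are a rounding error next to the experts.&lt;/p&gt;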
&lt;h3&gt;2.2. KV-cache as first-class disk citizen&lt;/h3&gt;
&lt;p&gt;The main pain of stateless API protocols like Chat Completions: the client &lt;strong&gt;sends the entire history every time&lt;/strong&gt;, and the server must prefill it from scratch. Claude Code, for example, sends ~25K tokens of system prompt at startup. On local hardware, this is tens of seconds before the first token.&lt;/p&gt;
&lt;p&gt;ds4 solves this head-on: after a successful prefill, the session state (a KV checkpoint) is serialized to a file, keyed by the SHA1 of the token IDs. When the next request arrives with the same prefix, the server loads the checkpoint from disk and skips prefill. From the README:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The KV cache &lt;strong&gt;is actually a first class disk citizen&lt;/strong&gt;. &amp;lt;…&amp;gt; Modern MacBooks have fast SSDs and compressed KV caches like the one of DeepSeek v4.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In practice, this means the difference between “4 seconds to first token on repeat call” and “60 seconds”. The disk here is not swap under pressure, but logical storage: SSDs are fast enough, KV in DeepSeek V4 compresses well, and the characteristic “same system prompt + changing tail” precisely describes how a coding agent works.&lt;/p&gt;
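&lt;p&gt;The lookup can be sketched in a few lines of shell. The file layout is invented, but the mechanism follows the description above: key = SHA1 of the token-ID prefix, hit = skip prefill.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of a disk KV lookup (file names are my invention, not ds4's).
KV_DIR=$(mktemp -d)

kv_key() {
  # Key = SHA1 over the token IDs of the request prefix.
  printf '%s' "$1" | sha1sum | cut -d' ' -f1   # macOS: shasum -a 1
}

tokens="151643 9906 374 264 1296"   # fake token IDs for the shared prefix
key=$(kv_key "$tokens")

if [ -f "$KV_DIR/$key.kv" ]; then
  echo "hit: restore checkpoint, skip prefill"
else
  echo "miss: run prefill, then checkpoint"
  : > "$KV_DIR/$key.kv"             # serialize KV state after prefill
fi
```

&lt;p&gt;On the next run with the same prefix, the same key comes back and the hit branch skips prefill entirely.&lt;/p&gt;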
&lt;h3&gt;2.3. Metal-only and one model at a time&lt;/h3&gt;
&lt;p&gt;No CUDA, no CPU fallback for production (the CPU path exists only for correctness checks and currently crashes at the macOS kernel level due to a VM bug; antirez is upfront about this). No attempt at a “universal runner”. Only Apple Silicon, only this one model, and it stays that way until a new version of V4 Flash appears or a much better model of the same class does.&lt;/p&gt;
&lt;p&gt;The cost is a narrow bet. The benefit is that you don’t need to maintain a matrix of &lt;code&gt;(model × hardware × quant)&lt;/code&gt;, and you can optimize Metal kernels for the exact geometry of layers in this specific model.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;3. What I’ll need: hardware, model, an hour of time&lt;/h2&gt;
&lt;p&gt;I plan to deploy this on a &lt;strong&gt;MacBook Pro M3 Max, 128 GB&lt;/strong&gt; (the minimally viable configuration according to README). I don’t have it yet, and in this section — an honest plan of what I’ll do when the hardware arrives; the numbers are taken from antirez’s benchmarks, but I want to double-check them on my instance.&lt;/p&gt;
&lt;p&gt;Minimum requirements by my estimates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A current version of macOS (there’s a VM bug in the CPU path, but the Metal path is unaffected).&lt;/li&gt;
&lt;li&gt;Apple Silicon with 128 GB+ unified memory. M3 Max or M3 Ultra.&lt;/li&gt;
&lt;li&gt;~100 GB of free space: 81 GB for the model itself in Q2, plus room for the KV-cache on disk. For Q4 quantization: 256 GB+ RAM and ~150 GB on disk.&lt;/li&gt;
&lt;li&gt;Xcode Command Line Tools (for clang/Metal headers).&lt;/li&gt;
&lt;li&gt;~30–60 minutes to download the model (depends on your connection).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A caveat for newcomers: 128 GB of unified memory is top-spec MacBook Pro M3 Max or Mac Studio territory. On a 64 GB Mac, Q2 won’t work: the model simply won’t fit in RAM. This is not “slow,” this is “no way.”&lt;/p&gt;
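&lt;p&gt;The “no way” is plain arithmetic. A sketch, where the headroom figure is my guess, not a measured number:&lt;/p&gt;

```shell
#!/bin/sh
# Will the Q2 model load? Unified memory must hold weights + KV cache + OS.
ram_gb=64
model_gb=81       # Q2 quant size from the README
headroom_gb=10    # my guess: in-RAM KV working set + macOS itself

if [ $((model_gb + headroom_gb)) -gt "$ram_gb" ]; then
  echo "$ram_gb GB: will not load (need >= $((model_gb + headroom_gb)) GB)"
else
  echo "$ram_gb GB: ok"
fi
```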
&lt;hr&gt;
&lt;h2&gt;4. Installation step by step&lt;/h2&gt;
&lt;p&gt;The commands below are what I’ll do on day one, based on the README instructions. Where the description lacks specifics — I’ve added my own comments.&lt;/p&gt;
&lt;h3&gt;4.1. Building&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# 1. Clone the repository
git clone https://github.com/antirez/ds4.git
cd ds4

# 2. Download the 2-bit quant (81 GB; for a 128 GB MBP)
./download_model.sh q2

# The script downloads from huggingface.co/antirez/deepseek-v4-gguf
# and supports resume via curl -C -, so you can interrupt and continue.
# For the 4-bit quant (Mac Studio, 256+ GB), use ./download_model.sh q4.

# 3. Build
make

# Verify the build:
./ds4 --help
./ds4-server --help
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Building is a regular &lt;code&gt;make&lt;/code&gt;, no CMake, no pkg-config. This is intentional: the project has no dependencies outside the Apple SDK.&lt;/p&gt;
&lt;h3&gt;4.2. First run in REPL&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./ds4 -p &amp;quot;Explain Redis streams in one paragraph.&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without &lt;code&gt;-p&lt;/code&gt;, it launches an interactive session with commands &lt;code&gt;/help&lt;/code&gt;, &lt;code&gt;/think&lt;/code&gt;, &lt;code&gt;/think-max&lt;/code&gt;, &lt;code&gt;/nothink&lt;/code&gt;, &lt;code&gt;/ctx N&lt;/code&gt;, &lt;code&gt;/read FILE&lt;/code&gt;, &lt;code&gt;/quit&lt;/code&gt;. This is good for checking that the engine is alive and for comparing generation speed against the claimed 26 t/s.&lt;/p&gt;
&lt;h3&gt;4.3. Running as HTTP server&lt;/h3&gt;
&lt;p&gt;This is the mode where ds4 becomes a local backend for agents:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./ds4-server \
  --ctx 100000 \
  --kv-disk-dir /tmp/ds4-kv \
  --kv-disk-space-mb 8192
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--ctx 100000&lt;/code&gt; — context window of 100K tokens. The full 1M context takes ~26 GB just for the indexer; on a 128 GB Mac where 81 GB is already taken by the model, this leaves no room for KV-cache. 100–300K is a reasonable compromise.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--kv-disk-dir /tmp/ds4-kv&lt;/code&gt; — directory for disk KV-cache. I’d move it to a fast SSD (external or built-in — both are fine).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--kv-disk-space-mb 8192&lt;/code&gt; — limit on cache size. 8 GB is enough for one or two active projects; for larger sessions — increase it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The server listens on &lt;code&gt;127.0.0.1:8000&lt;/code&gt;. Endpoints:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI Chat Completions (+ tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;POST /v1/completions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI legacy completions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;POST /v1/messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic Messages (for Claude Code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GET /v1/models&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list of models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Authentication via static API key (by default accepts any; README recommends &lt;code&gt;dsv4-local&lt;/code&gt;).&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;5. Connecting as a coding agent&lt;/h2&gt;
&lt;p&gt;This is the part that got me digging into the topic. All three methods below work simultaneously — each agent talks to the same &lt;code&gt;ds4-server&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;5.1. Claude Code → Anthropic-compatible endpoint&lt;/h3&gt;
&lt;p&gt;Claude Code can talk to any backend that exposes the Anthropic Messages API. Create a wrapper &lt;code&gt;~/bin/claude-ds4&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/bin/sh
unset ANTHROPIC_API_KEY

export ANTHROPIC_BASE_URL=&amp;quot;${DS4_ANTHROPIC_BASE_URL:-http://127.0.0.1:8000}&amp;quot;
export ANTHROPIC_AUTH_TOKEN=&amp;quot;${DS4_API_KEY:-dsv4-local}&amp;quot;
export ANTHROPIC_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;

# Map every Sonnet/Haiku/Opus alias to the local model,
# so that /model in Claude Code never falls back to the cloud.
export ANTHROPIC_DEFAULT_SONNET_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;
export ANTHROPIC_DEFAULT_HAIKU_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;
export ANTHROPIC_DEFAULT_OPUS_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;
export CLAUDE_CODE_SUBAGENT_MODEL=&amp;quot;deepseek-v4-flash&amp;quot;

# Disable telemetry and the non-streaming fallback.
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export CLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK=1
export CLAUDE_STREAM_IDLE_TIMEOUT_MS=600000

exec &amp;quot;$HOME/.local/bin/claude&amp;quot; &amp;quot;$@&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;chmod +x ~/bin/claude-ds4&lt;/code&gt; — and run Claude Code as &lt;code&gt;claude-ds4&lt;/code&gt; instead of &lt;code&gt;claude&lt;/code&gt;. All requests will go to the local ds4 server. A subtlety that antirez himself points out:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Claude Code may send a large initial prompt, often around 25k tokens, before it starts doing useful work. Keep &lt;code&gt;--kv-disk-dir&lt;/code&gt; enabled.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Without disk KV-cache, cold startup of Claude Code will take a minute or more; with cache — after the first startup, subsequent ones will restore from disk.&lt;/p&gt;
&lt;h3&gt;5.2. opencode&lt;/h3&gt;
&lt;p&gt;opencode is configured via &lt;code&gt;~/.config/opencode/opencode.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;$schema&amp;quot;: &amp;quot;https://opencode.ai/config.json&amp;quot;,
  &amp;quot;provider&amp;quot;: {
    &amp;quot;ds4&amp;quot;: {
      &amp;quot;name&amp;quot;: &amp;quot;ds4.c (local)&amp;quot;,
      &amp;quot;npm&amp;quot;: &amp;quot;@ai-sdk/openai-compatible&amp;quot;,
      &amp;quot;options&amp;quot;: {
        &amp;quot;baseURL&amp;quot;: &amp;quot;http://127.0.0.1:8000/v1&amp;quot;,
        &amp;quot;apiKey&amp;quot;: &amp;quot;dsv4-local&amp;quot;
      },
      &amp;quot;models&amp;quot;: {
        &amp;quot;deepseek-v4-flash&amp;quot;: {
          &amp;quot;name&amp;quot;: &amp;quot;DeepSeek V4 Flash (ds4.c local)&amp;quot;,
          &amp;quot;limit&amp;quot;: { &amp;quot;context&amp;quot;: 100000, &amp;quot;output&amp;quot;: 384000 }
        }
      }
    }
  },
  &amp;quot;agent&amp;quot;: {
    &amp;quot;ds4&amp;quot;: {
      &amp;quot;description&amp;quot;: &amp;quot;DeepSeek V4 Flash served by local ds4-server&amp;quot;,
      &amp;quot;model&amp;quot;: &amp;quot;ds4/deepseek-v4-flash&amp;quot;,
      &amp;quot;temperature&amp;quot;: 0
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;limit.context: 100000&lt;/code&gt; must match the &lt;code&gt;--ctx&lt;/code&gt; value &lt;code&gt;ds4-server&lt;/code&gt; is started with; otherwise the server truncates silently, opencode never learns about it, and the next message is built against a context length that doesn’t actually exist.&lt;/p&gt;
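A guard for this can live in the script that launches the stack. The snippet below is a sketch under assumptions: `assertContextMatch` is a hypothetical helper, and the config shape follows the `opencode.json` above; reading the file from disk is left to the caller.

```typescript
// Sketch: fail loudly when opencode's advertised context differs from the
// --ctx the local ds4-server was started with, instead of letting the
// server truncate silently mid-session.
const assertContextMatch = (rawConfig: string, serverCtx: number): void => {
  const cfg = JSON.parse(rawConfig);
  const limit = cfg.provider?.ds4?.models?.["deepseek-v4-flash"]?.limit;
  if (limit?.context !== serverCtx) {
    throw new Error(
      `context mismatch: opencode says ${limit?.context}, server runs --ctx ${serverCtx}`
    );
  }
};

// Matches the config above: both sides agree on 100000.
const raw = JSON.stringify({
  provider: {
    ds4: { models: { "deepseek-v4-flash": { limit: { context: 100000, output: 384000 } } } },
  },
});
assertContextMatch(raw, 100000); // passes silently
```

`assertContextMatch(raw, 131072)` would throw, which is the whole point: a mismatch should stop the launch, not surface later as silent truncation.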
&lt;h3&gt;5.3. Pi (antirez’s mini-agent)&lt;/h3&gt;
&lt;p&gt;If you use Pi — the format is slightly different, config in &lt;code&gt;~/.pi/agent/models.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;providers&amp;quot;: {
    &amp;quot;ds4&amp;quot;: {
      &amp;quot;name&amp;quot;: &amp;quot;ds4.c local&amp;quot;,
      &amp;quot;baseUrl&amp;quot;: &amp;quot;http://127.0.0.1:8000/v1&amp;quot;,
      &amp;quot;api&amp;quot;: &amp;quot;openai-completions&amp;quot;,
      &amp;quot;apiKey&amp;quot;: &amp;quot;dsv4-local&amp;quot;,
      &amp;quot;compat&amp;quot;: {
        &amp;quot;supportsStore&amp;quot;: false,
        &amp;quot;supportsDeveloperRole&amp;quot;: false,
        &amp;quot;supportsReasoningEffort&amp;quot;: true,
        &amp;quot;supportsUsageInStreaming&amp;quot;: true,
        &amp;quot;maxTokensField&amp;quot;: &amp;quot;max_tokens&amp;quot;,
        &amp;quot;thinkingFormat&amp;quot;: &amp;quot;deepseek&amp;quot;,
        &amp;quot;requiresReasoningContentOnAssistantMessages&amp;quot;: true
      },
      &amp;quot;models&amp;quot;: [
        {
          &amp;quot;id&amp;quot;: &amp;quot;deepseek-v4-flash&amp;quot;,
          &amp;quot;name&amp;quot;: &amp;quot;DeepSeek V4 Flash (ds4.c local)&amp;quot;,
          &amp;quot;reasoning&amp;quot;: true,
          &amp;quot;contextWindow&amp;quot;: 100000,
          &amp;quot;maxTokens&amp;quot;: 384000,
          &amp;quot;cost&amp;quot;: { &amp;quot;input&amp;quot;: 0, &amp;quot;output&amp;quot;: 0, &amp;quot;cacheRead&amp;quot;: 0, &amp;quot;cacheWrite&amp;quot;: 0 }
        }
      ]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;cost: 0&lt;/code&gt; is not marketing; it is literally true. Each request costs electricity and SSD wear, not tokens.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;6. Where this will break (important pitfalls)&lt;/h2&gt;
&lt;p&gt;Real limitations you will run into, and workarounds for them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context window must be agreed upon everywhere.&lt;/strong&gt; You start the server with &lt;code&gt;--ctx 100000&lt;/code&gt;, set &lt;code&gt;limit.context: 100000&lt;/code&gt; in opencode, don’t go beyond that in Claude Code’s system prompt. If Claude Code’s init-prompt is ~25K, then 75K remains for the project — realistically enough for a medium codebase, but not for huge repositories.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disk KV-cache is “tied” to the exact prefix.&lt;/strong&gt; Any edit to the system prompt, to &lt;code&gt;CLAUDE.md&lt;/code&gt;, to the first messages — invalidates the checkpoint. This is not a bug, it’s by design: matching is done by SHA1 of token IDs. If you often edit &lt;code&gt;CLAUDE.md&lt;/code&gt;, expect cold starts. Solution — commit the system contract and don’t edit it in every session.&lt;/p&gt;
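The invalidation rule can be sketched in a few lines. This is an illustration only: ds4's real checkpoint matching is implemented in C, and `checkpointKey` below is a hypothetical helper that mimics the described idea (a SHA1 over the exact sequence of token IDs), not ds4's API.

```typescript
// Hypothetical sketch of prefix-keyed checkpoints: the cache key is a SHA1
// over the exact token-ID sequence, so any edit anywhere in the prefix
// (system prompt, CLAUDE.md, first messages) produces a different key.
import { createHash } from "node:crypto";

const checkpointKey = (tokenIds: number[]): string => {
  const buf = Buffer.alloc(tokenIds.length * 4);
  tokenIds.forEach((id, i) => buf.writeUInt32LE(id, i * 4));
  return createHash("sha1").update(buf).digest("hex");
};

const yesterday = [101, 7, 42, 9000]; // tokenized system prompt + CLAUDE.md
const today = [101, 7, 43, 9000]; // one token changed after editing CLAUDE.md

// A single differing token means the stored KV state cannot be reused:
// the session falls back to a cold start.
console.log(checkpointKey(yesterday) === checkpointKey(today)); // false
```

The practical consequence is exactly the advice above: freeze the system contract and the cache key stays stable across sessions.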
&lt;p&gt;&lt;strong&gt;MTP/speculative decoding doesn’t provide much speedup yet.&lt;/strong&gt; The README directly states: “currently provides at most a slight speedup”. Don’t count on doubling speed from MTP — the current implementation is correctness-gated and often triggers partial accept on complex prompts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;One live KV-cache in memory.&lt;/strong&gt; The server currently doesn’t batch independent requests. If two agents make requests simultaneously — the second waits for the first. This is a normal trade-off for a local single-user setup, but if you want parallel multi-tenancy on one Mac — ds4 isn’t there yet.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CPU mode crashes on fresh macOS.&lt;/strong&gt; This is about the debug path, not production (Metal-only is the main target), but if you are tempted to compare inference on CPU out of habit, don’t: you’ll get a kernel panic and have to reboot.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;7. What this means: vertical inference engines as a trend&lt;/h2&gt;
&lt;p&gt;The main thing is not ds4 itself, but the pattern that antirez formalized.&lt;/p&gt;
&lt;p&gt;Local inference currently looks like “universal runner &lt;code&gt;+&lt;/code&gt; thousands of models in GGUF &lt;code&gt;+&lt;/code&gt; wrappers of varying freshness”. It works, but moves at the speed of the least popular model: it’s easier to speed up Llama 3.1 in llama.cpp than to add efficient support for DeepSeek V4, because in the first case the layer structure matches twenty other models, while in the second it appears exactly once.&lt;/p&gt;
&lt;p&gt;Antirez shows the opposite path. &lt;strong&gt;One engine — one model — one scenario (coding agent)&lt;/strong&gt;. Next you need three things, and all three are in the product:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Inference engine with HTTP API.&lt;/li&gt;
&lt;li&gt;GGUF specially prepared for this engine and its assumptions.&lt;/li&gt;
&lt;li&gt;Tests and validation on the coupling with specific agent clients.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If this bet works (and the benchmarks say it does), the future of local inference is not “yet another abstraction on top of abstraction,” but &lt;strong&gt;“each important model gets its own ds4-like project”.&lt;/strong&gt; When V4.1 or V5 comes out, someone from the community makes a new engine, new GGUF, new tests, and in two weeks users already have a working local setup. Old engines retire along with old models.&lt;/p&gt;
&lt;p&gt;And second. In the README, antirez explicitly writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This software is developed with strong assistance from GPT 5.5 and with humans leading the ideas, testing, and debugging.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Two weeks from forking &lt;code&gt;llama.cpp&lt;/code&gt; to a production-ready narrow engine with a server API is not something you do without AI, and antirez says so directly. This shift, “one person + AI = infrastructure for an entire model in two weeks”, interests me more than the t/s numbers themselves.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;ds4&lt;/code&gt; from antirez is not “yet another local inference.” It’s a narrow bet: one engine, one model (DeepSeek V4 Flash), one hardware architecture (Apple Silicon with Metal), one scenario (coding agent). Thanks to asymmetric 2-bit quantization, a 284B model fits in a 128 GB MacBook; thanks to the disk KV-cache, it works with agents that send 25K-token system prompts; thanks to OpenAI/Anthropic compatibility, it connects to Claude Code, opencode, and Pi out of the box.&lt;/p&gt;
&lt;p&gt;If you have a Mac with 128 GB+ — this is a working local backend for serious commercial work with private code. If not — wait for DDR5 and unified memory on Linux/CUDA, or watch who next repeats this pattern for their “model + hardware” combination.&lt;/p&gt;
&lt;p&gt;In any case, it’s worth watching. I’m betting that in a year, half of serious local setups will be built this way.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/antirez/ds4&quot;&gt;github.com/antirez/ds4&lt;/a&gt; — README, benchmarks, configs&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/garrytan/status/2052996691586932783&quot;&gt;Garry Tan — post on X (May 9, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/bindureddy/status/2052982206344409242&quot;&gt;Bindu Reddy — post on X (May 9, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://eu.36kr.com/en/p/3800327282662656&quot;&gt;QbitAI / 36kr: Redis Father Steps In to Build Dedicated Inference Engine for DeepSeek V4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=48050751&quot;&gt;HN: DeepSeek 4 Flash local inference engine for Metal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://huggingface.co/antirez/deepseek-v4-gguf&quot;&gt;huggingface.co/antirez/deepseek-v4-gguf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>ai</category><category>local-inference</category><category>coding-agents</category><category>deepseek</category><category>apple-silicon</category><author>a@artka.dev (Артём)</author></item><item><title>JSON-LD @graph in Astro: from duplicated inline-blocks to a single citable-node</title><link>https://artka.dev/en/blog/json-ld-graph-astro/</link><guid isPermaLink="true">https://artka.dev/en/blog/json-ld-graph-astro/</guid><description>Step-by-step breakdown of migration from per-page Schema.org-blocks to a single @graph in BaseLayout: stable @id, entity references, articleBody-excerpt and FAQ.</description><pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Most Schema.org guides for blogs teach: put &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; with &lt;code&gt;BlogPosting&lt;/code&gt; on the post, &lt;code&gt;WebSite&lt;/code&gt; on the homepage, &lt;code&gt;Person&lt;/code&gt; on the about page. It works, but it loses out on citability. A crawler sees the &lt;code&gt;Person&lt;/code&gt; from &lt;code&gt;BlogPosting.author&lt;/code&gt; as “someone named X”, not as an entity that is also &lt;code&gt;founder of #organization&lt;/code&gt;, which is &lt;code&gt;publisher of #blog&lt;/code&gt;. This post is a step-by-step breakdown of how to replace per-page inline blocks with a single &lt;code&gt;@graph&lt;/code&gt; in &lt;code&gt;BaseLayout&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. Why change — citability vs SERP&lt;/h2&gt;
&lt;p&gt;Structured data for a developer-blogger is usually associated with one question: “will my post appear in Google with a rich snippet?”. Any valid &lt;code&gt;BlogPosting&lt;/code&gt; is enough for that task — it will pass the Rich Results Test, stars/breadcrumb will appear. And it often ends there: added &lt;code&gt;@type: BlogPosting&lt;/code&gt;, checked in the validator, forgot about it.&lt;/p&gt;
&lt;p&gt;In 2026, structured data has acquired a new, more demanding consumer — &lt;strong&gt;LLM crawler&lt;/strong&gt;, which collects content for retrieval-augmented generation and for citation. It doesn’t need “another rich snippet”, but a &lt;strong&gt;coherent entity graph&lt;/strong&gt;: so that when an author is mentioned in one post, it recognizes the same author in another, so that the organization-publisher is the same object across the entire site, so that the blog as an entity links back to the author.&lt;/p&gt;
&lt;p&gt;An LLM issuing a citation does roughly the following: extract a passage, check the surrounding entity markup, try to match the author with a known entity. If &lt;code&gt;Person.name = &amp;quot;Artem Kashuta&amp;quot;&lt;/code&gt; appears on a site in three different Schema.org blocks without a common &lt;code&gt;@id&lt;/code&gt;, the crawler must guess whether it’s one person or three. But if there’s one &lt;code&gt;Person#person&lt;/code&gt; with a stable URI, and all other nodes (&lt;code&gt;Organization.founder&lt;/code&gt;, &lt;code&gt;BlogPosting.author&lt;/code&gt;, &lt;code&gt;Blog.author&lt;/code&gt;) reference it through &lt;code&gt;{&amp;quot;@id&amp;quot;: &amp;quot;...&amp;quot;}&lt;/code&gt;, no guessing is needed: the graph is assembled by the author.&lt;/p&gt;
&lt;p&gt;This is a problem that keyword density doesn’t solve. This is &lt;strong&gt;entity disambiguation&lt;/strong&gt;, and it’s solved by &lt;strong&gt;graph topology&lt;/strong&gt;.&lt;/p&gt;
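The two shapes side by side, as a sketch (the URIs follow this site's convention; the node set is trimmed and the field values are illustrative):

```typescript
// Shape 1: inline copies. Two Person blocks share a name but no identifier,
// so a crawler has to guess whether they describe the same entity.
const inlineA = { "@type": "Person", name: "Artem Kashuta" };
const inlineB = { "@type": "Person", name: "Artem Kashuta", jobTitle: "Backend engineer" };

// Shape 2: one Person node with a stable @id; every other node points at it.
const personId = "https://artka.dev/#person";
const graph = {
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Person", "@id": personId, name: "Artem Kashuta" },
    { "@type": "Organization", "@id": "https://artka.dev/#brand", founder: { "@id": personId } },
    { "@type": "BlogPosting", author: { "@id": personId } },
  ],
};

// One definition plus two references, all resolving to the same URI.
const occurrences = JSON.stringify(graph).split(personId).length - 1;
console.log(occurrences); // 3
```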
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Per-page inline blocks&lt;/th&gt;
&lt;th&gt;Single &lt;code&gt;@graph&lt;/code&gt; with &lt;code&gt;@id&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google Rich Results&lt;/td&gt;
&lt;td&gt;works&lt;/td&gt;
&lt;td&gt;works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM entity match (Person)&lt;/td&gt;
&lt;td&gt;guess by name&lt;/td&gt;
&lt;td&gt;guaranteed via &lt;code&gt;@id&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data duplication&lt;/td&gt;
&lt;td&gt;3-5 copies of &lt;code&gt;Person&lt;/code&gt; per 14 posts&lt;/td&gt;
&lt;td&gt;one source per site&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost of author edit&lt;/td&gt;
&lt;td&gt;14 files&lt;/td&gt;
&lt;td&gt;1 file (&lt;code&gt;person.ts&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTML weight&lt;/td&gt;
&lt;td&gt;3+ scripts per page&lt;/td&gt;
&lt;td&gt;1 script&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For the SERP-only era, the first approach was enough. For the era of AI-overviews, citation graphs, and retrieval-augmented search — you need the second. Our blog’s spec states this directly: “move all entity definitions into &lt;code&gt;src/lib/seo/schema.ts&lt;/code&gt; returning a single &lt;code&gt;@graph&lt;/code&gt; JSON-LD block; pages contribute a &lt;code&gt;BlogPosting&lt;/code&gt;/&lt;code&gt;WebPage&lt;/code&gt; node referencing the global &lt;code&gt;Person#me&lt;/code&gt; and &lt;code&gt;Organization#brand&lt;/code&gt; by &lt;code&gt;@id&lt;/code&gt;” — see &lt;code&gt;docs/superpowers/specs/2026-05-02-llm-citable-blog-design.md&lt;/code&gt; § “Schema-graph design”.&lt;/p&gt;
&lt;h2&gt;2. Antipattern: per-page inline schema&lt;/h2&gt;
&lt;p&gt;What does a default Astro blog emit when built from a random dev.to tutorial? Usually this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In &lt;code&gt;BaseLayout.astro&lt;/code&gt; there’s an inline script with &lt;code&gt;WebSite&lt;/code&gt; and sometimes &lt;code&gt;Organization&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;PostLayout.astro&lt;/code&gt; there’s another inline script with &lt;code&gt;BlogPosting&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the author got carried away — a third script is added with &lt;code&gt;BreadcrumbList&lt;/code&gt;. Sometimes a fourth with &lt;code&gt;Person&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why does this happen? Because Astro layouts nest hierarchically, and each level conveniently “adds” its own portion of data through its own &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt;. This works locally, but doesn’t scale. In our repository before Plan 1, it was exactly this: &lt;code&gt;BaseLayout&lt;/code&gt; emitted one JSON-LD block, &lt;code&gt;PostLayout&lt;/code&gt; added two more on top:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Pre-Plan 1 (commit 5ed281c~1):
$ git show 5ed281c~1:src/layouts/BaseLayout.astro | grep -c application/ld+json
1
$ git show 5ed281c~1:src/layouts/PostLayout.astro | grep -c application/ld+json
2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is, the post page contained &lt;strong&gt;three&lt;/strong&gt; &lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt; blocks. Each had its own &lt;code&gt;Person&lt;/code&gt; (some complete, some truncated), with no common &lt;code&gt;@id&lt;/code&gt; and no cross-references. A crawler landing on the post saw three unrelated entity clouds.&lt;/p&gt;
&lt;p&gt;The main problems with the antipattern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Duplication of &lt;code&gt;Person&lt;/code&gt;.&lt;/strong&gt; The same author is described 3-5 times. If the author changes &lt;code&gt;jobTitle&lt;/code&gt; or adds &lt;code&gt;sameAs&lt;/code&gt;, every copy has to be edited. Miss one, and the crawler sees a conflict: a Person with this name suddenly has a different jobTitle. That is a direct signal-to-noise loss.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Broken graph.&lt;/strong&gt; &lt;code&gt;BlogPosting.publisher&lt;/code&gt; — this is an inline object &lt;code&gt;{ &amp;quot;@type&amp;quot;: &amp;quot;Organization&amp;quot;, &amp;quot;name&amp;quot;: &amp;quot;...&amp;quot; }&lt;/code&gt;. Somewhere else on the site there’s an &lt;code&gt;Organization&lt;/code&gt; with a &lt;code&gt;founder&lt;/code&gt; field. Without common &lt;code&gt;@id&lt;/code&gt;s, the validator doesn’t know if it’s one publisher or two.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTML weight.&lt;/strong&gt; Three scripts instead of one means extra markup per block plus payload inflation, especially when identical fields repeat (e.g. the author description appearing four times on one page).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency.&lt;/strong&gt; If the author edits &lt;code&gt;Person.description&lt;/code&gt; in the frontmatter of &lt;code&gt;about.md&lt;/code&gt;, but in the &lt;code&gt;BlogPosting&lt;/code&gt; builder it’s hardcoded as a literal — desynchronization is inevitable.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;3. Target architecture — &lt;code&gt;@graph&lt;/code&gt; with global &lt;code&gt;@id&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Target model: &lt;strong&gt;one script per page&lt;/strong&gt;, inside — &lt;code&gt;@graph&lt;/code&gt; array. Global nodes (&lt;code&gt;Person&lt;/code&gt;, &lt;code&gt;Organization&lt;/code&gt;, &lt;code&gt;WebSite&lt;/code&gt;) are described once and identified by stable URIs. Page-level nodes (&lt;code&gt;BlogPosting&lt;/code&gt;, &lt;code&gt;WebPage&lt;/code&gt;, &lt;code&gt;CollectionPage&lt;/code&gt;, &lt;code&gt;CreativeWork&lt;/code&gt;) are added by &lt;code&gt;BaseLayout&lt;/code&gt; and &lt;strong&gt;reference globals through &lt;code&gt;@id&lt;/code&gt;&lt;/strong&gt;, without duplicating their data.&lt;/p&gt;
&lt;p&gt;Topology:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
  Person[&amp;quot;Person#person&amp;lt;br/&amp;gt;(global)&amp;quot;]
  Org[&amp;quot;Organization#brand&amp;lt;br/&amp;gt;(global)&amp;quot;]
  Site[&amp;quot;WebSite#site&amp;lt;br/&amp;gt;(global)&amp;quot;]
  Post[&amp;quot;BlogPosting#blogposting&amp;lt;br/&amp;gt;(page-level)&amp;quot;]
  WebPage[&amp;quot;WebPage#webpage&amp;lt;br/&amp;gt;(page-level)&amp;quot;]

  Post -- author --&amp;gt; Person
  Post -- publisher --&amp;gt; Org
  Post -- isPartOf --&amp;gt; Site
  WebPage -- about --&amp;gt; Person
  WebPage -- isPartOf --&amp;gt; Site
  Org -- founder --&amp;gt; Person
  Site -- publisher --&amp;gt; Org
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What’s important in this picture:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All arrows are &lt;code&gt;{&amp;quot;@id&amp;quot;: &amp;quot;...&amp;quot;}&lt;/code&gt; references.&lt;/strong&gt; No inline copies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Person#person&lt;/code&gt; is the root node of the graph.&lt;/strong&gt; All entity pages (&lt;code&gt;/about&lt;/code&gt;, &lt;code&gt;/now&lt;/code&gt;, &lt;code&gt;/uses&lt;/code&gt;) do &lt;code&gt;WebPage.about → Person&lt;/code&gt;. All posts — &lt;code&gt;BlogPosting.author → Person&lt;/code&gt;. Change &lt;code&gt;Person&lt;/code&gt;, and everything changes synchronously.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page-level nodes are added, not replacing globals.&lt;/strong&gt; Each page brings 1-2 new nodes; &lt;code&gt;Person&lt;/code&gt;/&lt;code&gt;Organization&lt;/code&gt;/&lt;code&gt;WebSite&lt;/code&gt; are always present.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stable &lt;code&gt;@id&lt;/code&gt; — this is not the page URL, it’s a URI with a fragment, for example &lt;code&gt;https://artka.dev/#person&lt;/code&gt;, &lt;code&gt;https://artka.dev/#brand&lt;/code&gt;. This is the convention in JSON-LD: a fragment-id means “this resource is described on any page, but identified by a single URI”.&lt;/p&gt;
&lt;h2&gt;4. Implementation in Astro 5&lt;/h2&gt;
&lt;p&gt;In Astro 5, the SSG/SSR boundary runs exactly through &lt;code&gt;BaseLayout&lt;/code&gt;: at build time, props are computed, HTML is rendered, inside it — static &lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt;. No client-side, no rehydration flicker. The perfect moment to assemble &lt;code&gt;@graph&lt;/code&gt; functionally.&lt;/p&gt;
&lt;h3&gt;4.1. &lt;code&gt;graphIds&lt;/code&gt; — URI table&lt;/h3&gt;
&lt;p&gt;One file that lists all stable identifiers:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/nodes-global.ts
const SITE = &amp;quot;https://artka.dev&amp;quot;;

export const graphIds = {
  person: `${SITE}/#person`,
  organization: `${SITE}/#brand`,
  website: `${SITE}/#website`,
  blogRu: `${SITE}/#blog-ru`,
  blogEn: `${SITE}/#blog-en`,
} as const;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every builder that references a global entity imports &lt;code&gt;graphIds&lt;/code&gt; and uses &lt;code&gt;{ &amp;quot;@id&amp;quot;: graphIds.person }&lt;/code&gt;. No inline literals, no typos in URIs.&lt;/p&gt;
&lt;h3&gt;4.2. Builders — pure functions, no classes&lt;/h3&gt;
&lt;p&gt;In accordance with the project rule “no classes in application code”, each node is a pure function returning &lt;code&gt;Record&amp;lt;string, unknown&amp;gt;&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/nodes-global.ts (fragment)
export const buildPersonNode = () =&amp;gt; {
  const merged = Array.from(new Set&amp;lt;string&amp;gt;([...person.knowsAbout, ...person.expertiseAreas]));
  return {
    &amp;quot;@type&amp;quot;: &amp;quot;Person&amp;quot;,
    &amp;quot;@id&amp;quot;: graphIds.person,
    name: person.name,
    url: person.url,
    image: person.image,
    jobTitle: person.jobTitle,
    description: person.description,
    knowsAbout: merged,
    sameAs: [...person.sameAs],
    email: person.email,
    subjectOf: person.notableWork.map((w) =&amp;gt; ({
      &amp;quot;@type&amp;quot;: &amp;quot;CreativeWork&amp;quot;,
      name: w.title,
      url: w.url,
      description: w.description,
    })),
  };
};

export const buildOrganizationNode = () =&amp;gt; ({
  &amp;quot;@type&amp;quot;: &amp;quot;Organization&amp;quot;,
  &amp;quot;@id&amp;quot;: graphIds.organization,
  name: &amp;quot;artka.dev&amp;quot;,
  url: SITE,
  logo: { &amp;quot;@type&amp;quot;: &amp;quot;ImageObject&amp;quot;, url: `${SITE}/favicon.svg` },
  founder: { &amp;quot;@id&amp;quot;: graphIds.person },
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;person&lt;/code&gt; is an import from &lt;code&gt;src/lib/seo/person.ts&lt;/code&gt;, the single source of truth about the author. The builder collects &lt;code&gt;knowsAbout&lt;/code&gt; and &lt;code&gt;expertiseAreas&lt;/code&gt; into a &lt;code&gt;Set&lt;/code&gt; to avoid duplicating keys. &lt;code&gt;Organization.founder&lt;/code&gt; — an &lt;code&gt;@id&lt;/code&gt; reference, not an inline copy of &lt;code&gt;Person&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;4.3. Orchestrator — &lt;code&gt;buildGraph&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;A function that glues global and page-level nodes into a single &lt;code&gt;@graph&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/schema.ts
import {
  buildPersonNode,
  buildOrganizationNode,
  buildWebSiteNode,
  type Locale,
} from &amp;quot;./nodes-global&amp;quot;;

export type GraphNode = Record&amp;lt;string, unknown&amp;gt; &amp;amp; { &amp;quot;@type&amp;quot;: string };

export interface GraphInput {
  readonly locale: Locale;
  readonly extraNodes: ReadonlyArray&amp;lt;GraphNode | null&amp;gt;;
}

export interface JsonLdGraph {
  readonly &amp;quot;@context&amp;quot;: &amp;quot;https://schema.org&amp;quot;;
  readonly &amp;quot;@graph&amp;quot;: ReadonlyArray&amp;lt;GraphNode&amp;gt;;
}

export const buildGraph = (input: GraphInput): JsonLdGraph =&amp;gt; {
  const globals: GraphNode[] = [
    buildPersonNode(),
    buildOrganizationNode(),
    buildWebSiteNode(input.locale),
  ];
  const extras = input.extraNodes.filter((n): n is GraphNode =&amp;gt; n !== null);
  return {
    &amp;quot;@context&amp;quot;: &amp;quot;https://schema.org&amp;quot;,
    &amp;quot;@graph&amp;quot;: [...globals, ...extras],
  };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The API is minimal: input — locale (to select &lt;code&gt;inLanguage&lt;/code&gt; for &lt;code&gt;WebSite&lt;/code&gt;) and a list of additional nodes (&lt;code&gt;extraNodes&lt;/code&gt;). Output — ready &lt;code&gt;JsonLdGraph&lt;/code&gt;. &lt;code&gt;null&lt;/code&gt; nodes are filtered — this is convenient for optional nodes like &lt;code&gt;FAQPage&lt;/code&gt;, whose builder returns &lt;code&gt;null&lt;/code&gt; on an empty question array.&lt;/p&gt;
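Put together, a page render reduces to one call. The snippet below is a self-contained sketch: the global builders are stubbed to a minimal shape so it runs on its own, while `buildGraph` itself mirrors the orchestrator above.

```typescript
type GraphNode = Record<string, unknown> & { "@type": string };

// Stubs standing in for nodes-global.ts; the real builders carry full fields.
const buildPersonNode = (): GraphNode => ({ "@type": "Person", "@id": "https://artka.dev/#person" });
const buildOrganizationNode = (): GraphNode => ({ "@type": "Organization", "@id": "https://artka.dev/#brand" });
const buildWebSiteNode = (locale: string): GraphNode => ({ "@type": "WebSite", "@id": "https://artka.dev/#website", inLanguage: locale });

const buildGraph = (input: { locale: string; extraNodes: ReadonlyArray<GraphNode | null> }) => ({
  "@context": "https://schema.org",
  "@graph": [
    buildPersonNode(),
    buildOrganizationNode(),
    buildWebSiteNode(input.locale),
    ...input.extraNodes.filter((n): n is GraphNode => n !== null),
  ],
});

// A post page: one BlogPosting node, plus a FAQ builder that returned null
// because frontmatter.faq was empty.
const graph = buildGraph({
  locale: "en",
  extraNodes: [{ "@type": "BlogPosting", "@id": "https://artka.dev/en/blog/example/#blogposting" }, null],
});
console.log(graph["@graph"].length); // 4: three globals + BlogPosting
```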
&lt;h3&gt;4.4. &lt;code&gt;BaseLayout&lt;/code&gt; — the only emission point&lt;/h3&gt;
&lt;p&gt;The entire site goes through &lt;code&gt;BaseLayout&lt;/code&gt;, and it — and only it — emits JSON-LD:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-astro&quot;&gt;---
// src/layouts/BaseLayout.astro
import { buildGraph, safeJsonLd, type GraphNode } from &amp;quot;~/lib/seo/schema&amp;quot;;

interface Props {
  title: string;
  description?: string;
  // ...
  /** Additional JSON-LD nodes to merge into the page @graph. */
  extraSchemaNodes?: ReadonlyArray&amp;lt;GraphNode | null&amp;gt;;
}

const { extraSchemaNodes = [] } = Astro.props;
const locale = getLocaleFromPath(Astro.url.pathname);
---

&amp;lt;head&amp;gt;
  &amp;lt;script
    is:inline
    type=&amp;quot;application/ld+json&amp;quot;
    set:html={safeJsonLd(buildGraph({ locale, extraNodes: extraSchemaNodes }))}
  /&amp;gt;
&amp;lt;/head&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three key details:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;is:inline&lt;/code&gt;&lt;/strong&gt; — Astro doesn’t try to process the content as a JS module.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;set:html&lt;/code&gt;&lt;/strong&gt; — we insert an already-ready string, not letting the framework trim whitespace or escape additionally.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;safeJsonLd&lt;/code&gt;&lt;/strong&gt; — a tiny helper that escapes &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;amp;&lt;/code&gt; so that inside JSON there’s no sequence that the HTML parser would take as the end of &lt;code&gt;&amp;lt;/script&amp;gt;&lt;/code&gt;. Without it, malicious (or just unlucky) text in frontmatter could break the page.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/json-ld.ts
export const safeJsonLd = (data: unknown): string =&amp;gt;
  JSON.stringify(data).replace(/&amp;lt;/g, &amp;quot;\\u003c&amp;quot;).replace(/&amp;gt;/g, &amp;quot;\\u003e&amp;quot;).replace(/&amp;amp;/g, &amp;quot;\\u0026&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;
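To see what the helper actually prevents, here it is again with a hostile input (reproduced so the snippet runs standalone; the payload string is made up):

```typescript
// safeJsonLd as defined above: escape the three characters that could let
// the HTML parser terminate the surrounding script block early.
const safeJsonLd = (data: unknown): string =>
  JSON.stringify(data).replace(/</g, "\\u003c").replace(/>/g, "\\u003e").replace(/&/g, "\\u0026");

// A frontmatter value that would otherwise close the JSON-LD script tag.
const payload = { description: 'note on </script><script>alert(1)</script> injection' };
const out = safeJsonLd(payload);

console.log(out.includes("</script>")); // false: nothing can terminate the tag early
console.log(JSON.parse(out).description === payload.description); // true: round-trips losslessly
```

The escapes stay inside the JSON string (`\u003c` is a legal JSON escape for `<`), so consumers parsing the block see the original text unchanged.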
&lt;h3&gt;4.5. Page-level contract&lt;/h3&gt;
&lt;p&gt;Each layout/page adds its own nodes via &lt;code&gt;extraSchemaNodes&lt;/code&gt;. For example, &lt;code&gt;PostLayout&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;const excerpt = extractArticleBody(post.body ?? &amp;quot;&amp;quot;, 800);

const blogPostingNode = buildBlogPostingNode({
  locale,
  canonical,
  title,
  description,
  pubDate,
  updatedDate: updatedDate ?? null,
  image: absoluteCover,
  keywords: tags,
  articleBody: excerpt.text,
  wordCount: excerpt.fullWordCount,
});

const breadcrumbNode = buildBreadcrumbListNode({
  locale,
  blogIndexLabel: t(locale, &amp;quot;blog.title&amp;quot;),
  title,
});

const faqNode = buildFaqPageNode({ canonical, items: faq ?? [] });
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-astro&quot;&gt;&amp;lt;BaseLayout title={title} extraSchemaNodes={[blogPostingNode, breadcrumbNode, faqNode]}&amp;gt;
  &amp;lt;slot /&amp;gt;
&amp;lt;/BaseLayout&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;/blog&lt;/code&gt;, &lt;code&gt;/projects/&amp;lt;slug&amp;gt;&lt;/code&gt;, &lt;code&gt;/tags/&amp;lt;tag&amp;gt;&lt;/code&gt;, &lt;code&gt;/about&lt;/code&gt; — all use the same contract, differing only in specific builders. One dispatch, zero duplication.&lt;/p&gt;
&lt;h2&gt;5. &lt;code&gt;articleBody&lt;/code&gt; — why excerpt, not full body&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;articleBody&lt;/code&gt; field in &lt;code&gt;BlogPosting&lt;/code&gt; is the most valuable part for an LLM crawler: it’s an extractable chunk of text that can be cited. And the most dangerous for weight: if you put the entire post in JSON-LD, the HTML page will balloon 2-3 times. The spec formulates the compromise directly: “emit first 800 words of plain-text body … add &lt;code&gt;wordCount&lt;/code&gt; covering the &lt;em&gt;full&lt;/em&gt; body”.&lt;/p&gt;
&lt;p&gt;The excerpt is extracted via mdast: we parse markdown, remove code blocks, mermaid blocks and inline HTML, concatenate the remaining text, cut at 800 words:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/lib/seo/article-body.ts (fragment)
export const extractArticleBody = (markdown: string, maxWords: number) =&amp;gt; {
  const tree = unified().use(remarkParse).parse(markdown) as Root;

  const isStrippable = (node: Node): boolean =&amp;gt;
    node.type === &amp;quot;code&amp;quot; || node.type === &amp;quot;inlineCode&amp;quot; || node.type === &amp;quot;html&amp;quot;;

  visit(tree, (node, index, parent) =&amp;gt; {
    if (parent &amp;amp;&amp;amp; typeof index === &amp;quot;number&amp;quot; &amp;amp;&amp;amp; isStrippable(node)) {
      (parent as { children: Node[] }).children.splice(index, 1);
      return [SKIP, index];
    }
    return undefined;
  });

  const flat = mdastToString(tree, { includeImageAlt: false }).replace(/\s+/g, &amp;quot; &amp;quot;).trim();
  const words = flat.length &amp;gt; 0 ? flat.split(/\s+/) : [];
  if (words.length &amp;lt;= maxWords) return { text: flat, fullWordCount: words.length };
  return { text: words.slice(0, maxWords).join(&amp;quot; &amp;quot;) + &amp;quot;…&amp;quot;, fullWordCount: words.length };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why exactly 800 words:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Length&lt;/th&gt;
&lt;th&gt;Pro&lt;/th&gt;
&lt;th&gt;Con&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;50 words&lt;/td&gt;
&lt;td&gt;tiny HTML overhead&lt;/td&gt;
&lt;td&gt;one paragraph — too little for LLM citation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;800 words&lt;/td&gt;
&lt;td&gt;substantial chunk, ~3-5 KB&lt;/td&gt;
&lt;td&gt;+3-5 KB to payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full body&lt;/td&gt;
&lt;td&gt;maximum context&lt;/td&gt;
&lt;td&gt;double HTML, real performance hit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Why via mdast, not regex: posts contain &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt;, MDX components like &lt;code&gt;&amp;lt;Faq&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;Tldr&amp;gt;&lt;/code&gt;. A regex over &lt;code&gt;```&lt;/code&gt; fences breaks on indented code blocks and nested fences; mdast is the only reliable way.&lt;/p&gt;
&lt;p&gt;We keep &lt;code&gt;wordCount&lt;/code&gt; on the full body, not the excerpt — this gives an honest signal to the validator and LLM about the real volume of content.&lt;/p&gt;
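The trim semantics in isolation, as a sketch: `trimToWords` is a stand-in for the tail of `extractArticleBody` above, with the remark/mdast stripping stage omitted.

```typescript
// Cap the text at maxWords for articleBody, but report the full word count:
// the excerpt is what gets embedded, wordCount describes the whole post.
const trimToWords = (flat: string, maxWords: number) => {
  const words = flat.length > 0 ? flat.split(/\s+/) : [];
  if (words.length <= maxWords) return { text: flat, fullWordCount: words.length };
  return { text: words.slice(0, maxWords).join(" ") + "…", fullWordCount: words.length };
};

const body = Array.from({ length: 1000 }, (_, i) => `w${i}`).join(" ");
const excerpt = trimToWords(body, 800);

console.log(excerpt.text.split(/\s+/).length); // 800: articleBody is capped
console.log(excerpt.fullWordCount); // 1000: wordCount covers the full body
```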
&lt;h2&gt;6. &lt;code&gt;FAQPage&lt;/code&gt; as a side-effect of MDX component&lt;/h2&gt;
&lt;p&gt;One of Plan 1’s design goals — &lt;strong&gt;remove cognitive load on structured data from the author&lt;/strong&gt;. The author shouldn’t remember that &lt;code&gt;FAQPage&lt;/code&gt; has &lt;code&gt;mainEntity&lt;/code&gt;, that inside &lt;code&gt;Question&lt;/code&gt; you need &lt;code&gt;acceptedAnswer&lt;/code&gt;, that answer text is escaped. The author should fill in the frontmatter and forget.&lt;/p&gt;
&lt;p&gt;Solution: &lt;code&gt;frontmatter.faq&lt;/code&gt; — the single source. PostLayout reads the array:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;const faqNode = buildFaqPageNode({ canonical, items: faq ?? [] });
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;buildFaqPageNode&lt;/code&gt; either returns a ready &lt;code&gt;FAQPage&lt;/code&gt; node or &lt;code&gt;null&lt;/code&gt; (filtered in &lt;code&gt;buildGraph&lt;/code&gt;). In parallel, the same array is passed to the &lt;code&gt;&amp;lt;Faq&amp;gt;&lt;/code&gt; component, which renders visible &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; blocks with the same text. One source — two consumers: visual layer and structured layer. Desynchronization is impossible.&lt;/p&gt;
&lt;p&gt;The builder is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;export const buildFaqPageNode = (input: FaqPageInput) =&amp;gt; {
  if (input.items.length === 0) return null;
  return {
    &amp;quot;@type&amp;quot;: &amp;quot;FAQPage&amp;quot;,
    &amp;quot;@id&amp;quot;: `${input.canonical}#faq`,
    mainEntity: input.items.map((it) =&amp;gt; ({
      &amp;quot;@type&amp;quot;: &amp;quot;Question&amp;quot;,
      name: it.question,
      acceptedAnswer: { &amp;quot;@type&amp;quot;: &amp;quot;Answer&amp;quot;, text: it.answer },
    })),
  };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Frontmatter that the author writes:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;faq:
  - question: &amp;quot;Чем агент отличается от чат-бота?&amp;quot;
    answer: &amp;quot;Чат-бот — это model.complete(messages): принимает текст…&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And that’s it. The rest is automation.&lt;/p&gt;
&lt;h2&gt;7. Measurements before/after&lt;/h2&gt;
&lt;p&gt;After Plan 1, the page &lt;code&gt;/blog/01-introduction/&lt;/code&gt; has exactly &lt;strong&gt;one&lt;/strong&gt; &lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt; block. Real measured fact:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ grep -c &amp;quot;application/ld+json&amp;quot; dist/client/blog/01-introduction/index.html
1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before Plan 1 (commit &lt;code&gt;5ed281c~1&lt;/code&gt;) there were two sources of inline scripts:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ git show 5ed281c~1:src/layouts/BaseLayout.astro | grep -c application/ld+json  # 1
$ git show 5ed281c~1:src/layouts/PostLayout.astro | grep -c application/ld+json  # 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is, the post page contained a total of &lt;strong&gt;3 blocks&lt;/strong&gt;. It became &lt;strong&gt;1&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Pre-Plan 1&lt;/th&gt;
&lt;th&gt;Post-Plan 1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;script type=&amp;quot;application/ld+json&amp;quot;&amp;gt;&lt;/code&gt; blocks per post page&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overall container&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@graph&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stable &lt;code&gt;Person@id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://artka.dev/#person&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-references via &lt;code&gt;@id&lt;/code&gt; between nodes&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;8+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single source of truth about author&lt;/td&gt;
&lt;td&gt;scattered across layouts&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/lib/seo/person.ts&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The actual JSON-LD of the page &lt;code&gt;/blog/01-introduction/&lt;/code&gt;, extracted from &lt;code&gt;dist/client/blog/01-introduction/index.html&lt;/code&gt;, looks like this (fragment, &lt;code&gt;articleBody&lt;/code&gt; truncated to ellipsis, FAQ node shortened):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;@context&amp;quot;: &amp;quot;https://schema.org&amp;quot;,
  &amp;quot;@graph&amp;quot;: [
    {
      &amp;quot;@type&amp;quot;: &amp;quot;Person&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot;,
      &amp;quot;name&amp;quot;: &amp;quot;Артём Кашута&amp;quot;,
      &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev/about&amp;quot;,
      &amp;quot;jobTitle&amp;quot;: &amp;quot;Software engineer · backend &amp;amp; AI agent engineering&amp;quot;,
      &amp;quot;knowsAbout&amp;quot;: [&amp;quot;Claude Code&amp;quot;, &amp;quot;AI agent engineering&amp;quot;, &amp;quot;Node.js&amp;quot;, &amp;quot;TypeScript&amp;quot;, &amp;quot;Astro&amp;quot;, &amp;quot;…&amp;quot;],
      &amp;quot;email&amp;quot;: &amp;quot;a@artka.dev&amp;quot;,
      &amp;quot;subjectOf&amp;quot;: [
        {
          &amp;quot;@type&amp;quot;: &amp;quot;CreativeWork&amp;quot;,
          &amp;quot;name&amp;quot;: &amp;quot;Claude Code Guide (RU, 14 частей)&amp;quot;,
          &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev/blog&amp;quot;
        }
      ]
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;Organization&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#brand&amp;quot;,
      &amp;quot;name&amp;quot;: &amp;quot;artka.dev&amp;quot;,
      &amp;quot;logo&amp;quot;: { &amp;quot;@type&amp;quot;: &amp;quot;ImageObject&amp;quot;, &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev/favicon.svg&amp;quot; },
      &amp;quot;founder&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot; }
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;WebSite&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#website&amp;quot;,
      &amp;quot;url&amp;quot;: &amp;quot;https://artka.dev&amp;quot;,
      &amp;quot;inLanguage&amp;quot;: &amp;quot;ru-RU&amp;quot;,
      &amp;quot;publisher&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#brand&amp;quot; },
      &amp;quot;potentialAction&amp;quot;: {
        &amp;quot;@type&amp;quot;: &amp;quot;SearchAction&amp;quot;,
        &amp;quot;target&amp;quot;: &amp;quot;https://artka.dev/search?q={search_term_string}&amp;quot;,
        &amp;quot;query-input&amp;quot;: &amp;quot;required name=search_term_string&amp;quot;
      }
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;BlogPosting&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/blog/01-introduction/#blogposting&amp;quot;,
      &amp;quot;headline&amp;quot;: &amp;quot;01. Что такое Claude Code: harness, agent loop и ваше место в нём&amp;quot;,
      &amp;quot;datePublished&amp;quot;: &amp;quot;2026-04-23T00:00:00.000Z&amp;quot;,
      &amp;quot;author&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot; },
      &amp;quot;publisher&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#brand&amp;quot; },
      &amp;quot;mainEntityOfPage&amp;quot;: &amp;quot;https://artka.dev/blog/01-introduction/&amp;quot;,
      &amp;quot;inLanguage&amp;quot;: &amp;quot;ru-RU&amp;quot;,
      &amp;quot;isPartOf&amp;quot;: { &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#blog-ru&amp;quot; },
      &amp;quot;articleBody&amp;quot;: &amp;quot;Перед тем как разбирать skills и subagents, надо договориться о терминах…&amp;quot;,
      &amp;quot;wordCount&amp;quot;: 574
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;BreadcrumbList&amp;quot;,
      &amp;quot;itemListElement&amp;quot;: [
        { &amp;quot;@type&amp;quot;: &amp;quot;ListItem&amp;quot;, &amp;quot;position&amp;quot;: 1, &amp;quot;name&amp;quot;: &amp;quot;Главная&amp;quot;, &amp;quot;item&amp;quot;: &amp;quot;https://artka.dev/&amp;quot; },
        { &amp;quot;@type&amp;quot;: &amp;quot;ListItem&amp;quot;, &amp;quot;position&amp;quot;: 2, &amp;quot;name&amp;quot;: &amp;quot;Статьи&amp;quot;, &amp;quot;item&amp;quot;: &amp;quot;https://artka.dev/blog&amp;quot; },
        { &amp;quot;@type&amp;quot;: &amp;quot;ListItem&amp;quot;, &amp;quot;position&amp;quot;: 3, &amp;quot;name&amp;quot;: &amp;quot;01. Что такое Claude Code…&amp;quot; }
      ]
    },
    {
      &amp;quot;@type&amp;quot;: &amp;quot;FAQPage&amp;quot;,
      &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/blog/01-introduction/#faq&amp;quot;,
      &amp;quot;mainEntity&amp;quot;: [
        {
          &amp;quot;@type&amp;quot;: &amp;quot;Question&amp;quot;,
          &amp;quot;name&amp;quot;: &amp;quot;Чем агент отличается от чат-бота?&amp;quot;,
          &amp;quot;acceptedAnswer&amp;quot;: { &amp;quot;@type&amp;quot;: &amp;quot;Answer&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;…&amp;quot; }
        }
      ]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What you can see with your eyes and what the validator will record:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;One &lt;code&gt;Person&lt;/code&gt;, everything references it.&lt;/strong&gt; &lt;code&gt;Organization.founder&lt;/code&gt;, &lt;code&gt;BlogPosting.author&lt;/code&gt; — both &lt;code&gt;{ &amp;quot;@id&amp;quot;: &amp;quot;https://artka.dev/#person&amp;quot; }&lt;/code&gt;. No guessing about identity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Organization&lt;/code&gt; — public publisher.&lt;/strong&gt; &lt;code&gt;WebSite.publisher&lt;/code&gt; references the same &lt;code&gt;Organization&lt;/code&gt;. &lt;code&gt;BlogPosting.publisher&lt;/code&gt; — the same. The graph is connected.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;isPartOf&lt;/code&gt; chain for the blog.&lt;/strong&gt; &lt;code&gt;BlogPosting.isPartOf → Blog#blog-ru → publisher → Organization&lt;/code&gt;. The crawler sees nesting and ownership.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;articleBody&lt;/code&gt; excerpt — substantial.&lt;/strong&gt; All 574 words of this post fit into one field (under the 800-word cap), and &lt;code&gt;wordCount&lt;/code&gt; reflects the full volume. The LLM gets text to cite; the HTML doesn’t balloon.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FAQ — together with everything, not separately.&lt;/strong&gt; Not a separate script block, but a node of the same &lt;code&gt;@graph&lt;/code&gt;. Fewer blocks — fewer traps for the parser.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;a href=&quot;https://validator.schema.org&quot;&gt;Schema.org validator&lt;/a&gt; and Google Rich Results Test accept this &lt;code&gt;@graph&lt;/code&gt; without remarks &lt;em&gt;(screenshots — owner to fill)&lt;/em&gt;. The key point: the JSON serializes cleanly, with no &lt;code&gt;[object Object]&lt;/code&gt;, no unescaped quotes, and no broken dates, thanks to the &lt;code&gt;safeJsonLd&lt;/code&gt; wrapper.&lt;/p&gt;
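&lt;p&gt;The wrapper itself is outside the scope of this post, but the class of bugs it guards against is worth a sketch. A minimal hypothetical version (assumption: the real &lt;code&gt;safeJsonLd&lt;/code&gt; may differ in details) serializes once, then escapes the characters that could terminate the &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag early:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// Hypothetical safeJsonLd-style helper: the unicode escapes stay valid JSON,
// so the graph survives a round-trip, but no literal &amp;lt;/script&amp;gt; can appear.
const safeJsonLd = (graph: unknown): string =&amp;gt;
  JSON.stringify(graph)
    .replace(/&amp;lt;/g, &amp;quot;\\u003c&amp;quot;)
    .replace(/&amp;gt;/g, &amp;quot;\\u003e&amp;quot;)
    .replace(/&amp;amp;/g, &amp;quot;\\u0026&amp;quot;);

// A quote-heavy FAQ answer cannot break out of the script tag:
safeJsonLd({ text: &amp;quot;a &amp;lt;/script&amp;gt; b&amp;quot; });
// the output contains \u003c/script\u003e, never a literal &amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;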
&lt;hr&gt;
&lt;h3&gt;What’s next&lt;/h3&gt;
&lt;p&gt;What’s described above — &lt;strong&gt;Plan 1&lt;/strong&gt; in our repo. Next, we expand the base for new entity types (&lt;code&gt;/projects/&amp;lt;slug&amp;gt;&lt;/code&gt; via &lt;code&gt;CreativeWork&lt;/code&gt;, &lt;code&gt;/uses&lt;/code&gt; via &lt;code&gt;WebPage.about&lt;/code&gt;), and for the retrieval layer via &lt;code&gt;llms.txt&lt;/code&gt;. But the foundation — &lt;code&gt;buildGraph&lt;/code&gt; + stable &lt;code&gt;@id&lt;/code&gt; — must be laid first.&lt;/p&gt;
&lt;p&gt;If you see 2-3 inline JSON-LD scripts on a post page — this is the place to start migration. One file &lt;code&gt;schema.ts&lt;/code&gt;, one &lt;code&gt;extraSchemaNodes&lt;/code&gt; prop — and the site transforms from a collection of scattered entity clouds into a coherent citable node.&lt;/p&gt;
</content:encoded><category>seo</category><category>astro</category><category>schema-org</category><author>a@artka.dev (Артём)</author></item><item><title>robots.txt in the age of AI crawlers: GPTBot, ClaudeBot, PerplexityBot — reality 2026</title><link>https://artka.dev/en/blog/robots-txt-ai-crawlers-2026/</link><guid isPermaLink="true">https://artka.dev/en/blog/robots-txt-ai-crawlers-2026/</guid><description>In 2026, robots.txt is not &apos;forbid all bots&apos; or &apos;allow everything&apos;, but a policy for each of 9+ named agents. Real template, decision table, and pitfalls.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;In 2026, robots.txt is neither “forbid all bots” nor “open everything.” It’s a policy for each of 9+ named agents. Each decision is a special case: are you opening your content for model training, for on-demand citation, what do you want to see in Perplexity’s answer card. This post is a decision table, a ready-made template, and why &lt;code&gt;llms.txt&lt;/code&gt; is a separate artifact.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. Why rewrite robots.txt in 2026&lt;/h2&gt;
&lt;p&gt;The classic SEO approach to robots.txt is optimized for one task: let Googlebot in where it makes sense to index pages for SERP, and block service paths. In 2026, that use case accounts for a minority of crawler traffic.&lt;/p&gt;
&lt;p&gt;Most questions of “should I index this page?” are now asked not by Google, but by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Training crawlers&lt;/strong&gt; — download pages to replenish the corpus on which the next version of the model is trained (GPTBot, ClaudeBot, Google-Extended).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Answer/search crawlers&lt;/strong&gt; — index content for search built into the chat (OAI-SearchBot, PerplexityBot).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On-demand fetchers&lt;/strong&gt; — open one specific page because the user explicitly asked for it in the chat (ChatGPT-User, Perplexity-User, Claude-Web).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These three classes make three different decisions. One &lt;code&gt;User-agent: *&lt;/code&gt; block doesn’t convey the nuance. You might want “don’t train on my texts, but please cite in response to a question.” One wildcard won’t express that.&lt;/p&gt;
&lt;p&gt;Hence the requirement: explicit blocks for each named User-Agent with a conscious choice of policy. Not “opened everything,” not “closed everything,” but a matrix of “bot × intent.”&lt;/p&gt;
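&lt;p&gt;The “bot × intent” matrix translates directly into code. A sketch of a generator (the data structure and names are mine, and the policies here are illustrative; adapt them to your site):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;type Policy = &amp;quot;allow&amp;quot; | &amp;quot;deny&amp;quot;;

// One decision per intent class; each class carries its named bots.
const matrix: Array&amp;lt;{ intent: string; bots: string[]; policy: Policy }&amp;gt; = [
  { intent: &amp;quot;training&amp;quot;, bots: [&amp;quot;GPTBot&amp;quot;, &amp;quot;ClaudeBot&amp;quot;, &amp;quot;Google-Extended&amp;quot;], policy: &amp;quot;allow&amp;quot; },
  { intent: &amp;quot;answer&amp;quot;, bots: [&amp;quot;OAI-SearchBot&amp;quot;, &amp;quot;PerplexityBot&amp;quot;], policy: &amp;quot;allow&amp;quot; },
  { intent: &amp;quot;on-demand&amp;quot;, bots: [&amp;quot;ChatGPT-User&amp;quot;, &amp;quot;Claude-Web&amp;quot;, &amp;quot;Perplexity-User&amp;quot;, &amp;quot;anthropic-ai&amp;quot;], policy: &amp;quot;allow&amp;quot; },
];

const privatePaths = [&amp;quot;/admin/&amp;quot;, &amp;quot;/api/&amp;quot;, &amp;quot;/login&amp;quot;];

// Emit one named block per bot; deny collapses to a single Disallow: /.
const block = (bot: string, policy: Policy): string =&amp;gt;
  policy === &amp;quot;deny&amp;quot;
    ? `User-agent: ${bot}\nDisallow: /`
    : [`User-agent: ${bot}`, &amp;quot;Allow: /&amp;quot;, ...privatePaths.map((p) =&amp;gt; `Disallow: ${p}`)].join(&amp;quot;\n&amp;quot;);

const robotsTxt = matrix
  .flatMap(({ bots, policy }) =&amp;gt; bots.map((b) =&amp;gt; block(b, policy)))
  .join(&amp;quot;\n\n&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Changing the decision for one class is then a one-field edit, and the per-bot blocks regenerate consistently.&lt;/p&gt;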
&lt;hr&gt;
&lt;h2&gt;2. List of named AI-crawlers and their purpose&lt;/h2&gt;
&lt;p&gt;Nine agents worth naming in 2026, with their public documentation. User-Agent names are taken from vendors’ official pages.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User-Agent&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Documentation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GPTBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Training crawl&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://platform.openai.com/docs/gptbot&quot;&gt;platform.openai.com/docs/gptbot&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OAI-SearchBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Search index for ChatGPT&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://platform.openai.com/docs/bots&quot;&gt;platform.openai.com/docs/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ChatGPT-User&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;On-demand fetch from ChatGPT&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://platform.openai.com/docs/bots&quot;&gt;platform.openai.com/docs/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ClaudeBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Training crawl&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;docs.anthropic.com&lt;/a&gt; (&lt;a href=&quot;https://claudebot.anthropic.com&quot;&gt;claudebot.anthropic.com&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Claude-Web&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;On-demand fetch initiated by &lt;a href=&quot;https://claude.ai&quot;&gt;Claude.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;docs.anthropic.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;anthropic-ai&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Legacy/auxiliary Anthropic crawler&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;docs.anthropic.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PerplexityBot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Perplexity&lt;/td&gt;
&lt;td&gt;Search/index crawl&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.perplexity.ai/guides/bots&quot;&gt;docs.perplexity.ai/guides/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Perplexity-User&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Perplexity&lt;/td&gt;
&lt;td&gt;On-demand fetch from a user query&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://docs.perplexity.ai/guides/bots&quot;&gt;docs.perplexity.ai/guides/bots&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Google-Extended&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Opt-in for Gemini training&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://developers.google.com/search/docs/crawling&quot;&gt;developers.google.com/search/docs/crawling&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;Names must match. &lt;code&gt;Claude-Bot&lt;/code&gt; is not a valid alias for &lt;code&gt;ClaudeBot&lt;/code&gt;; and although RFC 9309 makes User-Agent matching case-insensitive (so &lt;code&gt;claudebot&lt;/code&gt; should match), not every crawler implements the spec faithfully. Copy the exact spelling from the official documentation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Taxonomy:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart TB
  subgraph training[&amp;quot;Training (corpus → model)&amp;quot;]
    GPT[GPTBot]
    CLB[ClaudeBot]
    GEX[Google-Extended]
  end
  subgraph answer[&amp;quot;Answer/search (index for built-in search)&amp;quot;]
    OAI[OAI-SearchBot]
    PPB[PerplexityBot]
  end
  subgraph ondemand[&amp;quot;On-demand (user requested)&amp;quot;]
    CGU[ChatGPT-User]
    CWB[Claude-Web]
    PPU[Perplexity-User]
    AAI[anthropic-ai]
  end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three classes = three separate decisions. You don’t need to discuss “a robot in general” — you need to discuss “GPTBot on /blog/.”&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;3. Decisions for each bot&lt;/h2&gt;
&lt;p&gt;There is no universally correct answer here. Below is a framework for reasoning and my policy for the blog.&lt;/p&gt;
&lt;h3&gt;Training crawlers&lt;/h3&gt;
&lt;p&gt;For authors of individual blogs with long-form content, the arguments are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;For Allow:&lt;/strong&gt; your text will enter the corpus on which the next models are trained. If your goal is to increase distribution and presence of your expertise in LLM responses, this is the way.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For Disallow:&lt;/strong&gt; your content becomes an anonymous training signal without attribution. If you plan to monetize content (book, course) or are against use without consent, Disallow is the only signal you have at the robots.txt level.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For commercial sites where content is a product (online courses, paid newsletters, legal databases), Disallow is usually the default.&lt;/p&gt;
&lt;h3&gt;Answer/search crawlers&lt;/h3&gt;
&lt;p&gt;The intent is to show a link to your page in the answer card. This works both ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;For Allow:&lt;/strong&gt; traffic is possible (albeit through a citation with link-out). Your brand appears in the results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For Disallow:&lt;/strong&gt; you won’t get this traffic and at the same time your page won’t be cited as a source.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For most public blogs, the answer is Allow.&lt;/p&gt;
&lt;h3&gt;On-demand fetchers&lt;/h3&gt;
&lt;p&gt;The most “transparent” class: a user of your site (or someone who specifically wants to open your page through ChatGPT/Claude/Perplexity) has already explicitly pointed to it. Disallow here means “you can’t use our pages as a source in a chat session” — almost always overly strict for a public blog.&lt;/p&gt;
&lt;h3&gt;My policy for artka.dev&lt;/h3&gt;
&lt;p&gt;For this site:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All 9 bots — &lt;code&gt;Allow: /&lt;/code&gt; (open public blog, goal is distribution).&lt;/li&gt;
&lt;li&gt;All of them — &lt;code&gt;Disallow: /admin/&lt;/code&gt;, &lt;code&gt;/api/&lt;/code&gt;, &lt;code&gt;/login&lt;/code&gt; (private namespaces, see §5).&lt;/li&gt;
&lt;li&gt;No special restrictions on individual posts or tags.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a decision for a personal tech-blog with the goal of “increasing the reach of expertise.” For commercial content, I would choose differently.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;4. Ready-made robots.txt template&lt;/h2&gt;
&lt;p&gt;Here’s the real &lt;code&gt;public/robots.txt&lt;/code&gt; that goes into production on artka.dev. It’s also the starting point you can adapt.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;# robots.txt — last reviewed 2026-05-02
# Owner: dev@artka.dev. Policy: allow retrieval/answer crawlers; disallow private surfaces.

User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: OAI-SearchBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: Claude-Web
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: anthropic-ai
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: Perplexity-User
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: Google-Extended
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login

Sitemap: https://artka.dev/sitemap-index.xml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A few notes on the structure:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Explicit blocks even for identical policies.&lt;/strong&gt; It might seem like 9 identical blocks are a duplicate that could be collapsed into &lt;code&gt;User-agent: *&lt;/code&gt;. But that’s not the case: the robots.txt specification builds a match table by “most specific User-Agent,” and if tomorrow you need to change the policy for one bot — you already have its named block and don’t need to remember which bot you want to single out from the wildcard. Duplication is the cost of per-bot policy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Comment with review date.&lt;/strong&gt; &lt;code&gt;# robots.txt — last reviewed 2026-05-02&lt;/code&gt; is the only line that answers the question “is this file fresh?” Without a date, you’ll forever wonder if it’s time to add a new bot.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Sitemap:&lt;/code&gt; at the end.&lt;/strong&gt; One URL to the index sitemap. If you have localization — the sitemap-index links to per-locale files.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No BOM, LF line endings.&lt;/strong&gt; Astro in SSG mode will copy the file from &lt;code&gt;public/&lt;/code&gt; as-is; edit in plain UTF-8.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This template works for a personal blog. For other use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Closed paid-content site:&lt;/strong&gt; replace &lt;code&gt;Allow: /&lt;/code&gt; with &lt;code&gt;Disallow: /&lt;/code&gt; for GPTBot, ClaudeBot, Google-Extended (training). Keep &lt;code&gt;Allow: /&lt;/code&gt; for on-demand: ChatGPT-User, Claude-Web, Perplexity-User.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Documentation site that wants to be in LLM responses:&lt;/strong&gt; keep all 9 on &lt;code&gt;Allow&lt;/code&gt;, add rich &lt;code&gt;llms.txt&lt;/code&gt; (see §6).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B2B SaaS landing:&lt;/strong&gt; usually a standard wildcard is enough — no need to specifically name AI-bots, the policy is the same as for Googlebot.&lt;/li&gt;
&lt;/ul&gt;
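&lt;p&gt;For the closed paid-content case from the list above, the flip looks like this (illustrative fragment; keep your own Disallow-namespaces):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;# Training crawlers: content is a product, opt out of corpus collection
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# On-demand fetchers: the user explicitly asked for this page
User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /login
&lt;/code&gt;&lt;/pre&gt;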
&lt;hr&gt;
&lt;h2&gt;5. Disallow-namespaces are more important than decisions for a specific bot&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;/admin/&lt;/code&gt;, &lt;code&gt;/api/&lt;/code&gt;, &lt;code&gt;/login&lt;/code&gt; are three namespaces that fall under Disallow in all 10 blocks (9 named + wildcard). This choice is worked out separately from the bots and is more important than them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why this is more important than any per-bot decision:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;A mistake here is a leak.&lt;/strong&gt; If a crawler bypasses &lt;code&gt;/admin/users.json&lt;/code&gt; and gets a 200 OK with real data — that’s an incident, not an SEO problem. If it indexes &lt;code&gt;/blog/&lt;/code&gt; without your permission — that’s not upsetting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;robots.txt is a public hint, not auth.&lt;/strong&gt; Any bot can ignore Disallow. So &lt;code&gt;/admin/&lt;/code&gt; should be closed by middleware regardless of robots.txt. The robots.txt entry only saves crawl budget for obedient bots and doesn’t keep the admin URL structure out of SERP.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collapsing namespaces is not an optimization.&lt;/strong&gt; The temptation: “why three lines if all three are private?” Answer: so that when you add a fourth namespace (&lt;code&gt;/dashboard/&lt;/code&gt;), you have an obvious pattern.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Verification that namespace-deny actually works:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ curl -A &amp;quot;GPTBot&amp;quot; -s -o /dev/null -w &amp;quot;%{http_code}\n&amp;quot; \
    https://artka.dev/admin/
# Expected: 401, 403, or 404. NOT 200.
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;At the time of publication, &lt;code&gt;/admin/&lt;/code&gt; is behind middleware. The specific code depends on the auth-guard implementation — mine returns 302 to /login for an unauthenticated request. (owner to fill: check exact code after next review).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That’s why the correct order of work is to set up auth first, and only then add robots.txt. robots.txt is the last line of defense, not the first.&lt;/p&gt;
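&lt;p&gt;The guard logic itself is framework-agnostic. A minimal sketch of the decision (names are mine; the 302-to-/login behavior mirrors the note above, and wiring it into your middleware layer is a separate step):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;type GuardDecision =
  | { kind: &amp;quot;pass&amp;quot; }
  | { kind: &amp;quot;redirect&amp;quot;; to: string; status: 302 };

// Prefixes protected by auth regardless of what robots.txt says.
const privatePrefixes = [&amp;quot;/admin/&amp;quot;, &amp;quot;/api/&amp;quot;];

const guardPrivate = (pathname: string, isAuthenticated: boolean): GuardDecision =&amp;gt; {
  const isPrivate = privatePrefixes.some((p) =&amp;gt; pathname.startsWith(p));
  if (isPrivate &amp;amp;&amp;amp; !isAuthenticated) {
    return { kind: &amp;quot;redirect&amp;quot;, to: &amp;quot;/login&amp;quot;, status: 302 };
  }
  return { kind: &amp;quot;pass&amp;quot; };
};

guardPrivate(&amp;quot;/admin/users&amp;quot;, false); // → redirect to /login
guardPrivate(&amp;quot;/blog/01-introduction/&amp;quot;, false); // → pass
&lt;/code&gt;&lt;/pre&gt;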
&lt;hr&gt;
&lt;h2&gt;6. &lt;code&gt;llms.txt&lt;/code&gt; and &lt;code&gt;llms-full.txt&lt;/code&gt; — a separate contract&lt;/h2&gt;
&lt;p&gt;If robots.txt answers “where can I go?”, then &lt;code&gt;llms.txt&lt;/code&gt; answers “what will I find here?” It’s an AI-README — a Markdown file with a description of the site, links to authoritative pages, and preferred attribution.&lt;/p&gt;
&lt;p&gt;The real &lt;code&gt;public/llms.txt&lt;/code&gt; of the site:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-md&quot;&gt;# artka.dev

&amp;gt; Personal technical blog by Артём Кашута. Topics: Claude Code internals,
&amp;gt; harness/agent loop, AI agent engineering, Astro/Node.js backends, and
&amp;gt; distributed systems.

## Authoritative pages

- [About the author](https://artka.dev/about): bio, expertise, contact
- [Now](https://artka.dev/now): currently in flight
- [Uses](https://artka.dev/uses): public toolchain
- [Projects](https://artka.dev/projects): portfolio with architecture and outcomes

## Content

- [Blog index (RU)](https://artka.dev/blog): all articles, source of truth
- [Blog index (EN)](https://artka.dev/en/blog): English translations
- [RSS RU](https://artka.dev/rss.xml): full text
- [RSS EN](https://artka.dev/en/rss.xml): full text
- [Sitemap](https://artka.dev/sitemap-index.xml): RU + EN with hreflang

## Preferred attribution

When citing, please include:

- Article title
- Author: &amp;quot;Артём Кашута&amp;quot;
- Canonical URL

## Contact

a@artka.dev
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is &lt;strong&gt;not robots.txt in a new wrapper&lt;/strong&gt;. The differences:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;robots.txt&lt;/th&gt;
&lt;th&gt;llms.txt&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Purpose&lt;/td&gt;
&lt;td&gt;Access policy&lt;/td&gt;
&lt;td&gt;Content description and attribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format&lt;/td&gt;
&lt;td&gt;Plain text, special syntax&lt;/td&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who reads&lt;/td&gt;
&lt;td&gt;Crawler before entering&lt;/td&gt;
&lt;td&gt;LLM when forming a response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What it regulates&lt;/td&gt;
&lt;td&gt;Allow/Disallow by paths&lt;/td&gt;
&lt;td&gt;Entry point to authoritative content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standardization&lt;/td&gt;
&lt;td&gt;Robots Exclusion Protocol (RFC 9309)&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://llmstxt.org&quot;&gt;llmstxt.org&lt;/a&gt; convention (de facto)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Besides &lt;code&gt;llms.txt&lt;/code&gt;, the site has &lt;code&gt;/llms-full.txt&lt;/code&gt; — a dynamically generated endpoint that outputs a full digest of all posts in plain text. The implementation is a short API route in Astro 5:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;// src/pages/llms-full.txt.ts (fragment)
export const prerender = true;

export async function GET(_ctx: APIContext) {
  const ru = await getOrderedPosts({ locale: &amp;quot;ru&amp;quot; });
  const en = await getOrderedPosts({ locale: &amp;quot;en&amp;quot; });

  const header = [
    &amp;quot;# artka.dev — full LLM digest&amp;quot;,
    &amp;quot;&amp;quot;,
    `&amp;gt; ${person.description}`,
    &amp;quot;&amp;quot;,
    &amp;quot;## Author&amp;quot;,
    `Name: ${person.name}`,
    `Role: ${person.jobTitle}`,
    `URL: ${person.url}`,
    `Email: ${person.email}`,
    `Topics: ${person.knowsAbout.join(&amp;quot;, &amp;quot;)}`,
    &amp;quot;&amp;quot;,
    /* ...preferred attribution + posts... */
  ].join(&amp;quot;\n&amp;quot;);

  return new Response(/* header + ruBody + enBody */, {
    headers: { &amp;quot;Content-Type&amp;quot;: &amp;quot;text/plain; charset=utf-8&amp;quot; },
  });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Instead of a manually maintained list of posts — one pass through the content collection with auto-generated summary. This updates itself when a new post is added — unlike a manually edited &lt;code&gt;llms.txt&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In principle: &lt;code&gt;llms.txt&lt;/code&gt; is small and stable, &lt;code&gt;llms-full.txt&lt;/code&gt; is long and automatically in sync with content. Both are needed — for different tasks.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;7. What robots.txt doesn’t control&lt;/h2&gt;
&lt;p&gt;A list of things robots.txt doesn’t do, and how to close them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;robots.txt doesn’t block bots that don’t read it.&lt;/strong&gt; The solution is IP-blocking at the CDN or WAF level. Cloudflare has a ruleset that catches User-Agent patterns and rate-limits suspicious traffic; AWS WAF and Fastly have similar. This is a tool against bots that ignore robots.txt — that is, against all “bad actors.”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;robots.txt doesn’t declare usage policy.&lt;/strong&gt; It says “where you can go,” but not “can you quote,” “can you train,” “do you need attribution.” That’s the job of Terms of Service on a separate page of the site. ToS is legally weightier than robots.txt (though both are conventions until a court precedent).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;robots.txt doesn’t audit who actually came.&lt;/strong&gt; To find out whether GPTBot is visiting you, you need to look at the logs. Cloudflare AI Audit (available since 2024 for domains on Cloudflare) provides a built-in report on AI crawlers: counters for each one, frequency, share of traffic. Without a CDN you’ll have to parse access logs yourself: GoAccess, Loki, or just &lt;code&gt;grep -i &apos;gptbot\|claudebot\|perplexitybot&apos; access.log&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;noai&lt;/code&gt;/&lt;code&gt;noimageai&lt;/code&gt; meta tags are not a standard.&lt;/strong&gt; As of 2026, neither Anthropic nor OpenAI mentions these meta tags in public documentation as a respected signal. They were a 2023 Adobe and DeviantArt initiative that took root mainly in the graphics world. For text you can’t rely on them; if you use them, treat them as an additional signal, not the main one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Single-page apps and CSR.&lt;/strong&gt; If your page renders on the client and the crawler doesn’t execute JavaScript, it will see an empty template. robots.txt doesn’t help; the fix is switching to SSG/SSR (like this site on Astro 5) or a prerender service.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;8. Audit checklist every six months&lt;/h2&gt;
&lt;p&gt;Five steps that repeat every 6 months. A calendar reminder is the most reliable protection against file staleness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Check if new AI-crawlers have appeared.&lt;/strong&gt;
Sources: blog posts from OpenAI/Anthropic/Perplexity/Google over the last 6 months, the &lt;a href=&quot;https://darkvisitors.com&quot;&gt;darkvisitors.com&lt;/a&gt; page (AI-bot tracker), official documentation. If a new named bot appears — add a block (Allow or Disallow per your policy).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Verify User-Agent names byte-for-byte.&lt;/strong&gt;
Copy names from official documentation, compare with robots.txt. A typo like &lt;code&gt;Claudebot&lt;/code&gt; instead of &lt;code&gt;ClaudeBot&lt;/code&gt; nullifies the rule for that bot.&lt;/p&gt;
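&lt;p&gt;This check is easy to automate by comparing case-insensitive and case-sensitive matches. A sketch, where the robots.txt content is a deliberately broken example:&lt;/p&gt;

```shell
# Sketch: flag User-agent values whose spelling differs from the canonical one.
robots=$(mktemp)
printf 'User-agent: GPTBot\nDisallow: /private/\n\nUser-agent: Claudebot\nDisallow: /\n' > "$robots"

for canon in GPTBot ClaudeBot PerplexityBot; do
  if grep -qi "User-agent: $canon" "$robots"; then
    # Found in some casing; now require the exact byte-for-byte spelling.
    grep -q "User-agent: $canon" "$robots" || echo "case mismatch: $canon"
  fi
done
# Prints: case mismatch: ClaudeBot
```

&lt;p&gt;For this sample it flags &lt;code&gt;Claudebot&lt;/code&gt; and stays silent about the correctly spelled &lt;code&gt;GPTBot&lt;/code&gt;.&lt;/p&gt;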
&lt;p&gt;&lt;strong&gt;3. Run namespace-deny verification.&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;for ua in GPTBot ClaudeBot PerplexityBot Google-Extended; do
  echo -n &amp;quot;$ua /admin/: &amp;quot;
  curl -A &amp;quot;$ua&amp;quot; -s -o /dev/null -w &amp;quot;%{http_code}\n&amp;quot; https://artka.dev/admin/
done
# Expect 401/403/302/404 for all of them, not 200.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;4. Review access logs for bots with unusual User-Agent.&lt;/strong&gt;
If someone is visiting with an empty UA or a pattern like &lt;code&gt;Mozilla/5.0 (compatible; XYZBot/1.0; ...)&lt;/code&gt; that’s not on your list — evaluate and make a decision. (owner to fill: at the time of publication, access-log aggregation setup is in progress; in the next review — break down the top-20 UA strings for the quarter.)&lt;/p&gt;
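&lt;p&gt;For the top-20 itself, a one-liner over a combined-format access log is enough. A sketch with synthetic log lines:&lt;/p&gt;

```shell
# Sketch: top User-Agent strings by hit count.
# In combined log format the UA is field 6 when the line is split on double quotes.
log=$(mktemp)
printf '%s\n' \
  '1.1.1.1 - - [02/May/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "GPTBot/1.2"' \
  '1.1.1.1 - - [02/May/2026:10:00:01 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "GPTBot/1.2"' \
  '2.2.2.2 - - [02/May/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; XYZBot/1.0)"' > "$log"

awk -F'"' '{print $6}' "$log" | sort | uniq -c | sort -rn | head -20
```

&lt;p&gt;On this sample, &lt;code&gt;GPTBot/1.2&lt;/code&gt; tops the list with 2 hits; replace &lt;code&gt;$log&lt;/code&gt; with your real access.log path.&lt;/p&gt;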
&lt;p&gt;&lt;strong&gt;5. Update the date in the comment.&lt;/strong&gt;
&lt;code&gt;# robots.txt — last reviewed 2026-05-02&lt;/code&gt; → new date. This is the only human-readable proof of freshness. And a commit with a message like &lt;code&gt;chore(seo): robots.txt 2026-Q4 review&lt;/code&gt; will leave a trace in history for the next iteration.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;robots.txt in 2026 is not “one block and forget,” but a small DSL where for each of 9+ named AI-agents you make a conscious choice: training (GPTBot, ClaudeBot, Google-Extended), search/answer (OAI-SearchBot, PerplexityBot), on-demand (ChatGPT-User, Claude-Web, Perplexity-User, anthropic-ai). Namespace-deny for &lt;code&gt;/admin/&lt;/code&gt;, &lt;code&gt;/api/&lt;/code&gt;, &lt;code&gt;/login&lt;/code&gt; is a separate and more important story that only works paired with middleware authentication. &lt;code&gt;llms.txt&lt;/code&gt; and &lt;code&gt;llms-full.txt&lt;/code&gt; are a parallel contract: they describe content and preferred attribution, not access.&lt;/p&gt;
&lt;p&gt;The starting point is the real template from §4. You can copy it, change the policy for specific bots, and review it every six months.&lt;/p&gt;
</content:encoded><category>seo</category><category>ai-crawlers</category><author>a@artka.dev (Артём)</author></item><item><title>Mermaid → SVG via Playwright at build time: cold start, cache, and SSG cost</title><link>https://artka.dev/en/blog/mermaid-svg-playwright-build-time/</link><guid isPermaLink="true">https://artka.dev/en/blog/mermaid-svg-playwright-build-time/</guid><description>Real measurements from an Astro blog with 32 Mermaid diagrams: cold build 11.6s, warm 6.3s. Where the cache is, what Playwright does, why alternatives are worse.</description><pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Mermaid diagrams in a blog are either a large client-side JS bundle with FOUC and hydration cost, or build-time SVG with a one-time cold-start Playwright. On this site, &lt;code&gt;rehype-mermaid&lt;/code&gt; renders 32 diagrams in &lt;strong&gt;11.6 seconds&lt;/strong&gt; on a cold cache and &lt;strong&gt;6.3 seconds&lt;/strong&gt; on a warm one. Below are the specific numbers, architecture, CI pitfalls, and a fact-check of alternatives.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;1. Why render Mermaid at build-time instead of client-side&lt;/h2&gt;
&lt;p&gt;Mermaid (&lt;code&gt;mermaid&lt;/code&gt; on npm, repository &lt;code&gt;mermaid-js/mermaid&lt;/code&gt;) is a JS library that takes a text DSL (&lt;code&gt;flowchart TD&lt;/code&gt;, &lt;code&gt;sequenceDiagram&lt;/code&gt;, &lt;code&gt;gantt&lt;/code&gt;, …) and emits SVG. By default, you use it like this: include &lt;code&gt;&amp;lt;script src=&amp;quot;mermaid.min.js&amp;quot;&amp;gt;&lt;/code&gt;, call &lt;code&gt;mermaid.run()&lt;/code&gt; after &lt;code&gt;DOMContentLoaded&lt;/code&gt;, and each &lt;code&gt;&amp;lt;pre class=&amp;quot;mermaid&amp;quot;&amp;gt;&lt;/code&gt; gets replaced with SVG in the DOM right in the browser.&lt;/p&gt;
&lt;p&gt;It works, but the user pays the price:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Client-side Mermaid&lt;/th&gt;
&lt;th&gt;Build-time SVG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JS bundle (gzipped)&lt;/td&gt;
&lt;td&gt;~250–300 KB (mermaid + d3 + dagre)&lt;/td&gt;
&lt;td&gt;0 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to Interactive (TTI)&lt;/td&gt;
&lt;td&gt;delayed by parse + execute&lt;/td&gt;
&lt;td&gt;unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FOUC&lt;/td&gt;
&lt;td&gt;yes: text first, then SVG&lt;/td&gt;
&lt;td&gt;no: SVG in HTML from first byte&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SEO / Open Graph&lt;/td&gt;
&lt;td&gt;search engine sees only text DSL&lt;/td&gt;
&lt;td&gt;search engine sees SVG as part of page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page printing&lt;/td&gt;
&lt;td&gt;empty blocks if JS is disabled&lt;/td&gt;
&lt;td&gt;correct render&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dark theme without flash&lt;/td&gt;
&lt;td&gt;hard: theme loads after hydration&lt;/td&gt;
&lt;td&gt;works: SVG generated in correct theme&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build cost&lt;/td&gt;
&lt;td&gt;0 (just bundle js)&lt;/td&gt;
&lt;td&gt;+5–10 seconds cold-start Playwright&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime cost for user&lt;/td&gt;
&lt;td&gt;high (CPU + network)&lt;/td&gt;
&lt;td&gt;zero&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;rehype-mermaid&lt;/code&gt; (&lt;code&gt;remcohaszing/rehype-mermaid&lt;/code&gt;, v3.0.0) is a rehype plugin that during the build traverses the HAST tree, finds &lt;code&gt;&amp;lt;code class=&amp;quot;language-mermaid&amp;quot;&amp;gt;&lt;/code&gt; nodes, renders them via &lt;code&gt;mermaid-isomorphic&lt;/code&gt; (&lt;code&gt;mermaid-isomorphic@3.1.0&lt;/code&gt;), and replaces them with ready SVG. Under the hood: Playwright + headless Chromium.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;img-svg&lt;/code&gt; strategy we use emits the result as &lt;code&gt;&amp;lt;img src=&amp;quot;data:image/svg+xml,...&amp;quot;&amp;gt;&lt;/code&gt;. Alternatives are &lt;code&gt;inline-svg&lt;/code&gt; (embed SVG directly in HTML) or &lt;code&gt;pre-mermaid&lt;/code&gt; (leave as-is for client-side render).&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;2. Architecture: rehype-mermaid + Playwright&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
  md[&amp;quot;Markdown&amp;lt;br/&amp;gt;with ```mermaid blocks&amp;quot;]
  mdx[&amp;quot;@astrojs/mdx&amp;lt;br/&amp;gt;(remark + rehype)&amp;quot;]
  rh[&amp;quot;rehype-mermaid&amp;lt;br/&amp;gt;(plugin)&amp;quot;]
  iso[&amp;quot;mermaid-isomorphic&amp;quot;]
  pw[&amp;quot;Playwright&amp;lt;br/&amp;gt;(Chromium)&amp;quot;]
  svg[&amp;quot;SVG as data URI&amp;lt;br/&amp;gt;in HTML&amp;quot;]

  md --&amp;gt; mdx
  mdx --&amp;gt; rh
  rh --&amp;gt;|for each block| iso
  iso --&amp;gt;|launch headless| pw
  pw --&amp;gt;|&amp;quot;mermaid.render() in DOM&amp;quot;| iso
  iso --&amp;gt;|serialised SVG| rh
  rh --&amp;gt; svg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The specific config is &lt;code&gt;astro.config.ts&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;import rehypeMermaid from &amp;quot;rehype-mermaid&amp;quot;;
import { defineConfig } from &amp;quot;astro/config&amp;quot;;
import mdx from &amp;quot;@astrojs/mdx&amp;quot;;

export default defineConfig({
  integrations: [
    mdx({
      rehypePlugins: [[rehypeMermaid, { strategy: &amp;quot;img-svg&amp;quot;, dark: true }]],
    }),
  ],
  markdown: {
    syntaxHighlight: {
      type: &amp;quot;shiki&amp;quot;,
      excludeLangs: [&amp;quot;mermaid&amp;quot;, &amp;quot;math&amp;quot;],
    },
    rehypePlugins: [[rehypeMermaid, { strategy: &amp;quot;img-svg&amp;quot;, dark: true }]],
  },
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Important details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;excludeLangs: [&amp;quot;mermaid&amp;quot;]&lt;/code&gt; in the shiki config — otherwise Shiki will first turn the block into &lt;code&gt;&amp;lt;pre class=&amp;quot;shiki&amp;quot;&amp;gt;&lt;/code&gt; and rehype-mermaid won’t see it.&lt;/li&gt;
&lt;li&gt;The plugin is connected twice: both in &lt;code&gt;markdown.rehypePlugins&lt;/code&gt; and in &lt;code&gt;mdx.rehypePlugins&lt;/code&gt;. Astro 5 doesn’t automatically inherit one from the other — this is a typical source of “it renders in &lt;code&gt;.md&lt;/code&gt; but not in &lt;code&gt;.mdx&lt;/code&gt;”.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dark: true&lt;/code&gt; generates two versions of SVG (for light and dark themes) and uses &lt;code&gt;&amp;lt;picture&amp;gt;&amp;lt;source&amp;gt;&lt;/code&gt; to serve the right one based on &lt;code&gt;prefers-color-scheme&lt;/code&gt;. This doubles the size of data-uri blocks, but gives correct contrast without JS.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;3. Cold start vs warm build&lt;/h2&gt;
&lt;p&gt;Metric: &lt;code&gt;time pnpm build&lt;/code&gt; (Apple M-series, locally, warm Chromium binary in &lt;code&gt;~/Library/Caches/ms-playwright&lt;/code&gt;). Command to clear all caches:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;rm -rf .astro node_modules/.astro dist
time pnpm build
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three runs on cold, three on warm (median):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Run 1&lt;/th&gt;
&lt;th&gt;Run 2&lt;/th&gt;
&lt;th&gt;Run 3&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Median&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold (&lt;code&gt;rm -rf .astro node_modules/.astro dist&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;11.580s&lt;/td&gt;
&lt;td&gt;11.860s&lt;/td&gt;
&lt;td&gt;11.486s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.580s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm (no cleanup)&lt;/td&gt;
&lt;td&gt;6.250s&lt;/td&gt;
&lt;td&gt;6.305s&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~6.28s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Of the 11.6 seconds of a cold build:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~5–6 seconds — actual SSG stage (Astro traverses routes, renders 45 HTML pages from 14 RU posts + 13 EN twins + index, tags, RSS, sitemap).&lt;/li&gt;
&lt;li&gt;~5 seconds — Playwright overhead: launching Chromium, initializing mermaid bundle in DOM, JIT warmup.&lt;/li&gt;
&lt;li&gt;~0.2 seconds — &lt;code&gt;pagefind --site dist/client&lt;/code&gt; (search index).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On a warm build, Playwright still starts fresh (there’s no long-lived process pool in &lt;code&gt;mermaid-isomorphic&lt;/code&gt;), but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.astro/data-store.json&lt;/code&gt; (5.2 MB) already contains parsed MDX content layer — Astro doesn’t re-parse markdown for files whose mtime hasn’t changed.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;node_modules/.astro/&lt;/code&gt; (5.1 MB) — Vite cache of transpiled modules.&lt;/li&gt;
&lt;li&gt;The Playwright Chromium binary itself is already in &lt;code&gt;~/Library/Caches/ms-playwright/chromium-1217/&lt;/code&gt; (528 MB total with headless-shell and ffmpeg) — on a cold disk cache you’d have to read it again, adding ~1–2 seconds on slow disks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Key fact: &lt;strong&gt;&lt;code&gt;mermaid-isomorphic&lt;/code&gt; itself does NOT cache SVG between builds&lt;/strong&gt;. I searched its source code (&lt;code&gt;node_modules/.pnpm/mermaid-isomorphic@3.1.0_playwright@1.59.1/.../mermaid-isomorphic.js&lt;/code&gt;) — there’s no &lt;code&gt;persistDir&lt;/code&gt; or file-based cache. Every build, diagrams are rendered from scratch. “Warmth” is Astro/Vite cache, not the plugin’s.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CI measurement for GitHub Actions &lt;code&gt;ubuntu-latest&lt;/code&gt;: (owner to fill: run workflow_dispatch on a clean runner, measure median from 3 runs with &lt;code&gt;actions/cache@v4&lt;/code&gt; for node_modules + .astro).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;4. Cost on CI&lt;/h2&gt;
&lt;p&gt;Playwright pulls Chromium (~528 MB in my cache on macOS, similar order on Linux), plus on Debian/Ubuntu you need system deps: &lt;code&gt;libnss3&lt;/code&gt;, &lt;code&gt;libatk-1.0-0&lt;/code&gt;, &lt;code&gt;libcups2&lt;/code&gt;, &lt;code&gt;libgbm1&lt;/code&gt;, &lt;code&gt;libxkbcommon0&lt;/code&gt;, &lt;code&gt;libpango-1.0-0&lt;/code&gt;, &lt;code&gt;libasound2&lt;/code&gt;, fontconfig + at least one font.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigations:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Don’t install Chromium in production image.&lt;/strong&gt; If you’re building an Astro SSG-only site and deploying static files — Playwright is needed ONLY on the CI build step, not in runtime Docker. Use multi-stage:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-dockerfile&quot;&gt;# Build stage: Playwright and Chromium live only here
FROM node:24-bookworm AS build
WORKDIR /app
COPY . .
RUN corepack enable
RUN pnpm install --frozen-lockfile
RUN pnpm exec playwright install --with-deps chromium
RUN pnpm build

# Run stage: static output only
FROM node:24-bookworm-slim AS run
WORKDIR /app
COPY --from=build /app/dist ./dist
# no playwright here
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Actions caching.&lt;/strong&gt; &lt;code&gt;actions/cache@v4&lt;/code&gt; key: &lt;code&gt;${{ hashFiles(&apos;pnpm-lock.yaml&apos;) }}-playwright&lt;/code&gt;, path: &lt;code&gt;~/.cache/ms-playwright&lt;/code&gt;. Saves re-downloading Chromium (~150 MB over network) on every push.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use system Chrome instead of Playwright Chromium.&lt;/strong&gt; Set &lt;code&gt;PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1&lt;/code&gt; and pass &lt;code&gt;executablePath: &apos;/usr/bin/google-chrome-stable&apos;&lt;/code&gt; when creating the browser. But: &lt;code&gt;mermaid-isomorphic&lt;/code&gt; doesn’t expose &lt;code&gt;launchOptions&lt;/code&gt; through the rehype-mermaid API — you’d have to fork or live with default Chromium.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;If 5 seconds cold-start is critical&lt;/strong&gt; — run Playwright outside the build: pre-render all diagrams in a separate CI step, commit SVG to the repo, use pre-mermaid strategy in the main build with substitution for ready assets. More complex, but removes Playwright from the hot path.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
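&lt;p&gt;The caching step from point 2 as a workflow fragment. A sketch: job context and step names are assumptions, only the key/path pair comes from the text above:&lt;/p&gt;

```yaml
# Sketch of the Playwright cache step for GitHub Actions.
- name: Cache Playwright Chromium
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: ${{ hashFiles('pnpm-lock.yaml') }}-playwright

- name: Install browsers (download is skipped on a cache hit)
  run: pnpm exec playwright install --with-deps chromium
```

&lt;p&gt;Note that &lt;code&gt;--with-deps&lt;/code&gt; still installs system packages on every run; only the Chromium download itself is saved by the cache.&lt;/p&gt;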
&lt;hr&gt;
&lt;h2&gt;5. SVG caching: where they live and what invalidates them&lt;/h2&gt;
&lt;p&gt;Public measurement on dev machine (45 compiled HTML, 27 pages with diagrams, 61 data-uris total — 32 RU + 29 EN, because one EN page renders without diagrams due to post specifics):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mermaid blocks in &lt;code&gt;*.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32 (in 14 posts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compiled HTML&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pages with embedded diagram&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data-URI blocks &lt;code&gt;&amp;lt;img src=&amp;quot;data:image/svg+xml,...&amp;quot;&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum, bytes&lt;/td&gt;
&lt;td&gt;15 551&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median, bytes&lt;/td&gt;
&lt;td&gt;25 301&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average, bytes&lt;/td&gt;
&lt;td&gt;26 579&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum, bytes&lt;/td&gt;
&lt;td&gt;45 711&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of &lt;code&gt;.astro/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.0 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of &lt;code&gt;node_modules/.astro/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.1 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of &lt;code&gt;dist/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;17 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size of Playwright Chromium cache&lt;/td&gt;
&lt;td&gt;528 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Where everything lives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The SVGs don’t live on disk as separate files.&lt;/strong&gt; The &lt;code&gt;img-svg&lt;/code&gt; strategy inlines them directly in HTML as &lt;code&gt;data:image/svg+xml,...&lt;/code&gt; (URL-encoded). You can see this in &lt;code&gt;dist/client/blog/02-context-and-cache/index.html&lt;/code&gt;: 4 diagrams → 4 data-uris in one HTML.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Astro content-layer cache&lt;/strong&gt; — &lt;code&gt;.astro/data-store.json&lt;/code&gt; (5.2 MB after build). This is parsed markdown with remark/rehype plugins already applied — but &lt;strong&gt;before&lt;/strong&gt; rehype-mermaid: testing shows that mtime-based invalidation of the source runs rehype-mermaid again even for files where nothing changed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vite cache&lt;/strong&gt; — &lt;code&gt;node_modules/.astro/&lt;/code&gt; (5.1 MB). Transpiled TS/JSX modules, unrelated to mermaid rendering.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mermaid-isomorphic has no cache of its own.&lt;/strong&gt; This is the key pitfall: if you change a comma in one &lt;code&gt;*.md&lt;/code&gt; — rehype-mermaid will rebuild ALL diagrams in that file. There’s no content-addressable cache “hash diagram source → SVG”.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If rehype-mermaid caching is critical for you — a workaround: write a thin rehype plugin wrapper that hashes the diagram source (sha256 of text between &lt;code&gt; ```mermaid&lt;/code&gt; and &lt;code&gt;```&lt;/code&gt;), checks &lt;code&gt;.cache/mermaid/&amp;lt;hash&amp;gt;.svg&lt;/code&gt; — and returns it without calling &lt;code&gt;mermaid-isomorphic&lt;/code&gt; on a hit. I haven’t done this on this blog — 11.6 seconds cold-start isn’t painful enough.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;6. Alternatives: what I looked at and why I didn’t choose them&lt;/h2&gt;
&lt;h3&gt;6.1. &lt;code&gt;@mermaid-js/mermaid-cli&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Official CLI from mermaid-js: &lt;code&gt;mmdc -i diagram.mmd -o diagram.svg&lt;/code&gt;. Under the hood: puppeteer (Chromium API fork) + full Chromium binary.&lt;/p&gt;
&lt;p&gt;Downsides for a blog pipeline:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No integration with rehype/remark — you’d have to extract markdown blocks manually.&lt;/li&gt;
&lt;li&gt;Each run spawns a new browser context (no batch mode).&lt;/li&gt;
&lt;li&gt;On 32 diagrams — 32 separate puppeteer launches ≈ tens of seconds vs ~5–6 seconds with &lt;code&gt;mermaid-isomorphic&lt;/code&gt; with a single browser instance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When it fits: one-off conversion &lt;code&gt;*.mmd → *.svg&lt;/code&gt; in a monorepo for designers, not for dynamic HTML insertion.&lt;/p&gt;
&lt;h3&gt;6.2. Client-side &lt;code&gt;mermaid&lt;/code&gt; (npm package)&lt;/h3&gt;
&lt;p&gt;Downsides already covered above: bundle, FOUC, hydration. One upside — dynamic diagrams from user input at runtime (live preview in documentation editor). For a static blog — overkill.&lt;/p&gt;
&lt;h3&gt;6.3. &lt;code&gt;mermaid-isomorphic&lt;/code&gt; directly (without rehype)&lt;/h3&gt;
&lt;p&gt;The same package that rehype-mermaid calls under the hood. You can use it outside Astro:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ts&quot;&gt;import { createMermaidRenderer } from &apos;mermaid-isomorphic&apos;;

const renderer = createMermaidRenderer();
const [{ svg }] = await renderer([{ value: &apos;flowchart TD\nA--&amp;gt;B&apos; }]);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When it fits: your own pipeline build (Eleventy, MkDocs plugin on Node.js) that doesn’t use a rehype chain. For me — Astro, so rehype-mermaid gives zero-boilerplate.&lt;/p&gt;
&lt;h3&gt;6.4. Pre-render via GitHub Actions matrix + commit back&lt;/h3&gt;
&lt;p&gt;Hypothetically: a workflow on push that renders SVG, commits to &lt;code&gt;public/diagrams/&lt;/code&gt;, and the build step uses &lt;code&gt;pre-mermaid&lt;/code&gt; strategy with replacement to &lt;code&gt;&amp;lt;img src=&amp;quot;/diagrams/&amp;lt;hash&amp;gt;.svg&amp;quot;&amp;gt;&lt;/code&gt;. Removes Playwright from the hot build path, but: complicates PR review (binary files in diff), requires a separate workflow, breaks local &lt;code&gt;pnpm dev&lt;/code&gt; if SVG isn’t committed yet.&lt;/p&gt;
&lt;p&gt;Didn’t do it — 5 seconds of cold-start savings don’t justify the complexity.&lt;/p&gt;
&lt;h3&gt;Summary table&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Cold-start&lt;/th&gt;
&lt;th&gt;SVG cache&lt;/th&gt;
&lt;th&gt;Bundle JS&lt;/th&gt;
&lt;th&gt;Setup complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rehype-mermaid&lt;/code&gt; + Playwright (current)&lt;/td&gt;
&lt;td&gt;~5–6s&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;low (1 plugin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mermaid-cli&lt;/code&gt; (&lt;code&gt;mmdc&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~10s+&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client-side &lt;code&gt;mermaid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;browser cache&lt;/td&gt;
&lt;td&gt;~250 KB&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-render + commit&lt;/td&gt;
&lt;td&gt;0 in build, ~5s in pre-step&lt;/td&gt;
&lt;td&gt;yes, in git&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2&gt;7. Checklist: what to measure before choosing&lt;/h2&gt;
&lt;p&gt;Before committing to build-time rendering or anything else:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How many diagrams on average.&lt;/strong&gt; On 1–3 — client-side is OK (lazy-load mermaid via dynamic import). On 30+ — build-time is cheaper for the user.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content edit frequency.&lt;/strong&gt; If the repo sees ~50 pushes a day, an 11-second cold start × 50 pushes ≈ 10 minutes of CI time per day. If you push once a week, it doesn’t matter.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI platform.&lt;/strong&gt; Vercel hobby, Netlify free, Cloudflare Pages — all have build minute limits. Playwright + Chromium on every PR preview = you’ll hit limits fast. On self-hosted runner or Dokploy (like me) — doesn’t matter.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Target JS bundle size.&lt;/strong&gt; If your project has a KPI of “&amp;lt;100 KB initial JS” — 250 KB mermaid client-side breaks the budget. Build-time SVG doesn’t touch the JS budget.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do you need interactivity.&lt;/strong&gt; Pan/zoom/click handlers in the diagram? Then client-side is mandatory. Static picture for reading? Build-time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Where your cold-start cost lives.&lt;/strong&gt; If in runtime Docker — cut Playwright from the run stage. If in CI — cache Chromium via &lt;code&gt;actions/cache&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Can you live with no SVG cache.&lt;/strong&gt; rehype-mermaid renders ALL blocks in a file on any edit. If that hurts — write your own caching wrapper with sha256 key on diagram source.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;On this blog, &lt;code&gt;rehype-mermaid&lt;/code&gt; + Playwright costs ~5 seconds cold-start, outputs 32 diagrams into 27 HTML pages with median inline-SVG size of 25 KB, requires zero bytes of JS on the client, and lets you write diagrams directly in markdown. This is a very good tradeoff for a static blog.&lt;/p&gt;
&lt;p&gt;When it won’t fit: a blog with a hundred diagrams, a deploy platform with build-minute limits, or a requirement for interactive diagrams. In the first case — write a caching wrapper, in the second — pre-render in a separate workflow, in the third — client-side.&lt;/p&gt;
&lt;p&gt;The main non-obvious thing to remember: &lt;strong&gt;Astro “warms up” (5.2 MB content store, Vite cache), but &lt;code&gt;mermaid-isomorphic&lt;/code&gt; doesn’t&lt;/strong&gt;. The Playwright cold start is paid on every build from scratch. This isn’t a bug, it’s by design, and it’s why my full build takes 11.6 seconds instead of 1.6.&lt;/p&gt;
</content:encoded><category>build-tooling</category><category>astro</category><author>a@artka.dev (Артём)</author></item></channel></rss>