A Skill Is a Folder, Not a Prompt: What Anthropic Learned Running Hundreds of Them

TL;DR

Anthropic published lessons from using hundreds of Claude Code Skills across its engineering organization, framing Skills as reusable folders rather than saved prompts. The company says verification-focused Skills had the largest measured effect on output quality, though best practices are still developing.

Anthropic has published new lessons from using hundreds of Claude Code Skills across its engineering organization, arguing that the most useful agent instructions are not one-off prompts but versioned folders containing instructions, scripts, references, templates and guardrails.

The source post, “Lessons from building Claude Code: How we use skills” by Thariq Shihipar, describes Skills as discoverable folders that Claude Code can read and act on when a task matches their description. According to Anthropic, a Skill can include a SKILL.md file, reference material, scripts, assets, configuration, hooks and memory.

Anthropic says its internal Skills fell into nine broad categories: library and API reference, product verification, data fetching and analysis, business-process automation, code scaffolding and templates, code quality and review, CI/CD and deployment, runbooks, and infrastructure operations.

The company’s strongest claim is about verification Skills. According to Anthropic’s own measurement, Skills that check whether an agent’s work is correct had the largest impact on output quality. The company also says strong Skills often begin as a short instruction plus one hard-won caveat, then grow as teams add edge cases and reusable tools.

At a glance
When: published June 3, 2026; discussed July…
The development: Anthropic published a June 3, 2026 Claude blog post describing what it learned from running hundreds of Claude Code Skills internally.
AI Dispatch · Insights · 1 July 2026

A Skill is a folder, not a prompt

Anthropic published what it learned running hundreds of Skills across its own engineering org. Read as a business memo, the point is bigger than a coding trick: this is how ad-hoc prompting becomes durable institutional capability — the SOPs your agents actually follow, versioned and shared.

✕ The misconception

“A Skill is just a clever markdown prompt you save in a file.”

✓ What it actually is

A folder the agent can discover, read & run — instructions, scripts, references, templates, config & on-demand hooks.

Anatomy of a Skill — the file system is context engineering
my-skill/           the unit you share & version
├─ SKILL.md         root instructions + a description written for the model (its trigger)
├─ references/      deep detail pulled in only when needed (progressive disclosure)
├─ scripts/         real code, so the agent composes instead of rebuilding boilerplate
├─ assets/          templates & files to copy into the output
├─ config.json      setup the agent asks for if it’s missing (e.g. which Slack channel)
└─ hooks + memory   on-demand guardrails + an append-only log so it remembers
Why it matters: the folder itself is the knowledge base. The agent reads the root, then reaches deeper only when the task demands it — the same way you’d hand a new hire a one-pager that points to the detailed docs.
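
The anatomy above can be sketched as a small scaffolding script. This is a minimal illustration, not Anthropic's tooling: the frontmatter layout in SKILL.md, the `slack_channel` key and the helper names `scaffold_skill` and `missing_config` are assumptions made for the example, so check the Skills docs for the actual file format.

```python
import json
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create a skeleton Skill folder mirroring the anatomy listed above."""
    skill = root / name
    for sub in ("references", "scripts", "assets"):
        (skill / sub).mkdir(parents=True, exist_ok=True)
    # Assumed layout: a short header with the description, then instructions.
    (skill / "SKILL.md").write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        "Instructions\n"
        "- Read references/ only when the task needs the detail.\n\n"
        "Gotchas\n"
        "- (add the first hard-won caveat here)\n",
        encoding="utf-8",
    )
    # Hypothetical config: None marks a value the agent should ask the user for.
    (skill / "config.json").write_text(
        json.dumps({"slack_channel": None}, indent=2), encoding="utf-8"
    )
    return skill


def missing_config(skill: Path) -> list[str]:
    """Return config keys that are still unset."""
    config = json.loads((skill / "config.json").read_text(encoding="utf-8"))
    return sorted(key for key, value in config.items() if value is None)


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = scaffold_skill(
        Path(tmp), "my-skill", "Verify the checkout flow after UI changes"
    )
    created = sorted(p.relative_to(skill_dir).as_posix() for p in skill_dir.rglob("*"))
    unset = missing_config(skill_dir)
    print(created)
    print(unset)
```

The folder starts nearly empty on purpose: one description, one gotcha slot and one unset config value give the agent something to ask about and the team something to grow.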
The nine types — a gap-analysis map for your own library
1. Library / API reference
2. Product verification ★ top impact
3. Data fetching & analysis
4. Business-process automation
5. Code scaffolding & templates
6. Code quality & review
7. CI/CD & deployment
8. Runbooks
9. Infrastructure operations
By Anthropic’s own measurement, verification Skills — the ones that check the work — moved output quality the most. If you build one category well, build that one.
The craft — what separates a good Skill from a useless one
Gotchas = highest-signal section
Describe for the model, not humans (it’s the trigger)
Don’t state the obvious
Ship scripts, not just prose
On-demand guardrail hooks (/careful, /freeze)
Let it remember (log / SQLite)
Don’t railroad: leave room to adapt
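
The "let it remember" item could look something like the following: an append-only log kept in SQLite that a Skill's script writes to and reads back on the next run. This is a sketch under stated assumptions; the table name `skill_log` and the helpers `open_memory`, `remember` and `recall` are hypothetical and do not come from Anthropic's post.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log. Pass a file path to persist between runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS skill_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "note TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, note: str) -> None:
    """Append one note. Existing rows are never updated or deleted."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO skill_log (note) VALUES (?)", (note,))


def recall(conn: sqlite3.Connection, limit: int = 5) -> list[str]:
    """Return the most recent notes, oldest first."""
    rows = conn.execute(
        "SELECT note FROM skill_log ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in reversed(rows)]


conn = open_memory()
remember(conn, "Staging login needs the test account")
remember(conn, "Checkout page loads slowly on first visit")
remember(conn, "Retry the screenshot step once before failing")
recent = recall(conn, limit=2)
print(recent)
```

Keeping the log append-only means earlier observations stay auditable, and the `limit` argument keeps the amount of remembered text that reaches the agent's context small.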
The take

The knowledge of how your organization actually operates can be captured, versioned, shared & executed — and the thing capturing it is a humble folder with a script and a gotchas list inside. For the builder, that’s context engineering with real tools attached. For whoever owns the budget, it’s the difference between AI that starts from zero every morning and an asset that compounds. Caveats: best practices are still evolving, checked-in Skills cost context, and curation beats accumulation. Start with one Skill, one gotcha, and the category that catches your mistakes.

Source: “Lessons from building Claude Code: How we use skills,” Thariq Shihipar (Anthropic), Claude blog, 3 June 2026. Categories, examples & measured claims are Anthropic’s; framing is the author’s. Docs: code.claude.com/docs/en/skills.
thorstenmeyerai.com

Why Skills Matter for Teams

The development matters because it reframes agent setup as institutional knowledge management, not prompt tinkering. If a Skill contains the way a team reviews code, verifies product behavior or handles releases, that knowledge can be shared, versioned and reused instead of retyped into each session.

For engineering leaders, the claim points to a practical question: whether AI agent quality improves more from buying stronger models or from giving agents better local operating knowledge. Anthropic’s account suggests that reusable procedures, scripts and checks can help reduce inconsistent output across teams.

The business case remains partly interpretive. Anthropic reports internal experience and measured gains for verification Skills, but it has not provided enough public detail in the source material to independently compare those gains across companies, codebases or agent setups.

From Prompting to Folders

The central correction in Anthropic’s write-up is definitional: a Skill is not just markdown. It is a folder that can hold the lightweight instruction the model sees first, then deeper references or scripts that are loaded only when needed.

That design supports progressive disclosure: the agent does not need every detail at once, but can reach for more specific material when the task requires it. In Anthropic’s framing, the folder itself becomes the knowledge base.
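
Progressive disclosure can be illustrated with a toy loader that reads only each Skill's short description first and opens a reference file only after a Skill has been chosen. This is a rough sketch: in Claude Code the model itself decides which Skill matches, so the word-overlap scoring in `pick_skill` is a stand-in, and the file names and helper names are assumptions for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_description(skill: Path) -> str:
    """Read only the first line of SKILL.md, the cheap part seen up front."""
    with (skill / "SKILL.md").open(encoding="utf-8") as handle:
        return handle.readline().strip()


def pick_skill(skills: list[Path], task: str) -> Optional[Path]:
    """Stand-in for the model's choice: most words shared with the description."""
    task_words = set(task.lower().split())
    best, best_score = None, 0
    for skill in skills:
        score = len(task_words & set(read_description(skill).lower().split()))
        if score > best_score:
            best, best_score = skill, score
    return best


def load_reference(skill: Path, name: str) -> str:
    """Pull deeper detail only when the task calls for it."""
    return (skill / "references" / name).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two made-up Skills, each with a one-line description and one reference.
    samples = [
        ("release", "deploy a release to staging", "Check the changelog before tagging."),
        ("verify-ui", "verify product behavior in the browser", "Log in with the test account first."),
    ]
    for name, description, reference in samples:
        (root / name / "references").mkdir(parents=True)
        (root / name / "SKILL.md").write_text(
            description + "\n\nFull instructions go here.\n", encoding="utf-8"
        )
        (root / name / "references" / "details.txt").write_text(reference, encoding="utf-8")

    skills = sorted(p for p in root.iterdir() if p.is_dir())
    chosen = pick_skill(skills, "verify the checkout page in the browser")
    chosen_name = chosen.name if chosen else None
    detail = load_reference(chosen, "details.txt") if chosen else ""
    print(chosen_name, "->", detail)
```

Only the one-line descriptions are read for every Skill; the reference text of the unselected Skill is never opened, which is the context saving the folder layout is meant to provide.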

The July 1 commentary from Thorsten Meyer AI casts the finding as a business memo: ad-hoc prompting can become a durable asset when companies package repeated work into shared operating procedures that agents can follow.

“A Skill is a folder, not a prompt.”

— Thorsten Meyer AI

Limits of the Evidence

Several details remain unclear from the available source material. Anthropic does not provide enough public information here to show how each Skill category was measured, what baseline was used, or how results differed across teams and projects.

It is also unclear how much of Anthropic’s experience transfers to smaller organizations, non-engineering teams or companies without mature internal documentation. The source material says best practices are still evolving, and that checked-in Skills can carry context costs if teams accumulate them without curation.

Teams Will Test Skill Libraries

The next step for companies using coding agents is likely to be smaller pilots: building one Skill around a repeatable workflow, especially a verification or review task, then measuring whether it improves output consistency.
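
One way to run such a pilot is to record whether each agent run passed the same acceptance check, with and without the Skill enabled, and compare pass rates. The sketch below shows only the arithmetic; the function names are hypothetical and the two result lists are placeholder values invented for illustration, not measurements from Anthropic or anyone else.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of runs that passed the acceptance check."""
    if not results:
        raise ValueError("need at least one run to compute a pass rate")
    return sum(results) / len(results)


def compare(baseline: list[bool], with_skill: list[bool]) -> dict[str, float]:
    """Summarize a pilot: pass rate before, after, and the difference."""
    before = pass_rate(baseline)
    after = pass_rate(with_skill)
    return {"baseline": before, "with_skill": after, "delta": after - before}


# Placeholder outcomes only: True means the run passed the check.
baseline_runs = [True, False, True, False]
skill_runs = [True, True, True, False]

summary = compare(baseline_runs, skill_runs)
print(summary)
```

A real pilot would need many more runs per condition and the same tasks in both, since a handful of runs cannot separate a genuine improvement from noise.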

Anthropic’s own guidance, as reflected in the source material, points toward curated libraries rather than large collections. Teams will need to decide which procedures deserve scripts, templates and hooks, and which are better left as ordinary documentation.

Key Questions

What did Anthropic publish?

Anthropic published a Claude blog post by Thariq Shihipar describing lessons from using hundreds of Claude Code Skills inside its engineering organization.

What is a Claude Code Skill?

A Skill is described as a folder that can include instructions, references, scripts, templates, configuration, hooks and memory that an agent can use for a task.

Which Skills had the biggest reported effect?

According to Anthropic’s own measurement cited in the source material, product verification Skills had the largest impact on output quality.

Why is this relevant beyond developers?

The approach suggests that companies can turn repeated work patterns into versioned operating knowledge for AI agents, making agent behavior more consistent across teams.

What is still unknown?

The available source material does not fully show measurement methods, external benchmarks or how well the approach works outside Anthropic’s own engineering environment.

Source: Thorsten Meyer AI
