Agent Skills

addyosmani.com

226 points by BOOSTERHIDROGEN 10 hours ago


wg0 - 3 hours ago

Snake oil. Good to read for sure. Seems all plausible too. But snake oil nevertheless.

Here's why: The slot machine can drop any hard requirement that you specify in your AGENTS.md, memory.md, or your dozens of skill markdowns. Pretty much guaranteed.

These harness approaches pretend that LLMs are strict, perfect rule followers and that the only problem is not being able to specify enough rules clearly enough. That's a fundamental misunderstanding of how LLMs operate.

That leaves only one option, not reliable but more reliable nevertheless: human review and oversight. Possibly two of them, one after the other.

Everything else is snake oil. But at that point you also realize that the promised productivity gains are snake oil too, because reading code and building a mental model is way harder than having a mental model and writing it into code.

ai_fry_ur_brain - 5 hours ago

Can't wait for everyone to realize they've wasted a year-plus messing with agents while experiencing a feeling of pseudo-productivity.

stellalo - 2 hours ago

> A skill is a markdown file with frontmatter that gets injected into the agent’s context when the situation calls for it.

When the LLM decides that the situation calls for it

> It is a workflow: a sequence of steps the agent follows, with checkpoints that produce evidence, ending in a defined exit criterion.

A sequence of steps the LLM can decide to follow
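For readers unfamiliar with the format the quoted line describes, here is a minimal sketch of what such a skill file might look like. The name, trigger description, and steps are hypothetical, invented to illustrate the frontmatter-plus-workflow shape, not taken from the article:

```markdown
---
name: review-api-changes
description: Use when a PR touches the public API surface.
---

1. List every exported symbol the diff adds or changes.
2. Check each symbol against the project's existing naming conventions.
3. Produce evidence: a table mapping each symbol to a verdict.
4. Exit criterion: every symbol has a verdict and a one-line rationale.
```

The frontmatter is what the harness matches against when deciding whether to inject the file; the numbered steps are the "workflow with checkpoints" the article describes. As stellalo notes, whether any of it is followed is ultimately the LLM's call.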

dmix - 5 hours ago

I've tried these larger agent skillsets in the past and felt it was a waste of time because they were just doing too much. Just like with vim, it's often better to pick and choose from the community than to install skills as if they were an IDE. Skills are way too personal, because every dev and dev team is different. So it's better to treat these as a reference for your own config rather than bulk-installing someone else's.

thatmf - 4 hours ago

Why are people so excited to put themselves out of a job?

Not that these or any "skills" will do that, but just in principle. This is like alienation from labor at scale.

CharlesW - 8 hours ago

From an SEO/LLMO perspective, the discoverability of these skills will be difficult without a rename: https://agentskills.io/

If Addy reads this, how do you pitch this vs. Superpowers? https://github.com/obra/superpowers

ColinEberhardt - an hour ago

Agent Skills is built upon “Five design decisions [that] are the load-bearing ones”

And Open Design (HN front page yesterday) is supported by “Six load-bearing ideas”

The similarity in the way these prompt libraries are documented doesn't feel coincidental.

zmmmmm - 7 hours ago

I was surprised how long some of these skills are. They are pages and pages long with tables and checkbox lists and code examples, etc.

Curious how normal that is; it would only take a couple of these to really fill the context a lot.
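zmmmmm's concern can be sanity-checked with a back-of-the-envelope sketch. The ~4 characters/token ratio and the 200k-token window are common ballpark assumptions, not measurements of any particular model or tokenizer, and the file sizes are hypothetical:

```python
# Rough sketch: estimate how much of an agent's context window a set of
# skill files would consume if injected verbatim.
# Assumption: ~4 characters per token (a common English-text heuristic).

def estimate_context_share(skill_texts, context_window_tokens=200_000):
    """Return the approximate fraction of the context window consumed."""
    approx_tokens = sum(len(text) // 4 for text in skill_texts)
    return approx_tokens / context_window_tokens

# e.g. five "pages and pages" skills of ~40 KB of markdown each
skills = ["x" * 40_000] * 5
print(f"~{estimate_context_share(skills):.0%} of a 200k-token window")
# → ~25% of a 200k-token window
```

Under these assumptions, a handful of long skills really can eat a quarter of the window before the agent has read any of your actual code.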

tariky - 2 hours ago

What is the difference between Superpowers and this?

I've been using Superpowers for several months now and it really does help. But the 90/10 rule still applies: 10% of the time it will produce a stupid decision. So always check the spec.

cortesoft - 2 hours ago

What makes this better/different than spec-kit? It seems to have a very similar philosophy. I wonder if they could work together? Or would they just be duplicative?

https://github.com/github/spec-kit

turlockmike - 7 hours ago

The best way to prompt an LLM is to describe the outcome you want, that's it. They are trained as task completers. A clear outcome is way better than a process.

If the LLM fails, either you didn't describe your outcome sufficiently, or it misinterpreted what you said, or it couldn't do it (rare).

Common errors should be encoded as context for future similar tasks; don't bloat skills with stuff that isn't shown to be necessary.

SudheerTammini - 4 hours ago

Recently I got (enterprise) access to the latest ChatGPT module with the ability to write skills to automate repeatable tasks. Without any prior knowledge I just started tinkering, and now, after creating and testing multiple skills in a real business environment, I can confidently say that writing a good skill is a skill in itself. As the author mentioned, it's not an essay but a specific set of instructions, organised into steps in a concise manner.

theahura - 4 hours ago

I really wish he wouldn't use AI to write his posts. It would be faster to just post the prompt he used to write the article.

koliber - 3 hours ago

Lately I keep hearing the same thing over and over: the things that are good for managing a team of devs are good for LLMs.

Good test cases.

Clear and concise documentation.

CI/CD.

Best practices and onboarding docs.

Managing LLMs is becoming more and more similar to managing teams of people.

scotty79 - 24 minutes ago

> Workflows are agent-actionable; essays are not. The same is true for human teams. If your team handbook is 200 pages, no one reads it under time pressure.

Agents do read that. And they actually remember it, because it's tiny compared with the other things you're cramming into their context.

codemog - 6 hours ago

Everyone who writes this kind of stuff skips the boring parts: science and engineering.

Yep: benchmarks, with/without comparisons, samples of generated code with and without. This kind of stuff matters, and without real analysis you may be making your agent stupider or getting worse results.

Also this prose reads like the author has drunk the Google kool-aid and not much else.

konaraddi - 4 hours ago

There are so many ways, many of them redundant, to set up agents for software development that, beyond personal/team/org needs and tastes, I need to look into setting up some benchmarks to evaluate which setup is optimal, or whether the differences are even worth it.

senko - 8 hours ago

> This isn’t a coincidence. It’s the same SDLC every functioning engineering organisation runs, just in different vocabulary. [...] Amazon calls it the working-backwards memo and the bar raiser. Every healthy team has some version of this loop.

This (SDLC == working backwards & bar raiser) is so horribly wrong that I hope it was an LLM hallucination.

In general, I'm starting to see these agent scaffolding systems as an anti-pattern: people obsess over systems for guiding agents and construct elaborate Rube Goldberg machines, and then others cargo-cult them wholesale, in an effort to optimize and control a random process and minimize human involvement.

ElijahLynn - 8 hours ago

I've been using Agent Skills on a new side project and I'm really impressed so far! It holds my hand a lot of the way and lets me focus on developing a product instead of figuring out how to build it. I get to spend much more energy on high-level architecture and product design.

Very grateful for this repository and everyone who contributed to it!

y-curious - 8 hours ago

Thanks for this, going to steal a lot of this. I would install your plugin, but I worry about being able to delete it later. I also think that each one of these is better served customized to a developer. That said, I'm still going to grab some of these, thanks!

gavmor - 7 hours ago

Naming things is such a hard problem that many devs don't even bother trying.

That being said, this post is full of reasonable assertions, so I'm looking forward to experimenting with this... whatever it is.

gosukiwi - 8 hours ago

I wonder how this compares to Superpowers.

AndyNemmity - 6 hours ago

This is why I created the /do router, to route to all skills. I also have anti-rationalization, progressive context discovery, etc.

I only make it for me, so it's a bit complex and targeted towards me, and what I do, but it's pretty easy to adjust things.

https://github.com/notque/vexjoy-agent

I'm working on reading through Agent Skills; it seems we've converged on a lot of the same points. I'd never seen it before, so I'm trying to get an understanding of it.

Edit 1: I don't like all the commands. I just rely on a single router to automatically decide what I want, and that feels like the most reasonable way to me to communicate with it.

I don't want to remember things. And that's the way for me to scale the number of skills and activities. I don't have to think about them.

Edit 2: We have very different routers.

https://github.com/addyosmani/agent-skills/blob/f504276d8e07...

vs

https://github.com/notque/vexjoy-agent/blob/main/skills/do/S...

I personally wouldn't call theirs an intelligent router. They are dancing between a few different skills. We have extremely different setups there.

But of course, I'm using way more context to get it done. I'm even sending it out to Haiku to build the route choices.

I choose to use tokens to make things better for myself, not everyone would make the same choice, so I certainly see why they are using a few skills, and composing them.

Edit 3: This is much easier for a user to wrap their head around because there's much less of it.

I am only focused on the best improvements I can make that show value for my use cases. This is straightforward to reason about.

This seems like a nice way to get the best concepts for people trying to understand them. I commend them for a clean, simple approach.

Edit 4: Yeah, I think there are some things I can learn from them which is always good.

I especially like simple decisions like collapsing the install details for each harness in the readme.

I'm going to read over the entire thing and look for opportunities to improve my stuff.

We are all working together, learning, testing, building, trying to find the best way to implement things.

encoderer - 9 hours ago

I adopted a couple of these; the API design and UI testing ones have been particularly helpful.
