MCP doesn't need tools, it needs code

lucumr.pocoo.org

226 points by the_mitsuhiko 3 days ago

Yeah I quite agree with this take. I don't understand why editors aren't utilizing language servers more for making changes. Crazy to see agents running grep and sed and awk and stuff, all of that should be provided through a very efficient cursor-based interface by the editor itself.

And for most languages, they shouldn't even be operating on strings; they should be operating on token streams and ASTs.
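
Here is a minimal sketch of the strings-versus-ASTs distinction. It assumes Python, since the comment names no language, and uses only the standard-library `ast` module. A plain `str.replace` rename also rewrites a matching word inside a string literal. A rename over `ast.Name` nodes touches only identifiers.

```python
import ast

SOURCE = """\
total = 0
for item in [1, 2, 3]:
    total += item
print("total is", total)
"""


def naive_rename(source: str, old: str, new: str) -> str:
    # Plain text substitution: also hits string literals, comments, and substrings.
    return source.replace(old, new)


class _RenameName(ast.NodeTransformer):
    """Rewrite every identifier (ast.Name node) spelled `old` to `new`."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


def ast_rename(source: str, old: str, new: str) -> str:
    # Parse, rename identifiers only, then regenerate source (Python 3.9+).
    # ast.unparse drops comments and normalizes formatting.
    tree = _RenameName(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


if __name__ == "__main__":
    print(naive_rename(SOURCE, "total", "running_sum"))  # also changes "total is"
    print(ast_rename(SOURCE, "total", "running_sum"))    # leaves the literal alone
```

The trade-off shows up right away: `ast.unparse` discards comments and original formatting, so a pure AST round-trip is lossy. That is one reason a hybrid is attractive, where the text stays the source of truth and the tree is used to locate edits, as several replies below describe.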

fny - 3 days ago

Strings are a universal interface with no dependencies. You can do anything in any language across any number of files. Any other abstraction heavily restricts what you can accomplish.
Also, LLMs aren't trained on ASTs, they're trained on strings -- just like programmers.
- skybrian - 3 days ago
  
  No, it’s not really “any string.” Most strings sent to an interpreter will result in a syntax error. Many Unix commands will report an error if you pass in an unknown flag.
  In theory, there is a type that describes what will parse, but it’s implicit.
- anon7000 - 3 days ago
  
  Exactly. LLMs are trained on huge amounts of bash scripts. They “know” how to use grep/awk/whatever. ASTs are, I assume, not really part of that training data. How would they know how to work well with one? LLMs are trained on what humans do to code. Yes, I assume down the road someone will train more efficient versions that can work more closely with the machine. But LLMs work as well as they do because they have a large body of “sed” statements in their statistical models.
  - theshrike79 - 2 days ago
    
    They also know how to use modern options like fd and rg, which allow more complex operations with a single call.
  - FuckButtons - 2 days ago
    
    Tree-sitter is more or less a universal AST parser you can run queries against. Writing queries against an AST that you incrementally rebuild is massively more powerful and precise at generating the correct context than manually writing infinitely many shell pipeline one-liners and correctly handling all of the edge cases.
    
    anon7000 - 2 days ago
    
    I agree with you, but the question is more whether existing LLMs have enough training with AST queries to be more effective with that approach. It’s not like LLMs were designed to be precise in the first place.
- UltraSane - 20 hours ago
  
  generating code that doesn't run is just a waste of electricity.
spacebanana7 - 3 days ago

It's so weird that codex/claude code will manually read through sometimes dozens of files in a project because they have no easy way to ask the editor to "Find Usages".
Even though efficient use of CLI tools might keep the token burn from being too bad, the models still need to spend extra effort thinking about references in comments, READMEs, and method overloading.
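
A "Find Usages" query that ignores comments and READMEs doesn't strictly need an editor. Here is a minimal sketch. It assumes a Python codebase and uses the standard-library `ast` module rather than a language server.

`find_usages` is a hypothetical helper. It reports definitions, imports, bare names, and attribute accesses that match a symbol, so a mention inside a comment never counts. Unlike a real LSP, it does no scope or type resolution: two unrelated `load` functions would both match.

```python
import ast
from typing import Dict, List, NamedTuple


class Usage(NamedTuple):
    path: str
    line: int
    col: int


_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def find_usages(files: Dict[str, str], symbol: str) -> List[Usage]:
    """Return every place `symbol` is defined, imported, named, or accessed.

    `files` maps a path to its source text. Comments and string literals are
    never matched because they never become Name/Attribute nodes.
    """
    hits: List[Usage] = []
    for path, source in files.items():
        for node in ast.walk(ast.parse(source, filename=path)):
            if isinstance(node, ast.Name):
                matched = node.id == symbol
            elif isinstance(node, ast.Attribute):
                # Position reported is where the whole `obj.attr` expression starts.
                matched = node.attr == symbol
            elif isinstance(node, _DEFS):
                matched = node.name == symbol
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                matched = any(symbol in (a.name, a.asname) for a in node.names)
            else:
                matched = False
            if matched:
                hits.append(Usage(path, node.lineno, node.col_offset))
    return sorted(hits)


FILES = {
    "a.py": "def load(path):\n    return open(path).read()\n",
    "b.py": "# load is documented in the README\nfrom a import load\ndata = load('x.txt')\n",
}

if __name__ == "__main__":
    for usage in find_usages(FILES, "load"):
        print(usage)  # the comment on line 1 of b.py is not reported
```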
- doikor - 3 days ago
  
  We have that in Scala with the MCP tools Metals provides, but convincing Claude to actually use the tools has been really painful.
  https://scalameta.org/metals/blog/2025/05/13/strontium/#mcp-...
- ctoth - 3 days ago
  
  Which is why I wrote a code extractor MCP which uses Tree-sitter -- surely something that directly connects MCP with LSP would be better but the one bridge layer I found for that seemed unmaintained. I don't love my implementation which is why I'm not linking to it.
  - tough - 3 days ago
    
    Both opencode and Charm's Crush support LSPs and MCPs as configs.
- jgalt212 - 2 days ago
  
  Also, the business models are incentivized towards efficient token usage.
- fzzzy - 2 days ago
  
  Really? Github Copilot Agent can search. Interesting.
mgsloan2 - 3 days ago

I agree the current way tools are used seems inefficient. However there are some very good reasons they tend to operate on code instead of syntax trees:
* Way way way more code in the training set.
* Code is almost always a more concise representation.
There has been work in the past training graph neural networks or transformers that get AST edge information. It seems like some sort of breakthrough (and tons of $) would be needed for those approaches to have any chance of surpassing leading LLMs.
Experimentally, having agents use ast-grep seems to work pretty well. So you're still representing everything as code, but using a syntax-aware search/replace tool.
- sam0x17 - 3 days ago
  
  Didn't want to bury the lede, but I've done a bunch of work with this myself. It goes fine as long as you give it both the textual representation and the ability to walk along the AST. You give it the raw source code, and then also give it the ability to ask a language server to move a cursor that walks along the AST, and then every time it makes a change you update the cursor location accordingly. You basically have a cursor in the text and a cursor in the AST and you keep them in sync so the LLM can't mess it up. If I ever have time I'll release something, but right now I'm just experimenting locally with it for my Rust stuff.
  On the topic of LLMs understanding ASTs, they are also quite good at this. I've done a bunch of applications where you tell an LLM a novel grammar it's never seen before _in the system prompt_ and that plus a few translation examples is usually all it takes for it to learn fairly complex grammars. Combine that with a feedback loop between the LLM and a compiler for the grammar where you don't let it produce invalid sentences and when it does you just feed it back the compiler error, and you get a pretty robust system that can translate user input into valid sentences in an arbitrary grammar.
  - mgsloan2 - 2 days ago
    
    Sounds like cool stuff, along the lines of structure editing!
    The question is not whether it can work, but whether it works better than an edit tool using textual search/replace blocks. I'm curious what you see as the advantage of this approach? One thing that comes to mind is that having a cursor provides some natural integration with LSP signature help.
    Yes agentic loop with diagnostic feedback is quite powerful. I'd love to have more controllable structured decode from the big llm providers to skip some sources of needing to loop - something like https://github.com/microsoft/aici
  - MarkMarine - 3 days ago
    
    I’d love to see how you’re giving this interface to the LLM
- jonfw - 3 days ago
  
  > * Way way way more code in the training set.
  Why not convert the training code to AST?
  - mgsloan2 - 2 days ago
    
    You could, but it is extremely expensive to train an LLM that is competitive on coding evals. So, I was assuming use of a model someone else trained.
    Also, if it is only trained on code, it's likely to miss out on all the world knowledge that comes from the rest of the data.
    
    exe34 - 2 days ago
    
    Fine-tuning instead of training from scratch might help.
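
sam0x17's grammar-plus-compiler feedback loop can be sketched in a few lines of Python. Everything here is hypothetical:

- `toy_compile` stands in for a real compiler for the custom grammar, here a tiny `SET <name> = <int>;` language.
- `fake_model` stands in for an LLM call.

The point is only the control flow: reject invalid output, feed the error back, and retry up to a limit.

```python
import re
from typing import Callable, Optional

# Hypothetical toy grammar: one or more statements of the form `SET name = 123;`
STATEMENT = re.compile(r"\s*SET\s+([a-z_]\w*)\s*=\s*(\d+)\s*;")


def toy_compile(program: str) -> Optional[str]:
    """Return None if `program` is valid, else a compiler-style error message."""
    if not program.strip():
        return "error: empty program"
    pos = 0
    while pos < len(program):
        if not program[pos:].strip():
            return None  # only trailing whitespace is left
        match = STATEMENT.match(program, pos)
        if match is None:
            return f"error at offset {pos}: expected `SET <name> = <int>;`"
        pos = match.end()
    return None


def generate_valid(prompt: str, model: Callable[[str], str], max_attempts: int = 3) -> str:
    """Ask `model` for a program; on a compile error, append it to the prompt and retry."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = model(prompt + feedback)
        error = toy_compile(candidate)
        if error is None:
            return candidate
        feedback = f"\n\nYour last answer was rejected:\n{candidate}\n{error}\nTry again."
    raise ValueError(f"no valid program after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    # Stand-in for an LLM: wrong the first time, corrected once it sees an error.
    return "SET x = 1;" if "rejected" in prompt else "x := 1"


if __name__ == "__main__":
    print(generate_valid("Set x to one.", fake_model))
```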
p_j_w - 3 days ago

I think you've hit the nail on the head here.
After being pleasantly surprised at how well an AI did at a task I asked of it a few months ago that I thought was much more complicated, I was amused at how badly it did when I asked it to refactor some code to change variable names in one single source file to match a particular coding standard. After doing the work that a good junior developer might have needed a couple of days for, it failed hard at refactoring, working more at the level of a high school freshman.
kelseyfrog - 3 days ago

Structured output generally gives a nice performance boost, so I agree.
Specifically, I'd love to see widespread structured output support for context free grammars. You get a few here and there - vLLM for example. Most LLMs as a service only support JSON output which is better than nothing but doesn't cover this case at all.
Something with semantic analysis (scope-informed output) would be the cherry on top, but while it's technically possible, I don't see it arriving anytime soon. But hey, maybe an opportunity for product differentiation.
- sam0x17 - 3 days ago
  
  Yeah see my other comment above, I've done it with arbitrary grammars, works quite well, don't know why this isn't more widespread
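
Grammar-constrained decoding works by masking out every next token that could not extend the output into a valid sentence. This is the feature kelseyfrog would like hosted APIs to offer.

Here is a minimal sketch under heavy simplifying assumptions:

- The grammar is a toy one: balanced parentheses.
- The vocabulary has three tokens.
- Hand-written scores stand in for model logits.

Real engines apply the same idea to a model's full vocabulary with a general CFG parser.

```python
from typing import Dict, List


def allowed_tokens(prefix: str) -> List[str]:
    """Tokens that keep `prefix` a valid prefix of a balanced-parentheses string.

    The toy vocabulary is "(", ")" and "<eos>"; `prefix` holds only parentheses.
    """
    depth = 0
    for ch in prefix:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return []  # the prefix is already invalid; nothing can repair it
    # "(" can always follow; ")" only closes an open paren; "<eos>" only when balanced.
    return ["(", ")"] if depth > 0 else ["(", "<eos>"]


def constrained_step(prefix: str, scores: Dict[str, float]) -> str:
    """Pick the highest-scoring token among those the grammar permits."""
    legal = allowed_tokens(prefix)
    if not legal:
        raise ValueError(f"prefix {prefix!r} cannot be completed")
    return max(legal, key=lambda tok: scores.get(tok, float("-inf")))


if __name__ == "__main__":
    # The "model" prefers ")" most, but at depth 0 it is masked out.
    print(constrained_step("", {")": 5.0, "(": 1.0, "<eos>": 0.5}))
```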
skydhash - 3 days ago

AST is only half of the picture. Semantics (aka the action taken by the abstract machine) are what’s important. What code helps with is identifying patterns which helps in code generation (defmacro and api services generations) because it’s the primary interface. AST is implementation detail.
0x457 - 3 days ago

If you look at the API exposed by LSP, you'll understand why. It's very hard to use LSP outside an editor because a lot of it is "where is the symbol in file X on line Y between these two columns used".
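
0x457's point is visible in the wire format. Per the LSP specification, a `textDocument/references` request identifies the symbol by document URI plus a zero-based line/character position, not by name. A caller therefore has to know where the symbol sits before it can ask about it.

Here is a minimal sketch of framing such a request: a Content-Length header, a blank line, then the JSON-RPC body. The file URI and position are made-up values. Launching and talking to an actual server is left out.

```python
import json


def lsp_references_request(request_id: int, uri: str, line: int, character: int) -> bytes:
    """Build a framed JSON-RPC `textDocument/references` request.

    `line` and `character` are zero-based, as in the LSP specification.
    """
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": True},
        },
    }).encode("utf-8")
    # LSP base protocol: Content-Length (in bytes), a blank line, then the body.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


if __name__ == "__main__":
    print(lsp_references_request(1, "file:///tmp/example.py", 9, 4).decode())
```

The spec also counts `character` in UTF-16 code units by default. That is one more detail an agent-facing wrapper has to translate from plain text offsets.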
ChickeNES - 2 days ago

You're looking for Serena: https://github.com/oraios/serena
lkjdsklf - 3 days ago

There are a few agents that integrate with LSP servers.
opencode comes to mind off the top of my head
it still tends to do a lot of grep and sed though.

jumploops - 3 days ago

The promise of MCP is that it “connects your models with the world”[0].

In my experience, it’s actually quite the opposite.

By giving an LLM a set of tools, 30 in the Playwright case from the article, you’re essentially restricting what it can do.

In this sense, MCP is more of a guardrail/sandbox for an LLM, rather than a superpower (you must choose one of these Stripe commands!).

This is good for some cases, where you want your “agent”[1] to have exactly some subset of tools, similar to a line worker or specialist.

However it’s not so great when you’re using the LLM as a companion/pair programmer for some task, where you want its output to be truly unbounded.

[0]https://modelcontextprotocol.io/docs/getting-started/intro

[1]For these cases you probably shouldn’t use MCP, but instead define tools explicitly within one context.
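
The footnote's alternative, defining tools explicitly within one context, can be as small as a dictionary of allowed functions. Here is a minimal sketch. The tool names (`get_order_status`, `refund_order`) and the `{"name": ..., "arguments": {...}}` call format are hypothetical.

Anything the model asks for outside the dictionary is refused. That gives the restricted action space described above without an MCP server in between.

```python
import json
from typing import Any, Callable, Dict


def get_order_status(order_id: str) -> str:
    # Hypothetical tool; a real one would call an internal API.
    return f"order {order_id}: shipped"


def refund_order(order_id: str, amount_cents: int) -> str:
    # Hypothetical tool; a real one would call a payments API.
    return f"refunded {amount_cents} cents on order {order_id}"


# The agent's entire action space: nothing outside this dict can be invoked.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_order_status": get_order_status,
    "refund_order": refund_order,
}


def dispatch(tool_call_json: str) -> str:
    """Run a model-emitted call like {"name": ..., "arguments": {...}}."""
    call: Dict[str, Any] = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}; allowed: {sorted(TOOLS)}"
    try:
        return TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:  # e.g. missing or unexpected arguments
        return f"error: bad arguments for {name}: {exc}"


if __name__ == "__main__":
    print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A1"}}'))
```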

ehnto - 3 days ago

If you're running one of the popular coding agents, they can run commands in bash which is more or less access to the infinite space of tooling I myself use to do my job.
I even use it to troubleshoot issues with my linux laptop that in the past I would totally have done myself, but can't be bothered. Which led to the most relatable AI moment I have encountered: "This is frustrating" - Claude Code thought, after 6 tries in a row to get my bluetooth headset working.
- chuckmcp - 3 days ago
  
  Even with all of the CLI tools at its disposal (e.g. sed), it doesn’t consistently use them to make updates as it could (e.g. widespread text replacement). Once in a blue moon, an LLM will pick some tool and handle a problem with it in a really smart way, one it almost never uses otherwise. Most of the time it seems optimized for using too many individual things, probably both for safety and because it makes the AI companies more money.
  - acedTrex - 3 days ago
    
    It's because the broader the set of "tools" the worse the model gets at utilizing them effectively. By constraining the use you ensure a much higher % of correct usage.
  - mmargenot - 3 days ago
    
    There is a tradeoff between quantity of tools and the ability of the model to make effective use of them. If tools in an MCP are defined at a very granular level (i.e. single API calls) it's a bad MCP.
    I imagine you run into something similar with bash: while bash is a single "tool" for an agent, a similar decision still needs to be made about the many CLI tools that become available once bash is enabled.
- AuthAuth - 2 days ago
  
  I've never seen an LLM do anything but absolutely destroy linux. So much of their data is outdated solutions.
  - ehnto - 2 days ago
    
    That's the best thing about Linux (et al.): it's a fairly stable target, and programs and tools are pretty much the same year on year. I wouldn't get it to help me with Nix, or let it loose on an EC2 instance, but for general troubleshooting of Arch or something it's fine.
    Edge cases are everywhere, obviously, but I don't let it run wild. I approve every command it runs.
  - thrown-0825 - 2 days ago
    
    same, this is 100x worse than just copy pasting commands from stack overflow.
PhilippGille - 3 days ago

Given the security issues that come with MCP [1], I think it's a bad idea to call MCP a "guardrail/sandbox".
Also, there are MCP servers that allow running any command in your terminal, including apt install / brew install etc.
[1] https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
- jumploops - 3 days ago
  
  Yeah admittedly poor choice of words, given the security context surrounding MCP at large.
  Maybe “fettered” is better?
  Compared to giving the LLM full access to your machine (direct shell, Python executable as in the article), I still think it’s the right way to frame MCP.
  We should view the whole LLM <> computer interface as untrusted, until proven otherwise.
  MCP can theoretically provide gated access to external resources, unfortunately many of them provide direct access to your machine and/or the internet, making them ripe as an attack vector.
- limagnolia - 2 days ago
  
  The security issues aren't so much with "MCP"; they are with folks giving LLMs access to do things they don't want those LLMs to be able to do. By describing MCP as guardrails, you might convince some of the nincompoops to think about where they place those guardrails.
- 0x457 - a day ago
  
  Different issues. Let's take a look at a technology that nearly every coding agent needs to use - git or any other version control tool. Sure, agent can use git by running shell scripts, but how do I limit what part of git it can do? For example, IDGAF what commits it makes on a feature branch because it will be squashed and merged later.
  With an MCP server, I can just expose commit functionality and add it to the allow list. Security for remote MCP servers (i.e. not stdio) is a separate issue. The fact that there isn't an easy way to provide credentials to an MCP server is also a separate issue.
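
The allow-list idea can also be sketched without MCP: instead of a shell, give the agent a wrapper that can only ever build one git command. `build_commit_argv`, `commit_tool`, and the protected-branch policy are illustrative assumptions. `git rev-parse --abbrev-ref HEAD` and `git commit -m` are standard git.

Passing an argument list rather than a shell string means a message like `fix; rm -rf ~` is only ever commit text.

```python
import subprocess
from typing import List

PROTECTED_BRANCHES = {"main", "master"}  # illustrative policy, adjust per repo


def build_commit_argv(message: str, current_branch: str) -> List[str]:
    """Return the only git command this tool will ever run, or raise."""
    if current_branch in PROTECTED_BRANCHES:
        raise PermissionError(f"refusing to commit directly to {current_branch!r}")
    if not message.strip():
        raise ValueError("commit message must not be empty")
    # An argument list, not a shell string: no shell ever parses the message.
    return ["git", "commit", "-m", message]


def commit_tool(message: str, repo_dir: str) -> str:
    """Hypothetical tool exposed to the agent in place of a shell."""
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    result = subprocess.run(
        build_commit_argv(message, branch),
        cwd=repo_dir, capture_output=True, text=True,
    )
    return result.stdout + result.stderr
```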
oooyay - 3 days ago

In my uneducated experience MCP is nothing more than a really well structured prompt. You can call out tools for the agent or model to use in the instruction prompt, especially for certain project. I define workflows that trigger for certain files being changed in Cursor and usually the model can run uninterrupted for a while.
- dragonwriter - 3 days ago
  
  > In my uneducated experience MCP is nothing more than a really well structured prompt.
  MCP isn't a prompt (though prompts are a resource an MCP server can provide). An MCP client that is also the direct LLM manager toolchain has to map the data from the MCP servers' tool/prompt/resource definitions into the prompt, and it usually does so using prompt templates that are defined for each model, usually by the model provider. So the meaningful part of having a “really well-structured prompt” doesn't come from MCP at all; it's something that already exists and that the MCP client leverages.
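
Here is a sketch of the mapping described above. It assumes the tool-definition shape from the MCP specification: a `name`, a `description`, and a JSON Schema `inputSchema` per tool.

The plain-text template is made up for illustration. Real clients usually hand tools to the model through the provider's own tool-calling format, which is the per-model templating mentioned above.

```python
from typing import Any, Dict, List


def render_tools_for_prompt(tools: List[Dict[str, Any]]) -> str:
    """Turn MCP-style tool definitions into a plain-text prompt section."""
    lines = ["You can call these tools:"]
    for tool in tools:
        schema = tool.get("inputSchema", {})
        params = schema.get("properties", {})
        required = set(schema.get("required", []))
        args = ", ".join(
            f"{name}: {spec.get('type', 'any')}{'' if name in required else ' (optional)'}"
            for name, spec in params.items()
        )
        lines.append(f"- {tool['name']}({args}): {tool.get('description', '')}")
    return "\n".join(lines)


# Shape of a `tools/list` result entry, with a hypothetical tool.
EXAMPLE_TOOLS = [
    {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "units": {"type": "string"}},
            "required": ["city"],
        },
    }
]

if __name__ == "__main__":
    print(render_tools_for_prompt(EXAMPLE_TOOLS))
```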
faangguyindia - 3 days ago

My coding agent just has access to these functions:
ask> what all tools u have?
I have access to the following tools:
1 code_search: Searches for a pattern in the codebase using ripgrep.
2 extract_code: Extracts a portion of code from a file based on a line range.
3 file_operations: Performs various file operations like ls, tree, find, diff, date, mkdir, create_file.
4 find_all_references: Finds all references to a symbol (function, class, etc.) from the AST index.
5 get_definition: Gets the definition of a symbol (function, class, etc.) from the AST index.
6 get_library_docs: Gets documentation for a library given its unique ID.
7 rename_symbol: Renames a symbol using VS Code.
8 resolve_library_id: Resolves a library name to a unique library ID.
what do i need MCP and other agents for? This is solving most of my problems already.
- dragonwriter - 3 days ago
  
  > what do i need MCP and other agents for?
  For your use cases, maybe you don't. Not every use case for an LLM is identical to your coding usage pattern.
- spacebanana7 - 3 days ago
  
  Which coding agent are you using?
nsonha - 3 days ago

It's not guardrail, it's guidance. You don't guide a child or an intern with: "here is everything under the sun, just do things", you give them a framework, programming language, or general direction to operate within.
- nativeit - 3 days ago
  
  Interns and children didn’t cost $500B.
  - pixl97 - 3 days ago
    
    You're right, they've cost trillions and trillions of dollars, and getting any single one up to speed takes a minimum of 18 to 25 years.
    500b sounds like a value prop in those regards.
  - Bjartr - 3 days ago
    
    Collectively they kind of do and then some. That cost for AI is in aggregate, so really it should be compared to the cost of living + raising children to be educated and become interns.
    At some point the hope for both is that they result in a net benefit to society.
  - nsonha - 3 days ago
    
    Some of them quip on HN, quite impressive.
  - senko - 3 days ago
    
    How is that relevant?
chris222 - 3 days ago

I find it’s best to use it to actually give context. For example, when the prompt includes a piece of information the LLM doesn’t know how to look up (such as a link to the status or logs for an internal system), give it a tool to perform the lookup.
TZubiri - 3 days ago

All of this superhuman intelligence and we still haven't solved the "CALL MOM" demo

juanviera23 - 3 days ago

I agree MCP has these flaws, idk why we need MCP servers when LLMs can just connect to the existing API endpoint

I started working on an alternative protocol, which lets agents call native endpoints directly (HTTP/CLI/WebSocket) via “manuals” and “providers,” instead of spinning up a bespoke wrapper server: https://github.com/universal-tool-calling-protocol/python-ut...

even connects to MCP servers

if you take a look, would love your thoughts

rco8786 - 3 days ago

> when LLMs can just connect to the existing API endpoint
The primary differentiator is that MCP includes endpoint discovery. You tell the LLM about the general location of the MCP tool, and it can figure out what capabilities that tool offers immediately. And when the tool updates, the LLM instantly re-learns the updated capability.
The rest of it is needlessly complicated (IMO) and could just be a bog standard HTTP API. And this is what every MCP server I've encountered so far actually does, I haven't seen anyone use the various SSE functionality and whatnot.
MCP v.01 (current) is both a step in the right direction (capability discovery) and an awkward misstep on what should have been the easy part (the API structure itself)
- AznHisoka - 3 days ago
  
  How is this different from just giving the LLM an OpenAPI spec in the prompt? Does it somehow get around the huge number of input tokens that would require?
  - stanleydrew - 3 days ago
    
    Technically it's not really much different from just giving the LLM an OpenAPI spec.
    The actual thing that's different is that an OpenAPI spec is meant to be an exhaustive list of every endpoint and every parameter you could ever use. Whereas an MCP server, as a proxy to an API, tends to offer a curated set of tools and might even compose multiple API calls into a single tool.
    
    orra - 3 days ago
    
    It's a farce, though. We're told these LLMs can already perform our jobs, so why should they need something curated? A human developer often gets given a dump of information (or nothing at all), and has to figure out what works and what is important.
    
    rco8786 - 3 days ago
    
    You should try and untangle what you read online about LLMs from the actual technical discussion that's taking place here.
    Everyone in this thread is aware that LLMs aren't performing our jobs.
  - rco8786 - 3 days ago
    
    Because again, discoverability is baked into the protocol. OpenAPI specs are great, but they are optional, change over time, and have a very different target use case.
    MCP discoverability is designed to be ingested by an LLM, rather than used to implement an API client like OAI specs. MCP tools describe themselves to the LLM in terms of what they can do, rather than what their API contract is.
    It also removes the responsibility of having to inject the correct version of the spec into the prompt from the user, and moves it into the protocol.
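
On the wire, the discovery rco8786 describes is a JSON-RPC 2.0 exchange:

- The client sends `tools/list` to learn which tools exist.
- It then invokes a tool by name with `tools/call`.
- A server can send `notifications/tools/list_changed` when its tools change.

Here is a minimal sketch of building those requests. The `search_issues` tool and its arguments are hypothetical. The transport (stdio or HTTP) and the `initialize` handshake are omitted.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# 1. Discover what the server offers (re-run after a list_changed notification).
list_req = jsonrpc_request("tools/list")

# 2. Call one of the discovered tools by name with JSON arguments.
call_req = jsonrpc_request(
    "tools/call", {"name": "search_issues", "arguments": {"query": "crash"}}
)

if __name__ == "__main__":
    print(list_req)
    print(call_req)
```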
neomantra - 2 days ago

On my first couple days of writing MCP servers, I made ones that bind APIs (DataBento, Buttplug.io). I thought that was the point. These were my immediate takeaways:
1) I need an auto-binder (eg OpenAPI) or a better binding system like this UTCP is trying to be
2) I need a secure sandbox, for the system and even for the APIs (like a UAT env)
I’ve continued to make MCP servers and tools and realized (1) was a fallacy. Most APIs were not made for computers; they were made for humans to allow other humans to connect to their code.
It’s hard to explain, but it’s an ergonomic thing. An API to a database might have things like paging and filtering. The design might have to fit into a URL and you want to simplify or hide things. LLMs don’t care.
My insight was similar to this article wrt code. An LLM doesn’t need a cute API to a dataset. They can code so you don’t need to give them an API, you give them a SQL endpoint (my focus), or a Python venv, or a bash prompt.
Then akin to UTCP manuals, the user and LLM can develop tool descriptions and code helpers (in SQL, Views and stored procedures) to make them better at doing whatever they need to do. Maybe there’s a main tool description and then a supplementary user-specific one too.
So I’m taking DuckDB, loading data, locking it down, and exposing a single SQL endpoint that returns a DB table as CSV. Then I work with the LLMs to make tool descriptions and helpers in agentic loops.
So I think what y’all are working on is cool, but the power isn’t in the API connection itself, but how the LLM effectively uses it. But you can build that agentic-assist part into the product; or somebody wraps something around it.
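
neomantra's "single SQL endpoint that returns CSV" is straightforward to sketch. This version swaps in Python's built-in `sqlite3` for DuckDB so it runs without extra packages. The read-only lockdown comes from SQLite's `mode=ro` URI parameter, so a `DELETE` written by the model fails instead of mutating data.

`run_sql_as_csv` is a hypothetical name. Errors come back as text so the model can correct its query in the agentic loop.

```python
import csv
import io
import sqlite3


def run_sql_as_csv(db_path: str, query: str) -> str:
    """Execute `query` on a read-only connection and return the rows as CSV.

    `db_path` is embedded in a SQLite URI, so it should be a plain, URI-safe path.
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cursor = conn.execute(query)
        if cursor.description is None:
            return "error: statement returned no result set"
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
        return out.getvalue()
    except sqlite3.Error as exc:
        # Hand the error back as text so the model can fix its query and retry.
        return f"error: {exc}"
    finally:
        conn.close()
```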
pegasus - 3 days ago

What you're building makes a lot of sense to me. The communication indirection that MCP frequently introduces bothers me, as well as the duplication of effort when it comes to e.g. the OpenAPI spec. I'll keep an eye on this repo and plan to give it a spin sometime (though I wish there was a TypeScript version too).
- juanviera23 - 3 days ago
  
  there is a TS version actually, all the SDKs are here: https://github.com/universal-tool-calling-protocol
  - johnthescott - 2 days ago
    
    the link to RFC is broken:
    https://github.com/universal-tool-calling-protocol
stanleydrew - 3 days ago

> idk why we need MCP servers when LLMs can just connect to the existing API endpoint
Because the LLM can't "just connect" to an existing API endpoint. It can produce input parameters for an API call, but you still need to implement the calling code. Implementing calling code for every API you want to offer the LLM is at minimum very annoying and often error-prone.
MCP provides a consistent calling implementation that only needs to be written once.
- juanviera23 - 3 days ago
  
  yupp that's what UTCP does as well, standardizing the tool-calling
  (without needing an MCP server that adds extra security vulnerabilities)
  - otterley - 3 days ago
    
    There's still an agent between the user and the LLM. The model isn't making the tool calls and has no mechanism of its own to accomplish this.
  - mdaniel - 3 days ago
    
    heh, relevant to the "do what now?" thread, I didn't recognize that initialism https://github.com/universal-tool-calling-protocol
    I'll spare the audience the implied XKCD link
theshrike79 - 2 days ago

Why do we need GraphQL when we have REST APIs?
Both serve a different purpose, but both can achieve the exact same thing.

yxhuvud - 3 days ago

First rule of writing about something that can be abbreviated: first give some explanation so people have an idea of what you are talking about. Either spell out what the abbreviation stands for, give an explanation, or at least include a link to some other page that explains what is going on.

EDIT: This has since been fixed in link, so it is outdated.

cgriswald - 3 days ago

Just so folks who want to do this know, the proper way to introduce an initialism is to use the full term on first use and put the initialism in parentheses. Thereafter just use the initialism.
Always consider your audience, but for most non-casual writing it’s a good default for a variety of reasons.
- mdaniel - 3 days ago
  
  You're welcome to do that in print media, but on the web the proper way is the abbr element with its title attribute <https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...>. Related to the distinction, I'd bet $1 there's some fancy CSS that would actually expand the definition under @media print
  I can attest the abbr is also mobile friendly, although I am for sure open to each browser doing its own UI hinting that a long-press is available for the term
  - jethack - 3 days ago
    
    Sadly abbr with title doesn't work at all on mobile Chrome [1] or Firefox [2]. Probably not Safari either, since long press on mobile means "select text" so you'd have to do some CSS trickery (and trying to hit a word-sized target with a finger is quite annoying).
    [1] https://issues.chromium.org/issues/337222647 -> https://issues.chromium.org/issues/41130053
    [2] https://bugzilla.mozilla.org/show_bug.cgi?id=1468007
  - crazygringo - 3 days ago
    
    Please read your own link. It literally says to put the definition in parentheses (same as print) on first use. Second paragraph.
    <abbr> is not what you seem to think it is. But the "typical use cases" section of your link does explain what it's actually for.
  - cgriswald - 3 days ago
    
    From your source:
    > Spelling out the acronym or abbreviation in full the first time it is used on a page is beneficial for helping people understand it, especially if the content is technical or industry jargon.
    > Only include a title if expanding the abbreviation or acronym in the text is not possible. Having a difference between the announced word or phrase and what is displayed on the screen, especially if it's technical jargon the reader may not be familiar with, can be jarring.
losvedir - 3 days ago

If you don't know what "MCP" stands for, then this article isn't for you. It's okay to load it, realize you're not the target audience, and move on. Or, spend some of your own time looking it up.
This is like complaining that HTTP or API isn't explained.
- AznHisoka - 3 days ago
  
  The difference is that those terms are ubiquitous after 20 years of usage. MCP is a relatively new term that hasn't even been around for a year.
- DrewADesign - 3 days ago
  
  I think this issue seems completely straightforward to many people… and their answer likely depends on if they know what MCP means.
  The balance isn’t really clear cut. On one hand, MCP isn’t ubiquitous like, say, DNS or ancient like BSD. On the other, technical audiences can be expected to look up terms that are new to them. The point of a headline is to offer a terse summary, not an explanation, and adding three full words makes it less useful. However, that summary isn’t particularly useful if readers don’t know what the hell you’re talking about, either, and using jargon nearly guarantees that.
  I think it’s just one of those damned-if-you-do/don’t situations.
- mattacular - 3 days ago
  
  It's not really like your examples because MCP has been around for about 1 year whereas those others have been around for decades and are completely ubiquitous throughout the software industry as a result.
- bityard - 3 days ago
  
  Textbook example of gatekeeping if I ever saw it.
jeroenhd - 3 days ago

If you don't know the abbreviation, that can also mean you're not the target audience. This is a blog post written for an audience that uses multiple MCP servers, arguing for a different way to use LLMs. If you need the term explained and don't care enough to throw the abbreviation into Google, you're not going to care much about what's being said anyway.
I have no idea what any of the abbreviations in stock market news mean and those stock market people won't know their CLIs from their APIs and LLMs, but that doesn't mean the articles are bad.
bgwalter - 3 days ago

"MCP" is the new "webscale". It can be used to write philosophical papers about LLMs orchestrating the obliquely owned ontologies of industrial systems, including SCADA systems:
https://arxiv.org/html/2506.11180v1
SCADA systems got famous because hacking them previously required something like Stuxnet. In the future you can just vibe hack them.
diggan - 3 days ago

> or at least a link to some other page that explain what is going on
There is a link to a previous post by the same author (within the first ten words even!), which contains the context you're looking for.
- yxhuvud - 3 days ago
  
  A link to a previous post is not enough, though of course appreciated. But it would be something I click on after I decide if I should spend time on the article or not. I'm not going on goose chases to figure out what the topic is.
  - dkdcio - 3 days ago
    
    this is a wild position. it would have taken you the same amount of time to type your question(s) into your favorite search engine or LLM to learn what the terms mean as you now have spent on this comment thread. the idea that every article should contain all prerequisite knowledge for anybody at any given level of context about any topic is absurd
jahsome - 3 days ago

Are you referring to MCP? If so, it's fully spelled out in the first sentence of the first paragraph, and links to a more thorough post on the subject. That meets 2 of the 3 criteria you've dictated.
- yxhuvud - 3 days ago
  
  That was not the case when I commented. It has obviously been updated since then.
owebmaster - 3 days ago

If you are looking for a definition, you should look for a beginner's article, not an advanced one.
reactordev - 3 days ago

MCP is Model Context Protocol, welcome to the land of the living. Make sure you turn the lights off to the cave. :)
It’s pretty well known by now what MCP stands for, unless you were referring to something else…
- AznHisoka - 3 days ago
  
  If by cave, you mean a productive room where busy people get things done, I agree.
- tronreference - 3 days ago
  
  Master Control Program:
  https://www.youtube.com/watch?v=atmQjQjoZCQ
  - lsaferite - 3 days ago
    
    I refuse to believe they didn't name the spec with that in mind.
    Also... that's some dedication. A user dedicated to a single comment.
  - polotics - 3 days ago
    
    Mysteriously Convoluted Protocol ...to get LLMs to do tool calling. I do agree that direct code execution in an enclave is the way to go.
- koakuma-chan - 3 days ago
  
  > It’s pretty well known by now what MCP
  Minecraft Coder Pack
  https://minecraft.fandom.com/wiki/Tutorials/Programs_and_edi...
- klez - 3 days ago
  
  I, for one, still need to look it up every time I see it mentioned. Not everyone is talking or thinking about LLMs every waking minute.
  - grim_io - 3 days ago
    
    Are you looking up what the abbreviation stands for, or what an MCP is?
    The first case doesn't matter at all if you already know what an MCP actually is.
    At least for the task of understanding the article.
    
    lsaferite - 3 days ago
    
    MCP being the initialism for "Model Context Protocol", the specification released by Anthropic, generally dictates you shouldn't say "an MCP" but simply "MCP" or "the MCP". If you are referring to a concrete implementation of a part of MCP, then you likely meant to say "an MCP Server" or "an MCP Client".
  - reactordev - 3 days ago
    
    I figured with all the AI posts and models, tools, apps, featured on here in the last year or two that it was a given. I guess not.

xavierx - 3 days ago

Is this just code injection?

It’s talking about passing Python code to a Python interpreter tool.

Even if you had guardrails set up, that seems a little chancy, but hey, this is the era of development where we’re letting AI write code anyway, so why not give other people remote code execution access, because fuck it all.

thrown-0825 - 2 days ago

yes, modern development practice is to introduce RCEs as a feature and then fundraise around it.

scosman - 3 days ago

I made an MCP server that tries to address some of these issues (undocumented, security, discoverability, platform-specific). You write a YAML file describing your tools (lint/format/test/build), and it exposes them to agents via MCP. Kinda like package.json scripts, but for agents. It speeds things up too: fewer incorrect commands, no human approval needed, and parallel execution.

https://github.com/scosman/hooks_mcp

The interactive lldb session here is super cool for deeper debugging. For security, containers seem like the solution - sketch.dev is my fav take on containerizing agents at the moment.
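
The idea in scosman's comment, where a fixed set of project commands is declared and the agent invokes them only by name, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. A plain dict stands in for the YAML file. The tool names and commands (ruff, pytest) are illustrative placeholders and are not hooks_mcp's actual schema or code.

```python
import shlex
import subprocess
import sys

# Hypothetical tool declarations, shown as a dict instead of YAML so the
# sketch has no dependencies. This is NOT hooks_mcp's actual file format.
TOOLS = {
    "lint": {"description": "Run the linter", "command": "ruff check ."},
    "test": {"description": "Run the test suite", "command": "pytest -q"},
    "version": {
        "description": "Print the Python version",
        "command": shlex.join([sys.executable, "--version"]),
    },
}


def list_tools(config: dict) -> list[tuple[str, str]]:
    """What an agent would be shown: tool names and descriptions only."""
    return [(name, spec["description"]) for name, spec in sorted(config.items())]


def run_tool(config: dict, name: str, timeout: float = 300) -> tuple[int, str]:
    """Run a declared tool by name; anything not declared is refused."""
    if name not in config:
        raise KeyError(f"unknown tool: {name!r}")
    # Split into an argv list and run without a shell.
    argv = shlex.split(config[name]["command"])
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout + result.stderr
```

The agent only ever picks a name from list_tools(), so it cannot improvise a different command line. Adding a capability means editing the config, not trusting a new string from the model.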

preek - 3 days ago

Re Security: I put my AI assistant in a sandbox. There, it can do whatever it wants, including deleting or mutating anything that would otherwise be harmful.

I wrote about how to do it with Guix: https://200ok.ch/posts/2025-05-23_sandboxing_ai_tools:_how_g...

Since then, I have switched to using Bubblewrap: https://github.com/munen/dotfiles/blob/master/bin/bin/bubble...
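
For readers curious what a Bubblewrap setup looks like, here is a rough sketch of building the command line from Python. The flags (--ro-bind, --bind, --dev, --proc, --tmpfs, --chdir, --unshare-all, --share-net, --die-with-parent) are documented bwrap options. The particular mount layout is an assumption, and this is not the linked dotfiles script. bwrap must be installed to actually run the result.

```python
import os


def bwrap_argv(workdir: str, command: list[str], allow_network: bool = False) -> list[str]:
    """Build a bubblewrap (bwrap) command line that confines `command`.

    Host system directories are mounted read-only, only `workdir` is
    writable, /dev, /proc and /tmp are fresh, and namespaces are unshared.
    """
    workdir = os.path.abspath(workdir)
    argv = ["bwrap"]
    # Read-only system directories; skip any this distro doesn't have.
    for path in ("/usr", "/bin", "/lib", "/lib64", "/etc"):
        if os.path.exists(path):
            argv += ["--ro-bind", path, path]
    argv += [
        "--dev", "/dev",             # minimal set of device nodes
        "--proc", "/proc",           # fresh procfs for the new PID namespace
        "--tmpfs", "/tmp",           # private, empty /tmp
        "--bind", workdir, workdir,  # the only writable host directory
        "--chdir", workdir,
        "--unshare-all",             # unshare user (if possible), IPC, PID, net, UTS, cgroup
        "--die-with-parent",         # kill the sandbox if the launcher exits
    ]
    if allow_network:
        argv.append("--share-net")   # re-enable networking after --unshare-all
    return argv + list(command)
```

Something like subprocess.run(bwrap_argv(".", ["bash"])) would then give the assistant a shell that can write only to the project directory and, by default, has no network access.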

xmorse - 3 days ago

This is how tools are implemented in the latest Gemini models, like gemini-2.5-flash-preview-native-audio-dialog: the LLM has access to a code execution tool that can run Python code, and all tools are available in a default_api class.
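
The pattern xmorse describes, where tools are exposed as methods on a single object that model-written code calls directly, can be shown with a toy Python sketch. The DefaultApi class, its two stub methods, and run_model_code are hypothetical. This is not Gemini's implementation, and exec() here provides no isolation at all.

```python
import contextlib
import io


class DefaultApi:
    """Hypothetical tools; each public method is one callable 'tool'."""

    def get_weather(self, city: str) -> dict:
        # Stubbed result; a real host would call an actual weather service.
        return {"city": city, "temp_c": 21}

    def send_message(self, to: str, text: str) -> str:
        # Stubbed side effect; a real host would deliver the message.
        return f"sent to {to}: {text}"


def run_model_code(code: str) -> str:
    """Run model-written code with the tools exposed as `default_api`.

    Only what the code prints is returned, standing in for what would be
    fed back into the model's context. exec() runs in-process, unsandboxed.
    """
    namespace = {"default_api": DefaultApi()}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()
```

The model can chain calls, for example passing a field of the get_weather result into send_message, inside one code execution instead of two round trips.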

CharlieDigital - 3 days ago

A few weeks back, I actually started working on an MCP server that is designed to let the LLM generate and execute JavaScript in a safe, sandboxed C# runtime with Jint as the interpreter.

https://github.com/CharlieDigital/runjs

It lets the LLM safely generate and execute whatever code it needs, bounded by statement count, memory limits, and runtime limits.

It has a built-in secrets manager API (so generated code can make use of remote APIs), an HTTP fetch analogue, JSONPath for JSON handling, and Polly for HTTP request resiliency.
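
The linked project gets its bounds from Jint in C#. A rough Python analogue of the statement-count bound described above uses sys.settrace to count "line" events. This is an assumption for illustration and not how runjs works. It bounds work only; it is not a security sandbox, and time spent inside C builtins is not counted.

```python
import sys


class StatementLimitExceeded(Exception):
    """Raised when generated code executes more lines than allowed."""


def run_bounded(source: str, max_statements: int = 10_000) -> dict:
    """Exec `source`, aborting once it has executed `max_statements` lines.

    Counts 'line' trace events across every Python frame the code enters.
    """
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        if event == "line":
            executed += 1
            if executed > max_statements:
                # Raising here propagates into the traced code and unsets tracing.
                raise StatementLimitExceeded(f"more than {max_statements} statements")
        return tracer  # keep receiving line events for this frame

    code = compile(source, "<generated>", "exec")
    namespace: dict = {}
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, namespace)
    finally:
        sys.settrace(previous)  # restore any tracer (e.g. a debugger) that was active
    return namespace
```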

mdaniel - 3 days ago

I don't mean to throw shade on your toy, but trying to get a prediction model to use a language that actively hates developers is a real roll-the-dice outcome.
- CharlieDigital - 3 days ago
  
  Which language? C# or JS? OpenAI and Claude are quite good at JS. The runtime is C# to sandbox the execution so it is not being generated.

larve - 3 days ago

CodeAct is a really interesting area to explore. I expanded upon the JS platform I started sketching out in https://www.youtube.com/watch?v=J3oJqan2Gv8 . LLMs know a million APIs out of the box and have no trouble picking more up through context, yet struggle once you give them a few tools. In fact, just enabling a single tool definition "degrades" the vibes of the model.

Give them an eval() with a couple of useful libraries (say, treesitter), and they are able not only to use it well, but also to write their own "tools" (functions) and save massively on tokens.

They also allow you to build "ephemeral" apps, because who wants to wait for tokens to stream and an LLM to interpret the result when you could do most tasks with a normal UI, only jumping into the LLM when fuzziness is required.

Most of my work on this is sadly private right now, but here are a few repos that are the foundation: github.com/go-go-golems/jesus and https://github.com/go-go-golems/go-go-goja
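
The "eval() that lets the model write its own tools" idea can be sketched minimally in Python. This assumes a plain in-process exec() rather than the Go/goja setup in the linked repos, and there is no sandboxing. Because the namespace survives between calls, the model can define a helper in one call and reuse it in later ones without re-sending its source.

```python
import contextlib
import io
import traceback


class PersistentInterpreter:
    """A single 'eval' tool whose namespace survives between calls."""

    def __init__(self, preloaded=None):
        # Libraries or helpers the host chooses to expose up front.
        self.namespace = dict(preloaded or {})

    def run(self, code: str) -> str:
        """Execute one tool call; return printed output, plus a traceback on error."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Errors go back to the model as text so it can correct itself.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()
```

A first call can define a helper such as word_count(), and every later call can simply use it. The only tokens spent are on the new code and its printed result.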

BLanen - 3 days ago

What this is saying, again, is that MCP is not a protocol. That is the point of MCP, and it makes MCP essentially worthless, because it doesn't define actual behavioral rules; it can only describe existing rules informally.

This is because defining a formal system that can do everything MCP promises to enable is a logical impossibility.

pploug - 2 days ago

While I generally agree with the author on code over tools, the article could have benefited from some concrete ways this could be done somewhat securely: sandboxing, enforcing zero trust, network segmentation, and all the other known controls we've developed over the last decade.

I love the optimism of this space, but fear that the "security is a sham" attitude will bite us all in the ass down the line.

kordlessagain - 3 days ago

As one does, I've built an alternative to MCP: https://ahp.nuts.services

Put GPT5 into agent mode, then give it that URL and the token 'linkedinPROMO1'. Once it loads the tools, tell it to use curl in a terminal (it's faster) and then run the random tool.

This is authenticated at the moment with that token, plus bearer tokens, but I've got the new auth system up and it's working. I still have to do the integration with all the other services (the website, auth, AHP, and the crawler and OCR engine), so it will be a while before all that's done.

ryukafalz - 2 days ago

As ever, I think the answer to "how do we sandbox arbitrary code while still letting it do useful things?", whether the code is human-written or machine-written, is object capabilities. Run the generated code in a sandbox, but pass in capabilities to useful resources, whether those are remote servers, local directories, or whatever else. Then you know the bounds of what trouble it can get up to from the start.
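
The capability style ryukafalz describes can be illustrated with a small Python example. Untrusted logic receives an object that can read files under one directory and nothing else, instead of ambient access to the filesystem. ReadOnlyDir and run_with_capabilities are hypothetical names. Python cannot enforce this boundary in-process, because the code could still call open() directly. In practice, the outer sandbox removes ambient authority and the capability object is the only door left.

```python
from pathlib import Path


class ReadOnlyDir:
    """Capability: read files under one directory, and nothing else."""

    def __init__(self, root):
        self._root = Path(root).resolve()

    def read_text(self, relative: str) -> str:
        target = (self._root / relative).resolve()
        # Refuse anything that resolves outside the granted root, e.g. "../secrets".
        if not target.is_relative_to(self._root):
            raise PermissionError(f"{relative!r} is outside the granted directory")
        return target.read_text()

    def names(self) -> list[str]:
        return sorted(p.name for p in self._root.iterdir())


def run_with_capabilities(func, **capabilities):
    """Call untrusted logic with only the capabilities explicitly handed to it."""
    return func(**capabilities)
```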

philipp-gayret - 3 days ago

Agreed that it should be composable. Even better would be if MCP tooling didn't yield huge amounts of output that pollute the context, and if the output of one tool could be the input to the next; at that point it may as well be code.

It would be nice if there were a way for agents to work with MCPs as code, and to preview or debug the data flowing through them. At the moment none of it seems mature enough, and I'd rather mount a Python sandbox with API keys for what it needs than connect an MCP tool on my own machine.
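
The composition point can be sketched with two hypothetical tools exposed as plain Python functions. The code calls one tool, filters its large output, feeds the result into the second tool, and returns only a short summary to the model. search_issues, add_label, and triage are invented for this illustration.

```python
import json


# Hypothetical stand-ins for two MCP tools, exposed as plain functions.
def search_issues() -> list[dict]:
    """Returns a large payload: 300 issues with 500-character bodies."""
    return [
        {"id": i, "state": "closed" if i % 3 == 0 else "open", "body": "x" * 500}
        for i in range(1, 301)
    ]


def add_label(issue_id: int, label: str) -> dict:
    return {"id": issue_id, "labels": [label]}


def triage(limit: int = 5) -> str:
    """Chain the two tools in code; only this short summary reaches the model."""
    issues = search_issues()  # ~150,000 characters of bodies stay out of the context
    open_ids = [issue["id"] for issue in issues if issue["state"] == "open"][:limit]
    labeled = [add_label(n, "needs-triage")["id"] for n in open_ids]
    return json.dumps({"scanned": len(issues), "labeled": labeled})
```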

thrown-0825 - 2 days ago

MCP defeats the entire point of LLMs.

PhilipRoman - 3 days ago

Can't wait until I can buy an H100 with a DisplayPort input and USB keyboard and mouse output and just let it figure everything out.

thrown-0825 - 2 days ago

In a couple of years you will be able to get used ones on eBay for cheap. I can't wait.

mdaniel - 3 days ago

I'm guessing I'm spoiling the joke, but why not just a Thunderbolt dock and plug the H100 into your existing machine, no DisplayPort interpretation required?
Although I could easily imagine the external robot(?) being a "hold my beer" to the interview cheat arms race
To extra ruin the joke, the 96GB versions seem to be going for $24,000 on ebay right now

abtinf - 3 days ago

I’ve posted this before[1], and have searched, but still haven’t found it: I wish someone would write a clear, crisp explanation for why MCP is needed over simply supporting swagger or proto/grpc.

[1] https://news.ycombinator.com/item?id=44848489

aszen - a day ago

I'll give you one simple reason: APIs designed for machines are not suitable for LLMs to use consistently.
LLMs need a carefully designed interface that exposes tools at the intent level; most APIs are too low-level for LLMs to perform user actions in a single call.
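
aszen's distinction can be illustrated with a hypothetical example. A low-level calendar API needs three chained calls, while an intent-level tool performs the whole user action in one call and returns a sentence the model can relay. FakeCalendarApi and reschedule_meeting are invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCalendarApi:
    """Hypothetical low-level API: separate search, patch and notify calls."""
    events: dict = field(default_factory=dict)
    sent: list = field(default_factory=list)

    def search_events(self, query: str) -> list[str]:
        return [eid for eid, ev in self.events.items() if query.lower() in ev["title"].lower()]

    def patch_event(self, event_id: str, **changes) -> dict:
        self.events[event_id].update(changes)
        return self.events[event_id]

    def send_notification(self, recipients: list[str], message: str) -> None:
        self.sent.append((tuple(recipients), message))


def reschedule_meeting(api: FakeCalendarApi, title: str, new_start: str) -> str:
    """Intent-level tool: the whole user action in one call."""
    matches = api.search_events(title)
    if len(matches) != 1:
        # A readable refusal gives the model something to relay or ask about.
        return f"Expected exactly one meeting matching {title!r}, found {len(matches)}."
    event = api.patch_event(matches[0], start=new_start)
    api.send_notification(event["attendees"], f"{event['title']} moved to {new_start}")
    return f"Moved {event['title']!r} to {new_start} and notified {len(event['attendees'])} attendees."
```

Refusing ambiguous matches with a plain sentence, rather than guessing, leaves the decision with the model and the user.
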
avereveard - 3 days ago

Think of an LLM driving a browser, where it fills fields, clicks things, and in general where losing the state loses the work done so far.
That's the C in the protocol.
Sure, you can add a session key to the Swagger API and expose it that way so that the LLM can continue its conversation, but it's going to be a fragile integration at best.
An MCP server tied to the conversation state abstracts all that away, for better or worse.
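
Keeping that state on the server side can be sketched roughly in Python. Tools are bound to one session when the conversation connects, so the model never has to carry a session key. BrowserSessions is hypothetical and stands in for both real browser automation and a real MCP transport.

```python
import uuid


class BrowserSessions:
    """Hypothetical server-side state: one page per connected conversation."""

    def __init__(self):
        self._pages: dict[str, dict] = {}

    def open_session(self) -> str:
        session_id = uuid.uuid4().hex
        self._pages[session_id] = {"url": None, "fields": {}}
        return session_id

    def tools_for(self, session_id: str) -> dict:
        """Bind the tools to one session, so the model never handles the id."""
        page = self._pages[session_id]

        def navigate(url: str) -> str:
            page["url"] = url
            page["fields"].clear()  # a new page starts with empty form fields
            return f"at {url}"

        def fill(field: str, value: str) -> str:
            page["fields"][field] = value
            return f"{field} set"

        def snapshot() -> dict:
            return {"url": page["url"], **page["fields"]}

        return {"navigate": navigate, "fill": fill, "snapshot": snapshot}
```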

skerit - 3 days ago

I don't get it. Tools are a way to let LLMs do something via what is essentially an API. Is it limited? Yes, it is. By design.

Sure, in some cases tools might be overkill, and letting the assistant write and execute plain code might be best. There are plenty of silly MCP servers out there.

s1mplicissimus - 3 days ago

I tried doing the MCP approach with about 100 tools, but the agent picks the wrong tool a lot of the time and it seems to have gotten significantly worse the more tools I added. Any ideas how to deal with this? Is it one of those unsolvable XOR-like problems maybe?

lsaferite - 3 days ago

There are many routes to a solution.
Two options (out of multiple):
- Have sub-agents with different subsets of tools. The main agent then delegates.
- Have dedicated tools that let the main agent activate subsets of tools as needed.
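
lsaferite's second option can be sketched in Python with hypothetical toolset and tool names. The model always sees two small meta-tools, plus only the tools from the toolsets it has activated. The oldest toolset is evicted to keep the visible count small.

```python
# Hypothetical tool registry grouped into subsets ("toolsets").
TOOLSETS = {
    "git": ["git_status", "git_diff", "git_commit"],
    "issues": ["issue_search", "issue_comment"],
    "browser": ["navigate", "click", "fill", "screenshot"],
}


class ToolRouter:
    """Expose a small always-on set plus whatever toolsets the agent activates."""

    ALWAYS_ON = ["list_toolsets", "activate_toolset"]

    def __init__(self, toolsets: dict, max_active: int = 2):
        self.toolsets = toolsets
        self.max_active = max_active
        self.active: list[str] = []

    def list_toolsets(self) -> list[str]:
        return sorted(self.toolsets)

    def activate_toolset(self, name: str) -> str:
        if name not in self.toolsets:
            return f"unknown toolset {name!r}; choose from {self.list_toolsets()}"
        if name not in self.active:
            self.active.append(name)
            # Evict the oldest toolset so the visible tool count stays small.
            if len(self.active) > self.max_active:
                self.active.pop(0)
        return f"active: {self.active}"

    def visible_tools(self) -> list[str]:
        """The tool names that would be sent to the model on the next turn."""
        tools = list(self.ALWAYS_ON)
        for name in self.active:
            tools.extend(self.toolsets[name])
        return tools
```
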
faangguyindia - 3 days ago

AI agents can't even remember which files are already in the context, let alone pick the right tool for the job.

throwaway314155 - 3 days ago

You wind up having to explicitly tell it to use a tool and how to use it (defeating the point, mostly).

thrown-0825 - 2 days ago

congrats, you have encountered one of the fundamental flaws of using an ad-lib generator as an orchestration / rules engine.

the_mitsuhiko - 3 days ago

Remove most tools. After 30 tools it greatly regresses.

throwmeaway222 - 3 days ago

problem with MCP right now is that LLMs don't natively know what it is

an LLM natively knows bash and how to run things

MCP is forcing a weird set of non-normal rules that most of the writing of the web doesn't support. Most of the web writes a lot about bash and getting things done.

Maybe in a few years LLMs will "natively" understand them, but I see MCP more as a buzzword right now.

dragonwriter - 3 days ago

> problem with MCP right now is that LLMs don't natively know what it is
Most models that it is used with natively know what tools are (they are trained with particular prompt formats for the use of arbitrary tools), and the model never sees MCP at all; it just sees tool definitions, or tool responses, in the format it expects in prompts. MCPs are a way to communicate information about tools to the toolchain running the LLM. When the LLM sees information that came via MCP, it is indistinguishable from tools that might be built into the toolchain or provided by another mechanism.
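
dragonwriter's point that the model never sees MCP can be illustrated with a short conversion sketch. An MCP server's tools/list result gives each tool a name, a description, and an inputSchema (JSON Schema), and the host rewrites that into whatever format the model provider expects. Here the target is the OpenAI Chat Completions function-tool shape; other providers use different but similar shapes. The get_forecast tool in the test is hypothetical.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Map one entry of an MCP tools/list result to an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, so it passes through as-is.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def tools_list_to_openai(result: dict) -> list[dict]:
    """Convert a whole tools/list result ({"tools": [...]}) for the model request."""
    return [mcp_tool_to_openai(tool) for tool in result.get("tools", [])]
```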
- throwmeaway222 - 3 days ago
  
  No, that's not what I'm saying. Say you tell an LLM that you need a report on a specific member of Congress, and provide a prompt saying: you can use bash tools like grep/curl/ping/git/etc.; just return bash and then a formatted code block.
  Or you can use fetch_record followed by a formatted code block containing the Google search you want to perform.
  The LLM will likely use bash and curl because it NATIVELY knows what they are and what they can do, whereas for this other tool you have to feed it all these parameters it is not used to.
  I'm not saying go ahead and throw that into ChatGPT; I'm talking from experience at our company using MCP vs. bashable stuff: it keeps ignoring the other tools.
  - dragonwriter - 3 days ago
    
    It's possible that it's not about "native knowledge" but about how the descriptions (which get mapped into the prompt) for each of the tools are set up (or even their order; LLM behavior can be very sensitive to not-obviously-important prompt differences).
    I'd be cautious inferring generalizations about behavior and then explanations of those generalizations from observation of a particular LLM used via a particular toolchain.
    That said, that it does that in that environment is still an interesting observation.

giltho - 3 days ago

Imagine 50 years of computer security, only to have articles come up on Hacker News saying “what you need is to allow a black box to run arbitrary python code” :(

thrown-0825 - 2 days ago

from user input lmao
these people are not to be taken seriously

turnsout - 3 days ago

> One surprisingly useful way of running an MCP server is to make it an MCP server with a single tool (the ubertool) which is just a Python interpreter that runs eval() with retained state.

Wow, you better be sure you have that Python environment locked down.

starkparker - 3 days ago

yeah, check out the article's "Security is a Sham" heading that explicitly covers why the author doesn't really give a shit
- turnsout - 3 days ago
  
  Yeah, I saw that, it's still just wild. I guess "YOLO" is one response to the difficulties of securing endpoints.

PhilKunz - 3 days ago

How come no one has mentioned Serena MCP here until now? :D

faangguyindia - 3 days ago

Here is why MCP is bad. I am trying to use MCP to build a simple Node CLI tool to fetch documentation from Context7 (https://pastebin.com/raw/b4itvBu4), and it doesn't work even after 10 attempts.

It fails and I've no idea why. Meanwhile, the Python code works without issues, but I can't use that one because it conflicts with existing dependencies in aider; see https://pastebin.com/TNpMRsb9 (working code after 5 failed attempts).

I am never gonna bother with this again. It could be built as a simple REST API; why do we even need this ugly protocol?
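
For reference, MCP messages are JSON-RPC 2.0, and over the stdio transport each message is a single line of JSON. A client is expected to send initialize and then the notifications/initialized notification before calling tools. A hand-rolled client that skips this handshake may simply get errors. The sketch below only builds the messages. The tool name and arguments are hypothetical placeholders, not Context7's actual tools. PROTOCOL_VERSION is one published revision that a given server may negotiate differently. Real clients usually also call tools/list before tools/call.

```python
import json

PROTOCOL_VERSION = "2025-06-18"  # one published MCP revision; servers may negotiate another


def client_messages(tool_name: str, arguments: dict) -> list[str]:
    """Return the newline-delimited JSON-RPC lines a minimal stdio client writes."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": "example-client", "version": "0.1"},
            },
        },
        # Sent after the server answers initialize; a notification has no id.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {
            "jsonrpc": "2.0",
            "id": 2,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        },
    ]
    # json.dumps without indent emits no raw newlines, as the stdio transport requires.
    return [json.dumps(m) + "\n" for m in messages]
```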

coverj - 3 days ago

I'm curious: why aren't you using the actual Context7 MCP?
- the_mitsuhiko - 3 days ago
  
  He is if you look at the code.
  From my experience context7 just does not work, or at least does not help. I did plenty of experiments with it and that approach just does not go anywhere with the tools and models available today.