Gemini 2.5 Computer Use model

633 points by mfiguiere 6 days ago

I've had good success with the Chrome devtools MCP (https://github.com/ChromeDevTools/chrome-devtools-mcp) for browser automation with Gemini CLI, so I'm guessing this model will work even better.

arkmm - 6 days ago

What sorts of automations were you able to get working with the Chrome dev tools MCP?
- odie5533 - 6 days ago
  
  Not OP, but in my experience, Jest and Playwright are so much faster that it's not worth doing much with the MCP. It's a neat toy, but it's just too slow for an LLM to try to control a browser using MCP calls.
  - raffraffraff - 5 days ago
    
    Actually, the superpower of having the LLM in the browser may be that it vastly simplifies using LLMs to write Playwright scripts.
    Case in point: last week I wrote a scraper for Rate Your Music, but found it frustrating. I'm not experienced with Playwright, so I used VS Code with Claude to iterate on the project. Constantly diving into devtools, copying outer HTML, inspecting specific elements, etc. is a chore that this could get around, making for faster development of complex tests.
  - atonse - 6 days ago
    
    Yeah, I think it would be better to just have the model write out Playwright scripts than the way it's doing it right now (or at least first navigate manually and then, based on that, write a Playwright TypeScript script for future tests).
    Cuz right now it's way too slow... perform an action, then read the results, then wait for the next tool call, etc.
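    A rough sketch of that "navigate once, then write a script" idea, assuming the manual run has been captured as a plain list of steps. The `Step` type and the `stepToLine` and `toPlaywrightTest` helpers are hypothetical names made up for illustration; only the emitted `test`, `page.goto`, and `page.locator(...).click()` / `.fill()` calls are real Playwright API. The generated script then runs with no model in the loop:

    ```typescript
    // Hypothetical recording format: one entry per action taken during the manual run.
    type Step =
      | { kind: "goto"; url: string }
      | { kind: "click"; selector: string }
      | { kind: "fill"; selector: string; value: string };

    // Turn one recorded step into one line of Playwright code.
    // JSON.stringify handles quoting and escaping of URLs, selectors, and values.
    function stepToLine(step: Step): string {
      switch (step.kind) {
        case "goto":
          return `await page.goto(${JSON.stringify(step.url)});`;
        case "click":
          return `await page.locator(${JSON.stringify(step.selector)}).click();`;
        case "fill":
          return `await page.locator(${JSON.stringify(step.selector)}).fill(${JSON.stringify(step.value)});`;
      }
    }

    // Wrap the generated lines in a @playwright/test test case and return the source text.
    function toPlaywrightTest(name: string, steps: Step[]): string {
      const body = steps.map((s) => "  " + stepToLine(s)).join("\n");
      return [
        `import { test } from "@playwright/test";`,
        "",
        `test(${JSON.stringify(name)}, async ({ page }) => {`,
        body,
        "});",
        "",
      ].join("\n");
    }

    // Made-up example recording; the URL and selectors are placeholders.
    const recorded: Step[] = [
      { kind: "goto", url: "https://example.com/login" },
      { kind: "fill", selector: "#user", value: "demo" },
      { kind: "click", selector: "button[type=submit]" },
    ];

    console.log(toPlaywrightTest("login flow", recorded));
    ```

    The slow perform/read/wait loop only happens once, while recording; replays are ordinary Playwright runs.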
    
    omneity - 6 days ago
    
    This is basically our approach with Herd[0]. We operate agents that develop, test and heal trails[1, 2], which are packaged browser automations that do not require browser-use LLMs to run and are therefore much cheaper and more reliable. Trail automations are then exposed as a REST API and an MCP server[3], which can be used as simple functions called from your code, by your own agent, or any combination of the two.
    You can build your own trails, publish them on our registry, compose them ... You can also run them in a distributed fashion over several Herd clients where we take care of the signaling and communication but you simply call functions. The CLI and npm & python packages [4, 5] might be interesting as well.
    Note: The automation stack is entirely home-grown to enable distributed orchestration, and doesn't rely on Puppeteer or Playwright, but the browser automation API[6] is relatively similar to ease adoption. We also don't use the Chrome DevTools Protocol, so we have a different set of tradeoffs.
    0: https://herd.garden
    1: https://herd.garden/trails
    2: https://herd.garden/docs/trails-automations
    3: https://herd.garden/docs/reference-mcp-server
    4: https://www.npmjs.com/package/@monitoro/herd
    5: https://pypi.org/project/monitoro-herd/
    6: https://herd.garden/docs/reference-page
    
    disqard - 5 days ago
    
    Looks useful! What would it take to add support for (totally random example :D) Harper's Magazine?
    
    atonse - 6 days ago
    
    Whoa that’s cool. I’ll check it out, thanks!
    
    omneity - 6 days ago
    
    Thanks! Let me know if you give it a shot and I’ll be happy to help you with anything.
    
    jarek83 - 5 days ago
    
    You might want to change column title colors as they're not visible (I can see them when highlighting the text) https://herd.garden/docs/alternative-herd-vs-puppeteer/
    
    omneity - 5 days ago
    
    Oh thanks! It was a bug in handling browser light mode. I just fixed it.