Show HN: Understudy – Teach a desktop agent by demonstrating a task once

github.com

119 points by bayes-song 4 days ago


I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.

Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0

In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, and keeps GUI hints only as a fallback. In this example it can also pick a faster route when one is available instead of repeating every GUI step.
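To make the "intent steps, route options, GUI hints as fallback" idea concrete, here is a minimal sketch of how such a skill step could be modeled and resolved. Every name in it (`Route`, `IntentStep`, `chooseRoute`, the `cost` field) is a hypothetical illustration and is not taken from the Understudy codebase; the real skill format may look quite different.

```typescript
// Hypothetical shapes for illustration only; not Understudy's actual skill format.
type Route = {
  kind: "shell" | "browser" | "gui";
  cost: number;        // lower means faster; an assumed heuristic
  available: boolean;  // whether this route can be used in the current session
};

type GuiHintFallback = { kind: "gui-hint"; hint: string };

type IntentStep = {
  intent: string;      // what the step should achieve, not where to click
  routes: Route[];     // alternative ways to achieve the intent
  guiHint?: string;    // recorded UI detail, consulted only as a fallback
};

// Pick the cheapest available route; use the GUI hint only when no route is usable.
function chooseRoute(step: IntentStep): Route | GuiHintFallback {
  const usable = step.routes.filter((r) => r.available);
  if (usable.length > 0) {
    return usable.reduce((best, r) => (r.cost < best.cost ? r : best));
  }
  if (step.guiHint !== undefined) {
    return { kind: "gui-hint", hint: step.guiHint };
  }
  throw new Error(`No route available for intent: ${step.intent}`);
}

// Example step: exporting can go through the GUI or through a faster shell route.
const exportStep: IntentStep = {
  intent: "export the image with the background removed",
  routes: [
    { kind: "gui", cost: 5, available: true },
    { kind: "shell", cost: 1, available: true },
  ],
  guiHint: "File > Export",
};

const chosen = chooseRoute(exportStep);
```

With both routes available, `chosen` is the shell route because it has the lower assumed cost. If every route were unavailable, the function would fall back to the recorded GUI hint, which mirrors the "hints only as a fallback" behavior described above.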

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

    npm install -g @understudy-ai/understudy
    understudy wizard

GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.

obsidianbases1 - 3 days ago

Nice work. I scanned through the code and found this file to be an interesting read https://github.com/understudy-ai/understudy/blob/main/packag...

rybosworld - 3 days ago

I have a hard time believing this is robust.

walthamstow - 3 days ago

It's a really cool idea. Many desktop tasks are teachable like this.

The look-click-look-click loop it used for sending the Telegram for Musk was pretty slow. How intelligent (and therefore slow) does a model have to be to handle this? What model was used for the demo video?

sethcronin - 3 days ago

Cool idea -- the Claude Chrome extension has something like this implemented, but obviously it's restricted to the Chrome browser.

8note - 3 days ago

sounds a bit sketch?

learning to do a thing means handling the edge cases, and you can't exactly do that in one pass?

when I've learned manual processes it's been at least 9 attempts: 3 watching, 3 doing with an expert watching, and 3 with the expert checking the result

skeledrew - 3 days ago

Interested, and disappointed that it's macOS only. I started something similar a while back on Linux, but only got through level 1. I'll take some ideas from this and continue work on it now that it's on my mind again.

jedreckoning - 4 days ago

cool idea. good idea doing a demo as well.

mustafahafeez - 3 days ago

Nice idea

abraxas - 4 days ago

One more tool targeting OSX only. That platform is overserved with desktop agents already while others are underserved, especially Linux.

sukhdeepprashut - 4 days ago

2026 and we still pretend to not understand how llms work huh