A Social Filesystem

overreacted.io

444 points by icy a day ago


swyx - 15 hours ago

> Apps may come and go, but files stay—at least, as long as our apps think in files.

yes: https://www.swyx.io/data-outlasts-code-but

all lasting work is done in files/data (can be parsed permissionlessly, still useful if partially corrupted), but economic incentives keep pushing us to keep things in code (brittle, dies basically when one of maintainer|buildtools|hardware substrate dies).

when standards emerge (forcing code to accept/emit data) that is worth so much to a civilization. a developer ecosystem tipping the incentive scales such that companies like the Googl/Msft/OpenAI/Anthropics of the world WANT to contribute/participate in data standards rather than keep things proprietary is one of the most powerful levers we as a developer community collectively hold.

(At the same time we should also watch out for companies extending/embracing/extinguishing standards... although honestly outside of Chrome I struggle to think of a truly successful example)

theturtletalks - 16 hours ago

POSSE and AT Protocol can be understood as interoperable marketplaces. Platforms like Reddit and Instagram already function this way: the product is user content, the payment is attention, and the platform’s cut is ads or behavioral data. Dan argues that this structure is not inevitable. If social data is treated as something people own and store themselves, applications stop being the owners of social graphs and become interfaces that read from user-controlled data instead.

I am working on a similar model for commerce. Sellers deploy their own commerce logic such as orders, carts, and payments as a hosted service they control, and marketplaces integrate directly with seller APIs rather than hosting sellers. This removes platform overhead, lowers fees, and shifts ownership back to the people creating value, turning marketplaces into interoperable discovery layers instead of gatekeepers.

skybrian - 19 hours ago

This article goes into a lot of detail, more than is really needed to get the point across. Much of that could have been moved to an appendix? But it's a great metaphor. Someone should write a user-friendly file browser for PDS's so you can see it for yourself.

I'll add that, like a web server that's just serving up static files, a Bluesky PDS is a public filesystem. Furthermore it's designed to be replicated, like a Git repo. Replicating the data is an inherent part of how Bluesky works. Replication is out of your control. On the bright side, it's an automatic backup.

So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed. Random people could find it in a search. Inevitably, AI will train on it. I believe you can delete stuff from your own PDS but it's effectively on your permanent record. That's just part of the deal.

So, try not to put anything there that you'll regret. The best you could do is pick an alias not associated with your real name and try to use good opsec, but that's perilous.

extraduder_ire - 33 minutes ago

You mention having a "self" record in the app.bsky.actor.profile lexicon to store profile information, is there any reason to have more records of that type in your repository?

I've seen a few people make other ones when examining their accounts with pdsls, but they seem to be there for "just because I can" reasons.

japanuspus - 5 hours ago

> Identity -- This is a difficult problem.

My hope is that in 5 years, I will not have anything in my feeds that has not been signed in a way that lets me assign a trust level.

Here in the Nordics, we are already seeing messaging apps such as [hudd] that require government issued ID to sign in. I want this to spread to everything from podcasts and old-school journalism to the soccer-club newsletter, so that I can always connect a piece of information back to a responsible source.

[hudd]: https://about.hudd.dk/

motoxpro - 17 hours ago

I've always thought walled gardens are the effect of consumer preferences, not the cause.

The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.

I am so happy that my IG posts aren't available on my HN or that my IG posts aren't being easily cross-posted to a service I don't want to use like Truth Social. If you want it to be open, just post it to the web.

I think I don't really understand the benefit of data portability in this situation. It feels like in crypto when people said "I want to use my Pokemon in-game item in Counter-Strike" (or any game). Like, how and why would that even be valuable without the context? Same with a Snap post on HN or an HN post on some yet-to-be-created service.

christophilus - 14 hours ago

I’ve been reading “The Unix Programming Environment”. It’s made me realize how much can be accomplished with a few basic tools and files (mostly plain text). I want to spend some time thinking of what a modern equivalent would look like. For example, what would Slack look like if it was file (and text) oriented and UNIXy? Well, UNIX had a primitive live chat in the form of live inter-user messaging. I’d love to see a move back to simpler systems that composed well.

voidUpdate - an hour ago

How does this system determine the number of likes a post has? Since there is no back reference on a post to people who have liked it, don't you have to iterate over every single person, iterate over their likes to see if one of them is the post you are viewing, and add them all up?
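
One way to picture an answer: an indexer that sees like records as they are created can keep a running tally per post, so nothing has to walk every repo at read time. A minimal sketch, assuming each like record has a `subject` field holding the liked post's URI as a plain string; `LikeIndex`, its method names, and the example URIs are hypothetical, not the real Bluesky AppView code.

```python
from collections import defaultdict


class LikeIndex:
    """Hypothetical in-memory index: post URI -> URIs of like records."""

    def __init__(self):
        self._likes_by_post = defaultdict(set)

    def on_like_created(self, like_uri, record):
        # Called once per new like record the indexer observes.
        self._likes_by_post[record["subject"]].add(like_uri)

    def on_like_deleted(self, like_uri, record):
        # discard() is a no-op if this like was never indexed.
        self._likes_by_post[record["subject"]].discard(like_uri)

    def count(self, post_uri):
        # .get() avoids creating empty entries for unknown posts.
        return len(self._likes_by_post.get(post_uri, ()))


index = LikeIndex()
post = "at://alice.example/post/1"
index.on_like_created("at://bob.example/like/1", {"subject": post})
index.on_like_created("at://carol.example/like/1", {"subject": post})
index.on_like_deleted("at://bob.example/like/1", {"subject": post})
print(index.count(post))  # 1
```

The cost moves from read time to write time: the count is a dictionary lookup, but somebody has to run the indexer and see every like.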

echoangle - 12 hours ago

Is there anything stopping me from backdating my own records? Since the createdAt is just an arbitrary field, I can just write whatever I want in there, right? Is there a way for the viewing application to verify when the record was created (and not modified since), maybe by looking at the mentioned signing?
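
If createdAt is just whatever the author wrote, one thing a viewer can do on its own is compare it with a timestamp it does trust. A minimal sketch of that idea, assuming the viewing app records when it first saw each record; `looks_backdated`, `first_seen`, and the 24-hour tolerance are made up for illustration and don't describe how any particular app behaves.

```python
from datetime import datetime, timedelta, timezone


def looks_backdated(created_at, first_seen, tolerance=timedelta(hours=24)):
    """Return True if the claimed time is much earlier than when we first saw it."""
    claimed = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if claimed.tzinfo is None:
        # Assumption: treat timestamps without an offset as UTC.
        claimed = claimed.replace(tzinfo=timezone.utc)
    return first_seen - claimed > tolerance


first_seen = datetime(2025, 1, 10, tzinfo=timezone.utc)
print(looks_backdated("2020-01-01T00:00:00Z", first_seen))  # True
print(looks_backdated("2025-01-09T12:00:00Z", first_seen))  # False
```

This only flags a mismatch. It can't prove when the record was really written, and a legitimately imported old post would trip it too.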

skeledrew - 17 hours ago

I've been thinking of this for some time, conceptually, but perhaps from a more fundamental angle. I think the idea of "files" is pretty dated and can be thrown out. Treat everything as data blobs (inspired by PerKeep[0]) addressed by their hashes and many of the issues described in the article just aren't even a thing. If it really makes sense, or for compatibility sake, relevant blobs can be exposed through a filesystem abstraction.
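
A minimal sketch of the blobs-addressed-by-hash idea, with a filesystem-like view layered on top as a plain name-to-hash mapping. `BlobStore`, the `sha256-` key prefix, and the in-memory dict are assumptions for illustration, not how Perkeep is implemented.

```python
import hashlib


class BlobStore:
    """Hypothetical content-addressed store kept in memory."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical data dedupes.
        key = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = BlobStore()
ref = store.put(b"hello")
# A "filesystem" is then only a mapping from names to blob references.
names = {"notes/hello.txt": ref}
print(store.get(names["notes/hello.txt"]))  # b'hello'
print(store.put(b"hello") == ref)  # True
```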

Also, users don't really want apps. What users want are capabilities. So not Bluesky, or YouTube for example, but the capability to easily share a life update with interested parties, or the capability to access yoga tutorial videos. The primary issue with apps is that they bundle capabilities, but many times particular combinations of capabilities are desired, which would do well to be wired together.

Something in particular that's been popping up fairly often for me is I'm in a messaging app, and I'd like to lookup certain words in some of the messages, then perhaps share something relevant from it. Currently I have to copy those words over to a browser app for that lookup, then copy content and/or URL and return to the messaging app to share. What I'd really love is the capability to do lookups in the same window that I'm chatting with others. Like it'd be awesome if I could embed browser controls alongside the message bubbles with the lookup material, and optionally make some of those controls directly accessible to the other part(y|ies), which may even potentially lead to some kind of adhoc content collaboration as they make their own updates.

It's time to break down all these barriers that keep us from creating personalized workflows on demand. Both at the intra-device level where apps dominate, and at the inter-device level where API'd services do.

[0] https://perkeep.org/

articsputnik - 3 hours ago

Amazing write-up. Owning your data via a standardized protocol (such as the AT Protocol) is so great! It's like markdown, everything is basically built on files (but cleverly architected so that it works decentralized across the web too!).

I wrote a little along the same lines on «Well Being in Times of Algorithms» at https://www.ssp.sh/blog/well-being-algorithms/ (or HN: https://news.ycombinator.com/item?id=46352747).

Jonovono - 19 hours ago

I can’t remember how many times I’ve read an article and enjoyed it so much and then looked and saw it was written by Dan ;) always a pleasure !

aembleton - 3 hours ago

Why would we need to store the createdAt value in a file? The filesystem already stores this information. We could just store the text, which would mean no JSON would be needed.

sroerick - 12 hours ago

Interesting - I just spent all day on this on an app which I'm using. My architecture is a little different (probably worse).

The app lives on a single OpenBSD server. All user data is stored in /srv/app/[user]. Authentication is done by accessing OpenBSD Auth helper functions.

Users can access their data through the UI normally. Or they can use a web based filesystem browser to edit their data files. Or, alternately, they can ssh into the server and have full access to their files with all the advantages this entails. Hopefully, this raises the ceiling a bit for what power users of the system can accomplish.

I wanted to unify the OS ecosystem and the web app ecosystem and play around with the idea of what happens if those things aren't separate. I'm sure I'm introducing all kinds of security concerns which I'm not currently aware of.

Another commenter brought up Perkeep, which I think is very interesting. Even though I love Plan 9 conceptually, I do sort of wonder if "everything is a file" was a bit of a wrong turn. If I had my druthers, I think building on top of an OS which had DB and blob storage as the primary concept would be interesting and perhaps better.

If anybody cares, it's the POOh stack: Postgres, OCaml, OpenBSD, and htmx.

camgunz - 14 hours ago

I'm skeptical of these kind of like, self-describing data models. Like, I generally like at proto--because I like IPFS--but I think the whole "just add a lexicon for your service and bickety bam, clients appear" is a leap too far.

For example, gaze upon dev.ocbwoy3.crack.defs [0] and dev.ocbwoy3.crack.alterego [1]. If you wanted to construct a UI around these, realistically you're gonna need to know wtf you're building (it's a twitter/bluesky clone); there simply isn't enough information in the lexicons to do a good job. And the argument can't be "hey you published a lexicon and now people can assume your data validates", because validation isn't done on write, it's done on read. So like, there really is no difference between this and like, looking up the docs on the data format and building a client. There are no additional guarantees.
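
To make the validate-on-read point concrete: if validation happens on read rather than on write, every reader ends up doing its own filtering of whatever it fetched. A minimal sketch, assuming a toy schema expressed as a dict of required field types; `REQUIRED_FIELDS` and both function names are hypothetical, and this is not the real Lexicon format or validator.

```python
REQUIRED_FIELDS = {"text": str, "createdAt": str}  # toy schema, not a real Lexicon


def is_valid(record):
    """Check that every required field is present with the expected type."""
    return isinstance(record, dict) and all(
        isinstance(record.get(name), expected)
        for name, expected in REQUIRED_FIELDS.items()
    )


def read_valid(records):
    # The reader drops anything malformed instead of trusting the writer.
    return [r for r in records if is_valid(r)]


fetched = [
    {"text": "hi", "createdAt": "2025-01-01T00:00:00Z"},
    {"text": 42},            # wrong type, missing createdAt
    "not even an object",
]
print(len(read_valid(fetched)))  # 1
```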

Maybe there's an argument for moving towards some kind of standardization, but... do we really need that? Like are we plagued by dozens of slightly incompatible scrobbling data models? Even if we are, isn't this the job of like, an NPM library and not a globally replicated database?

Anyway, I appreciate that, facially, at proto is trying to address lock in. That's not easy, and I like their solution. But I don't think that's anywhere near the biggest problem Twitter had. Just scanning the Bluesky subreddit, there's still problems like too much US politics and too many dick pics. It's good to know that some things just never change I guess.

[0]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...

[1]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...

clnhlzmn - 18 hours ago

Seems similar to remoteStorage [0]. What happened to that anyway?

[0]: https://remotestorage.io/

hollowonepl - 14 hours ago

Interesting concept for all new social platforms that already live in federated, distributed environments that share communication protocols and communication data formats.

I bet it will be more difficult to push existing commercial platforms to even consider it.

That would make marketing tools for managing social communications and posting across popular social media much easier to build. Nevertheless, social marketing tools have already invented a similar analogy, just to keep control over their own content and feedback across instances and networks.

We still live in a world where some would say BSKY and some would say Mastodon is the future… while everybody still has Facebook and Instagram, and youngsters have TikTok too. Those are closed platforms where only tools to hack them persist, not standards.

itmitica - 14 hours ago

To share is to lose control. Once shared, it can't be undone. You can't retract a published novel. You can't retract broadcast music or a show. What makes you think you can do it over the internet?

metabagel - 19 hours ago

How does this relate to the SOLID project?

https://solidproject.org/

noelwelsh - 19 hours ago

This, Local-first Software [1], the Humane Web Manifesto [2], etc. make me optimistic that we're moving away from the era of "you are the product" dystopian enshittification to a more user-centric world. Here's hoping.

[1]: https://www.inkandswitch.com/essay/local-first/

[2]: https://humanewebmanifesto.com/

nonethewiser - 18 hours ago

But how do you get people to actually want this? This stuff is pretty niche even within tech.

geokon - 19 hours ago

This was a nice intro to AT (though I feel it could have been a bit shorter)

The whole thing seems a bit overengineered, with poor separation of concerns.

It feels like it'd be smarter to flatten the design and embed everything in the Records. And then other layers can be built on top of that

Make every record include the author's public key (or signature?). Anything you need to point at, you'd either just give its hash, or hash + author public key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embedded in the Record.

Lexicons/Collections are just a field in the Record. Reverse lookup of a hash to find what it is would also be a separate problem.
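
A rough sketch of the flattened design described above: the author key and collection live inside the record, and a pointer to a record is its hash (optionally paired with the author key). Canonical JSON plus SHA-256 stands in for whatever encoding a real system would use, signing is left out entirely, and `record_id`, `pointer`, and the field names are hypothetical.

```python
import hashlib
import json


def record_id(record):
    """Hash a canonical encoding so key order doesn't change the address."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pointer(record):
    # A reference is the content hash plus the author key (placeholder string here).
    return (record_id(record), record["author_key"])


post = {"author_key": "pubkey-alice", "collection": "example.post", "text": "hi"}
reply = {
    "author_key": "pubkey-bob",
    "collection": "example.post",
    "text": "hello",
    "reply_to": list(pointer(post)),
}
same_post = {"text": "hi", "collection": "example.post", "author_key": "pubkey-alice"}
print(record_id(post) == record_id(same_post))  # True
```

Note what this leaves open: with only a hash in hand, finding who holds the record is exactly the separate reverse-lookup problem mentioned above.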

ahussain - 16 hours ago

It seems like the biggest downside of this world is iteration speed.

If the AT Instagram wants to add a new feature (e.g. posts now support video!) then can they easily update their "file format"? How do they update it in a way that is compatible with every other company who depends on the same format, without the underlying record becoming a mess?
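
One common way to evolve a shared format is to only ever add optional fields, so older readers keep working and newer readers fall back when the field is absent. A minimal sketch under that assumption; the `video` field, the record shape, and the two render functions are hypothetical and not taken from any real lexicon.

```python
def render_v1(record):
    # An older client: it only knows about text and ignores anything else.
    return record["text"]


def render_v2(record):
    # A newer client: uses the optional field when present.
    video = record.get("video")
    if video is None:
        return record["text"]
    return f'{record["text"]} [video: {video}]'


old_post = {"text": "sunset"}
new_post = {"text": "sunset", "video": "clip.mp4"}
print(render_v1(new_post))  # sunset
print(render_v2(old_post))  # sunset
print(render_v2(new_post))  # sunset [video: clip.mp4]
```

Additive changes like this keep everyone compatible. Renaming or changing the meaning of an existing field is where the coordination problem really starts.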

yladiz - 17 hours ago

I know this is somewhat covered in another comment, but the concepts described in the post could have been reduced quite a bit, no offense Dan. While I like the writing generally, I would consider writing and then letting it sit for a few days, rereading, and then cutting chaff (editing). This feels like a great first draft but without feedback, and it could have greatly benefited from an editing process. I think using the argument that you want to put out something for others to take and refine isn’t really a strong one… a bit more time and refinement could have made a big difference here (and given you have a decently sized audience, that is something I would keep in mind).

jrm4 - 18 hours ago

The more I read and consider Bluesky and this protocol, the more pointless -- and perhaps DANGEROUS -- I find the idea.

It really feels like no one is addressing the elephant in the room of; okay, someone who makes something like this is interested in "decentralized" or otherwise bottom-up ish levels of control.

Good goal. But then, when you build something like this, you're actually helping build a perfect decentralized surveillance record.

This is why I say that most of Mastodon's limitations and bugs in this regard (by leaving everything to the "servers") are actually features. The ability to forget and delete et al is actually important, and this makes that HARDER.

I'm just kind of like, JUST DO MASTODON'S MODEL, like email. It's better and the kinks are more well thought about and/or solved.

viraptor - 5 hours ago

I like the write-up of this idea. It's well presented. But I'd change one aspect: "We could leave author: 'dril' in the JSON but this is unnecessary too." - kind of. What the post lacks is the record of the identity at the time. What the user's username and avatar were at the time can change the meaning of the post entirely. To really preserve the message, you need to reference the displayed identity that was used to post it, not just the account id.

There's a number of famous accounts that do it continuously. For example popehat today is "Fucking Bitch Hat" but will change to something else soon that may be related to the current events.

elbci - a day ago

Agree! Social-media contributions as files on your system: owned by you, served to the app. Just as the .svg specification allows editing in Inkscape or Illustrator, a post on my computer would be portable to Mastodon or Bluesky or a fully distributed p2p network.

diceduckmonk - 12 hours ago

Git is the API.

Github/Gitlab would be a provider of the filesystem.

The problem is app developers like Google want to own your files.

air217 - 13 hours ago

nostr protocol and the client/relay model is one simple way to separate apps (clients) from the data (relays)

nonethewiser - 9 hours ago

Ironically, DID is the perfect vehicle for age verification.

jadbox - 16 hours ago

How do people view AT Protocol vs Nostr? Why choose one over the other? Which has a better chance at replacing X?

black_puppydog - 3 hours ago

The premise of this article resonates so much with me! I didn't see the angle on ATproto coming, and frankly this description of it is the first that makes me want to dig into it a bit.

The issue of "file-less" computing has been bothering me a lot. It's worst on iOS, where Apple are really pushing hard to have users never ever think in "files". Closely followed by macOS and only then Android, imho. "That thing over there? That's not a file! That's a photo! Very different thing, that. No, you can't use $app to view it or share it, sorry. And if you want to copy it, you have to go through our export functionality which is buggy and strips all kinds of info and generally works best through iCloud."

My mind's insistence that behind all the flashy apps must be a backing store, and that it's most likely file based, makes it even more infuriating when trying to get unprocessed photos off a relative's iPhone or such.

hahahahhaah - 7 hours ago

I have always thought open file format > open source. In my ideal web, everyone has their own web file storage (get it from anywhere, e.g. an email provider) and web apps use that to store things. Team collab etc. is built on top of that, e.g. sharing a file means a share-and-accept-edits type of flow. Everyone owns their file.

James_K - 17 hours ago

AT Proto seems very overengineered. We already have websites with RSS feeds, which more or less covers the publishing end in a way far more distributed and reliable than what AT offers. Then all you need is a kind of indexer to provide people with notifications and discovery and you're done. But I suppose you can't sell that to shareholders because real decentralised technology probably isn't going to turn as much of a profit as a Twitter knockoff with a vague decentralised vibe to it that most users don't understand or care about.

eduction - 17 hours ago

Unpopular opinion: this should be done with xml, not json. XML can have types, be self describing, and be extended (the X in XML).

That said it’s a very elegant way to describe AT protocol.

sneak - 18 hours ago

Losing private keys is much more common than losing domains.

LoganDark - 15 hours ago

I did a double take at "DID as identity" because Dissociative Identity Disorder shares the same acronym

EGreg - 17 hours ago

As someone who explicitly designed social protocols since 2011, who met Tim Berners-Lee and his team when they were building SOLID (before he left MIT and got funded to turn it into a for-profit Inrupt) I can tell you that files are NOT the best approach. (And neither is SPARQL by the way, Tim :) SOLID was publishing ACLs for example as web resources. Presumably you’d manage all this with CalDAV-type semantics.

But one good thing did come out of that effort. Dmitri Zagidulin, the chief architect on the team, worked hard at the W3C to get departments together to create the DID standard (decentralized IDs) which were then used in everything from Sidetree Protocol (thanks Dan Buchner for spearheading that) to Jack Dorsey’s “Web5”.

Having said all this… what protocol is better for social? Feeds. Who owns the feeds? Well that depends on what politics you want. Think dat / hypercore / holepunch (same thing). SLEEP protocol is used in that ecosystem to sync feeds. Or remember scuttlebutt? Stuff like that.

Multi-writer feeds were hard to do and abandoned in hypercore but you can layer them on top of single-writer. That’s where you get into joint ownership and consensus.

ps: Dan, if you read this, visit my profile and reach out. I would love to have a discussion, either privately or publicly, about these protocols. I am a huge believer in decentralized social networking and build systems that reach millions of community leaders in over 100 countries. Most people don’t know who I am and I’m happy w that. Occasionally I have people on my channel to discuss distributed social networking and its implications. Here are a few:

Ian Clarke, founder of Freenet, probably the first decentralized (not just federated) social network: https://www.youtube.com/watch?v=JWrRqUkJpMQ

Noam Chomsky, about Free Speech and Capitalism (met him same day I met TimBL at MIT) https://www.youtube.com/watch?v=gv5mI6ClPGc

Patri Friedman, grandson of Milton Friedman on freedom of speech and online networks https://www.youtube.com/watch?v=Lgil1M9tAXU

catapart - 20 hours ago

yeah yeah yeah, everyone get on the AT protocol, so that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money) while still maintaining the original, largest, and currently only portal to actually publish the content (which makes money[0]). let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.

if they were decent people, that would be one thing. but if they're going to be poisoned with the same faux-libertarian horseshit that strangled twitter, I don't see any value in supporting their protocol. there's always another protocol.

but assuming I was willing to play ball and support this protocol, they STILL haven't solved the actual problem that no one else is solving either: your data exists somewhere else. until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage. you're just going to change which massive organization is exploiting them.

I know, I know: hardware is a bitch and the type of device I'm even pitching seems like a costly boondoggle. but that's the business, and if you're not addressing it, you're not fomenting real change; you're patting yourself on the back for pretending we can algorithm ourselves out of late-stage capitalism.

[0] *potentially/eventually

ninkendo - 19 hours ago

> When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It’s the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more.

https://www.joelonsoftware.com/2001/04/21/dont-let-architect...

doctorflan - 17 hours ago

I was hoping this was literally just going to be some safe version of a BBS/Usenet sort of filesharing that was peer-based kind of like torrents, but just simple and straightforward, with no porn, infected warez, ransomware, crypto-mining, racist/terrorist/nazi/maga/communist/etc. crap, where I could just find old computing magazines, homebrew games, recipes, and things like that.

Why can’t we have nice things?

I guess that’s what Internet Archive is for.