Tell HN: YC companies scrape GitHub activity, send spam emails to users

569 points by miki123211 13 hours ago


Hi HN,

I recently noticed that an YC company (Run ANywhere, W26) sent me the following email:

From: Aditya <aditya@buildrunanywhere.org>

Subject: Mikołaj, think you'd like this

[snip]

Hi Mikołaj,

I found your GitHub and thought you might like what we're building.

[snip]

I have also received a deluge of similar emails from another AI company, Voice.AI (doesn't seem to be YC affiliated). These emails indicate that those companies scrape people's Github activity, and if they notice users contributing to repos in their field of business, send marketing emails to those users without receiving their consent. My guess is that they use commit metadata for this purpose. This includes recipients under the GDPR (AKA me).

I've sent complaints to both organizations, no response so far.

I have just contacted both Github and YC Ethics on this issue, I'll update here if I get a response.

martinwoodward - 10 hours ago

Martin from GitHub here. This type of behaviour is explicitly against the GitHub terms of service, when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts. It's a game of whack-a-mole for sure, and it's not just start-ups that take part in this sketchy behaviour to be honest. I've been plenty of examples in my time across the board.

The fundamental nature of Git makes this pretty easy for folks to scrape data from open source repositories. It's against our terms of service and those folks might want to talk with some lawyers about doing it - but as every Git commit contains your name and email address in the commit data it's not technically difficult even if it is unethical.

From the early days we've added features to help users anonymise their email addresses for commits posted to GitHub. Basically, you configure your local Git client to use your 'no-reply' email address in commits and that still links back to your GitHub account when you push: https://docs.github.com/en/account-and-profile/reference/ema...

I think that's still probably the best route. We want to keep open source data as open as possible, so I don't think locking down API's etc is the right route. We do throttle API requests and scraping traffic, but then again there have been plenty of posts here over the years from people annoyed at hitting those limits so it's definitely a balancing act. Love to know what folks here think though.

scottydelta - 10 hours ago

YC is a proud investor in Flock, what YC Ethics thing are you talking about?