Tell HN: YC companies scrape GitHub activity, send spam emails to users
569 points by miki123211 13 hours ago
Hi HN,
I recently noticed that a YC company (Run ANywhere, W26) sent me the following email:
From: Aditya <aditya@buildrunanywhere.org>
Subject: Mikołaj, think you'd like this
[snip]
Hi Mikołaj,
I found your GitHub and thought you might like what we're building.
[snip]
I have also received a deluge of similar emails from another AI company, Voice.AI (doesn't seem to be YC affiliated). These emails indicate that those companies scrape people's GitHub activity, and if they notice users contributing to repos in their field of business, send marketing emails to those users without obtaining their consent. My guess is that they use commit metadata for this purpose. This includes recipients under the GDPR (AKA me).
I've sent complaints to both organizations; no response so far.
I have just contacted both GitHub and YC Ethics about this issue; I'll update here if I get a response.
Martin from GitHub here. This type of behaviour is explicitly against the GitHub terms of service; when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts. It's a game of whack-a-mole for sure, and it's not just start-ups that take part in this sketchy behaviour to be honest. I've seen plenty of examples in my time across the board. The fundamental nature of Git makes this pretty easy for folks to scrape data from open source repositories. It's against our terms of service and those folks might want to talk with some lawyers about doing it - but as every Git commit contains your name and email address in the commit data it's not technically difficult even if it is unethical. From the early days we've added features to help users anonymise their email addresses for commits posted to GitHub. Basically, you configure your local Git client to use your 'no-reply' email address in commits and that still links back to your GitHub account when you push: https://docs.github.com/en/account-and-profile/reference/ema... I think that's still probably the best route. We want to keep open source data as open as possible, so I don't think locking down APIs etc. is the right route. We do throttle API requests and scraping traffic, but then again there have been plenty of posts here over the years from people annoyed at hitting those limits, so it's definitely a balancing act. Love to know what folks here think though.
> when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts.
This isn't my experience. I requested that you look into a spammer in July 2025; you ignored my reply and the account is still active.
----
Thank you so much for the report. We're sorry to hear you're receiving unwanted emails, but it's always a possibility when your public contact information is listed on the web.
You can keep your email address private if you wish by following the steps here: Setting your commit email address
We do expect our users to comply with our Terms of Service, which prohibits transmitting using information from the GitHub (whether scraped, collected through our API, or obtained otherwise) for spamming purposes. I'm happy to look into it further to see if we can contact the reported user and let them know that this type of activity is not allowed. Please let us know if you have any other questions or concerns.
----
My reply which was ignored:
----
I understand it will happen from time to time. I'd rather be contactable (I've received legitimate emails today because my email is on my profile). Please take further action. My email is public with the expectation that the ToS will be enforced. If GitHub isn't discouraging spammers then it makes it much harder to justify being contactable.
All the best,
David
I reported spammers ~5 times to GH, and every time the account went down in a couple of hours. Obviously mileage may vary, but I don't want the whole of HN to think this process is completely broken. Please keep reporting spammers; usually it works.
It's impossible for them to stop if you list your email on there. They could make it harder of course. But if you put your email out there for a human to find, then a script or bot can also find it. And yes, of course they can also stop a specific spammer. But that spammer may pick up another account and email.
The grandparent post wasn't asking for them to do the impossible and stop all spamming, only to take action against the particular user that spammed them.
I’ve made over five reports for this exact spam scenario, and never once have y’all acted on them. I have a hard time believing you ban spam accounts that clearly violate your ToS. I even wrote about a specific example of a YC company spamming me from my GitHub email at https://benword.com/dont-tolerate-unsolicited-spam
How would you know whether the account that did the scraping was banned?
By visiting the account and noticing that it still has activity long after the report.
I'm confused. How do you know what account scraped your email address from GitHub in order to send you an email? Or do you mean going after the accounts of companies that make use of a likely scraped email address? That's not a bad idea either, but it has risks and isn't the same thing.
Half the time they literally say it in the email. I just looked in my spam folder and just a few hours ago got an email titled "Your profile: Github", that started with:
> I came across your profile on GitHub. Given you're based in the US, I thought it might be relevant to reach out.
>
> Profile: https://github.com/tedivm
They aren't doing anything to hide it.
How do you propose GH take action without risking taking down legitimate projects due to brigades of false reports?
GH literally say in a parent comment:
> we can (and do) take action against those accounts including banning the accounts
That they use some of their trillion dollar marketshare to solve it, why are you acting like this is a hard problem? It's not. They're just too cheap and greedy to do anything about it.
I don't have any specific suggestions, but I do want to give thanks for implementing functionality to block pushes if the email field is *not* using an anonymized mail address. It's one thing to offer anonymous e-mail addresses, but it's also awesome that GitHub can help prevent mistakes that would otherwise leak a user's e-mail address. I am not sure how many people try to be privacy conscious on GitHub, but I assume most users don't, so it's nice seeing this little feature exist.
It gets more complicated when commit signing, the widely broken web of trust (for the signing key) and similar are involved. And not all devs want or need anonymity on GitHub. In general, just because information is publicly accessible in some form doesn't make it okay or legal to abuse it (accessible doesn't mean any form of usage rights are transferred to you, whether it's in the context of GDPR or in the context of copyright).
I am also getting constant spam because apparently they can see who starred a repo (i.e. I see you starred repo x and we are doing something similar). I am not starring anything anymore.
Scrape once, spam forever. I think it's pretty clear you need to use an anonymization scheme in the way commits are handled so that it links back to your GitHub account and the email addresses are kept private. Privacy centric companies like Apple do this for users offering hashed emails, on a per login basis.
I'm sure this would not work in a world of scraping, but having that kind of ability to figure out bad actors would be nice. You could require authenticated users for certain kinds of requests, and block user information from non-authenticated requests.
They already do[0]
[0] https://docs.github.com/en/account-and-profile/reference/ema...
How does the spammer get through this then?
they don't. it's an optional process, and many users don't change their git config to use the provided email
I know it is against the ToS. I've reported multiple organisations doing this. Last time I reported one, support closed the ticket saying the activity is off platform so they can't do anything.
I didn't realize this was against the GitHub TOS - I just thought it was par for the course for recruiters nowadays. This is good to know! How do I report that person, though? Your support page about reporting abuse assumes I know the person's GitHub account: https://docs.github.com/en/communities/maintaining-your-safe...
What section of the ToS prohibits this? In other words, what is the thing that is being done that is against the ToS? Looking up the creator of a repo, or the contributors of the repo? I did a quick scan of the ToS and all I could find was D8, which states that automated access (scraping) used for "AI" applies a reciprocal license that prevents the scraper from restricting GitHub's access to the data (the whole model? the weights?) resulting from the scraping. This makes it sound like any model trained on GitHub content cannot be commercialized, because charging for access to the output would be a "technical or other limit"... So you're obviously not really enforcing this, otherwise MS would be suing every big commercial model out there!
It seems like a safe assumption that the big commercial models will have negotiated their own private GitHub terms of service, especially considering their many-digit annual contracts with Azure.
How about improving the processing of abuse reports for repos hosting Windows malware that is actively being advertised to potential victims? https://github.com/preconfigured/dl/blob/main/ms-update32.ex...
I have reported several spam emails to GitHub and from what I can tell none has been acted upon.
Maybe I am missing something, but can’t you simply not show the email address in a git commit? (Sincere question, not saying this is trivial. I am dumb and like to ask dumb questions even if it might be embarrassing.) If someone wants to message someone, it goes through GitHub notifications or GitHub emails them. Also, banning an account doesn't seem like a heavy punishment, given they can simply move to GitLab, Bitbucket etc.
That would be a fundamental change to how Git works, not just GitHub. Even if the web UI didn't show it, a simple `git log` would reveal it. You can mask your email address in git commits but a lot of open source projects won't accept that. And some pseudo-open-source ones insist on sending you an email to authenticate before they'll give you access to the GitHub repo (looking at you Unreal Engine!) So, no, I don't think they could simply "not show the email address".
fyi, you can also see the author email by appending ".patch" to the end of a commit URL
Git commits have an email address as a required field[0], although some people put something bogus in there. And then it's in the data provided when you clone the repo onto your machine even if you aren't using the GitHub APIs. To his point, you can set that to the no-reply email address GitHub gives you if you don't want mail but do want the commit to be linked to your GitHub account.
[0]: https://git-scm.com/docs/git-commit#_commit_information
Git commits are identified by a hash of their entire contents[1]. The way hashes work, if you change even one bit, the hash becomes completely different. Every commit contains the email address of the committer and the hash of the parent commit.
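The hashing point above can be sketched in a few lines of Python. This is only an illustration, assuming Git's default SHA-1 object format (the id is the SHA-1 of the object type, its size, a NUL byte and the body); the names, email addresses, timestamp and commit messages are all made up, and real commits usually point at non-empty trees.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object id as Git computes it with SHA-1: '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parent: Optional[str], name: str, email: str, message: str) -> str:
    """Build a minimal commit object and return its id."""
    # Hypothetical fixed timestamp so that only the fields we vary change the hash.
    ident = f"{name} <{email}> 1700000000 +0000"
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message, ""]
    return git_object_id("commit", "\n".join(lines).encode())


# The empty tree is a convenient, well-known object to point the commits at.
EMPTY_TREE = git_object_id("tree", b"")

# The same root commit twice, differing only in the email address (both invented).
root_real = commit_id(EMPTY_TREE, None, "Alice Example", "alice@example.com", "initial commit")
root_anon = commit_id(EMPTY_TREE, None, "Alice Example",
                      "12345+alice@users.noreply.github.com", "initial commit")

# An otherwise identical child commit on top of each root.
child_real = commit_id(EMPTY_TREE, root_real, "Bob Example", "bob@example.com", "second commit")
child_anon = commit_id(EMPTY_TREE, root_anon, "Bob Example", "bob@example.com", "second commit")

print(root_real != root_anon)    # the email is part of the hashed content
print(child_real != child_anon)  # the parent id is part of the child's content
```

Both comparisons come out true: swapping the address changes the root commit's id, and because the child records its parent's id, the child's id changes as well even though nothing else about it differs.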
If the email address in even one commit is changed or removed, that changes its hash, which in turn requires you to update its children, changing their hashes etc. So, updating a commit from n years ago requires you to update all commits that have been made since. By default, git will refuse to pull from such an updated repository, as commits are considered immutable once pushed.
[1] In practice, it's a bit more complicated. Merkle trees are involved, so it's hashes of hashes of hashes instead of hashing a multi-gigabyte blob on each commit, but that's a performance optimization that doesn't affect semantics much.
You should be using the email address "username@no.reply.github.com" or similar. There's never been an obligation to use a real email address for git.
Amazon did this to me. Their recruiters started hounding me at an email address that I only ever used to sign git commits on some repos used on GitHub. When I asked them how they got my email address they said "it was in [our] database"
Are no-reply emails associated with the accounts if the username is changed? That's one reason why I switched back to my personal email.
I've had more than a few instances of this over the past 2 years, and my reply is exactly the above. "What you are doing is against GitHub's TOS"
Nice, thank you Martin. How do you punish the fraudsters? Do you send them to prison over CFAA violation terms of service? I kinda wish I had that much power. There would certainly be fewer people in the world listening to their phones without headphones..
Usually starts with contacting them over email reminding them of the terms of service and warning them to stop. Then their account might get deactivated and they need to write and promise to not be naughty again. If they ignore that then the account gets removed. There are a bunch of automated checks that are running all the time as well and will take automated action that then gets later reviewed by humans.
A lot of the time the process is fast-tracked. The off-platform 'let's scrape a bunch of data and then spam nice people' is the hardest to police. Linking those mails to an offending GitHub account is hard and very manual; also, anyone can send emails saying they are someone they are not, and because of that anyone can deny they sent the mail, and they'll usually blame a rogue agency they were working with etc. I probably shouldn't say it, but the public shame that comes from being mentioned on social, on Hacker News etc. stops people who want to be treated as legitimate from doing that sort of thing and helps educate the wider community around what is and isn't acceptable behaviour - that is why it's good to see this thread and see the issue getting attention.
Love the transparency - someone should make you VP of ..uhm dev rel or something! I was being quite hyperbolic in my original comment, however, I _do_ think you are doing the right thing, and you are definitely not the bad guy. Having said that, there are big corps who have been known to use the CFAA as a way to coerce the long arm of the law upon teenagers and geeks hacking away - not always a great thing either IMO.
> CFAA violation terms of service
This would be a gross miscarriage of justice and bringing successful action under this theory would do widespread harm by expanding the definition of the CFAA. Just because a company can take some nuclear action, doesn't mean they should.
> it's not technically difficult even if it is unethical.
kettle, pot, black? I received the following official spam last week from GitHub:
> Build AI agents with the new GitHub Copilot SDK
despite never granting consent for marketing material (and yes, there's a GDPR complaint now working its way through the national regulator)
Ban them. Honestly I get the same and it is beyond frustrating. I will pay more for GitHub if you go hard on these mfs.
Hey, Martin - https://github.com/lucidrains Mind fixing lucidrains account?
Something happened without notice or recourse. He's one of, if not the most well known open source AI researchers on the planet, with implementations and explanations of papers and ideas that are wonderful. If you could bring some sanity to that situation and take it out of whatever kafkaesque account purgatory it fell into, you'd be doing the work of angels. Thanks!
What was happening with this account? I was often seeing popular but empty (only title of the paper and maybe a short readme) repositories that were created directly after a paper was published?
Just part of the process - he'd queue up the projects as interesting things came in, then plow through. Usually he'd have a rough framework within a day or two, and then a working proof of concept within a week, and then return to the most promising, useful, or interesting projects.
I really appreciated his coding-style, but the bar is quite low on research/ML-algorithms to be fair. I still wonder how he managed to get „trending“ repositories regularly despite the repositories being empty.
Is this mirrored on gitlab or somewhere else? Nobody should trust Github to store all their data
YC is a proud investor in Flock, what YC Ethics thing are you talking about?
And, Gecko Security.
Flock is an awful company, but what's the trouble with Gecko security? Are you talking about https://www.gecko.security/ or something else?
[flagged]
How would that even be legal? (Although I can't find such a startup with any kind of search engine)
Why would it be illegal?
i am not sure of anywhere it is illegal. but areas i am familiar with can consider a negative reference to be defamation, thus anyone providing a negative reference should only do so if they are able to defend it (i.e. prove their statement is substantially true, or prove that the statement was honestly believed to be true and published with no malice or reckless disregard).
seems risky, at least, to build a whole business around negative references that could potentially cross the line into defamation. but that type of thinking is probably why i am not rich.
There are many definitions of illegal (criminal, civil, regulatory, the much much looser “license to operate” as used in chemical industry, etc). A blacklist seems dubious. I’d advise the founders to get counsel on their obligations under the FCRA, which they may be construed to be regulated by. That said, I believe "Bad News" is an AI hallucination. The most similar company I can find historical news about is "Peeple"[0], which was not funded by YC. YCombinator's only known association with a blacklist that I can find was a blacklist of VCs that were accused of harassing female founders[1].
>There are many definitions of illegal (criminal, civil, regulatory, the much much looser “license to operate” as used in chemical industry, etc).
yes, but i am not sure why this matters here. i am not aware of negative references, in general, being illegal under any of those definitions of illegal. no one would say regular speech is illegal just because it can be subject to a defamation lawsuit. same logic. but i agree, if it is a real business, it seems exceptionally risky.
https://www.law.cornell.edu/uscode/text/15/1681d
It's more than just "subject to a defamation lawsuit" (including class action lawsuits). Although for me, even if it were "just that", I'd still call it "potentially illegal". Rather, they'd potentially face FTC penalties and CFPB enforcement actions under 15 U.S.C. section 1681d(a), (b). This law would likely classify such a company as falling under laws pertaining to "investigative consumer reports" under FCRA. This is any report on someone's "character, general reputation, personal characteristics, and mode of living" used for the purposes of employment, loans, housing, etc.
> A consumer reporting agency shall not prepare or furnish an investigative consumer report on a consumer that contains information that is adverse to the interest of the consumer and that is obtained through a personal interview with a neighbor, friend, or associate of the consumer or with another person with whom the consumer is acquainted or who has knowledge of such item of information, unless—
> (A) the agency has followed reasonable procedures to obtain confirmation of the information, from an additional source that has independent and direct knowledge of the information; or
> (B) the person interviewed is the best possible source of the information.
They'd find themselves subject to legal penalties under:
FCRA Willful Noncompliance (15 U.S. Code § 1681n) (if they did not disclose their existence/use/content of reports to employment candidates)
FCRA Negligent Noncompliance (15 U.S. Code § 1681o) (if they made somewhat reasonable but insufficient efforts to comply with the FCRA)
or Administrative Enforcement (15 U.S. Code § 1681s)
and be subject to fines up to $4,700 per violation plus actual damages, plus punitive damages, plus legal fees. State Attorneys General can also bring FCRA lawsuits on behalf of their constituents, not just the federal government. FTC / CFPB can name the founders individually in the lawsuits, not just their corporate entity, and ban[1][2] them from operating any similar businesses in the future. That all said, to some extent, YCombinator partners are on the record[3] supporting the idea of their startups sometimes doing illegal things. Generally they'll frame this as challenging outdated regulations, but they acknowledge that the founders whose strategies they fully support sometimes come into office hours and discuss how they're worried that the strategy puts them at risk of going to jail.
0: https://www.law.cornell.edu/uscode/text/15/1681d
1: FTC v MyLife.com, Inc., and Jeffrey Tinsley (CEO): https://www.ftc.gov/news-events/news/press-releases/2021/12/...
2: https://www.ftc.gov/legal-library/browse/cases-proceedings/b...
ah, okay. so the hypothetical company may potentially be doing something illegal (the "investigative consumer report" part). good to know! that makes sense, and i was unaware of that. i stand corrected in the hypothetical "bad reference aggregator company" scenario.
>YCombinator partners are on the record[3] supporting the idea of their startups sometimes doing illegal things.
interesting, thanks for surfacing that up! i wont pretend to be surprised, though.
To be defamation in the US they'd generally need to be false statements of fact. "John is a bad person, and you shouldn't hire him" wouldn't be defamation.
It's definitely illegal in the UK.
i dont believe that it is illegal to provide a negative reference in the UK, as long as it is honest, factual, and provided in good faith. from gov.uk:
>"If you think you’ve been given an unfair or misleading reference, you may be able to claim damages in court. Your previous employer must be able to back up the reference, such as by supplying examples of warning letters. You must be able to show that:
>- it’s misleading or inaccurate
>- you ‘suffered a loss’ – for example, the withdrawal of a job offer"
which means, if the reference is not misleading and not inaccurate, a negative reference is ok. other uk-based law firms (from a quick google) agree with this interpretation.
Providing a negative reference is totally different than gathering negative references and selling them. The former could be legal while the latter could be illegal.
for sure! in my comment, i was speaking more generally than i should have, and that (obviously, in hindsight) caused some confusion between the specific case of the hypothetical company, and the general case of an employer providing a negative reference.
my bad -- and it is too late to edit to provide clarification.
martinwoodward - 10 hours ago
david_allison - 3 hours ago
gettingoverit - 5 minutes ago
tom_m - 2 hours ago
angoragoats - 2 hours ago
retlehs - 7 hours ago
eli - 7 hours ago
retlehs - 7 hours ago
eli - an hour ago
tedivm - an hour ago
hedora - 4 hours ago
adrianmsmith - 3 hours ago
shimman - 4 hours ago
koito17 - 9 hours ago
dathinab - an hour ago
ayhanfuat - 10 hours ago
blobbers - 4 hours ago
david_allison - 3 hours ago
this includes a unique ID which survives account renames, and the name of the GitHub account at the time. 62114487+david-allison@users.noreply.github.com
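As a rough illustration of that address format, here is a small Python sketch that splits a no-reply address into its numeric ID and account name. The regular expression, `NoReply` and `parse_noreply` are made up for this example, and the pattern is an assumption based on the address shown above plus the older ID-less form; it is not an official GitHub specification.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: optional "<numeric id>+" prefix, then the login, then the fixed domain.
NOREPLY_RE = re.compile(
    r"^(?:(?P<id>\d+)\+)?(?P<login>[^@+\s]+)@users\.noreply\.github\.com$",
    re.IGNORECASE,
)


class NoReply(NamedTuple):
    user_id: Optional[int]  # None for addresses without the numeric prefix
    login: str              # account name at the time the address was issued


def parse_noreply(address: str) -> Optional[NoReply]:
    """Return the parts of a GitHub no-reply address, or None for any other address."""
    match = NOREPLY_RE.match(address.strip())
    if match is None:
        return None
    raw_id = match.group("id")
    return NoReply(int(raw_id) if raw_id else None, match.group("login"))


print(parse_noreply("62114487+david-allison@users.noreply.github.com"))
print(parse_noreply("someone@example.com"))
```

A commit author field that parses this way carries no reachable mailbox, which is the property the thread is after; any other address in the field is exposed to anyone who clones the repository.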
blobbers - 3 hours ago
bstsb - an hour ago
skwashd - 8 hours ago
danesparza - 8 hours ago
just6979 - 3 hours ago
wrs - 3 hours ago
nickphx - 12 minutes ago
Foxboron - 2 hours ago
AznHisoka - 10 hours ago
EdNutting - 10 hours ago
sheept - 6 hours ago
easton - 10 hours ago
miki123211 - 4 hours ago
dent9 - 4 hours ago
dent9 - 4 hours ago
TheSaifurRahman - 8 hours ago
ericol - 9 hours ago
trympet - 8 hours ago
martinwoodward - 7 hours ago
trympet - 5 hours ago
nerdsniper - 7 hours ago
blibble - 4 hours ago
moomoo11 - 6 hours ago
observationist - 6 hours ago
davnn - 6 hours ago
observationist - 5 hours ago
davnn - 2 hours ago
nextaccountic - 2 hours ago
scottydelta - 10 hours ago
wslh - 5 hours ago
nextaccountic - 2 hours ago
ls-a - 10 hours ago
shrubble - 9 hours ago
akerl_ - 8 hours ago
john_strinlai - 8 hours ago
nerdsniper - 8 hours ago
john_strinlai - 7 hours ago
nerdsniper - 7 hours ago
john_strinlai - 7 hours ago
akerl_ - 2 hours ago
drcongo - 8 hours ago
john_strinlai - 8 hours ago
laserlight - 7 hours ago
john_strinlai - 7 hours ago