Tell HN: YC companies scrape GitHub activity, send spam emails to users

569 points by miki123211 13 hours ago

Hi HN,

I recently noticed that a YC company (Run Anywhere, W26) sent me the following email:

From: Aditya <aditya@buildrunanywhere.org>

Subject: Mikołaj, think you'd like this

[snip]

Hi Mikołaj,

I found your GitHub and thought you might like what we're building.

[snip]

I have also received a deluge of similar emails from another AI company, Voice.AI (doesn't seem to be YC affiliated). These emails indicate that those companies scrape people's GitHub activity, and if they notice users contributing to repos in their field of business, send marketing emails to those users without receiving their consent. My guess is that they use commit metadata for this purpose. This includes recipients under the GDPR (AKA me).

I've sent complaints to both organizations, no response so far.

I have just contacted both GitHub and YC Ethics on this issue; I'll update here if I get a response.

martinwoodward - 10 hours ago

Martin from GitHub here. This type of behaviour is explicitly against the GitHub terms of service; when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts. It's a game of whack-a-mole for sure, and it's not just start-ups that take part in this sketchy behaviour to be honest. I've seen plenty of examples in my time across the board.

The fundamental nature of Git makes this pretty easy for folks to scrape data from open source repositories. It's against our terms of service and those folks might want to talk with some lawyers about doing it - but as every Git commit contains your name and email address in the commit data it's not technically difficult even if it is unethical.

From the early days we've added features to help users anonymise their email addresses for commits posted to GitHub. Basically, you configure your local Git client to use your 'no-reply' email address in commits and that still links back to your GitHub account when you push: https://docs.github.com/en/account-and-profile/reference/ema...

I think that's still probably the best route. We want to keep open source data as open as possible, so I don't think locking down APIs etc. is the right route. We do throttle API requests and scraping traffic, but then again there have been plenty of posts here over the years from people annoyed at hitting those limits, so it's definitely a balancing act. Love to know what folks here think though.

david_allison - 3 hours ago

> when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts.
This isn't my experience. I requested that you look into a spammer in July 2025; you ignored my reply and the account is still active.
----
Thank you so much for the report. We're sorry to hear you're receiving unwanted emails, but it's always a possibility when your public contact information is listed on the web. You can keep your email address private if you wish by following the steps here:
Setting your commit email address
We do expect our users to comply with our Terms of Service, which prohibits using information from GitHub (whether scraped, collected through our API, or obtained otherwise) for spamming purposes. I'm happy to look into it further to see if we can contact the reported user and let them know that this type of activity is not allowed.
Please let us know if you have any other questions or concerns.
----
My reply which was ignored:
----
I understand it will happen from time to time. I'd rather be contactable (I've received legitimate emails today because my email is on my profile).
Please take further action. My email is public with the expectation that the ToS will be enforced. If GitHub isn't discouraging spammers then it makes it much harder to justify being contactable.
All the best, David
- gettingoverit - 5 minutes ago
  
  I reported spammers ~5 times to GH, and every time the account went down in a couple of hours. Obviously mileage may vary, but I don't want the whole HN to think this process is completely broken.
  Please keep reporting spammers, usually it works.
- tom_m - 2 hours ago
  
  It's impossible for them to stop if you list your email on there. They could make it harder of course. But if you put your email out there for a human to find, then a script or bot can also find it.
  And yes of course they can also stop a specific spammer. But that spammer may pick up another account and email.
  - angoragoats - 2 hours ago
    
    The grandparent post wasn't asking for them to do the impossible and stop all spamming, only to take action against the particular user that spammed them.
retlehs - 7 hours ago

I’ve made over five reports for this exact spam scenario, and never once have y’all acted on them. I have a hard time believing you ban spam accounts that clearly violate your ToS.
I even wrote about a specific example of a YC company spamming me from my GitHub email at https://benword.com/dont-tolerate-unsolicited-spam
- eli - 7 hours ago
  
  How would you know whether the account that did the scraping was banned?
  - retlehs - 7 hours ago
    
    By visiting the account and noticing that it still has activity long after the report.
    
    eli - an hour ago
    
    I'm confused. How do you know what account scraped your email address from github in order to send you an email?
    Or do you mean going after the accounts of companies that make use of a likely scraped email address? That's not a bad idea either, but it has risks and isn't the same thing.
    
    tedivm - an hour ago
    
    Half the time they literally say it in the email. I just looked in my spam folder and just a few hours ago got an email titled "Your profile: Github", that started with:
    > I came across your profile on GitHub. Given you're based in the US, I thought it might be relevant to reach out.
    >
    > Profile: https://github.com/tedivm
    They aren't doing anything to hide it.
    
    hedora - 4 hours ago
    
    How do you propose GH take action without risking taking down legitimate projects due to brigades of false reports?
    
    adrianmsmith - 3 hours ago
    
    GH literally say in a parent comment:
    > we can (and do) take action against those accounts including banning the accounts
    
    shimman - 4 hours ago
    
    That they use some of their trillion dollar marketshare to solve it, why are you acting like this is a hard problem? It's not. They're just too cheap and greedy to do anything about it.
    
    cortesoft - 4 hours ago
    
    Trillion dollar marketshare? How big do you think GitHub is?
    
    mardef - 4 hours ago
    
    GitHub is wholly owned by Microsoft, which has a 3 trillion market cap
    
    DonHopkins - 2 hours ago
    
    How small do you think Microsoft is??!
koito17 - 9 hours ago

I don't have any specific suggestions, but I do want to give thanks for implementing functionality to block pushes if the email field is *not* using an anonymized mail address.
It's one thing to offer anonymous e-mail addresses, but it's also awesome that GitHub can help prevent mistakes that would otherwise leak a user's e-mail address. I am not sure how many people try to be privacy conscious on GitHub, but I assume most users don't, so it's nice seeing this little feature exist.
- dathinab - an hour ago
  
  It gets more complicated when commit signing, the widely broken web of trust (for the signing key) and similar are involved.
  And not all devs want or need anonymity on github.
  In general, just because information is publicly accessible in some form doesn't make it okay or legal to abuse it (accessible doesn't mean any form of usage rights are transferred to you, whether it's in the context of GDPR or in the context of copyright).
ayhanfuat - 10 hours ago

I am also getting constant spam because apparently they can see who starred a repo (i.e. I see you starred repo x and we are doing something similar). I am not starring anything anymore.
blobbers - 4 hours ago

Scrape once, spam forever.
I think it's pretty clear you need to use an anonymization scheme in the way commits are handled so that it links back to your github account and the email addresses are kept private.
Privacy centric companies like Apple do this for users offering hashed emails, on a per login basis.
I'm sure this would not work in a world of scraping, but having that kind of ability to figure out bad actors would be nice. You could require authenticated users for certain kinds of requests, and block user information from non-authenticated requests.
- david_allison - 3 hours ago
  They already do[0]
  62114487+david-allison@users.noreply.github.com
  this includes a unique ID which survives account renames, and the name of the GitHub account at the time.
  [0] https://docs.github.com/en/account-and-profile/reference/ema...
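As a rough illustration of the address format described above, here is a minimal Python sketch that recognises a GitHub no-reply commit address and splits it into the numeric ID and the username. The regular expression is an assumption based on the example address in this comment plus the older ID-less form; it is not taken from GitHub's documentation, and it deliberately ignores unusual account names such as bot accounts.

```python
import re
from typing import Optional, Tuple

# Assumed shapes: "<id>+<username>@users.noreply.github.com" (as in the
# address above) and an older, ID-less "<username>@users.noreply.github.com".
_NOREPLY = re.compile(
    r"^(?:(?P<id>\d+)\+)?(?P<user>[A-Za-z0-9-]+)@users\.noreply\.github\.com$",
    re.IGNORECASE,
)


def parse_noreply(email: str) -> Optional[Tuple[Optional[int], str]]:
    """Return (numeric ID or None, username), or None for any other address."""
    match = _NOREPLY.match(email.strip())
    if match is None:
        return None
    user_id = int(match.group("id")) if match.group("id") else None
    return user_id, match.group("user")


print(parse_noreply("62114487+david-allison@users.noreply.github.com"))
print(parse_noreply("someone@example.com"))
```

A check like this could run over the author addresses in your own history to see which commits still carry a personal address.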
  - blobbers - 3 hours ago
    
    How does the spammer get through this then?
    
    bstsb - an hour ago
    
    they don't. it's an optional process, and many users don't change their git config to use the provided email
skwashd - 8 hours ago

I know it is against the ToS. I've reported multiple organisations doing this. Last time I reported one, support closed the ticket saying the activity is off platform so they can't do anything.
danesparza - 8 hours ago

I didn't realize this was against the Github TOS - I just thought it was par for the course for recruiters nowadays. This is good to know!
How do I report that person, though? Your support page about reporting abuse assumes I know the person's Github account: https://docs.github.com/en/communities/maintaining-your-safe...
just6979 - 3 hours ago

What section of the ToS prohibits this? In other words, what is the thing that is being done that is against the ToS? Looking up the creator of a repo, or the contributors of the repo?
I did a quick scan of the ToS and all I could find was D8, which states that automated access (scraping) used for "AI" applies a reciprocal license that prevents the scraper from restricting GitHub's access to the data (the whole model? the weights?) resulting from the scraping.
This makes it sound like any model trained on GitHub content cannot be commercialized, because charging for access to the output would be a "technical or other limit"... So you're obviously not really enforcing this, otherwise MS would be suing every big commercial model out there!
- wrs - 3 hours ago
  
  It seems like a safe assumption that the big commercial models will have negotiated their own private GitHub terms of service, especially considering their many-digit annual contracts with Azure.
nickphx - 12 minutes ago

How about improving the processing of abuse reports for repos hosting windows malware that is actively being advertised to potential victims? https://github.com/preconfigured/dl/blob/main/ms-update32.ex...
Foxboron - 2 hours ago

I have reported several spam emails to Github and from what I can tell none has been acted upon.
AznHisoka - 10 hours ago

Maybe I am missing something, but can’t you simply not show the email address in a git commit? (Sincere question, not saying this is trivial. I am dumb and like to ask dumb questions even if it might be embarrassing.)
If someone wants to message someone, it goes through github notifications or github emails them
Also banning an account doesnt seem like a heavy punishment, given they can simply move to gitlab, bitbucket etc
- EdNutting - 10 hours ago
  
  That would be a fundamental change to how Git works, not just GitHub. Even if the web UI didn't show it, a simple `git log` would reveal it.
  You can mask your email address in git commits but a lot of open source projects won't accept that. And some pseudo-open-source ones insist on sending you an email to authenticate before they'll give you access to the GitHub repo (looking at you Unreal Engine!)
  So, no, I don't think they could simply "not show the email address".
  - sheept - 6 hours ago
    
    fyi, you can also see the author email by appending ".patch" to the end of a commit URL
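If you want to check what address one of your own commits exposes this way, the patch text can be read like an email message. The sketch below assumes the `.patch` output follows the usual `git format-patch` mbox layout; the sample patch, hash, name, and address are hand-written for illustration rather than fetched from GitHub.

```python
from email import message_from_string
from email.utils import parseaddr

# Hand-written sample in the assumed mbox-style layout; not real data.
SAMPLE_PATCH = """\
From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001
From: Example Dev <dev@example.com>
Date: Tue, 14 Nov 2023 22:13:20 +0000
Subject: [PATCH] Fix typo in README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
"""


def patch_author(patch_text: str):
    """Return (name, email) from the patch's 'From:' header."""
    # The leading 'From <hash> ...' line is treated as the mbox envelope
    # line by the parser, so msg["From"] is the author header.
    msg = message_from_string(patch_text)
    return parseaddr(msg["From"])


print(patch_author(SAMPLE_PATCH))
```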
  - AznHisoka - 10 hours ago
    
    Makes sense! Appreciate the explanation!
- easton - 10 hours ago
  
  Git commits have an email address as a required field[0], although some people put something bogus in there. And then it's in the data provided when you clone the repo onto your machine even if you aren't using the GitHub APIs.
  To his point, you can set that to the no-reply email address GitHub gives you if you don't want mail but do want the commit to be linked to your GitHub account.
  [0]: https://git-scm.com/docs/git-commit#_commit_information
- miki123211 - 4 hours ago
  
  Git commits are identified by a hash of their entire contents[1]. The way hashes work, if you change even one bit, the hash becomes completely different. Every commit contains the email address of the committer and the hash of the parent commit. If the email address in even one commit is changed or removed, that changes its hash, which in turn requires you to update its children, changing their hashes etc. So, updating a commit from n years ago requires you to update all commits that have been made since. By default, git will refuse to pull from such an updated repository, as commits are considered immutable once pushed.
  [1] In practice, it's a bit more complicated. Merkle trees are involved, so it's hashes of hashes of hashes instead of hashing a multi-gigabyte blob on each commit, but that's a performance optimization that doesn't affect semantics much.
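To make the chain effect concrete, here is a minimal Python sketch of how a commit ID is derived in a SHA-1 repository: a hash over a `commit <size>\0` header followed by the commit text. The identity name, timestamp, and example addresses are made up for illustration, and real commits can carry extra headers (for example a signature) that this sketch ignores.

```python
import hashlib
from typing import Optional

# ID of Git's well-known empty tree; used so the example needs no files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>\\0<body>' with SHA-1, as Git does for SHA-1 repos."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(parent: Optional[str], email: str, message: str) -> str:
    """Build a simplified commit object and return its ID."""
    # Hypothetical identity and fixed timestamp, for illustration only.
    ident = f"Example Dev <{email}> 1700000000 +0000"
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message, ""]
    return git_object_id("commit", "\n".join(lines).encode())


def two_commit_chain(first_email: str) -> tuple:
    """Return (root ID, child ID); only the root commit's email varies."""
    root = commit_id(None, first_email, "first")
    child = commit_id(root, "dev@example.com", "second")
    return root, child


original = two_commit_chain("real.address@example.com")
rewritten = two_commit_chain("1234+dev@users.noreply.github.com")
print(original)
print(rewritten)
```

Only the root commit's email differs between the two runs, yet both IDs change, because the child commit embeds its parent's ID in the text that gets hashed.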
- dent9 - 4 hours ago
  
  You should be using the email address "username@users.noreply.github.com" or similar
  There's never been an obligation to use a real email address for git
dent9 - 4 hours ago

Amazon did this to me. Their recruiters started hounding me at an email address that I only ever used to sign git commits on some repos used on GitHub. When I asked them how they got my email address they said "it was in [our] database"
TheSaifurRahman - 8 hours ago

Are no-reply emails associated with the accounts if the username is changed? That's one reason why I switched back to my personal email.
- martinwoodward - 14 minutes ago
  
  Since 2017 they are, yes.
ericol - 9 hours ago

I've had more than a few instances of this over the past 2 years, and my reply is exactly the above.
"What you are doing is against Github's TOS"
miki123211 - 4 hours ago

I've raised this as ticket ID 4114793, just in case.
trympet - 8 hours ago

Nice, thank you Martin. How do you punish the fraudsters? Do you send them to prison over CFAA violation terms of service?
- martinwoodward - 7 hours ago
  
  I kinda wish I had that much power. There would certainly be less people in the world listening to their phones without headphones..
  Usually starts with contacting them over email reminding them of the terms of service and warning them to stop. Then their account might get deactivated and they need to write and promise to not be naughty again. If they ignore that then the account gets removed.
  There are a bunch of automated checks that are running all the time as well and will take automated action that then gets later reviewed by humans. A lot of the time the process is fast-tracked.
  The off-platform 'let's scrape a bunch of data and then spam nice people' is the hardest to police. Linking those mails to an offending GitHub account is hard and very manual. Also, anyone can send emails saying they are someone they are not, and because of that anyone can deny they sent the mail; they'll usually blame a rogue agency they were working with etc.
  I probably shouldn't say it, but the public shame that comes from being mentioned on social media, on Hacker News, etc. stops people who want to be treated as legitimate from doing that sort of thing and helps educate the wider community about what is and isn't acceptable behaviour. That is why it's good to see this thread and see the issue getting attention.
  - trympet - 5 hours ago
    
    Love the transparency - someone should make you VP of ..uhm dev rel or something! I was being quite hyperbolic in my original comment, however, I _do_ think you are doing the right thing, and you are definitely not the bad guy.
    Having said that, there are big corps who have been known to use the CFAA as a way to coerce the long arm of the law upon teenagers and geeks hacking away - not always a great thing either IMO.
- nerdsniper - 7 hours ago
  
  > CFAA violation terms of service
  This would be a gross miscarriage of justice and bringing successful action under this theory would do widespread harm by expanding the definition of the CFAA.
  Just because a company can take some nuclear action, doesn't mean they should.
- skeptic_ai - 8 hours ago
  
  Will send a strong email: Don’t do bad things.
blibble - 4 hours ago

> it's not technically difficult even if it is unethical.
kettle, pot, black?
I received the following official spam last week from GitHub:
> Build AI agents with the new GitHub Copilot SDK
despite never granting consent for marketing material
(and yes, there's a GDPR complaint now working its way through the national regulator)
moomoo11 - 6 hours ago

Ban them. Honestly I get the same and it is beyond frustrating.
I will pay more for GitHub if you go hard on these mfs.
observationist - 6 hours ago

Hey, Martin - https://github.com/lucidrains
Mind fixing lucidrains account? Something happened without notice or recourse. He's one of, if not the most well known open source AI researchers on the planet, with implementations and explanations of papers and ideas that are wonderful. If you could bring some sanity to that situation and take it out of whatever kafkaesque account purgatory it fell into, you'd be doing the work of angels.
Thanks!
- davnn - 6 hours ago
  
  What was happening with this account? I was often seeing popular but empty (only title of the paper and maybe a short readme) repositories that were created directly after a paper was published?
  - observationist - 5 hours ago
    
    Just part of the process - he'd queue up the projects as interesting things came in, then plow through. Usually he'd have a rough framework within a day or two, and then a working proof of concept within a week, and then return to the most promising, useful, or interesting projects.
    
    davnn - 2 hours ago
    
    I really appreciated his coding-style, but the bar is quite low on research/ML-algorithms to be fair. I still wonder how he managed to get „trending“ repositories regularly despite the repositories being empty.
- nextaccountic - 2 hours ago
  
  Is this mirrored on gitlab or somewhere else? Nobody should trust Github to store all their data

scottydelta - 10 hours ago

YC is a proud investor in Flock, what YC Ethics thing are you talking about?

otherayden - 2 hours ago

And that Optifye.ai demo with the sweatshop surveillance software
cassonmars - 10 hours ago

And Cluely
- tasn - 10 hours ago
  
  Cluely is not YC.
  - fantasizr - 5 hours ago
    
    he might be thinking of chadIDE "the first brainrot ide"
- insane_dreamer - 3 hours ago
  
  the same Cluely that's on IG? I thought that was a fictional satire.
wslh - 5 hours ago

And, Gecko Security.
- nextaccountic - 2 hours ago
  
  Flock is an awful company, but what's the trouble with Gecko security? Are you talking about https://www.gecko.security/ or something else?
ls-a - 10 hours ago

[flagged]
- shrubble - 9 hours ago
  
  How would that even be legal? (Although I can't find such a startup with any kind of search engine)
  - akerl_ - 8 hours ago
    
    Why would it be illegal?
    
    john_strinlai - 8 hours ago
    
    i am not sure of anywhere it is illegal.
    but areas i am familiar with can consider a negative reference to be defamation, thus anyone providing a negative reference should only do so if they are able to defend it (i.e. prove their statement is substantially true, or prove that the statement was honestly believed to be true and published with no malice or reckless disregard).
    seems risky, at least, to build a whole business around negative references that could potentially cross the line into defamation. but that type of thinking is probably why i am not rich.
    
    nerdsniper - 8 hours ago
    
    There are many definitions of illegal (criminal, civil, regulatory, the much much looser “license to operate” as used in chemical industry, etc).
    A blacklist seems dubious. I’d advise the founders to get counsel on their obligations under the FCRA, which they may be construed to be regulated by.
    That said, I believe "Bad News" is an AI hallucination. The most similar company I can find historical news about is "Peeple"[0], which was not funded by YC. YCombinator's only known association with a blacklist that I can find was a blacklist of VCs that were accused of harassing female founders[1].
    0: https://archive.is/r9UQo
    1: https://archive.is/17Ans
    
    john_strinlai - 7 hours ago
    
    >There are many definitions of illegal (criminal, civil, regulatory, the much much looser “license to operate” as used in chemical industry, etc).
    yes, but i am not sure why this matters here. i am not aware of negative references, in general, being illegal under any of those definitions of illegal.
    no one would say regular speech is illegal just because it can be subject to a defamation lawsuit. same logic.
    but i agree, if it is a real business, it seems exceptionally risky.
    
    nerdsniper - 7 hours ago
    
    https://www.law.cornell.edu/uscode/text/15/1681d
    It's more than just "subject to a defamation lawsuit" (including class action lawsuits). Although for me, even if it were "just that", I'd still call it "potentially illegal". Rather, they'd potentially face FTC penalties and CFPB enforcement actions under 15 U.S.C. section 1681d(a), (b).
    This law would likely classify such a company as falling under laws pertaining to "investigative consumer reports" under FCRA. This is any report on someone's "character, general reputation, personal characteristics, and mode of living" used for the purposes of employment, loans, housing, etc.
    > A consumer reporting agency shall not prepare or furnish an investigative consumer report on a consumer that contains information that is adverse to the interest of the consumer and that is obtained through a personal interview with a neighbor, friend, or associate of the consumer or with another person with whom the consumer is acquainted or who has knowledge of such item of information, unless—
    > (A) the agency has followed reasonable procedures to obtain confirmation of the information, from an additional source that has independent and direct knowledge of the information; or
    > (B) the person interviewed is the best possible source of the information.
    They'd find themselves subject to legal penalties under:
    FCRA Willful Noncompliance (15 U.S. Code § 1681n) (if they did not disclose their existence/use/content of reports to employment candidates)
    FCRA Negligent Noncompliance (15 U.S. Code § 1681o) (if they made somewhat reasonable but insufficient efforts to comply with the FCRA)
    or
    Administrative Enforcement (15 U.S. Code § 1681s)
    and be subject to fines up to $4,700 per violation plus actual damages, plus punitive damages, plus legal fees. State Attorneys General can also bring FCRA lawsuits on behalf of their constituents, not just the federal government. FTC / CFPB can name the founders individually in the lawsuits, not just their corporate entity, and ban[1][2] them from operating any similar businesses in the future.
    That all said, to some extent, YCombinator partners are on the record[3] supporting the idea of their startups sometimes doing illegal things. Generally they'll frame this as challenging outdated regulations, but they acknowledge that the founders whose strategies they fully support sometimes come into office hours and discuss how they're worried that the strategy puts them at risk of going to jail.
    0: https://www.law.cornell.edu/uscode/text/15/1681d
    1: FTC v MyLife.com, Inc., and Jeffrey Tinsley (CEO): https://www.ftc.gov/news-events/news/press-releases/2021/12/...
    2: https://www.ftc.gov/legal-library/browse/cases-proceedings/b...
    3: https://www.youtube.com/watch?v=Hm-ZIiwiN1o&t=8m46s
    
    john_strinlai - 7 hours ago
    
    ah, okay. so the hypothetical company may potentially be doing something illegal (the "investigative consumer report" part). good to know! that makes sense, and i was unaware of that.
    i stand corrected in the hypothetical "bad reference aggregator company" scenario.
    >YCombinator partners are on the record[3] supporting the idea of their startups sometimes doing illegal things.
    interesting, thanks for surfacing that up! i wont pretend to be surprised, though.
    
    akerl_ - 2 hours ago
    
    To be defamation in the US they'd generally need to be false statements of fact.
    "John is a bad person, and you shouldn't hire him" wouldn't be defamation.
    
    drcongo - 8 hours ago
    
    It's definitely illegal in the UK.
    
    john_strinlai - 8 hours ago
    
    i dont believe that it is illegal to provide a negative reference in the UK, as long as it is honest, factual, and provided in good faith.
    from gov.uk:
    >"If you think you’ve been given an unfair or misleading reference, you may be able to claim damages in court. Your previous employer must be able to back up the reference, such as by supplying examples of warning letters.
    You must be able to show that:
    - it’s misleading or inaccurate
    - you ‘suffered a loss’ – for example, the withdrawal of a job offer"
    which means, if the reference is not misleading and not inaccurate, a negative reference is ok. other uk-based law firms (from a quick google) agree with this interpretation.
    
    laserlight - 7 hours ago
    
    Providing a negative reference is totally different than gathering negative references and selling them. The former could be legal while the latter could be illegal.
    
    john_strinlai - 7 hours ago
    
    for sure!
    in my comment, i was speaking more generally than i should have, and that (obviously, in hindsight) caused some confusion between the specific case of the hypothetical company, and the general case of an employer providing a negative reference. my bad -- and it is too late to edit to provide clarification.