SQL patterns I use to catch transaction fraud

analytics.fixelsmith.com

305 points by redbell a day ago


jstanley - 18 hours ago

> Real cardholders almost never buy something for exactly $1.00. Coffee is $4.73, gas is $52.81. The roundness is the signal.

Surely this depends on how the vendor sets their prices? If you're going to buy something from a website to test a stolen credit card you don't just get to make up your own prices.

And I think you may be over-indexing on the US "prices don't include tax" thing. Elsewhere, round-number prices are extremely common.

In fact a lot of the rest of the stuff in the post seems like it wouldn't work very well either. (E.g. you're flagging anyone who has done a transaction in the last 90 days outside the range of hours at which they have 2+ transactions? Wouldn't that be like 50% of people?).

It's unclear to me whether this article is an attempt at breaking down complex expertise into over-simplified SQL queries, or whether it is all speculative and made up.

There is a conflict between "Six SQL patterns I use to catch transaction fraud" and "Nothing here comes from anything I’ve actually worked on or seen".

Kwpolska - 14 hours ago

> Border crossings inside 10 minutes. International rings.

Or normal people living in Europe in border-adjacent areas.

Also, I guess you don't include card-not-present transactions in this, but you incorrectly assume that every merchant has their location set correctly. And that every sale happens in a brick-and-mortar establishment, not from travelling salespeople or whatever. And that all transactions happen online.

relevant_stats - 12 hours ago

Reading this to the very end uncovers empty and contradictory advice. I'm almost sure it's LLM generated.

We learn simultaneously that 'your team' shouldn't rely on any one of those patterns ('none of them is enough'), but that pattern 1 'alone will surface a useful amount of fraud'.

We also read strange sentences like "Every analyst on your team will use them (ie window functions) once they exist, and adding the next fraud pattern stops being a project. [end of paragraph]"

Or irrelevant discussions about how filtering by "IS NULL" might be not applicable when almost none of the provided examples uses it (and the one which does uses it in different context).

This is low quality and too long.

reconnecting - 13 hours ago

Hacker News, we need to talk!

"Fixel Smith" is an AI-generated person, with an article that has very little to do with fraud analysis. 'This' is also a music artist (1), novelist (2), fraud analyst (3), influencer (4), and whatever else you can imagine.

220+ points and 70 comments, and very few notice it's quite a fake post — and no one that it's an AI generated person?

1. https://www.amazon.it/Forged-Soundtrack-Explicit-Fixel-Smith...

2. https://fixelsmith.com

3. https://analytics.fixelsmith.com/

4. https://www.instagram.com/fixeltales/

reconnecting - 13 hours ago

We develop tirreno (1), an open-source security framework.

I question the described approaches. For example, while impossible travel is a legitimate and widely used technique, it's related to online user behaviour based on IP address. Moreover, tirreno, for example, has separate rules for cases where the IP clearly comes from Apple Relay or VPN/Tor — those are separate flags. I assume some or all examples are LLM-generated, as the context is mixed up and no one actually collects GPS location in bulk for card swipes.

1. https://github.com/tirrenotechnologies/tirreno

datsci_est_2015 - an hour ago

Or, “rules-based logic encoded in SQL queries without any backing data”.

Bunch of thresholds, no data proving those thresholds are meaningful.

enoent - 14 hours ago

> Fraud detection in transaction data is mostly SQL. Not machine learning, not graph databases, not whatever Gartner is hyping this year. SQL, run against the right tables, with the right joins, looking for the right shapes.

It's also not all program-integrity, which is the only work that could justify such blanket statements. Worse is better as long as it addresses the problem domain.

Fintech clients are generally interested in knowing whether a transaction happening _right now_ is fraud. They want to know that in a few milliseconds, for high-dimensional data. It's work done at a scale where relational databases cannot meet these real-time constraints, and instead find other uses like historical data loading. That's how you end up with in-memory databases, stream-processing engines, and yes, even machine learning.

Having said that, some of the author's points are valid, and I'm looking forward for their next writings, in particular dealing with noisy alerts is a general problem beyond performance engineering.

0cf8612b2e1e - 17 hours ago

  If a card swipes in Chicago and seven minutes later swipes in Los Angeles, one of those swipes is fake.
How does this work with online shopping? When I am sitting on the couch and buy from Amazon, where does the address get registered?

Can also imagine an edge case: couple shares an online account, one is traveling and purchases with the saved card details.

crmd - 17 hours ago

> Drawback: this doesn’t work until you have history. New accounts have no baseline.

This is an underrated CX factor: If my card gets denied when i’m a new customer or exhibiting a new pattern, i’m impressed with their software.

However if they deny a transaction where there is any previous history of me authenticating, then I’m frustrated by their naive paranoid algorithm.

fny - 15 hours ago

Isn't the point of ML that you learn these rules from the data? The right approach to me would be to use ML models to detect patterns that correspond with fraud and then evaluate them to see if any make sense. This way you might discover new hyptotheses.

janpeuker - 9 hours ago

I agree the post looks a little AI written but generally this kind of analysis is quite common. Leaving aside human heuristics that are generally too well known to catch real scammers (like time travel or "7 days", which is bad because often weekly patterns are important so at the very least look at 10 days) and actually have low precision, what I find odd it that all results just return a user ID.

So this is really just surfacing cases, but with not enough context to be useful to prioritise. I would expect a score to be included.

Apart from that it misses a lot of signals like refunds, declines, disputes etc [1].

1) https://stripe.com/gb/guides/improve-fraud-management-with-r...

jwr - 11 hours ago

After 35 years of building software systems I've learned to temper my hubris. These days I rarely assume things to be "definitely true".

For example "Impossible travel": these days you can add your credit card to your phone and use Apple Pay. Well, this is useful for many things, one of them being adding your credit card to your kid's (teenager) phone, so that your kid can use your card in case of need/emergency when they are away from you. I did exactly that recently and actually worried about fraud control systems when my child paid using my card in Boston while I was in Europe.

Many things which you think are true might not be.

Anecdotally, US banks are terrible at building fraud control systems. It seems US banks assume any transaction that is charged by an entity outside the US is fraud. In my 10-year history of running a SaaS, the US banks and their "fraud control" systems have been one of the biggest billing problems.

daneel_w - 10 hours ago

In reality, most banks perform a lot of these transaction checks in real time to block fraudulent txes up-front, instead of validating tx legitimacy retroactively at a point where the money is already gone. Some 15 years ago a security rep with Nordea (a large Nordic bank) called me late at night asking if I was currently in South Korea and had just a minute ago used my card in a shop. Someone had initiated a "card present" purchase with my card for 1337 SEK (I'm certain this amount was intentional), which Nordea automatically blocked as it was near the edge of possibility relative to my previous card swipe in Sweden earlier in the day, and they wanted to make sure they weren't about to mistakenly strand me abroad by blocking the card.

maciekkmrk - 17 hours ago

What if I go on a roadtrip and suddenly get gas at 2am?

layer8 - 11 hours ago

> Views are mine, not my employer’s.

What about the tables?

dnnddidiej - 16 hours ago

2 can be genuine use. I let my partner use my card and I use it on my phone as rfid. Maybe ignore phone usages since they are secured pretty well.

vladiat0r - 11 hours ago

These seem like reasonable interview questions. Otherwise, these seem very basic and naive.

dogscatstrees - 16 hours ago

The main problem with these SQL calculations is that they are deterministic shortcuts for a probabilistic problem. Fraud is not usually a “true because rule X matched.” It is more like "what is the probability this is fraudulent"? SQL patterns are useful, but they are blunt instruments. I really don't think banks use deterministic heuristics but more data science stuff.

inheritedwisdom - 15 hours ago

This takes me back, fighting telephone fraud back when folks use to accept cc over the phone. We used similar patterns but only had phone numbers and the white pages. Cross state boundaries inside similar time frames and categorizing similar merchant types. It’s fun to see these same patterns still in use 20 years later for the same purpose.

noduerme - 13 hours ago

This is very cool to read. Although I've never truly worked in fraud prevention, I stumbled into automating a lot of similar pattern checks to catch collusion and fraud when I wrote and ran a poker site / casino. Window functions were not available then so the queries were LONG. One way I'd deal with it was to assign uuids to every pair of players who'd ever shared a poker table, and then run nightly analysis of how much their betting deviated from expected norms and their own baseline on each stage of the game if they were in the same hand as each other. This could actually be done in one or two magnificent 100+ line SQL queries on the history table, on a read replica.

Lagging window functions and/or lateral joins probably would have reduced it to 1/4 the size but definitely increased the cost versus just narrowing the sets into smaller tables first.

skeeter2020 - 9 hours ago

These all seem pretty elementary TBH. they focus on identifying fraudulent transactions, vs (IME) the more valuable deciding if a transaction is fraudulent. This is totally double today. Example: instead of some sort of "outside of normal transaction" you can confidently determine that "coffee at 2am" is likely fine, if they also bought gas 10 minutes earlier from the same merchant, dinner 300 miles away at 7pm, and again gas 8 hours ago in their home town.

sincerely - 17 hours ago

This is quite interesting, but the blatantly AI generated explanations are like an anti-signal for quality

achierius - 17 hours ago

This seems interesting, but has so many signs of AI writing that I worry it's not just edited but generated from whole cloth. Probably still a lot of truth in there but it does give me pause!

> The roundness is the signal.

> Slight pain, same result.

to point at a few.

AussieWog93 - 14 hours ago

Wait, you can clone a credit card? Why don't they use a public-private key pair?

- 13 hours ago
[deleted]
Beestie - 9 hours ago

This is gold, mate. Much obliged.

TheOrange - 12 hours ago

Fascinating stuff. More please :/)

How do you deal with vacations and online shopping. You could be in another country or two in a few hours and purchase from across the world

1a527dd5 - 11 hours ago

I feel like if you've done any kind of investigation work then this is the normal baseline.

jason1cho - 10 hours ago

People here provide counterexamples to show this article is bullshit. Don't forget, fraud detection is about statistics. Outliers always exist.

Machine learning systems also learn your pattern. The article gives simple SQL rules. Don't dismiss this article as worthless.

sltr - 13 hours ago

Obligatory in any discussion of money fraud: https://www.bitsaboutmoney.com/archive/optimal-amount-of-fra...

atombender - 12 hours ago

This is AI slop, as has been pointed out by several other commenters. Flagged.

nubg - 11 hours ago

> If their card does, it’s either being used by someone else or they’re traveling — and travel produces other signals you can check.

Signal's he can check? So some random dude is looking at my credit card purchase history while playing around with his SQL queries?

nubg - 11 hours ago

Article was written by AI (dozens of give aways), so take the content with a grain of salt! They author could not be bothered to write it by hand, yet demands your attention to read it. I'm not even sure which parts are based on his prompt and which were hallucinated... How to butcher what could have been an interesting article... sigh

mattmanser - 15 hours ago

This is the sort of thing I used to love doing and I often gaze at raw data analysis and sometimes wish my career had pivoted towards working with data like this.

But I must admit there was a point where I suddenly lost my love for SQL and it was pretty much when the OVER PARTITION BY syntax appeared.

It never clicks. I always have to look up how it works, I always find it unintuitive. I've never understood why I hate it so much.

Hackbraten - 13 hours ago

Some of these heuristics are dystopian to no end.

> Most people are creatures of habit when they spend money. A nine-to-fiver doesn’t suddenly start buying gas at 3am.

Breaking out of a habit once in a while is what keeps one's mind sharp.

A big "fuck you" to financial analysts with those groundhog-day mindsets for making my life much more miserable than it needs to be and for adding a chilling effect to those little getaways that make life interesting and worthwhile. I despise you for this.

themafia - 16 hours ago

> If a card swipes in Chicago and seven minutes later swipes in Los Angeles, one of those swipes is fake. The card is cloned.

Or, the cardholder is trying to do the cannonball run:

https://www.youtube.com/shorts/Dx5WPNIEwiE

yieldcrv - 16 hours ago

the real question is whether you would point hour agentic system at this blog post and create

chargeback-mcp

or would you turn it all into a markdown file and call it a skill?

baranazal - 11 hours ago

[flagged]