Parse, Don't Validate (2019)

lexi-lambda.github.io

132 points by shirian 3 hours ago


seanwilson - 2 hours ago

Maybe I'm missing something, and I'm glad this idea resonates, but it feels like sometime after Java got popular and dynamic languages gained a lot of mindshare, a large chunk of the programming community forgot why strong static type checking was invented and is now having to rediscover it.

In most strongly statically typed languages, you wouldn't often pass strings and generic dictionaries around. You'd naturally gravitate towards parsing/transforming raw data into typed data structures with guaranteed properties, to avoid writing defensive code everywhere: e.g. a Date object whose constructor throws an exception if the given string doesn't validate as a date (Edit: changed this from email, because email validation is a can of worms as an example). So there, "parse, don't validate" is the norm, not a tip or idea that needs to gain traction.
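A minimal Python sketch of the Date example, assuming ISO 8601 input. `ParsedDate` and `days_between` are hypothetical names; the standard library's `date.fromisoformat` does the actual parsing and raises `ValueError` for strings like "2024-02-30".

```python
from datetime import date


class ParsedDate:
    """Hypothetical value object: exists only if its input was a valid ISO 8601 date."""

    __slots__ = ("value",)

    def __init__(self, raw: str) -> None:
        # date.fromisoformat raises ValueError for input like "2024-02-30",
        # so an invalid ParsedDate can never be constructed.
        self.value: date = date.fromisoformat(raw)


def days_between(start: ParsedDate, end: ParsedDate) -> int:
    # Hypothetical consumer: no defensive checks, both arguments are already valid dates.
    return (end.value - start.value).days
```

For example, `days_between(ParsedDate("2024-01-01"), ParsedDate("2024-03-01"))` returns 60, while `ParsedDate("2024-02-30")` raises before any downstream code sees it.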

zdw - an hour ago

This is a great article, but people often trip over the title and draw unusual conclusions.

The point of the article is about locality of validation logic in a system. Parsing in this context can be thought of as consolidating all the logic that determines the structure and validity of incoming data into one place in the program.

This lets you then rely on the fact that you have valid data in a known structure in all other parts of the program, which don't have to be crufted up with validation logic when used.
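The article illustrates this with a `NonEmpty` list type in Haskell, which lets `head` drop its emptiness check. A rough Python analogue, assuming Python 3.9+ (`NonEmpty.parse` and `first_config_dir` are hypothetical names): the single emptiness check lives in `parse`, and the structure itself (a required `head` plus a `tail`) can't represent an empty list. Python won't enforce the `NonEmpty[str]` annotation at runtime, so passing a plain list to `first_config_dir` is only caught by a type checker.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NonEmpty(Generic[T]):
    """Hypothetical type: a list with at least one element, by construction."""
    head: T
    tail: tuple[T, ...]

    @staticmethod
    def parse(items: Sequence[T]) -> "NonEmpty[T]":
        # The only emptiness check in the program lives here, at the boundary.
        if not items:
            raise ValueError("expected at least one element")
        return NonEmpty(items[0], tuple(items[1:]))


def first_config_dir(dirs: NonEmpty[str]) -> str:
    # Hypothetical consumer: no emptiness check needed, the type rules it out.
    return dirs.head
```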

Relatedly, it's worth looking at tools that further improve the locality of structure/validity checks, like protovalidate for protobuf or Schematron for XML, which let you outsource validity checking entirely to library code for existing serialization formats.

macintux - 2 hours ago

A frequent visitor to HN. Tip: if you click on the "past" link under the title (but not the "past" link at the top of the page), you'll trigger a search for previous posts.

https://hn.algolia.com/?query=Parse%2C%20Don%27t%20Validate&...

However, it's more effective to put the title in quotes; that reduces false positives.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

rorylaitila - 19 minutes ago

I make heavy use of value objects in my applications, but there are things I needed to do to make them ergonomic and performant. A "small" application of mine has over 100 value objects implemented as classes. Large apps easily get into the thousands of classes just for value objects. That is a lot of boilerplate and a lot of boxing/unboxing. It's a lot more typing than in "stringly typed" programs.

To make it viable, all value objects are code-generated from model schemas and then customized as needed (only about 5% need customization beyond basic data types). I have auto-upcasting on setters, so you can code stringly when you want, but everything is still validated (very useful for writing unit tests more quickly). I only parse into types at boundaries or on writes/sets, not on reads/gets (this limits the amount of boxing, particularly when reading large amounts of data). Heavy use of reflection and auto-wiring/dependency injection.
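A hand-written sketch of the "auto-upcasting on setters" convention. The commenter generates theirs from schemas using reflection; `Sku` and `Product` here are hypothetical names, and the SKU format is an assumption. The setter accepts either a raw string or an already-parsed value object and parses on write; the getter returns the stored object without re-validating.

```python
import re


class Sku:
    """Hypothetical value object: a stock-keeping unit like 'AB-1234'."""

    _PATTERN = re.compile(r"[A-Z]{2}-\d{4}")  # assumed format, for illustration
    __slots__ = ("value",)

    def __init__(self, raw: str) -> None:
        normalized = raw.strip().upper()
        if not self._PATTERN.fullmatch(normalized):
            raise ValueError(f"invalid SKU: {raw!r}")
        self.value = normalized

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Sku) and other.value == self.value

    def __hash__(self) -> int:
        return hash(self.value)


class Product:
    """Hypothetical entity whose sku property upcasts strings on write."""

    def __init__(self, sku: "Sku | str") -> None:
        self.sku = sku  # routed through the setter below

    @property
    def sku(self) -> Sku:
        # Reads return the already-parsed object; no boxing or re-validation.
        return self._sku

    @sku.setter
    def sku(self, value: "Sku | str") -> None:
        # Auto-upcast: plain strings are parsed into Sku; invalid input raises
        # before anything is assigned, so the old value is kept.
        self._sku = value if isinstance(value, Sku) else Sku(value)
```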

But with these conventions in place, I quite enjoy it. It's easy to customize/narrow a type. One convention for all validation. External inputs are secure by default, with nice error messages. One place where all value validation happens (the ./values classes folder).

kayo_20211030 - an hour ago

A great piece.

Unfortunately, it's somewhat of a religious argument about the one true way. I've worked on both sides of the fence, and each field is equally green in its own way. I've used OCaml, with static typing, and Clojure, with (maybe) opt-in schema checking. Both work fine for real purposes.

The big problem arrives when you mix metaphors. With typing, you're either in or you're out, or you should be. You ought not to fall between two stools. Each point of view works fine when approached the right way, but don't pretend one thing is the other.

r4victor - an hour ago

It seems that modern statically typed and even dynamically typed languages have all adopted this idea, except Go, where the designers decided that zero values always (or mostly) represent valid states.

A sincere question to Go programmers – what's your take on "Parse, Don't Validate"?

pcwelder - 2 hours ago

Each repost is worth it.

This, along with John Ousterhout's talk [1] on deep interfaces, was transformational for me. And this is coming from a guy who codes in Python, so there are lots of transferable lessons.

[1] https://www.youtube.com/watch?v=bmSAYlu0NcY

yakshaving_jgt - an hour ago

I did a lightning talk on this topic last year, with a concrete example in Yesod.

https://www.youtube.com/watch?v=MkPtfPwu3DM

curiousgal - an hour ago

Semi-tangent, but I'm curious: for those with more experience in Python, do you just pass around generic pandas DataFrames, or do you parse each row into an object and write logic that manipulates those instead?
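One way to do the "parse each row" option, sketched under assumptions: the `Trade` type and its columns are hypothetical, and rows arrive as plain dicts. With pandas you could get such dicts from `df.to_dict(orient="records")`, though building a Python object per row is noticeably slower than vectorized DataFrame operations on large frames.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Iterable


@dataclass(frozen=True)
class Trade:
    """Hypothetical row type; the field names are illustrative only."""
    symbol: str
    quantity: int
    price: Decimal

    @classmethod
    def from_record(cls, rec: dict[str, Any]) -> "Trade":
        symbol = str(rec["symbol"]).strip()
        if not symbol:
            raise ValueError("empty symbol")
        # int(str(...)) rejects values like 2.5 instead of silently truncating them.
        quantity = int(str(rec["quantity"]))
        if quantity <= 0:
            raise ValueError(f"non-positive quantity: {quantity}")
        # Going through str avoids binary-float artifacts, e.g. Decimal(0.1).
        price = Decimal(str(rec["price"]))
        return cls(symbol, quantity, price)


def parse_records(records: Iterable[dict[str, Any]]) -> list[Trade]:
    """Parse every row once at the boundary; downstream code takes list[Trade]."""
    return [Trade.from_record(r) for r in records]
```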

whalesalad - 19 minutes ago

The author's point here is great, but the post does (imho) a poor job of illustrating it.

The tl;dr: stop sprinkling guards and if statements all over your codebase. Convert (parse) the data into truthful objects/structs/containers at the perimeter. The goal is to do that work at the boundaries of your system, so that inside your system you can stop worrying about it and trust the value objects you have.

I think my hangup here is the use of the terms "parse" vs. "validate". They are not the right terms to describe this.

danieltanfh95 - an hour ago

Hot take: static typing is often touted as the be-all and end-all, and all you need to do is "parse, don't validate" at the edge of your program and everything is fine and dandy.

In practice, I find that staunch static typing proponents are often mid-level or junior engineers who want to work with an idealised version of programming in their heads. In reality, what you are looking for is "openness" and "consistency", because no amount of static typing will save you from poorly defined or prematurely optimised types that encode business logic constraints as programmatic types.

This is also why, in practice, a lot of customer input ends up being passed around as "strings", or stored as both a raw copy and a parsed copy: business logic will move faster than whatever code you can write and fix, and exposing it as just "types" gets in the way of future programmers extending your program.
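The "raw copy + parsed copy" approach could look something like this minimal sketch. `RawAndParsed` is a hypothetical name, and treating only `ValueError` as a parse failure is an assumption. Unparseable input is kept rather than rejected, so the parsing rules can change later without losing the original data.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RawAndParsed(Generic[T]):
    """Hypothetical wrapper: original customer input next to its best-effort parse."""
    raw: str
    parsed: Optional[T]

    @classmethod
    def of(cls, raw: str, parser: Callable[[str], T]) -> "RawAndParsed[T]":
        try:
            return cls(raw, parser(raw))
        except ValueError:
            # Input the current rules can't parse is stored as-is, not rejected.
            return cls(raw, None)
```

For instance, `RawAndParsed.of("42", int)` keeps both `"42"` and `42`, while `RawAndParsed.of("forty-two", int)` keeps the raw text with `parsed` set to `None`.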