Losing language features: some stories about disjoint unions

graydon2.dreamwidth.org

111 points by Bogdanp 4 days ago


MarkusQ - 20 hours ago

Back in the day, when memory wasn't as cheap as it is now, there was a strong belief that forcing the user to "waste bits" on a proper sum type was a non-starter for a "real" language. It was widely assumed that the reason you were "sharing memory" between two fields was to conserve space, because you were clever enough to have recognized that they couldn't both be used at the same time. And if you were doing that, the reasoning went, you were space-constrained, so anything that took away your precious savings was bad.

I'm not saying this was "right" in any sense, but it wasn't just foolish old timers not recognizing that a "better" solution was possible. When you grew up having every single bit of memory threaded by hand (and costing macroscopic amounts of money), you think about memory efficiency differently.
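To make the "wasted bits" concrete, here is a small sketch in Rust (chosen only for illustration; `RawNumber`, `Number` and the helper functions are hypothetical names). An untagged union of a 32-bit integer and a 32-bit float fits in 4 bytes, but reading it is `unsafe` because nothing records which field is live. The tagged enum has to store a discriminant somewhere, so it ends up larger. The size is checked only as "larger than the union", because Rust does not promise an exact layout for ordinary enums.

```rust
use std::mem::size_of;

// Untagged union: both fields share the same 4 bytes, and nothing records
// which one was written last. `repr(C)` makes the layout predictable.
#[repr(C)]
#[allow(dead_code)]
pub union RawNumber {
    pub int: u32,
    pub float: f32,
}

// Tagged sum type: the compiler stores a discriminant alongside the payload.
#[allow(dead_code)]
pub enum Number {
    Int(u32),
    Float(f32),
}

/// Reading a union field is `unsafe`: the caller must *know* `int` is live.
pub fn read_int(raw: &RawNumber) -> u32 {
    unsafe { raw.int }
}

/// Reading the enum is safe: the `match` checks the tag for you.
pub fn as_int(n: &Number) -> Option<u32> {
    match n {
        Number::Int(i) => Some(*i),
        Number::Float(_) => None,
    }
}

/// Returns (union size, enum size) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<RawNumber>(), size_of::<Number>())
}
```

The tag is not always extra, though: Rust's documentation guarantees that `Option<&T>` is the same size as `&T`, since the null value can stand in for `None`.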

jchw - 19 hours ago

Lack of sum types is probably one of the worst things about working in Go, and I think it is a much bigger problem than the lack of generics ever was. Sadly, though, I don't think you can really just bolt sum types onto an already complete programming language design.

ivanjermakov - 20 hours ago

I'm surprised how many modern languages lack first-class sum type support, considering the number of domain use cases for them.

Taniwha - 16 hours ago

Wirth was on the Algol68 committee - I'm sure he understood how those sorts of unions worked.

He also avoided a lot of the more advanced features of Algol68 when he designed Pascal, because he thought them too complex.

Buttons840 - 14 hours ago

> But another thing Muratori points out is that Dahl and Nygaard copied the feature in safe working form into Simula, and Stroustrup knew about it and intentionally dropped it from C++, thinking it inferior to the encapsulation you get from inheritance. This is funny! Because of course C already had case #3 above -- completely unchecked/unsafe unions, they only showed up in 1976 C, goodness knows why they decided on that -- and the safe(ish) std::variant type has taken forever to regrow in C++.

This seems like a mistake. At the end of the day, a bunch of code and logic has to be written somewhere, and I think it's better done outside the data object, at least some of the time.

Imagine you have the classic Shape class/interface, and someone wants to write some code to determine whether a Shape is happy or sad based on their synesthesia. What are they supposed to do? I guess just add a happy_or_sad() method to the interface? Like, we're just going to pile--err, I mean, "encapsulate"--every possible thing that can be done with the data into the data object?

The OOP way is probably some Shape class hierarchy with a Shape superclass and a bunch of specific Square, Circle, and Triangle subclasses. So I guess you go and modify a dozen subclasses to add your happy_or_sad() method. And you're definitely going to have to fork the code, because nobody wants to upstream your personal feelings about which Shapes are happy or sad.

It's better to have a sum type for your Shape and then everyone can put all their code and logic outside of the Shape itself, and the type system will ensure, at compile time, that no Shape variants have been missed, so refactoring is assisted by the type system.
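A minimal sketch of that idea in Rust, assuming a hypothetical `Shape` enum and a `Mood` type invented for this example (the mood assignments are arbitrary placeholders). `happy_or_sad` lives entirely outside the type's definition, and if someone later adds a variant to `Shape`, this `match` stops compiling until it is updated.

```rust
// A closed set of shapes, defined once (e.g. in someone else's crate).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shape {
    Circle { radius: f64 },
    Square { side: f64 },
    Triangle { a: f64, b: f64, c: f64 },
}

#[derive(Debug, PartialEq)]
pub enum Mood {
    Happy,
    Sad,
}

// Written outside `Shape`, without touching or forking its definition.
// The compiler rejects this `match` (error E0004, non-exhaustive patterns)
// if any variant of `Shape` is left unhandled.
pub fn happy_or_sad(shape: &Shape) -> Mood {
    match shape {
        Shape::Circle { .. } => Mood::Happy,
        Shape::Square { .. } => Mood::Sad,
        Shape::Triangle { .. } => Mood::Happy,
    }
}
```

The trade-off mirrors the class hierarchy: adding a new operation like this one is cheap, while adding a new variant means revisiting every `match`.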

eru - 14 hours ago

Alas, this one gives a 403 Forbidden.

https://archive.is/oTbMW works though.

suprtx - 2 hours ago

While I only watched 25%-50% of the linked talk by Casey Muratori, spread out here and there, and fast-forwarded through the rest, I did not like his talk, and that reflects on this blog post as well. Casey Muratori obviously spent a lot of time on it, but programming and computer science are huge fields, and it is possible to spend a lifetime on even a part of one aspect of them.

In that talk, Casey Muratori refers to Simula. A PDF about it can be found at https://www.mn.uio.no/tjenester/it/hjelp/programvare/simula/... . You may want to run an OCR tool on that PDF; for instance, ocrmypdf is available for a number of Linux distributions. I am not sure whether it describes the same version of Simula as the one being discussed, but it does have the "connection" statement (PDF page 56), which has "inspect" as its first syntactic element.

That does look vaguely similar to the pattern matching of ML, but AFAICT it does not support a number of significant features that many love about modern pattern matching, such as nested patterns. Does it support field bindings as part of a pattern, and matching against specific constant values, or only matching on the class type? I am not sure whether it supports exhaustiveness checking. Does it mandate a finite number of possibilities, to help exhaustiveness checking? And the "connection" statement has two variants. AFAICT, it is the kind of abstraction that is primitive enough that one can get close to its functionality with "switch" in C++ together with a type-cast, and it is a far cry from what Standard ML (later?) supported. In that light, it might not be surprising that it was not included in C++.
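A rough way to see the gap being described, sketched in Rust rather than Simula (all types here are made up for illustration, and this does not claim to reproduce the semantics of `inspect`). The first function only asks "which class is this?" via `std::any::Any::downcast_ref`, roughly the C++ "type test plus cast" idiom; nothing checks that every case is covered. The second uses ML-style matching: a nested pattern, field bindings, a constant inside a pattern, and a guard, with exhaustiveness checked by the compiler.

```rust
use std::any::Any;

pub struct Circle { pub r: u32 }
pub struct Rect { pub w: u32, pub h: u32 }

// Class-test dispatch: "is this a Circle? a Rect?" followed by a downcast.
// The final `else` is mandatory, and the compiler cannot tell you
// whether some type was forgotten.
pub fn describe_by_class(v: &dyn Any) -> String {
    if let Some(c) = v.downcast_ref::<Circle>() {
        format!("circle r={}", c.r)
    } else if let Some(r) = v.downcast_ref::<Rect>() {
        format!("rect {}x{}", r.w, r.h)
    } else {
        "unknown".to_string()
    }
}

pub enum Shape {
    Circle { r: u32 },
    Rect { w: u32, h: u32 },
}

// ML-style matching over a nested structure (`Option<Shape>`).
pub fn describe(s: Option<Shape>) -> String {
    match s {
        None => "nothing".to_string(),
        // A constant inside a nested pattern.
        Some(Shape::Circle { r: 0 }) => "degenerate circle".to_string(),
        // A field binding inside a nested pattern.
        Some(Shape::Circle { r }) => format!("circle r={r}"),
        // A guard relating two bound fields.
        Some(Shape::Rect { w, h }) if w == h => format!("square {w}"),
        Some(Shape::Rect { w, h }) => format!("rect {w}x{h}"),
    }
}
```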

When was pattern matching as we know it in modern times invented, or was it a gradual evolution? https://en.wikipedia.org/wiki/Hope_(programming_language) is cited as introducing https://en.wikipedia.org/wiki/Algebraic_data_type in the 1970s. And Hope had for instance this spin on just one aspect of pattern matching:

> Changing the order of clauses does not change the meaning of the program, because Hope's pattern matching always favors more specific patterns over less specific ones.

This is different from modern pattern matching, where the order (AFAIK generally across modern languages) does matter.
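For contrast with the Hope rule quoted above, a small Rust sketch (function names are hypothetical) where the same three arms in a different order give a different answer, because Rust tries `match` arms top to bottom and the first match wins. A guard is used so both orders compile without an "unreachable pattern" warning.

```rust
// Specific arm first: 1 is reported as "one".
pub fn specific_first(n: i32) -> &'static str {
    match n {
        1 => "one",
        x if x > 0 => "positive",
        _ => "other",
    }
}

// Same arms, general arm first: for n == 1 the guarded arm wins,
// so the `1` arm is only tried when the guard fails.
pub fn general_first(n: i32) -> &'static str {
    match n {
        x if x > 0 => "positive",
        1 => "one",
        _ => "other",
    }
}
```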

I am not sure that Casey Muratori did a good job of researching this topic, but I am also not sure whether, or how much, I can fault him, since the topic is complex and huge and may require a lot of research. Researching the history of programming languages may be difficult, since it requires both a high technical level and a focus on history. One could probably fill several full-time university positions just researching, documenting and describing the history of programming languages. And the topic is a moving target: the professionals would need a good understanding of multiple languages and of programming language theory in general, and preferably also some general professional software development experience.

All in all, the data types and pattern matching of the 1970s might be extremely different from the discriminated unions and pattern matching of the 1990s. C++ also does not have garbage collection, which complicates the issue. Rust, for instance, which also lacks garbage collection, has different binding modes for the bindings in pattern matches.
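To illustrate why binding modes matter without a GC, here is a small Rust sketch with a hypothetical `Message` type. With a garbage collector, binding a field can simply share a pointer; without one, the pattern has to say whether the binding moves the value, borrows it, or mutably borrows it.

```rust
#[derive(Debug)]
pub enum Message {
    Text(String),
    Quit,
}

// By value: the String is moved out and `msg` is consumed.
pub fn into_text(msg: Message) -> Option<String> {
    match msg {
        Message::Text(s) => Some(s),
        Message::Quit => None,
    }
}

// Explicit `ref`: bind by shared reference instead of moving.
pub fn text_len_ref(msg: &Message) -> usize {
    match *msg {
        Message::Text(ref s) => s.len(),
        Message::Quit => 0,
    }
}

// Default binding mode ("match ergonomics"): matching a `&Message`
// against non-reference patterns makes `s` bind as `&String`.
pub fn text_len(msg: &Message) -> usize {
    match msg {
        Message::Text(s) => s.len(),
        Message::Quit => 0,
    }
}

// Matching a `&mut Message` binds `s` as `&mut String`,
// so the text can be modified in place.
pub fn shout(msg: &mut Message) {
    if let Message::Text(s) = msg {
        s.make_ascii_uppercase();
    }
}
```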

It is important to note that subtyping and inheritance are different. And even FP languages can use subtyping.
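One concrete case of subtyping without inheritance: Rust has no inheritance at all, yet the Rust Reference describes a narrow form of subtyping based on lifetimes, where `&'static str` is a subtype of `&'a str`. A minimal sketch (function names are hypothetical):

```rust
// Both arguments must share one lifetime 'a.
pub fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

pub fn demo() -> usize {
    let owned = String::from("local");        // lives only inside `demo`
    let literal: &'static str = "static str"; // lives for the whole program
    // `literal` is accepted where a shorter-lived `&'a str` is expected,
    // because `&'static str` is a subtype of `&'a str`.
    longer(&owned, literal).len()
}
```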

I think both Casey Muratori and Graydon Hoare (if he has not already read it) could be interested in reading the book Types and Programming Languages, even though that book is old by now and may not contain a lot of newer advancements and theory. I also think that Casey Muratori could have benefited (with regard to this talk, at least) from learning and using Scala and its sealed traits in combination with pattern matching; if I recall correctly, one of Scala's objectives was to attempt to unify OOP and FP. I do agree that OOP can be abused, and personally I am lukewarm on inheritance, especially when it is used for direct modelling of a domain, as discussed in the talk, without deeper thought about whether such an approach is good relative to other options and trade-offs. But subtyping, as well as objects that can be used as a kind of "mini-module", are typically more appealing than inheritance IMO. "Namespacing" with objects is also popular.

Some of the theory and terminology also distinguishes between "open" and "closed" types.
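One common way to picture the open/closed distinction, sketched in Rust with hypothetical types: an enum is closed (the variants are fixed where it is defined, so `match` can be exhaustive), while a trait is open (anyone can add a new implementing type, used here through `dyn` trait objects).

```rust
use std::f64::consts::PI;

// Closed: the set of variants is fixed here. A new operation is just
// another `match`; a new variant forces every `match` to be revisited.
pub enum ClosedShape {
    Circle { r: f64 },
    Square { side: f64 },
}

pub fn closed_area(s: &ClosedShape) -> f64 {
    match s {
        ClosedShape::Circle { r } => PI * r * r,
        ClosedShape::Square { side } => side * side,
    }
}

// Open: other code can add implementing types without touching this trait,
// but a new operation means changing the trait and every impl.
pub trait OpenShape {
    fn area(&self) -> f64;
}

pub struct Circle { pub r: f64 }
pub struct Square { pub side: f64 }

impl OpenShape for Circle {
    fn area(&self) -> f64 { PI * self.r * self.r }
}

impl OpenShape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

// A heterogeneous collection through `dyn` trait objects (dynamic dispatch).
pub fn total_area(shapes: &[Box<dyn OpenShape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```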

And, after all, Haskell has type classes, which are not OOP but are relevant for ad-hoc polymorphism (is Casey Muratori familiar with type classes or ad-hoc polymorphism?). Rust has traits, which are not quite the same as type classes but are related. Scala has various kinds of implicits for that purpose. And Rust also has "dyn" traits, which are not so commonly used but are available.
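A minimal sketch of trait-based ad-hoc polymorphism in Rust, using a hypothetical `Describe` trait. Like a type class, it can be implemented for types we did not define (`i32`, `bool`). It can be used with static dispatch through a generic bound, or with dynamic dispatch through `dyn`.

```rust
// Used much like a type class: implemented "retroactively" for existing types.
pub trait Describe {
    fn describe(&self) -> String;
}

impl Describe for i32 {
    fn describe(&self) -> String {
        format!("the integer {}", self)
    }
}

impl Describe for bool {
    fn describe(&self) -> String {
        if *self { "yes".to_string() } else { "no".to_string() }
    }
}

// Static dispatch: compiled separately for each concrete T, loosely like a
// constrained Haskell function `describeTwice :: Describe a => a -> String`.
pub fn describe_twice<T: Describe>(x: &T) -> String {
    format!("{} / {}", x.describe(), x.describe())
}

// Dynamic dispatch with `dyn`: one function, method chosen at runtime.
pub fn describe_all(items: &[&dyn Describe]) -> Vec<String> {
    items.iter().map(|d| d.describe()).collect()
}
```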

ahaferburg - 9 hours ago

C++ is a gun with five triggers, one for each finger that holds the grip. That's what makes it so versatile and powerful.