CCL: Categorical Configuration Language

chshersh.com

97 points by SchwKatze 7 days ago


theamk - 6 days ago

That's a lot of words to describe a very simple syntax: name=value pairs, with line continuation using whitespace.

Basically RFC822 email headers, or Debian Control File Format [0] but with "=" instead of ":", and without dedicated comment character.

The biggest problem with this format is that a lot of things are left for the app, so each app will have its own way to implement lists, bools, line wrap support.. Even something like "value override" is left to program implementation. Don't expect YAML/JSON/XML-style automated validators/linters, each program will need its own bespoke parser/generator.

[0] https://www.debian.org/doc/debian-policy/ch-controlfields.ht...
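
To make the "name=value pairs, with line continuation using whitespace" description concrete, here is a minimal Python sketch of such a parser. It is an illustration of the idea only, not CCL's actual algorithm; the function name `parse_pairs` and the choice to join continuation lines with a newline are assumptions.

```python
def parse_pairs(text):
    """Parse 'name = value' lines; indented lines continue the previous value.

    Hypothetical helper: returns a list of (key, value) tuples so that
    duplicate keys are preserved rather than silently overridden.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and pairs:
            # Leading whitespace marks a continuation of the previous value.
            key, value = pairs[-1]
            pairs[-1] = (key, value + "\n" + line.strip())
        else:
            # Split on the first "=" only; everything after it is the value.
            key, _, value = line.partition("=")
            pairs.append((key.strip(), value.strip()))
    return pairs


example = "name = demo\ndescription = first line\n  second line\nport = 80\n"
print(parse_pairs(example))
```

Everything comes back as a string, which is the point made above: lists, bools, and overrides are still left to the application.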

mightyham - 6 days ago

I really like the conciseness of this syntax. The language seems very well thought through.

That being said, I've been working with NixOS recently and it's made me reconsider what is useful for a configuration language. In many reasonably large software projects, where configs become very complex, config reuse (in other words templating or meta-configuration) becomes an increasingly helpful feature. Nix configs are great because it's not just a config, but a full blown purely functional language for manipulating the config. It's intuitive and powerful once you get the hang of it, and I sometimes find myself wishing I could use it when I have to work with yaml, json, etc.

nickm12 - 6 days ago

Everyone is entitled to scratch their own itch, but this seems like the most useless configuration language I've ever seen.

Take the "fixed point" example, where you have a boolean setting which one file says should be "yes" and the other says it should be "no" and the language semantics composes that into a list with both values. For what boolean setting does this make sense?

The article says "Overrides are not a problem because you keep both values. And you can decide what to do with them: keep only the first, keep only the last or use some smart logic to combine both of them. You’re the boss."

If you need custom logic in your application to determine the setting to use, how is this language helping you?
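
A rough Python sketch of what "keep both values and let the application decide" could look like in practice. The helper names `combine` and `last_wins` are made up for illustration and do not come from the article; the sketch assumes configs have already been parsed into lists of (key, value) pairs.

```python
from collections import defaultdict


def combine(*configs):
    """Merge lists of (key, value) pairs, keeping every value per key in order."""
    merged = defaultdict(list)
    for pairs in configs:
        for key, value in pairs:
            merged[key].append(value)
    return dict(merged)


def last_wins(merged, key, default=None):
    """One possible application policy: the most recently loaded value wins."""
    values = merged.get(key)
    return values[-1] if values else default


global_cfg = [("verbose", "yes")]
local_cfg = [("verbose", "no")]

merged = combine(global_cfg, local_cfg)
print(merged)                        # {'verbose': ['yes', 'no']}
print(last_wins(merged, "verbose"))  # no
```

The merge itself is trivial; the override policy (`last_wins` here) is exactly the part every application still has to write.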

efitz - a day ago

“Data” has syntax (structure), semantics (meaning), and often needs references (to other parts of itself or other data).

There does not exist a perfect configuration language because whether and to what extent each of these capabilities is supported is a subjective trade-off, and reasonable people with different problems might reasonably want different trade-offs.

I like config languages that allow variables and references, so that, e.g., if I change the root path, I just have to change the $ROOT variable near the start of the file and 20 other sub-paths just reference the new $ROOT.
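
As a small sketch of that $ROOT pattern, Python's standard `string.Template` can resolve references against keys defined earlier in the file. The `expand` helper and the key names are assumptions for illustration, not part of any particular config format.

```python
from string import Template


def expand(pairs):
    """Resolve $NAME references against keys defined earlier in the file.

    Hypothetical helper: Template.substitute raises KeyError when a
    reference points at a name that has not been defined yet.
    """
    resolved = {}
    for key, value in pairs:
        resolved[key] = Template(value).substitute(resolved)
    return resolved


cfg = expand([
    ("ROOT", "/srv/app"),
    ("LOGS", "$ROOT/logs"),
    ("CACHE", "$ROOT/cache"),
])
print(cfg["LOGS"])   # /srv/app/logs
print(cfg["CACHE"])  # /srv/app/cache
```

Changing the single `ROOT` entry updates every path that references it.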

I also like semantics with my syntax, because lots of time I care about dstip but not srcip or vice versa; IP lets me parse for accuracy but not for meaning/usage.

I hate encoding meaning in whitespace; it trades away robustness in duplication in favor of being more human readable. This probably comes from lots of NNTP and XMODEM and 7-bit ASCII battle scars. But reasonable people can disagree.

On the other hand, I think it is a valuable learning exercise to write your own DSL for some common problem space and share it, IF you listen to and internalize the feedback others write about it rather than just filter out anything that isn’t adulation.

trelliscoded - 6 days ago

The equal sign is a required character for anything base64 encoded, which includes some things you’d expect to be in a config file, like ssh public keys and x509 certs.
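
Whether base64 padding actually breaks depends on how the parser splits each line. The Python sketch below contrasts splitting on every "=" with splitting on the first one only; `split_first` is a hypothetical helper, and nothing here asserts which behaviour CCL implements.

```python
import base64


def split_first(line):
    """Split a 'key = value' line on the first '=' only."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


encoded = base64.b64encode(b"hello").decode("ascii")  # ends with "=" padding
line = "token = " + encoded

# Naive splitting on every "=" breaks the padded value into extra pieces.
print(line.split("="))

# Splitting on the first "=" keeps the padding intact, so the value round-trips.
key, value = split_first(line)
print(key, base64.b64decode(value))
```

Splitting on the first "=" rescues values, but a key that itself contains "=" would still need some escaping rule.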

jcarrano - 2 days ago

Essentially a stripped-down ini where you have to code any additional functionality and you will never have any tooling because basic things (like comments) are not standardized.

For the use cases mentioned in the article, I have used JSON with schemas and JSON-merge and JSON-patch quite effectively. VScode supports schemas so it will help you edit the file without making mistakes. You can use JSON merge to combine global and local configs and you could even use custom fields in the schema to indicate metadata, for example, that a key represents an upper limit so the lowest value should be considered when merging.
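
A minimal Python sketch of that merge approach, assuming the global and local configs are already loaded as dicts. `merge_patch` follows the JSON Merge Patch rules (RFC 7396); `merge_with_limits` and its `limit_keys` argument are invented here to stand in for the "custom schema field" idea and only handle top-level keys.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the merged value."""
    if not isinstance(patch, dict):
        return patch  # non-object patches replace the target outright
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null in the patch deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def merge_with_limits(global_cfg, local_cfg, limit_keys):
    """Merge, then keep the lowest value for keys marked as upper limits.

    limit_keys is a hypothetical stand-in for metadata that would
    normally live in the schema.
    """
    merged = merge_patch(global_cfg, local_cfg)
    for key in limit_keys:
        if key in global_cfg and key in local_cfg:
            merged[key] = min(global_cfg[key], local_cfg[key])
    return merged


global_cfg = {"max_conn": 100, "log": {"level": "info", "file": "a.log"}}
local_cfg = {"max_conn": 500, "log": {"level": "debug", "file": None}}

print(merge_with_limits(global_cfg, local_cfg, {"max_conn"}))
# {'max_conn': 100, 'log': {'level': 'debug'}}
```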

Bost - 2 days ago

Have a look at "Code is Data / Data is Code" https://en.m.wikipedia.org/wiki/Homoiconicity And then see how it's done in real life: https://guix.gnu.org/

worthless-trash - 6 days ago

Every configuration language that is not code ends up being wrong in multiple ways. We will never learn.

cirwin - a day ago

I’ve been working on a configuration format [1] that looks surprisingly similar to this!

That said, the expectation in CONL is that the entire structure is one document. A separate syntax for multiline comments also enables nice things like syntax highlighting the nested portions.

I do like the purity of just a = b, but it seems harder to provide good error messages with so much flexibility

1: https://github.com/ConradIrwin/conl

junon - 2 days ago

What a delightfully arrogant article, to the point I believe it to be satire (stopped reading at section headers, perhaps I missed the punchline).

TOML is by far the most stripped down and easy to understand configuration format I've ever used, allowing just enough syntactic sugar to be useful without changing semantics. The fact it's made by Tom is meaningless, so its flagrant dismissal is silly to me.

Meanwhile, the proposed configuration format sounds like a nightmare to read. It still has clashing syntax, offloads all of the parsing work to the software (which means now you have the same config format with multiple different ways of interpreting values), restricts usage of certain characters with no way of escaping them (someone else mentioned base64), and otherwise requires that you recursively parse it for nested KVs rather than constructing the final in-memory structures in a linear pass, adding a layer of indirection prior to parsing.

Not to mention, I really get turned off by this sort of pious writing style.

No thanks. Lots of reasons to dislike config formats we've seen before but this doesn't solve anything in my eyes.

kukimik - a day ago

This, for a number of reasons, reminds me of the Tree format, see https://hackernoon.com/tree-ast-which-crushes-json-xml-yaml-... and https://github.com/nin-jin/tree.d.

jjulius - 2 days ago

With the handful of threads about electronic music in the past couple of weeks, for a brief moment I thought this was going to be about the inimitable CCL[1].

[1] https://on.soundcloud.com/CGSmV6qHWNXKhLcp8

4ad - 2 days ago

Compositionality is paramount and category theory guarantees compositionality, but the author's criteria for what entails a good configuration language are woefully naïve.

Configuration is not about describing data, it's about control. Control over a system made of impure, effectful parts.

Configuration is a matter of programming a mutable computer, i.e. a way to specify the composition of effects that you want.

The configuration language is agnostic over the systems it controls, therefore it must provide semantics that preserve morphisms in any of its interpretations. The language must be rich enough to accommodate this. It is not enough to have one semantics.

Moreover, it must be rich enough to describe its own models. Yes, the interpretation of it by arbitrary systems must be expressible in the language itself in order to be meaningful and to preserve consistency with regard to its interpretations. In practice, this is done through types.

Additionally, configuration is a global activity, it's applied to the whole system, with many people changing conflicting aspects of it. Just like with any large evolving program, abstraction and typing are required for software engineering reasons alone.

Coincidentally, CUE is also a monoid, but it is more than that, it is a complete Heyting algebra (or a complete Boolean algebra in the case of closed world assumption), these objects also form very rich categories.

Another way to look at CUE is to view it as a semantic domain for the denotation of arbitrary types of arbitrary languages. It's suitable for this because it's a coherence space (Girard). All CUE operations are closed, preserving the structure of the space.

One interesting aspect of the author's effort is that even if he was so naïve, category theory led him to a path that is correct. What he did is incomplete, a monoid does not suffice for a configuration language, but a monoid is required. This is saying something.

azeirah - 2 days ago

If you want a category theoretically-informed configuration language that has real-world use, then use nix.

It's precisely this, and there's a reason it has the largest package repo on the planet

herrington_d - 2 days ago

Dismissing other languages with "none of the tooling" in the "why" section sounds like a huge self-roast, since CCL itself does not have, say, highlighting/LSP/FFI for adoption.

binary132 - 2 days ago

This is silly. I like it. I’m pretty sure you just tricked me into reading a “for dummies” primer on category theory. Congratulations!

bobnamob - 2 days ago

I’m still sad there’s such an aversion to parens

Edn is a lovely config language that checks most of the author's boxes, while still being “composable”

jteppinette - 2 days ago

Is there a /s missing somewhere?

hoseja - 6 days ago

You'd think people would be more disinclined to xkcd://927 but for some reason this keeps happening.

jalk - a day ago

> You want to introduce data validation and type-checking in your config? Fine, you can just ask users to provide type annotations in the format you want...

No - the users cannot choose the type - they cannot suddenly decide that they want to provide a date where your parser expects a URL, or you are suddenly just making users repeat the schema

> Every software MUST WORK WITHOUT A CONFIG!!
>
> So, empty config or no config file at all must be a valid configuration

Loads of scenarios where I want fail-fast over a running but broken system:

"WARN AlertSystem URL not configured: alerting disabled"

"WARN No credentials store configured, adding admin/111111"

....
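
A short Python sketch of the fail-fast alternative: refuse to start when required settings are absent instead of logging a warning and carrying on. The setting names in `REQUIRED`, the `ConfigError` class, and the `validate` function are all hypothetical.

```python
class ConfigError(Exception):
    """Raised when the configuration is not safe to run with."""


# Hypothetical settings the application refuses to run without.
REQUIRED = ("alert_url", "credentials_store")


def validate(config):
    """Return config unchanged, or raise if a required setting is missing or empty."""
    missing = [key for key in REQUIRED if not config.get(key)]
    if missing:
        raise ConfigError("missing required settings: " + ", ".join(missing))
    return config


try:
    validate({"alert_url": "https://alerts.example.invalid/hook"})
except ConfigError as err:
    print(err)  # missing required settings: credentials_store
```

With this policy an empty config is deliberately not a valid configuration.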

cies - 2 days ago

Nice! I really like a fresh take on anything.

It's been said to be like RFC822 or Debian Control File Format in the comments here, I'd like to add like x-www-form-urlencoded. At work I use this a lot as it is what browsers submit. It's List<String, List<String>>, so keys may occur more than once. We standardized on a little language for the keys that allows us to submit structured forms. (Many libraries prescribe a language for this, Rails does too; our keys look like ".location.space[2].name" for "{location:{space:[null,null,{name:VALUE_AS_STRING}]}}" in json).
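
To illustrate how keys like ".location.space[2].name" can be turned into nested data, here is a Python sketch. It is a guess at the general technique, not the implementation described above; `parse_path`, `set_path`, and `decode` are hypothetical names, and the grammar only covers `.name` and `[index]` steps.

```python
import re

# One step of a key path: either ".name" or "[index]".
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def parse_path(path):
    """Turn '.location.space[2].name' into ['location', 'space', 2, 'name']."""
    steps, pos = [], 0
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"bad key path at offset {pos}: {path!r}")
        name, index = match.groups()
        steps.append(name if name is not None else int(index))
        pos = match.end()
    return steps


def set_path(root, steps, value):
    """Store value under steps, creating dicts and lists along the way."""
    node = root
    for step, following in zip(steps, steps[1:] + [None]):
        if isinstance(step, int):
            # Pad the list with None so the index exists (null in JSON).
            node.extend([None] * (step + 1 - len(node)))
            exists = node[step] is not None
        else:
            exists = step in node
        if following is None:
            node[step] = value  # last step: store the leaf value
        else:
            if not exists:
                node[step] = [] if isinstance(following, int) else {}
            node = node[step]
    return root


def decode(pairs):
    """Build one nested structure from (key_path, value) form fields."""
    root = {}
    for key, value in pairs:
        set_path(root, parse_path(key), value)
    return root


print(decode([(".location.space[2].name", "VALUE_AS_STRING")]))
# {'location': {'space': [None, None, {'name': 'VALUE_AS_STRING'}]}}
```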

Some years ago I wrote a TOML parser in Haskell. Because parsers are fun to write in Haskell, and I needed one.

Since we deploy with AWS/Fargate (Docker) the config is passed as JSON k-v pairs that are then set as ENV VARs in the container (following one of those 12factor principles). So it seems I cannot dictate the config file format.