CCL: Categorical Configuration Language
chshersh.com | 97 points by SchwKatze 7 days ago
That's a lot of words to describe a very simple syntax: name=value pairs, with line continuation using whitespace.
Basically RFC822 email headers, or Debian Control File Format [0] but with "=" instead of ":", and without dedicated comment character.
The biggest problem with this format is that a lot of things are left for the app, so each app will have its own way to implement lists, bools, line wrap support.. Even something like "value override" is left to program implementation. Don't expect YAML/JSON/XML-style automated validators/linters, each program will need its own bespoke parser/generator.
[0] https://www.debian.org/doc/debian-policy/ch-controlfields.ht...
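To make the "name=value pairs with whitespace continuation" reading concrete, here is a minimal Python sketch. It is not the CCL implementation: the function name `parse_pairs` is made up, and it assumes that the key ends at the first "=" and that any line starting with a space or tab continues the previous value. It also flattens indentation, which the real format uses for nesting.

```python
def parse_pairs(text):
    """Parse 'name = value' lines; indented lines continue the previous value.

    Hypothetical helper for illustration, not the CCL reference parser.
    """
    pairs = []  # list of (key, value) tuples; duplicate keys are kept
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and pairs:
            # Continuation line: append it to the previous value
            key, value = pairs[-1]
            pairs[-1] = (key, value + "\n" + line.strip())
        else:
            # Split at the first "=" only, so values may contain "="
            key, _, value = line.partition("=")
            pairs.append((key.strip(), value.strip()))
    return pairs


sample = "name = demo\ndescription = first line\n  second line\nname = other\n"
pairs = parse_pairs(sample)
```

Everything past this point (lists, bools, what a repeated `name` means) is still up to the application, which is the complaint above.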
Not to discredit the author, there are some smart thoughts in there... but I can't help but feel like: yeah of course this is very elegant — but the complexity is not gone, it is elsewhere. And they are not showing that elsewhere.
Namely the parsing code.
Yup.
If simplicity of syntax is the only goal, we can take one step further.
I present you with EP - easy properties: Any UTF8 encoded file is a valid configuration file. You’re the boss, not the language. You can concatenate two files, you can add comments in the format you want, you can choose any syntax you want!
(Somebody will need to parse it eventually though)
You can literally just look at it: https://github.com/chshersh/ccl/tree/main/lib
Sure, but it is a configuration format; if it is intended to be used in all kinds of languages, you have to show how to deal with it in all kinds of languages.
Checks that might be trivial in OCaml might be utter footguns in say C.
Don't get me wrong here, I get that this is someone's spare time project that they might use for themselves only. I am fine with that. But I am unconvinced that the (I admit: well demonstrated) simplicity of the format translates to simplicity of use in the scope it aimed for (replacement for other widespread configuration formats).
I don't say it is impossible (or even unlikely) that this is an improvement, I just caution against seeing an elegant minimalist approach and automatically assuming it makes things simpler. Remember, computers with their binary file systems already have the simplest format that could exist: zero or one. Yet somehow people shot themselves in the foot parsing those for decades. Much of the complexity of computer systems stems from managing the simplicity of the underlying components (if we ignore the thick layer of historically grown cruft).
What would it take to change my mind? Elegant examples of how to parse all the examples in that post using major programming languages (C, C++, Javascript, Java, Python, Go, Rust, ...). That is ofc a lot of work, but if the format should be adopted that work would be needed anyways.
Part of the power of the idea is how amenable this is to property-based testing.
And it really only takes one solid implementation that can be wrapped/called from other languages to do well. (Or perhaps one per ecosystem like JVM, .NET, WASM, anything that can be relatively easily called from C, anywhere Python is used for scripting, etc.)
And because of the formalisms involved, you could have a pretty precise and complete spec defined.
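As a rough illustration of the property-based testing point, here is a Python sketch that checks the monoid laws on randomly generated configs. It assumes a simplified flat model (a dict from key to list of values, merged by concatenating the lists) rather than CCL's actual nested structure. The names `merge`, `random_config` and `check_monoid_laws` are invented for this example, and it uses plain `random` instead of a dedicated property-testing library.

```python
import random


def merge(a, b):
    """Combine two configs (dict of key -> list of values), keeping every value."""
    out = {k: list(v) for k, v in a.items()}
    for k, v in b.items():
        out.setdefault(k, []).extend(v)
    return out


def random_config(rng):
    """Build a small random config over a fixed, made-up set of keys."""
    keys = ["host", "port", "debug"]
    chosen = rng.sample(keys, rng.randint(0, 3))
    return {k: [str(rng.randint(0, 9)) for _ in range(rng.randint(1, 3))]
            for k in chosen}


def check_monoid_laws(trials=500, seed=0):
    """Check associativity and identity of merge on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (random_config(rng) for _ in range(3))
        assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associativity
        assert merge(a, {}) == a and merge({}, a) == a  # empty config is identity
    return True
```

The same two properties could serve as a conformance test for an implementation in any language, provided it exposes a merge operation and an empty config.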
> The biggest problem with this format is that a lot of things are left for the app, so each app will have its own way to implement lists, bools, line wrap support
That seems to be one of the explicit goals:
> Configuration is specific to a particular application. What you want is to follow the rule of the least surprise and utility functions to parse strings.
Since configuration is specific to a particular program, so should its interpretation be; that seems to be what the author is getting at.
Personally, what puts me off this particular configuration language is this part, hidden behind collapsed text:
> In fact, CCL is indentation-sensitive.
Programming/configuring stuff with invisible characters isn't my idea of fun, and it sounds especially cumbersome if everyone is using it differently, since the configuration language leaves a lot up to the users of the configuration.
I think indentation sensitivity is very well suited for configs: you want little line noise and the complexity is low. I do understand the trade-off TOML made in this case.
Some languages prohibit the TAB character, and only allow spaces at the start of the line in groups of 2 or 4: so it is always clear how indentation is to be understood.
I really like the conciseness of this syntax. The language seems very well thought through.
That being said, I've been working with NixOS recently and it's made me reconsider what is useful for a configuration language. In many reasonably large software projects, where configs become very complex, config reuse (in other words templating or meta-configuration) becomes an increasingly helpful feature. Nix configs are great because it's not just a config, but a full blown purely functional language for manipulating the config. It's intuitive and powerful once you get the hang of it, and I sometimes find myself wishing I could use it when I have to work with yaml, json, etc.
You might be interested in nickel (https://nickel-lang.org/), which is a modern take on configuration management based on the experience of Nix/NixOS configurations: purely functional configuration, built-in validation (types & contracts), reusable (functions, modules, defaults), and in addition exports to Yaml, Json, etc.
To integrate nickel with nix, see how organist (https://github.com/nickel-lang/organist) does DevShell management.
Everyone is entitled to scratch their own itch, but this seems like the most useless configuration language I've ever seen.
Take the "fixed point" example, where you have a boolean setting which one file says should be "yes" and the other says it should be "no" and the language semantics composes that into a list with both values. For what boolean setting does this make sense?
The article says "Overrides are not a problem because you keep both values. And you can decide what to do with them: keep only the first, keep only the last or use some smart logic to combine both of them. You’re the boss."
If you need custom logic in your application to determine the setting to use, how is this language helping you?
I think this is probably the best place within these comments to note that one thing some people expect of a configuration format is to be able to hide information from the consuming piece of software.
Normally, it is often useful for a program to receive all the configuration from all sources. ("This flag is normally set to TRUE, has been set to FALSE on this system, has been set to TRUE by the user, and now there's an environment variable that says one thing and a command line flag that says something else.") Sometimes, integrating several incoherent settings into one is dependent on its consumer, or even the setting itself. Sometimes, you would like to be able to debug how different settings interact with one another. Sometimes, different settings can be merged without issue.
CCL exposes everything to the program receiving the config, which is something (some) people seem to abhor. I can see how wanting to hide information can be both useful and detrimental, so I'm wondering if this issue is actually orthogonal to configuration languages, meaning CCL, and others, shouldn't even concern themselves with it.
Reading this, I think of all the programming languages that have comments with whole languages inside of them. That is beyond the complex documentation I found.
you apply another monoid operation.
one possible one is to return first or last element. makes sense in layered configuration of, for example, a text editor, where you might override a colour.
another possible one is to return error on duplicate. makes sense in flat configuration of, for example, a build system.
your application knows which operation fits its intended structure. your application documents the behaviour, just as it normally would.
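a small python sketch of the strategies above, assuming the parsed config arrives as a dict from key to the list of every value seen. the names `first`, `last`, `unique` and `resolve` are made up for illustration and are not part of any ccl library.

```python
def first(values):
    """Keep the earliest value (a base layer wins)."""
    return values[0]


def last(values):
    """Keep the latest value (later layers override earlier ones)."""
    return values[-1]


def unique(values):
    """Reject duplicates outright, as a build system might."""
    if len(values) != 1:
        raise ValueError(f"expected exactly one value, got {values!r}")
    return values[0]


def resolve(config, rules, default=last):
    """Collapse each key's list of values using the rule chosen for that key."""
    return {key: rules.get(key, default)(values)
            for key, values in config.items()}


layered = {"colour": ["blue", "red"], "target": ["app"]}
settings = resolve(layered, {"target": unique})
```

here `colour` falls back to the default `last` rule, while `target` would raise an error if it had been set twice.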
“Data” has syntax (structure), semantics (meaning), and often needs references (to other parts of itself or other data).
There does not exist a perfect configuration language because whether and to what extent each of these capabilities is supported is a subjective trade-off, and reasonable people with different problems might reasonably want different trade-offs.
I like config languages that allow variables and references, so that eg if I change the root path, I just have to change the $ROOT variable near the start of the file and 20 other sub-paths just reference the new $ROOT.
I also like semantics with my syntax, because lots of time I care about dstip but not srcip or vice versa; IP lets me parse for accuracy but not for meaning/usage.
I hate encoding meaning in whitespace; it trades away robustness in duplication in favor of being more human readable. This probably comes from lots of NNTP and XMODEM and 7-bit ASCII battle scars. But reasonable people can disagree.
On the other hand, I think it is a valuable learning exercise to write your own DSL for some common problem space and share it, IF you listen to and internalize the feedback others write about it rather than just filter out anything that isn’t adulation.
The equal sign is a required character for anything base64 encoded, which includes some things you’d expect to be in a config file, like ssh public keys and x509 certs.
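Whether the padding character actually breaks anything depends on how the parser splits a line. A quick Python sketch, assuming (this is an assumption, not something confirmed from the CCL source) that the key ends at the first "=" and everything after it is the value; `split_first_equals` and the `token` key are hypothetical:

```python
import base64


def split_first_equals(line):
    """Split a 'key = value' line at the first '=' only (hypothetical helper)."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


encoded = base64.b64encode(b"hello").decode("ascii")  # ends with '=' padding
key, value = split_first_equals(f"token = {encoded}")
decoded = base64.b64decode(value)
```

Under that assumption the value survives intact. A key containing "=" would still be a problem, and so would a parser that splits on every "=".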
Essentially a stripped-down ini where you have to code any additional functionality and you will never have any tooling because basic things (like comments) are not standardized.
For the use cases mentioned in the article, I have used JSON with schemas and JSON-merge and JSON-patch quite effectively. VScode supports schemas so it will help you edit the file without making mistakes. You can use JSON merge to combine global and local configs and you could even use custom fields in the schema to indicate metadata, for example, that a key represents an upper limit so the lowest value should be considered when merging.
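For anyone who has not used it, the merge step is small. Below is a Python sketch of JSON Merge Patch as described in RFC 7396, applied to already-parsed JSON values. The function name `merge_patch` and the sample global/local configs are made up for illustration, and the schema-driven "take the lowest value" rule mentioned above is not implemented here.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the merged value."""
    if not isinstance(patch, dict):
        return patch  # non-object patches replace the target wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null in the patch deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


global_cfg = {"editor": {"tabs": 4, "theme": "dark"}, "telemetry": True}
local_cfg = {"editor": {"theme": "light"}, "telemetry": None}
effective = merge_patch(global_cfg, local_cfg)
```

Note that arrays are replaced rather than merged, and that `null` cannot be stored as a value because it means "delete", which are the usual caveats with this approach.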
i miss comments so badly in json, especially when used in config files
Just add a "$comment" key. The only problem will be if the parser is too strict and rejects the unrecognized key.
Or the linter requires sorted keys.
JSON is just about the worst config format imaginable. I’d rather write my configs in xml and I really don’t like xml.
Have a look at "Code is Data / Data is Code" https://en.m.wikipedia.org/wiki/Homoiconicity And then see how it's done in real life: https://guix.gnu.org/
Every configuration language that is not code ends up being wrong in multiple ways. We will never learn.
I’ve been working on a configuration format [1] that looks surprisingly similar to this!
That said, the expectation in CONL is that the entire structure is one document. A separate syntax for multiline comments also enables nice things like syntax highlighting the nested portions.
I do like the purity of just a = b, but it seems harder to provide good error messages with so much flexibility
What a delightfully arrogant article, to the point I believe it to be satire (stopped reading at section headers, perhaps I missed the punchline).
TOML is by far the most stripped down and easy to understand configuration format I've ever used, allowing just enough syntactic sugar to be useful without changing semantics. The fact it's made by Tom is meaningless, so its flagrant dismissal is silly to me.
Meanwhile, the proposed configuration format sounds like a nightmare to read. It still has clashing syntax, offloads all of the parsing work to the software (which means now you have the same config format with multiple different ways of interpreting values), restricts usage of certain characters with no way of escaping them (someone else mentioned base64), and otherwise requires that you recursively parse it for nested KVs rather than constructing the final in-memory structures in a linear pass, adding a layer of indirection prior to parsing.
Not to mention, I really get turned off by this sort of pious writing style.
No thanks. Lots of reasons to dislike config formats we've seen before but this doesn't solve anything in my eyes.
This, for a number of reasons, reminds me of the Tree format, see https://hackernoon.com/tree-ast-which-crushes-json-xml-yaml-... and https://github.com/nin-jin/tree.d.
With the handful of threads about electronic music in the past couple of weeks, for a brief moment I thought this was going to be about the inimitable CCL[1].
Compositionality is paramount and category theory guarantees compositionality, but the author's criteria for what entails a good configuration language are woefully naïve.
Configuration is not about describing data, it's about control. Control over a system made of impure, effectful parts.
Configuration is a matter of programming a mutable computer, i.e. a way to specify the composition of effects that you want.
The configuration language is agnostic over the systems it controls, therefore it must provide semantics that preserve morphism in any of its interpretations. The language must be rich enough to accommodate this. It is not enough to have one semantics.
Moreover, it must be rich enough to describe its own models. Yes, the interpretation of it by arbitrary systems must be expressible in the language itself in order to be meaningful and to preserve consistency with regard of its interpretations. In practice, this is done through types.
Additionally, configuration is a global activity, it's applied to the whole system, with many people changing conflicting aspects of it. Just like with any large evolving program, abstraction and typing are required for software engineering reasons alone.
Coincidentally, CUE is also a monoid, but it is more than that, it is a complete Heyting algebra (or a complete Boolean algebra in the case of closed world assumption), these objects also form very rich categories.
Another way to look at CUE is to view it as a semantic domain for the denotation of arbitrary types of arbitrary languages. It's suitable for this because it's a coherence space (Girard). All CUE operations are closed, preserving the structure of the space.
One interesting aspect of author's effort is that even if he was so naïve, category theory led him to a path that is correct. What he did is incomplete, a monoid does not suffice for a configuration language, but a monoid is required. This is saying something.
If you want a category theoretically-informed configuration language that has real-world use, then use nix.
It's precisely this, and there's a reason it has the largest package repo on the planet
Dismissing other languages as having "none of the tooling" in the "why" section sounds like a huge self-roast, since CCL itself does not have, say, highlighting/LSP/FFI to help adoption.
This is silly. I like it. I’m pretty sure you just tricked me into reading a “for dummies” primer on category theory. Congratulations!
I’m still sad there’s such an aversion to parens
Edn is a lovely config language that checks most of the author's boxes, while still being “composable”
Is there a /s missing somewhere?
You'd think people would be more disinclined to xkcd://927 but for some reason this keeps happening.
> You want to introduce data validation and type-checking in your config? Fine, you can just ask users to provide type annotations in the format you want...
No - the users cannot choose the type - they cannot suddenly decide that they want to provide a date where your parser expects a URL, or you are suddenly just making users repeat the schema
> Every software MUST WORK WITHOUT A CONFIG!!
> So, empty config or no config file at all must be a valid configuration
Loads of scenarios where I want fail-fast over a running but broken system: "WARN AlertSystem URL not configured: alerting disabled" "WARN No credentials store configured, adding admin/111111" ....
Nice! I really like a fresh take on anything.
It's been said to be like RFC822 or Debian Control File Format in the comments here; I'd like to add x-www-form-urlencoded. At work I use this a lot as it is what browsers submit. It's List<String, List<String>>, so keys may occur more than once. We standardized on a little language for the keys that allows us to submit structured forms. (Many libraries prescribe a language for this, Rails does too; our keys look like ".location.space[2].name" for "{location:{space:[null,null,{name=VALUE_AS_STRING}]}}" in json).
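For the curious, a Python sketch of both halves: `urllib.parse.parse_qs` gives the "keys may occur more than once" shape directly, and a small helper expands a path-style key into nested structure. The helper `set_path`, its regex and the sample form data are guesses for illustration; the in-house key language may well have rules this does not cover.

```python
import re
from urllib.parse import parse_qs

# One path step: either ".name" or "[index]"
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def set_path(root, path, value):
    """Store value in nested dicts/lists at a path like '.location.space[2].name'."""
    steps = [name if name else int(index) for name, index in TOKEN.findall(path)]
    node = root
    for step, nxt in zip(steps, steps[1:] + [None]):
        is_last = nxt is None
        # Intermediate containers: a list if the next step is an index, else a dict
        child = value if is_last else ([] if isinstance(nxt, int) else {})
        if isinstance(step, int):
            while len(node) <= step:
                node.append(None)  # pad missing list slots with null
            if is_last or node[step] is None:
                node[step] = child
        elif is_last or step not in node:
            node[step] = child
        node = node[step]
    return root


form = parse_qs("tag=a&tag=b&.location.space%5B2%5D.name=Lobby")
nested = set_path({}, ".location.space[2].name", form[".location.space[2].name"][0])
```

The repeated `tag` key comes back as a two-element list, and the path key expands to the same null-padded structure as the JSON in the comment above.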
Some years ago I wrote a TOML parser in Haskell. Because parsers are fun to write in Haskell, and I needed one.
Since we deploy with AWS/Fargate (Docker) the config is passed as JSON k-v pairs that are then set as ENV VARs in the container (following one of those 12factor principles). So it seems I cannot dictate the config file format.