Some C habits I employ for the modern day

unix.dog

193 points by signa11 5 days ago


themafia - 13 hours ago

> and I end up having all these typedefs in my projects

I avoid doing this now. It's more trouble than it's worth and it changes your code from a standard dialect of C into a custom one. Plus my eyes are old and they don't enjoy telling short identifiers apart.

> typedef struct { ... } String

I avoid doing this. Just use `struct string { ... };'. It makes it clear what you're handling. C23 finally gave us "auto", so you shouldn't fret over typedefing everything anymore. I also prefer a "strbuf" type with an index and capacity, so I can safely read and write to it, alongside a derived "strview" type with only a pointer and length that references into the buffer.
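
A minimal sketch of what that split might look like (the struct layouts and function names here are my guesses, not anything from the article):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Owning buffer: len is the write index, cap the fixed capacity. */
struct strbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Non-owning view into a strbuf (or any memory): pointer + length only. */
struct strview {
    const char *data;
    size_t      len;
};

/* Append bytes, refusing to overflow; returns bytes actually written. */
static size_t strbuf_append(struct strbuf *b, const char *src, size_t n)
{
    size_t room = b->cap - b->len;
    if (n > room)
        n = room;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return n;
}

/* Borrow the current contents as a read-only view. */
static struct strview strbuf_view(const struct strbuf *b)
{
    return (struct strview){ b->data, b->len };
}
```

The nice property is that writes can only go through the capacity-checked strbuf, while everything that just reads takes the cheap-to-copy strview.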

> returning results

In general, returning structures larger than two machine words is fairly inefficient. Plus you're cutting yourself off from another C23 gem, [[nodiscard]]. If you want the 'ok' value checked, you can _really_ specify that. Put everything else behind a pointer passed as an argument. The sum-type logic works just as well there.
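
A sketch of that shape (names are made up; the macro falls back to the GCC/Clang attribute when the compiler isn't in C23 mode):

```c
#include <assert.h>
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define NODISCARD [[nodiscard]]                        /* C23 */
#else
#define NODISCARD __attribute__((warn_unused_result))  /* GCC/Clang fallback */
#endif

enum parse_err { PARSE_OK = 0, PARSE_EMPTY, PARSE_BADCHAR };

/* Only the small status travels in the return value; the payload goes
 * out through a pointer argument. Dropping the status on the floor
 * now produces a compiler warning. (No overflow handling; a sketch.) */
NODISCARD static enum parse_err parse_u32(const char *s, unsigned *out)
{
    if (!s || !*s)
        return PARSE_EMPTY;
    unsigned v = 0;
    for (; *s; s++) {
        if (*s < '0' || *s > '9')
            return PARSE_BADCHAR;
        v = v * 10 + (unsigned)(*s - '0');
    }
    *out = v;
    return PARSE_OK;
}
```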

> I tend to avoid the string.h functions most of the time, only employing the mem family when I want to, well, mess with memory.

So you use strlen() a lot and don't have to deal with multibyte characters anywhere in your code. It's not much of a strategy.

WalterBright - 14 hours ago

> I’ve long been employing the length+data string struct. If there was one thing I could go back in time to change about the C language, it would be the removal of the null-terminated string.

It's not necessary to go back in time. I proposed a way to do it in modern C - no existing code would break:

https://www.digitalmars.com/articles/C-biggest-mistake.html

It's simple, and easy to implement.

apaprocki - 11 hours ago

Please don’t buy into “no const”. If you’ve ever worked with a lot of C/C++ code, you really appreciate proper const usage and it’s very obvious if a prototype is written incorrectly because now any callers will have errors. No serious reusable library would expose functions taking char* without proper const usage. You would never be able to pass a C++ string c_str() to such a C function without a const_cast if that were the case. Casting away const is and should be an immediate smell.
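
To make the point concrete, here is a tiny (hypothetical) example of the kind of prototype being described; because the parameter is `const char *`, callers can pass string literals or a C++ `c_str()` result without any cast:

```c
#include <assert.h>
#include <stddef.h>

/* const in the prototype documents and enforces that the function
 * only reads through s; a stray write through s is a compile error. */
static size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (*s == ' ')
            n++;
    return n;
}
```

If the prototype instead took a plain `char *`, `count_spaces("a b c")` would require a cast (and in C++, a `const_cast`), which is exactly the smell described above.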

moth-fuzz - 8 hours ago

I'm a huge fan of the 'parse, don't validate' idiom, but it feels like a bit of a hurdle to use it in C - in order to really encapsulate and avoid errors, you'd need to use opaque pointers to hidden types, which requires the use of malloc (or an object pool per-type or some other scaffolding, that would get quite repetitive after a while, but I digress).

You basically have to trade performance for correctness, whereas in a language like C++, that's the whole purpose of the constructor, which works for all kinds of memory: auto, static, dynamic, whatever.

In C, to initialize a struct without dynamic memory, you could always do the following:

    struct Name {
        const char *name;
    };

    int parse_name(const char *name, struct Name *ret) {
        if(name) {
            ret->name = name;
            return 1;
        } else {
            return 0;
        }
    }

    //in user code, *hopefully*...
    struct Name myname;
    parse_name("mothfuzz", &myname);
But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever. This is very close to 'validation' type behaviour. So to get real 'parsing' behaviour, dynamic memory is required, which is off-limits for many of the kinds of projects one would use C for in the first place.
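
For what it's worth, the per-type object pool mentioned in passing above can be sketched without malloc; all names and the pool size here are made up, and a real version would need a free/reset story:

```c
#include <assert.h>
#include <stddef.h>

/* The definition lives in the .c file; the header only exposes
 * `struct name;` plus parse_name(), so users can't stack-allocate
 * an invalid Name -- the only way to get one is to parse. */
struct name {
    const char *str;
};

/* Fixed-size per-type pool instead of dynamic memory. */
static struct name pool[16];
static size_t pool_used;

struct name *parse_name(const char *s)
{
    if (!s || !*s || pool_used == sizeof pool / sizeof pool[0])
        return NULL;   /* reject instead of constructing */
    pool[pool_used].str = s;
    return &pool[pool_used++];
}
```

This buys real 'parsing' behaviour at the cost of a fixed capacity per type, which may or may not be acceptable for the embedded-style projects in question.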

I'm very curious as to how the author resolves this, given that they say they don't use dynamic memory often. Maybe there's something I missed while reading.

matheusmoreira - 16 hours ago

> In the absence of proper language support, “sum types” are just structs with discipline.

With enough compiler support they could be more than that. For example, I submitted a tagged union analysis feature request to gcc and clang, and someone generalized it into a guard builtin.

https://github.com/llvm/llvm-project/issues/74205

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112840

GCC proved to be too complex for me to hack this in though. To this day I'm hoping someone better than me will implement it.

tom_ - 14 hours ago

If you really insist on not having a distinction between "u8"/"i8" and "unsigned char"/"signed char", and you've gone to the trouble of refusing to accept CHAR_BIT!=8, I'm pretty sure it'd be safer to typedef unsigned char u8 and typedef signed char i8. uint8_t/int8_t are not necessarily character types (see 6.2.5.20 and 7.22.1.1) and there are ramifications (see, e.g., 6.2.6.1, 6.3.2.3, 6.5.1).
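
The practical difference is that `unsigned char` is a character type, so pointers to it may inspect the bytes of any object without violating strict aliasing; `uint8_t` is not guaranteed to be one. A small illustration (function name is mine):

```c
#include <assert.h>

/* A u8* built on unsigned char may alias any object (C11 6.5p7),
 * so byte-level inspection like this is always well-defined.
 * With uint8_t there is no such guarantee in principle. */
typedef unsigned char u8;

static int first_byte(const void *obj)
{
    const u8 *p = obj;
    return p[0];
}
```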

bArray - 3 hours ago

> I don’t personally do things that require dynamic memory management in C often, so I don’t have many practices for it. I know that wellons & co. have been really liking the arena, and I’d probably like it too if I actually used the heap often. But I don’t, so I have nothing to say.

> If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#.

I'm not sure what the modern standards are, but if you are writing in C, pre-allocate as much as possible. Any kind of garbage collection is just extra processing time and ideally you don't want to run out of memory during an allocation mid-execution.

People may frown at C, but nothing beats getting your inner loops into CPU cache. If you can avoid extra fetches into RAM, you can really crank some processing power. Example projects have included computer vision, servers, and a custom neural network, all of which had no business being so fast.

doanbactam - 12 hours ago

Solid list. The bit about avoiding the preprocessor as much as possible really resonates—using `static inline` functions and `enum` instead of macros makes debugging so much less painful. What's your take on using C11's `_Generic` for type-generic macros? It adds some verbosity but can save you from a lot of runtime type errors.
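
For reference, a type-generic macro via `_Generic` might look like this (all names are mine); passing a type not listed is a compile-time error rather than a silent conversion:

```c
#include <assert.h>

/* One small function per type, selected at compile time. */
static double my_abs_d(double x) { return x < 0 ? -x : x; }
static float  my_abs_f(float x)  { return x < 0 ? -x : x; }
static int    my_abs_i(int x)    { return x < 0 ? -x : x; }

/* _Generic dispatches on the (unevaluated) type of x. */
#define my_abs(x) _Generic((x), \
    double: my_abs_d,           \
    float:  my_abs_f,           \
    int:    my_abs_i)(x)
```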

keyle - 14 hours ago

That made me smile

     If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#.
Now that is some C habit for the modern day... But huh, not C.

canpan - 15 hours ago

Regarding memory, I recently switched to trying not to use dynamic memory, or, if I do need it, allocating once at startup. Often static memory set up at startup is sufficient.

Instead, use the stack much more and fix a limit at startup on how much data the program can handle. It forces you to think up front about what happens if your system runs out of memory.

Like OP said, it's not a solution for all types of programs. But it makes for very stable software with known and easily tested error states. Also adds a bit of fun in figuring out how to do it.
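
The allocate-once-at-startup style can be as simple as the following sketch (ARENA_CAP and the names are made up, and a real version would round allocations up for alignment):

```c
#include <assert.h>
#include <stddef.h>

/* All working memory is one static block sized at build time;
 * "allocation" just hands out slices, and running out is an
 * explicit, easily tested state rather than a mid-run malloc
 * failure. NOTE: no alignment handling -- a sketch only. */
#define ARENA_CAP 4096

static unsigned char arena[ARENA_CAP];
static size_t arena_used;

static void *arena_alloc(size_t n)
{
    if (n > ARENA_CAP - arena_used)
        return NULL;   /* the known, testable error state */
    void *p = &arena[arena_used];
    arena_used += n;
    return p;
}
```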

amiga386 - 13 hours ago

Fun fact: the background image is the "BallsMany" pattern included with MagicWB for the Amiga

(To confirm: download the LhA archive from https://aminet.net/package/util/wb/MagicWB21p then open the archive in 7-zip, extract Patterns/BallsMany then load into an ILBM viewer, e.g. https://www.retroreversing.com/ilbm )

skywalqer - 16 hours ago

Nice post, but the flashy thing on the side is pretty distracting. I liked the tuples and maybes.

JamesTRexx - 15 hours ago

Two things I thought of while reading the post: why not typedef _BitInt types for stricter sizes and control over accidental promotion, since you're typedeffing for easier names anyway? And I came across a post mentioning the use of regular arrays instead of strings to avoid the null-terminator and off-by-one pitfalls.

I still have a lot of conversion to do before I can try this in my hobby project, but these are interesting ideas.

jcalvinowens - 14 hours ago

  #if CHAR_BIT != 8
  #error "CHAR_BIT != 8"
  #endif
In modern C you can use static_assert to make this a bit nicer.

  static_assert(CHAR_BIT == 8, "CHAR_BIT is not 8");
...although it would be a bit of a shame IMHO to add that reflexively in code that doesn't necessarily require it.

https://en.cppreference.com/w/c/language/_Static_assert.html

SkiFire13 - 7 hours ago

> Additionally, the intent of whether the buffer is used as “raw” memory chunks versus a meaningful u8 is pretty clear from the code that it gets used in, so I’m not worried about confusing intent with it.

It's generally not clear to the compiler, and that can result in missed optimization opportunities.
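
A concrete instance of that (function names are mine): because `unsigned char` may alias anything, the compiler has to assume the store in the loop below could modify `*len`, so it reloads the length every iteration instead of hoisting it:

```c
#include <assert.h>
#include <stddef.h>

/* Store through a character type may alias *len, so *len is
 * re-read on each iteration. */
static void zero_n(unsigned char *buf, const size_t *len)
{
    for (size_t i = 0; i < *len; i++)
        buf[i] = 0;
}

/* Copying the length to a local tells the compiler it can't
 * change behind its back, re-enabling the hoist. */
static void zero_n_hoisted(unsigned char *buf, const size_t *len)
{
    size_t n = *len;
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}
```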

BigJono - 14 hours ago

I really dislike 'parse, don't validate' as general advice. IMO this is the true differentiator of type systems that most people should be familiar with, instead of "dynamic vs static" or "strong vs weak".

Adding complexity to your type system and to the representation of types within your code has a cost in terms of mental overhead. It's become trendy to have this mental model where the cost of "type safety" is paid in keystrokes but pays for itself in reducing mental overhead for the developers. But in reality you're trading one kind of mental overhead for another, the cost you pay to implement it is extra.

It's like "what are all the ways I could use this wrong" vs "what are all the possibilities that exist". There's no difference in mental overhead between between having one tool you can use in 500 ways or 500 tools you can use in 1 way, either way you need to know 500 things, so the difference lies elsewhere. The effort and keystrokes that you use to add type safety can only ever increase the complexity of your project.

If you're going to pay for it, that complexity has to be worth it. Every single project should be making a conscious decision about this on day one. For the cost to be worth it, the rate of iteration has to be low enough and the cost of runtime bugs has to be high enough. Paying the cost is a no brainer on a banking system, spacecraft or low level library depended on by a million developers.

Where I think we've lost the plot is that NOT paying the cost should be a no brainer for stuff like front end web development and video games where there's basically zero cost in small bugs. Typescript is a huge fuck up on the front end, and C++ is a 30 year fuck up in the games industry. Javascript and C have problems and aren't the right languages for those respective jobs, but we completely missed the point of why they got popular and didn't learn anything from it, and we haven't created the right languages yet for either of those two fields.

Same concept and cost/benefit analysis applies to all forms of testing, and formal verification too.

taminka - 8 hours ago

really cool website, what's your colour palette?

0xbadcafebee - 11 hours ago

> I think one of the most eye-opening blog posts I read when getting into programming initially was the evergreen parse, don’t validate post

Bro, that was written in 2019. If it's not old enough to drink it's not yet evergreen. But it's also long-winded. A 25-minute read, and y'know what the conclusion is? "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".

I need all the baby programmers in the back to hear me: type systems are bikeshedding. The point of a type is only to restrict computation to a fixed set. This concept can be applied anywhere you need to ensure reliability and simplicity. You don't need a programming language to natively support types in order to implement the concept yourself in that language.

sys_64738 - 15 hours ago

#define BEGIN {

#define END }

/* scream! */

Panzerschrek - 10 hours ago

Yet another C person reinventing things which C++ already has.