Brent's Encapsulated C Programming Rules (2020)

76 points by p2detar 2 days ago

Check against FLT_EPSILON. Oh boy.

The reason is floating point precision errors, sure, but that check is not going to solve the problems.

Took a difference of two numbers with large exponents, where the result should be algebraically zero but isn't quite numerically? Then this check fails to catch it. Took another difference of two numbers with very small exponents, where the result is not actually algebraically zero? This check says it's zero.

syncsynchalt - 2 days ago

Yeah, at the least you'll need an understanding of ULPs[0] before you can write code that's safe in this way. And understanding ULPs means understanding that no single constant is going to be applicable across the FLT or DBL range.
[0] https://en.wikipedia.org/wiki/Unit_in_the_last_place
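
To make the ULP point concrete, here is a minimal sketch (not from the article) that puts the fixed-epsilon check next to a comparison measured in ULPs. It assumes `float` is IEEE 754 binary32. The names `nearly_equal_epsilon`, `float_ordered_key` and `nearly_equal_ulps`, and the `max_ulps` tolerance, are made up for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The fixed-tolerance check criticised above, shown for comparison. */
static bool nearly_equal_epsilon(float a, float b)
{
    float diff = a - b;
    if (diff < 0.0f)
        diff = -diff;
    return diff < FLT_EPSILON;
}

/* Map a float to an integer that rises monotonically with its value.
   Assumes float is IEEE 754 binary32. */
static int64_t float_ordered_key(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);              /* well-defined type punning */
    if (bits & UINT32_C(0x80000000))             /* negative: mirror below zero */
        return -(int64_t)(bits & UINT32_C(0x7FFFFFFF));
    return (int64_t)bits;
}

/* True when a and b are at most max_ulps representable values apart.
   max_ulps is a hypothetical tolerance the caller has to choose. */
static bool nearly_equal_ulps(float a, float b, int64_t max_ulps)
{
    assert(max_ulps >= 0);
    if (a != a || b != b)                        /* NaN never compares equal */
        return false;
    int64_t diff = float_ordered_key(a) - float_ordered_key(b);
    if (diff < 0)
        diff = -diff;
    return diff <= max_ulps;
}
```

With these definitions, `1e10f` and `1e10f + 1024.0f` are adjacent floats, yet the epsilon check calls them different. `1e-10f` and `2e-10f` differ by a factor of two, yet the epsilon check calls them equal. The ULP version gets both cases the other way round, though it is still only a sketch: results near zero and infinity need their own policy.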

breckinloggins - 2 days ago

Other resources I like:

- Eskil Steenberg’s “How I program C” (https://youtu.be/443UNeGrFoM). Long and definitely a bit controversial in parts, but I find myself agreeing with most of it.

- CoreFoundation’s create rule (https://stackoverflow.com/questions/5718415/corefoundation-o...). I’m definitely biased but I strongly prefer this to OP’s “you declare it you free it” rule.

quelsolaar - 2 days ago

Thanks for the shout out. I had no idea my 2h video, without a camera 8 years ago would have such legs! I should make a new one and include why zero initialization is bad.
- elcapitan - 2 days ago
  
  Thank you for recording it! :) It hits the right balance between opinionated choices with explanations and a general introduction to "post-beginner" problems which probably a lot of people who have programming experience, but not in C, face.
writebetterc - 2 days ago

I can't edit my comment any longer, but I really like nullprogram.com
- capyba - 2 days ago
  
  Same here! That’s a great blog with a lot of good advice.

writebetterc - 2 days ago

void* is basically used for ad-hoc polymorphism in C, and it is a vital part of C programming.

    void new_thread(void (*run)(void*), void* context);

^- This lets us pass arbitrary starting data to a new thread.

I don't know whether this counts as "very few use cases".
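
A small sketch of that pattern, for readers who have not seen it. `run_with_context` is a hypothetical stand-in for `new_thread` that simply calls the function on the current thread, so the example stays self-contained; `sum_job`, `sum_values` and `sum_array` are also made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for new_thread(): it runs the callback on the
   calling thread instead of starting a real one. */
static void run_with_context(void (*run)(void *), void *context)
{
    assert(run != NULL);
    run(context);
}

/* Whatever data the caller wants the callback to see. */
struct sum_job {
    const int *values;
    size_t count;
    long result;
};

/* The callee casts the context back to the type both sides agreed on. */
static void sum_values(void *context)
{
    struct sum_job *job = context;
    job->result = 0;
    for (size_t i = 0; i < job->count; i++)
        job->result += job->values[i];
}

/* Call site: pack the arguments into a struct and pass its address. */
static long sum_array(const int *values, size_t count)
{
    struct sum_job job = { values, count, 0 };
    run_with_context(sum_values, &job);
    return job.result;
}
```

The compiler cannot check that `sum_values` and its caller agree on what `context` points to, which is the cost of the flexibility.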

The Memory Ownership advice is maybe good, but why are you allocating in the copy routine if the caller is responsible for freeing it, anyway? This dependency on the global allocator creates an unnecessarily inflexible program design. I also don't get how the caller is supposed to know how to free the memory. What if the data structure is more complex, such as a binary tree?

It's preferable to have the caller allocate the memory.

    void insert(BinTree *tree, int key, BinTreeNode *node);

^- this is preferable to the variant where it takes the value as the third parameter. Of course, an intrusive variant is probably the best.
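
Roughly what I have in mind, as an untested-in-production sketch. The `BinTree` and `BinTreeNode` layouts are assumptions, and `contains` and `demo_stack_nodes` are hypothetical helpers added only to show a call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct BinTreeNode {
    int key;
    struct BinTreeNode *left;
    struct BinTreeNode *right;
} BinTreeNode;

typedef struct BinTree {
    BinTreeNode *root;
} BinTree;

/* Links caller-owned storage into the tree; never allocates. */
static void insert(BinTree *tree, int key, BinTreeNode *node)
{
    assert(tree != NULL && node != NULL);
    node->key = key;
    node->left = NULL;
    node->right = NULL;

    /* Walk down to the empty link where the node belongs. */
    BinTreeNode **link = &tree->root;
    while (*link != NULL)
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    *link = node;
}

static bool contains(const BinTree *tree, int key)
{
    const BinTreeNode *n = tree->root;
    while (n != NULL) {
        if (key == n->key)
            return true;
        n = (key < n->key) ? n->left : n->right;
    }
    return false;
}

/* The nodes live in an array on the caller's stack: no malloc, no free. */
static bool demo_stack_nodes(void)
{
    BinTreeNode storage[3];
    BinTree tree = { NULL };
    insert(&tree, 5, &storage[0]);
    insert(&tree, 2, &storage[1]);
    insert(&tree, 8, &storage[2]);
    return contains(&tree, 8) && contains(&tree, 2) && !contains(&tree, 7);
}
```

The same `insert` works whether the nodes come from the stack, a static pool, or the heap, because the tree never needs to know.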

If you need to allocate for your own needs, then allow the user to pass in an allocator pointer (I guessed on function pointer syntax):

    struct allocator { void* (*new)(size_t size, size_t alignment); void (*free)(void* p, size_t size); void* context; };

mrkeen - a day ago

void* is a problem because the caller and callee need to coordinate across the encapsulation boundary, thus breaking it. (Internally it would be fine to use - the author could carefully check that qsort casts to the right type inside the .c file)
> What if the data structure is more complex, such as a binary tree?
I think that's what the author was going with by exposing opaque structs with _new() and _free() methods.
But yeah, his good and bad versions of strclone look more or less the same to me.
warmwaffles - 2 days ago

Curious about the allocator, why pass a size when freeing?
- naasking - 2 days ago
  
  If you don't pass the size, the allocation subsystem has to track the size somehow, typically by either storing the size in a header or partitioning space into fixed-size buckets and doing address arithmetic. This makes the runtime more complex, and often requires more runtime storage space.
  If your API instead accepts a size parameter, you can ignore it and still use these approaches, but it also opens up other possibilities that require less complexity and runtime space by relying on the client to provide this information.
  - warmwaffles - a day ago
    
    The way I've implemented it now was indeed to track the size in a small header above the allocation, but this was only present in debug mode. I only deal with simple allocators like a linear, pool, and normal heap allocator. I haven't found the need for something super complex yet.
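
As a rough illustration of the size-aware `free` point (a sketch, not anyone's actual allocator from this thread): a linear allocator can reclaim its most recent block with no header at all, because the caller hands the size back. `linear_allocator`, `linear_new`, `linear_free`, `padded` and `demo_used_after_free` are hypothetical names, and the buffer is assumed to be aligned for any fundamental type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct linear_allocator {
    unsigned char *base;   /* caller-supplied, suitably aligned buffer */
    size_t capacity;
    size_t used;
};

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t padded(size_t n)
{
    const size_t a = alignof(max_align_t);
    return (n + a - 1) / a * a;
}

static void *linear_new(struct linear_allocator *al, size_t size)
{
    size_t need = padded(size);
    /* need < size catches wrap-around in padded(). */
    if (need < size || need > al->capacity - al->used)
        return NULL;
    void *p = al->base + al->used;
    al->used += need;
    return p;
}

/* No per-block header: the caller's size says how far to roll back.
   Only the most recent allocation is reclaimed; older blocks stay
   until the whole buffer is reset. */
static void linear_free(struct linear_allocator *al, void *p, size_t size)
{
    size_t need = padded(size);
    if (p != NULL && need <= al->used &&
        (unsigned char *)p == al->base + (al->used - need))
        al->used -= need;
}

static size_t demo_used_after_free(void)
{
    static alignas(max_align_t) unsigned char buffer[256];
    struct linear_allocator al = { buffer, sizeof buffer, 0 };
    void *a = linear_new(&al, 24);
    void *b = linear_new(&al, 40);
    linear_free(&al, b, 40);   /* most recent block: space is reclaimed */
    (void)a;
    return al.used;            /* only the first block remains */
}
```

An allocator behind a `free(p)` interface would have to store those 40 bytes' worth of bookkeeping somewhere itself.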

zoomablemind - 2 days ago

"...C is my favorite language and I love the freedom and exploration it allows me. I also love that it is so close to Assembly and I love writing assembly for much of the same reasons!"

I wonder what the author's view is on users' reasons for choosing a C API.

What I mean is users may want exactly the same freedom and immediacy of C that the author embraces. However, the very approach to encapsulation, hiding the memory layout behind accessor functions, limits the user's freedom and robs them of performance too.

In my view, choosing C for a project comes with certain responsibilities and expectations on the user's side. Thus a higher degree of trust in the API user is due.

f1shy - 2 days ago

> Make sure that you turn on warnings as errors

I’m seeing this way too often. It is a good idea to never ignore a warning, and developers without discipline may need it. But for god’s sake, there is a reason why there are warnings and errors, and why they are treated differently. I don’t think compiler writers and/or C standards will deprecate warnings and make them errors anytime soon, and for good reason. So IMHO it is better to treat errors as errors and warnings as warnings. I have seen plenty of places where this flag is mandatory, and to avoid the warning (error) the code is decorated with compiler pacifiers, which makes no sense!

So for some setups I understand the value, but doing it all the time shows some kind of laziness.

Chabsff - a day ago

> and to avoid the warning (error) the code is decorated with compiler pacifiers, which makes no sense!
How is that a bad thing, exactly?
Think of it this way: The pacifiers don't just prevent the warnings. They embed the warnings within the code itself in a way where they are acknowledged by the developer.
Sure, just throwing in compiler pacifiers willy-nilly to squelch the warnings is terrible.
However, making developers explicitly write in the code "Yes, this block of code triggers a warning, and yes it's what I want to do because xyz" seems not only perfectly fine, but straight up desirable. Preventing them from pushing the code to the repo before doing so by enabling warnings-as-errors is a great way to get that done.
The only place where I've seen warnings-as-errors become a huge pain is when dealing with multiple platforms and multiple compilers that have different settings. This was a big issue in Gen7 game dev because getting the PS3's gcc, the Wii's CodeWarrior and the XBox360's MSVC to align on warnings was like herding cats, and not every dev had every devkit for obvious reasons. And even then, warnings as errors was still very much worth it in the long run.
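
For anyone unsure what an acknowledged pacifier looks like, here is a minimal sketch. It uses the `#pragma GCC diagnostic` family, which GCC and Clang both accept; other compilers need their own spelling, hence the guard. The function `count_event` and its callback-style signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif

/* 'context' is required by the (hypothetical) callback signature, but
   this handler keeps no state, so the unused-parameter warning is
   expected here and nowhere else. */
static int count_event(void *context, int running_total)
{
    return running_total + 1;
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

The push/pop pair limits the suppression to one function, and the comment records why, so any warning that still shows up in the build is one nobody has signed off on.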
- f1shy - a day ago
  
  IMHO readability is the absolute maximum paramount priority. Having the code interrupted by pacifiers makes the code more difficult to read. The warning is very visible when compiling. Let me argue, much more visible. Why? Well, regardless of whether my last change had anything directly to do with that piece of code, I will see the warning. If I use some preprocessor magic, I will only see that if I directly work in that part of the code.
  Again, IMHO the big problem is people think "warnings are ok, just warnings, can be ignored".
  And just as anecdotal point "Sure, just throwing in compiler pacifiers willy-nilly to squelch the warnings is terrible." this is exactly what I have seen in real life, 100% of the time.
  - 1718627440 - 19 hours ago
    
    But how do you distinguish between warnings intended by the author and warnings that weren't, and so should be fixed?
    
    f1shy - 17 hours ago
    
    Please remember we are coming from "set warnings to errors", which I interpret as: I know better than the people writing the compiler. There is a good reason for having the two. If not, there would be no warnings at all; everything would be an error.
    My rationale: if you do set warn->error, then there are 2 ways around it: change the code to eliminate the warning, or pacify the compiler. Note, the measure to set it to error, is to instigate lazy programmers to deal with it. If the lazy person is really lazy, then they will deal with it with a pacifier. You won nothing.
    There is no one recipe for everything. That is why, even though I do not like treating warnings as errors, it may sometimes be a reasonable solution.
    I think you should deal with warnings, you should have as few as possible, if any at all. So if you have just a couple, it is not a problem to document them clearly. Developers building the project should be informed anyway of many other things.
    In some projects I worked on, we saw warnings as technical debt. So hiding them with a pacifier would make us forget. But we saw them in every build, so we were reminded constantly that we should rework that code. Again, it depends on the setup you have in the project. I know people now are working with this new trend "ci/cd" and never get to see the compilation. So depending on the setup one thing or another may be better.
    
    1718627440 - 17 hours ago
    
    > My rationale: if you do set warn->error, then there are 2 ways around it: change the code to eliminate the warning, or pacify the compiler. Note, the measure to set it to error, is to instigate lazy programmers to deal with it. If the lazy person is really lazy, then they will deal with it with a pacifier. You won nothing.
    > You won nothing.
    No, you won that you can distinguish between intended and not intended warnings. Specifying in the code which warnings are expected makes every warning that the compiler outputs something you want to get fixed. When you do not do that, then it is easy to miss a new warning or that the warning changed. So you essentially say that you should not distinguish between intended and non-intended warnings?
    Having no warnings is a worthwhile goal, but often not possible, since you want to be warned for some things, so you need that warning level, but you don't want to be warned about that in a specific line.
    
    f1shy - 16 hours ago
    
    > So you essentially say that you should not distinguish between intended and non-intended warnings?
    No, I pretty clearly said the opposite. Please read what I wrote:
    "[...] is not a problem to document them clearly. Developers building the project should be informed anyway of many other things"
    I also stated "warnings, you should have as few as possible, if any at all". In the projects I worked on we hardly had any in the final delivery, but we had many in-between, which I find ok. If there are only 2 warnings, I do not see a big risk of not seeing a 3rd. I expect developers to look at the compiler output carefully, as if it was a review from a coworker.
    Last but not least you ignore my last paragraph, where I say warnings are typically technical debt. There should be in the long run no "expected" warnings. My whole point is that they are just not errors, so you should allow the program to compile and keep working on other things. I do not think it is ok to have warnings. Also (especially) I think it is a bad idea to silence the compiler.
    Anyway, a good compiler will end with a nice line "N warnings detected" so there is that. You can just compare an integer to know if there are more warnings… not so difficult, is it?
    If you read my comments it should be clear. If not, I cannot help with that. If you want to disagree, as long as you don't work in my code, it is ok. This is just my 2ct opinion.
  - rramadass - 21 hours ago
    
    Well said.
    For some reason people stop thinking when it comes to warnings. Often it is the warning which gets one to rethink and refactor the code properly. If for whatever reason you want to live with the warning, comment the code appropriately, do not squelch blindly.

masfoobar - 17 hours ago

My only gripe is with Vec3_new() function in "Memory Ownership" section.

It assumes you want a single malloc of Vec3. It tries to behave as if you are doing a 'new' in an OOP language.

Let the programmer decide the size of it.

Mock example (not tested)

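  struct Vec3* Vec3_new(size_t size)
  {
    // size_t is unsigned, so zero is the only invalid count
    if(size == 0) {
      // todo: handle properly
      return NULL;
    }
  
    struct Vec3 *v = malloc(sizeof(struct Vec3) * size);
    if(v == NULL) {
      // allocation failed, let the caller decide what to do
      return NULL;
    }
  
    size_t i;
    for(i = 0; i < size; i++) {
      v[i].x = 0.0F;
      v[i].y = 0.0F;
      v[i].z = 0.0F;
    }
  
    return v;
  }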

1718627440 - 19 hours ago

> In making code readable, you should only use char* or unsigned char* for strings (character arrays). If you want a block of bytes/memory pointer, then you should use uint8_t* where uint8_t is part of stdint.h. This makes the code much more readable where memory is represented as an unsigned 8-bit array of numbers (byte array). Now you can trust when you see a char* that it is referring to a UTF-8 (or ASCII) character array (text).

I use uint8_t for 8-bit integers, unsigned char for memory and char for text. uint8_t for memory doesn't feel right.

pizlonator - 2 days ago

Good stuff.

Only things I disagree with:

- The out-parameter of strclone. How annoying! I don't think this adds information. Just return a pointer, man. (And instead of defending against the possibility that someone is doing some weird string pooling, how about just disallowing that - malloc and free are your friends.)

- Avoiding void. As mentioned in another comment, it's useful for polymorphism. You can do quite nice polymorphic code in C and then you end up using void a lot.

syncsynchalt - 2 days ago

Yes that section raised my hackles too, to the point where I'm suspicious of the whole article.
The solution, in my opinion, is to either document that strclone()'s return should be free()'d, or alternately add a strfree() declaration to the header (which might just be `#define strfree(x) free(x)`).
Adding a `char **out` arg does not, in my opinion, document that the pointer should be free()'d.
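
A sketch of the "return the pointer and pair it with a free function" shape, for comparison with the article's out-parameter. The names `text_clone`, `text_free` and `demo_round_trip` are hypothetical; they also sidestep the `str` prefix, which is reserved for the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of str, or NULL on allocation failure.
   Ownership passes to the caller, who releases it with text_free(). */
static char *text_clone(const char *str)
{
    assert(str != NULL);
    size_t len = strlen(str) + 1;        /* include the terminator */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, str, len);
    return copy;
}

/* Paired release function, so callers never need to know the allocator. */
static void text_free(char *str)
{
    free(str);
}

static int demo_round_trip(void)
{
    char *copy = text_clone("hello");
    int ok = copy != NULL && strcmp(copy, "hello") == 0;
    text_free(copy);
    return ok;
}
```

The ownership rule lives in the comment and in the existence of `text_free`, not in the shape of the parameter list.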

1718627440 - 19 hours ago

> One of the flaws with pure encapsulation is that you can see a drop in performance. Having a bunch of functions to get inner members of a structure also blocks the compiler from optimizing it’s best.

Why can't you just use an optimizing compiler? Casting const away as the trade-off doesn't seem right to me.

bvrmn - a day ago

It's funny. One can't simply write correct C code. Even after years of practice.

    void strclone(const char* str, char** outCpy)
    {
        size_t len = strlen(s) + 1;
        *outCpy = malloc(len);
        memcpy(outCpy, str, len); // wrong dest address 
    }

I don't like double pointer parameters because of it.
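
For reference, a sketch of what the snippet above presumably meant to do, keeping the out-parameter. The only substantive changes are the `memcpy` destination and a check on the allocation; `clone_into` and `demo_clone_into` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Out-parameter variant with the destination fixed: copy into the new
   buffer (*out_copy), not over the caller's pointer variable. */
static void clone_into(const char *str, char **out_copy)
{
    assert(str != NULL && out_copy != NULL);
    size_t len = strlen(str) + 1;        /* include the terminator */
    *out_copy = malloc(len);
    if (*out_copy != NULL)
        memcpy(*out_copy, str, len);
}

static int demo_clone_into(void)
{
    char *copy = NULL;
    clone_into("abc", &copy);
    int ok = copy != NULL && strcmp(copy, "abc") == 0;
    free(copy);
    return ok;
}
```

The one-character difference between `outCpy` and `*outCpy` compiles either way, because `memcpy` takes `void *`.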

1718627440 - 19 hours ago

> So in closing on the UTF-8 topic, please stop using `wchar_t`

So he said and then shows corrections to a manual implementation of character length instead of using the standard wcswidth.

fpotier - 2 days ago

    void employee_set_age(struct Employee* employee, int newAge) {
        // Cast away the const and set it's value, the compiler should optimize this for you
        *(int*)&employee->age = newAge;
    }

I believe that "Casting away the const" is UB [1]

[1]: https://en.cppreference.com/w/c/language/const.html

spacechild1 - 2 days ago

It's only UB if the pointed to object is actually const (in which case it might live in read-only memory).
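
A minimal sketch of the distinction, using a plain `int` rather than the article's struct. `set_through_const` and `demo_defined_case` are hypothetical names; the undefined case is left as a comment so the example never actually triggers it.

```c
#include <assert.h>

/* Writes through a pointer that is const-qualified only at the
   interface. Defined only if the pointed-to object is not itself const. */
static void set_through_const(const int *p, int value)
{
    *(int *)p = value;
}

static int demo_defined_case(void)
{
    int age = 30;                 /* the object itself is not const */
    set_through_const(&age, 31);  /* defined behaviour */
    return age;
}

/* By contrast, this would be undefined behaviour, because the object
   is defined with a const-qualified type:

       static const int fixed = 30;
       set_through_const(&fixed, 31);
*/
```

Whether a `const` struct member inside a `malloc`ed object counts as "actually const" is a subtler question than this sketch settles.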

jjgreen - 2 days ago

Outstanding, why hadn't I come across this before?

unwind - 2 days ago

Quite interesting, and felt fairly "modern" (which for C programming advice sometimes only means it's post-2000 or so). A few comments:

----

This:

    struct Vec3* v = malloc(sizeof(struct Vec3));

is better written as:

    struct Vec3 * const v = malloc(sizeof *v);

The `const` is perhaps over-doing it, but it makes it clear that "for the rest of this scope, the value of this pointer won't change" which I think is good for readability. The main point is "locking" the size to the size of the type being pointed at, rather than "freely" using `sizeof` the type name. If the type name later changes, or `Vec4` is added and code is copy-pasted, this lessens the risk of allocating the wrong amount and is less complicated.
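
A short sketch of the copy-paste scenario, assuming a hypothetical `Vec4` with one extra member (`vec4_new` and `demo_vec4` are made-up names too):

```c
#include <assert.h>
#include <stdlib.h>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

static struct Vec4 *vec4_new(void)
{
    /* sizeof *v follows the pointer's type, so changing Vec3 to Vec4 in
       the declaration is the only edit a copy-paste needs. */
    struct Vec4 *const v = malloc(sizeof *v);
    if (v != NULL)
        *v = (struct Vec4){ 0 };
    return v;
}

static int demo_vec4(void)
{
    struct Vec4 *v = vec4_new();
    int ok = v != NULL && v->w == 0.0f;
    free(v);
    return ok;
}
```

Had the line been pasted as `malloc(sizeof(struct Vec3))`, it would still compile and the write to `w` would land past the end of the allocation.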

----

This is maybe language-lawyering, but you can't write a function named `strclone()` unless you are a C standard library implementor. All functions whose names begin with "str" followed by a lower-case letter are reserved [1].

----

This `for` loop header (from the "Use utf8 strings" section):

    for (size_t i = 0; *str != 0; ++len)

is just atrocious. If you're not going to use `i`, you don't need a `for` loop to introduce it. Either delete (`for(; ...` is valid) or use a `while` instead.
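
As a sketch of the `while` form, assuming the loop's job is to count code points in valid, NUL-terminated UTF-8 (the name `utf8_length` is made up, and this is not the article's exact body):

```c
#include <assert.h>
#include <stddef.h>

/* Counts code points by skipping continuation bytes (10xxxxxx).
   Assumes the input is valid, NUL-terminated UTF-8. */
static size_t utf8_length(const char *str)
{
    size_t len = 0;
    while (*str != '\0') {
        if (((unsigned char)*str & 0xC0) != 0x80)
            ++len;                       /* lead byte or ASCII */
        ++str;
    }
    return len;
}
```

The loop variable that actually advances, `str`, is now the one the loop is visibly about.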

----

In the "Zero Your Structs" section, it sounds as if the author recommends setting the bits of structures to all zero in order to make sure any pointer members are `NULL`. This is dangerous, since C does not guarantee that `NULL` is equivalent to all-bits-zero. I'm sure it's moot on modern platforms where implementations have chosen to represent `NULL` as all-bits-zero, but that should at least be made clear.
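
A portable alternative, sketched with a hypothetical `Node` type: an initializer leaves it to the compiler to produce a real null pointer, whatever its bit pattern, whereas `memset(&n, 0, sizeof n)` only promises all-bits-zero.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int value;
    struct Node *next;
};

static struct Node make_node(int value)
{
    /* Members without an explicit initializer are initialized as if
       they had static storage duration: arithmetic members become
       zero and pointer members become null pointers. */
    struct Node n = { .value = value };
    return n;
}
```

`struct Node n = { 0 };` has the same effect for every member, and both forms are standard C99.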

[1]: https://www.gnu.org/software/libc/manual/html_node/Reserved-...

jandrese - 2 days ago

> This:
>     struct Vec3* v = malloc(sizeof(struct Vec3));
> is better written as:
>     struct Vec3 * const v = malloc(sizeof *v);
I don't love this. Other people are going to think you're only allocating a pointer. It's potentially confusing.
- f1shy - 2 days ago
  
  I also personally find totally confusing leaving the * in the middle of nowhere, like flapping in the breeze.
  - unwind - a day ago
    
    Where would you put it? The const of the pointer is not the main point, it's just extra clarity that the allocated pointer is not as easily overwritten which would leak the memory.
    
    f1shy - a day ago
    
    I put it attached to the variable name when possible, if not attached to the type.
- unwind - a day ago
  Uh, okay, but if you need to constantly write code as if people reading it don't understand the language, then ... I don't know how to do that. :)
  It's not possible to know C code and think that
  sizeof *v
  and
  sizeof v
  somehow mean the same thing, at least not to me.
  - ux266478 - a day ago
    
    no, but you can misread the two interchangeably no matter how familiar you are with the language.

Jean-Papoulos - 2 days ago

> What this means is that you can explain all the intent of your code through the header file and the developer who uses your lib/code never has to look at the actual implementations of the code.

I hate this. If my intellisense isn't providing sufficient info (generated from doc comments), then I need to go look at the implementation. This just adds burden.

Headers are unequivocally a bad design choice, and this is why most languages designed after the nineties got rid of them.

alextingle - 2 days ago

Separating interface from implementation is one of the core practices for making large code bases tractable.
- valleyer - 2 days ago
  
  Of course, but that's doable without making programmers maintain headers, and some modern languages do that.
  - ux266478 - 2 days ago
    
    I've found usually to poor effect. Both Rust and Haskell did away with .mli files and ended up worse for it. Haskell simplified the boundary between modules and offloaded the abstractions it was used for into its more robust type system, but it ended up lobotomizing the modularity paradigm that ML did so well.
    Rust did the exact opposite and spread the interface language across keywords in source files, making it simultaneously more complicated and ultimately less powerful since it ends up lacking a cognate to higher order modules. Now the interfaces are machine generated, but the modularity paradigm ends up lobotomized all the same, and my code is littered with pub impls (pimples, as I like to call it) as though it's Java or C# or some other ugly mudball.
    For Haskell, the type system at least copes for the bad module system outside of compile times. For Rust, it's hard for me to say anything positive about its approach to interfaces and modules. I'd rather just have .mlis instead of either.
  - 1718627440 - 19 hours ago
    
    I don't understand that hate against header files. It is just a separate file with the interface. Of course you also need to change it, when you change the API, which you might find annoying, but maybe you should use that to consider that you are just changing an API?
GhosT078 - 2 days ago

Look to Ada for “headers” (i.e. specs) done right.
- runlaszlorun - 2 days ago
  
  Recently became a big Ada fanboy, ironic because I'm far more a fan of minimal, succinct syntax like Lisp, Forth, etc. and I actually successfully lobbied a professor in 1993 to _not_ use it in an undergrad software engineering class lol.
  Still in the honeymoon phase granted, but I'm actually terrified that these new defense tech startups collectively have no clue about Ada.
  Your startup wants to ship a SaaS MVP ASAP and iterate? Sure, grab Python or JS and whatever shitstorm of libraries you want to wrestle with.
  Want to play God and write code that kills?
  Total category error.
  The fact that I'm sure there are at least a few of these defense tech startups yolo'ing our future away with vibe coded commits when writing code that... let's not mince our words... takes human life... probably says a lot about how far we've fallen from "engineering".
antonvs - 2 days ago

C's text preprocessor headers were a pragmatic design choice in the 1970s. It's just that the language stuck around longer than it deserved to.
- tomcam - a day ago
  
  So what language is ready to take its place in the thousands of new chips that emerge every year, the new operating systems, and millions of programs written in C every year?
  - antonvs - a day ago
    
    You're alluding to the network effects that make that takeover difficult now, after decades of doubling down on a technically weak and systemically insecure solution.
    Languages that are technically capable of replacing C in all those applications include Ada (and in certain applications, SPARK Ada), D, Zig, Rust, and the Pascal/Modula-2/Oberon family. None of those languages use a purely textual preprocessor like C's. They all fix many of C's other design weaknesses that were excusable in the 1970s, but really aren't today.
    But Rust in the Linux kernel is no longer experimental, so perhaps things are starting to improve.