The hidden corners of C

Intro

C is a very old language, but it's still being worked on to this day. Most of the syntax has been settled for decades, but very old C used to look pretty different. There are also a few features that have never changed but are very rarely used.

Bit Fields

Sometimes you need to cram different types of data into a small space. You'll see this a lot with certain file headers and things, and they're usually done with regular bitwise operations. However, C actually has a language feature for this, called bit fields.

To make a bit field in C, you just need to specify the width in bits after a colon following the member's name. Here's a little example:

struct Bitfield {
    signed int x: 4;
    signed int y: 4;
};

The first thing you'll probably notice is the signed keyword. That's a bit of an honorable mention, since you don't see it much, but it does exactly what you'd expect it to do. The reason it's used here is that int inside a bit field is different from every other int in the language. Inside a bit field, plain int has implementation-defined signedness, despite being signed everywhere else in the language.

Back to the actual syntax: the types the standard guarantees here are int, signed int, unsigned int, and (since C99) _Bool; anything else, like short or long, is up to the implementation. This makes sense, because the width after the colon already sets the size of the number, and pointers and floating-point numbers can't be resized in a meaningful way. Arrays of bit fields also aren't allowed, though this is more of a design choice than anything else. You also can't take the address of a bit field, since it may not start on an addressable byte boundary.

K&R-Style Function Definitions

Have you ever seen a function declared like this

int func(void);

and wondered why they need to specify void there? Or maybe you've seen why, when you had a function like

int func();

inside a header file and realized that the compiler lets you call it with any number of arguments of any type. You might need to compile with an older version of C to see this happen, since C23 changed empty parentheses to mean the same thing as (void), but in classic C, an empty parameter list meant "the parameters are unspecified", and functions could be declared in all sorts of weird ways. The reason for this is a bit hard to justify, but C functions used to look very different.

If you were writing C in the very early days, before 1989 when the first official standard was released, C functions looked like this:

int add(a, b) int a; int b; {
    return a + b;
}

So, why? Everything I could find mainly states that that's just "how it was", and that tracking all the types would've meant more work and memory for the compiler. That wasn't an easy pill to swallow when your target was the PDP-11, a machine whose smallest configurations had only a few kilobytes of memory, about the size of this markdown file!

So then, the idea was that as long as the compiler knew there was a function, the types didn't really matter. Worse though, is that the compiler also would simply assume types in a lot of cases. You see, you can actually totally omit those parameter types, and the compiler will simply assume they're ints. You can even do the same for the function's return type, which means that

add(a, b) {
    return a + b;
}

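was a perfectly valid C function in its day. Don't expect it to stay that way, though: implicit int was removed in C99, and K&R-style definitions were finally removed in C23. Many compilers will still accept it if you ask for an older standard and put up with the warnings, but modern C rejects it.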

String concatenation

This one's not so secret, but I think it's interesting to cover nonetheless. In a lot of high-level languages like Java, Python, or C#, you can combine strings just by adding them. The only issue is that building a new string at runtime requires an allocation, so you need some sort of automatic memory management to keep it under control. There is one exception though: when all of the values are known at compile time, the work can be done once, when compiling, and the final string can be baked into your program. C allows you to do this, simply by placing the literals next to each other!

printf("Hello,"" World!");

is a perfectly valid way to combine two strings, however cursed it may look. Your first thought might be "Why didn't they just use +?", but the reason is actually pretty clever. You see, it's impossible to do this with anything other than a string literal, so the syntax doesn't need to try and make a distinction between compile-time and runtime values, which simplifies the work.

Additional fun fact: Python also supports doing this with string literals. I have no idea why.

Variadic arguments

Have you ever looked at a function like printf and wondered how it's able to accept any number of arguments? A ton of languages have features for this, but they usually rely on features that C doesn't have, like a proper array or slice type. Instead, C has a special ... parameter, which can be used to allow any number of arguments to be jammed into the function. Using it looks a little like this:

int printf(const char *format, ...);

Something you'll notice is that there's no type there. That makes sense, since printf can accept any type, but how does the function know what types it's being given? This is actually super clever. When you call a variadic function, C passes the extra arguments along with no type information attached, so you can think of them as one big untyped buffer (where they physically live depends on the platform's calling convention). It's up to you to read them back out in order, using the va_list, va_start, va_arg, and va_end macros from <stdarg.h>, which is exactly what printf uses those %d format specifiers for! It parses the string you give it, and every time it finds a format specifier, it reads a value of that type out of the arguments. This is also why it isn't type-safe: the function has no way to validate that what it's reading matches what was actually passed.

Flexible array members

This one's a bit weird. Have you ever seen a struct like this?

struct Array {
    int size;
    int values[];
};

Do you know how big it is? I'll give you a hint, this isn't a case of arrays sneakily being pointers.

The answer is usually just the size of int, at least according to sizeof. What's happening here is what's called a flexible array member. It's a C99 feature that allows the last member of a struct to be an array of unspecified size. It means you can store all your information inline, without needing to store a pointer and have indirection, but it comes with some interesting drawbacks.

Like I mentioned, the sizeof operator will act like that flexible array is entirely empty. It does this because sizeof is evaluated at compile time, so it has no way to know how many elements you'll end up allocating. You'll see the same behavior anywhere else the size has to be known at compile time, like assigning the struct to another variable or passing it to a function by value: only the fixed part gets copied. This makes it pretty niche and easy to mess up, but if you're careful to always allocate it on the heap and work through pointers, it can be really useful in the right cases.

The `restrict` and `volatile` keywords

You might've heard of this one before, but have you ever actually used it? restrict, added in C99, is an additional type qualifier you can give to pointers, like int *restrict x;, and it tells the compiler that this pointer is "restricted", meaning it's the sole pointer used to access some data. The main goal here is to say that the value stored at the pointer won't get swapped under your feet. Here's an example:

void modify(int *target, int *value) {
    *target += *value;
    *target *= *value;
    *target += *value;
}

Now, under most circumstances, there's a very simple optimization to make, which is to dereference value once rather than three times. However, the C compiler can't safely assume that's allowed. Consider this case:

int x = 2;
modify(&x, &x);

Here, x is the target and the value, which means that *value will change during the function! Now, a modern compiler may be able to detect cases when this doesn't happen, and optimize accordingly, but restrict exists to allow you to directly tell the compiler that that value will never change under your feet. It's a promise, not a check: if you break it by passing pointers that alias, as in the call above, the behavior is undefined.

Next, there's volatile. It basically says the opposite. The compiler shall never assume that the data is stable. This is mainly useful in the context of embedded systems or other cases with memory-mapped IO, where code like

volatile bool *const PORT = (volatile bool *)0x1000;

*PORT = 0;
while (!*PORT) {}

might actually make sense. Here, we're saying that PORT is a pointer to a boolean at address 0x1000, and telling the compiler to make no assumptions about its state. Without volatile, the compiler could see that we just wrote 0 and turn the loop into an infinite one. It makes sense in a context like this, where we can assume there's another program, or maybe the hardware itself, which will write to address 0x1000 to inform us of something.

Another honorable mention goes to placing const on the other side of the pointer. It's just a way to say that PORT cannot be reassigned, but *PORT is fine.

The `_Generic` keyword

This one's a new-ish feature for a change! One common complaint about C is its lack of generic programming capabilities, something like template in C++ or generics in other languages. Turns out, since C11, C has had generics! Sort of, kind of, ish...

This is a feature I've genuinely never seen someone use, but it's effectively a switch statement for the type of an expression. Here's an example, apologies that it's a bit messy:

#include <stdio.h>

void print_float(float x) {
    printf("A float!\n");
}

void print_int(int x) {
    printf("An int!\n");
}

#define print(val) _Generic((val), \
    float: print_float,\
    int: print_int\
)(val)

int main(void) {
    int x = 10;
    float y = 20.0f;

    print(x);
    print(y);
    
    return 0;
}

So, what the hell is this? Firstly, we have two functions for printing different types, that's simple enough, then we have the print macro...

Let's look at the expansion of print(x):

_Generic((x), 
    float: print_float,
    int: print_int
)(x)

First, the _Generic keyword. It works like a function, sorta. It accepts an expression, whose type is the type we're switching over (the expression itself is never evaluated), then a list of possible matches. In this case, we give it an expression to expand to for floats, and an expression for ints. Then, we're calling the result with x again, because _Generic simply evaluates to the branch that matches, which will be one of our two functions. _Generic also supports a default branch, which is chosen if none of the listed types match.

That's all, for now!

So yeah, C is a weird language, and there's a ton I didn't cover. I totally ignored all the unexpected places that undefined behavior creeps in, weird types like C's built-in complex numbers under the _Complex type, odd keywords like register or _Noreturn, and even variadic macros!

Either way, that's for another time, so have fun writing C, or don't!