Jackson Allan · 28 January 2023
This article is part of a series describing the novel techniques that power the generic container library Convenient Containers (CC). CC distinguishes itself by eliminating some of the inconveniences traditionally burdening container libraries in C. Specifically, it does not require the user to define container types, and it provides a generic API that is agnostic to both container type and content type yet also type-safe. In other words, CC containers should be almost as simple to use as containers in higher-level languages.
In this article, we will explore extendable _Generic
macros.
CC uses such macros to deduce comparison, hash, and destructor functions based on a container's element or key type at the site of every API call.
These macros alleviate the need for containers to carry around function pointers at runtime and the subsequent performance penalty.
They also spare library users from having to specify these functions every time they declare containers.
C11 introduced the _Generic
keyword:
#define print( val ) _Generic( /* Controlling expression: */ (val), \
/* Association list: */ \
int: print_int, \
double: print_double, \
char *: print_string \
)( val ) /* We may end the macro by calling the selected function. */ \
void print_int( int val ) { printf( "%d", val ); }
void print_double( double val ) { printf( "%f", val ); }
void print_string( char *val ) { printf( "%s", val ); }
The type of the (unevaluated) controlling expression determines which expression from the association list is selected and included at compile time.
In most cases, _Generic
lives inside a macro and acts as a primitive means of function overloading.
However, several aspects of _Generic
severely limit its usefulness.
In particular, all compatible types must be hard-coded by the programmer writing it.
Hence, a library developer cannot create a _Generic
macro into which the library's users can easily plug their own types and functions (or so it seems).
This problem has caused frustration for programmers seeking more flexibility.
In fact, we can circumvent the aforementioned limitation by providing empty “slots” in our _Generic
macro to be filled by user-supplied types and functions.
To do so, we must use preprocessor magic to include or omit another macro inside the _Generic
macro based on whether it is defined.
Here is one possible naive implementation of an extendible _Generic
macro that hashes a variable:
/*------------------------------------------------------ hash.h ------------------------------------------------------*/
#ifndef HASH_H
#define HASH_H
#include <stddef.h>
// IF_DEF_INCL_SLOT( macro ) includes a _Generic slot only if the specified macro is defined (as type, function_name).
// It takes advantage of the fact that if the macro is defined, then the expanded text will contain a comma.
#define COMMA ,
#define ARG_1( _1, ... ) _1
#define ARG_2( _1, _2, ... ) _2
#define ARG_3( _1, _2, _3, ... ) _3
#define IF_DEF_INCL_SLOT( macro ) ARG_3( macro, ARG_1( macro, ): ARG_2( macro, ) COMMA, , )
// Generic hash macro.
#define hash( val ) _Generic( (val), \
/* Extendibility slots: */ \
IF_DEF_INCL_SLOT( HASH_1 ) \
IF_DEF_INCL_SLOT( HASH_2 ) \
IF_DEF_INCL_SLOT( HASH_3 ) \
IF_DEF_INCL_SLOT( HASH_4 ) \
IF_DEF_INCL_SLOT( HASH_5 ) \
IF_DEF_INCL_SLOT( HASH_6 ) \
IF_DEF_INCL_SLOT( HASH_7 ) \
IF_DEF_INCL_SLOT( HASH_8 ) \
/* Built-ins: */ \
short: hash_short, \
int: hash_int \
)( val ) \
// Built-in functions.
static inline size_t hash_short( short val ) { return val * 2654435761ull; }
static inline size_t hash_int( int val ) { return val * 2654435761ull; }
#endif
Now the library user can add functions for new or unsupported types as follows:
/*------------------------------------------------------ main.c ------------------------------------------------------*/
#include <stdio.h>
#include "hash.h"
// Add some user-supplied hash type and function pairs into empty slots.
size_t hash_long( long val ){ return val * 2654435761ull; }
#define HASH_1 long, hash_long
size_t hash_long_long( long long val ){ return val * 2654435761ull; }
#define HASH_2 long long, hash_long_long
// Test.
int main( void )
{
printf( "%zu %zu %zu %zu\n", hash( (short)1 ), hash( (int)2 ), hash( (long)3 ), hash( (long long)4 ) );
}
The above naive solution requires library users to manually manage numbered slots. Hence, it is cumbersome and unscalable. Fortunately, we can use the preprocessor to automate the process and completely hide the slotted interface from users. To do so, we need an auxiliary header to transform a type and function pair supplied by the user from macro text into regular, persistent code that fills the first available slot.
/*------------------------------------------------------ hash.h ------------------------------------------------------*/
#ifndef HASH_H
#define HASH_H
#include <stddef.h>
// IF_DEF( macro )( code ) includes the bracketed code only if the specified macro is defined (as empty).
#define COMMA() ,
#define ARG_2_( _1, _2, ... ) _2
#define ARG_2( ... ) ARG_2_( __VA_ARGS__ )
#define INCL( ... ) __VA_ARGS__
#define OMIT( ... )
#define IF_DEF( macro ) ARG_2( COMMA macro () INCL, OMIT, )
// Generic hash macro.
#define hash( val ) _Generic( (val), \
/* Extendibility slots: */ \
IF_DEF( HASH_1 )( hash_1_ty: hash_1_fn, ) \
IF_DEF( HASH_2 )( hash_2_ty: hash_2_fn, ) \
IF_DEF( HASH_3 )( hash_3_ty: hash_3_fn, ) \
IF_DEF( HASH_4 )( hash_4_ty: hash_4_fn, ) \
IF_DEF( HASH_5 )( hash_5_ty: hash_5_fn, ) \
IF_DEF( HASH_6 )( hash_6_ty: hash_6_fn, ) \
IF_DEF( HASH_7 )( hash_7_ty: hash_7_fn, ) \
IF_DEF( HASH_8 )( hash_8_ty: hash_8_fn, ) \
/* Built-ins: */ \
short: hash_short, \
int: hash_int \
)( val ) \
// Built-in functions.
static inline size_t hash_short( short val ) { return val * 2654435761ull; }
static inline size_t hash_int( int val ) { return val * 2654435761ull; }
// DEF_HASH_N_TY_FN( n ) defines a new hash type and function pair.
// This macro is used in add_hash.h.
// Its purpose is to convert the text the user supplied via the HASH macro (defined as type, function_name) into
// persistent, non-macro code.
// n denotes the slot that the pair will fill.
#define ARG_1_( _1, ... ) _1
#define ARG_1( ... ) ARG_1_( __VA_ARGS__ )
#define DEF_HASH_N_TY_FN( n ) \
typedef ARG_1( HASH ) hash_##n##_ty; \
static inline size_t hash_##n##_fn( hash_##n##_ty val ) \
{ \
return ARG_2( HASH, )( val ); \
} \
#endif
/*---------------------------------------------------- add_hash.h ----------------------------------------------------*/
// Check for correct usage.
#ifndef HASH
#error Define HASH as type, function_name before including add_hash.h.
#endif
// Insert hash type and function pair into the first available slot.
#if !defined( HASH_1 )
#define HASH_1 // Activate slot 1.
DEF_HASH_N_TY_FN( 1 ) // Define a type and function for slot 1.
#elif !defined( HASH_2 )
#define HASH_2
DEF_HASH_N_TY_FN( 2 )
#elif !defined( HASH_3 )
#define HASH_3
DEF_HASH_N_TY_FN( 3 )
#elif !defined( HASH_4 )
#define HASH_4
DEF_HASH_N_TY_FN( 4 )
#elif !defined( HASH_5 )
#define HASH_5
DEF_HASH_N_TY_FN( 5 )
#elif !defined( HASH_6 )
#define HASH_6
DEF_HASH_N_TY_FN( 6 )
#elif !defined( HASH_7 )
#define HASH_7
DEF_HASH_N_TY_FN( 7 )
#elif !defined( HASH_8 )
#define HASH_8
DEF_HASH_N_TY_FN( 8 )
#else
#error Sorry, too many hash functions!
#endif
// Undef HASH so that the user doesn't have to.
#undef HASH
Now our users can add hash functions without concerning themselves with slot numbers and collisions:
/*------------------------------------------------------ main.c ------------------------------------------------------*/
#include <stdio.h>
#include "hash.h"
// Add some user-supplied hash type and function pairs (now there is no need to specify slots).
size_t hash_long( long val ){ return val * 2654435761ull; }
#define HASH long, hash_long
#include "add_hash.h"
size_t hash_long_long( long long val ){ return val * 2654435761ull; }
#define HASH long long, hash_long_long
#include "add_hash.h"
// Test.
int main( void )
{
printf( "%zu %zu %zu %zu\n", hash( (short)1 ), hash( (int)2 ), hash( (long)3 ), hash( (long long)4 ) );
}
While the previous solution hides the slotted interface from the library user, it still requires the library programmer to include as many slots as users could conceivably need in a giant macro that the preprocessor must parse upon every invocation. A superior, albeit more complex, approach is to use more preprocessor magic to generate the necessary slots (up to some inordinately high limit) instead of omitting the empty ones.[1] To do this, we can create an octal counter inspired by the Boost preprocessor counter. Here is an implementation that uses a three-digit counter to support up to 511 type and function pairs:
/*------------------------------------------------------ hash.h ------------------------------------------------------*/
#ifndef HASH_H
#define HASH_H
#include <stddef.h>
// Octal counter that supports up to 511 hash type and function pairs.
#define HASH_COUNT_D1 0 // Digit 1, i.e. least significant digit.
#define HASH_COUNT_D2 0
#define HASH_COUNT_D3 0
// Standard concatenation macros.
#define CAT_2_( a, b ) a##b
#define CAT_2( a, b ) CAT_2_( a, b )
#define CAT_3_( a, b, c ) a##b##c
#define CAT_3( a, b, c ) CAT_3_( a, b, c )
#define CAT_4_( a, b, c, d ) a##b##c##d
#define CAT_4( a, b, c, d ) CAT_4_( a, b, c, d )
// Provides the current value of the counter as a three-digit octal number preceded by 0.
#define HASH_COUNT CAT_4( 0, HASH_COUNT_D3, HASH_COUNT_D2, HASH_COUNT_D1 )
// HASH_SLOTS generates _Generic slots from 0 to HASH_COUNT - 1 (inclusive) via macro pseudo-recursion.
// In the case of multiple extendible-_Generic macros, this code could be adjusted to support multiple counters (see CC
// library source code).
#define HASH_SLOT( n ) CAT_3( hash_, n, _ty ): CAT_3( hash_, n, _fn ),
#define R1_0( d3, d2 )
#define R1_1( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 0 ) )
#define R1_2( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 1 ) ) R1_1( d3, d2 )
#define R1_3( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 2 ) ) R1_2( d3, d2 )
#define R1_4( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 3 ) ) R1_3( d3, d2 )
#define R1_5( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 4 ) ) R1_4( d3, d2 )
#define R1_6( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 5 ) ) R1_5( d3, d2 )
#define R1_7( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 6 ) ) R1_6( d3, d2 )
#define R1_8( d3, d2 ) HASH_SLOT( CAT_4( 0, d3, d2, 7 ) ) R1_7( d3, d2 )
#define R2_0( d3 )
#define R2_1( d3 ) R1_8( d3, 0 )
#define R2_2( d3 ) R1_8( d3, 1 ) R2_1( d3 )
#define R2_3( d3 ) R1_8( d3, 2 ) R2_2( d3 )
#define R2_4( d3 ) R1_8( d3, 3 ) R2_3( d3 )
#define R2_5( d3 ) R1_8( d3, 4 ) R2_4( d3 )
#define R2_6( d3 ) R1_8( d3, 5 ) R2_5( d3 )
#define R2_7( d3 ) R1_8( d3, 6 ) R2_6( d3 )
#define R2_8( d3 ) R1_8( d3, 7 ) R2_7( d3 )
#define R3_0
#define R3_1 R2_8( 0 )
#define R3_2 R2_8( 1 ) R3_1
#define R3_3 R2_8( 2 ) R3_2
#define R3_4 R2_8( 3 ) R3_3
#define R3_5 R2_8( 4 ) R3_4
#define R3_6 R2_8( 5 ) R3_5
#define R3_7 R2_8( 6 ) R3_6
#define HASH_SLOTS \
CAT_2( R1_, HASH_COUNT_D1 )( HASH_COUNT_D3, HASH_COUNT_D2 ) \
CAT_2( R2_, HASH_COUNT_D2 )( HASH_COUNT_D3 ) \
CAT_2( R3_, HASH_COUNT_D3 ) \
// Generic hash macro.
#define hash( val ) _Generic( (val), \
/* Extendibility slots: */ \
HASH_SLOTS \
/* Built-ins: */ \
short: hash_short, \
int: hash_int \
)( val ) \
// Built-in functions.
static inline size_t hash_short( short val ) { return val * 2654435761ull; }
static inline size_t hash_int( int val ) { return val * 2654435761ull; }
// Used in add_hash.h.
#define ARG_1_( _1, _2 ) _1
#define ARG_1( ... ) ARG_1_( __VA_ARGS__ )
#define ARG_2_( _1, _2 ) _2
#define ARG_2( ... ) ARG_2_( __VA_ARGS__ )
#endif
/*---------------------------------------------------- add_hash.h ----------------------------------------------------*/
// Check for correct usage.
#ifndef HASH
#error Define HASH as type, function_name before including add_hash.h.
#endif
// Convert the user-supplied HASH macro (defined as type, function_name ) into hash_XXXX_ty and hash_XXXX_fn, where XXXX
// is the current value of the HASH_COUNT counter.
typedef ARG_1( HASH ) CAT_3( hash_, HASH_COUNT, _ty ); // typeof( ARG_1( HASH ) ) would allow exotic types such as array
// and function pointers.
// typeof will become standard in C23.
static inline size_t CAT_3( hash_, HASH_COUNT, _fn )( CAT_3( hash_, HASH_COUNT, _ty ) val )
{
return ARG_2( HASH )( val );
}
// Increment the counter.
#if HASH_COUNT_D1 == 0
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 1
#elif HASH_COUNT_D1 == 1
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 2
#elif HASH_COUNT_D1 == 2
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 3
#elif HASH_COUNT_D1 == 3
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 4
#elif HASH_COUNT_D1 == 4
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 5
#elif HASH_COUNT_D1 == 5
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 6
#elif HASH_COUNT_D1 == 6
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 7
#elif HASH_COUNT_D1 == 7
#undef HASH_COUNT_D1
#define HASH_COUNT_D1 0
#if HASH_COUNT_D2 == 0
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 1
#elif HASH_COUNT_D2 == 1
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 2
#elif HASH_COUNT_D2 == 2
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 3
#elif HASH_COUNT_D2 == 3
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 4
#elif HASH_COUNT_D2 == 4
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 5
#elif HASH_COUNT_D2 == 5
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 6
#elif HASH_COUNT_D2 == 6
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 7
#elif HASH_COUNT_D2 == 7
#undef HASH_COUNT_D2
#define HASH_COUNT_D2 0
#if HASH_COUNT_D3 == 0
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 1
#elif HASH_COUNT_D3 == 1
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 2
#elif HASH_COUNT_D3 == 2
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 3
#elif HASH_COUNT_D3 == 3
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 4
#elif HASH_COUNT_D3 == 4
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 5
#elif HASH_COUNT_D3 == 5
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 6
#elif HASH_COUNT_D3 == 6
#undef HASH_COUNT_D3
#define HASH_COUNT_D3 7
#elif HASH_COUNT_D3 == 7
#error Are you mad? 512 hash functions are too many!
#endif
#endif
#endif
// Undef HASH so that the user doesn't have to.
#undef HASH
_Generic
disallows multiple expressions associated with the same type.
Hence, the above solutions prevent the library user from supplying custom functions for types with support built into the library.
This causes inconvenience (for example, users of our hash macro may prefer a faster or more uniformly distributed hash function for a particular type based on the patterns in their data).
To allow the built-in functions to be overwritten, we can place them inside a second _Generic
expression nested in the default
path of the primary one.
Thus, the built-in function for a given type will not be selected if a user-supplied function for that type short-circuits the macro first.
#define hash( val ) _Generic( (val), \
/* Extendibility slots: */ \
HASH_SLOTS \
/* Built-ins: */ \
default: _Generic( (val), \
short: hash_short, \
int: hash_int, \
default: NULL \
) \
)( val ) \
Single-file libraries are popular because they are very easy to use.
If our library is suitably small and we wish to follow the single-file paradigm, we can eliminate the need for a separate add_XXXX.h
by incorporating it into the primary header and selecting between the primary code and macro-extending code depending on whether the user has defined the extension macro:
/*------------------------------------------------------ hash.h ------------------------------------------------------*/
#ifndef HASH // The user intends to include the primary header.
#ifndef HASH_H
#define HASH_H
// Primary header code...
#endif
#else // The user intends to add a new type and function pair.
// Marco-extending and counter-incrementing code...
#undef HASH
#endif
Now the user need only re-include the same header when supplying a new type and function pair:
#include "hash.h" // Include once at the top of the file.
size_t hash_long( long val ){ return val * 2654435761ull; }
#define HASH long, hash_long
#include "hash.h" // Include again to add a new type and function pair.
Thus, with a little work and creativity, we can overcome a key limitation of _Generic
and turn it into a powerful tool for generic programming.
Moreover, the technique described above is not limited to _Generic
—it could easily be applied to other library macros that could benefit from user extendibility.
The final code implementing all aspects of this article can be found here and tested here.
Future articles in this series will explore other ways that we can harness _Generic
and overcome its limitations, as well as how we can manipulate C's type system to associate multiple types with one pointer at compile time.
1. Credit for the preprocessor counter idea is owed to Reddit user Jinren.
Comments on this article belong here.
The article was also discussed on Reddit.