Skip to content

Latest commit

 

History

History
295 lines (225 loc) · 10.1 KB

README.md

File metadata and controls

295 lines (225 loc) · 10.1 KB

Linked lists, pointer tricks and good taste

Tanslantion: 简体中文(zh-CN)


Introduction

In a 2016 TED interview (14:10) Linus Torvalds speaks about what he considers good taste in coding. As an example, he presents two implementations of item removal in singly linked lists (reproduced below). In order to remove the first item from a list, one of the implementations requires a special case, the other one does not. Linus, obviously, prefers the latter.

His comment is:

[...] I don't want you to understand why it doesn't have the if statement. But I want you to understand that sometimes you can see a problem in a different way and rewrite it so that a special case goes away and becomes the normal case, and that's good code. [...] -- L. Torvalds

The code snippets he presents are C-style pseudocode and are simple enough to follow. However, as Linus mentions in the comment, the snippets lack a conceptual explanation and it is not immediately evident how the more elegant solution actually works.

The next two sections look at the technical approach in detail and demonstrate how and why the indirect addressing approach is so neat. The last section extends the solution from item deletion to insertion.

The code

The basic data structure for a singly linked list of integers is shown in Figure 1.

linked list
Figure 1: Singly linked list of integers.

Numbers are arbitrarily chosen integer values and arrows indicate pointers. head is a pointer of type list_item * and each of the boxes is an instance of an list_item struct, each with a member variable (called next in the code) of type list_item * that points to the next item.

The C implementation of the data structure is:

struct list_item {
        int value;
        struct list_item *next;
};
typedef struct list_item list_item;

struct list {
        struct list_item *head;
};
typedef struct list list;

We also include a (minimal) API:

/* The textbook version */
void remove_cs101(list *l, list_item *target);
/* A more elegant solution */
void remove_elegant(list *l, list_item *target);

With that in place, let's have a look at the implementations of remove_cs101() and remove_elegant(). The code of these examples is true to the pseudocode from Linus' example and also compiles and runs.

The CS101 version

simple data model
Figure 2: The conceptual model for the list data structure in the CS101 algorithm.

void remove_cs101(list *l, list_item *target)
{
        list_item *cur = l->head, *prev = NULL;
        while (cur != target) {
                prev = cur;
                cur = cur->next;
        }
        if (prev)
                prev->next = cur->next;
        else
                l->head = cur->next;
}

The standard CS101 approach makes use of two traversal pointers cur and prev, marking the current and previous traversal position in the list, respectively. cur starts at the list head head, and advances until the target is found. prev starts at NULL and is subsequently updated with the previous value of cur every time cur advances. After the target is found, the algorithm tests if prev is non-NULL. If yes, the item is not at the beginning of the list and the removal consists of re-routing the linked list around cur. If prev is NULL, cur is pointing to the first element in the list, in which case, removal means moving the list head forward.

A more elegant solution

The more elegant version has less code and does not require a separate branch to deal with deletion of the first element in a list.

void remove_elegant(list *l, list_item *target)
{
        list_item **p = &l->head;
        while (*p != target)
                p = &(*p)->next;
        *p = target->next;
}

The code uses an indirect pointer p that holds the address of a pointer to a list item, starting with the address of head. In every iteration, that pointer is advanced to hold the address of the pointer to the next list item, i.e. the address of the next element in the current list_item. When the pointer to the list item *p equals target, we exit the search loop and remove the item from the list.

How does it work?

The key insight is that using an indirect pointer p has two conceptual benefits:

  1. It allows us to interpret the linked list in a way that makes the head pointer an integral part the data structure. This eliminates the need for a special case to remove the first item.
  2. It also allows us to evaluate the condition of the while loop without having to let go of the pointer that points to target. This allows us to modify the pointer that points to target and to get away with a single iterator as opposed to prev and cur.

Let's look each of these points in turn.

Integrating the head pointer

The standard model interprets the linked list as a sequence of list_item instances. The beginning of the sequence can be accessed through a head pointer. This leads to the conceptual model illustrated in Figure 2 above. The head pointer is merely considered as a handle to access the start of the list. prev and cur are pointers of type list_item * and always point to an item or NULL.

The elegant implementation uses indirect addressing scheme that yields a different view on the data structure:

Data model for indirect addressing
Figure 3: The conceptual model for the list data structure in the more elegant approach.

Here, p is of type list_item ** and holds the address of the pointer to the current list item. When we advance the pointer, we forward to the address of the pointer to the next list item.

In code, this translates to p = &(*p)->next, meaning we

  1. (*p): dereference the address to the pointer to the current list item
  2. ->next: dereference that pointer again and select the field that holds the address of the next list item
  3. &: take the address of that address field (i.e. get a pointer to it)

This corresponds to an interpretation of the data structure where the list is a a sequence of pointers to list_items (cf. Figure 3).

Maintaining a handle

An additional benefit of that particular interpretation is that it supports editing the next pointer of the predecessor of the current item throughout the entire traversal.

With p holding the address of a pointer to a list item, the comparison in the search loop becomes

while (*p != target)

The search loop will exit if *p equals target, and once it does, we are still able to modify *p since we hold its address p. Thus, despite iterating the loop until we hit target, we still maintain a handle (the address of the next field or the head pointer) that can be used to directly modify the pointer that points to the item.

This is the reason why we can modify the incoming pointer to an item to point to a different location using *p = target->next and why we do not need prev and cur pointers to traverse the list for item removal.

Going beyond

It turns out that the idea behind remove_elegant() can be applied to yield a particularly concise implementation of another function in the list API: insert_before(), i.e. inserting a given item before another one.

Inserting before existing items

First, let's add the following declaration to the list API in list.h:

void insert_before(list *l, list_item *before, list_item *item);

The function will take a pointer to a list l, a pointer before to an item in that list and a pointer to a new list item item that the function will insert before before.

Quick refactor

Before we move on, we refactor the search loop into a separate function

static inline list_item **find_indirect(list *l, list_item *target)
{
        list_item **p = &l->head;
        while (*p != target)
                p = &(*p)->next;
        return p;
}

and use that function in remove_elegant() like so

void remove_elegant(list *l, list_item *target)
{
        list_item **p = find_indirect(l, target);
        *p = target->next;
}

Implementing insert_before()

Using find_indirect(), it is straightforward to implement insert_before():

void insert_before(list *l, list_item *before, list_item *item)
{
        list_item **p = find_indirect(l, before);
        *p = item;
        item->next = before;
}

A particularly beautiful outcome is that the implementation has consistent semantics for the edge cases: if before points to the list head, the new item will be inserted at the beginning of the list, if before is NULL or invalid (i.e. the item does not exist in l), the new item will be appended at the end.

Conclusion

The premise of the more elegant solution for item deletion is a single, simple change: using an indirect list_item ** pointer to iterate over the pointers to the list items. Everything else flows from there: there is no need for a special case or branching and a single iterator is sufficient to find and remove the target item. It also turns out that the same approach provides an elegant solution for item insertion in general and for insertion before an existing item in particular.

So, going back to Linus' initial comment: is it good taste? Hard to say, but it's certainly a different, creative and very elegant solution to a well-known CS task.