Heaps of UIDs

Creating a Simple Static Structured Hierarchy of User ID Numbers using a (216) Radix Heap

Radix Heaps

A few words in background.

A radix tree (also known as n-tree) is a singly-rooted tree where each node has exactly the same number of children. It is infinite in extent although it practice it is usually truncated at a certain depth. If we consider an alphabet of (the radix) n letters then the root node may be labelled as an empty list [], the first level nodes by a singleon list for each letter, second level nodes by enumerating all pairs of letters, and so on.

In particular a denary tree could be labelled so:

root node
[]
1st level nodes
[0],[1],...,[9]
2nd level nodes
[00],...,[99]
3rd level nodes
[000],...,[999]
4th level nodes
and so on

(A node label can be decomposed into a parent label and a sibling identifier by splitting the label at its last symbol. The root label cannot be decomposed.)

(e.g. node [435] is sibling 5 of parent node [43])

The labels of the tree look very much like denary numerals but are not as the leading zeros are significant. However, with only a slight modification, we can map nodes directly to numerals. A diminished radix tree of radix n is identical to a radix n tree except it has only n-1 1st level nodes. Building on our previous example we label a diminished denary tree so:

root node
[]
1st level nodes
[1],...,[9]
2nd level nodes
[10],...,[99]
3rd level nodes
[100],...,[999]
4th level nodes
and so on

(Now leading zeros can be added to taste — our labels map directly to numerals, and so to non-negative integers — integer division by the radix splits our label thus: quotient -> parent label, remainder -> sibling ordinal.)

The mapping of diminished radix tree node labels to integers works for any radix of 2 or greater. Radices of powers of 2 make for very fast mapping.

* counting node labels to a certain depth yields the ubiquitous identity: 1 + ( r - 1 )( 1 + r + ... + rn-1 ) = rn

In the special case of a diminished binary tree, we see that by omitting node 0, (the sole subtree of root is necessarily a binary tree without diminishment) we can directly map node labels of a binary tree to strictly positive integers — an arrangement already familiar to programmers in implementing a binary heap. For radices greater 2, no such sleight of hand can help us.

A heap data structure requires that the total ordering of the data is consistent with the partial ordering implicit to the tree labels. With only a slight abuse of terminology we can refer to this partial ordering of the labels themselves as a heap although this is not a data structure per se; it carries no state. This statelessness — that the ordering relationship can be computed very efficiently without recourse to any auxiliary data — is the great virtue of the scheme that follows.

Understanding UIDs as a (216) Radix Heap

In all Unix like operating systems, UIDs are simply either zero for root, or positive integers for all other independent users. Further they are usually restricted to being less than 65536 (2^16). This restriction, although purely conventional, presents an opportunity for a significant extension of the semantics of UIDs. If UIDs were instead considered as the label to a dimished radix tree of radix 65536, we would at a stroke extend the meaning of all UIDs greater than 2^16 to being the children of an ordinary UID in the range 1-65535, the idea being that any parent or ancestor user has the “power of root” over that child or descendant user. Any user could set its EUID to one of its descendant users (including itself) with only trivial added logic to the kernel — logic that is independent of the size of the word used to hold the UID, although the word size will obviously limit the depth of the heap.

Because an ordinary user would be able to virtualize a full space of traditional UIDs as child UIDs this reduces the need for other more complex mechanisms of virtualization. A ordinary unpriveleged user could mount filesystems with UIDs in the range of 0-65536 mapped as children of that user — this could simply be a standard mechanism available to all users (UID heap depth permitting).

Furthermore, by mapping our parent/child relationship directly to the arithmetical properties of integers and not through the binary representation of a machine word, our logic is completely independent of endianness or any peculiarities of a processor’s instruction set — C is expressive enough for our needs.

C Code to Implement a (216) Radix Heap

Here is some sample code to represent the core logic.

/* **** **** **** **** **** **** **** **** **** **** **** **** */

/* divmod 2**16 */
int split_heap_label(unsigned int uid, unsigned int *parentuid, int *sibling)
{
	/* returns 0 on failure */
	if (uid == 0) return 0;
	*sibling = uid & 65535;
	*parentuid = uid >> 16;
	return -1;
}

/* inverse of split_heap_label */
unsigned int assemble_heap_label(unsigned int parentuid, unsigned int sibling)
{
	/* precondition for success
	**	parentuid || sibling
	**	parentuid == ((parentuid << 16) >> 16)
	** returns 0 on failure
	*/
	unsigned int uid;
	uid = (parentuid << 16) | (sibling & 65535);
	return ((uid >> 16) == parentuid) ? uid : 0;
}

/* **** **** **** **** **** **** **** **** **** **** **** **** */

/* this is a reflexive function --- is_descendant(x,x) is true */
int is_descendant(unsigned int uid1, unsigned int uid2)
{
	while (uid1 > uid2) uid1 >>= 16;
	if (uid1 == uid2) return -1;
	return 0;
}

/* is_proper_descendant(x,x) is false */
int is_proper_descendant(unsigned int uid1, unsigned int uid2)
{
	if (uid1 == uid2) return 0;
	return is_descendant(uid1, uid2);
}

/* gcd analogue */
unsigned int greatest_common_ancestor(unsigned int uid1, unsigned int uid2)
{
	while (uid1 != uid2) {
		while (uid1 > uid2) uid1 >>= 16;
		while (uid2 > uid1) uid2 >>= 16;
	}
	return uid1;
}

/* **** **** **** **** **** **** **** **** **** **** **** **** */

/* 2**16 log */
int heap_depth(unsigned int uid)
{
	/* note that depth of root uid is 0 */
	unsigned int depth = 0;
	while (uid > 0) {
		uid >>= 16;
		++depth;
	}
	return depth;
}

/* shifts uid into parentuid */
unsigned int concatenate_heap_labels(unsigned int parentuid, unsigned int uid)
{
	/* shifts uid into parentuid until would overflow parentuid
	** if no overflow
	**	depth of concatenated = sum of depths
	** note that depth of root uid is 0 and does not contribute to concat
	*/
	unsigned int depth = 0, rev = 0;
	while (uid > 0) {
		rev = (rev << 16) | (uid & 65535);
		uid >>= 16;
		++depth;
	}
	for (; depth; --depth) {
		uid = (parentuid << 16) | (rev & 65535);
		if ((uid >> 16) != parentuid) break;
		parentuid = uid;
		rev >>= 16;
	}
	return parentuid;
}

/* shifts uid into parentuid with special handling of uid 0 */
unsigned int merge_heap_labels(unsigned int parentuid, unsigned int uid)
{
	unsigned int depth = 0, rev = 0;
	do {
		rev = (rev << 16) | (uid & 65535);
		uid >>= 16;
		++depth;
	} while (uid > 0);
	for (; depth; --depth) {
		uid = (parentuid << 16) | (rev & 65535);
		if ((uid >> 16) != parentuid) break;
		parentuid = uid;
		rev >>= 16;
	}
	return parentuid;
}

/* children with sibling id == 0 are promoted to parent id  */
unsigned int promote_heap_label(unsigned int uid)
{
	/* shifts out silbings == 0 */
	if (uid == 0) return 0;
	while ((uid & 65535) == 0) uid >>= 16;
	return uid;
}

/* shifts uid into parentuid */
unsigned int contract_heap_labels(unsigned int parentuid, unsigned int uid)
{
	/* shifts promoted uid into parentuid until would overflow parentuid
	** if no overflow
	**	depth of concatenated = sum of depths of promoted uid and parent
	** identical to concatenate followed by promote
	*/
	unsigned int rev = 0;
	while (uid > 0) {
		rev = (rev << 16) | (uid & 65535);
		uid >>= 16;
	}
	while (rev > 0) {
		uid = (parentuid << 16) | (rev & 65535);
		if ((uid >> 16) != parentuid) break;
		parentuid = uid;
		rev >>= 16;
	}
	return parentuid;
}

/* **** **** **** **** **** **** **** **** **** **** **** **** */