But then we've just arrived in a roundabout sort of way at the whole trie being essentially a bitwise trie, since the nodes in the original trie with 256 bit keysize overlap exactly with subtries of the bitwise trie.
Was there a reason for not proposing a single bit keysize from the start? I thought about it a while back, but disregarded it for some reason I can't recall.
I thought about the same thing, and disregarded it for some reason, as well. I don't remember though. Though, working with raw bits is a bit more complicated than bytes, especially when you have skip strings that are like 13 bits. But I see the benefit that you don't even really need linked lists at each node, only two pointers. But you end up with a lot more total overhead, since you have the trienode overhead for so many more nodes...
I'll have to think about this one. I think there is clearly a benefit to a higher branching factor: if you consider a 2^32-way Trie, there is a single root node and just N trienodes -- one for each entry (so it's really just a lookup table, and lookup is fast). If you have a 2-way (bitwise) Trie, you still have N leaf nodes, but you have a ton of other intermediate nodes and all the data that comes with them. And a lot more pointers and "hops" between nodes to get to your lefa. It leads me to believe that you want a higher branching factor, but you need to balance against the fact that the branching adds some efficiency (i.e. in the case of using linked lists between entries, it would obviously be bad to have a branching factor of 2**32).