It looks very interesting.
Here's how Cuckoo Cycle at 2^{28} nodes compares:
It has the tiniest possible SHA256 component (one call), and a moderate siphash-2-4 component.
Each instance requires 1GB of memory to run and hashes 1GB of data.
By contrast, verification requires only 1/3 KB and takes less than a tenth of a *micro*second.
Each instance has about a 5% probability of having a solution at the default setting.
Memory access in Cuckoo is maximally random, so it's constrained by memory latency
rather than bandwidth, and caches are irrelevant. The amount of computation is minimized
relative to memory access. See https://github.com/tromp/cuckoo for a paper and implementation.
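
To make the solve/verify asymmetry concrete, here is a rough Python sketch of what a verifier might look like. It is not the code from the repository: keyed BLAKE2b stands in for siphash-2-4, and the names (`edge`, `is_single_cycle`, `verify`, `table_bytes`), the 4-bytes-per-node figure, and the even split of nodes into two sides are all my own assumptions for illustration. The point is only that checking a proof touches a handful of edges, while solving needs a table over every node.

```python
import hashlib

NODE_BITS = 28                # 2^28 nodes, as in the comment above
HALF = 1 << (NODE_BITS - 1)   # assumption: nodes split evenly into two sides

def table_bytes(node_bits, bytes_per_node=4):
    """Solver table size, assuming one 4-byte slot per node (an assumption)."""
    return (1 << node_bits) * bytes_per_node

def edge(key, nonce, half=HALF):
    """Map a nonce to one edge (u, v) of a bipartite graph.
    Stand-in hash: keyed BLAKE2b, NOT the real siphash-2-4."""
    def h(x):
        d = hashlib.blake2b(x.to_bytes(8, "little"), key=key, digest_size=8)
        return int.from_bytes(d.digest(), "little")
    return h(2 * nonce) % half, h(2 * nonce + 1) % half

def is_single_cycle(edges):
    """True if the edges form exactly one simple cycle."""
    n = len(edges)
    if n < 4 or n % 2 or len(set(edges)) != n:
        return False
    # Index edges by endpoint; memory is proportional to the proof, not the graph.
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(("u", u), []).append(i)
        adj.setdefault(("v", v), []).append(i)
    if any(len(ix) != 2 for ix in adj.values()):
        return False
    # Walk from edge 0, alternating sides, until we come back to edge 0.
    i, side, steps = 0, "u", 0
    while True:
        u, v = edges[i]
        a, b = adj[(side, u if side == "u" else v)]
        i = b if a == i else a
        side = "v" if side == "u" else "u"
        steps += 1
        if i == 0:
            break
    return steps == n  # fewer steps means several disjoint cycles

def verify(key, nonces):
    """Check a proof given as a strictly increasing list of nonces."""
    if any(a >= b for a, b in zip(nonces, nonces[1:])):
        return False
    return is_single_cycle([edge(key, n) for n in nonces])
```

Under those assumptions, `table_bytes(28)` comes out to 2^30 bytes, which matches the 1GB figure, while `verify` only ever holds one small dictionary keyed by the proof's own endpoints.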