I had considered the potential for such a design. A DHT will not perform well enough; you would need 100BaseT Ethernet, or perhaps gigabit, before the hash table could be filled quickly enough. From what I can tell, a supercomputer with a ton of RAM and high-speed interconnects would give you the best performance. The difference is that such a supercomputer could be built from commodity parts.
Well, if the hash is that fast you could collaborate by splitting the birthday space into N ranges, with each computer simply throwing away any result that isn't in its range. This trades hashing speed for memory, so it works with your PoW rather than against it.
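A minimal sketch of that range-splitting idea (all names here are my own; `h` is a stand-in for the actual birthday hash, truncated so collisions show up quickly in a demo): each worker hashes the full nonce space but only stores digests whose value falls in its assigned slice, and reports any birthday collision it sees there.

```python
import hashlib

N_WORKERS = 4  # hypothetical number of collaborating machines

def h(nonce: int) -> int:
    # Stand-in for the real birthday hash: SHA-256 truncated to
    # 20 bits so a collision appears quickly in a small demo.
    digest = hashlib.sha256(nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big") & 0xFFFFF

def worker_range(digest: int) -> int:
    # Partition the 20-bit birthday space into N contiguous ranges.
    return digest * N_WORKERS >> 20

def find_collision(my_id: int, max_nonce: int = 1 << 16):
    table = {}  # digest -> nonce, only for digests in my range
    for nonce in range(max_nonce):
        d = h(nonce)
        if worker_range(d) != my_id:
            continue  # outside my range: throw it away
        if d in table:
            return table[d], nonce  # birthday collision found
        table[d] = nonce
    return None
```

Each machine runs `find_collision` with its own `my_id`, so per-machine memory shrinks by a factor of N, while every machine still hashes the full nonce space.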
What value do you suggest for N, and how fast is the hash?
This should be possible to test.
Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper. You save on memory, but the cost is far greater: memory and CPU power must always be paired proportionally, or you will have idle memory or wasted CPU cycles.
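A back-of-envelope calculation (my own, not taken from the white paper) of what discarding costs: after k hashes into a space of size S with full memory, the expected number of birthday collisions is roughly k²/(2S). If you keep only a 1/N slice, only k/N hashes are stored, and a pair can only collide inside the slice of size S/N, giving (k/N)²·N/(2S) = k²/(2SN). So you need about √N times as many hashes for the same expected yield.

```python
def expected_collisions(k: float, space: float, f: float = 1.0) -> float:
    # Birthday approximation. f is the fraction of the space you can
    # store (1/N). Of k hashes, f*k land in your slice, and any pair
    # of those collides with probability 1/(f*space).
    in_slice = f * k
    return in_slice * in_slice / (2 * f * space)

# Full memory vs a 1/4 slice over the same 2**20 space:
full = expected_collisions(4096, 2**20)           # 8.0
quarter = expected_collisions(4096, 2**20, 1/4)   # 2.0
```

With a quarter of the space stored, the same 4096 hashes yield only a quarter of the expected collisions, which is the extra cost the q(n) curve is capturing.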