So, as expected (well, I had this same thought in the early days of the development of Equihash), memory bandwidth turns out to be the performance limiter in the end. And so you discovered quite quickly in your development cycle that you can run the solver significantly faster on a significantly faster memory architecture and memory bus, i.e., on a video card.
I know that 16-32 GB of memory is becoming commonplace, but it occurs to me that a new target for these kinds of proof-of-work algorithms could be disk storage, and this can be forced by requiring more than 32 GB of memory. Very few people have much more than 32 GB of memory in their system, whereas most have more than 32 GB of storage on disk. Presumably that would naturally make NVMe SSDs the vessel for this.
So, I have started reading into your algorithm, and it occurs to me that you can push this out of the memory-bandwidth box and onto disk simply by requiring significantly more than 8 GB of memory. Very few video cards have more than 8 GB of memory, so to be safe, targeting 16 GB would put it outside the range of GPU processing and bring it back down to the CPU. Pushing the memory requirement beyond 32 GB would shift the performance bottleneck onto disk.
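To make the target concrete, here is a rough back-of-envelope I sketched in Python on how the footprint scales with the edge-count parameter (EDGEBITS, if I'm reading your source correctly). The bytes-per-edge figures are my own guesses, not numbers taken from any actual miner:

```python
# Rough sketch, not a benchmark: solver memory footprint as a function of the
# edge-count parameter, assuming a fixed cost per edge. Both bytes-per-edge
# values below are my own guesses.
def solver_memory_gib(edgebits, bytes_per_edge):
    edges = 1 << edgebits              # number of edges doubles with each extra bit
    return edges * bytes_per_edge / 2**30

for edgebits in range(27, 36):
    lean = solver_memory_gib(edgebits, 0.125)   # ~1 bit per edge (guess at a lean bound)
    fat  = solver_memory_gib(edgebits, 8)       # ~8 bytes per edge (guess at a fast miner)
    print(f"edgebits={edgebits}: {lean:7.2f} GiB to {fat:7.2f} GiB")
```

The point being that each extra edge bit doubles the footprint, so only a handful of extra bits are needed to blow past the 8 GB of a typical video card and then the 32 GB of a typical desktop.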
I haven't read as deeply into the algorithm as I perhaps could have, but as I gather from an initial reading, you can force the memory requirement upwards by requiring longer cycles in the graph. 42 is a lot, but as you have found, around 3 GB is enough to store random graphs that yield this nice (funny) cycle length. However, what would happen if you raised the minimum to, say, 56 nodes in a cycle, or 64? I would think the odds of finding a solution would shrink by a power of two for each node added to the minimum cycle, more likely ending up powers of ten rarer than the 42-node solutions.
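To test that intuition I hacked together a toy simulation. It is emphatically not your miner: edge endpoints come from Python's random module rather than siphash, the graphs are tiny, and the cycle check just follows stored pointers to a root, the way your simple solver appears to. But it is enough to tally how often cycles of various lengths get closed:

```python
# Toy Monte Carlo, NOT the reference miner: random edge endpoints instead of
# siphash, tiny graphs, and a pointer-following cycle check. It tallies the
# lengths of the cycles that get closed while inserting edges.
import random
from collections import Counter, defaultdict

def cycle_lengths(num_edges, nodes_per_side, seed):
    """Insert random bipartite edges; return the length of every cycle closed."""
    rng = random.Random(seed)
    cuckoo = defaultdict(int)      # node -> node it points to; 0 means "no pointer"
    lengths = []
    for _ in range(num_edges):
        # u-side nodes are even, v-side odd; the +2/+3 keeps 0 free as the nil value
        u0 = 2 * rng.randrange(nodes_per_side) + 2
        v0 = 2 * rng.randrange(nodes_per_side) + 3
        us, vs = [u0], [v0]
        u, v = cuckoo[u0], cuckoo[v0]
        while u:                   # walk u's pointer chain to its root
            us.append(u); u = cuckoo[u]
        while v:                   # walk v's pointer chain to its root
            vs.append(v); v = cuckoo[v]
        if us[-1] == vs[-1]:       # same root: this edge closes a cycle
            nu, nv = len(us) - 1, len(vs) - 1
            skip = min(nu, nv)
            nu, nv = nu - skip, nv - skip
            while us[nu] != vs[nv]:    # advance in lockstep to where the paths meet
                nu += 1; nv += 1
            lengths.append(nu + nv + 1)
            continue               # like the simple solver: don't store this edge
        # different roots: reverse the shorter path and hang it off the other tree
        if len(us) < len(vs):
            for i in range(len(us) - 1, 0, -1):
                cuckoo[us[i]] = us[i - 1]
            cuckoo[u0] = v0
        else:
            for i in range(len(vs) - 1, 0, -1):
                cuckoo[vs[i]] = vs[i - 1]
            cuckoo[v0] = u0
    return lengths

tally = Counter()
for trial in range(20):            # 20 toy graphs of 2^16 edges each
    tally.update(cycle_lengths(1 << 16, 1 << 16, seed=trial))
print(sorted(tally.items()))
```

Running it with larger graphs or more trials should show whether 56- or 64-cycles really are orders of magnitude rarer than 42-cycles or only somewhat so; I'd trust your numbers over this toy either way.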
As an interesting aside, these kinds of large graph tables are routinely used in rasterisation and ray-tracing algorithms; the former has pretty much been maxed out, to the point that photorealism requires the equivalent of a pair of 1080 Tis at anything above 3840x2160 resolution and anything more than 60 frames per second.
I am looking into this because I am seeking a PoW algorithm that will not fall to GPUs within even 12 months. So I am going to explore Cuckoo Cycle, but with a massively increased requirement on the number of nodes forming a cycle in the graph generated from the seed. I want to see if I can force my machine into swap and turn the NVMe drive into the primary bottleneck, which should drastically reduce the solution rate. At the same time, an NVMe drive is not that expensive, yet its bandwidth is well below that of the PCIe bus it sits on, and far below that of the memory bus.
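Before touching the solver itself, I want to measure the gap I'm hoping to lean on. Something like the throwaway probe below (my own script, nothing to do with your code): random 4 KiB reads from an in-memory buffer versus from a file on the NVMe drive accessed through mmap, which is roughly how swap or an explicitly disk-backed table would behave. The path and sizes are placeholders, and if the file fits in free RAM the page cache will hide most of the disk:

```python
# Throwaway probe: compare random 4 KiB reads from RAM vs from an mmap'd file
# on disk. Path and sizes are placeholders; page-cache effects will flatter the
# disk numbers unless the file is much larger than free RAM.
import mmap, os, random, time

CHUNK = 4096
N_READS = 50_000

def bench(read_at, size, label):
    rng = random.Random(1)
    t0 = time.perf_counter()
    total = 0
    for _ in range(N_READS):
        off = rng.randrange(size // CHUNK) * CHUNK
        total += len(read_at(off))
    dt = time.perf_counter() - t0
    print(f"{label}: {total / dt / 2**20:8.1f} MiB/s over {N_READS} random {CHUNK}-byte reads")

# In-memory case: a plain bytearray (shrink ram_size if 1 GiB is too much).
ram_size = 1 << 30
ram = bytearray(ram_size)
bench(lambda off: ram[off:off + CHUNK], ram_size, "RAM      ")

# On-disk case: a scratch file on the NVMe volume, accessed through mmap.
path = "scratch.bin"                   # placeholder: put this on the NVMe drive
file_size = 4 << 30                    # ideally several times larger than free RAM
block = os.urandom(1 << 20)            # write real data so reads aren't satisfied by holes
with open(path, "wb") as f:
    for _ in range(file_size >> 20):
        f.write(block)
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    bench(lambda off: mm[off:off + CHUNK], file_size, "NVMe mmap")
    mm.close()
os.remove(path)
```

Whatever ratio falls out of that is the margin an IO-hard parameterisation would be resting on.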
Onward and upward... to move from memory-hard to IO-hard
PS: I see that Cuckoo Cycle is an implementation of a graph-search puzzle: finding a cycle through a given number of nodes in a graph. It's quite interesting because path-finding puzzles of this kind were at the centre of what enabled internet routing. Such a search naturally requires a certain minimum of processing and memory, even over a static graph such as the (relatively static) set of internet routes. It occurs to me that a next generation beyond loading up storage bandwidth would be to bind the solution to a network graph, which would naturally increase latency and greatly reduce the throughput of finding solutions, though it also opens the door to Byzantine attacks that cut the paths and prevent solutions from being found, since solvers would depend on cooperative nodes responding to them. Just a thought.