As the developer of the Cuckoo Cycle proof of work, I'm trying to determine the maximum number of threads that can effectively work together on a single problem instance.
The largest machine I have access to has 32 threads (dual 8-core, hyperthreaded), which yields a speedup of 16.7 over single-threaded runs. I'm very curious to know how many threads it takes to saturate the memory I/O, so that additional threads bring no benefit.
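To put that figure in perspective, here is a minimal sketch that turns it into a parallel efficiency. The `efficiency` function is just an illustrative helper for this post, not something in the cuckoo repository.

```shell
# Hypothetical helper (not part of the cuckoo repo):
# divides a measured speedup by the number of execution units.
efficiency() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s / n }'
}

efficiency 16.7 32   # per hardware thread
efficiency 16.7 16   # per physical core (2 sockets x 8 cores)
```

That works out to roughly 0.52 per hardware thread and roughly 1.04 per physical core, so on this machine the speedup still tracks the physical core count fairly closely.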
The Makefile provided at https://github.com/tromp/cuckoo contains a basic speedup test
  make speedup.25
that only goes up to 8 threads, using 128MB instances (size 25).
If anyone has access to a Linux machine with more than 32 threads, could you please run a variation of that test with instances of size 28 (using 1GB) and as many threads as your system supports, and post a summary of your results?
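If you are not sure how many hardware threads your system exposes, something along these lines should tell you. It is only a sketch: it assumes GNU coreutils' `nproc` is installed and otherwise falls back to `getconf`; the variable name `max_threads` is my own choice.

```shell
# Number of processing units available to this process (GNU coreutils),
# falling back to the count of online processors reported by getconf.
max_threads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "hardware threads available: $max_threads"
```

The largest thread count worth testing is normally this number.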
For instance, if your system supports up to 60 threads, and you only want to try a subset of thread counts, you could run the following in bash:
for i in 1 2 8 16 32 48 52 56 60; do
  echo $i
  cc -o cuckoo.spd -DNTHREADS=$i -DSIZEMULT=1 -DSIZESHIFT=28 cuckoo.c -O3 -std=c99 -Wall -Wno-deprecated-declarations -pthread -l crypto
  time for j in {0..9}; do ./cuckoo.spd $j; done
done
Each single-threaded run at size 28 takes about half a minute, so the entire test above should take under 15 minutes.
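For the summary, a small helper like the one below may save some manual arithmetic. It is a sketch under my own assumptions: `speedup_table` is a hypothetical name, and it expects one "threads seconds" pair per line on standard input, with the 1-thread measurement on the first line.

```shell
# Hypothetical helper: reads "threads seconds" pairs on stdin, with the
# 1-thread measurement first, and prints speedup and per-thread efficiency.
speedup_table() {
    awk 'NR == 1 { base = $2 }
         { printf "%d threads: %.1fs, speedup %.2f, efficiency %.2f\n",
                  $1, $2, base / $2, base / $2 / $1 }'
}
```

Feed it the `real` times reported by `time`, converted to seconds, one line per thread count.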
Any help is appreciated!