Author

Topic: Pollard's kangaroo ECDLP solver - page 109. (Read 55445 times)

member
Activity: 144
Merit: 10
June 02, 2020, 02:11:58 PM

How many Chrome tabs can you open with 4TB or RAM Smiley

Not on a single server, a distributed hash table on 8 servers with 512GB per server.
full member
Activity: 1050
Merit: 219
Shooters Shoot...
June 02, 2020, 02:11:08 PM
DP count is most important. You are running at DP 25, so it will take 2^57 (whatever expected group ops is for 115) - 2^25 (your DP setting) so roughly you want your DP count up to 2^32 to be getting close to solving.

I have files for DP 30.  So I need to get close to 2^27 to be getting close to solving. 2^57(expected ops) - 2^30(my DP setting) = 2^27 DP count .

To be precise, 2^57(expected ops) / 2^30(my DP setting) = 2^(57-30) = 2^27 DP count .

There are 3 phases in this algorithm:

phase 1)

to generate a collision, you need to form on average N couples of points (T, W) to get 1 couple with the same point, a collision (exploiting the birthday paradox)

that because in a space of N points there are N possible couples with the same point among N^2 possible couples, then the chance to get a collision each time you generate a couple (T, W) is N/N^2 = 1/N ,  thefor you need N tries to get a collision.

To form N couple, you need 2 lists of sqrt(N) points each (because sqrt(N) * sqrt(N) = N)

This is the heaviest part of the algorithm, 2*sqrt(N) steps; thefor the idea is to parallelize this task, and at this stage parallelization is ok, especially for the gpus.  


phase 2)

you cannot compare each couple of points you generate, because there are N possible combinations before to get a collision, too much;  

the idea is to choose each jump in such a way that the next point depends only by the current point, this trick makes an occasional collision between 2 kangaroos permanent and we don't need to check every point against all the others to know if a collision has happened.

To 'fix' a delay between the moment of the collision (after about 2*sqrt(N) steps) and the moment of the detection of the collision we store the distinguished points. On average a 30 bit distinguished point is met each 2^30 steps. Only then we can know if that kangaroo has collide with another one.

These phase is strange: if you are using a cpu, a delay of 2^30 steps is nothing, a single core of a cpu can perform 2^30 steps in a few minutes.

But for the slow gpu is not the same, like BitCrack noted:

That 2^27 figure assumes the average walk length is 2^30. The GPU works by doing many slow walks in parallel e.g. 60 million walks that do 20 iterations per second. At that rate, it will take 2^30 / 20 seconds = 1.7 years before any of the walks are 2^30. Your DP count is going to be a few powers of 2 higher than 27.

On the other hand, the choice of the distinguished points is not only about a delay, it is about the storage too, because we can't store 2*sqrt(N) points (if we could, we would use the BSGS algorithm, that is faster and that finds the key surely).


With a high DP value we realize a low frequent sampling, then we have few samples to store in the hash table, but we waste a lot of time with the gpu (overhead).

With a lower DP value we realize a high frequent sampling, but we have to deal with the limit of our RAM.

You can see at the DP value at this way too: how much you need to reduce the points (among the 2*sqrt(N) points generated) to put in the hash table ?

DP = 2^30 means you choose to reduce by a factor of 2^30 the storage of the 2^57 points needed the get a collision (2^57 / 2^30 = 2^27 DP in the hash table). But in this way you delay by 2^30 steps the detection of the collision (and you increase by a factor of 2^30  the waste of the computation power, 2^30 steps * number of kangaroos in parallel, all these steps are useless because they are realized after the collision has already happened)

In the Zielar's case, from what I understood, there were about 2^33 kangaroos in parallel and DP = 22, N = 2^109 then:

total steps:  2*sqrt(2^109) = 2^55.5

hash table: 2^55.5 / 2^22 = 2^33.5

number of steps generated by each computing unit in parallel: 2^55.5 / 2^33 = 2^22.5, then each computing unit has generated on average 1,4 kangaroo (1,4 walk from start to the end point, DP)  


phase 3)

When we know which wild kangaroo collides, if we don't know the distance covered by that kangaroo we only need to redo the steps of that kangaroo until the DP, and with a cpu this work can be rapidily finished.
So the reason why the GPUs find solution faster, is by the sheer number of Kangaroos they bring to the hunt? Example, I can find solution 50 times faster with GPU versus CPU, but GPU has more than 50 times the amount of Kangaroos. Something like that?
full member
Activity: 1050
Merit: 219
Shooters Shoot...
June 02, 2020, 02:06:49 PM
This is a race, we keep few settings secret.
We will probably end to fight after #120.
However now news from Zielar, he is probably sleeping Cheesy

Good luck and you guys have way more computing power that you are willing to commit. Cool

I am going straight for the knowledge, world record, and bragging rights for secp256k1 130-bit private key (129-bit interval). But I only have 4TB of RAM and still working on how to minimize storage below 16bytes/point with data compression techniques. Grin

How many Chrome tabs can you open with 4TB or RAM Smiley

full member
Activity: 1050
Merit: 219
Shooters Shoot...
June 02, 2020, 02:05:31 PM
Can you explain yourself better? What is 'Ygrid' ?

Ygrid is the second coordinate of the GPU thread array.
It affects performance. This is very useful for linear algebra because you can use thread coordinates to make fast matrix calculus.
In our case, this is just used to tune performance.

I was in weekend, Yesterday was also a day off in France, I restart the work today. I also have professional work to perform but I will continue to make update on github when I can for everybody. I will try to integrate ASAP the mods from  PatatasFritas and the support of -ws for the client mode. News from AndrewBrz who is progressing well with the OpenCL kernel.

Zielar solved #110 with official 1.8 and 1.9alpha for merging.

It'll be interesting to see what my Vega VIIs can do. They murder all the RTX cards when mining ethash (90 to 100 Mh/s compared to 40-55 Mh/s). I know mining is different but we shall see how they perform when it comes to this program. Hopefully Andrew has success, at least for himself.
Someone is working on opencl support ?
Yes, that link JLP posted to his Github, under the Issues tab, Optimization subject...AndrewBrz(?). He posted a pic of the  progress he has made so far.
member
Activity: 144
Merit: 10
June 02, 2020, 02:04:22 PM
This is a race, we keep few settings secret.
We will probably end to fight after #120.
However now news from Zielar, he is probably sleeping Cheesy

Good luck and you guys have way more computing power that you are willing to commit. Cool

I am going straight for the knowledge, world record, and bragging rights for secp256k1 130-bit private key (129-bit interval). But I only have 4TB of RAM and still working on how to minimize storage below 16bytes/point with data compression techniques. Grin
member
Activity: 814
Merit: 20
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
June 02, 2020, 01:49:05 PM

We will probably end to fight after #120.



 Grin Grin Grin Grin Grin
sr. member
Activity: 462
Merit: 696
June 02, 2020, 01:47:26 PM
This is a race, we keep few settings secret.
We will probably end to fight after #120.
However now news from Zielar, he is probably sleeping Cheesy
legendary
Activity: 1914
Merit: 2071
June 02, 2020, 01:34:37 PM
Zielar solved #110 by merging 2 files of 2^29.55 DP each = 2^30.55 => DP25 => total of 2^(30.55 + 25) operations = ~2^(109/2+1) , a little bit after the average.

Ok, then:

total steps:  2^55.55 = 1.035 * 2*sqrt(N)  (3,5% after the average)

hash table: 2^55.55 / 2^25 = 2^30.55

how many kangaroos were used in parallel?

I suppose about 2^30, probably more; that means that each computing unit has generated only 1 DP on average?
sr. member
Activity: 462
Merit: 696
June 02, 2020, 01:25:36 PM
Zielar solved #110 by merging 2 files of 2^29.55 DP each = 2^30.55 => DP25 => total of 2^(30.55 + 25) operations = ~2^(109/2+1) , a little bit after the average.

legendary
Activity: 1914
Merit: 2071
June 02, 2020, 01:17:25 PM
Someone can confirm these numbers? I'm curious:


In the Zielar's case, from what I understood, there were about 2^33 kangaroos in parallel and DP = 22, N = 2^109 then:

total steps:  2*sqrt(2^109) = 2^55.5

hash table: 2^55.5 / 2^22 = 2^33.5

number of steps generated by each computing unit in parallel: 2^55.5 / 2^33 = 2^22.5, then each computing unit has generated on average 1,4 kangaroo (1,4 walk from the start to the end point, DP) before to get a collision

How many steps to retrieve #110?
member
Activity: 144
Merit: 10
June 02, 2020, 01:12:52 PM
@arulbero

Well said, the information above should be very useful for those new to the area.

Also as it relates to FPGAs, those are more suited for binary curves; and GPUs such as RTX 2080Tis, RX Vega 64, and the likes are well suited for prime curves $ for $.

If you would like to play around with FPGAs and ECDSA (secp256k1), everything you need to start is right here:
https://github.com/ZcashFoundation/zcash-fpga
legendary
Activity: 1914
Merit: 2071
June 02, 2020, 12:47:01 PM
DP count is most important. You are running at DP 25, so it will take 2^57 (whatever expected group ops is for 115) - 2^25 (your DP setting) so roughly you want your DP count up to 2^32 to be getting close to solving.

I have files for DP 30.  So I need to get close to 2^27 to be getting close to solving. 2^57(expected ops) - 2^30(my DP setting) = 2^27 DP count .

To be precise, 2^57(expected ops) / 2^30(my DP setting) = 2^(57-30) = 2^27 DP count .

There are 3 phases in this algorithm:

phase 1)

to generate a collision, you need to form on average N couples of points (T, W) to get 1 couple with the same point, a collision (exploiting the birthday paradox)

that because in a space of N points there are N possible couples with the same point among N^2 possible couples, then the chance to get a collision each time you generate a couple (T, W) is N/N^2 = 1/N ,  thefor you need N tries to get a collision.

To form N couple, you need 2 lists of sqrt(N) points each (because sqrt(N) * sqrt(N) = N)

This is the heaviest part of the algorithm, 2*sqrt(N) steps; thefor the idea is to parallelize this task, and at this stage parallelization is ok, especially for the gpus.  


phase 2)

you cannot compare each couple of points you generate, because there are N possible combinations before to get a collision, too much;  

the idea is to choose each jump in such a way that the next point depends only by the current point, this trick makes an occasional collision between 2 kangaroos permanent and we don't need to check every point against all the others to know if a collision has happened.

To 'fix' a delay between the moment of the collision (after about 2*sqrt(N) steps) and the moment of the detection of the collision we store the distinguished points. On average a 30 bit distinguished point is met each 2^30 steps. Only then we can know if that kangaroo has collide with another one.

These phase is strange: if you are using a cpu, a delay of 2^30 steps is nothing, a single core of a cpu can perform 2^30 steps in a few minutes.

But for the slow gpu is not the same, like BitCrack noted:

That 2^27 figure assumes the average walk length is 2^30. The GPU works by doing many slow walks in parallel e.g. 60 million walks that do 20 iterations per second. At that rate, it will take 2^30 / 20 seconds = 1.7 years before any of the walks are 2^30. Your DP count is going to be a few powers of 2 higher than 27.

On the other hand, the choice of the distinguished points is not only about a delay, it is about the storage too, because we can't store 2*sqrt(N) points (if we could, we would use the BSGS algorithm, that is faster and that finds the key surely).


With a high DP value we realize a low frequent sampling, then we have few samples to store in the hash table, but we waste a lot of time with the gpu (overhead).

With a lower DP value we realize a high frequent sampling, but we have to deal with the limit of our RAM.

You can see at the DP value at this way too: how much you need to reduce the points (among the 2*sqrt(N) points generated) to put in the hash table ?

DP = 2^30 means you choose to reduce by a factor of 2^30 the storage of the 2^57 points needed the get a collision (2^57 / 2^30 = 2^27 DP in the hash table). But in this way you delay by 2^30 steps the detection of the collision (and you increase by a factor of 2^30  the waste of the computation power, 2^30 steps * number of kangaroos in parallel, all these steps are useless because they are realized after the collision has already happened)

In the Zielar's case, from what I understood, there were about 2^33 kangaroos in parallel and DP = 22, N = 2^109 then:

total steps:  2*sqrt(2^109) = 2^55.5

hash table: 2^55.5 / 2^22 = 2^33.5

number of steps generated by each computing unit in parallel: 2^55.5 / 2^33 = 2^22.5, then each computing unit has generated on average 1,4 kangaroo (1,4 walk from start to the end point, DP)  


phase 3)

When we know which wild kangaroo collides, if we don't know the distance covered by that kangaroo we only need to redo the steps of that kangaroo until the DP, and with a cpu this work can be rapidily finished.
member
Activity: 814
Merit: 20
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
June 02, 2020, 12:22:39 PM
As said by MrFreeDragon this kind of ASIC is dedicated to hashing only and Kangaroo has nothing to do with hashing. To my knowledge no ASIC exist for ECDSA calculation...


High-Performance CUDA Kernel Execution on FPGAs/ p.s.  FPGA 4xFaster then CUDA GPU

https://cadlab.cs.ucla.edu/~cong/papers/FCUDA_extAbstract_ICS09_final3.pdf

Jast for Ex. There are many prommable ASICS and ASICS with FPGA

FPGA Miner https://www.ebay.com/itm/BlackMiner-F1-FPGA-Better-than-Bitmain-Antminer-ASIC-UK-In-Hand-w-PSU-and-SD/124207553835?hash=item1ceb58dd2b:g:nMgAAOSwT9lcq7AE

"FPGAs can be uploaded with a public or a private bitstream(s) so that you can mine a new algorithm (you can't do this with ASICs). "

If the price of equipment and cost of technology implementation does not important for you, better have a look at quantum computers. Instead of normal bits (with 2 possible values 0 or 1) they operate with quibits: could be represented as a sphere with many many many different states.

How do you think, will quantum computer will be faster than ASIC?  Wink

Qantum computer now for send signal to enoter galxy ))) No, notcommutation this is intresting for education, because not all peoe know about 2+3<>3+2 (If you cnoq quantums I thik you know  what in qant wolrd 2+3 qantr not equel 3+2 qants !!!) F.... But, my exact opinion, GPU for making 10trillions operation is not so good like ald ASIC with FPGA.

So, in my opinion only, most wanted targets for Kangaroo progect is a cilen-server(witjout bugs and fine worked) and FPGA-ASIC''S

And I think interesting idea to implement Hensel's Lift like in this code to kangaroo for finding ranges for ex. https://github.com/elliptic-shiho/ecpy/blob/ccdb872124ca2c218b8a7261a2956efd5ec83705/ecpy/elliptic_curve/sssa_attack.py#L1
sr. member
Activity: 443
Merit: 350
June 02, 2020, 12:06:50 PM
As said by MrFreeDragon this kind of ASIC is dedicated to hashing only and Kangaroo has nothing to do with hashing. To my knowledge no ASIC exist for ECDSA calculation...


High-Performance CUDA Kernel Execution on FPGAs/ p.s.  FPGA 4xFaster then CUDA GPU

https://cadlab.cs.ucla.edu/~cong/papers/FCUDA_extAbstract_ICS09_final3.pdf

Jast for Ex. There are many prommable ASICS and ASICS with FPGA

FPGA Miner https://www.ebay.com/itm/BlackMiner-F1-FPGA-Better-than-Bitmain-Antminer-ASIC-UK-In-Hand-w-PSU-and-SD/124207553835?hash=item1ceb58dd2b:g:nMgAAOSwT9lcq7AE

"FPGAs can be uploaded with a public or a private bitstream(s) so that you can mine a new algorithm (you can't do this with ASICs). "

If the price of equipment and cost of technology implementation does not important for you, better have a look at quantum computers. Instead of normal bits (with 2 possible values 0 or 1) they operate with quibits: could be represented as a sphere with many many many different states.

How do you think, will quantum computer will be faster than ASIC?  Wink
member
Activity: 313
Merit: 34
June 02, 2020, 11:43:21 AM
As said by MrFreeDragon this kind of ASIC is dedicated to hashing only and Kangaroo has nothing to do with hashing. To my knowledge no ASIC exist for ECDSA calculation...


High-Performance CUDA Kernel Execution on FPGAs/ p.s.  FPGA 4xFaster then CUDA GPU

https://cadlab.cs.ucla.edu/~cong/papers/FCUDA_extAbstract_ICS09_final3.pdf

there is more then 35 company asic devices, min 10 company ASIC/FPGA devices and developer fee you can pay to developer for your personal needs, thats best solution

You can start go to this 35 companys )))) Your post is a joke I think...
2 most senior developer explain all facts and figure about your Q in details, and you still unable to understand, i think they are trying to educate but joker love to listen jokes only
member
Activity: 814
Merit: 20
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
June 02, 2020, 11:31:23 AM
As said by MrFreeDragon this kind of ASIC is dedicated to hashing only and Kangaroo has nothing to do with hashing. To my knowledge no ASIC exist for ECDSA calculation...


High-Performance CUDA Kernel Execution on FPGAs/ p.s.  FPGA 4xFaster then CUDA GPU

https://cadlab.cs.ucla.edu/~cong/papers/FCUDA_extAbstract_ICS09_final3.pdf

there is more then 35 company asic devices, min 10 company ASIC/FPGA devices and developer fee you can pay to developer for your personal needs, thats best solution

You can start go to this 35 companys )))) Your post is a joke I think...
jr. member
Activity: 30
Merit: 122
June 02, 2020, 11:31:03 AM
I think some people are drastically underestimating the cost of producing an ASIC. ASICs cost millions of dollars to design and manufacture.

FPGAs are a possibility. I'm not sure how much faster an FPGA will be than a GPU though. You will basically be re-creating the hardware multipliers that already exist on the GPU.
member
Activity: 313
Merit: 34
June 02, 2020, 11:30:11 AM
As said by MrFreeDragon this kind of ASIC is dedicated to hashing only and Kangaroo has nothing to do with hashing. To my knowledge no ASIC exist for ECDSA calculation...


High-Performance CUDA Kernel Execution on FPGAs/ p.s.  FPGA 4xFaster then CUDA GPU

https://cadlab.cs.ucla.edu/~cong/papers/FCUDA_extAbstract_ICS09_final3.pdf

there is more then 35 company asic devices, min 10 company ASIC/FPGA devices and developer fee you can pay to developer for your personal needs, thats best solution
full member
Activity: 204
Merit: 437
June 02, 2020, 11:26:32 AM
IMO OpenCL on AMD would work only if written in assembler. I lost the race for #90 because of it. One week of random hard lock-up. And the worst. Wrong computations. Totally unusable. (It might have a chance to work properly with newer hardware, and a specific driver version, but I highly doubt.)

I looked last year at FPGA, and it will be slower than current top-end GPUs, costing much more for the same performance (per chip).

As for an ASIC, if it ever is done, I expect 3x performance per chip price compared to GPU. The performance per watt would be way better, maybe 50x. It might be feasible only after bitcoin price goes above $5M, making new chip costs millions. There's a huge disadvantage - unlike mining chips it would be used only one-two times.
member
Activity: 814
Merit: 20
$$P2P BTC BRUTE.JOIN NOW ! https://uclck.me/SQPJk
June 02, 2020, 11:23:45 AM
As said by MrFreeDragon this kind of ASIC is dedicated to hashing only and Kangaroo has nothing to do with hashing. To my knowledge no ASIC exist for ECDSA calculation...


High-Performance CUDA Kernel Execution on FPGAs/ p.s.  FPGA 4xFaster then CUDA GPU

https://cadlab.cs.ucla.edu/~cong/papers/FCUDA_extAbstract_ICS09_final3.pdf

Jast for Ex. There are many prommable ASICS and ASICS with FPGA

FPGA Miner https://www.ebay.com/itm/BlackMiner-F1-FPGA-Better-than-Bitmain-Antminer-ASIC-UK-In-Hand-w-PSU-and-SD/124207553835?hash=item1ceb58dd2b:g:nMgAAOSwT9lcq7AE

"FPGAs can be uploaded with a public or a private bitstream(s) so that you can mine a new algorithm (you can't do this with ASICs). "
Jump to: