Pages:
Author

Topic: Science Fair Project to trap Bitcoin private keys using Kangaroos! - page 4. (Read 7850 times)

legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
Thanks for all your input.  We are reading it over carefully to see what we can use in our project and will get back to you.

We have already rewritten the following functions to make them as fast as possible:

static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn)
SECP256K1_INLINE static int secp256k1_scalar_reduce(secp256k1_scalar *r, unsigned int overflow)
static int secp256k1_scalar_add(secp256k1_scalar *r, const secp256k1_scalar *a, const secp256k1_scalar *b)

We are working on rewriting:

static void secp256k1_gej_add_ge(secp256k1_gej *r, const secp256k1_gej *a, const secp256k1_ge *b)

Our internal function

Code:
#define ec_point_mul_and_add(r,a,n)    
{
    secp256k1_gej rj;
    secp256k1_ecmult_gen(&ecgrp->ecmult_gen_ctx, &rj, n);  // highly optimized
    secp256k1_gej_add_ge_var(&rj, &rj, a, NULL); // optimization is work in progress
    secp256k1_ge_set_gej_var(r, &rj);
}

We heavily rely on and use the following observation to inform our parameter choices:

Code:
If the PRF is defined as follows:

    x = 1 << (X % m)

The PRF will generate one of the numbers: 2**0, 2**1, .., 2**(m-1) with equal probability

So the average hop distance as a function of m can be calculated as follows:

    ahd = (1/m)(2**0) + (1/m)(2**1) + .. + (1/m)(2**(m-1))
        = (1/m)[(2**0) + (2**1) + .. + (2**(m-1))]
        = (1/m)[(2**m) - 1]
        = ((2**m) - 1)/m

Therefore the entire distance hopped by the kangaroo, given nn hops, can be estimated as:

    c = nn(ahd)
      = (nn/m)((2**m) - 1)


jr. member
Activity: 38
Merit: 18
The claimed avg "expected of 2w^(1/2) group operations" is not achievable with m = w^(1/2))/2 (mean jump size from acticles by Pollard)
Empirical tests have shown that 2w1/2 is achievable at m * 8.
I ask you to check on your implementation what speed will be if the MID/MAX step size is increased by 8 times, plz.
thx, no need
all norm after add
Code:
# get JmaxofSp
def getJmaxofSp(optimalmeanjumpsize, dS):
if flag_verbose > 0:
print('[optimal_mean_jumpsize] %s' % optimalmeanjumpsize)

sumjumpsize = 0

for i in range(1,len(dS)):

#sumjumpsize = (2**i)-1
#sumjumpsize += 2**(i-1)
sumjumpsize += dS[i-1]

now_meanjumpsize = int(round(1.0*(sumjumpsize)/(i)))

#next_meanjumpsize = int(round(1.0*(sumjumpsize+2**i)/(i+1)))
next_meanjumpsize = int(round(1.0*(sumjumpsize+dS[i])/(i+1)))

if flag_verbose > 1:
print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize ))


if  optimalmeanjumpsize - now_meanjumpsize <= next_meanjumpsize - optimalmeanjumpsize :
if flag_verbose > 0:
print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize ))

if flag_verbose > 0:
print('[JmaxofSp] Sp[%s]=%s nearer to optimal mean jumpsize of Sp set' % (i, now_meanjumpsize))

return i

print("\n[FATAL_ERROR] JmaxofSp not defined!\n"); exit(-1)

#################
about speed

We have updated the OP with our latest results.  Here is a summary of the results of runs for tests 0 through 49:...
That is already good result!
no, absolutely not

August 26, 2019
Code:
pkey    Standard OpenSSL library     secp256k1 (S 8X32, F 10X26)
bits   hops/second   test run time   hops/second   test run time
====   ===========================   ===========================
  31   1942.378857   28.4496 secs    5520.634781   10.0240 secs
Standard OpenSSL library averaged 1897 hops per second    
secp256k1 (S 8X32, F 10X26) averaged 5258 hops per second  

September 09, 2019 (secp256k1)
Code:
test       time         hps       hops
----  --------------  -------  ----------
31    9.2328 secs    11,884  109705
already better, but..

it very slow for 1core C/C++ openssl/secp256k1, it should be more!
(even if u built early classic 1 Wild algo with 3.28w^(1/2), this does not explain x10 speed drop)

Are you using affine or jacobian coordinates for the points?
42: Answer to the Ultimate Question of Life, the Universe, and Everything"

secp256k1 have functions J+J->J , J+A->J , and no have functions A+A->A (Is this a precisely optimized library for bitcoin? Smiley)
(all simple, because in real often need multiply rand point instead adding, and jacobian is better for multiply)
J+J->J (12M, 4S) - bad choice
J+A->J (8M, 3S) - already better
(look eprint.iacr.org/2016/103.pdf)
you can take the code from break_short by Jochen Hoenicke, it fits perfectly.
this code uses J+A->J and convert only X to Affine xJ ->xA (its max what we have in secp256k1 as it is)
where A is S-table with pre-computed G,2G,4G,8G,16G..
..u used pre-computed, y? not say me what u calc every Si permanently used multiplication, nonono,plz,no! X%mask instead pow2(X%k)?
(..by the way, that would explain your terrible speed..)
..and use old alternatively functions without protection against side channel attacks, +10-20% speed
secp256k1_gej_add_ge_var() instead secp256k1_gej_add_ge()
(warn, _var() need normalize before use in some functions.. my early mistake, "leading zeros" github.com/bitcoin-core/secp256k1/issues/601)
in ideal - need write new functions:
A+A->A (1I, 2M, 1S) - best choice, because in the end we need X in affine for unique representation each point to adding in storage hashtable
..u used hashtable, y?)
after which do not need to spend 8M,3S, only nearly 20M to each 1I, it +35% speed-up
like that

########
I seem to know what the problem is.
This is a very obvious fact for me, I should have started with it.
(I should have guessed at a time when about ECMULT_WINDOW_SIZE)

No multiplies the points inside the loop!
I warn, multiplication points is very expensive, more expensive than the slowest addition points.
Shouldn't use it at all in multi-million integration cycles.
You must calculate all possible displacement points before the start of the cycle.
Then just add them, being inside cycle.

I think you are using a mask
https://bitcointalksearch.org/topic/m.52068840
..so you can’t store several million pre-calculated points in memory. it explains your choice.
Quote
   x = PRF(X)
      = 1 << (X % m)
...
ec_point_mul_and_add(r,a,n)
not mask, i see,.. then simple, u tried built classic pattern of a*b+c

Quote
..rewritten the following functions..
I tell you - take loop from break_short as it is (with standart functions), rewrite to kangaroos, and benchmark it.
if after you still want to rewrite what, then ok (but i think what not)
I think the "secret sauce" will not create, it will turn out an exotic slow implementation through multiplication.
I would like to be mistaken, but my experience says that you lose time.
..How do I know about multiplication?.. I checked it myself,.. and from the topic about why brainflayer is slow and why it cannot be done faster,.. and it was on one of the topics pages LBC.
This is not the most famous fact, it is not surprising that you did not know about it, no fault.

Its not random, not brainflayer,  its pseudo random walk, so we can pre-compute our jumps size (offset points) for using addition points instead multiplication points.
jr. member
Activity: 59
Merit: 3
We have updated the OP with our latest results.  Here is a summary of the results of runs for tests 0 through 49:...
That is already good result!
When will be the first release?
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
We have updated the OP with our latest results.  Here is a summary of the results of runs for tests 0 through 49:

Code:
test       time         hps       hops       compares    mem(bytes)    total time     time hopping   not hopping
----  --------------  -------  ----------  ------------  ----------  --------------  --------------  -----------
   0    0.3237 msecs   10,504  1           3             1032                   323              95          228
   1    0.2853 msecs   11,062  1           3             1032                   285              90          194
   2    0.2785 msecs   11,111  1           3             1032                   278              90          188
   3    1.8730 msecs   11,271  8           10            1032                 1,873             709        1,163
   4    1.8351 msecs   11,147  11          13            1032                 1,835             986          848
   5    2.1243 msecs   11,123  15          17            1032                 2,124           1,348          775
   6    5.2260 msecs   11,069  52          54            1032                 5,226           4,697          528
   7    3.9406 msecs   11,065  32          34            1032                 3,940           2,891        1,048
   8    4.6425 msecs   10,174  45          47            1032                 4,642           4,423          219
   9   46.8674 msecs   11,052  510         512           1032                46,867          46,146          721

  10   84.2980 msecs   10,674  893         895           1032                84,298          83,661          636
  11  122.2013 msecs   11,662  1413        1415          1032               122,201         121,162        1,039
  12  257.2438 msecs   11,579  2976        2978          1032               257,243         257,019          224
  13  514.6517 msecs   11,402  5840        5842          1032               514,651         512,196        2,455
  14  501.1961 msecs   11,791  5901        5903          1032               501,196         500,466          729
  15    1.1839 secs    11,886  14026       14028         1032             1,183,908       1,180,079        3,829
  16    3.0370 secs    11,631  35249       35251         1032             3,037,005       3,030,586        6,419
  17    5.6526 secs    11,230  63475       63477         1032             5,652,638       5,652,025          613
  18   14.0759 secs    11,847  166753      166755        1032            14,075,930      14,075,708          222
  19   15.5404 secs    11,922  185259      185261        1032            15,540,417      15,539,472          945

  20  742.1446 msecs   11,882  8804        5677          32088              742,144         740,944        1,200
  21  475.1973 msecs   11,881  5627        1903          32696              475,197         473,623        1,574
  22  561.2366 msecs   11,555  6359        3169          33912              561,236         550,304       10,932
  23  428.6246 msecs   11,867  5044        1167          36344              428,624         425,047        3,577
  24    2.2156 secs    11,842  26026       22985         41208            2,215,606       2,197,825       17,780
  25  350.9793 msecs   11,864  4025        695           50936              350,979         339,275       11,704
  26  650.9386 msecs   11,695  7326        4443          70392              650,938         626,434       24,504
  27   11.2562 secs    11,793  132214      128675        109304          11,256,223      11,211,378       44,845
  28    4.5728 secs    11,819  52883       49679         187128           4,572,770       4,474,294       98,476
  29    4.2294 secs    11,907  48300       45216         342776           4,229,449       4,056,522      172,927

  30   10.6122 secs    11,759  124766      56010         62808           10,612,244      10,610,509        1,735
  31    9.2328 secs    11,884  109705      41191         63416            9,232,760       9,231,168        1,592
  32   14.3117 secs    11,922  170592      103053        64632           14,311,717      14,309,552        2,165
  33   55.8702 secs    11,902  664893      596368        67064           55,870,151      55,862,085        8,066
  34   22.1413 secs    11,923  263922      195453        71928           22,141,326      22,135,223        6,103
  35    1.0531 mins    11,927  753508      683623        81656           63,186,883      63,175,281       11,602
  36   28.3459 secs    11,924  337734      269198        101112          28,345,873      28,323,662       22,210
  37    9.2387 mins    11,860  6573832     6505951       140024         554,319,152     554,275,660       43,492
  38   11.2025 secs    11,926  132534      64338         217848          11,202,535      11,113,380       89,154
  39    4.5557 mins    11,925  3257543     3189102       373496         273,342,973     273,170,580      172,393

  40   13.1216 mins    11,927  9389808     7285993       124248         787,294,602     787,289,468        5,133
  41   10.3313 mins    11,929  7394352     5292081       124856         619,880,117     619,878,329        1,787
  42    3.8015 mins    11,935  2722138     620876        126072         228,090,413     228,087,570        2,842
  43    5.7648 mins    11,924  4124314     2019820       128504         345,888,060     345,884,231        3,828
  44    3.2043 mins    11,930  2293481     193018        133368         192,258,803     192,252,406        6,397
  45   33.9166 mins    11,919  24254078    22153676      143096       2,034,994,703   2,034,982,887       11,815
  46   47.9810 mins    11,922  34321486    32221012      162552       2,878,859,972   2,878,832,117       27,854
  47    2.4000 hours   11,918  102968649   100868429     201464       8,639,864,920   8,639,820,958       43,962
  48    4.1320 hours   11,924  177371081   175268406     279288      14,875,131,910  14,875,045,149       86,760
  49   23.9987 hours   11,914  1029325281  1027225432    434936      86,395,345,966  86,395,170,751      175,214
      ==============   ======                                        ==============  ==============  ===========
                       11,630                                       118,106,695,147 118,105,560,480    1,134,666
Where
Code:
test          test number
time          run time needed to find the private key
hps           hops per second (averaged 11,630 on my laptop with a highly modified secp256k1 library)
hops          total number of hops it took to find the private key
cmps          total number of point comparisons it took to find the private key
mem           amount of memory dynamically allocated during the run
total time    total run time in msec
time hopping  total time spent hopping in msec
not hopping   the overhead (time not hopping) in msec
member
Activity: 245
Merit: 17

I have tried the python code on linux and windows with or without gmpy2.
I was able to retrieve all known private keys (up to case 100) with linux python.
However when using windows things go wrong beyond case 59.
For exemple
case 60 ... the private key should be  0xFC07A1825367BDA   but I get 0xFC07A1825367BXX  with  XX a random number
case 61 ... the private key should be  0x13C96A3742F64906  but I get 0x13C96A3742F64XXX with XXX a random number
case 63 ... the private key should be  0x7CCE5EFDACCF6808 but I get 0x7CCE5EFDACCF6XXX with XXX a random number
etc ...

This bug persists even without gmpy2 !!!

I am working on it  Grin


What hardware do you use and how long it took to crack 100?

To check all known private keys, I changed the code to search within a given range [a, b], not necessarily the whole range. For instance, private key for case 100 is  0xaf55fc59c335c8ec67ed24826, so if I run my modified python script for range [ 0xaf55fc59c335c8e0000000000, 0xaf55fc59c335c8f0000000000 ], I get the answer in less than half a minute.


like to share modifed script for search range ?


See here:  https://bitcointalksearch.org/topic/m.52375040
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
I am up for testing also? It is live on GitHub yet?
Hello, I am glad to participate in this project.
Regarding the Elliptic Curve, I think there is a weakness in the current Algorithm.
Also, I'm interested in testing... the fpga port.
I realise that is a far way off, and I'm willing to help with the development if you and your daughter are open to the idea.
You have been added to the list of possible testers.  See the OP of this thread.

Whatup whatup, all doing their own test, nice to have something to test on a pkey or address,
how should i know how it works, i don't know, any news fpga or gpu or cpu dont matter, i'm in,
systems ready, got two days off,
so love peace out,
You are already on the list.  We are still working on our first release of code.
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
Does anyone know why this post has been erased? it is still here because "brainless" quoted it?   

Yes, your entire conversation is still in the thread, again:

Please note that since this thread will be used as reference material in a science fair project we delete most posts in this thread just to keep it clean.

In other words, once a post is quoted then we delete the original post.  

Your posts will all still be in the thread.

The thread stays shorter and cleaner this way.

I hope this is not an issue for anyone.  It is just to keep the thread as short and clean as possible.  It is not because we do not appreciate your posts.

We do!

Thanks!
member
Activity: 348
Merit: 34

I have tried the python code on linux and windows with or without gmpy2.
I was able to retrieve all known private keys (up to case 100) with linux python.
However when using windows things go wrong beyond case 59.
For exemple
case 60 ... the private key should be  0xFC07A1825367BDA   but I get 0xFC07A1825367BXX  with  XX a random number
case 61 ... the private key should be  0x13C96A3742F64906  but I get 0x13C96A3742F64XXX with XXX a random number
case 63 ... the private key should be  0x7CCE5EFDACCF6808 but I get 0x7CCE5EFDACCF6XXX with XXX a random number
etc ...

This bug persists even without gmpy2 !!!

I am working on it  Grin


What hardware do you use and how long it took to crack 100?

To check all known private keys, I changed the code to search within a given range [a, b], not necessarily the whole range. For instance, private key for case 100 is  0xaf55fc59c335c8ec67ed24826, so if I run my modified python script for range [ 0xaf55fc59c335c8e0000000000, 0xaf55fc59c335c8f0000000000 ], I get the answer in less than half a minute.


like to share modifed script for search range ?
member
Activity: 245
Merit: 17
Terminology for the science fair project

-------------------------
-------------------------


0x034AF4B81F8C450C2C870CE1DF184AFF1297E5FCD54944D98D81E1A545FFB22596
54 bit

222872.369 h/s
223958.708 h/s
---------------
230794.045 h/s
230895.206 h/s
230857.369 h/s

What is your PC configuration ?

intel i3-6100, 8gb ddr4 ram, ubuntu 18.x

It runs at 3.7GHz ... that explains your rate ...
member
Activity: 245
Merit: 17
Finally I got gmpy2 installed on windows  Roll Eyes Thanks to this site : "https://www.lfd.uci.edu/~gohlke/pythonlibs/" which provides unofficial Windows binaries for Python Extension Packages.
Code:
 
D:\newwork\python>python kang.py
P-table prepared
tame and wild herds are prepared
115378.321 h/s
115352.684 h/s
113976.331 h/s
total time: 18.41 sec
SOLVED: 1003651412950
Hops: 2101873
tame and wild herds are prepared
112978.446 h/s
112514.481 h/s
105994.030 h/s
total time: 34.77 sec
SOLVED: 1003651412950
Hops: 1799360
tame and wild herds are prepared
110405.305 h/s
110319.404 h/s
113115.999 h/s
114104.774 h/s
114309.523 h/s
112088.956 h/s
110844.133 h/s
113467.999 h/s
112101.691 h/s
total time: 81.57 sec
SOLVED: 1003651412950
Hops: 5249192
3050141.6666666665 +/- 1102987.655596129
Average time to solve: 27.19 sec


My modest config: Intel(R) Core(TM) i7-2670QM CPU @2.20GHz  16Gb  (Dell Inspiron  N5110)

Without gmpy2, I was getting some 8000 h/s ...

for example, if you are running windows 10 64 bit and you have python 3.7   installed, download this file : gmpy2-2.0.8-cp37-cp37m-win_amd64.whl from the site above and do:

pip install gmpy2-2.0.8-cp37-cp37m-win_amd64.whl

Next
0) uncomment ligne 3   (import gmpy2)
1) uncomment ligne 45 (c = dy * gmpy2.invert(dx, p) % p)
2) comment ligne 45     (#c = dy * rev(dx, p) % p )

good luck !

Ooops !!! , I saw this afteward ...

https://bitcointalk.org/index.php?topic=5166284.260  from Telariust
newbie
Activity: 17
Merit: 2
Hello, I am glad to participate in this project.
Regarding the Elliptic Curve, I think there is a weakness in the current Algorithm.
jr. member
Activity: 59
Merit: 3
To Firefox:
At startup files wild.txt & tame.txt were also empty, but after 2 days of work has appeared a few lines.
Does that mean that kargaroo jumps appears very late?

I think yes. Puzzle # 105 has a lot of difficulty.
Obviously, you need a version of the program on the GPU. There are already several programs for computing using the Pollard algorithm on the GPU, but they are not in the public domain.
Also on freelance sites there are several orders for the development of such programs on CUDA and OpenCL. Budget from 300 to 1000 dollars.
Maybe someone will share or sell a similar program.
That is the problem, that versions of the script which are capable to use a GPU resources are available only for money.
If any of it realy exist... I have a huge doubt! Otherwise why to sell it if you have it )). Take a fu****g loan in the bank, rent any huge cloud GPU VPS and open all addresses with available PUBKEY. That it!
But as far as we can see, since this scripts were anounced none of the 100+ addresses was openned, so looks like or it doesn't work or it's fake.)))

Anyhow, hopefully a good part of community soon will release an open source version on such script. Fingers crossed!
newbie
Activity: 4
Merit: 0
To Firefox:
At startup files wild.txt & tame.txt were also empty, but after 2 days of work has appeared a few lines.
Does that mean that kargaroo jumps appears very late?

I think yes. Puzzle # 105 has a lot of difficulty.
Obviously, you need a version of the program on the GPU. There are already several programs for computing using the Pollard algorithm on the GPU, but they are not in the public domain.
Also on freelance sites there are several orders for the development of such programs on CUDA and OpenCL. Budget from 300 to 1000 dollars.
Maybe someone will share or sell a similar program.
jr. member
Activity: 59
Merit: 3
Who need some code, I have it for Python 2.7!
This is my own code and I apologize for absolutely no-PEP. And for my English, I apologize too.
The code is not so fast, but my estimation is more optimistic - only 10000 years on average for the problem 105.
Tests with C++ (VS2013) leads to x2..x3 faster calculations only (single-thread), but much more complicated code.

The code is on my WebSite:
http://fe57.org/forum/thread.php?board=4&thema=1#1

This is my main interest. You are now informed that my site exists in the Universe!
To be more interesting: test problem 75 from pickachunakapika to drotika solution:
26970178626862821196031
(decimal)
I guess that the solution can be posted now because all deadlines are finished. Or I miss something and drotika was already posted the answer.

Very nice Smiley

For those who use python3, you may have to change a few things for the code to work:

1) for all print statements you need (), for example line 60
change this > print 'prime must be 3 modulo 4'
to this > print ('prime must be 3 modulo 4')

2) make sure you distinguish between integer division //  and float division /
for example, line 63
change this >  pw = (p + 1) / 4
to this >  pw = (p + 1) // 4

For case 32 I get
P-table prepared
tame and wild herds are prepared
6091.198 h/s
6119.991 h/s
6047.619 h/s
6028.418 h/s
6020.816 h/s
5995.830 h/s
6041.220 h/s
6026.008 h/s
6077.185 h/s
total time: 45.38 sec
SOLVED: 3093472814
Hops: 273742
tame and wild herds are prepared
5995.223 h/s
6044.420 h/s
5966.429 h/s
6050.819 h/s
total time: 68.41 sec
SOLVED: 3093472814
Hops: 137749
tame and wild herds are prepared
6038.017 h/s
5968.435 h/s
total time: 82.83 sec
SOLVED: 3093472814
Hops: 85546
165679.0 +/- 56093.66357263537
Average time to solve: 27.61 sec



Hi, how did you change line 160 for python 3?

if you mean this line:

print '%.3f h/s'%((Hops-Hops_old)/(t1-t0))
just add parenthesis
print ('%.3f h/s'%((Hops-Hops_old)/(t1-t0)))
do the same for all print statements.






Now everything works!
I didn’t put it there ()
thank Wink

I have installed gmpy2 module v.2.0.8 and got a good results, speed increased from ~6k h/s till ~150k h/s and for e.g. #45 solved in ~100sec. But when I run the same script for e.g. #105 files wild.txt & tame.txt are empty ((.
When you, guys, run this script for the #105 addr, do your files wild.txt & tame.txt being filled-up with any data?
If yes share pls your version of script.
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
I draw your attention, secp256k1_scalar_set_int() has x32-x64 problem
my fix
Code:
void secp256k1_scalar_set_uint64(secp256k1_scalar *r, uint64_t v){
    #if defined(USE_SCALAR_4X64)
r->d[0] = v;
r->d[1] = r->d[2] = r->d[3] = 0;
    #elif defined(USE_SCALAR_8X32)
r->d[0] = (uint32_t)(v);
r->d[1] = (uint32_t)(v>>32);
r->d[2]=r->d[3]=r->d[4]=r->d[5]=r->d[6]=r->d[7]=0;
    #endif
}
I must probably unsubscribe to the developers  Smiley I completely forgot
maybe your problems grow from here.
We also found this issue and also quickly created our own functions to get it to work.  We also called our new functions secp256k1_scalar_set/get_uint64 so I guess great minds think alike! Wink

We were just trying to quickly get everything to work so we have not optimized anything yet.  Our function was:

Code:
static uint64_t secp256k1_scalar_get_uint64(secp256k1_scalar * a)
{
    uint8_t b32[32];
    secp256k1_scalar_get_b32(b32, a);

    const uint64_t ret =
        ((uint64_t)b32[24] << 56 ) |
        ((uint64_t)b32[25] << 48 ) |
        ((uint64_t)b32[26] << 40 ) |
        ((uint64_t)b32[27] << 32 ) |
        ((uint64_t)b32[28] << 24 ) |
        ((uint64_t)b32[29] << 16 ) |
        ((uint64_t)b32[30] <<  8 ) |
        ((uint64_t)b32[31] <<  0 ) ;

    return ret;
}
Obviously yours is much better/faster so we have replaced ours with yours.  Thanks!

ENDOMORMPHSIM
Endomorphism is implemented here only to accelerate the signing of the transaction, so do not expect this to accelerate the usual multiplication.
When we ran our benchmark there was no difference.  That explains it, so thanks.  We have turned off this flag since it does not help us.

We currently have ECMULT_WINDOW_SIZE set to 15.  Does anyone know how changing this setting will affect performance?
table #1
eprint.iacr.org/2016/103.pdf
The size of a pre-computed table in memory is created just for this. Speeds up multiplication in exchange for RAM.
Ryan also used the increased size -w 18/20/22/24.. in the brainflayer.
ECMULT_WINDOW_SIZE?
maybe ECMULT_TABLE_SIZE with WINDOW_G?
ecmult_impl.h
Code:
#else
/** One table for window size 16: 1.375 MiB. */
#define WINDOW_G 16
#endif
We currently have ECMULT_WINDOW_SIZE set to 15.  Does anyone know how changing this setting will affect performance?

questions like these can only be answered with a well designed benchmark.
in this case you have to measure how long it takes to do the precomputation with different values ∈ [2, 24] and how that compares with the rest of computation. if the rest is significantly longer then it can justify that long initial precomputation and allocation of memory.
Thanks for the very useful information.  We will play with this setting in the future.

We also found this in the code:
Code:
/** Larger values for ECMULT_WINDOW_SIZE result in possibly better
 *  performance at the cost of an exponentially larger precomputed
 *  table. The exact table size is
 *      (1 << (WINDOW_G - 2)) * sizeof(secp256k1_ge_storage)  bytes,
 *  where sizeof(secp256k1_ge_storage) is typically 64 bytes but can
 *  be larger due to platform-specific padding and alignment.
 *  If the endomorphism optimization is enabled (USE_ENDOMORMPHSIM)
 *  two tables of this size are used instead of only one.
 */
#  define WINDOW_G ECMULT_WINDOW_SIZE

/* Noone will ever need more than a window size of 24. The code might
 * be correct for larger values of ECMULT_WINDOW_SIZE but this is not
 * not tested.
 *
 * The following limitations are known, and there are probably more:
 * If WINDOW_G > 27 and size_t has 32 bits, then the code is incorrect
 * because the size of the memory object that we allocate (in bytes)
 * will not fit in a size_t.
 * If WINDOW_G > 31 and int has 32 bits, then the code is incorrect
 * because certain expressions will overflow.
 */
#if ECMULT_WINDOW_SIZE < 2 || ECMULT_WINDOW_SIZE > 24
#  error Set ECMULT_WINDOW_SIZE to an integer in range [2..24].
#endif
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
We now have a compile time switch between OpenSSL, secp256k1(SCALAR_8X32, FIELD_10X26), secp256k1(SCALAR_4X64, FIELD_5X52), and secp256k1(SCALAR_4X64, FIELD_5X52, ENDOMORMPHSIM).  Here is the run time and hops per second comparison for the 32 unit tests:

Code:
pkey    Standard OpenSSL library     secp256k1 (S 8X32, F 10X26)    secp256k1(S 4X64, F 5X52)    secp256k1(S4X64, F5X52, EM)
bits   hops/second   test run time   hops/second   test run time   hops/second   test run time   hops/second   test run time
====   ===========================   ===========================   ===========================   ===========================
   1   2003.205128    6.0950 msecs   5114.435494    6.3592 msecs   4318.255425    6.4021 msecs   5269.397971    5.8835 msecs
   2   1757.199025    8.3635 msecs   4655.764418    7.8586 msecs   5142.049107    7.0887 msecs   4802.497299    7.6262 msecs
   3   1716.811880    5.9904 msecs   5003.752815    6.9754 msecs   5110.514884    6.0526 msecs   4268.487888    6.8087 msecs
   4   1960.438354   26.3895 msecs   5092.427560   27.7752 msecs   5187.260089    7.4309 msecs   4996.252810    7.7770 msecs
   5   1880.119970   30.0658 msecs   5013.009954   31.3478 msecs   5215.318134    9.4163 msecs   5236.254831    9.7029 msecs
   6   1608.708475   30.6862 msecs   4410.467510   30.2971 msecs   4926.917392   22.3417 msecs   5644.933672   20.1065 msecs
   7   1866.896315   34.8628 msecs   5558.843043   31.2577 msecs   5131.685676   21.5328 msecs   5181.347150   20.9325 msecs
   8   1990.502460   31.4858 msecs   5039.596832   30.4067 msecs   4838.430966   20.6268 msecs   5072.279990   28.5510 msecs
   9   1928.760658   40.3494 msecs   4950.639215   41.2204 msecs   5260.714838   22.1908 msecs   5241.170939   20.5226 msecs
  10   1890.373754   81.8738 msecs   5256.467332   57.0551 msecs   5429.994625   32.7629 msecs   5453.290894   41.4165 msecs
  11   1694.971872  101.2429 msecs   4763.924918   61.4267 msecs   5340.172912   41.3388 msecs   5439.249145   46.5306 msecs
  12   1858.940556    1.0718 secs    5145.349959  527.2134 msecs   5326.737472  410.4776 msecs   5221.758623  434.4509 msecs
  13   1895.071971   51.6923 msecs   5386.060874   38.1821 msecs   5233.349227   23.4991 msecs   5373.390968   31.1977 msecs
  14   1908.767643  896.4858 msecs   5082.972044  385.9540 msecs   5430.446276  321.9922 msecs   5441.145301  317.7290 msecs
  15   1939.352878  439.6496 msecs   5326.381969  195.0367 msecs   5488.742759  160.9818 msecs   5420.695712  179.5955 msecs
  16   1918.732814  530.3667 msecs   5258.277326  219.2995 msecs   5510.868625  188.6750 msecs   5466.555751  201.7339 msecs
  17   1895.048388  205.8168 msecs   5430.753319   95.6410 msecs   5453.448080   79.5526 msecs   5339.284536   85.5150 msecs
  18   1930.620117    1.3492 secs    5136.760660  546.6825 msecs   5545.967087  474.6491 msecs   5433.455762  484.8219 msecs
  19   1959.292325  594.2687 msecs   5364.976931  228.5351 msecs   5407.810389  219.3399 msecs   5489.027106  219.1785 msecs
  20   1937.090596    1.2060 secs    5519.443944  438.4141 msecs   5567.924674  427.1013 msecs   5487.179193  432.8127 msecs
  21   1934.545311    2.1349 secs    5493.508717  774.8660 msecs   5504.112724  758.1783 msecs   5469.417987  766.5974 msecs
  22   1929.260673    3.4962 secs    5376.994864    1.2774 secs    5564.087823    1.2174 secs    5465.781801    1.2414 secs
  23   1934.889761    7.0861 secs    5476.220007    2.5224 secs    5548.504187    2.4744 secs    5465.500565    2.5118 secs
  24   1939.683627    5.8652 secs    5461.486912    2.1110 secs    5548.744349    2.0488 secs    5417.722111    2.1042 secs
  25   1925.342112   47.4712 secs    5476.818503   16.7678 secs    5557.570589   16.4377 secs    5479.097929   16.6700 secs
  26   1930.362383   16.1728 secs    5486.557150    5.7494 secs    5553.792419    5.6187 secs    5544.780029    5.6293 secs
  27   1945.269663    7.2153 secs    5443.169561    2.6035 secs    5534.059747    2.5462 secs    5507.743611    2.5530 secs
  28   1920.384111   37.4526 secs    5479.914967   13.1715 secs    5553.680361   12.9380 secs    5507.436467   13.0454 secs
  29   1940.307975    7.8315 secs    5530.787084    2.7661 secs    5547.749218    2.7347 secs    5510.767015    2.7558 secs
  30   1927.658333    1.1905 mins    5468.604811   25.2139 secs    5438.685448   25.3136 secs    5555.456216   24.7841 secs
  31   1942.378857   28.4496 secs    5520.634781   10.0240 secs    5501.207943   10.0433 secs    5510.264398   10.0245 secs
  32   1904.831829   59.7541 secs    5521.457239   20.6445 secs    5537.529539   20.5532 secs    5496.131886   20.7084 secs
Standard OpenSSL library averaged 1897 hops per second    
secp256k1 (S 8X32, F 10X26) averaged 5258 hops per second    
secp256k1(S 4X64, F 5X52) averaged 5352 hops per second
secp256k1(S 4X64, F 5X52, , ENDOMORMPHSIM) averaged 5350 hops per second

We currently have ECMULT_WINDOW_SIZE set to 15.  Does anyone know how changing this setting will affect performance?
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
One weird thing is that the runs on the Linux machine took longer than the ones on Windows (usually is the other way around). And  the other weird thing is that the program is called 'roo' in Windows and 'ROO.exe' in Linux Smiley
It is actually ROO.exe in Windows but Windows adds the .exe for you and is not case sensitive.
..OpenSSL library is pretty slow.
...
ideally you need C bitcoin/secp256k1
...
Thanks for the suggestion!  I should have thought of that first.  It is silly to use OpenSSL for secp256k1 when there is a highly optimized specialized implementation available.  Results available soon.
jr. member
Activity: 38
Merit: 18
..OpenSSL library is pretty slow.
OpenSSL's advantage in versatility, but not in speed, no.

I bet python+coincurve will be faster!
ideally you need C bitcoin/secp256k1
I am porting your code to bitcoin/secp256k1 as soon as the source code is published.
(there is the usual addition and multiplication of points, nothing complicated)
(..but I feel you want to do it yourself, your own and not someone else's head?)
legendary
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
We are about to release the first executable of our project for feedback.  There will be a Windows 64 bit cygwin version and a 64 bit Linux version.  The program is written in C using the standard OpenSSL library BIGNUM and EC_POINT objects.  The main thing we are interested in for this release of the program is the number of hops per second on various machines.  It seems to us the OpenSSL library is pretty slow.  Here is what three runs of the program, for a 35 bit key, looks like on my Windows laptop:

Code:
C:\ROO\Debug>roo

ver   = 121 with 128 bit integers
test  = 34
seed  = 1566657361 (randomly selected)
bits  = 35
a     = 0x400000000
b     = 0x7FFFFFFFF
P     = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
ba    = 0x3FFFFFFFF
k0000 = 0x9D8358B56E7B4401A459C0D828A75681
s     = 0x4AED21170
%     =   17.0723
s*G   = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
time  =    9.0817 mins (544899402700 nsecs)
hps   = 1968.057310

C:\ROO\Debug>roo

ver   = 121 with 128 bit integers
test  = 34
seed  = 1566658074 (randomly selected)
bits  = 35
a     = 0x400000000
b     = 0x7FFFFFFFF
P     = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
ba    = 0x3FFFFFFFF
k0000 = 0xC7B9A08889DCE535759804A45AE046BC
s     = 0x4AED21170
%     =   17.0723
s*G   = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
time  =    9.0295 mins (541767121800 nsecs)
hps   = 1986.290418

C:\ROO\Debug>roo

ver   = 121 with 128 bit integers
test  = 34
seed  = 1566658634 (randomly selected)
bits  = 35
a     = 0x400000000
b     = 0x7FFFFFFFF
P     = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
ba    = 0x3FFFFFFFF
k0000 = 0xDF1DD25825020380D574A71285072428
s     = 0x4AED21170
%     =   17.0723
s*G   = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
time  =    9.0542 mins (543253475900 nsecs)
hps   = 1981.294420

C:\ROO\Debug>
And here are three runs on my Linux machine.  Maybe I just need to buy a much faster computer.
Code:
burt@burt-MS-7640:~/ROO1/Debug$ ./ROO.exe

ver   = 121 with 128 bit integers
test  = 34
seed  = 1566661363 (randomly selected)
bits  = 35
a     = 0x400000000
b     = 0x7FFFFFFFF
P     = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
ba    = 0x3FFFFFFFF
k0000 = 0xE6A6C5C436EBA130AD726DC36B710275
s     = 0x4AED21170
%     =   17.0723
s*G   = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
time  =   15.0364 mins (902182961776 nsecs)
hps   = 1225.428980

burt@burt-MS-7640:~/ROO1/Debug$ ./ROO.exe

ver   = 121 with 128 bit integers
test  = 34
seed  = 1566663911 (randomly selected)
bits  = 35
a     = 0x400000000
b     = 0x7FFFFFFFF
P     = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
ba    = 0x3FFFFFFFF
k0000 = 0x1970EBB70E07E6F19E9D0390432AD93C
s     = 0x4AED21170
%     =   17.0723
s*G   = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
time  =   14.6529 mins (879173107944 nsecs)
hps   = 1219.290980

burt@burt-MS-7640:~/ROO1/Debug$ ./ROO.exe

ver   = 121 with 128 bit integers
test  = 34
seed  = 1566665966 (randomly selected)
bits  = 35
a     = 0x400000000
b     = 0x7FFFFFFFF
P     = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
ba    = 0x3FFFFFFFF
k0000 = 0x504A020DE4246A5FF39CD676B27DD5A3
s     = 0x4AED21170
%     =   17.0723
s*G   = 0x2F6A8148A62320E149CB15C544FE8A25AB483A0095D2280D03B8A00A7FEADA13D
time  =   14.7170 mins (883021823424 nsecs)
hps   = 1217.660480

burt@burt-MS-7640:~/ROO1/Debug$
Pages:
Jump to: