
Topic: Large Bitcoin Collider (Collision Finders Pool) - page 38. (Read 193404 times)

legendary
Activity: 1120
Merit: 1037
฿ → ∞
6.2 s: CPU generates 16.7 M public keys (x,y)
1.8 s: GPU performs SHA256 / ripemd160 of (x,y) and (x) <- compressed,

Yes.

Quote
what do you mean "compressed key is done with GPU"?

Code:
sha256_in[0] = 0x02 | (sha256_in[64] & 0x01);

 Wink
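The C one-liner sets the compressed-key prefix byte from the parity of y: 0x02 if y is even, 0x03 if odd. A Python sketch of the same serialization (illustrative only, not LBC code):

```python
def compress(x: int, y: int) -> bytes:
    """33-byte compressed public key: parity prefix + big-endian x."""
    prefix = 0x02 | (y & 1)           # same trick as the C line above
    return bytes([prefix]) + x.to_bytes(32, 'big')

# Note: (x, y) and its negation (x, p - y) share x and differ only in
# the prefix byte, since the field prime p is odd and p - y flips parity.
```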


Quote
Anyway, at the moment the CPU is the bottleneck; the GPU does its work at least 3x faster than the CPU...

Sure. It is a 1st step. The big advantage of this is that it works like a drop-in replacement.
I see lots of optimization potential. Originally, my notebook maxed out at ~2.8 Mkeys/s, and now:

Code:
$ LBC -c 8
Ask for work... got blocks [383054009-383054392] (402 Mkeys)
oooooooooooooooooooooooo (7.30 Mkeys/s)


Rico






edit:


LOL...

Code:
$ LBC -t 1 -l 0
Ask for work... Server doesn't like us. Answer: toofast.
legendary
Activity: 1932
Merit: 2077
CPU only for public key generation + GPU for sha256/ripemd160?

Exactly. Meanwhile I am at
Code:
real    0m8.561s
user    0m8.093s
sys     0m0.413s

(= 1,959,955 keys/s per CPU core with GPU support), and the memory requirement on the GPU is a mere 29 MB (the GPU is bored)

Of the aforementioned 8 seconds, around 6.2 s are ECC public key generation (16M uncompressed keys; the compressed key is done @ GPU).

6.2 s: CPU generates 16.7 M public keys (x,y)
1.8 s: GPU performs SHA256 / ripemd160 of (x,y) and (x) <- compressed. What do you mean by "compressed key is done with GPU"? Do you use 1 or 2 compressed keys? The x is always the same; you don't need to compute y, so you can generate 2 compressed keys for each uncompressed one. Do you generate 33M addresses every 8 s, or 50M?

Anyway, at the moment the CPU is the bottleneck; the GPU does its work at least 3x faster than the CPU...
legendary
Activity: 1120
Merit: 1037
฿ → ∞
CPU only for public key generation + GPU for sha256/ripemd160?

Exactly. Meanwhile I am at

Code:
real    0m8.561s
user    0m8.093s
sys     0m0.413s

(= 1,959,955 keys/s per CPU core with GPU support), and the memory requirement on the GPU is a mere 29 MB (the GPU is bored)

Quote
Why has the pool performance dropped in the meantime?

Because two (in words: two!) guys turned their machines off.  Cheesy
I have a feeling this dip in performance is only temporary...

Of the aforementioned 8 seconds, around 6.2 s are ECC public key generation (16M uncompressed keys; the compressed key is done @ GPU).
Every second saved here counts, so naturally everything you did towards ECC optimization will have maximum effect in the CPU/GPU hybrid.


Rico
legendary
Activity: 1932
Merit: 2077
Hi,

Unoptimized CPU/GPU hybrid generator. 1st successful run on 1 CPU core with Nvidia GPU in tandem: 1811207 keys/s

CPU only for public key generation + GPU for sha256/ripemd160? Why has the pool performance dropped in the meantime?


I have a new version of the ecc_for_collider:

1) + complement private keys

2) + comments

https://www.dropbox.com/s/3jsxjy7sntx3p4a/ecc_for_collider07.zip?dl=0

The file foo.py performs 16.4 M throwaway multiplications, just to put into perspective the efficiency of the public key generation in the script gen_batches_points07.py:

main_batch --> (x,y)   3.5M + 1S for each point

batch2 --> (betax,y)     1M for each point

batch3 --> (beta^2*x,y) 1M for each point

batch_minus --> (x,-y)  (betax,-y) (beta^2*x,-y)  0M and 0S

Total: about 1.1M for each point!
If you know the performance of field multiplication in your C code, you can get an idea of the performance you could reach. How long does your C code take to perform 16.4 M multiplications (operands: big numbers, multiplication mod p)?
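A field-multiplication micro-benchmark in the spirit of foo.py might look like this (an illustrative sketch; foo.py itself is not reproduced here):

```python
import time

p = 2**256 - 2**32 - 977   # secp256k1 field prime
x = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798  # Gx

N = 100_000                # scale to 16_400_000 for the comparison in the text
t0 = time.perf_counter()
acc = x
for _ in range(N):
    acc = acc * x % p      # one big-number multiplication mod p per iteration
elapsed = time.perf_counter() - t0
```

Dividing N by `elapsed` gives the multiplications/s figure to plug into the per-point cost estimates above.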

In the next few days I want to run some tests on the endomorphism, just to be sure that everything is OK (for example, we'd like to avoid computing the same key twice).
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Unoptimized CPU/GPU hybrid generator. 1st successful run on 1 CPU core with Nvidia GPU in tandem: 1811207 keys/s



Code:
$ time hrd-core -I 0000000000000000000000000000000000000000000000000000000000000001 -c 10000
Num platforms: 2
Platform - 0
  1.1 CL_PLATFORM_NAME: Intel(R) OpenCL
  1.2 CL_PLATFORM_VENDOR: Intel(R) Corporation
  1.3 CL_PLATFORM_VERSION: OpenCL 2.0
  1.4 CL_PLATFORM_PROFILE: FULL_PROFILE
  1.5 CL_PLATFORM_EXTENSIONS: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
  Device - 0:
    CL_DEVICE_NAME: Intel(R) HD Graphics
    CL_DEVICE_VENDOR: Intel(R) Corporation
    CL_DRIVER_VERSION: r2.0.54425
    CL_DEVICE_VERSION: OpenCL 2.0
    CL_DEVICE_MAX_COMPUTE_UNITS: 24
Platform - 1
  2.1 CL_PLATFORM_NAME: NVIDIA CUDA
  2.2 CL_PLATFORM_VENDOR: NVIDIA Corporation
  2.3 CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.0
  2.4 CL_PLATFORM_PROFILE: FULL_PROFILE
  2.5 CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event
  Device - 0:
    CL_DEVICE_NAME: Quadro M2000M
    CL_DEVICE_VENDOR: NVIDIA Corporation
    CL_DRIVER_VERSION: 375.26
    CL_DEVICE_VERSION: OpenCL 1.2 CUDA
    CL_DEVICE_MAX_COMPUTE_UNITS: 5
2d17543d32448acc7a1c43c5f72cd5be459ab302:u:priv:0000000000000000000000000000000000000000000000000000000000000001 + 0x5e
02e62151191a931d51cdc513a86d4bf5694f4e51:c:priv:0000000000000000000000000000000000000000000000000000000000000001 + 0x65
9d74ffdb31068ca2a1feb8e34830635c0647d714:u:priv:00000000000000000000000000000000000000000000000000000000000f9001 + 0xf8c
3d6871076780446bd46fc564b0c443e1fd415beb:c:priv:00000000000000000000000000000000000000000000000000000000000f9001 + 0xf8c
response: 30-19-0

real    0m9.263s
user    0m8.117s
sys     0m1.097s


Rico
legendary
Activity: 1120
Merit: 1037
฿ → ∞
  • New BLF file on FTP
  • New LBC client version (1.010) available

./LBC -u is your friend.

As mentioned in #433, you can now attach a BTC address to your id for client rewards.
As mentioned in #436, you can now call the LBC client with a --gpu parameter. The best-case scenario you will see is currently this:

Code:
$ ./LBC --gpu
OpenCL diagnostics written.
GPU authorized: yes

If you see this, you're on the highway to a GPU-accelerated client. If instead you see this:


Code:
Perl module 'OpenCL' not found - please make sure:
 * OpenCL is installed correctly on your system
 * then install the Perl OpenCL module via CPAN
   (cpan install OpenCL)

you want to make sure OpenCL is installed correctly on your system. Some pointers to do so:
https://wiki.tiker.net/OpenCLHowTo
http://askubuntu.com/questions/796770/how-to-install-libopencl-so-on-ubuntu

Won't work in a VM. At least not without advanced magic. If oclvanitygen runs on your system, you're fine. The only thing left to do is to install the Perl bindings for OpenCL:

https://metacpan.org/pod/OpenCL

For this, it's the usual:

Code:
$ cpan
cpan> install OpenCL

or - in one batch:

Code:
$ cpan install OpenCL

The message "OpenCL diagnostics written" indicates you will find a file diagnostics-OpenCL.txt in your directory. Please do not post its contents here, as they are quite extensive. Either pastebin them and post the link here, or send them to [email protected] (if there are any problems, or if you want to make sure your config is supported).

Well, and if you see a

Code:
GPU authorized: no
instead and want to change that: you want to be in the top30, or you will have to fork out 0.1 BTC Smiley

OpenCL generator ETA: "really soon now(tm)"


Rico


edit:

Short HowTo install LBC @ AWS Ubuntu instance including OpenCL


Code:
# $ is shell/bash
# cpan> is cpan shell


$ sudo apt-get update
$ sudo apt-get install gcc xdelta3 make
$ sudo apt-get install nvidia-opencl-dev nvidia-opencl-icd-367 nvidia-modprobe clinfo
$ clinfo
$ sudo cpan
cpan> install JSON OpenCL

$ mkdir collider; cd collider; tmux
$ wget ftp://ftp.cryptoguru.org/LBC/client/LBC
$ chmod a+x LBC
$ ./LBC -h
legendary
Activity: 1932
Merit: 2077
Imagine you want to generate a batch from 10000 to 14096 (the script actually generates batches of 4097 points)

First you generate the key k = 12048 (we always start from the middle point, to exploit the symmetry); this is the only point of the batch (a pivot point) that we get with the slower function mul:

Code:
... k ...  <-- one batch, only one key k

jkx,jky,jkz = mul(k,Gx,Gy,1)
invjkz = inv(jkz,p)
(kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz)


k can be any number greater than 2048 (otherwise, if k=3 for example, kG+3G gives an error because you are trying to use the addition formula instead of the doubling...). The first batch you can create with this script goes from 1 to 4097; the start key in that case would be k=2049.

Then the script generates three batches; each batch has 1 point + 2048 couples of points:

first batch: this is the batch you are most interested in, because it has 4097 points in your range, including the point 12048G:

(12048),(12048+1,12048-1),(12048+2,12048-2),....,(12048+2048=14096,12048-2048=10000)

the script computes this batch with the function double_add_P_Q_inv

Element #0 of the list is always kG, element #1 is the couple kG+1G, kG-1G, #2 is the couple kG+2G, kG-2G, and so on ... --> #2048 is the couple kG+2048G, kG-2048G

Code:
batch = batch + list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:]))	

Batches 1 and 2: these keys are not in your range; here we use the endomorphism:

batch1:
(12048*lambda), ((12048+1)*lambda,(12048-1)*lambda), ((12048+2)*lambda,(12048-2)*lambda),  ...., (14096*lambda,10000*lambda)

batch2:
(12048*lambda^2),   ((12048+1)*lambda^2, (12048-1)*lambda^2),   ((12048+2)*lambda^2, (12048-2)*lambda^2),  ....,  (14096*lambda^2, 10000*lambda^2)
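The layout of the first batch (and, index for index, batches 1 and 2) can be sketched like this; a minimal illustration with a hypothetical helper name, not the actual script:

```python
def batch_keys(k, half=2048):
    """Private-key indices of one symmetric batch around the pivot k."""
    keys = [k]                        # element #0 is the pivot kG
    for i in range(1, half + 1):
        keys.extend((k + i, k - i))   # element #i is the couple k+i, k-i
    return keys

keys = batch_keys(12048)
# 4097 keys covering the whole range 10000..14096
```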

EDIT:
to make sure work still is distributable / parallelizable and the bookkeeping still being sane.
You don't need to worry about each key; in my opinion you only have to store one private key per 3 batches. You can think of the single key in the middle of the batch as a special seed. 99.9999% of batches don't match any address with bitcoins, so only when a match occurs do you have to regenerate the entire 3 batches from this single seed to fetch the correct private key. Batches 1 and 2 are sequences of keys all different from each other, so you can be sure you are not wasting your computational effort. I'm almost sure about the last sentence: there can't be more than three points with the same y, so it is not possible to check the same key twice. Note that the 3 batches are related; they must be computed together.

Imagine you know the pool has searched so far from key 1 to 2^50; then you know it has also searched keys 1*lambda, 2*lambda, 3*lambda ... to 2^50*lambda (mod n), and keys 1*lambda^2, 2*lambda^2, 3*lambda^2, ... to 2^50*lambda^2 (mod n).
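For background, the endomorphism used here: secp256k1 admits lambda*(x, y) = (beta*x, y), where lambda is a cube root of unity mod n and beta a cube root of unity mod p. A quick sanity check with the standard constants (values from the secp256k1 literature, not from this thread):

```python
# secp256k1 parameters (standard published values)
p = 2**256 - 2**32 - 977                                                # field prime
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
lam  = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72
beta = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE

assert pow(lam, 3, n) == 1    # lambda is a nontrivial cube root of 1 mod n
assert pow(beta, 3, p) == 1   # beta is a nontrivial cube root of 1 mod p
```

This is why scanning one range implicitly covers the two lambda-multiplied ranges as well.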


Version 05 runs nearly 22 seconds for 16M keys on my notebook. This is now only 3.5 times slower than what the optimized LBC C version needs for 16M keys. I don't dare to estimate what optimized C code can make of this.

 Shocked

I dare: if you use the complement too, you can generate 16M keys in less than half a second (with the CPU; I don't know about the GPU)

Considering that your current code performs 6M + 1S just for the transition from Jacobian to affine coordinates for each point, and that you are using J+J --> J to perform each addition (12M + 4S), your current cost should be 18M + 5S per point.

Let's say 1S = 0.8M; then you have about 22M per point.

If you are instead using J+A --> J to perform addition (8M + 3S), then you have about 17.2M per point.

My code uses 3.5M + 1S for each point of the first batch, and only 1M for each point of the other 2 batches.
So the average is 5.5/3 = 1.83M + 0.33S per point, let's say about 2.1M per point.
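The averaging works out as follows (taking 1S ~= 0.8M, as above):

```python
# Cost per point, counted in field multiplications (M), with 1S ~= 0.8M
first_batch = 3.5 + 1 * 0.8        # first batch: 3.5M + 1S  -> 4.3M
endo_batch  = 1.0                  # batches 1 and 2: 1M per point each
average = (first_batch + 2 * endo_batch) / 3   # ~2.1M per point
```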


Now your speed is 16M/6s = 2.7 Mkeys/s for each CPU core.

If you could achieve an 8x-10x improvement, let's say 8x, you could perform at least 21 Mkeys/s. If you use (X,Y) --> (X,-Y) too, 42 Mkeys/s. Let's say at least 40 Mkeys/s per core, 15x your current speed.
With an 8-core CPU, you could generate more keys than your entire pool can handle at this moment.


Maybe tomorrow I'll add more comments to the code. Anyway, read this post again; I have edited it.

EDIT2:

this is a version with more comments:

https://www.dropbox.com/s/6o2az7n6x0luld4/ecc_for_collider06.zip?dl=0
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Another update of the library with endomorphism:

https://www.dropbox.com/s/7v5i36n4k6d849b/ecc_for_collider05.zip?dl=0

Version 05 runs nearly 22 seconds for 16M keys on my notebook. This is now only 3.5 times slower than what the optimized LBC C version needs for 16M keys. I don't dare to estimate what optimized C code can make of this.

 Shocked

Code:
real    0m21.790s
user    0m21.787s
sys     0m0.000s

Your code is a tremendous help, but it would speed up my understanding of the code and porting it to C (and possibly OpenCL) if there were more comments. E.g., you start with

Code:
start=2**55+789079076   #k is a random private key 

but start cannot be smaller than 2049, or else

Code:
$ time python ./gen_batch_points05.py 
Traceback (most recent call last):
  File "./gen_batch_points05.py", line 50, in
    kminverse = [invjkz] + inv_batch(kx,mGx,p)
  File "/data/soft/lin/LBC/generator/HRD/arulbero-ECC/5/ecc_for_collider05.py", line 56, in inv_batch
    inverse=inv(partial[2048],p) # 1I
  File "/data/soft/lin/LBC/generator/HRD/arulbero-ECC/5/ecc_for_collider05.py", line 32, in inv
    q, r = divmod(v,u)
ZeroDivisionError: integer division or modulo by zero

It'd also help if you could lay out the effective sequence of private keys as it is computed, because if LBC is to adopt this, I have to merge it - somehow - with the LBC interval arithmetic to make sure work is still distributable / parallelizable and the bookkeeping stays sane.


Rico
legendary
Activity: 1932
Merit: 2077
I tried it. On my notebook it takes

Code:
real    0m26.493s
user    0m26.490s
sys     0m0.000s

for the ~4.1mio keys (1000 * 4096). And

Code:
real    1m47.661s
user    1m47.657s
sys     0m0.003s

for 16M keys. So around ~160 000 keys/s

Ok, I didn't know how to use lists in Python  Grin

This version is at least 50% faster, with only a few modifications:

https://www.dropbox.com/s/wrbolxzbiu3y9su/ecc_for_collider04.zip?dl=0


Another update of the library with endomorphism:

https://www.dropbox.com/s/7v5i36n4k6d849b/ecc_for_collider05.zip?dl=0
legendary
Activity: 3431
Merit: 1233
Good morning!

HeavenlyCreatures found #49


Code:
From	XXX
To [email protected]
Date Today 08:02
Hi,

I found #49

0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:000000000000000000000000000000000000000000000000000174176b015001
+ 0xf4c

Looking at the PK, the pool must have found it GMT: Sat, 11 Feb 2017 04:32:26 GMT

edit: trophies update.

cheers!

Rico

16 days, 16 days, 33 days, 67 days, ......

And all are with standard and negligible amounts of bitcoin, right? Cheesy
I bet they are all from early-days bitcoin conferences where participants were given QR-code badges with naked privkeys. There are hundreds if not thousands of them. Don't waste your time with this 'project'; just contact the conference organizers and ask for the list of privkeys!
legendary
Activity: 1638
Merit: 1001
Good morning!

HeavenlyCreatures found #49


Code:
From	XXX
To [email protected]
Date Today 08:02
Hi,

I found #49

0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:000000000000000000000000000000000000000000000000000174176b015001
+ 0xf4c

Looking at the PK, the pool must have found it GMT: Sat, 11 Feb 2017 04:32:26 GMT

edit: trophies update.

cheers!

Rico

16 days, 16 days, 33 days, 67 days, ......
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Good morning!

HeavenlyCreatures found #49


Code:
From	XXX
To [email protected]
Date Today 08:02
Hi,

I found #49

0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:000000000000000000000000000000000000000000000000000174176b015001
+ 0xf4c

Looking at the PK, the pool must have found it GMT: Sat, 11 Feb 2017 04:32:26 GMT

edit: trophies update.

cheers!

Rico
legendary
Activity: 1932
Merit: 2077
I tried it. On my notebook it takes

Code:
real    0m26.493s
user    0m26.490s
sys     0m0.000s

for the ~4.1mio keys (1000 * 4096). And

Code:
real    1m47.661s
user    1m47.657s
sys     0m0.003s

for 16M keys. So around ~160 000 keys/s

However, the HRD core also does the hash160 of the uncompressed and also hash160 of the compressed public keys.
The public key generation part takes around 6 seconds for the 16M keys. If I understand your code correctly, it also does only the ECC computation part. Then we are at around 6 seconds versus 107 seconds. Not sure how much slower Python vs. C is

http://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=python3&lang2=gcc

suggests it can be anything between a factor of 3 and 100  Wink
Factor 100 would be cool, then your code should be somewhere ~1s in C ... we'll see. It certainly seems more lightweight than what we have now.

So in your case 107/6 means about 18x slower; I think that is not bad at all.

I'm sure that your optimizations with field_5x52_int128_impl.h can make a huge difference. The field operations make the difference. We'll see.
legendary
Activity: 1120
Merit: 1037
฿ → ∞
EDIT:
with this script   https://www.dropbox.com/s/4q3i0mqvdfgd258/ecc_for_collider03.zip?dl=0
I can generate about 4.1 million keys in 51 s, about 80,000 keys/s
Code:
antonio@ubuntu:~/src/python$ time python3 ./gen_batch_points03.py 

real 0m52.423s
user 0m51.992s
sys 0m0.040s

With your code my old PC generates about 530000 keys/s (200000 keys/s --> 200000 * 22/6 = 530000); my script then is only 6.5x slower than your code! No assembly, no C!


I tried it. On my notebook it takes

Code:
real    0m26.493s
user    0m26.490s
sys     0m0.000s

for the ~4.1mio keys (1000 * 4096). And

Code:
real    1m47.661s
user    1m47.657s
sys     0m0.003s

for 16M keys. So around ~160 000 keys/s

However, the HRD core also does the hash160 of the uncompressed and also hash160 of the compressed public keys.
The public key generation part takes around 6 seconds for the 16M keys. If I understand your code correctly, it also does only the ECC computation part. Then we are at around 6 seconds versus 107 seconds. Not sure how much slower Python vs. C is

http://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=python3&lang2=gcc

suggests it can be anything between a factor of 3 and 100  Wink
Factor 100 would be cool, then your code should be somewhere ~1s in C ... we'll see. It certainly seems more lightweight than what we have now.


Rico

newbie
Activity: 1
Merit: 0
1GmjDxtKXF59MZiqmGR3YjLyUyttbC3WmJ
legendary
Activity: 1638
Merit: 1001

Also, the price for becoin is 0.5 BTC.


I had a boss pull this on me 30 years ago in front of a crowd of people.  It was very effective in shutting my smartass mouth.

legendary
Activity: 1120
Merit: 1037
฿ → ∞
Also in the next LBC release, there will be a tiny, inconspicuous parameter

Code:
    --gpu
      Enable GPU acceleration if available.

When given, LBC will check your OpenCL installation and print out some debug information about your GPU hardware. This will show you whether your hardware is capable of GPU acceleration in principle. If you want to make sure your hardware is supported well/optimally, please send the output of this to [email protected] for further examination.

Also it will check if the id of the LBC client is eligible to run GPU acceleration and print the result of the check.

The output will look like this:

Code:
$VAR1 = {
          'Intel(R) OpenCL' => [
                                 {
                                   'vector_width' => {
                                                       'preferred' => {
                                                                        'long' => 1,
                                                                        'half' => 8,
                                                                        'char' => 16,
...
                                   'max' => {
                                              'clock_frequency' => 1050,
                                              'work' => {
                                                          'item_dimensions' => 3,
                                                          'item_sizes' => [
                                                                            256,
                                                                            256,
                                                                            256
                                                                          ],
                                                          'group_size' => 256
                                                        },
                                              'compute_units' => 24,
...
                                   'mem' => {
                                              'global' => {
                                                            'cache_type' => 2,
                                                            'cacheline_size' => 64,
                                                            'size' => '13344676250',
                                                            'cache_size' => 524288
                                                          },
                                              'max_alloc_size' => 6672338125,
                                              'host_unified' => 1
                                            },
                                   'device' => {
                                                 'name' => 'Intel(R) HD Graphics',
                                                 'driver_version' => 'r2.0.54425',
                                                 'version' => 'OpenCL 2.0 ',
                                                 'vendor' => 'Intel(R) Corporation',
...
                                   'device' => {
                                                 'name' => 'Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz',
                                                 'driver_version' => '1.2.0.10264',
                                                 'version' => 'OpenCL 2.0 (Build 10264)',
                                                 'vendor' => 'Intel(R) Corporation',
                                                 'profile' => 'FULL_PROFILE',
                                                 'available' => 1,
                                                 'endian_little' => 1,
                                                 'compiler_available' => 1,
                                                 'local_mem' => 32768,
                                                 'type' => 2,
                                                 'address_bits' => 64,
                                                 'extensions' => 'cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer '
                                               },
                                   'mem' => {
                                              'global' => {
                                                            'cache_type' => 2,
                                                            'size' => '16691331072',
                                                            'cacheline_size' => 64,
                                                            'cache_size' => 262144
                                                          },
                                              'host_unified' => 1,
                                              'max_alloc_size' => 4172832768
...
          'NVIDIA CUDA' => [
                             {
                               'mem' => {
                                          'host_unified' => '',
                                          'max_alloc_size' => 1059487744,
                                          'global' => {
                                                        'cache_type' => 2,
                                                        'cacheline_size' => 128,
                                                        'size' => 4237950976,
                                                        'cache_size' => 81920
                                                      }
                                        },
                               'device' => {
                                             'name' => 'Quadro M2000M',
                                             'version' => 'OpenCL 1.2 CUDA',
                                             'driver_version' => '375.26',
                                             'compiler_available' => 1,
....

etc. etc. - the above example is heavily shortened. See ftp://ftp.cryptoguru.org/LBC/source/opencl.txt to get an idea of the full output. This is what my notebook looks like. There are 2 OpenCL platforms found (Intel & Nvidia); Intel has 2 devices, the built-in GPU "Intel HD Graphics" and the Xeon CPU, and Nvidia has 1 device: the Quadro M2000M.

We have been talking about a "GPU client" in the past, but what we really meant is an OpenCL client.



Rico
legendary
Activity: 1932
Merit: 2077

2. Outside the Box

IMO, for any substantial optimizations, it is required to think outside the box. The box here being the libsecp256k1 library. This library provides us with an API - a set of functions - which is functionally complete, but may sometimes be obstructive for certain tasks. If you look at the use case from above, it would certainly be nice if we had a function that could efficiently sum up affine points into jacobian.

Thinking outside the box, why do we need a function A + A -> J ?

We have to add only affine points, and we have never to add the result of an addition with other points.

So we can get rid of the jacobian coordinates. In this way we achieve only 3,5M + 1S for each point, if we use the classical (simple and more efficient) A+A-> A formula!

Here is my script:
  
https://www.dropbox.com/s/kr10oil2idlhd80/ecc_for_collider02.zip?dl=0  it works with python3 too  Wink

1) kG
2) all the inverses of the batch (3M per couple, 1.5M for each point)
3) 2048 couples (2M + 1S with A+A->A for each point)

total 3.5M + 1S for each point. Very, very fast.
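Step 2 is Montgomery's simultaneous-inversion trick: one modular inversion plus roughly 3M per element replaces one inversion per element. A minimal sketch (the name mirrors the script's inv_batch, but this is an illustrative reimplementation, not the script's code):

```python
def inv_batch(values, p):
    """Invert every element of `values` mod prime p with a single inversion."""
    # Running prefix products: prefix[i] = values[0] * ... * values[i] mod p
    prefix, acc = [], 1
    for v in values:
        acc = acc * v % p
        prefix.append(acc)
    inv_acc = pow(acc, p - 2, p)          # the single inversion (Fermat, p prime)
    out = [0] * len(values)
    for i in range(len(values) - 1, 0, -1):
        out[i] = prefix[i - 1] * inv_acc % p   # peel off one factor at a time
        inv_acc = inv_acc * values[i] % p
    out[0] = inv_acc
    return out
```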


EDIT:
with this script   https://www.dropbox.com/s/4q3i0mqvdfgd258/ecc_for_collider03.zip?dl=0
I can generate about 4.1 million keys in 51 s, about 80,000 keys/s
Code:
antonio@ubuntu:~/src/python$ time python3 ./gen_batch_points03.py 

real 0m52.423s
user 0m51.992s
sys 0m0.040s

With your code my old PC generates about 530000 keys/s (200000 keys/s --> 200000 * 22/6 = 530000); my script then is only 6.5x slower than your code! No assembly, no C!
legendary
Activity: 1120
Merit: 1037
฿ → ∞
I modified the stats slightly:

  • The pool performance shown is a 24-hour average and as such is equivalent to the last point (0 on the x-axis) shown in the pool performance graph.
  • For #51 you can now see a "from-to" estimate. The previous value was just the (pessimistic) "max" value assuming #51 would be the private key 2^51-1.

Based on the 2nd, you can see the time it will take - at most - to hit #50 (currently 51 days).
Estimate for #49 is left as an exercise to the reader.


Rico
legendary
Activity: 1120
Merit: 1037
฿ → ∞
When showing the help via ./LBC -h the 1st option you saw was

Code:
     --address 
      Give a BTC address for rewards to this client. NYI

The NYI (not yet implemented) will be gone with the next version of the LBC client, and the type of the BTC address is no longer constrained (you can give a P2SH address as well). I intend to release the next LBC client this weekend, and yours will most probably auto-update - so leave the download section alone for now  Smiley.

Code:
$ ./LBC -h

         LBC - Large Bitcoin Collider v. 1.005
    client fingerprint: 8eb48dd9fd0a77b8b4110be83baf04c2

 Usage:
    LBC [options]

 Options:
    --address
      Give a BTC address for rewards to this client.

(remark: The to-be-released-client will probably have a bigger version number than 1.005)

So how does setting a BTC address look like and what will it do?

You can set a BTC address for the id you own at any time, as often as you want, but only the last value is valid. You can check the BTC address set via query (-q), and you can do both in one step:

Code:
$ LBC -q -a 1DoofusZqKv2wDhUAvKRjUPVQsyZw3NKCw
Server answer to 'query' is:
{
"done" : 18075470,
"btcadr" : "1DoofusZqKv2wDhUAvKRjUPVQsyZw3NKCw",
"ips" : {
...
},
"lastsee" : ...
}
'done' means we have delivered  18953.504 valid Gkeys.
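The conversion from 'done' blocks to Gkeys (assuming, from the block/Mkeys figures earlier in the thread such as 384 blocks = 402 Mkeys, that one block is 2^20 keys):

```python
done = 18075470              # 'done' blocks reported by the server
keys_per_block = 2**20       # 1 block = 1,048,576 keys (inferred from the thread)
gkeys = done * keys_per_block / 1e9
# ~18953.504 Gkeys, matching the figure in the text
```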

From then on, the given BTC address is attached to your id, and whatever reward - relevant to your client - the pool issues will be paid to this BTC address (or whichever is active at the time of payout, as you can change it as often as you want).

Wait - what reward?!

 Smiley

When the pool started, I planted some small bounties in the search space for any client to find. Some may remember 1AKKm1J8hZ9HjNqjknSCAfkLR4GgvCAPjq and 1TinnSyfYkFG8KC3gZ72KpYxBXsxSadD8.

While this is some fun and I plan doing so again, it's more of a "the winner takes it all" game and that's not what pools are about - is it?
So there will be another incentive, but this time - with a BTC address attached to an id - the pool can reward clients proportionately.

Let's say - and this is just a thought experiment / example - I put out a 1 BTC reward to be distributed among all ids in the top30 (except __rico666__ - of course) proportionately to their delivered Gkeys; with this in place, I can do so.

Or let's say the pool does find something and no one claims ownership - we can use this for distribution of the find.

Or let's say the pool does find something, the rightful owner does claim (and prove) ownership and we can distribute a finders fee among the clients... or ... or ...



Some time ago, I promised to make sure people who supported the pool early on would receive favors. The 1st such favor will be that any id in the top30 gets the GPU client for free (i.e. as many GPU client instances as you want/need for your id); everyone else wanting the GPU client will have to fork out 0.1 BTC  Tongue
I reserve the right to give out GPU freebies to special supporters - like arulbero - should they be interested. Also, the price for becoin is 0.5 BTC.

The 0.1BTC for any GPU client will not be paid to yours truly (aka me), but - see above - proportionately to all pool members except myself. Specs of the GPU client will follow soon.

So maybe it is time to have a look at the cost-benefit evaluation of being in the top30 of the LBC:

The cheapest price/performance AWS compute nodes will allow you to compute/check 65 Gkeys for 40 US cents. If I look at some examples from the current top 30:

Code:
Rank  Client Id                         #GKeys  Activity
1     Unknownhostname                   101313  3m 6s
2     a01f54a6df31b6d9075a99507b7c4a27   48206  5m 25s
3     Brother_of_Castor                  44736  3m 54s
4     HeavenlyCreatures                  37771  1m 16s
...
7     JudeAustin                         17611  4m 15s
...
13    John_Snow                           6238  2h 15m 12s
...
30    a14f027942e6f0507083a3d4f2ae376f    1866  13d 21h 14m 56s

Based on the AWS price, #30 is worth some $11, John_Snow delivered keys worth about $38, Unknownhostname about $623(!), etc.
So right now, if you are not in the top30 and want a GPU client, it's way cheaper to deliver some Gkeys than to pay 0.1 BTC.
Once you are in the top30 and have a GPU client, you will be part of the LBC establishment for quite some time to come.
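The dollar figures derive from the AWS rate above (40 US cents per 65 Gkeys):

```python
usd_per_gkey = 0.40 / 65     # AWS: 65 Gkeys for 40 US cents
worth = {name: gkeys * usd_per_gkey
         for name, gkeys in [("Unknownhostname", 101313),
                             ("John_Snow", 6238),
                             ("rank30", 1866)]}
# Unknownhostname ~$623, John_Snow ~$38, rank 30 ~$11
```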

If you are not in the top30, but have delivered quite some Gkeys under different ids (different computers, hardware changed, etc.), I repeat my offer to merge your contributions into one id.

Right now, the vast majority of the pool performance comes from Unknownhostname and HeavenlyCreatures. That is two guys (or groups - but I believe guys) delivering over 80% of the pool performance. Most of the ids in the top30 are dormant, as you can see. The entry barrier is f*ing low, so carpe diem.


Rico