Pages:
Author

Topic: the fastest possible way to mass-generate addresses in Python - page 2. (Read 1349 times)

hero member
Activity: 630
Merit: 731
Bitcoin g33k
@yoshimitsu777:

try this:

- rename main.cpp to main.cpp.bak
- rename gen.cpp to main.cpp
- delete existing ./VanitySearch
- run "make"

then just execute ./VanitySearch and you got this program executed. Hope this helps.

@AlexanderCurl: did you manage to run it on CUDA just like VanitySearch is being capable of ?
newbie
Activity: 72
Merit: 0
C++ code(.cpp) is compiled with g++.
gcc is used for C(.c)
you have just use "make" command to compile.

look. you showed this program which seems to be a modified vanity search that will generate random addresses - correct?

And according to my tests pure VanitySearch code base for CPU is two times faster than used through python.
Will try it out later but my guess is that VanitySearch performance for CPU can be optimized even further by placing most used functions local scope vars to global.
Code:
#include "secp256k1/SECP256k1.h"
#include "secp256k1/Int.h"
#include
#include

int main() {
    
    Secp256K1 *secp256k1 = new Secp256K1();
    secp256k1->Init();
    Int privKey;
    privKey.SetBase10("1");
    Point pub;
    std::string bitAddr;
    std::ofstream outFile;
    outFile.open("address.txt", std::ios::app);
    for(int i = 0; i < 1000000; i++) {
        pub = secp256k1->ComputePublicKey(&privKey);
        bitAddr = secp256k1->GetAddress(0, false, pub);
        outFile << bitAddr << '\n';
        privKey.AddOne();
    }
    outFile.close();
    return 0;
}

i am trying to execute this program to see the speed of this.
but i cannot run this code.
what i tried so far:

i downloaded from your repository bitcoin_tools and change directory to VanitySearch_Linux
then i create gen.cpp with your code you showed
then i run "make" and i get no error
i see there is a executable called "VanitySearch" but i do not see an executable gen that was compiled from gen.cpp
i tried
Code:
$ g++ gen.cpp
gen.cpp: In function ‘int main()’:
gen.cpp:11:23: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
   11 |     privKey.SetBase10("1");
      |                       ^~~
/usr/bin/ld: /tmp/ccVxtbUk.o: in function `main':
gen.cpp:(.text+0x32): undefined reference to `Secp256K1::Secp256K1()'
/usr/bin/ld: gen.cpp:(.text+0x48): undefined reference to `Secp256K1::Init()'
/usr/bin/ld: gen.cpp:(.text+0x57): undefined reference to `Int::Int()'
/usr/bin/ld: gen.cpp:(.text+0x70): undefined reference to `Int::SetBase10(char*)'
/usr/bin/ld: gen.cpp:(.text+0x7f): undefined reference to `Point::Point()'
/usr/bin/ld: gen.cpp:(.text+0xea): undefined reference to `Secp256K1::ComputePublicKey(Int*)'
/usr/bin/ld: gen.cpp:(.text+0x1cb): undefined reference to `Point::~Point()'
/usr/bin/ld: gen.cpp:(.text+0x1f5): undefined reference to `Secp256K1::GetAddress[abi:cxx11](int, bool, Point&)'
/usr/bin/ld: gen.cpp:(.text+0x252): undefined reference to `Int::AddOne()'
/usr/bin/ld: gen.cpp:(.text+0x2aa): undefined reference to `Point::~Point()'
/usr/bin/ld: gen.cpp:(.text+0x319): undefined reference to `Point::~Point()'
collect2: error: ld returned 1 exit status
how should i compile gen.cc to run your suggested program ?
newbie
Activity: 72
Merit: 0
There is Makefile to compile. All build commands are there. Just cd to the directory with code and use make in terminal.
$ make
$ ./VanitySearch

i did so.

see here file structure
Quote
$ ls -lh

drwxrwx--- ./
drwxrwx--- ../
drwxrwx--- base58/
-rw-rw---- Base58.o
drwxrwx--- bech32/
-rw-rw---- Bech32.o
-rw-rw---- gen.cpp
drwxrwx--- hash/
-rw-rw---- IntGroup.o
-rw-rw---- IntMod.o
-rw-rw---- Int.o
-rw-rw---- main.cpp
-rw-rw---- main.o
-rw-rw---- Makefile
-rw-rw---- Point.o
-rw-rw---- Random.o
drwxrwx--- secp256k1/
-rw-rw---- SECP256K1.o
-rwxrwx--- VanitySearch*

$ ls secp256k1/

Int.cpp       IntGroup.h  IntMod.cpp  Point.h     Random.h       SECP256k1.h  Timer.h
IntGroup.cpp  Int.h       Point.cpp   Random.cpp  SECP256K1.cpp  Timer.cpp

cat gen.cpp

Quote
#include "secp256k1/SECP256k1.h"
#include "secp256k1/Int.h"
#include
#include

int main() {
   
    Secp256K1* secp256k1 = new Secp256K1();
    secp256k1->Init();
    Int privKey;
    privKey.SetBase10("1");
    Point pub;
    std::string bitAddr;
    std::ofstream outFile;
    outFile.open("address.txt", std::ios::app);
    for(int i = 0; i < 1000000; i++) {
        pub = secp256k1->ComputePublicKey(&privKey);
        bitAddr = secp256k1->GetAddress(0, false, pub);
        outFile << bitAddr << '\n';
        privKey.AddOne();
    }
    outFile.close();
    return 0;
}

as soon as i try to compile with command
Code:
$ gcc -o gen_cpp gen.cpp 

i get this error
Quote
gen.cpp: In function ‘int main()’:
gen.cpp:11:23: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
   11 |     privKey.SetBase10("1");
      |                       ^~~
/usr/bin/ld: /tmp/cc3i7Un9.o: in function `main':
gen.cpp:(.text+0x27): undefined reference to `operator new(unsigned long)'
/usr/bin/ld: gen.cpp:(.text+0x32): undefined reference to `Secp256K1::Secp256K1()'
/usr/bin/ld: gen.cpp:(.text+0x48): undefined reference to `Secp256K1::Init()'
/usr/bin/ld: gen.cpp:(.text+0x57): undefined reference to `Int::Int()'
/usr/bin/ld: gen.cpp:(.text+0x70): undefined reference to `Int::SetBase10(char*)'
/usr/bin/ld: gen.cpp:(.text+0x7f): undefined reference to `Point::Point()'
/usr/bin/ld: gen.cpp:(.text+0x8e): undefined reference to `std::__cxx11::basic_string, std::allocator >::basic_string()'
/usr/bin/ld: gen.cpp:(.text+0x9d): undefined reference to `std::basic_ofstream >::basic_ofstream()'
/usr/bin/ld: gen.cpp:(.text+0xbb): undefined reference to `std::basic_ofstream >::open(char const*, std::_Ios_Openmode)'
/usr/bin/ld: gen.cpp:(.text+0xea): undefined reference to `Secp256K1::ComputePublicKey(Int*)'
/usr/bin/ld: gen.cpp:(.text+0x1cb): undefined reference to `Point::~Point()'
/usr/bin/ld: gen.cpp:(.text+0x1f5): undefined reference to `Secp256K1::GetAddress[abi:cxx11](int, bool, Point&)'
/usr/bin/ld: gen.cpp:(.text+0x20e): undefined reference to `std::__cxx11::basic_string, std::allocator >::operator=(std::__cxx11::basic_string, std::allocator >&&)'
/usr/bin/ld: gen.cpp:(.text+0x21d): undefined reference to `std::__cxx11::basic_string, std::allocator >::~basic_string()'
/usr/bin/ld: gen.cpp:(.text+0x236): undefined reference to `std::basic_ostream >& std::operator<< , std::allocator >(std::basic_ostream >&, std::__cxx11::basic_string, std::allocator > const&)'
/usr/bin/ld: gen.cpp:(.text+0x243): undefined reference to `std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char)'
/usr/bin/ld: gen.cpp:(.text+0x252): undefined reference to `Int::AddOne()'
/usr/bin/ld: gen.cpp:(.text+0x278): undefined reference to `std::basic_ofstream >::close()'
/usr/bin/ld: gen.cpp:(.text+0x28c): undefined reference to `std::basic_ofstream >::~basic_ofstream()'
/usr/bin/ld: gen.cpp:(.text+0x29b): undefined reference to `std::__cxx11::basic_string, std::allocator >::~basic_string()'
/usr/bin/ld: gen.cpp:(.text+0x2aa): undefined reference to `Point::~Point()'
/usr/bin/ld: gen.cpp:(.text+0x2d1): undefined reference to `operator delete(void*, unsigned long)'
/usr/bin/ld: gen.cpp:(.text+0x2f2): undefined reference to `std::basic_ofstream >::~basic_ofstream()'
/usr/bin/ld: gen.cpp:(.text+0x30a): undefined reference to `std::__cxx11::basic_string, std::allocator >::~basic_string()'
/usr/bin/ld: gen.cpp:(.text+0x319): undefined reference to `Point::~Point()'
/usr/bin/ld: /tmp/cc3i7Un9.o: in function `__static_initialization_and_destruction_0(int, int)':
gen.cpp:(.text+0x365): undefined reference to `std::ios_base::Init::Init()'
/usr/bin/ld: gen.cpp:(.text+0x380): undefined reference to `std::ios_base::Init::~Init()'
/usr/bin/ld: /tmp/cc3i7Un9.o:(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
collect2: error: ld returned 1 exit status
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
@AlexanderCurl, regarding sequential key generation:

Have you tried adding a large prime number, like 0xdeadbeef (just an example, I don't even know if that's prime)?

Sure, you'll have to check for overflow more frequently - fortunately, that's just a matter of doing a Greater-Than comparison followed by a subtraction - but a sufficiently large increment should make the keys look pseudorandom as far as the bits are concerned.
newbie
Activity: 72
Merit: 0
project structure: https://github.com/AlexCurl/bitcoin_tools/tree/main/JeanLucPons for Windows, Linux and Windows Cygwin64.

now i used the secp256k1 from the linux folder you suggested.
i get this error:
Code:
$ gcc -o test test.cpp 
test.cpp: In function ‘int main()’:
test.cpp:11:23: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
   11 |     privKey.SetBase10("1");
      |                       ^~~
/usr/bin/ld: /tmp/ccU4lCiy.o: in function `main':
test.cpp:(.text+0x27): undefined reference to `operator new(unsigned long)'
/usr/bin/ld: test.cpp:(.text+0x32): undefined reference to `Secp256K1::Secp256K1()'
/usr/bin/ld: test.cpp:(.text+0x48): undefined reference to `Secp256K1::Init()'
/usr/bin/ld: test.cpp:(.text+0x57): undefined reference to `Int::Int()'
/usr/bin/ld: test.cpp:(.text+0x70): undefined reference to `Int::SetBase10(char*)'
/usr/bin/ld: test.cpp:(.text+0x7f): undefined reference to `Point::Point()'
/usr/bin/ld: test.cpp:(.text+0x8e): undefined reference to `std::__cxx11::basic_string, std::allocator >::basic_string()'
/usr/bin/ld: test.cpp:(.text+0x9d): undefined reference to `std::basic_ofstream >::basic_ofstream()'
/usr/bin/ld: test.cpp:(.text+0xbb): undefined reference to `std::basic_ofstream >::open(char const*, std::_Ios_Openmode)'
/usr/bin/ld: test.cpp:(.text+0xea): undefined reference to `Secp256K1::ComputePublicKey(Int*)'
/usr/bin/ld: test.cpp:(.text+0x1cb): undefined reference to `Point::~Point()'
/usr/bin/ld: test.cpp:(.text+0x1f5): undefined reference to `Secp256K1::GetAddress[abi:cxx11](int, bool, Point&)'
/usr/bin/ld: test.cpp:(.text+0x20e): undefined reference to `std::__cxx11::basic_string, std::allocator >::operator=(std::__cxx11::basic_string, std::allocator >&&)'
/usr/bin/ld: test.cpp:(.text+0x21d): undefined reference to `std::__cxx11::basic_string, std::allocator >::~basic_string()'
/usr/bin/ld: test.cpp:(.text+0x236): undefined reference to `std::basic_ostream >& std::operator<< , std::allocator >(std::basic_ostream >&, std::__cxx11::basic_string, std::allocator > const&)'
/usr/bin/ld: test.cpp:(.text+0x243): undefined reference to `std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char)'
/usr/bin/ld: test.cpp:(.text+0x252): undefined reference to `Int::AddOne()'
/usr/bin/ld: test.cpp:(.text+0x278): undefined reference to `std::basic_ofstream >::close()'
/usr/bin/ld: test.cpp:(.text+0x28c): undefined reference to `std::basic_ofstream >::~basic_ofstream()'
/usr/bin/ld: test.cpp:(.text+0x29b): undefined reference to `std::__cxx11::basic_string, std::allocator >::~basic_string()'
/usr/bin/ld: test.cpp:(.text+0x2aa): undefined reference to `Point::~Point()'
/usr/bin/ld: test.cpp:(.text+0x2d1): undefined reference to `operator delete(void*, unsigned long)'
/usr/bin/ld: test.cpp:(.text+0x2f2): undefined reference to `std::basic_ofstream >::~basic_ofstream()'
/usr/bin/ld: test.cpp:(.text+0x30a): undefined reference to `std::__cxx11::basic_string, std::allocator >::~basic_string()'
/usr/bin/ld: test.cpp:(.text+0x319): undefined reference to `Point::~Point()'
/usr/bin/ld: /tmp/ccU4lCiy.o: in function `__static_initialization_and_destruction_0(int, int)':
test.cpp:(.text+0x365): undefined reference to `std::ios_base::Init::Init()'
/usr/bin/ld: test.cpp:(.text+0x380): undefined reference to `std::ios_base::Init::~Init()'
/usr/bin/ld: /tmp/ccU4lCiy.o:(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
collect2: error: ld returned 1 exit status
newbie
Activity: 72
Merit: 0
this seems to need secp256k1 library.
i took from albertobsd keyhunt this folder secp256k1 but when trying to compile i get this error

Code:
$ gcc -o test test.cpp 

test.cpp: In function ‘int main()’:
test.cpp:18:30: error: ‘class Secp256K1’ has no member named ‘GetAddress’
   18 |         bitAddr = secp256k1->GetAddress(0, false, pub);
      |                              ^~~~~~~~~~

Code:
$ ls -l

drwx------ secp256k1
-rw-rw---- test.cpp

$ ls -l secp256k1/

-rw------- Int.cpp
-rw------- IntGroup.cpp
-rw------- IntGroup.h
-rw------- Int.h
-rw------- IntMod.cpp
-rw------- Point.cpp
-rw------- Point.h
-rw------- Random.cpp
-rw------- Random.h
-rw------- SECP256K1.cpp
-rw------- SECP256k1.h

hero member
Activity: 630
Merit: 731
Bitcoin g33k
but now you are comparing a library written in C against pure python, it seems not fair Smiley
The statement is justified. I didn't know until now that the iceland2k14/secp256k1 library is written in C++ and only imported into Python. I understand that the huge speed advantage of this library is due to the native implementation. Despite the justified criticism regarding security (iceland2k14/secp256k1 is closed-source because the source code is not publicly known), the library is nevertheless widely used by various developers and tools with success and satisfaction. Nevertheless, I will include this note in my original post, as I consider it important.

I'm pretty sure that more c functions you introduce in your python script, more fast it becomes.
This is understandable. Nevertheless, I am very curious and interested in running the so far simplistic and demonstrated code in the GPU using PyCuda or similar. I suspect that PyCuda could give very good results after all what I have read and understood so far on the PyCuda project website. So I'm still asking for helpful ideas and code suggestions to achieve the goal via GPU using PyCuda or similar.

And how can we measure speed. We all have different hardware. And Linux will be faster than Windows.
That's right. It would be fatal to compare my results with arulbero's, because we simply use too different hardware. A comparison in this context is only valid with significance if everyone compares between different program versions on its own hardware. As you can clearly see from the results posted so far, arulbero uses faster hardware than I do, for example. But that's not a problem, you just have to be careful to compare your own results with each other on your own rig and not with the results of other users. The posted results are just to illustrate what differences the different programs provide.

I see that you're using gen_private_key. This generates arbitrary secure random private keys, using urandom. urandom might be slow depending on your system - do you really need it? I assume whatever you're doing (brute force search?) will generate the private keys in some other fashion, so you can just set prvkey_dec to an integer representing the private key directly.
That's a good point as well, which I unfortunately overlooked and only became aware of when re-reading the thread. The library I originally used, FASTECDSA, really seems to be too slow for the purpose intended here, as it is too complicated. Accordingly, I have started another optimization attempt. Thereby I don't use fastecdsa anymore for the randomly generated private keys, instead I use Python's secure library:
Code:
secrets.randbelow(2**256)
This resulted in a performance boost of over +32%

using fastecda, 1 thread ==> 29.681 sec
using secrets, 1 thread ==> 19.923 sec +32.9 %

using fastecdsa, 16 threads ==> 20.736 sec
using secrets, 16 threads ==> 13.612 sec +34.4 %

using fastecdsa, ProcessPoolExecutor, 1 core ==> 28.961 sec
using secrets, ProcessPoolExecutor, 1 core ==> 19.921 sec +31.3 %

using fastecdsa, ProcessPoolExecutor, 16 cores ==> 5.120 sec
using secrets, ProcessPoolExecutor, 16 cores ==> 3.744 sec +36.9 %

To achieve this further speed advantage, the following change is required:

- removing the fastecdsa library by removing those two lines from the top of the python code :
Quote
#!/usr/bin/env python3
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve

- instead we import the Python library "secrets" :
Quote
import secrets

- we replace the randomly generated 256-bit private key command:
Quote
 # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])


  # Generate a NumPy array of random private keys using "secrets" library
  private_keys = np.array([secrets.randbelow(2**256) for _ in range(start, end)])

here's the updated complete code variant with using multicore functionality and the secrets library for enhanded speed:

Code:
#!/usr/bin/env python3
# 2023/Jan/01, citb0in_multicore_secrets.py
import concurrent.futures
import os
import numpy as np
import secrets
import secp256k1 as ice

# how many cores to use
#num_cores = 1
num_cores = os.cpu_count()

# Set the number of addresses to generate
num_addresses = 1000000

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end):
  # Generate a NumPy array of random private keys using "secrets" library
  private_keys = np.array([secrets.randbelow(2**256) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_multicore_secrets.txt', addresses, fmt='%s')

legendary
Activity: 1932
Merit: 2077

==> I have generated 1,007,862 addresses (about 1 million keys) in about 15 seconds using one single core with your program.
==> I have generated exactly 1 millions addresses in 10 seconds using one single core with my program.


Conclusion:

The originally enormous speed advantage is therefore, according to these tests, not really to be found in the code difference but in the fact that randomly generated keys are simply more computationally intensive and cost more time than simple sequential key generation. So just as you originally said arulbero.

You said : "compare apples with apples"

but now you are comparing a library written in C against pure python, it seems not fair Smiley

For the record:

I wrote a library in C with a single function: from public keys to address (without b58 encoding)

Code:
Private key : 0000000000000000000000000000000000000000000000000000000000000001
Public key  :
x: 79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
y: 483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
 

PrKey WIF c.: KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9M7rFU73sVHnoWn
Address before b58 encode  : 751e76e8199196d454941c45d1b3a323f1433bd6
Address   with b58 encode  : 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH

I imported this function as file .so in my python script,

result:

Code:
1 core, 1M of addresses
time python3 gen_batches.py  > addresses.out

real 0m3,961s
user 0m3,910s
sys 0m0,055s


16 cores, 1M of addresses
time python3 gen_batches.py  > addresses.out

real 0m0,527s
user 0m7,404s
sys 0m0,151s



1 core, 16.4M of addresses
time python3 gen_batches.py  > addresses.out

real 1m1,757s
user 1m1,382s
sys 0m0,348s


16 cores, 16.4M of addresses
time python3 gen_batches.py  > addresses.out

real 0m6,672s
user 2m2,989s
sys 0m0,698s


less addresses.out

c71c6741ae4862dadaf2d9c9295d47410a27e194
1d9a3ef1efeec75b05f0ece73ad9402cac767843
44abfeb8701c716380ed2a40aeb72370ef61d7aa
d6a2aa6799f42078faa889dcda92b29c6005d84d
f3a0e804269feeba68d8c1facf68206b73b2a987
7a795a7b4500f80101e489e9ad5d7bc1855edf13
d8e619b13be4c2d6182aa10c0e7237a35ba0ab05
be801790f2cefb432bdf5dce2946fd22364b36b0
531a9c231f3e42d4bdb02d44bf01104e6eccd83d
7c8391a816b578f0d6e0a56595b67e12a314f945
6d0b359ba6f3d382eb99a346f99f0d27c29e2963
aa4116886a8631699f5077d78f47e6013a1af05f
2574970ffa9d5b26d52d89a76eef9c3c1bfbe798
...



then the speed is now about 2.4 MKeys/s.

I'm pretty sure that more c functions you introduce in your python script, more fast it becomes.
hero member
Activity: 630
Merit: 731
Bitcoin g33k
Thank you arulbero for your efforts. Very interesting. I was curious to see the comparison of random_generated_addresses and sequentially_generated_address as you showed with your code.

Following was my python program that used 16 threads and multithreading, but without / before the implementation of the ProcessPoolExecutor

Code:
#!/usr/bin/env python3
# 2022/Dec/26, [b]citb0in_multithreaded.py[/b]
import threading
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# Set the number of addresses to generate and the number of threads to use
num_addresses = 1000000
num_threads = 16

# Divide the addresses evenly among the threads
addresses_per_thread = num_addresses // num_threads

# Create a list to store the generated addresses
addresses = []

# Define a worker function that generates a batch of addresses and stores them in the shared list
def worker(start, end):
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  # Add the addresses to the shared list
  addresses.extend(thread_addresses)

# Create a list of threads
threads = []

# Create and start the threads
for i in range(num_threads):
  start = i * addresses_per_thread
  end = (i+1) * addresses_per_thread
  t = threading.Thread(target=worker, args=(start, end))
  threads.append(t)
  t.start()

# Wait for the threads to finish
for t in threads:
  t.join()

# Write the addresses to a file
np.savetxt('addresses_1M_multithreaded.txt', addresses, fmt='%s')

Code:
$ time python3 citb0in_multithreaded.py 
Quote
real   0m19,759s
user   0m36,834s
sys   0m6,259s

==> It took less than 20 seconds to generate 1 million addresses using 16 threads on my system.

Then I modified that code to use only one single thread:
Code:
#!/usr/bin/env python3
# 2022/Dec/30, [b]citb0in_singlethreaded.py[/b]
[...]
num_threads = 1
[...]

Code:
$ time python3 citb0in_singlethreaded.py 
Quote
real   0m26,317s
user   0m26,309s
sys   0m1,568s

==> It took about 26 seconds to generate 1 million addresses using 16 threads on my system.

Then I modified the program to use ProcessPoolExecutor which distributes the load to all available cores and it's multithreaded. This gave me the best results:
Code:
#!/usr/bin/env python3
# 2022/Dec/30, [b]citb0in_multicore.py[/b]
import concurrent.futures
import os
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# how many cores to use
#num_cores = 1
num_cores = os.cpu_count()

# Set the number of addresses to generate
num_addresses = 1000000

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end):
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_multicore.txt', addresses, fmt='%s')

Quote
real   0m4,785s
user   0m54,832s
sys   0m1,992s

==> It took less than 5 seconds to generate 1 million addresses using all available cores (=16) on my system.

Now I change the settings to use only one single core...

Code:
#!/usr/bin/env python3
# 2022/Dec/30, [b]citb0in_singlecore.py[/b]
[...]
num_cores = 1
#num_cores = os.cpu_count()
[...]
np.savetxt('addresses_1M_singlecore.txt', addresses, fmt='%s')

Quote
real   0m26,271s
user   0m26,077s
sys   0m1,618s

==> It takes about 26 seconds to generate 1 million addresses using one single core on my system.

Well, so far so good. I showed this just to summarize things up. Afterwards I modified my code according to your suggestion so the private keys are not generated randomly but instead they are sequentially generated from a pre-defined starting point. Here's the modified version:

Code:
#!/usr/bin/env python3
# 2022/Dec/26, [b]citb0in_seq.py[/b]
import concurrent.futures
import os
import numpy as np
import secp256k1 as ice

# how many cores to use
num_cores = 1
#num_cores = os.cpu_count()

# Number of addresses to generate
num_addresses = 1000000

# Starting point (decimal/integer) for private key
starting_point = 123456789

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end, starting_point):
  # Initialize the private key to the starting point
  private_key = starting_point

  # Initialize the list to hold to private keys
  private_keys = []

  # Generate a batch of private keys sequentially
  for i in range(start, end):
    # Increment the private key
    private_key += 1

    # Add the private key to the list
    private_keys.append(private_key)

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses
  #return (thread_addresses, private_keys, start_int)

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end, starting_point))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_seq_singlecore.txt', addresses, fmt='%s')

Quote
real   0m10,049s
user   0m10,012s
sys   0m1,442s

==> By this change it takes only 10 seconds to generate 1 million addresses using one single core on my system. By replacing the private key generation from randomly to sequentially I got 2.6x better performance.

Now let's activate all available cores...
Code:
[...]
#num_cores = 1
num_cores = os.cpu_count()
[...]
# Write the addresses to a file
np.savetxt('addresses_1M_seq_singlecore.txt', addresses, fmt='%s')

Quote
real   0m2,481s
user   0m18,081s
sys   0m1,661s

==> It takes only 2.5 seconds to generate 1 million addresses using all 16 cores on my system. By replacing the private key generation from randomly to sequentially gradually resulted in higher performance.

Now as final comparison to your program:
Quote
[...]
('1FZqHE6MndwVFKDD9vPYjHQqCNzf9f4V4G', '1BvzQLEoausQEjdQZnaC9TsHEd8CfL54yy')
('19UvEYkDhi7WzP2rFufkWEhbJhzh1DBKed', '1geYTWHYsgokQ2HuELJALYSaBw8an5q7r')
('1N8vnptoRf7HAmdLt2iV7Tr1n6BBff5Ltd', '18x8zZjHqP4d3nUaNPLz7TWfYdYLgiHPsx')
('1PTofsPjc2FUtKoM3nWfCfCQi8KfnNMJy2', '1GYfAZ7R6mnMcRFUgTaELUqLEjcdV9CYjX')
('169x29jPKW3ehUWZcWXRgG8tCwqjDhPAn7', '1LiTZGU8B6gK5zNxBSiRBDyAxZLYSfaHnZ')
('13Xm9hWcfgNfamDEhfuisnnnAD6nCxS9k2', '1NXKfq5eqFJmmCaarjhKoqQXGqfVJKJZYq')
('1NJzBnxVzYRGUSSBqVpV5tpxGMb25hroQj', '1FtgciBUc2mmioTpxgSMvANw7JB3nQEhFc')
('19Qkdthp9m8uqJys58zBgfp4kbp6bS8ZA4', '1KiAJjTY7BvsogAF1suffpbsmB1VB6dLHu')
You have just generated 1,007,862 keys (1 MKeys)

real   0m14,765s
user   0m14,187s
sys   0m0,448s


==> I have generated 1,007,862 addresses (about 1 million keys) in about 15 seconds using one single core with your program.
==> I have generated exactly 1 millions addresses in 10 seconds using one single core with my program.


Conclusion:

The originally enormous speed advantage is therefore, according to these tests, not really to be found in the code difference but in the fact that randomly generated keys are simply more computationally intensive and cost more time than simple sequential key generation. So just as you originally said arulbero.

It always depends on the purpose of use, of course, but I have my doubts that the sequential generation of private keys is not the best way. I therefore think, also in terms of security for other applications, that random private keys are always preferable to sequential ones despite the fact they are more time-intensive.

Thank you very much for your effort and your valuable contribution dear arulbero.

My conclusion so far remains: secp256k1 seems to be pretty damn fast and so far I haven't seen any example that can beat it. Of course I would still be happy if other suggestions flow in. In particular, I would still be interested if someone is able to move the already working code via pycuda, numba, jut or similar solutions into the GPU using CUDA to speed up the computation process.

I am looking forward to further feedback and thank you all in advance.
legendary
Activity: 1932
Merit: 2077

The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (in my case 16 cores) and write to the file in 4.8 seconds, this is a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the cores under which the program is executed. So one can select specifically "1" as value, so that we also compare apples with apples.

Code:
#!/usr/bin/env python3
# 2022/Dec/30, citb0in
import concurrent.futures
import os
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# how many cores to use
num_cores = 1
#num_cores = os.cpu_count()

# Set the number of addresses to generate
num_addresses = 1000000

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end):
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_with_singleCore.txt', addresses, fmt='%s')

Running this under one single core, now I get 26.5 sec for 1 million generated addresses. This is a rate of 37,735 keys/second. I am curious to see your program using all available cores.

Ok, finally I managed to use your multithread program with my script.

Results:   30s to generate 16.4 M of addresses, about 540k addresses / s, and store them in a file.

Code:
time python3 gen_batches.py > addresses.out

real 0m29,810s
user 8m22,402s
sys 0m6,273s

wc addresses.out

 16474040  16474040 575934502

less addresses.out

18GW1MFwpYZgmRSy8R4wgjUtBnscJtXeUZ
1FXUP9HViWxrtgAvqbpgVrd69yarx6Njdp
1MuxV99fRFnhoNopdxV7kkQB4u14chCtqA
17AzXPYExdE8nxkiV6zJGLuJq32WkAZ134
12DUcGmFMkmXV8tYihM2jLQ36pc4zLBDP6
1AqwPisfi9hDHoF95SEoSEVN89fX8NSGjo
17qRbvdShkoCDw3XJGhEqxaFkQHaK36Yk1
187x8hPiVpRLGDRsvy4nreeMNW7aW9zuFv
1EGSDnc2j7p3qsPJsPakKgavCTz4szYU3w
1F4KjKLNqQSbmw9xsJurRVQ3RciL7Kox7g
1AKaHBfhMEedrtnjGgmFFsdG3SzyN3hpcZ
1GaWuuPPM747Na3CTwaR3DcewoBHc9VVSv
12VeWT4jZCep17HJswMtfCEqh14oDcRr3L
16MBC6WLWMvNeexYYX853Ucf2ZsPoYLRoS
1JVrhzEKTcgzGtSc9u9LpFjksFZiQveFT6
17kz2Wpry3vnLM6RMjB3KXoG6kRxjeDQcj
1TBh3q7vQoTmgX6pY75ADHDSkcaWkmCvd
13QVDFeeWkErhTAFqX3SP2YzrzWt3Bh7iu
1GPyzJC8Ey8PasHn2Qa8aFanjrXuE1aNK5
17QE8iNrtLnju5pinG1vdoCVZpn8zbrM7S
17nKRy1XdKdroWXPsnsd4RvxWKVySZxath
1KWaY8K824oEFk5q4tZioCZ86viLJuKNgS
141m3BUZNjMxbs4a9Cb6SbCmEzx4nPJKCL
12ueqsLkT1XCVheGM34uQWW2fpnhVvtwmE
14eH2SFvCq3fxmrK9NgKTxfMCeKixNwQ6c
1G9CaAqoKJSfTRyGcvPhVFewLRXKWzkwh7
15RthH7DBuDSmYcFCYanUhBsrdEYzBahNw
13BTmQ2gCaZ8x4T7Sba9k2MRBSDRrZLy6u
1NpbKVze4pPVCghjaa2CV7LfRkotCNu76Y
1HfQ42v7DEDHEuQMhH5caPzX2RgtSmoYWw
...

Only 1.3 s to generate 16.4 M public keys (without writing)

30s to generate 16.4 M addresses and store them in a file (24s if each thread writes on the file without waiting for the others)
legendary
Activity: 1932
Merit: 2077
@arulbero

Impressive how you're getting those speeds on Python, considering that multithreading is complicated to do efficiently in that language, and GPU/FPGA support is lackluster to nonexistent.

It is not programming related, computing P+G is much more easier than compute P=3512878756844563653653674365654352352*G
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
@arulbero

Impressive how you're getting those speeds on Python, considering that multithreading is complicated to do efficiently in that language, and GPU/FPGA support is lackluster to nonexistent.
legendary
Activity: 1932
Merit: 2077
what is third and fourth element representing?

54338240394213138931889225770364196918256374885639854771044914499200701239376
91288621757068410199936994202114003372515557459679014336095768812037691308008

As I recognize, they seem not to be the public keys of the consequent keys 36028797808045094, 36028797808045095, 36028797808045096

The keys are generated in this way:


P
(P + 1*G , P - 1*G)   -> 4 coordinates, x and y * 2 points
(P + 2*G , P - 2*G)
(P + 3*G , P - 3*G)
(P + 4*G , P - 4*G)
(P + 5*G , P - 5*G)
(P + 6*G , P - 6*G)
.....
.....
(P + 2048*G, P - 2048*G)

each batch has all the consecutive keys from P - 2048*G to P+2048*G   (4097 public keys), but they are not in order

Code:
batch=[(0,0,0,0)]*2048
#the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G)  
#the element number 1 will be the couple (k+1)G, (k-1)G
#the element number 2 will be the couple (k+2)G, (k-2)G
#...
#the element number 2048 will be the couple (k+2048)G, (k-2048)G
#each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G,
#but the first element is only a fake couple
batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G
hero member
Activity: 630
Merit: 731
Bitcoin g33k
I would like to try another approach and using iceland's secp256k1 library for generating the address from the pubkey. Can you tell me in which variable I can find the public key in your program?

For example. When I print the second element of the list "batch" I get
Code:
print(batch[1])
Quote
(26088438602568126703237513271074049975900194846956880971674554085576597277452, 66899334846368891744016007453456293662065123677302105461153252024609684696055, 54338240394213138931889225770364196918256374885639854771044914499200701239376, 91288621757068410199936994202114003372515557459679014336095768812037691308008)

The first element is the xkey (pubkey) of that key:

Quote
private key (dec) : 36028797808045093
private key (hex) : 8000002f086c25
public pair x        : 26088438602568126703237513271074049975900194846956880971674554085576597277452

the second element
66899334846368891744016007453456293662065123677302105461153252024609684696055
represents the public pair y

what is third and fourth element representing?

54338240394213138931889225770364196918256374885639854771044914499200701239376
91288621757068410199936994202114003372515557459679014336095768812037691308008

As I recognize, they seem not to be the public keys of the consequent keys 36028797808045094, 36028797808045095, 36028797808045096


legendary
Activity: 1932
Merit: 2077

I realized that you output tuples of addresses using list(map). I suggest to adjust your program so it outputs a single address on each line so we compares apples by apples. I am not sure if the list output affects performance in some way, that's why I am pointing to it.

The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (in my case 16 cores) and write to the file in 4.8 seconds, this is a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the cores under which the program is executed. So one can select specifically "1" as value, so that we also compare apples with apples.

To generate 1 address for line you need to replace

print (*addresses,sep="\n")
print (*addresses2,sep="\n")
print (*addresses3,sep="\n")

with

for i in addresses:
      print (*i, sep="\n")
for i in addresses2:
      print (*i, sep="\n")
for i in addresses3:
      print (*i, sep="\n")

The results are the same (slightly faster):
Code:
time python3 gen_batches.py > addresses.out

real 3m5,736s
user 3m5,268s
sys 0m0,404s

wc addresses.out
 
  16400196  16400196 573355137


less addresses.out

1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw
1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw
13hXNkmedXJ5UXt4RY5rLGAZGBgeSHUNT7
17G6zU4yVBM8WJ7TvJvkVkZr8qtJheAYVy
1LZtUG1S8V7yfwthDfJBy1Yg7LWxpZKmQX
1PDBxaiy3bp8woFbrErNAPoMF8W442B1Nb
1CAapBviMeMp91UjqgTaUfr1tGc5Lsvhgx
1Lmrc69RFRbwGUc3UTRGxumAApEDsJKUrf
1JNGqSWZdJFptRrLaCtmUJ6Hq1necKuTCd
18aQxp32qDDHEFWgVjexS1BG9MdFbQAc6N
....
   
hero member
Activity: 630
Merit: 731
Bitcoin g33k
I don't have a fast  'pub_to_addr' function

from 12.3 s to 3m and 9s to generate 16.4 M of addresses,  16.4 M / 190 s = about 86k addresses/s.

12 seconds to generate 16.4 M public keys, almost 3m to compute 16.4 M addresses.

I had already thought that halfway. Now with your updated version which also generates addresses it's much slower, the result is clearly different than just calculating the pubkey as you showed in your initial version.
I get about 70,000 addresses/sec with your updated version:

Code:
$ time python3 new_gen_batches.py > addresses.out 

Quote
real   0m13,378s
user   0m13,347s
sys   0m0,012s

I realized that you output tuples of addresses using list(map). I suggest to adjust your program so it outputs a single address on each line so we compares apples by apples. I am not sure if the list output affects performance in some way, that's why I am pointing to it.

The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (in my case 16 cores) and write to the file in 4.8 seconds, this is a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the cores under which the program is executed. So one can select specifically "1" as value, so that we also compare apples with apples.

Code:
#!/usr/bin/env python3
# 2022/Dec/30, citb0in
import concurrent.futures
import os
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# how many cores to use
num_cores = 1
#num_cores = os.cpu_count()

# Set the number of addresses to generate
num_addresses = 1000000

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end):
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_with_singleCore.txt', addresses, fmt='%s')

Running this under one single core, now I get 26.5 sec for 1 million generated addresses. This is a rate of 37,735 keys/second. I am curious to see your program using all available cores.
legendary
Activity: 1932
Merit: 2077
@arulbero: thanks for sharing, sounds interesting. I need to dig into it, unfortunately I have to go right now but I certainly will check your code later when back at home. I need to think about the consecutive vs. random part and make some comparisons each other. Currently when I run your code it says it generated 1 million keys in average of 1.22 seconds. I just need to test if this speed is realistisc. Can you modify your code so it will generate an address and output all the generated addresses to a file called address.out ?

I don't have a fast  'pub_to_addr' function

from 12.3 s to 3m and 9s to generate 16.4 M of addresses,  16.4 M / 190 s = about 86k addresses/s.

12 seconds to generate 16.4 M public keys, almost 3m to compute 16.4 M addresses.

Code:
time python3 gen_batches.py > address.out

real 3m8,827s
user 3m8,252s
sys 0m0,520s

wc  address.out

  8200098  16400196 630754794

less address.out

('1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw', '1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw')
('13hXNkmedXJ5UXt4RY5rLGAZGBgeSHUNT7', '17G6zU4yVBM8WJ7TvJvkVkZr8qtJheAYVy')
('1LZtUG1S8V7yfwthDfJBy1Yg7LWxpZKmQX', '1PDBxaiy3bp8woFbrErNAPoMF8W442B1Nb')
('1CAapBviMeMp91UjqgTaUfr1tGc5Lsvhgx', '1Lmrc69RFRbwGUc3UTRGxumAApEDsJKUrf')
('1JNGqSWZdJFptRrLaCtmUJ6Hq1necKuTCd', '18aQxp32qDDHEFWgVjexS1BG9MdFbQAc6N')
('1CMNQMqqZdyABzU5bms6sE1J4hz2z7w86J', '1Awa5kf9npJ9sJKK65yA1vXFXaqC75tWvt')
('1GXDr1MF5ee5HfZHK98QQrDb8Mk785Hwtj', '14R3fvJcA8SSFgiCAPSQCvrquEG1NM7JcQ')
('1G578XchMTjHTuWduB37XNzNxEyvKxNNGC', '1E2aoTr22SDH18NJdamF3zqjztjtA5i5Cz')
('13BdD8vK9xaN1aw2WXn73rtLfjFvLAs3ES', '1Hgi86UjskxQpCmiKWPBK4Qjr97awHgZ7q')
('1Bnh87jjfNFy9QmRmnsW9vEDWbMDbBbjrF', '1P5wAX3U4v4oNde8yGUcg4YHwQdGRw33wo')
('1DXzgTcLD86LJDxrZerLEBXgQTLpCCRHRC', '1Ce5FYPKPSd9nf5Fv3KtKnzbpdCzJ7bPjg')
...

new ecc.py
Code:
import hashlib
from hashlib import sha256
from binascii import  a2b_hex, b2a_hex
from ripemd import ripemd160  #https://pypi.org/project/ripemd-hash/  -->  pip install ripemd-hash

#see http://www.secg.org/sec2-v2.pdf at page 9


#Gx=55066263022277343669578718895168534326250603453777594175500187360389116729240
#Gy=32670510020758816978083085130507043184471273380659243275938904335757337482424
#p=115792089237316195423570985008687907853269984665640564039457584007908834671663
#n=115792089237316195423570985008687907852837564279074904382605163141518161494337

Gx=0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy=0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F # Fp
n=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141 # E(Fp)

#endomorphism
lambd=0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72 # (lambda^3 = 1 mod n)    
lambd2=0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce #(lambda^2)

beta=0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee # (beta^3 = 1 mod p)
beta2=0x851695d49a83f8ef919bb86153cbcb16630fb68aed0a766a3ec693d68e6afa40 # (beta^2)
############################################################################################
###Field operations

#a must be a number between 1 and p-1;    input: a  output: 1/a
#see http://www.springer.com/?SGWID=4-102-45-110359-0 page 40 alg. 2.20
def inv(a,p):
u, v = a%p, p
x1, x2 = 1, 0
while u != 1:
#q = v//u
#r = v-q*u
q, r = divmod(v,u)
x = x2-q*x1
v = u
u = r
x2 = x1
x1 = x
return x1%p


#inverse of a batch of x
#input: batch of x : [x1,x2,x3,x4,x5,x6,x7,x8,...,x2048]
#x is known (x of kG)
#output: batch of inverse: 1/(x1-x), 1/(x2-x), 1/(x3-x), ....
#3M for each inverse
def inv_batch(x,batchx,p):

a = (batchx[1]-x) % p
partial= [0]*2049
partial[1]=a

for i in range(2,2049):
a = (a*(batchx[i]-x)) % p #2047M
partial[i]=a

inverse=inv(partial[2048],p) # 1I
batch_inverse=[0]*2049

for i in range(2048,1,-1):
batch_inverse[i-1]=(partial[i-1]*inverse) % p #2047M
inverse=(inverse*(batchx[i]-x)) %p #2047M

batch_inverse[0]=inverse

return batch_inverse

############################################################################################
##Group operations

#  https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Point_addition
#'double' addition
#add 2 points P and Q -->  P+Q
#add 2 points P and Q -->  P-Q
#
#(X1,Y1) + (X2,Y2)  -->  (X3,Y3)  (the inverse of (x2-x1) must be known)
#(X1,Y1) + (X2,-Y2) -->  (X4,Y4)  (the inverse of (x2-x1) must be known)
#
#input: (X1,Y1), (X2,Y2), the inverse of (X2 - X1): 5 scalars
#output: (X3,Y3), (X4,Y4) :  4 scalars
#4M + 2S

def double_add_P_Q_inv(x1, y1, x2, y2, invx2x1):

dy = (y2-y1) #% p
a = dy*invx2x1 % p #1M
a2 = a**2     #1S
x3 = (a2 - x1 -x2) % p
y3 = (a*(x1-x3) -y1) % p #1M


dy = p-(y2+y1) #% p # only the sign of y2 changes, "y of -Q = -y of +Q"
a = dy*invx2x1 % p #1M  x2 and then the inverse invx2x1 are the same
a2 = a**2   #1S
x4 = (a2 - x1 -x2) % p
y4 = (a*(x1-x4) -y1) % p #1M

return x3, y3, x4, y4


#https://en.wikibooks.org/wiki/Cryptography/Prime_Curve/Jacobian_Coordinates
#jacobian coordinates
def add_j_j(jax, jay, jaz, jbx, jby, jbz):

u1 = (jax*jbz**2) % p
u2 = (jbx*jaz**2) % p
s1 = (jay*jbz**3) % p
s2 = (jby*jaz**3) % p
h = (u2-u1) % p
r = (s2-s1) % p
jcx = (r**2 -h**3-2*u1*h**2) % p
jcy = (r*(u1*h**2-jcx)-s1*h**3) % p
jcz = (h*jaz*jbz) % p
return jcx, jcy, jcz

def double_j_j(jx, jy, jz):
s = (4*jx*jy**2) % p
m = (3*jx**2) % p
jx2 = (m**2 - 2*s) % p
jy2 = (m*(s-jx2) - 8*jy**4) % p
jz2 = (2*jy*jz) % p
return jx2, jy2, jz2


def mul(k,jx,jy,jz):

jkx, jky, jkz = 0, 0, 1
if (k%2 == 1):
jkx, jky, jkz = jx, jy, jz #only if d is odd
q=k//2
while q>0:
jx, jy, jz = double_j_j(jx,jy,jz)
a=q%2
if (a > jkx):
jkx, jky, jkz = jx, jy, jz  #only if d is even
elif (a):
jkx, jky, jkz = add_j_j(jx, jy, jz, jkx, jky, jkz)  
q=q//2
                                 
return jkx, jky, jkz

############################################################################################
#Coordinates

#from jacobian to affine
def jac_to_aff(jax, jay, jaz, invjaz):

ax, ay = (jax*invjaz**2) % p, (jay*invjaz**3) % p

return ax, ay

############################################################################################
# 58 character alphabet used
__b58chars = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
__b58base = len(__b58chars)

def b58encode(v): #encode v, which is a string of bytes, to base58.    
long_value = 0

if bytes == str:  # python2
for (i, c) in enumerate(v[::-1]):
long_value += ord(c) << (8*i) # 2x speedup vs. exponentiation
else: # python3
for (i, c) in enumerate(v[::-1]):
long_value += ord(chr(c)) << (8*i) # 2x speedup vs. exponentiation

result = ''
#long_value = v
while long_value >= __b58base:
     div, mod = divmod(long_value, __b58base)
     result = __b58chars[mod] + result
     long_value = div
result = __b58chars[long_value] + result
#return result

 # Bitcoin does a little leading-zero-compression:
 # leading 0-bytes in the input become leading-1s
nPad = 0
#if bytes == str:  # python2
# for c in v:
# if c == '\0': nPad += 1
# else: break

#else:  # python3
for c in v:
if chr(c) == '\0': nPad += 1
else: break

return (__b58chars[0]*nPad) + result

def b58decode(v, length):
    """ decode v into a string of len bytes."""
    print(type(v))
    long_value = 0
    for (i, c) in enumerate(v[::-1]):
        long_value += __b58chars.find(c) * (__b58base**i)

    div, mod = divmod(long_value, 256)
    result = struct.pack("B", mod)   #converte numero intero tra 0 e 255 in 1 byte https://docs.python.org/2/library/struct.html
    long_value = div

    while long_value >= 256:
         div, mod = divmod(long_value, 256)
         result = struct.pack("B", mod) + result   #stringa di byte
         long_value = div
    result = struct.pack("B", long_value) + result
 
    nPad = 0
    for c in v:
         if c == __b58chars[0]: nPad += 1
         else: break
 
    result = struct.pack("B", 0)*nPad + result

    
    if length is not None and len(result) != length:
        return None
    print(type(result))  
    return result
    
def pub_to_addr(couple):
        
#d = int(d1,16)
#Px, Py = mul2G(d)  # P = d*G
Px=couple[0]
Py=couple[1]
Px2=couple[2]
Py2=couple[3]

#if (compressed == 1):
if (Py % 2) == 0:
P_string  = '02' + hex(Px)[2:].rstrip('L').rjust(64,'0')         #formato compresso - caso y pari
else:
P_string  = '03' + hex(Px)[2:].rstrip('L').rjust(64,'0')        #formato compresso - caso y pari
#else: #uncompressed
# P_string = '04' + hex(Px)[2:].rstrip('L').rjust(64,'0') + hex(Py)[2:].rstrip('L').rjust(64,'0')


h = sha256(a2b_hex(P_string))                    #sha256

h1=ripemd160.new()                               #ripmed160
h1.update(h.digest())


a = b'\x00' + h1.digest()  #aggiunge byte 00 all'inizio

h2 = sha256(sha256(a).digest()).digest()  #doppio sha256 per controllo

addr = a + h2[:4]                

address1 = b58encode(addr)  #ultimo passaggio: codifica in base 58


if (Py2 % 2) == 0:
P_string  = '02' + hex(Px2)[2:].rstrip('L').rjust(64,'0')         #formato compresso - caso y pari
else:
P_string  = '03' + hex(Px2)[2:].rstrip('L').rjust(64,'0')        #formato compresso - caso y pari
#else: #uncompressed
# P_string = '04' + hex(Px)[2:].rstrip('L').rjust(64,'0') + hex(Py)[2:].rstrip('L').rjust(64,'0')


h = sha256(a2b_hex(P_string))                    #sha256

h1=ripemd160.new()                               #ripmed160
h1.update(h.digest())

a = b'\x00' + h1.digest()  #aggiunge byte 00 all'inizio

h2 = sha256(sha256(a).digest()).digest()  #doppio sha256 per controllo

addr = a + h2[:4]                

address2 = b58encode(addr)  #ultimo passaggio: codifica in base 58

return address1, address2
#print (address)

new gen_batches.py
Code:
#!/usr/bin/env python

#this script computes 3x1334 batchs of 4097 points (1 single point + 2048 couples of points -> (kG+mG, kG-mg))
#16,4 millions of points
#3 : 1 batch + 2 endomorphism


from ecc import *

########################################################################
#in this section the script pre-computes 1G, 2G, 3G, ..., 2048G
mG = [(0,0)]*2049
mGx = [0]*2049
mGy = [0]*2049

mGx[1]=Gx
mGy[1]=Gy
mG[1]=(Gx,Gy) #list of multiples of G,
 #you have to precompute and store these multiples somewhere

mGx[2]=0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5
mGy[2]=0x1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A
mG[2]=(mGx[2],mGy[2])

for i in range(3,2049):

jx,jy,jz=add_j_j(Gx, Gy, 1, mGx[i-1], mGy[i-1], 1) #this is not a efficient way, but it is not important
jzinv = inv(jz,p)
(x,y) = jac_to_aff(jx, jy, jz, jzinv)
mG[i] = (x,y)
mGx[i] = x
mGy[i] = y

###########################################################################
lowest_key=2**55+789079076 #put here the lowest key you want to generate
number_of_batches=1334 #put here how many consecutive batches you want to generate
number_of_keys=4097*3*number_of_batches #this is the number of consecutive keys you are going to generate
#this number is always a multiple of 3*4097=12291, because 12291 is the minimum size
#remember: you generate always 3 batches at once, 1 in (1,2^160) and 2 out of this range
#their name are:  batch (the main), batch2, batch3 (that are derived from the main with endomorphism)


id_batch=lowest_key+2048 #id_batch is now the private key of the middle point of the first batch that you want compute
distance_between_successive_batches=4097 #this is the distance from the middle point of the first batch and the middle point of the successive batch in (1,2^160)
#4097 means that I'm generating only consecutive batches in (1,2^160), this is the default


start=id_batch #the first key to generate of the the first batch is the lowest key + 2048 --> the middle point of the first batch
stop=id_batch + number_of_batches*distance_between_successive_batches #the key of the middle point of the last batch




#k is always the id of the current main batch and the key of the middle point of the current main batch
for k in range(start,stop,distance_between_successive_batches):

jkx,jky,jkz = mul(k,Gx,Gy,1) #here we use the slow function mul to generate the middle point
invjkz = inv(jkz,p)
(kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz)

kminverse = [0]*2048
kminverse = inv_batch(kx,mGx,p) #you need to compute 2048 inverse, 1 for each couple
#to perform this task you need only the x coordinates of the multiples of G and the x coordinate of kG: kx
kminverse = [invjkz]+kminverse
#the element number 0 of the list is the inverse invjkz that you have computed to get kG=(kx,ky)
#the element number 1 of the list is the inverse necessary to perform kG + 1G (and kG-1G) --->  1/(x_of_1G - kx)
#the element number 2 of the list is the inverse necessary to perform kG + 2G (and kG-2G) --->  1/(x_of_2G - kx)
#...
#the element number m of the list is the inverse necessary to perform kG + mG (and kG-mG) --->  1/(x_of_mG - kx)
#then the name: "kminverse", the inverse to get the generic kG+mG
#...
#the element number 2048 of the list is the inverse necessary to perform kG+2048G and kG-2048G ---> 1/(x_of_2048G - kx)

kxl = [kx] * 2049 #this is a list of kx, all elements are the same, only to use the function map of python
kyl = [ky] * 2049   #this is a list of ky, all elements are the same, only to use the function map of python

batch=[(0,0,0,0)]*2048
#the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G)  
#the element number 1 will be the couple (k+1)G, (k-1)G
#the element number 2 will be the couple (k+2)G, (k-2)G
#...
#the element number 2048 will be the couple (k+2048)G, (k-2048)G
#each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G,
#but the first element is only a fake couple
batch = list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:]))
#the function double_add_P_Q_inv computes kG +/-mG, it uses the list of the inverse and return a list of couple of points
#each element of this list is a couple of points, 4 scalars, x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G
#this list has 2049 elements

batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G

#batch_tot = batch_tot + batch
addresses = list(map(pub_to_addr, batch))
print (*addresses, sep="\n")


batch2=[(0,0,0,0)] * 2049 #this is the batch2 with lambda*kG, ( lambda*(k+1)G, lambda*(k-1)G ),  ( lambda*(k+2)G, lambda*(k-2)G ), ....,
# (lambda*(k+2048)G, lambda*(k-2048)G)
#this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch
#each element of this list is a couple of x -> 2 scalars: x_lambda*(k+m)G, x_lambda*(k-m)G
#this list has 2049 elements




batch3=[(0,0,0,0)] * 2049 #this is the batch3 with lambda^2*kG, ( lambda^2*(k+1)G, lambda^2*(k-1)G ),  ( lambda^2*(k+2)G, lambda^2*(k-2)G ), ....,
# (lambda^2*(k+2048)G, lambda^2*(k-2048)G)
#this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch
#each element of this list is a couple of x -> 2 scalars: x_lambda^2*(k+m)G, x_lambda^2*(k-m)G
#this list has 2049 elements


for i in range(0,2049): #endomorphism  
batch2[i] = ((batch[i][0]*beta  % p), batch[i][1], (batch[i][2]*beta % p), batch[i][3]) #only x coordinates, the first and the third scalar of each element of the main batch
#private key: lambda*k mod n,  coordinate x: beta*x mod p



batch3[i] = ((batch[i][0]*beta2 % p), batch[i][1], (batch[i][2]*beta2 % p), batch[i][3]) #only x coordinates, the first and the third scalar of each element of the main batch
#private key: lambda^2*k mod n,  coordinate x: beta^2*x mod p

      
addresses2 = list(map(pub_to_addr, batch2))
print (*addresses2, sep="\n")
addresses3 = list(map(pub_to_addr, batch3))
print (*addresses3, sep="\n")
        



#k in this case is the id of the last batch (or if you prefer, k is the private key of the middle point of the last batch)
"""
m=465 #a random number between 0 and 2048, you can pick up a couple of points in the last main batch (or in batch2 / batch3)
(kmx1,kmy1,kmx2,kmy2)=batch[m]
(kmx1,kmx2)=batch3[m] #in this case the points chosen are in batch3
#the y are from main batch, because batch3 (and batch2) have only x coordinates

print ('kG+mG')
print (hex(lambd2*(k+m)%n)) #private key
print (hex(kmx1), hex(kmy1)) #public key (first and second coordinate)
print ('*******')


print ('kG-mG')
print (hex(lambd2*(k-m)%n)) #private key
print (hex(kmx2), hex(kmy2)) #public key (first and second coordinate)
print ('*******')
"""

"""
print (' ')
a=number_of_keys//1000000
print ('You have just generated ' + str(a) + ' Millions of keys!!!')
"""
hero member
Activity: 630
Merit: 731
Bitcoin g33k
I made quick comparison by editing lowest_key=1000000 and number_of_batches=82 ...
what was the reason therefore? lowest_key is just the number to start the consecutive run IMHO. And I doubt that  you generated 1,007,862 MILLION (!) in 1.6 seconds Smiley
You have just generated 1007862 Millions of keys!!!

real    0m1.626s
With number_of_batches = 82 you should have been generated 1,007,862 keys (which is about 1 million) which the result for of 1.6 is realistic. You could replace the very last line of arulsbero's code in gen_batches.py to get a more clear output:
Code:
#print ('You have just generated ' + str(a) + ' Millions of keys!!!')
print(f'You have just generated {f"{number_of_keys:,.0f}"} keys ({str(a)} MKeys)', end='\n')

We should take into consideration that arulbero's way does not generate the corresponding addresses and not saving them into a file, we need to rewrite the code and see how it behaves.

@arulbero: thanks for sharing, sounds interesting. I need to dig into it, unfortunately I have to go right now but I certainly will check your code later when back at home. I need to think about the consecutive vs. random part and make some comparisons each other. Currently when I run your code it says it generated 1 million keys in average of 1.22 seconds. I just need to test if this speed is realistisc. Can you modify your code so it will generate an address and output all the generated addresses to a file called address.out ?
legendary
Activity: 1932
Merit: 2077

Quote
real   0m4,768s
user   0m54,444s
sys   0m1,894s

Perfect! Now I'm under 5 seconds for 1 million addresses. That's a rate of roughly 208,000 addresses/sec
How can we further improve the performance without using GPU? Is there anything else you can suggest, that will speed up the process?
Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.

The next step would be ==> how can we shuffle this code to the GPU for ultra-fast performance ?

Long time ago I wrote a python script that computes 16.4 million of consecutive (non random) public keys (not addresses) in 12.3 s.


No numpy, pure python, only 1 thread.

ecc.py
Code:

#see http://www.secg.org/sec2-v2.pdf at page 9


#Gx=55066263022277343669578718895168534326250603453777594175500187360389116729240
#Gy=32670510020758816978083085130507043184471273380659243275938904335757337482424
#p=115792089237316195423570985008687907853269984665640564039457584007908834671663
#n=115792089237316195423570985008687907852837564279074904382605163141518161494337

Gx=0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy=0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F # Fp
n=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141 # E(Fp)

#endomorphism
lambd=0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72 # (lambda^3 = 1 mod n)    
lambd2=0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce #(lambda^2)

beta=0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee # (beta^3 = 1 mod p)
beta2=0x851695d49a83f8ef919bb86153cbcb16630fb68aed0a766a3ec693d68e6afa40 # (beta^2)
############################################################################################
###Field operations

#a must be a number between 1 and p-1;    input: a  output: 1/a
#see http://www.springer.com/?SGWID=4-102-45-110359-0 page 40 alg. 2.20
def inv(a,p):
u, v = a%p, p
x1, x2 = 1, 0
while u != 1:
#q = v//u
#r = v-q*u
q, r = divmod(v,u)
x = x2-q*x1
v = u
u = r
x2 = x1
x1 = x
return x1%p


#inverse of a batch of x
#input: batch of x : [x1,x2,x3,x4,x5,x6,x7,x8,...,x2048]
#x is known (x of kG)
#output: batch of inverse: 1/(x1-x), 1/(x2-x), 1/(x3-x), ....
#3M for each inverse
def inv_batch(x,batchx,p):

a = (batchx[1]-x) % p
partial= [0]*2049
partial[1]=a

for i in range(2,2049):
a = (a*(batchx[i]-x)) % p #2047M
partial[i]=a

inverse=inv(partial[2048],p) # 1I
batch_inverse=[0]*2049

for i in range(2048,1,-1):
batch_inverse[i-1]=(partial[i-1]*inverse) % p #2047M
inverse=(inverse*(batchx[i]-x)) %p #2047M

batch_inverse[0]=inverse

return batch_inverse

############################################################################################
##Group operations

#  https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Point_addition
#'double' addition
#add 2 points P and Q -->  P+Q
#add 2 points P and Q -->  P-Q
#
#(X1,Y1) + (X2,Y2)  -->  (X3,Y3)  (the inverse of (x2-x1) must be known)
#(X1,Y1) + (X2,-Y2) -->  (X4,Y4)  (the inverse of (x2-x1) must be known)
#
#input: (X1,Y1), (X2,Y2), the inverse of (X2 - X1): 5 scalars
#output: (X3,Y3), (X4,Y4) :  4 scalars
#4M + 2S

def double_add_P_Q_inv(x1, y1, x2, y2, invx2x1):

dy = (y2-y1) #% p
a = dy*invx2x1 % p #1M
a2 = a**2     #1S
x3 = (a2 - x1 -x2) % p
y3 = (a*(x1-x3) -y1) % p #1M


dy = p-(y2+y1) #% p # only the sign of y2 changes, "y of -Q = -y of +Q"
a = dy*invx2x1 % p #1M  x2 and then the inverse invx2x1 are the same
a2 = a**2   #1S
x4 = (a2 - x1 -x2) % p
y4 = (a*(x1-x4) -y1) % p #1M

return x3, y3, x4, y4


#https://en.wikibooks.org/wiki/Cryptography/Prime_Curve/Jacobian_Coordinates
#jacobian coordinates
def add_j_j(jax, jay, jaz, jbx, jby, jbz):

u1 = (jax*jbz**2) % p
u2 = (jbx*jaz**2) % p
s1 = (jay*jbz**3) % p
s2 = (jby*jaz**3) % p
h = (u2-u1) % p
r = (s2-s1) % p
jcx = (r**2 -h**3-2*u1*h**2) % p
jcy = (r*(u1*h**2-jcx)-s1*h**3) % p
jcz = (h*jaz*jbz) % p
return jcx, jcy, jcz

def double_j_j(jx, jy, jz):
s = (4*jx*jy**2) % p
m = (3*jx**2) % p
jx2 = (m**2 - 2*s) % p
jy2 = (m*(s-jx2) - 8*jy**4) % p
jz2 = (2*jy*jz) % p
return jx2, jy2, jz2


def mul(k,jx,jy,jz):

jkx, jky, jkz = 0, 0, 1
if (k%2 == 1):
jkx, jky, jkz = jx, jy, jz #only if d is odd
q=k//2
while q>0:
jx, jy, jz = double_j_j(jx,jy,jz)
a=q%2
if (a > jkx):
jkx, jky, jkz = jx, jy, jz  #only if d is even
elif (a):
jkx, jky, jkz = add_j_j(jx, jy, jz, jkx, jky, jkz)  
q=q//2
                                 
return jkx, jky, jkz

############################################################################################
#Coordinates

#from jacobian to affine
def jac_to_aff(jax, jay, jaz, invjaz):

ax, ay = (jax*invjaz**2) % p, (jay*invjaz**3) % p

return ax, ay

gen_batches.py
Code:
#!/usr/bin/env python

#this script computes 3x1334 batchs of 4097 points (1 single point + 2048 couples of points -> (kG+mG, kG-mg))
#16,4 millions of points
#3 : 1 batch + 2 endomorphism


from ecc import *

########################################################################
#in this section the script pre-computes 1G, 2G, 3G, ..., 2048G
mG = [(0,0)]*2049
mGx = [0]*2049
mGy = [0]*2049

mGx[1]=Gx
mGy[1]=Gy
mG[1]=(Gx,Gy) #list of multiples of G,
 #you have to precompute and store these multiples somewhere

mGx[2]=0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5
mGy[2]=0x1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A
mG[2]=(mGx[2],mGy[2])

for i in range(3,2049):

jx,jy,jz=add_j_j(Gx, Gy, 1, mGx[i-1], mGy[i-1], 1) #this is not a efficient way, but it is not important
jzinv = inv(jz,p)
(x,y) = jac_to_aff(jx, jy, jz, jzinv)
mG[i] = (x,y)
mGx[i] = x
mGy[i] = y

###########################################################################
lowest_key=2**55+789079076 #put here the lowest key you want to generate
number_of_batches=1334 #put here how many consecutive batches you want to generate
number_of_keys=4097*3*number_of_batches #this is the number of consecutive keys you are going to generate
#this number is always a multiple of 3*4097=12291, because 12291 is the minimum size
#remember: you generate always 3 batches at once, 1 in (1,2^160) and 2 out of this range
#their name are:  batch (the main), batch2, batch3 (that are derived from the main with endomorphism)


id_batch=lowest_key+2048 #id_batch is now the private key of the middle point of the first batch that you want compute
distance_between_successive_batches=4097 #this is the distance from the middle point of the first batch and the middle point of the successive batch in (1,2^160)
#4097 means that I'm generating only consecutive batches in (1,2^160), this is the default


start=id_batch #the first key to generate of the the first batch is the lowest key + 2048 --> the middle point of the first batch
stop=id_batch + number_of_batches*distance_between_successive_batches #the key of the middle point of the last batch


#k is always the id of the current main batch and the key of the middle point of the current main batch
for k in range(start,stop,distance_between_successive_batches):

jkx,jky,jkz = mul(k,Gx,Gy,1) #here we use the slow function mul to generate the middle point
invjkz = inv(jkz,p)
(kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz)

kminverse = [0]*2048
kminverse = inv_batch(kx,mGx,p) #you need to compute 2048 inverse, 1 for each couple
#to perform this task you need only the x coordinates of the multiples of G and the x coordinate of kG: kx
kminverse = [invjkz]+kminverse
#the element number 0 of the list is the inverse invjkz that you have computed to get kG=(kx,ky)
#the element number 1 of the list is the inverse necessary to perform kG + 1G (and kG-1G) --->  1/(x_of_1G - kx)
#the element number 2 of the list is the inverse necessary to perform kG + 2G (and kG-2G) --->  1/(x_of_2G - kx)
#...
#the element number m of the list is the inverse necessary to perform kG + mG (and kG-mG) --->  1/(x_of_mG - kx)
#then the name: "kminverse", the inverse to get the generic kG+mG
#...
#the element number 2048 of the list is the inverse necessary to perform kG+2048G and kG-2048G ---> 1/(x_of_2048G - kx)

kxl = [kx] * 2049 #this is a list of kx, all elements are the same, only to use the function map of python
kyl = [ky] * 2049   #this is a list of ky, all elements are the same, only to use the function map of python

batch=[(0,0,0,0)]*2048
#the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G)  
#the element number 1 will be the couple (k+1)G, (k-1)G
#the element number 2 will be the couple (k+2)G, (k-2)G
#...
#the element number 2048 will be the couple (k+2048)G, (k-2048)G
#each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G,
#but the first element is only a fake couple
batch = list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:]))
#the function double_add_P_Q_inv computes kG +/-mG, it uses the list of the inverse and return a list of couple of points
#each element of this list is a couple of points, 4 scalars, x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G
#this list has 2049 elements

batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G

batch2=[0] * 2049 #this is the batch2 with lambda*kG, ( lambda*(k+1)G, lambda*(k-1)G ),  ( lambda*(k+2)G, lambda*(k-2)G ), ....,
# (lambda*(k+2048)G, lambda*(k-2048)G)
#this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch
#each element of this list is a couple of x -> 2 scalars: x_lambda*(k+m)G, x_lambda*(k-m)G
#this list has 2049 elements


batch3=[0] * 2049 #this is the batch3 with lambda^2*kG, ( lambda^2*(k+1)G, lambda^2*(k-1)G ),  ( lambda^2*(k+2)G, lambda^2*(k-2)G ), ....,
# (lambda^2*(k+2048)G, lambda^2*(k-2048)G)
#this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch
#each element of this list is a couple of x -> 2 scalars: x_lambda^2*(k+m)G, x_lambda^2*(k-m)G
#this list has 2049 elements


for i in range(0,2049): #endomorphism  
batch2[i] = [(batch[i][0]*beta  % p), (batch[i][2]*beta % p)] #only x coordinates, the first and the third scalar of each element of the main batch
#private key: lambda*k mod n,  coordinate x: beta*x mod p
batch3[i] = [(batch[i][0]*beta2 % p), (batch[i][2]*beta2 % p)] #only x coordinates, the first and the third scalar of each element of the main batch
#private key: lambda^2*k mod n,  coordinate x: beta^2*x mod p


#k in this case is the id of the last batch (or if you prefer, k is the private key of the middle point of the last batch)
m=465 #a random number between 0 and 2048, you can pick up a couple of points in the last main batch (or in batch2 / batch3)
(kmx1,kmy1,kmx2,kmy2)=batch[m]
(kmx1,kmx2)=batch3[m] #in this case the points chosen are in batch3
#the y are from main batch, because batch3 (and batch2) have only x coordinates

print ('kG+mG')
print (hex(lambd2*(k+m)%n)) #private key
print (hex(kmx1), hex(kmy1)) #public key (first and second coordinate)
print ('*******')


print ('kG-mG')
print (hex(lambd2*(k-m)%n)) #private key
print (hex(kmx2), hex(kmy2)) #public key (first and second coordinate)
print ('*******')

print (' ')
a=number_of_keys//1000000
print ('You have just generated ' + str(a) + ' Millions of keys!!!')

Code:
time python3 gen_batches.py 
 
You have just generated 16.4 Millions of keys!!!

real 0m12,303s
user 0m12,302s
sys 0m0,000s

Generating consecutive public keys is way faster than generate random public keys.
hero member
Activity: 630
Merit: 731
Bitcoin g33k
next step is to shift the load to GPU with pycuda, numba, jit or similar. To use the wrapper PyCUDA we need to modify the code to offload the address generation and private key conversion tasks to the GPU. This can be done by defining GPU kernels that perform these tasks and then using PyCUDA's DeviceAllocation and Kernel functions to run these kernels on the GPU. Unfortunately I was unsucessful with that task.
Pages:
Jump to: