
Topic: Bitcoin puzzle transaction ~32 BTC prize to who solves it - page 87. (Read 230098 times)

full member
Activity: 1162
Merit: 237
Shooters Shoot...
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look



Found in less than a second

Are you the keyhunt creator?
Nice to meet you.
It was my mistake; I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.

I have no clue how or what you ran, but with a single CPU core, using KeyHunt-Cuda, the key is found pretty much as soon as the program starts:

Code:
KeyHunt-Cuda v1.08

COMP MODE    : COMPRESSED
COIN TYPE    : BITCOIN
SEARCH MODE  : Single Address
DEVICE       : CPU
CPU THREAD   : 1
SSE          : YES
RKEY         : 0 Mkeys
MAX FOUND    : 65536
BTC ADDRESS  : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE  : Found.txt

Start Time   : Sat May 25 10:11:49 2024
Global start : 20000000000000000 (66 bit)
Global end   : 21000000000000000 (66 bit)
Global range : 1000000000000000 (61 bit)


[00:00:02] [CPU+GPU: 5.28 Mk/s] [GPU: 0.00 Mk/s] [C: 0.000000 %] [R: 0] [T: 10,706,944 (24 bit)] [F: 1]

BYE

So maybe you entered the wrong options/flags when running the program.
hero member
Activity: 1736
Merit: 857

I get ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler.


Is this the speed at which public keys are checked in the hash table (using a bloom filter?), or is this the real speed at which the processor generates public keys?
Quite a high speed even for BSGS on a video card
member
Activity: 499
Merit: 38


I've tried
AMD Ryzen 9 7950X - 4.5 GHz, 16c/32th + 128 GB - 4 Ekeys/s at start, reaches 6 Ekeys/s after a week


I get ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler.


I've been tweaking Linux for months to squeeze every last bit of performance out of a dual-CPU configuration.

#RT kernel
Code:
wget -qO - https://dl.xanmod.org/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/xanmod-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/xanmod-archive-keyring.gpg] http://deb.xanmod.org releases main' | sudo tee /etc/apt/sources.list.d/xanmod-release.list
sudo apt-get -y update && sudo apt install linux-xanmod-rt-x64v3

Code:
sudo apt install -y tuned tuned-utils tuned-utils-systemtap
sudo tuned-adm profile latency-performance

Code:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# should print "performance" for every core

/etc/default/grub
Code:
quiet msr.allow_writes=on nosoftlockup mce=ignore_ce skew_tick=1 clocksource=hpet iommu=soft noresume mitigations=off nmi_watchdog=0


NVMe config (the NVMe drive needs a good heatsink, otherwise it goes over 50 C; I have had white smoke from them more than once. Drive: SAMSUNG MZVL2512HCJQ.)

Code:
sudo nvme smart-log /dev/nvme0 | grep -i '^temperature'
temperature            : 42 C
Temperature Sensor 1           : 42 C
Temperature Sensor 2           : 50 C


/etc/fstab
Code:
ext4 noatime,nodiratime,errors=remount-ro,inode_readahead_blks=0 0 1

AMD EPYC config
Code:
wrmsr -a 0xc0011020 0x4400000000000
wrmsr -a 0xc0011021 0x4000000000040
wrmsr -a 0xc0011022 0x8680000401570000
wrmsr -a 0xc001102b 0x2040cc10

And so on. This is only part of it.

member
Activity: 499
Merit: 38

AVX512 and so on maybe can bring a 50% speed-up, nothing to write home about.

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that on CPUs with hardware support for SHA instructions, the SHA routines available from the system APIs already use them, and we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd


I achieved a 20% performance increase in Keyhunt on Zen3 architecture compared to GCC versions 12, 13, and 14.
To compile with Clang, I used the AOCC compiler located at /opt/AMD/aocc-compiler-4.2.0/bin/clang.

However, it was essential to remove all Intel intrinsics (__builtin_ia32) from the code since these intrinsics are specific to Intel processors and incompatible with AMD processors.

In my case, I need to rewrite both the SHA and RIPEMD implementations for Zen3 to achieve a significant performance boost.

Imagine achieving a 70% performance increase!

Additionally, optimizing for Zen4 by leveraging its specific architectural features can lead to even greater efficiency gains.
member
Activity: 165
Merit: 26
Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

Nothing faster and more regularly updated is available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.

Code:
    while (1) {
#if defined(USE_CUSTOM_SHA256)
        sha256_init(&s256ctx);
        sha256_update(&s256ctx, compressed_pubkey, 33);
        sha256_final(&s256ctx, sha256hash);
#else
#if defined(__APPLE__) && defined(USE_CC_SHA)
        CC_SHA256(compressed_pubkey, 33, sha256hash);
#else
        SHA256(compressed_pubkey, 33, sha256hash);
#endif
#endif

//        RIPEMD160(sha256hash, 32, rmd_hash);

        ++count;
        if (count % (1 << 26) == 0) {
            ticks = clock();
            speed = count * CLOCKS_PER_SEC / (ticks - start);
            printf("SHA hashes: %" PRIu64 " speed: %" PRIu64 " hashes/s\n", count, (uint64_t) speed);
        }
    }

SHA hashes: 134217728 speed: 7485947 hashes/s


Code:
SHA256_Init(&shaCtx);
while (1) {
    //...
    SHA256_Update(&shaCtx, compressed_pubkey, 33);

    ++count;
    if (count % (1 << 26) == 0) {
        ticks = clock();
        speed = count * CLOCKS_PER_SEC / (ticks - start);
        printf("Hashed bytes: %" PRIu64 " speed: %" PRIu64 " MB/s\n", count * 33, (uint64_t) (speed * 33) >> 20);
    }
}

Hashed bytes: 22145925120 speed: 1712 MB/s


So 1.7 GB/s with your everyday SHA hasher is not bad; what's bad is that it is doing a single hash of a 22 GB message, not hundreds of millions of hashes of 33 bytes each.
In our case, the hash context needs to be reinitialized for every public key we hash, so AVX512 and the like can maybe bring a 50% speed-up, nothing to write home about.
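
The difference between the two benchmarks can be reproduced in miniature. The sketch below uses Python's standard hashlib to contrast one fresh SHA-256 context per 33-byte key with streaming the same bytes into a single context. The function names and the dummy key are made up for illustration, and interpreter overhead dominates the absolute numbers, so only the shape of the comparison means anything. A 33-byte message plus SHA-256 padding still fills one whole 64-byte block, so the per-key case pays one compression per 33 bytes, while the streaming case pays one per 64 bytes.

```python
import hashlib
import time


def hash_each(pubkey: bytes, n: int) -> bytes:
    """Hash the key n times, with a fresh SHA-256 context every time."""
    digest = b""
    for _ in range(n):
        digest = hashlib.sha256(pubkey).digest()
    return digest


def hash_stream(pubkey: bytes, n: int) -> bytes:
    """Feed the key n times into ONE context: a single hash of n*33 bytes."""
    ctx = hashlib.sha256()
    for _ in range(n):
        ctx.update(pubkey)
    return ctx.digest()


def main() -> None:
    pubkey = bytes([0x02]) + bytes(32)  # dummy 33-byte "compressed pubkey"
    n = 200_000
    for fn in (hash_each, hash_stream):
        t0 = time.perf_counter()
        fn(pubkey, n)
        dt = time.perf_counter() - t0
        print(f"{fn.__name__}: {n / dt:,.0f} updates/s")


if __name__ == "__main__":
    main()
```

Only hash_each corresponds to what an address search has to do; hash_stream is the "22 GB message" case.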

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that on CPUs with hardware support for SHA instructions, the SHA routines available from the system APIs already use them, and we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd
newbie
Activity: 15
Merit: 0

Are you the keyhunt creator?
Nice to meet you.


Hi, yes, I developed the CPU version; all the others are copies. If you have any questions about it, please use the following topic: https://bitcointalksearch.org/topic/keyhunt-development-requests-bug-reports-5322040

Of course, and you are the main programmer and developer.
Nice to meet you.
I have also written various programs for this work; I will soon post a complete version on GitHub.

Do you know how to search randomly in KeyHunt-Cuda version 2?

In version 1, the -r option with a key-space value gave a sequential search combined with random keys, like:
KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so -r 2000
member
Activity: 499
Merit: 38
Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

Nothing faster and more regularly updated is available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.
hero member
Activity: 862
Merit: 662

Are you the keyhunt creator?
Nice to meet you.


Hi, yes, I developed the CPU version; all the others are copies. If you have any questions about it, please use the following topic: https://bitcointalksearch.org/topic/keyhunt-development-requests-bug-reports-5322040
newbie
Activity: 15
Merit: 0
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look

https://talkimg.com/images/2024/05/21/1iVIH.png

Found in less than a second

Are you the keyhunt creator?
Nice to meet you.
It was my mistake; I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.
newbie
Activity: 24
Merit: 0
After talking with a mathematician friend, it turns out I was clearly not wrong. That's a relief.

And I could reduce the search to 1/10 of the pool, which is great but not enough, because the error bar still spans 15 orders of magnitude.
Good, but not enough for me.


After searching a little more, neither my assumptions nor my techniques are actually far-fetched.
Some papers have already done something close, but I need to think more and have a dedicated PC for that to neutralize the noise.

Interesting how it reminds me of my time in the spectroscopy lab:
https://imgur.com/a/NTlgYpp

And that's pretty cool!
hero member
Activity: 862
Merit: 662
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look



Found in less than a second
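
The sizes involved are easy to check. This is only a sketch of the arithmetic, in Python; it does not model how keyhunt actually hands subranges to threads, and the helper names are made up for illustration. A range of 1 million keys needs only 20 bits, so a single default 32-bit subrange is far larger than the whole range, while 0x10000 (16 bits) splits it into 16 pieces.

```python
def bits_needed(n_keys: int) -> int:
    """Smallest b such that 2**b >= n_keys."""
    return (n_keys - 1).bit_length()


def subranges(n_keys: int, subrange_size: int) -> int:
    """Number of subranges of the given size needed to cover n_keys."""
    return -(-n_keys // subrange_size)  # ceiling division


RANGE_KEYS = 1_000_000
DEFAULT_SUBRANGE = 1 << 32   # 32-bit default mentioned in the post
SMALL_SUBRANGE = 0x10000     # value passed with -n in the post

print(bits_needed(RANGE_KEYS))                  # 20
print(subranges(RANGE_KEYS, DEFAULT_SUBRANGE))  # 1
print(subranges(RANGE_KEYS, SMALL_SUBRANGE))    # 16
```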
newbie
Activity: 12
Merit: 1
keyhunt not good

I ran a test with keyhunt
..........................................



KEYHUNT in BSGS mode works well

member
Activity: 165
Merit: 26

Is it non-pointless even when written in assembler?

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck.
That was my point; read the question carefully.

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

But if we only operate on EC then it's another story. This is where kangaroo / rho / BSGS come into play, and we can start optimizing the operations that are the bottleneck at THAT level. It's one thing to do a few thousand mul/s in Python and another to do batched additions (with a single inversion) on a GPU, to reach giga adds/second. Millions of times faster for identical results. And of course it's still not enough even at that level, so...

Something to think about on 256-bit modular math:
- a single inversion is ~60 times slower than a multiplication (GCD algorithm, not a simple operation at all)
- a multiplication is ~2 times slower than a squaring
- a squaring is ~3 times slower than an addition, and ~2 times slower than a normalization

A simple point addition requires one inversion, 2 multiplications, a squaring, 2 normalizations, and 6 additions.
A simple point multiplication requires a bunch of point additions (in truth, it's somewhat faster because of the Jacobian shortcut, but still a lot of multiplications).

So stuff like doing point-scalar multiplication is a bad joke compared to the problem of optimizing the primitives.
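
The "batched additions with a single inversion" idea is commonly done with Montgomery's trick, which trades n inversions for one inversion plus about 3(n-1) multiplications. Below is a minimal Python sketch of that trick over the secp256k1 field prime. It is an illustration only, not the code of any tool discussed here; the function names are made up, and the cost helper simply plugs in the ~60x inversion/multiplication ratio quoted above.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime


def batch_inverse(values, p=P):
    """Invert every element mod p using a single modular inversion."""
    n = len(values)
    prefix = [1] * n
    acc = 1
    for i, v in enumerate(values):
        prefix[i] = acc            # product of values[0..i-1]
        acc = acc * v % p
    inv = pow(acc, -1, p)          # the only inversion (Python 3.8+)
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = inv * prefix[i] % p   # 1 / values[i]
        inv = inv * values[i] % p      # now 1 / (values[0] * ... * values[i-1])
    return out


def mults_per_inverse(n, inv_cost=60):
    """Average cost per inverse, in multiplications, for a batch of size n."""
    return (3 * (n - 1) + inv_cost) / n


vals = [3, 7, 12345, P - 1]
invs = batch_inverse(vals)
print(all(v * i % P == 1 for v, i in zip(vals, invs)))
print(mults_per_inverse(1), mults_per_inverse(1024))
```

With a large batch, the per-point cost of the inversion drops from ~60 multiplications to roughly 3, which is why batching matters so much for affine additions.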
member
Activity: 499
Merit: 38

Is it non-pointless even when written in assembler?


The necessary double SHA-256 hashing in Bitcoin's address generation process is substantially slower than EC operations, ensuring that even perfect optimization of point additions will not eliminate the hashing bottleneck.

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck.
newbie
Activity: 15
Merit: 0


I have a simple algorithm for CPU in Python; you can test it.


Code:

import bitcoin
import ecdsa



def private_key_to_public_key(private_key):
    sk = ecdsa.SigningKey.from_string(bytes.fromhex(private_key), curve=ecdsa.SECP256k1)
    vk = sk.get_verifying_key()
    compressed_public_key = vk.to_string("compressed").hex()
    return compressed_public_key



        bitcoin_address = bitcoin.pubtoaddr(public_key)




Why use Bitcoin and ECDSA imports? They're so slow, it feels like a waste of time.

Instead, utilize ICE (import secp256k1 as ice) for this function and the Bitcoin address line:


def private_key_to_public_key(private_key):
    priv_int = int(private_key, 16)
    return ice.scalar_multiplication(priv_int)

and

bitcoin_address = ice.pubkey_to_address(0, True, public_key)


It's approximately 10 times faster than ECDSA. But even that is miserable if you attack the dinosaur numbers.

The more you delve into Python, the more apparent it becomes that searching for Puzzle 66 through it is pointless.

Perhaps someone knowingly obscures things by selling Python scripts as the ultimate solution.






In fact, my comment was about the problem with KeyHunt; I have written different versions of these programs. I posted this code so that you can test it and compare the speed of a simple CPU program against a powerful GPU program that does not work properly.

In general, I am very happy that you paid attention to this code and took the time, and I thank you. ice's idea was also very good.
member
Activity: 499
Merit: 38
It takes a thousand GPUs to make something serious. Everything else is just kidding

Why GPU? BSGS works on CPU.

btw, what's more important RAM or CPU?

I've tried
AMD Ryzen 9 7950X - 4.5 GHz, 16c/32th + 128 GB - 4 Ekeys/s at start, reaches 6 Ekeys/s after a week
Intel Core i7-11700K 3.6 GHz, 8c/16th + 128 GB - only 2 Ekeys/s, no growth
Intel Core i7-11700K 3.6 GHz, 8c/16th + 64 GB - around 1 Ekeys/s, no growth

E5-2670v3 2.3 GHz 12c/24th + 256 GB - 7 Ekeys/s at start

Why RAM ?

How much storage is required for a hash table for a search space of Puzzle 130?

Each entry in the hash table will contain a key (of 256 bits) and a pointer (of 80 bits) to the corresponding data.
So, each entry requires 256 bits + 80 bits = 336 bits.

Total bits required for hash table = 336 bits per entry * 2^130 entries
Total bits = 336 * 2^130

To convert bits to petabytes:
1 petabyte (PB) = 8 * 10^15 bits

So, total petabytes (PB) = (336 * 2^130) / (8 * 10^15)

Total petabytes (PB) ≈ 5.7 * 10^25 PB of RAM (at 42 bytes per entry)

It is not possible to get such an amount of RAM in the next 80 years.
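
The expression above can be evaluated directly. The short Python sketch below does exactly that arithmetic with the figures from the post (256-bit key, 80-bit pointer, 2^130 entries); `table_size_pb` is a name made up for illustration.

```python
ENTRY_BITS = 256 + 80        # key + pointer, as in the post (336 bits = 42 bytes)
BITS_PER_PB = 8 * 10**15     # 1 PB = 10^15 bytes


def table_size_pb(entries: int, entry_bits: int = ENTRY_BITS) -> float:
    """Size in petabytes of a table with the given number of entries."""
    return entries * entry_bits / BITS_PER_PB


print(f"{table_size_pb(2**130):.3e} PB")
```

Other table sizes can be tried by changing the argument.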
member
Activity: 165
Merit: 26
Why use Bitcoin and ECDSA imports? They're so slow, it feels like a waste of time.

Instead, utilize ICE (import secp256k1 as ice) for this function and the Bitcoin address line:

def private_key_to_public_key(private_key):
    priv_int = int(private_key, 16)
    return ice.scalar_multiplication(priv_int)

and

bitcoin_address = ice.pubkey_to_address(0, True, public_key)


It's approximately 10 times faster than ECDSA. But even that is miserable if you attack the dinosaur numbers.

The more you delve into Python, the more apparent it becomes that searching for Puzzle 66 through it is pointless.
Is it non-pointless even when written in assembler? Some speed updates for bragging rights: 11 million point additions/s (affine coordinates). There is a trick that even JLP's VanitySearch is missing, which allows some optimization of batch additions (8.5 Mkeys/s with that one). It's all about the branch processing.

Quote
https://github.com/iceland2k14/secp256k1

On my old Laptop with i7 4810 MQ CPU
With 3500000 continuous keys in 1 group call, we get 3.5 Million Key/s Speed with 1 cpu:
This is very misleading. We only get the final result key, not 3,500,000 keys, simply because it does 3,500,000 Jacobian additions and a single final affine conversion. I don't even need to disassemble the DLL to be 100% sure about this, because the fastest known algorithm for a modular inversion on 256-bit numbers requires 4000 CPU cycles on my 13th-gen i9 CPU, which means it can never do more than around 1 million inversions per second. But Jacobian point additions? 8 million. Those intermediate points are useless, though, since they have no invariant characteristic unless you actually reduce the fractions they hold (so, the expensive modular inverse).
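
The cycle-count argument is plain division. Here is a small Python sketch, assuming a 5 GHz clock (my assumption; the post only gives 4000 cycles per inversion) and made-up helper names:

```python
def max_ops_per_second(clock_hz: float, cycles_per_op: float) -> float:
    """Upper bound on operations per second for one core."""
    return clock_hz / cycles_per_op


def required_clock_hz(ops_per_second: float, cycles_per_op: float) -> float:
    """Clock needed to sustain the given rate at the given cost per operation."""
    return ops_per_second * cycles_per_op


CLOCK_HZ = 5.0e9          # assumed clock speed, not stated in the post
CYCLES_PER_INVERSION = 4000

print(max_ops_per_second(CLOCK_HZ, CYCLES_PER_INVERSION))   # inversions/s
print(required_clock_hz(3_500_000, CYCLES_PER_INVERSION))   # Hz for 3.5M/s
```

At one inversion per key, 3.5 million affine keys per second on a single core would need a clock in the tens of GHz, which is the point being made.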

Anyway, all this is completely irrelevant for puzzle 66, even if magically we can do an infinite amount of additions / second, it has zero impact on the speed, because all the hashing required is many times slower than any EC operation.
member
Activity: 499
Merit: 38


I have a simple algorithm for CPU in Python; you can test it.


Code:

import bitcoin
import ecdsa



def private_key_to_public_key(private_key):
    sk = ecdsa.SigningKey.from_string(bytes.fromhex(private_key), curve=ecdsa.SECP256k1)
    vk = sk.get_verifying_key()
    compressed_public_key = vk.to_string("compressed").hex()
    return compressed_public_key



        bitcoin_address = bitcoin.pubtoaddr(public_key)




Why use Bitcoin and ECDSA imports? They're so slow, it feels like a waste of time.

Instead, utilize ICE (import secp256k1 as ice) for this function and the Bitcoin address line:


def private_key_to_public_key(private_key):
    priv_int = int(private_key, 16)
    return ice.scalar_multiplication(priv_int)

and

bitcoin_address = ice.pubkey_to_address(0, True, public_key)


It's approximately 10 times faster than ECDSA. But even that is miserable if you attack the dinosaur numbers.

The more you delve into Python, the more apparent it becomes that searching for Puzzle 66 through it is pointless.

Perhaps someone knowingly obscures things by selling Python scripts as the ultimate solution.

newbie
Activity: 15
Merit: 0
keyhunt not good

I ran a test with keyhunt.


You can test it. KeyHunt speed on my PC is 60 Mk/s.


I searched for this address: 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
Its private key is 200000000000f4240

The search starts at 20000000000000000


This private key is 1,000,000 keys from the start.

So keyhunt should find it within 2 seconds, but it found it after 100 seconds.
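
The numbers in this test are quick to verify. The Python sketch below checks the offset of the key from the start of the range and the time a purely sequential scan would need at the stated 60 Mk/s. It assumes one scan that starts exactly at the range start and ignores program startup, which is not necessarily how the tool divides the work; the variable names are made up for illustration.

```python
START_HEX = "20000000000000000"
KEY_HEX = "200000000000f4240"
KEYS_PER_SECOND = 60_000_000  # 60 Mk/s, as stated in the post

offset = int(KEY_HEX, 16) - int(START_HEX, 16)   # 0xf4240
seconds = offset / KEYS_PER_SECOND

print(offset)                 # 1000000
print(f"{seconds:.4f} s")     # ideal sequential scan time
```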

In CUDA programming, an algorithm is executed in parallel: each core performs a similar search task in parallel, and the cores do not help each other to speed up the search; each of them works as an island, in exactly the same way. They end up doing something other than what is needed, and that is why this program has problems.

I have a simple algorithm for CPU in Python; you can test it.


Code:
# -*- coding: utf-8 -*-

"""
Created on Thu Mar 14 17:50:54 2024

1000 Bitcoin Puzzle Scanner for 2^65 ~ 2^66

@author: Amin Solhi , Contacts =>  email: [email protected] , +9891111842779
"""
import bitcoin
import ecdsa
import secrets
from timeit import default_timer as timer   
import datetime

# Module-level configuration and state; the functions below access these via `global`.


target_address = "13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so"
output_file = "data.txt"
rng = 20 # int(input("Enter Random Space Number 1 ~ 100 :"))
private_key="20000000000000000"
ks=0
start = timer()
random_mod = True # True or False

print ("\nBTCGEN Bitcoin Puzzle #66 Scanner \n")
print ("BTC Address : ",target_address)
print ("OutPut File : ",output_file)
print ("Random Mode : ",f"{str(random_mod)}")
if (random_mod):
    print ("Random Key  : ",f'per {rng}K key')
print ("Device      :  CPU")
print ("Global Start: ",private_key)
print ("Global END  :  40000000000000000")

print('\n')



def remove_zeros(input_string):
    # Strip leading zeros from a hex string (used for display only).
    return input_string.lstrip("0")


t=""
t +="0"*47

def h(a):
    # Left-pad the hex string with zeros to 64 characters (a 32-byte key).
    return a.zfill(64)

def generate_random_priv():
    p=str(secrets.choice(range(2, 4)))
    return (p+secrets.token_hex(8))


def generate_private_key(num_hex):
    num_decimal = int(num_hex, 16)
    num_decimal += 1
    num_hex = h(str(f'{num_decimal:x}'))
    return (num_hex)

def private_key_to_public_key(private_key):
    sk = ecdsa.SigningKey.from_string(bytes.fromhex(private_key), curve=ecdsa.SECP256k1)
    vk = sk.get_verifying_key()
    compressed_public_key = vk.to_string("compressed").hex()
    return compressed_public_key

def onmain():
    #start = timer()
    global private_key
    global rng
    global ks
    global start
    global target_address
    global i
    for _ in range(rng*1000):#while True:
        ks+=1
        private_key = generate_private_key(private_key)#secrets.randbelow(32))  # Generate a random private key
        public_key = private_key_to_public_key(private_key)  # Convert private key to compressed public key       
        #print (private_key)
        #print(i,"--- ",timer()-start,"seconds ---" ,)
        # Generate Bitcoin address from public key
        bitcoin_address = bitcoin.pubtoaddr(public_key)
        if (bitcoin_address == target_address):
            f = open(output_file, "a")
            f.write('\nprivate key int: '
                    + private_key
                    +'\nBitcoin address: '
                    + bitcoin_address+'\n_________\n')
            f.close()
                   
            print(f"\nFound matching Bitcoin address for private key: {private_key}")
            input("")
    print(f"\r[Total : {(ks/1000000)} Mk/{int(timer()-start)}s] [Private key hex: {(remove_zeros(private_key))}]  ", end="")     


def onmain_random():
    #start = timer()
    global private_key
    global rng
    global ks
    global start
    global target_address
    global i
    global random_str
    for _ in range(rng*1000):#while True:
        ks+=1
        private_key = generate_private_key(private_key)#secrets.randbelow(32))  # Generate a random private key
        public_key = private_key_to_public_key(private_key)  # Convert private key to compressed public key       
        #print (private_key)
        #print(i,"--- ",timer()-start,"seconds ---" ,)
        # Generate Bitcoin address from public key
        bitcoin_address = bitcoin.pubtoaddr(public_key)
        if (bitcoin_address == target_address):
            f = open(output_file, "a")
            f.write('\nprivate key int: '
                    + private_key
                    +'\nBitcoin address: '
                    + bitcoin_address+'\n_________\n')
            f.close()
                   
            print(f"\nFound matching Bitcoin address for private key: {private_key}")
            input("")
    print(f"\r[{str(datetime.timedelta(seconds=int(timer()-start)))}] [Total : {(ks/1000000)} Mk] [R: {i}] [Private key hex: {(remove_zeros(private_key))}]  ", end="")     

def main():
    global private_key
    global i
    global random_str
    random_str =""
    i=0
    if (random_mod):
        while True:
            i+=1
            private_key=generate_random_priv()
            onmain_random()
    else:
        while True:
            i+=1
            onmain()
       

if __name__ == "__main__":
    main()
   
newbie
Activity: 12
Merit: 1
For keyhunt:
both, CPU and RAM.