Topic: Bitcoin puzzle transaction ~32 BTC prize to who solves it - page 3. (Read 189831 times)

newbie
Activity: 9
Merit: 0
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.
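
To put numbers on that advice, here is a minimal sketch. It assumes only what the post states (that "-n" sets the size of the subrange each thread works on); the helper names are hypothetical, and how keyhunt actually schedules subranges is not described here.

```python
def subrange_bits(n: int) -> int:
    """Return k when n == 2**k (a 'k-bit subrange' in the post's wording)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1

def subranges_needed(range_size: int, n: int) -> int:
    """Number of subranges of n keys needed to cover range_size keys (ceiling division)."""
    return -(-range_size // n)

print(subrange_bits(0x10000))                # 16
print(subranges_needed(1_000_000, 0x10000))  # 16
print(subranges_needed(1_000_000, 1 << 32))  # 1
```

With the default 32-bit subrange, a range of 1 million keys fits inside a single subrange, which is presumably why extra threads do not help until N is lowered.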

Look

https://talkimg.com/images/2024/05/21/1iVIH.png

Found in less than a second

Are you the keyhunt creator?
Nice to meet you.
It was my mistake; I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.

Zero clues of how or what you ran, but with a single CPU core, using keyhunt-cuda, it is found pretty much as the program starts:

Code:
KeyHunt-Cuda v1.08

COMP MODE    : COMPRESSED
COIN TYPE    : BITCOIN
SEARCH MODE  : Single Address
DEVICE       : CPU
CPU THREAD   : 1
SSE          : YES
RKEY         : 0 Mkeys
MAX FOUND    : 65536
BTC ADDRESS  : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE  : Found.txt

Start Time   : Sat May 25 10:11:49 2024
Global start : 20000000000000000 (66 bit)
Global end   : 21000000000000000 (66 bit)
Global range : 1000000000000000 (61 bit)


[00:00:02] [CPU+GPU: 5.28 Mk/s] [GPU: 0.00 Mk/s] [C: 0.000000 %] [R: 0] [T: 10,706,944 (24 bit)] [F: 1]

BYE

So maybe you entered the wrong options/flags when trying to run the program.




This is the key 1 million after the start: 200000000000f4240
and this is its compressed P2PKH address: 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo

KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo


My KeyHunt-Cuda speed is 60 Mk/s. In theory, CUDA should find it in less than 1 second, but it takes about 1 minute 40 seconds.

KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo

KeyHunt-Cuda v1.07

COMP MODE    : COMPRESSED
COIN TYPE    : BITCOIN
SEARCH MODE  : Single Address
DEVICE       : GPU
CPU THREAD   : 0
GPU IDS      : 0
GPU GRIDSIZE : 24x256
SSE          : YES
RKEY         : 0 Mkeys
MAX FOUND    : 65536
BTC ADDRESS  : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE  : Found.txt

Start Time   : Sun May 26 14:04:50 2024
Global start : 20000000000000000 (66 bit)
Global end   : 40000000000000000 (67 bit)
Global range : 20000000000000000 (66 bit)

GPU          : GPU #0 Quadro P1000 (4x128 cores) Grid(24x256)

[00:01:36] [CPU+GPU: 62.21 Mk/s] [GPU: 62.21 Mk/s] [C: 0.000000 %] [R: 0] [T: 5,976,883,200 (33 bit)] [F: 0]

The problem is that CUDA seems to run the same task on all GPU cores in parallel and repetitively: instead of the cores cooperating, each one works as an island, for example each core starting from the same start point and moving forward. In the best case, it divides the whole range by the number of cores and each core works through its own interval. In that case, once you exit the program, all your effort ends at an unknown position, and the next time you run the program you don't know where to resume.
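
A rough way to check the "best case" scenario against the log above is a back-of-the-envelope sketch. It assumes the range is split evenly across the 24x256 grid, that each GPU thread walks its own slice sequentially, and that the target sits 1,000,000 keys after the global start (so it falls in the first slice). These are assumptions for illustration, not a description of KeyHunt-Cuda's actual scheduler.

```python
# Figures taken from the GPU run above.
TOTAL_RATE = 62.21e6       # keys/s reported for the whole GPU
GRID_THREADS = 24 * 256    # --gpux 24,256
TARGET_OFFSET = 1_000_000  # target key is global start + 0xf4240

def cooperative_seconds(offset, total_rate):
    """Time if all threads swept one contiguous block upward from the start."""
    return offset / total_rate

def sliced_seconds(offset, total_rate, threads):
    """Time if only the thread owning the first slice approaches the target."""
    per_thread_rate = total_rate / threads
    return offset / per_thread_rate

print(f"cooperative sweep   : {cooperative_seconds(TARGET_OFFSET, TOTAL_RATE):.3f} s")
print(f"one slice per thread: {sliced_seconds(TARGET_OFFSET, TOTAL_RATE, GRID_THREADS):.0f} s")
```

Under these assumptions the second figure comes out near 99 seconds, in the same ballpark as the roughly 1:40 reported. That would suggest the delay comes from how the range is partitioned rather than from the cores duplicating each other's work.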

Of course, it seems that you are using a newer version of this software (v1.08). But anyway, I have to tell the truth: you have written very good software. I really enjoyed the ideas in your code.
newbie
Activity: 12
Merit: 0
Hi guys! Who can help me develop this in Rust? (Don't judge my statements and results, since I'm new to this puzzle and just trying new stuff.)

I ran this for 4-5 hours straight. I use a pool keyspace and multiple threads, and the search only uses the CPU. Let's find out whether it can be developed further with GPU power.

It's sequential, not random.

The threads divide the key range, so the leftmost 1.5 bytes after the zero padding change according to how many threads are used.
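
The range splitting described above can be sketched as follows. This is a minimal illustration, not the poster's Rust code: the function name is hypothetical, and the thread count of 8 is an assumption (it is not stated in the post).

```python
def split_range(begin, end, threads):
    """Split the inclusive range [begin, end] into `threads` contiguous slices."""
    total = end - begin + 1
    chunk = total // threads
    slices = []
    for k in range(threads):
        lo = begin + k * chunk
        # the last slice absorbs any remainder
        hi = end if k == threads - 1 else lo + chunk - 1
        slices.append((lo, hi))
    return slices

BEGIN = 0x23000000000000000
END = 0x3ffffffffffffffff
THREADS = 8  # assumption: not stated in the post

for lo, hi in split_range(BEGIN, END, THREADS):
    print(f"{lo:x} .. {hi:x}")
```

With 8 slices, the start keys begin with 230, 26a, 2a4, 2de, 318, 352, 38c and 3c6, which lines up with the leading digits of the keys printed in this post.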

  • Puzzle search
  • Script started at: 2024-05-25 20:56:03.700485 +07:00
  • from:0x23000000000000000 to:0x3ffffffffffffffff
  • target:13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so
0000000000000000000000000000000000000000000000035200000002cc179e| 13zbaq4dADzLenZB6VdSdmAaqV8TABkHEf

The average current speed is 15,000 keys/s per thread.

With some GPU power, I think my code (?) could solve that in a couple of months.
My code still has a bug: the displayed address does not correspond to the displayed private key, although it shares the same prefix.
The search itself can still find a match, but the progress display is not correct.
🙏🏻

Example of a pre-run:

000000000000000000000000000000000000000000000003c600000000002b0f | 13zbpbczKXAcx9q4NvTuEa5szGu3
000000000000000000000000000000000000000000000003520000000000a039 | 13zbNacfa9Nrz8YttQbzLVqT7y4w
000000000000000000000000000000000000000000000003c60000000000d456 | 13zbFwGk1JussoSK3gax39256VBu
000000000000000000000000000000000000000000000002de000000000103a3 | 13zbt3NfsVUjJu8FQWxFfxdePAFP
0000000000000000000000000000000000000000000000026a00000000010bd3 | 13zbix1E22g4FRk5hUcoMV1Hhvpd
000000000000000000000000000000000000000000000002de0000000001333e | 13zbDQnUKS1N8AqmynykC1QYNgwc
000000000000000000000000000000000000000000000002de00000000013e02 | 13zb7xLuqjuPKdbGt7rshDEjdprK
00000000000000000000000000000000000000000000000352000000000168ae | 13zbm8dR61ur1GqnQEmjrRsM5knP
000000000000000000000000000000000000000000000002300000000001923c | 13zbpdWGSwZpHK2zyR2o5XqgfXdv
0000000000000000000000000000000000000000000000026a0000000001c928 | 13zbp7u1bEeK94RCMtsvuNtqBrYo
0000000000000000000000000000000000000000000000038c0000000001e94a | 13zb56DAy494Drdngk2YchYRNfvF
0000000000000000000000000000000000000000000000023000000000023728 | 13zbFaEpeoY7Hrx6sTLdM3ZvvUW5
0000000000000000000000000000000000000000000000031800000000023ade | 13zbtT9na4bJdYoexnZ6YSt6KDHp
I believe Keyhuntcuda2 from siupune has this functioning on the gpu
jr. member
Activity: 40
Merit: 6

I have ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler  Grin


Is this the speed at which public keys are checked in the hash table (using a bloom filter?), or is this the real speed at which the processor generates public keys?
Quite a high speed even for BSGS on a video card

120 exakeys = 120,000,000,000 gigakeys

Pretending some 4 GHz CPU generates, sequentially, one key per cycle (it doesn't; it's more like one key per 300 cycles on average, and that's with all possible optimizations), it would still take 30,000,000,000 CPU cores to reach that speed.

Or (at 300 cycles/key): 9,000,000,000,000 cores, all running at 100% with no OS and nothing else running, working at full speed doing nothing except crunching numbers inside the CPU registers.
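
The arithmetic above is easy to reproduce. A small sketch using only the figures from this post (120 Ekeys/s, a 4 GHz clock, 1 or 300 cycles per key); the function name is just for illustration:

```python
EXA = 10**18
GIGA = 10**9

claimed_rate = 120 * EXA  # 120 Ekeys/s
clock_hz = 4 * GIGA       # one 4 GHz core

def cores_needed(rate, clock_hz, cycles_per_key):
    """Cores required if each core produces one key every `cycles_per_key` cycles."""
    return rate * cycles_per_key // clock_hz

print(claimed_rate // GIGA)                       # 120000000000 gigakeys/s
print(cores_needed(claimed_rate, clock_hz, 1))    # 30000000000
print(cores_needed(claimed_rate, clock_hz, 300))  # 9000000000000
```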

I think maybe that speed reflects space coverage rather than operating time, and space coverage speed is logarithmic not linear.
jr. member
Activity: 37
Merit: 1
Hi guys! Who can help me develop this in Rust? (Don't judge my statements and results, since I'm new to this puzzle and just trying new stuff.)

I ran this for 4-5 hours straight. I use a pool keyspace and multiple threads, and the search only uses the CPU. Let's find out whether it can be developed further with GPU power.

It's sequential, not random.

The threads divide the key range, so the leftmost 1.5 bytes after the zero padding change according to how many threads are used.

  • Puzzle search
  • Script started at: 2024-05-25 20:56:03.700485 +07:00
  • from:0x23000000000000000 to:0x3ffffffffffffffff
  • target:13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so
0000000000000000000000000000000000000000000000035200000002cc179e| 13zbaq4dADzLenZB6VdSdmAaqV8TABkHEf

The average current speed is 15,000 keys/s per thread.

With some GPU power, I think my code (?) could solve that in a couple of months.
My code still has a bug: the displayed address does not correspond to the displayed private key, although it shares the same prefix.
The search itself can still find a match, but the progress display is not correct.
🙏🏻

Example of a pre-run:

000000000000000000000000000000000000000000000003c600000000002b0f | 13zbpbczKXAcx9q4NvTuEa5szGu3
000000000000000000000000000000000000000000000003520000000000a039 | 13zbNacfa9Nrz8YttQbzLVqT7y4w
000000000000000000000000000000000000000000000003c60000000000d456 | 13zbFwGk1JussoSK3gax39256VBu
000000000000000000000000000000000000000000000002de000000000103a3 | 13zbt3NfsVUjJu8FQWxFfxdePAFP
0000000000000000000000000000000000000000000000026a00000000010bd3 | 13zbix1E22g4FRk5hUcoMV1Hhvpd
000000000000000000000000000000000000000000000002de0000000001333e | 13zbDQnUKS1N8AqmynykC1QYNgwc
000000000000000000000000000000000000000000000002de00000000013e02 | 13zb7xLuqjuPKdbGt7rshDEjdprK
00000000000000000000000000000000000000000000000352000000000168ae | 13zbm8dR61ur1GqnQEmjrRsM5knP
000000000000000000000000000000000000000000000002300000000001923c | 13zbpdWGSwZpHK2zyR2o5XqgfXdv
0000000000000000000000000000000000000000000000026a0000000001c928 | 13zbp7u1bEeK94RCMtsvuNtqBrYo
0000000000000000000000000000000000000000000000038c0000000001e94a | 13zb56DAy494Drdngk2YchYRNfvF
0000000000000000000000000000000000000000000000023000000000023728 | 13zbFaEpeoY7Hrx6sTLdM3ZvvUW5
0000000000000000000000000000000000000000000000031800000000023ade | 13zbtT9na4bJdYoexnZ6YSt6KDHp


My search technique uses this:
2^31 and 2^36
(36 bit)
MacBook-Pro:Desktop tepan$ python3 subs.py
Enter the range x: 31  
Enter the range y: 36

> 0x91000000 <

let's hunt.

    let begin: u128 = 0x910000000;
    let end: u128 = 0x9ffffffff;

const TARGET: &str = "1Be2UF9NLfyLFbtm3TCbmuocc9N1Kduci1";


result :

  • Puzzle search
  • Script started at: 2024-05-26 02:25:21.816449 +07:00
  • from:0x910000000 to:0x9ffffffff
  • target:1Be2UF9NLfyLFbtm3TCbmuocc9N1Kduci1

00000000000000000000000000000000000000000000000000000009f7a02888| 1Be2JoBBasXePHijh1AD1FK5Wsui
00000000000000000000000000000000000000000000000000000009f3702fc4| 1Be2m52dDvKNf2wrvbvSoh4V4t8Q
00000000000000000000000000000000000000000000000000000009e2b0702a| 1Be2DebJWkfJWXb8cszjA8mN6fvh
00000000000000000000000000000000000000000000000000000009e6e07a23| 1Be2wkS3jkyDGWnVpiNXVzUYdTtY
00000000000000000000000000000000000000000000000000000009eb10e589| 1Be2f5NaU8Qv35nPzJk4S6Jntzvs
00000000000000000000000000000000000000000000000000000009fbd0fcef| 1Be2uQTHQC5V9QPTqSkHvGNpFA5H
00000000000000000000000000000000000000000000000000000009e2b117d2| 1Be2nTuUQDYvYgyHPfE6S4hBTVNc
00000000000000000000000000000000000000000000000000000009eb119a5e| 1Be2ncy8U6oS9UzUGQU13CW7prqo
00000000000000000000000000000000000000000000000000000009f371a027| 1Be2tHcpUiDPVcqdvu2v2KmtA1m9
00000000000000000000000000000000000000000000000000000009e6e1b74a| 1Be22Wi2usp5z2J1wRvfovS8pD7R
00000000000000000000000000000000000000000000000000000009fbd1be53| 1Be2eU7ktk7abQsFRHgVP1qmmpZW
00000000000000000000000000000000000000000000000000000009f371d249| 1Be2zLYgYPdSjL9cog8kCsWUVjax
00000000000000000000000000000000000000000000000000000009f7a1e133| 1Be2xv3Q6XtLnBAz4SmbqbaSkhs8
00000000000000000000000000000000000000000000000000000009de820a7c| 1Be2UF9NLfyLFbtm3TCbmuocc9N1
00000000000000000000000000000000000000000000000000000009e2b20b15|
  • --------------------------------------------------------------------------------
  • KEY FOUND! 2024-05-26 02:29:44.164813 +07:00
  • private key (WIF): KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9Mg1Upu7eJAtiDr
  • private key (hex): 00000000000000000000000000000000000000000000000000000009de820a7c
  • public key: 02b3e772216695845fa9dda419fb5daca28154d8aa59ea302f05e916635e47b9f6
  • address: 1Be2UF9NLfyLFbtm3TCbmuocc9N1Kduci1
  • --------------------------------------------------------------------------------
  • Search completed Smiley
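
For readers wondering how the hex key and the WIF line in the result above relate: a minimal sketch of the standard Base58Check encoding for a compressed mainnet WIF (version byte 0x80, the 32-byte key, a trailing 0x01, then a 4-byte double-SHA-256 checksum). The function names are hypothetical and this is not the poster's Rust code.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _checksum(payload: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def b58check_encode(payload: bytes) -> str:
    data = payload + _checksum(payload)
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))  # each leading zero byte becomes '1'
    return "1" * pad + out

def b58check_decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + B58.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    data = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    payload, checksum = data[:-4], data[-4:]
    if _checksum(payload) != checksum:
        raise ValueError("bad Base58Check checksum")
    return payload

def hex_to_wif(hex_key: str) -> str:
    """Compressed mainnet WIF for a private key given as hex."""
    key = bytes.fromhex(hex_key.zfill(64))
    return b58check_encode(b"\x80" + key + b"\x01")

def wif_to_hex(wif: str) -> str:
    payload = b58check_decode(wif)
    if len(payload) != 34 or payload[0] != 0x80 or payload[-1] != 0x01:
        raise ValueError("not a compressed mainnet WIF")
    return payload[1:33].hex()

print(hex_to_wif("9de820a7c"))
```

Feeding it the hex key from the result above should reproduce the WIF printed there, provided the log was copied accurately.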


It takes about 3 minutes to search by address.

Look at this example of use:

34   2^33...2^34-1   200000000...3ffffffff | x : 29 y: 34 - > > 0x330000000 < [ 0x330000000:0x3ffffffff ] [found on 0x34a65911d ]

38   2^37...2^38-1   2000000000...3fffffffff | x : 33 y: 38 - > > 0x2100000000 < [0x2100000000:0x2ffffffffff] [found on 0x22382facd0 ]

.. etc
I wrote this premature code to save time when searching with a low-end CPU, but it only works up to 79 bits; beyond that, the keyspace output is not correct.
Sorry, I can't provide the real code, because someone could brute-force that easily with a huge mining GPU farm Smiley

jr. member
Activity: 37
Merit: 1
Hi guys! Who can help me develop this in Rust? (Don't judge my statements and results, since I'm new to this puzzle and just trying new stuff.)

I ran this for 4-5 hours straight. I use a pool keyspace and multiple threads, and the search only uses the CPU. Let's find out whether it can be developed further with GPU power.

It's sequential, not random.

The threads divide the key range, so the leftmost 1.5 bytes after the zero padding change according to how many threads are used.

  • Puzzle search
  • Script started at: 2024-05-25 20:56:03.700485 +07:00
  • from:0x23000000000000000 to:0x3ffffffffffffffff
  • target:13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so
0000000000000000000000000000000000000000000000035200000002cc179e| 13zbaq4dADzLenZB6VdSdmAaqV8TABkHEf

The average current speed is 15,000 keys/s per thread.

With some GPU power, I think my code (?) could solve that in a couple of months.
My code still has a bug: the displayed address does not correspond to the displayed private key, although it shares the same prefix.
The search itself can still find a match, but the progress display is not correct.
🙏🏻

Example of a pre-run:

000000000000000000000000000000000000000000000003c600000000002b0f | 13zbpbczKXAcx9q4NvTuEa5szGu3
000000000000000000000000000000000000000000000003520000000000a039 | 13zbNacfa9Nrz8YttQbzLVqT7y4w
000000000000000000000000000000000000000000000003c60000000000d456 | 13zbFwGk1JussoSK3gax39256VBu
000000000000000000000000000000000000000000000002de000000000103a3 | 13zbt3NfsVUjJu8FQWxFfxdePAFP
0000000000000000000000000000000000000000000000026a00000000010bd3 | 13zbix1E22g4FRk5hUcoMV1Hhvpd
000000000000000000000000000000000000000000000002de0000000001333e | 13zbDQnUKS1N8AqmynykC1QYNgwc
000000000000000000000000000000000000000000000002de00000000013e02 | 13zb7xLuqjuPKdbGt7rshDEjdprK
00000000000000000000000000000000000000000000000352000000000168ae | 13zbm8dR61ur1GqnQEmjrRsM5knP
000000000000000000000000000000000000000000000002300000000001923c | 13zbpdWGSwZpHK2zyR2o5XqgfXdv
0000000000000000000000000000000000000000000000026a0000000001c928 | 13zbp7u1bEeK94RCMtsvuNtqBrYo
0000000000000000000000000000000000000000000000038c0000000001e94a | 13zb56DAy494Drdngk2YchYRNfvF
0000000000000000000000000000000000000000000000023000000000023728 | 13zbFaEpeoY7Hrx6sTLdM3ZvvUW5
0000000000000000000000000000000000000000000000031800000000023ade | 13zbtT9na4bJdYoexnZ6YSt6KDHp
full member
Activity: 1050
Merit: 219
Shooters Shoot...
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look



Found in less than a second

Are you the keyhunt creator?
Nice to meet you.
It was my mistake; I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.

Zero clues of how or what you ran, but with a single CPU core, using keyhunt-cuda, it is found pretty much as the program starts:

Code:
KeyHunt-Cuda v1.08

COMP MODE    : COMPRESSED
COIN TYPE    : BITCOIN
SEARCH MODE  : Single Address
DEVICE       : CPU
CPU THREAD   : 1
SSE          : YES
RKEY         : 0 Mkeys
MAX FOUND    : 65536
BTC ADDRESS  : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE  : Found.txt

Start Time   : Sat May 25 10:11:49 2024
Global start : 20000000000000000 (66 bit)
Global end   : 21000000000000000 (66 bit)
Global range : 1000000000000000 (61 bit)


[00:00:02] [CPU+GPU: 5.28 Mk/s] [GPU: 0.00 Mk/s] [C: 0.000000 %] [R: 0] [T: 10,706,944 (24 bit)] [F: 1]

BYE

So maybe you entered the wrong options/flags when trying to run the program.
hero member
Activity: 1736
Merit: 857

I have ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler  Grin


Is this the speed at which public keys are checked in the hash table (using a bloom filter?), or is this the real speed at which the processor generates public keys?
Quite a high speed even for BSGS on a video card
member
Activity: 286
Merit: 15


I've tried
AMD Ryzen 9 7950X, 4.5 GHz, 16c/32t + 128 GB: 4 Ekeys/s at start, reaches 6 Ekeys/s after a week


I have ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler  Grin


I've been tweaking Linux for months to squeeze every last bit out of a dual-CPU configuration.

#RT kernel
Code:
wget -qO - https://dl.xanmod.org/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/xanmod-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/xanmod-archive-keyring.gpg] http://deb.xanmod.org releases main' | sudo tee /etc/apt/sources.list.d/xanmod-release.list
sudo apt-get -y update && sudo apt install linux-xanmod-rt-x64v3

Code:
sudo apt install -y tuned tuned-utils tuned-utils-systemtap
sudo tuned-adm profile latency-performance

Code:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# should be "performance" for all cores
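
If you'd rather not eyeball that output on a machine with many cores, the same check can be scripted. A minimal sketch that reads the sysfs files shown above; the function names are hypothetical, and the `root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path

def read_governors(root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every core that exposes cpufreq."""
    governors = {}
    for gov_file in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        governors[gov_file.parent.parent.name] = gov_file.read_text().strip()
    return governors

def non_performance(governors):
    """Keep only the cores whose governor is not 'performance'."""
    return {cpu: gov for cpu, gov in governors.items() if gov != "performance"}

if __name__ == "__main__":
    governors = read_governors()
    if not governors:
        print("no cpufreq information found")
    else:
        bad = non_performance(governors)
        print("all cores on performance" if not bad else f"not on performance: {bad}")
```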

/etc/default/grub
Code:
quiet msr.allow_writes=on nosoftlockup mce=ignore_ce skew_tick=1 clocksource=hpet iommu=soft noresume mitigations=off nmi_watchdog=0


NVMe config (the NVMe drive must have a good heatsink, otherwise it goes over 50 C; I have had white smoke come from them more than once. Drive: SAMSUNG MZVL2512HCJQ.)

Code:
sudo nvme smart-log /dev/nvme0 | grep -i '^temperature'
temperature            : 42 C
Temperature Sensor 1           : 42 C
Temperature Sensor 2           : 50 C


/etc/fstab
Code:
ext4 noatime,nodiratime,errors=remount-ro,inode_readahead_blks=0 0 1

AMD EPYC config
Code:
wrmsr -a 0xc0011020 0x4400000000000
wrmsr -a 0xc0011021 0x4000000000040
wrmsr -a 0xc0011022 0x8680000401570000
wrmsr -a 0xc001102b 0x2040cc10

And so on and on... This is only part of it.

member
Activity: 286
Merit: 15

AVX512 and so on maybe can bring a 50% speed-up, nothing to write home about.

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that CPUs with hardware support for SHA instructions are already used by the SHA routines available from the system APIs, so we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd


I achieved a 20% performance increase in Keyhunt on Zen3 architecture compared to GCC versions 12, 13, and 14.
To compile with Clang, I used the AOCC compiler located at /opt/AMD/aocc-compiler-4.2.0/bin/clang.

However, it was essential to remove all Intel intrinsics (__builtin_ia32_*) from the code since these intrinsics are specific to Intel processors and incompatible with AMD processors.

In my case, I need to rewrite both the SHA and RIPEMD implementations for Zen3 to achieve a significant performance boost.

Imagine achieving a 70% performance increase!  Grin

Additionally, optimizing for Zen4 by leveraging its specific architectural features can lead to even greater efficiency gains.
jr. member
Activity: 40
Merit: 6
Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

Nothing better (faster) and regularly updated is available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.

Code:
   while (1) {
#if defined(USE_CUSTOM_SHA256)
        sha256_init(&s256ctx);
        sha256_update(&s256ctx, compressed_pubkey, 33);
        sha256_final(&s256ctx, sha256hash);
#else
#if defined(__APPLE__) && defined(USE_CC_SHA)
        CC_SHA256(compressed_pubkey, 33, sha256hash);
#else
        SHA256(compressed_pubkey, 33, sha256hash);
#endif
#endif

//        RIPEMD160(sha256hash, 32, rmd_hash);

        ++count;
        if (count % (1 << 26) == 0) {
            ticks = clock();
            speed = count * CLOCKS_PER_SEC / (ticks - start);
            printf("SHA hashes: %" PRIu64 " speed: %" PRIu64 " hashes/s\n", count, (uint64_t) speed);
        }
    }

SHA hashes: 134217728 speed: 7485947 hashes/s


Code:
SHA256_Init(&shaCtx);
 while (1) {
       //...
       SHA256_Update(&shaCtx, compressed_pubkey, 33);

       ++count;
       if (count % (1 << 26) == 0) {
            ticks = clock();
            speed = count * CLOCKS_PER_SEC / (ticks - start);
            printf("Hashed bytes: %" PRIu64 " speed: %" PRIu64 " MB/s\n", count * 33, (uint64_t) (speed * 33) >> 20);
      }    
}

Hashed bytes: 22145925120 speed: 1712 MB/s


So 1.7 GB/s with your everyday SHA hasher is not bad, what's bad is that it's doing a single hash of a 22 GB message, not hundreds of millions of hashes of 33 bytes.
In our case, the hash context needs to be reinitialized for every public key we need to hash, so AVX512 and so on maybe can bring a 50% speed-up, nothing to write home about.
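
The distinction between the two benchmarks above can be sketched in a few lines. This uses Python's hashlib purely to illustrate the two access patterns; interpreter overhead dominates here, so the absolute numbers say nothing about the C results quoted in this post. The dummy 33-byte input stands in for a compressed public key.

```python
import hashlib
import time

PUBKEY = bytes([0x02]) + bytes(32)  # dummy input the size of a compressed public key

def hash_each(n):
    """n independent SHA-256 hashes of a 33-byte message (fresh context every time)."""
    digest = b""
    for _ in range(n):
        digest = hashlib.sha256(PUBKEY).digest()
    return digest

def hash_stream(n):
    """One SHA-256 context fed the same 33 bytes n times (a single long message)."""
    ctx = hashlib.sha256()
    for _ in range(n):
        ctx.update(PUBKEY)
    return ctx.digest()

def timed(fn, n):
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 200_000
    print(f"per-key hashing: {n / timed(hash_each, n):,.0f} hashes/s")
    print(f"single stream  : {n * 33 / timed(hash_stream, n) / 2**20:,.1f} MB/s")
```

Only the first pattern matches the address search: every candidate key needs its own complete hash, including padding and finalization, so bulk throughput figures do not carry over.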

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that CPUs with hardware support for SHA instructions are already used by the SHA routines available from the system APIs, so we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd
newbie
Activity: 9
Merit: 0

Are you the keyhunt creator?
Nice to meet you.


Hi, yes, I developed the CPU version; all the others are copies. If you have any doubts about it, please use this topic: https://bitcointalksearch.org/topic/keyhunt-development-requests-bug-reports-5322040

Of course, and you are the main programmer and developer.
Nice to meet you.
I have also written various programs for this work; soon I will post a complete version on GitHub.

Do you know how to search randomly in KeyHunt-Cuda version 2?

In version 1, the -r option with a key-space value combined sequential search with random, like:
KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so -r 2000
member
Activity: 286
Merit: 15
Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

Nothing better (faster) and regularly updated is available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.
hero member
Activity: 861
Merit: 662

Are you the keyhunt creator?
Nice to meet you.


Hi, yes, I developed the CPU version; all the others are copies. If you have any doubts about it, please use this topic: https://bitcointalksearch.org/topic/keyhunt-development-requests-bug-reports-5322040
newbie
Activity: 9
Merit: 0
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look

https://talkimg.com/images/2024/05/21/1iVIH.png

Found in less than a second

Are you the keyhunt creator?
Nice to meet you.
It was my mistake; I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.
newbie
Activity: 19
Merit: 0
After talking with a mathematician friend, I was clearly not wrong... That's a relief.

And I could reduce it to 1/10 of the pool, which is great but not enough, because the error bar is still 15 orders of magnitude Roll Eyes
Good, but not enough for me.


After searching a little more, neither my assumptions nor my techniques are actually far-fetched.
Some papers have already done something close, but I need to think more and have a dedicated PC for that to neutralize the noise.

Interesting how it reminds me of my time in the spectroscopy lab:
https://imgur.com/a/NTlgYpp

And that's pretty cool!
hero member
Activity: 861
Merit: 662
keyhunt not good

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look



Found in less than a second
newbie
Activity: 12
Merit: 1
keyhunt not good

I ran a test with keyhunt
..........................................



Keyhunt in BSGS mode works well

jr. member
Activity: 40
Merit: 6

Is it non-pointless even if writing it in assembler?

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck  Grin
That was my point, read carefully the question Smiley

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

But if we only operate on EC then it's another story. This is where kangaroo / rho / BSGS come into play, and we can start optimizing the operations that are the bottleneck at THAT level. It's one thing to do a few thousand mul/s in Python and another to do batched additions (with a single inversion) on a GPU, to reach giga adds/second. Millions of times faster for identical results. And of course it's still not enough even at that level, so...

Something to think about on 256-bit modular math:
- a single inversion is ~60 times slower than a multiplication (GCD algo, not a simple operation at all)
- a multiplication is ~2 times slower than a squaring
- a squaring is ~3 times slower than an addition, ~2 slower than a normalization

A simple point addition requires one inversion, 2 multiplications, a squaring, 2 normalizations, and 6 additions.
A simple point multiplication requires a bunch of point additions (in truth, it's somewhat faster because of the Jacobian shortcut, but it's still a lot of multiplications).

So stuff like doing point-scalar multiplication is a bad joke, compared to the problem of optimizing the primitives Smiley
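
To make the "batched additions with a single inversion" remark concrete, here is a minimal sketch of Montgomery's batch-inversion trick over the secp256k1 field prime, plus a toy cost model built from the rough ratios listed above (inversion ~60 multiplications, squaring ~1/2, addition ~1/6, normalization ~1/4). The cost figures are only those ballpark ratios, and charging 3 multiplications per batched element is the textbook count, not a measurement. Requires Python 3.8+ for `pow(x, -1, p)`.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime

def batch_inverse(values, p=P):
    """Invert every value mod p using one modular inversion (Montgomery's trick)."""
    prefix = []
    acc = 1
    for v in values:
        prefix.append(acc)  # product of all earlier values
        acc = acc * v % p
    inv = pow(acc, -1, p)   # the single expensive inversion
    out = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        out[i] = inv * prefix[i] % p  # inverse of values[i]
        inv = inv * values[i] % p     # remove values[i] from the running inverse
    return out

# Costs in "multiplication units", taken from the ratios in the post.
MUL, INV = 1.0, 60.0
SQR = MUL / 2
ADD = SQR / 3
NORM = SQR / 2

def point_add_cost(batch=1):
    """Cost of one affine point addition when inversions are shared across `batch` points."""
    inversion_share = (INV + 3 * MUL * (batch - 1)) / batch
    return inversion_share + 2 * MUL + SQR + 2 * NORM + 6 * ADD

print(point_add_cost(1))     # unbatched: the inversion dominates
print(point_add_cost(1024))  # batched: roughly 3 extra multiplications per point instead
```

In this model an unbatched addition costs about 64 multiplication units, almost all of it the inversion, which is why batching is the first optimization people reach for.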
member
Activity: 286
Merit: 15

Is it non-pointless even if writing it in assembler?


The necessary double SHA-256 hashing in Bitcoin's address generation process is substantially slower than EC operations, ensuring that even perfect optimization of point additions will not eliminate the hashing bottleneck.

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck  Grin
newbie
Activity: 9
Merit: 0


I have a simple CPU algorithm in Python; you can test it.


Code:

import bitcoin
import ecdsa



def private_key_to_public_key(private_key):
    sk = ecdsa.SigningKey.from_string(bytes.fromhex(private_key), curve=ecdsa.SECP256k1)
    vk = sk.get_verifying_key()
    compressed_public_key = vk.to_string("compressed").hex()
    return compressed_public_key



        bitcoin_address = bitcoin.pubtoaddr(public_key)




Why use Bitcoin and ECDSA imports? They're so slow, it feels like a waste of time.

Instead, utilize ICE (import secp256k1 as ice) for this function and the Bitcoin address line:


def private_key_to_public_key(private_key):
    priv_int = int(private_key, 16)
    return ice.scalar_multiplication(priv_int)

and

bitcoin_address = ice.pubkey_to_address(0, True, public_key)


It's approximately 10 times faster than ECDSA. But even that is miserable if you attack the dinosaur numbers.

The more you delve into Python, the more apparent it becomes that searching for Puzzle 66 through it is pointless.

Perhaps someone knowingly obscures things by selling Python scripts as the ultimate solution.  Grin






In fact, what I meant by this comment was the problem with KeyHunt, and I have written different versions of these programs. I posted this code so that you can compare it with KeyHunt and see the speed of a simple program on a CPU next to a powerful GPU program that does not work properly.

In general, I am very happy that you paid attention to this code and took the time to look at it, and I thank you. Also, the ice idea was very good Wink Wink