Bitcoin puzzle transaction ~32 BTC prize to who solves it - page 3.

aminsolhi

newbie

Activity: 9

Merit: 0

Quote from: WanderingPhilospher on May 25, 2024, 11:15:22 AM

Quote from: aminsolhi on May 22, 2024, 06:15:31 AM

Quote from: albert0bsd on May 21, 2024, 06:54:21 PM

Quote from: aminsolhi on May 20, 2024, 07:57:42 PM

Keyhunt is not good.

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look:

https://talkimg.com/images/2024/05/21/1iVIH.png

Found it in less than a second.

Are you the keyhunt creator?
Nice to meet you.
It was my mistake, I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.

You gave zero clues about how or what you ran, but with a single CPU core, using KeyHunt-Cuda, it is found almost as soon as the program starts:

Code:

KeyHunt-Cuda v1.08

COMP MODE    : COMPRESSED
COIN TYPE    : BITCOIN
SEARCH MODE  : Single Address
DEVICE       : CPU
CPU THREAD   : 1
SSE          : YES
RKEY         : 0 Mkeys
MAX FOUND    : 65536
BTC ADDRESS  : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE  : Found.txt

Start Time   : Sat May 25 10:11:49 2024
Global start : 20000000000000000 (66 bit)
Global end   : 21000000000000000 (66 bit)
Global range : 1000000000000000 (61 bit)


[00:00:02] [CPU+GPU: 5.28 Mk/s] [GPU: 0.00 Mk/s] [C: 0.000000 %] [R: 0] [T: 10,706,944 (24 bit)] [F: 1]

BYE

So maybe you entered the wrong things/flags when trying to run the program.

This key is 1 million keys after the start: 200000000000f4240
and this is the P2PKH address of its compressed public key: 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo

KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo

My KeyHunt-Cuda speed is 60 Mk/s. In fact, CUDA should find it in less than 1 second, but it takes about 1 minute 40 seconds.

KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo

KeyHunt-Cuda v1.07

COMP MODE : COMPRESSED
COIN TYPE : BITCOIN
SEARCH MODE : Single Address
DEVICE : GPU
CPU THREAD : 0
GPU IDS : 0
GPU GRIDSIZE : 24x256
SSE : YES
RKEY : 0 Mkeys
MAX FOUND : 65536
BTC ADDRESS : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE : Found.txt

Start Time : Sun May 26 14:04:50 2024
Global start : 20000000000000000 (66 bit)
Global end : 40000000000000000 (67 bit)
Global range : 20000000000000000 (66 bit)

GPU : GPU #0 Quadro P1000 (4x128 cores) Grid(24x256)

[00:01:36] [CPU+GPU: 62.21 Mk/s] [GPU: 62.21 Mk/s] [C: 0.000000 %] [R: 0] [T: 5,976,883,200 (33 bit)] [F: 0]
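A back-of-the-envelope check on the ~1:40 figure, under two assumptions that are not confirmed in this thread: KeyHunt-Cuda v1.07 hands each of the 24×256 = 6,144 GPU threads an equal, contiguous slice of the range, and each thread walks its slice sequentially at an equal share of the reported 62.21 Mk/s. Under those assumptions, only the first thread's slice contains a key 1,000,000 past the start, and that thread alone moves at about 10,000 keys/s:

```python
def seconds_to_reach(offset_in_slice: int, total_keys_per_s: float,
                     grid_x: int, grid_y: int) -> float:
    """Time for one GPU thread to walk `offset_in_slice` keys into its own slice.

    Assumes the range is split evenly across grid_x * grid_y threads and that
    every thread gets an equal share of the reported total speed."""
    threads = grid_x * grid_y                    # 24 x 256 = 6,144 threads
    per_thread = total_keys_per_s / threads      # about 10,125 keys/s per thread
    return offset_in_slice / per_thread

# The target is 1,000,000 keys after the global start, i.e. inside the first thread's slice.
t = seconds_to_reach(1_000_000, 62.21e6, 24, 256)
print(f"{t:.1f} s")  # 98.8 s
```

That is roughly 99 seconds, in line with the log still showing F: 0 at 00:01:36. If the assumptions hold, the large total speed is spread across thousands of slices and does not help with a key sitting near the start of one slice.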

The problem is that CUDA performs the same task on all graphics cores in parallel and repetitively: instead of each core helping the program, each one repeats the same task in isolation, for example with every core starting from the same point and moving forward. At best, the program divides the whole range by the number of cores and each core works in its own interval; but in that case, once you exit the program, all your progress ends at some unknown point, and the next time you run the program you don't know where to start.
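A hypothetical sketch of one common way to address both complaints (this is not how KeyHunt-Cuda works): give each worker its own contiguous slice, and periodically save the next key each slice should check, so a restart resumes instead of starting over. The function names and the JSON checkpoint format are made up for this illustration.

```python
from __future__ import annotations

import json
import os


def split_range(start: int, end: int, workers: int) -> list[tuple[int, int]]:
    """Split the inclusive range [start, end] into `workers` contiguous, non-overlapping slices."""
    total = end - start + 1
    size, extra = divmod(total, workers)
    slices, lo = [], start
    for i in range(workers):
        hi = lo + size + (1 if i < extra else 0) - 1   # first `extra` slices get one more key
        slices.append((lo, hi))
        lo = hi + 1
    return slices


def save_progress(path: str, positions: list[int]) -> None:
    """Persist the next key each worker should check (hex strings keep the file readable)."""
    with open(path, "w") as f:
        json.dump([hex(p) for p in positions], f)


def load_progress(path: str, slices: list[tuple[int, int]]) -> list[int]:
    """Resume from a checkpoint if one exists; otherwise start every slice at its beginning."""
    if os.path.exists(path):
        with open(path) as f:
            return [int(p, 16) for p in json.load(f)]
    return [lo for lo, _ in slices]
```

For example, `split_range(0x20000000000000000, 0x3ffffffffffffffff, 6144)` would give one slice per GPU thread; each worker then advances its own position and the positions are saved every few seconds.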

Of course, it seems that you are using a newer version of this software, v1.08. But anyway, I have to tell the truth: you have written very good software. I really enjoyed your code ideas.

madogss

newbie

Activity: 12

Merit: 0

Quote from: Tepan on May 25, 2024, 03:05:37 PM

Hi guys! Who can help me develop this in the Rust programming language? (Don't judge my statements and results; I'm new to this puzzle and just trying new stuff.)

I ran this for about 4-5 hours straight. I use a keyspace pool and multiple threads; the search only uses the CPU, so let's find out whether it can be developed to use GPU power.

It's sequential, not random.

The threads divide the key range: the leftmost 1.5 bytes of each key change according to how many threads are used.

Puzzle search
Script started at: 2024-05-25 20:56:03.700485 +07:00
from:0x23000000000000000 to:0x3ffffffffffffffff
target:13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so

0000000000000000000000000000000000000000000000035200000002cc179e| 13zbaq4dADzLenZB6VdSdmAaqV8TABkHEf

The current average speed is 15,000 keys/s per thread.

With some GPU power, I think my code (?) could solve that in a couple of months.
My code still has a bug: the displayed address does not correspond to the private key, although it has the same prefix.
It can all still find the match, but the progress display is not correct.
🙏🏻

Example of a pre-run:

000000000000000000000000000000000000000000000003c600000000002b0f | 13zbpbczKXAcx9q4NvTuEa5szGu3
000000000000000000000000000000000000000000000003520000000000a039 | 13zbNacfa9Nrz8YttQbzLVqT7y4w
000000000000000000000000000000000000000000000003c60000000000d456 | 13zbFwGk1JussoSK3gax39256VBu
000000000000000000000000000000000000000000000002de000000000103a3 | 13zbt3NfsVUjJu8FQWxFfxdePAFP
0000000000000000000000000000000000000000000000026a00000000010bd3 | 13zbix1E22g4FRk5hUcoMV1Hhvpd
000000000000000000000000000000000000000000000002de0000000001333e | 13zbDQnUKS1N8AqmynykC1QYNgwc
000000000000000000000000000000000000000000000002de00000000013e02 | 13zb7xLuqjuPKdbGt7rshDEjdprK
00000000000000000000000000000000000000000000000352000000000168ae | 13zbm8dR61ur1GqnQEmjrRsM5knP
000000000000000000000000000000000000000000000002300000000001923c | 13zbpdWGSwZpHK2zyR2o5XqgfXdv
0000000000000000000000000000000000000000000000026a0000000001c928 | 13zbp7u1bEeK94RCMtsvuNtqBrYo
0000000000000000000000000000000000000000000000038c0000000001e94a | 13zb56DAy494Drdngk2YchYRNfvF
0000000000000000000000000000000000000000000000023000000000023728 | 13zbFaEpeoY7Hrx6sTLdM3ZvvUW5
0000000000000000000000000000000000000000000000031800000000023ade | 13zbtT9na4bJdYoexnZ6YSt6KDHp

I believe Keyhuntcuda2 from siupune has this working on the GPU.

kTimesG

jr. member

Activity: 40

Merit: 6

Quote from: viljy on May 25, 2024, 07:50:08 AM

Quote from: nomachine on May 24, 2024, 12:40:52 AM

I get ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler.

Is this the speed at which public keys are checked in the hash table (using a bloom filter?), or is it the real speed at which the processor generates public keys?
That's quite a high speed, even for BSGS on a video card.

120 exakeys = 120,000,000,000 gigakeys

Pretending some 4 GHz CPU generates, sequentially, one key per cycle (it doesn't; it's more like one key per 300 cycles on average, and that's with all possible optimizations), it would still take 30,000,000,000 CPU cores to reach that speed.

Or, at 300 cycles/key: 9,000,000,000,000 cores, all running at 100% with no OS and nothing else running, doing nothing except crunching numbers inside the CPU registers.
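The core counts above follow from simple division; a quick sketch, using only the figures stated in the post (120 Ekeys/s, a 4 GHz clock, 1 or 300 cycles per key):

```python
CLAIMED = 120e18   # claimed speed: 120 exakeys/s
CLOCK_HZ = 4e9     # one 4 GHz core


def cores_needed(cycles_per_key: float) -> float:
    """Cores needed to reach CLAIMED keys/s, i.e. CLAIMED / (CLOCK_HZ / cycles_per_key)."""
    return CLAIMED * cycles_per_key / CLOCK_HZ


print(f"{cores_needed(1):,.0f}")    # 30,000,000,000
print(f"{cores_needed(300):,.0f}")  # 9,000,000,000,000
```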

I think that speed maybe reflects space coverage rather than operating time, and space-coverage speed is logarithmic, not linear.
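A toy illustration of why a BSGS "keys per second" figure can count coverage rather than generated keys: with a table of m baby steps, every giant step rules out m candidate keys at once, so dividing keys covered by elapsed time grows with the table size even though each giant step costs roughly the same. To stay short and self-contained, this sketch uses the multiplicative group modulo the prime 2^31 − 1 as a stand-in for the secp256k1 point group; it is not keyhunt's implementation.

```python
from __future__ import annotations

import math

P = 2**31 - 1   # prime modulus of the stand-in group (not secp256k1)
G = 7           # base element


def bsgs(h: int, n: int) -> tuple[int | None, int, int]:
    """Find x with G**x == h (mod P), searching x in [0, n).

    Returns (x or None, giant_steps_done, keys_covered)."""
    m = math.isqrt(n - 1) + 1            # ceil(sqrt(n)) baby steps
    baby = {}
    cur = 1
    for j in range(m):                   # baby steps: table of G^j -> j
        baby.setdefault(cur, j)
        cur = cur * G % P
    giant = pow(G, -m, P)                # G^(-m) mod P (Python 3.8+)
    gamma = h
    for i in range(m):
        j = baby.get(gamma)              # one lookup tests m candidate keys
        if j is not None:
            return i * m + j, i + 1, (i + 1) * m
        gamma = gamma * giant % P        # next giant step
    return None, m, m * m


x, steps, covered = bsgs(pow(G, 12_345_678, P), 2**24)
print(x, steps, covered)
```

Only `steps` group operations are performed in the search phase, yet `covered` keys are ruled out; a tool that reports `covered / elapsed` will show a much larger number than `steps / elapsed`.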

Tepan

jr. member

Activity: 37

Merit: 1

Quote from: Tepan on May 25, 2024, 03:05:37 PM

Hi guys! Who can help me develop this in the Rust programming language? (Don't judge my statements and results; I'm new to this puzzle and just trying new stuff.)

I ran this for about 4-5 hours straight. I use a keyspace pool and multiple threads; the search only uses the CPU, so let's find out whether it can be developed to use GPU power.

It's sequential, not random.

The threads divide the key range: the leftmost 1.5 bytes of each key change according to how many threads are used.

Puzzle search
Script started at: 2024-05-25 20:56:03.700485 +07:00
from:0x23000000000000000 to:0x3ffffffffffffffff
target:13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so

0000000000000000000000000000000000000000000000035200000002cc179e| 13zbaq4dADzLenZB6VdSdmAaqV8TABkHEf

The current average speed is 15,000 keys/s per thread.

With some GPU power, I think my code (?) could solve that in a couple of months.
My code still has a bug: the displayed address does not correspond to the private key, although it has the same prefix.
It can all still find the match, but the progress display is not correct.
🙏🏻

Example of a pre-run:

000000000000000000000000000000000000000000000003c600000000002b0f | 13zbpbczKXAcx9q4NvTuEa5szGu3
000000000000000000000000000000000000000000000003520000000000a039 | 13zbNacfa9Nrz8YttQbzLVqT7y4w
000000000000000000000000000000000000000000000003c60000000000d456 | 13zbFwGk1JussoSK3gax39256VBu
000000000000000000000000000000000000000000000002de000000000103a3 | 13zbt3NfsVUjJu8FQWxFfxdePAFP
0000000000000000000000000000000000000000000000026a00000000010bd3 | 13zbix1E22g4FRk5hUcoMV1Hhvpd
000000000000000000000000000000000000000000000002de0000000001333e | 13zbDQnUKS1N8AqmynykC1QYNgwc
000000000000000000000000000000000000000000000002de00000000013e02 | 13zb7xLuqjuPKdbGt7rshDEjdprK
00000000000000000000000000000000000000000000000352000000000168ae | 13zbm8dR61ur1GqnQEmjrRsM5knP
000000000000000000000000000000000000000000000002300000000001923c | 13zbpdWGSwZpHK2zyR2o5XqgfXdv
0000000000000000000000000000000000000000000000026a0000000001c928 | 13zbp7u1bEeK94RCMtsvuNtqBrYo
0000000000000000000000000000000000000000000000038c0000000001e94a | 13zb56DAy494Drdngk2YchYRNfvF
0000000000000000000000000000000000000000000000023000000000023728 | 13zbFaEpeoY7Hrx6sTLdM3ZvvUW5
0000000000000000000000000000000000000000000000031800000000023ade | 13zbtT9na4bJdYoexnZ6YSt6KDHp

My search technique uses this:
2^31 and 2^36
(36 bit)
MacBook-Pro:Desktop tepan$ python3 subs.py
Enter the range x: 31
Enter the range y: 36

> 0x91000000 <

Let's hunt.

let begin: u128 = 0x910000000;
let end: u128 = 0x9ffffffff;

const TARGET: &str = "1Be2UF9NLfyLFbtm3TCbmuocc9N1Kduci1";

Result:

Puzzle search
Script started at: 2024-05-26 02:25:21.816449 +07:00
from:0x910000000 to:0x9ffffffff
target:1Be2UF9NLfyLFbtm3TCbmuocc9N1Kduci1

00000000000000000000000000000000000000000000000000000009f7a02888| 1Be2JoBBasXePHijh1AD1FK5Wsui
00000000000000000000000000000000000000000000000000000009f3702fc4| 1Be2m52dDvKNf2wrvbvSoh4V4t8Q
00000000000000000000000000000000000000000000000000000009e2b0702a| 1Be2DebJWkfJWXb8cszjA8mN6fvh
00000000000000000000000000000000000000000000000000000009e6e07a23| 1Be2wkS3jkyDGWnVpiNXVzUYdTtY
00000000000000000000000000000000000000000000000000000009eb10e589| 1Be2f5NaU8Qv35nPzJk4S6Jntzvs
00000000000000000000000000000000000000000000000000000009fbd0fcef| 1Be2uQTHQC5V9QPTqSkHvGNpFA5H
00000000000000000000000000000000000000000000000000000009e2b117d2| 1Be2nTuUQDYvYgyHPfE6S4hBTVNc
00000000000000000000000000000000000000000000000000000009eb119a5e| 1Be2ncy8U6oS9UzUGQU13CW7prqo
00000000000000000000000000000000000000000000000000000009f371a027| 1Be2tHcpUiDPVcqdvu2v2KmtA1m9
00000000000000000000000000000000000000000000000000000009e6e1b74a| 1Be22Wi2usp5z2J1wRvfovS8pD7R
00000000000000000000000000000000000000000000000000000009fbd1be53| 1Be2eU7ktk7abQsFRHgVP1qmmpZW
00000000000000000000000000000000000000000000000000000009f371d249| 1Be2zLYgYPdSjL9cog8kCsWUVjax
00000000000000000000000000000000000000000000000000000009f7a1e133| 1Be2xv3Q6XtLnBAz4SmbqbaSkhs8
00000000000000000000000000000000000000000000000000000009de820a7c| 1Be2UF9NLfyLFbtm3TCbmuocc9N1
00000000000000000000000000000000000000000000000000000009e2b20b15|

--------------------------------------------------------------------------------
KEY FOUND! 2024-05-26 02:29:44.164813 +07:00
private key (WIF): KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9Mg1Upu7eJAtiDr
private key (hex): 00000000000000000000000000000000000000000000000000000009de820a7c
public key: 02b3e772216695845fa9dda419fb5daca28154d8aa59ea302f05e916635e47b9f6
address: 1Be2UF9NLfyLFbtm3TCbmuocc9N1Kduci1
--------------------------------------------------------------------------------
Search completed

It took a few minutes (02:25:21 to 02:29:44 in the log above) to search by address.

Look at this example of use:

34 2^33...2^34-1 200000000...3ffffffff | x: 29 y: 34 -> > 0x330000000 < [0x330000000:0x3ffffffff] [found on 0x34a65911d]

38 2^37...2^38-1 2000000000...3fffffffff | x: 33 y: 38 -> > 0x2100000000 < [0x2100000000:0x2fffffffff] [found on 0x22382facd0]

... etc.
I made this early-stage code to save time when searching with a low CPU speed. It works up to 79 bits; beyond that, the keyspace output is not correct.
Sorry, I can't provide the real code, because someone could easily brute-force it with a huge GPU mining farm.

Tepan

jr. member

Activity: 37

Merit: 1

Hi guys! Who can help me develop this in the Rust programming language? (Don't judge my statements and results; I'm new to this puzzle and just trying new stuff.)

I ran this for about 4-5 hours straight. I use a keyspace pool and multiple threads; the search only uses the CPU, so let's find out whether it can be developed to use GPU power.

It's sequential, not random.

The threads divide the key range: the leftmost 1.5 bytes of each key change according to how many threads are used.
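The prefixes in the pre-run output (0x230, 0x26a, 0x2de, 0x318, 0x352, 0x38c, 0x3c6) are consistent with splitting the leading 3 hex digits (1.5 bytes) of the range into 8 equal steps of 0x3a, with each thread then counting upward in the low digits. The sketch below reproduces that pattern; the thread count of 8 is inferred from the output rather than stated, and this is not the poster's code, which was not shared.

```python
from __future__ import annotations


def thread_start_keys(start: int, end: int, threads: int,
                      prefix_hex_digits: int = 3) -> list[int]:
    """Give each thread its own block of leading-prefix values; low digits start at zero.

    The prefix is the top `prefix_hex_digits` hex digits of keys as wide as `end`."""
    digits = len(f"{end:x}")                   # 17 hex digits for 0x3ffffffffffffffff
    shift = 4 * (digits - prefix_hex_digits)   # low 56 bits are walked sequentially
    first = start >> shift                     # 0x230
    last = end >> shift                        # 0x3ff
    step = (last + 1 - first) // threads       # 0x3a when threads == 8
    return [(first + t * step) << shift for t in range(threads)]


starts = thread_start_keys(0x23000000000000000, 0x3ffffffffffffffff, 8)
print([hex(s >> 56) for s in starts])
# ['0x230', '0x26a', '0x2a4', '0x2de', '0x318', '0x352', '0x38c', '0x3c6']
```

A thread starting at 0x352 << 56 that has counted 0x2cc179e keys would be at 0x35200000002cc179e, which matches the progress line in the log.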

Puzzle search
Script started at: 2024-05-25 20:56:03.700485 +07:00
from:0x23000000000000000 to:0x3ffffffffffffffff
target:13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so

0000000000000000000000000000000000000000000000035200000002cc179e| 13zbaq4dADzLenZB6VdSdmAaqV8TABkHEf

The current average speed is 15,000 keys/s per thread.

With some GPU power, I think my code (?) could solve that in a couple of months.
My code still has a bug: the displayed address does not correspond to the private key, although it has the same prefix.
It can all still find the match, but the progress display is not correct.
🙏🏻
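For scale, a quick estimate of the time to sweep the whole range 0x23000000000000000 to 0x3ffffffffffffffff at constant speed. The inputs are assumptions: 15,000 keys/s per thread with 8 threads (a count inferred from the prefixes in the output), plus a hypothetical 1 Gkey/s GPU for comparison. The expected time to hit a uniformly placed key is about half of each figure.

```python
START, END = 0x23000000000000000, 0x3ffffffffffffffff
KEYS = END - START + 1                   # 29 * 2**60, about 3.34e19 keys
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_sweep(keys_per_second: float) -> float:
    """Worst-case years to check every key in [START, END] at a constant speed."""
    return KEYS / keys_per_second / SECONDS_PER_YEAR


cpu = years_to_sweep(8 * 15_000)         # assumed: 8 threads at 15,000 keys/s each
gpu = years_to_sweep(1e9)                # hypothetical 1 Gkey/s
print(f"CPU: {cpu:,.0f} years, GPU: {gpu:,.0f} years")
```

Even the hypothetical GPU figure comes out at around a thousand years for a full sweep, and the CPU figure at several million.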

Example of a pre-run:

000000000000000000000000000000000000000000000003c600000000002b0f | 13zbpbczKXAcx9q4NvTuEa5szGu3
000000000000000000000000000000000000000000000003520000000000a039 | 13zbNacfa9Nrz8YttQbzLVqT7y4w
000000000000000000000000000000000000000000000003c60000000000d456 | 13zbFwGk1JussoSK3gax39256VBu
000000000000000000000000000000000000000000000002de000000000103a3 | 13zbt3NfsVUjJu8FQWxFfxdePAFP
0000000000000000000000000000000000000000000000026a00000000010bd3 | 13zbix1E22g4FRk5hUcoMV1Hhvpd
000000000000000000000000000000000000000000000002de0000000001333e | 13zbDQnUKS1N8AqmynykC1QYNgwc
000000000000000000000000000000000000000000000002de00000000013e02 | 13zb7xLuqjuPKdbGt7rshDEjdprK
00000000000000000000000000000000000000000000000352000000000168ae | 13zbm8dR61ur1GqnQEmjrRsM5knP
000000000000000000000000000000000000000000000002300000000001923c | 13zbpdWGSwZpHK2zyR2o5XqgfXdv
0000000000000000000000000000000000000000000000026a0000000001c928 | 13zbp7u1bEeK94RCMtsvuNtqBrYo
0000000000000000000000000000000000000000000000038c0000000001e94a | 13zb56DAy494Drdngk2YchYRNfvF
0000000000000000000000000000000000000000000000023000000000023728 | 13zbFaEpeoY7Hrx6sTLdM3ZvvUW5
0000000000000000000000000000000000000000000000031800000000023ade | 13zbtT9na4bJdYoexnZ6YSt6KDHp

WanderingPhilospher

full member

Activity: 1050

Merit: 219

Shooters Shoot...

Quote from: aminsolhi on May 22, 2024, 06:15:31 AM

Quote from: albert0bsd on May 21, 2024, 06:54:21 PM

Quote from: aminsolhi on May 20, 2024, 07:57:42 PM

Keyhunt is not good.

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look:

Found it in less than a second.

Are you the keyhunt creator?
Nice to meet you.
It was my mistake, I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.

You gave zero clues about how or what you ran, but with a single CPU core, using KeyHunt-Cuda, it is found almost as soon as the program starts:

Code:

KeyHunt-Cuda v1.08

COMP MODE    : COMPRESSED
COIN TYPE    : BITCOIN
SEARCH MODE  : Single Address
DEVICE       : CPU
CPU THREAD   : 1
SSE          : YES
RKEY         : 0 Mkeys
MAX FOUND    : 65536
BTC ADDRESS  : 1E5V4LbVrTbFrfj7VN876DamzkaNiGAvFo
OUTPUT FILE  : Found.txt

Start Time   : Sat May 25 10:11:49 2024
Global start : 20000000000000000 (66 bit)
Global end   : 21000000000000000 (66 bit)
Global range : 1000000000000000 (61 bit)


[00:00:02] [CPU+GPU: 5.28 Mk/s] [GPU: 0.00 Mk/s] [C: 0.000000 %] [R: 0] [T: 10,706,944 (24 bit)] [F: 1]

BYE

So maybe you entered the wrong things/flags when trying to run the program.

viljy

hero member

Activity: 1736

Merit: 857

Quote from: nomachine on May 24, 2024, 12:40:52 AM

I get ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler.

Is this the speed at which public keys are checked in the hash table (using a bloom filter?), or is it the real speed at which the processor generates public keys?
That's quite a high speed, even for BSGS on a video card.

nomachine

member

Activity: 286

Merit: 15

Quote from: holy_ship on May 20, 2024, 03:17:15 PM

I've tried
AMD Ryzen 9 7950X - 4.5 GHz, 16c/32t + 128 GB: 4 Ekeys/s at start, reaching 6 Ekeys/s after a week

I get ~120 Ekeys/s in BSGS/keyhunt on AMD after using the AOCC compiler.

I've been tweaking Linux for months to squeeze every last bit of performance out of a dual-CPU configuration.

#RT kernel

Code:

wget -qO - https://dl.xanmod.org/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/xanmod-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/xanmod-archive-keyring.gpg] http://deb.xanmod.org releases main' | sudo tee /etc/apt/sources.list.d/xanmod-release.list
sudo apt-get -y update && sudo apt install linux-xanmod-rt-x64v3

Code:

sudo apt install -y tuned tuned-utils tuned-utils-systemtap
sudo tuned-adm profile latency-performance

Code:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

It should read "performance" for all cores.

/etc/default/grub

Code:

GRUB_CMDLINE_LINUX_DEFAULT="quiet msr.allow_writes=on nosoftlockup mce=ignore_ce skew_tick=1 clocksource=hpet iommu=soft noresume mitigations=off nmi_watchdog=0"
# then apply with: sudo update-grub

NVMe config (the NVMe needs a good heatsink, otherwise it goes over 50 °C; I have had white smoke from them more than once - SAMSUNG MZVL2512HCJQ):

Code:

sudo nvme smart-log /dev/nvme0 | grep -i '^temperature'

temperature : 42 C
Temperature Sensor 1 : 42 C
Temperature Sensor 2 : 50 C

/etc/fstab

Code:

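# <device>             <mount point>  <type>  <options>                                                    <dump>  <pass>
# (UUID below is a placeholder; use your own root device)
UUID=<your-root-uuid>  /              ext4    noatime,nodiratime,errors=remount-ro,inode_readahead_blks=0  0       1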

AMD EPYC config

Code:

wrmsr -a 0xc0011020 0x4400000000000
wrmsr -a 0xc0011021 0x4000000000040
wrmsr -a 0xc0011022 0x8680000401570000
wrmsr -a 0xc001102b 0x2040cc10

And so on and on... This is only part of it.

nomachine

member

Activity: 286

Merit: 15

Quote from: kTimesG on May 22, 2024, 07:49:17 PM

AVX512 and so on maybe can bring a 50% speed-up, nothing to write home about.

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that on CPUs with hardware support for SHA instructions, those instructions are actually used by the SHA routines available from the system APIs, and we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd

I achieved a 20% performance increase in Keyhunt on the Zen 3 architecture compared with GCC versions 12, 13, and 14.
To compile with Clang, I used the AOCC compiler located at /opt/AMD/aocc-compiler-4.2.0/bin/clang.

However, it was essential to remove all Intel intrinsics (__builtin_ia32_*) from the code, since these intrinsics are specific to Intel processors and incompatible with AMD processors.

In my case, I need to rewrite both the SHA and RIPEMD implementations for Zen3 to achieve a significant performance boost.

Imagine achieving a 70% performance increase!

Additionally, optimizing for Zen4 by leveraging its specific architectural features can lead to even greater efficiency gains.

kTimesG

jr. member

Activity: 40

Merit: 6

Quote from: nomachine on May 22, 2024, 06:08:46 PM

Quote from: kTimesG on May 21, 2024, 08:31:07 AM

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

There is nothing better (faster) and more regularly updated available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.

Code:

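#include <inttypes.h>
#include <stdio.h>
#include <time.h>
#if defined(__APPLE__) && defined(USE_CC_SHA)
#include <CommonCrypto/CommonDigest.h>  /* CC_SHA256() */
#else
#include <openssl/sha.h>                /* SHA256(); link with -lcrypto */
#endif

int main(void) {
   unsigned char compressed_pubkey[33] = {0x02};  /* dummy 33-byte compressed pubkey */
   unsigned char sha256hash[32];
   uint64_t count = 0, speed;
   clock_t start = clock(), ticks;

   while (1) {
#if defined(USE_CUSTOM_SHA256)
      /* custom SHA-256 implementation and its s256ctx context (not shown) */
      sha256_init(&s256ctx);
      sha256_update(&s256ctx, compressed_pubkey, 33);
      sha256_final(&s256ctx, sha256hash);
#elif defined(__APPLE__) && defined(USE_CC_SHA)
      CC_SHA256(compressed_pubkey, 33, sha256hash);
#else
      SHA256(compressed_pubkey, 33, sha256hash);    /* fresh hash per public key */
#endif
      // RIPEMD160(sha256hash, 32, rmd_hash);

      ++count;
      if (count % (1 << 26) == 0) {
         ticks = clock();
         speed = count * CLOCKS_PER_SEC / (ticks - start);
         printf("SHA hashes: %" PRIu64 " speed: %" PRIu64 " hashes/s\n", count, (uint64_t) speed);
      }
   }
}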

SHA hashes: 134217728 speed: 7485947 hashes/s

Code:

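#include <inttypes.h>
#include <stdio.h>
#include <time.h>
#include <openssl/sha.h>  /* SHA256_Init/SHA256_Update: deprecated in OpenSSL 3.0, still available */

int main(void) {
   SHA256_CTX shaCtx;
   unsigned char compressed_pubkey[33] = {0x02};  /* dummy 33-byte input */
   uint64_t count = 0, speed;
   clock_t start = clock(), ticks;

   SHA256_Init(&shaCtx);  /* initialized once: one long message, not many 33-byte hashes */
   while (1) {
      //...
      SHA256_Update(&shaCtx, compressed_pubkey, 33);

      ++count;
      if (count % (1 << 26) == 0) {
         ticks = clock();
         speed = count * CLOCKS_PER_SEC / (ticks - start);
         printf("Hashed bytes: %" PRIu64 " speed: %" PRIu64 " MB/s\n", count * 33, (uint64_t) (speed * 33) >> 20);
      }
   }
}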

Hashed bytes: 22145925120 speed: 1712 MB/s

So 1.7 GB/s with your everyday SHA hasher is not bad; what's bad is that it's doing a single hash of a 22 GB message, not hundreds of millions of hashes of 33 bytes each.
In our case, the hash context needs to be reinitialized for every public key we hash, so AVX512 and the like can maybe bring a 50% speed-up, nothing to write home about.
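A small worked calculation using the two benchmark results above. SHA-256 pads every message with at least 9 bytes (a 0x80 marker and a 64-bit length) and processes it in 64-byte blocks, so each 33-byte public key costs exactly one compression-function call. By that count, the streaming run performs roughly 3.7× more compression calls per second than the per-key run, which suggests that the per-hash setup and finalization, rather than the compression itself, is what the per-key loop pays for. That reading is an inference from the two numbers, not a profile.

```python
import math


def sha256_blocks(message_len: int) -> int:
    """64-byte compression-function calls SHA-256 needs for a message of this length.

    Padding adds at least 9 bytes: one 0x80 byte plus an 8-byte length field."""
    return math.ceil((message_len + 9) / 64)


per_key_blocks = sha256_blocks(33)      # a 33-byte compressed pubkey fits in one block

# Figures printed by the two benchmarks above.
hashes_per_s = 7_485_947                # one full SHA-256 per 33-byte key
stream_bytes_per_s = 1712 * 2**20       # "1712 MB/s" as printed ((speed * 33) >> 20)

per_key_compressions = hashes_per_s * per_key_blocks
stream_compressions = stream_bytes_per_s / 64

print(per_key_blocks)                                        # 1
print(f"{stream_compressions / per_key_compressions:.1f}x")  # 3.7x
```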

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that on CPUs with hardware support for SHA instructions, those instructions are actually used by the SHA routines available from the system APIs, and we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd

aminsolhi

newbie

Activity: 9

Merit: 0

Quote from: albert0bsd on May 22, 2024, 10:38:48 AM

Quote from: aminsolhi on May 22, 2024, 06:15:31 AM

Are you the keyhunt creator?
Nice to meet you.

Hi, yes, I developed the CPU version; all the others are copies. If you have any doubts about it, please use this topic: https://bitcointalksearch.org/topic/keyhunt-development-requests-bug-reports-5322040

Of course it is, and you are the main programmer and developer.
Nice to meet you.
I have also written various programs for this work; soon I will post a complete version on GitHub.

Do you know how to search randomly in KeyHunt-Cuda version 2?

In version 1, -r followed by a key-space value made the search sequential with random jumps,
like:
KeyHunt-Cuda.exe -t 0 -g --gpui 0 --gpux 24,256 -m address --coin BTC --range 20000000000000000:40000000000000000 13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so -r 2000
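A minimal sketch of what "sequential with random" could mean for the v1 command above, assuming `-r` sets the RKEY interval in Mkeys (the v1 log header prints `RKEY : 0 Mkeys` when it is off): the search walks sequentially and, every RKEY × 1,000,000 keys, jumps to a new random starting point in the range. The function below is hypothetical and is not KeyHunt-Cuda code.

```python
from __future__ import annotations

import random


def rkey_segments(start: int, end: int, rkey_mkeys: int, segments: int,
                  rng: random.Random) -> list[tuple[int, int]]:
    """Return `segments` sequential runs of rkey_mkeys * 1e6 keys, each from a random start.

    Runs stay inside [start, end]; they may overlap, since each start is drawn independently."""
    run = rkey_mkeys * 1_000_000
    out = []
    for _ in range(segments):
        s = rng.randint(start, max(start, end - run + 1))   # keep the whole run in range
        out.append((s, min(s + run - 1, end)))
    return out


# The v1 command's range with -r 2000: three random 2,000-Mkey runs.
for lo, hi in rkey_segments(0x20000000000000000, 0x40000000000000000, 2000, 3,
                            random.Random(1)):
    print(hex(lo), hex(hi))
```

Because each jump is independent, runs can overlap and nothing guarantees full coverage; that is the trade-off against a plain sequential search.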

nomachine

member

Activity: 286

Merit: 15

Quote from: kTimesG on May 21, 2024, 08:31:07 AM

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

There is nothing better (faster) and more regularly updated available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.

albert0bsd

hero member

Activity: 861

Merit: 662

Quote from: aminsolhi on May 22, 2024, 06:15:31 AM

Are you the keyhunt creator?
Nice to meet you.

Hi, yes, I developed the CPU version; all the others are copies. If you have any doubts about it, please use this topic: https://bitcointalksearch.org/topic/keyhunt-development-requests-bug-reports-5322040

aminsolhi

newbie

Activity: 9

Merit: 0

Quote from: albert0bsd on May 21, 2024, 06:54:21 PM

Quote from: aminsolhi on May 20, 2024, 07:57:42 PM

Keyhunt is not good.

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look:

https://talkimg.com/images/2024/05/21/1iVIH.png

Found it in less than a second.

Are you the keyhunt creator?
Nice to meet you.
It was my mistake, I apologize, dear friend.
OK, thank you.
I meant https://github.com/WanderingPhilosopher/KeyHuntCudaClient
I use the first version of KeyHunt-Cuda.

maylabel

newbie

Activity: 19

Merit: 0

After talking with a mathematician friend: I was clearly not wrong... That's a relief.

And I could reduce it to 1/10 of the pool, which is great but not enough, because the error bar is still 15 orders of magnitude.

Good, but not enough for me.

Searching a little more, my assumptions (not my techniques) are actually not far-fetched.
Some papers have already done something close, but I need to think more and get a dedicated PC for this to neutralize the noise.

Interesting how this reminds me of my time in the spectroscopy lab:
https://imgur.com/a/NTlgYpp

And that's pretty cool!

albert0bsd

hero member

Activity: 861

Merit: 662

Quote from: aminsolhi on May 20, 2024, 07:57:42 PM

Keyhunt is not good.

You should learn to configure it. The default thread subrange is 32 bits; for small ranges with multiple threads you should lower the N value with "-n number". If the range is less than 1 million keys, you should use -n 0x10000,
where 0x10000 is a 16-bit subrange per thread.

Look:

Found it in less than a second.
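A sketch of why `-n` matters for small ranges, assuming, as described above, that each thread takes the next N-key subrange of the range (default 2^32): when the whole range is smaller than N, there is only one subrange, so only one thread ever gets work. The function is illustrative and is not keyhunt's code.

```python
def busy_threads(range_size: int, n_per_thread: int, threads: int) -> int:
    """How many threads receive at least one subrange of n_per_thread keys."""
    subranges = -(-range_size // n_per_thread)   # ceiling division
    return min(subranges, threads)


print(busy_threads(1_000_000, 0x100000000, 8))  # 1: the default 32-bit subrange swallows the range
print(busy_threads(1_000_000, 0x10000, 8))      # 8: 16 subranges of 16 bits spread over 8 threads
```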

Cryptoman2009

newbie

Activity: 12

Merit: 1

Quote from: aminsolhi on May 20, 2024, 07:57:42 PM

keyhunt not good

I ran a test with keyhunt:

Keyhunt in BSGS mode works well.

kTimesG

jr. member

Activity: 40

Merit: 6

Quote from: nomachine on May 21, 2024, 06:27:16 AM

Quote from: kTimesG on May 21, 2024, 04:28:49 AM

Is it non-pointless even if writing it in assembler?

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck.

That was my point; read the question carefully.

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

But if we only operate on EC, then it's another story: this is where kangaroo / rho / BSGS come into play, and we can start optimizing the operations that are the bottleneck at THAT level. It's one thing to do a few thousand mul/s in Python and another to do batched additions (with a single inversion) on a GPU, reaching giga adds/second. Millions of times faster for identical results. And of course it's still not enough even at that level, so...
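The "batched additions (with a single inversion)" idea relies on the standard trick of inverting many field elements with one modular inversion plus about three multiplications per element (Montgomery's simultaneous inversion). A minimal Python sketch over the secp256k1 field prime; the function name is ours, and real GPU kernels do this in fixed-width limb arithmetic rather than with Python integers.

```python
from __future__ import annotations

P = 2**256 - 2**32 - 977   # secp256k1 field prime


def batch_inverse(xs: list[int]) -> list[int]:
    """Invert every nonzero element of xs mod P using a single modular inversion."""
    prefix = [1] * (len(xs) + 1)
    for i, x in enumerate(xs):                 # prefix[i+1] = x0 * x1 * ... * xi
        prefix[i + 1] = prefix[i] * x % P
    inv = pow(prefix[-1], -1, P)               # the only inversion (Python 3.8+)
    out = [0] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        out[i] = inv * prefix[i] % P           # 1/xi = 1/(x0..xi) * (x0..x(i-1))
        inv = inv * xs[i] % P                  # inv becomes 1/(x0..x(i-1))
    return out
```

With n elements this costs one inversion and about 3(n − 1) multiplications, so the inversion's cost is spread across the whole batch, which is why GPU implementations process points in large groups.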

Something to think about on 256-bit modular math:
- a single inversion is ~60 times slower than a multiplication (GCD algo, not a simple operation at all)
- a multiplication is ~2 times slower than a squaring
- a squaring is ~3 times slower than an addition, ~2 times slower than a normalization

A simple point addition requires one inversion, 2 multiplications, a squaring, 2 normalizations, and 6 additions.
A simple point multiplication requires a bunch of point additions (in truth, it's somewhat faster bcz of Jacobian shortcut, but still a lot of multiplications).
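The operation count for a simple (affine) point addition can be read directly off the textbook formulas. A minimal sketch on secp256k1 in Python, with comments tallying the operations; the constants are the standard secp256k1 field prime and generator G. Doubling uses a different slope formula and is included only so the example can produce a second point to add.

```python
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def point_add(p1, p2):
    """Affine addition of two distinct, non-opposite points on y^2 = x^3 + 7."""
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P   # 2 subs, 1 inversion, 1 mul
    x3 = (lam * lam - x1 - x2) % P                    # 1 squaring, 2 subs
    y3 = (lam * (x1 - x3) - y1) % P                   # 1 mul, 2 subs
    return x3, y3


def point_double(p1):
    """Affine doubling (tangent slope 3x^2 / 2y); used here only to get a second point."""
    x1, y1 = p1
    lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    x3 = (lam * lam - 2 * x1) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return x3, y3


def on_curve(pt) -> bool:
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


G = (GX, GY)
G3 = point_add(G, point_double(G))   # 3G
print(on_curve(G3))
```

Tallying `point_add` gives one inversion, two multiplications, one squaring and six additions/subtractions, with the inversion dominating by a wide margin under the cost ratios listed above.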

So stuff like point-scalar multiplication is a bad joke compared to the problem of optimizing the primitives.

nomachine

member

Activity: 286

Merit: 15

Quote from: kTimesG on May 21, 2024, 04:28:49 AM

Is it non-pointless even if writing it in assembler?

The necessary double SHA-256 hashing in Bitcoin's address generation process is substantially slower than EC operations, ensuring that even perfect optimization of point additions will not eliminate the hashing bottleneck.
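For reference on where SHA-256 sits in P2PKH address generation: the compressed public key is hashed once with SHA-256 and then with RIPEMD-160 (HASH160), and double SHA-256 appears in the 4-byte Base58Check checksum of the final address string. A search tool can decode the target address once and compare HASH160 values directly, so Base58 and the checksum need not be computed per candidate key. The sketch below shows only the Base58Check step with the standard `hashlib` module; it leaves out RIPEMD-160, because `hashlib.new("ripemd160")` may be unavailable depending on the OpenSSL build.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(data: bytes) -> bytes:
    """First 4 bytes of double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def b58decode_check(addr: str) -> bytes:
    """Decode Base58Check and verify the checksum; returns version byte + payload."""
    n = 0
    for ch in addr:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(addr) - len(addr.lstrip("1"))          # each leading '1' is a 0x00 byte
    raw = b"\x00" * pad + (n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b"")
    data, checksum = raw[:-4], raw[-4:]
    if _checksum(data) != checksum:
        raise ValueError("bad checksum")
    return data


def b58encode_check(data: bytes) -> str:
    """Append the double-SHA-256 checksum and Base58-encode."""
    raw = data + _checksum(data)
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = ALPHABET[r] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out


target_h160 = b58decode_check("13zb1hQbWVsc2S7ZTZnP2G4undNNpdh5so")[1:]  # decoded once
print(target_h160.hex())
```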

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck.

aminsolhi

newbie

Activity: 9

Merit: 0

Quote from: nomachine on May 21, 2024, 02:12:51 AM

Quote from: aminsolhi on May 20, 2024, 07:57:42 PM

I have a simple CPU algorithm in Python; you can test it:

Code:

import bitcoin  # pybitcointools (provides pubtoaddr)
import ecdsa

def private_key_to_public_key(private_key):
    # private_key: 64-character hex string (32 bytes)
    sk = ecdsa.SigningKey.from_string(bytes.fromhex(private_key), curve=ecdsa.SECP256k1)
    vk = sk.get_verifying_key()
    compressed_public_key = vk.to_string("compressed").hex()
    return compressed_public_key

# ... (rest of the script not included in the quote)
bitcoin_address = bitcoin.pubtoaddr(public_key)

Why use Bitcoin and ECDSA imports? They're so slow, it feels like a waste of time.

Instead, utilize ICE (import secp256k1 as ice) for this function and the Bitcoin address line:

def private_key_to_public_key(private_key):
    priv_int = int(private_key, 16)
    return ice.scalar_multiplication(priv_int)

and

bitcoin_address = ice.pubkey_to_address(0, True, public_key)

It's approximately 10 times faster than ECDSA. But even that is miserable if you attack the dinosaur numbers.

The more you delve into Python, the more apparent it becomes that searching for Puzzle 66 through it is pointless.

Perhaps someone knowingly obscures things by selling Python scripts as the ultimate solution.

In fact, what I meant by that comment was the problem with Key Hunt; I have written different versions of these programs. I posted this code so that you can test it and compare the speed of a simple CPU program with that of a powerful GPU program that does not work properly.

In general, I am very happy that you paid attention to this code and took the time, and I thank you. The ice idea was also very good.
