Topic: Bitcoin puzzle transaction ~32 BTC prize to who solves it - page 100. (Read 248927 times)

jr. member
Activity: 130
Merit: 2
“Divine light, surround me. Shield me from harm”
I want to play, I just can't figure out a good way to hook up. Anyone have any suggestions? Pools to join?

Thanks in advance
member
Activity: 165
Merit: 26
I never got further than 2**50 steps. Somewhere around 2**45 it starts to slow down.

Each depth takes more than twice as long as the previous depth, duh.

Try to make SECPK1 6x (six times) faster first.

I think it's more like "stop using a secp256k1 batch group addition that runs 6x slower than what the hardware can accomplish".

The fastest secp256k1 implementation I have ever seen and used is already inside the Kangaroo tool.

https://github.com/JeanLucPons/Kangaroo/tree/master/SECPK1

I would like to see some new ideas here. Wink

Challenge accepted. Let's do this, shall we?

1. CPU: speed is faster when using 5 x 52-bit limbs instead of 4 x 64-bit limbs because instructions can now run in parallel (SIMD) instead of limbs being dependent on each other's results (due to carry flag propagation, which is a performance killer on any CPU).
    (this alone gets you a 50% speedup at least)
    NOW... how many tools that have blatantly copied each other's "fast" code assumed "this is the fastest, I won't bother"?
 
2. There's a trick to also compute the batched inverse faster (same idea, get rid of partial products inter-dependency).
    (speedup is around 5%)

There's some secp256k1 ASM code out there that uses 4x64-bit limbs, which is slower than using no ASM and 52-bit limbs, and letting the compiler do the SIMD instruction packaging.
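To make the limb argument concrete, here is a minimal Python sketch of the 5 x 52-bit representation. It only models the idea: Python integers are arbitrary precision, so nothing here measures speed, and the helper names (to_limbs, add_lazy, normalize) are made up for illustration. What it does show is that with 12 spare bits per 64-bit word, limb-wise additions are independent of each other and the carries can be settled later in a single pass.

```python
MASK52 = (1 << 52) - 1  # each limb keeps 52 payload bits inside a 64-bit word

def to_limbs(x):
    """Split a 256-bit integer into five 52-bit limbs (the top limb holds 48 bits)."""
    return [(x >> (52 * i)) & MASK52 for i in range(5)]

def from_limbs(limbs):
    """Recombine limbs; also valid for limbs that have not been normalized."""
    return sum(l << (52 * i) for i, l in enumerate(limbs))

def add_lazy(a, b):
    """Limb-wise addition with no carry chain: no limb waits on its neighbour."""
    return [x + y for x, y in zip(a, b)]

def normalize(limbs):
    """Single carry pass that brings the lower four limbs back under 2**52.

    No modular reduction is done here; the top limb simply absorbs the carry.
    """
    out, carry = [], 0
    for l in limbs[:-1]:
        l += carry
        out.append(l & MASK52)
        carry = l >> 52
    out.append(limbs[-1] + carry)
    return out

if __name__ == "__main__":
    a = (1 << 256) - 12345
    b = (1 << 255) + 987654321
    s = add_lazy(to_limbs(a), to_limbs(b))
    print(from_limbs(s) == a + b, max(s) < (1 << 64))
```

With 4 x 64-bit limbs there is no headroom, so every limb addition has to wait for the carry out of the limb below it.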

Refs for the above: Bernstein (some 2009 PDF), also "Modern Computer Arithmetic".
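For context on point 2, this is the textbook batched inverse (Montgomery's trick) that the faster variant would start from. It is a baseline sketch in Python, not the optimized version hinted at above, and the function name batch_inverse is just a label chosen here.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime

def batch_inverse(values, p=P):
    """Invert every nonzero value mod p using one modular inversion in total.

    Cost: one inversion plus about three multiplications per element.
    """
    if not values:
        return []
    prefix = []          # prefix[i] = values[0] * ... * values[i] mod p
    acc = 1
    for v in values:
        acc = acc * v % p
        prefix.append(acc)
    inv = pow(acc, -1, p)  # the only inversion (needs Python 3.8+)
    out = [0] * len(values)
    for i in range(len(values) - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % p   # inverse of values[i]
        inv = inv * values[i] % p          # now the inverse of prefix[i - 1]
    out[0] = inv
    return out

if __name__ == "__main__":
    vals = [2, 3, 5, 7, 12345678901234567890]
    print(all(v * i % P == 1 for v, i in zip(vals, batch_inverse(vals))))
```

The backward loop is a chain where each step needs the previous result, which is the kind of inter-dependency point 2 talks about removing.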

3. GPU: this is way more problematic. Let's start with some facts:

- GPU global & local (stack) memory is very, very slow to access; the fastest code will do very few memory loads and stores.
- all threads compute the exact same instruction at every cycle (conditional branches are masked no-ops)
- all instructions are actually executed by 32-bit arithmetic units (64-bit are emulated)
- you have some small amount (48 kB) of fast on-chip shared memory accessible to all threads in a block / SM.
- you have some very small amount (1 KB) of extremely fast (fastest possible) registers on each thread

Now, if we do the math based on the GPU specs (clock frequency, how many cycles a fused multiply-add instruction takes, etc.), we arrive at some very impressive numbers, for example for how many secp256k1 field multiplications/s (modular reduction and all included) the GPU is capable of. These values tell us: "hey, why the hell can we do billions of modular mul/s, but when we run JLP's kangaroo the numbers are 10% of the theoretical peak?"
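The back-of-the-envelope math can be sketched like this. Every input is a placeholder to be filled in from the datasheet of the GPU in question, and the function name is made up here. The only fixed fact used is that a schoolbook product of two 256-bit numbers held as 8 x 32-bit limbs takes 8 * 8 = 64 multiply-adds; the cost of the modular reduction is left as a parameter.

```python
SCHOOLBOOK_MADDS = 8 * 8  # 256-bit x 256-bit product with 32-bit limbs

def peak_field_muls_per_sec(lanes, clock_hz, madds_per_field_mul,
                            madds_per_lane_per_cycle=1.0):
    """Upper bound on field multiplications/s if nothing but arithmetic ran.

    lanes                     number of 32-bit arithmetic lanes (hypothetical input)
    clock_hz                  core clock in Hz (hypothetical input)
    madds_per_field_mul       multiply-adds per modular multiplication,
                              i.e. SCHOOLBOOK_MADDS plus the reduction cost
    madds_per_lane_per_cycle  issue rate of the multiply-add instruction
    """
    return lanes * clock_hz * madds_per_lane_per_cycle / madds_per_field_mul

if __name__ == "__main__":
    # Illustrative numbers only, not the specs of any real card.
    print(peak_field_muls_per_sec(lanes=2048, clock_hz=1.5e9,
                                  madds_per_field_mul=SCHOOLBOOK_MADDS + 16))
```

Memory traffic, register spills and divergence all eat into this bound, which is where the list below comes in.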

Now... looking at JLP's Kangaroo which you call "the fastest", what do we see?

- kangaroos get copied between global memory and local memory (stack), but if the stack gets too large (which it does, because it wants a large "gpu group size" to theoretically speed up the inverse calculation), this just results in copying from global memory to global memory, doubling both the loading/storing AND the required memory size, and slowing the runtime overall.
   Someone might say: this is done for coalescing reasons, etc. - why the hell is this the GPU kernel's concern, not the host's? Smiley
- on-chip shared memory is completely ignored and used as L1 cache (but what is it caching? one-time memory reads? oh no...);
- are all the PTX-based routines taking notice that the GPU has a 32-bit multiply-and-add HW instruction? answer: no, everything is a 64-bit pointer...;
- kernel logic for batched addition is convoluted and modularized to the point that it makes compiler optimizations close to impossible, increasing the register pressure.
- at the highest possible level, a lot of things can be thought out better strategically. Things like "X items were lost because you have no idea what arguments to use, bro" should not exist. The GPU should act as a continuous DP generator, with no lunch breaks between the kernel launches. Better separate the concerns of producing DPs and consuming DPs.

These are just some of the ideas. There are more which I won't mention, since they're much too technical.
jr. member
Activity: 42
Merit: 0
130-bit is solvable in time complexity of 2**64.5 steps

I could brag that I managed to write from scratch a completely working Kangaroo that currently works 6x (six times) faster

I never got further than 2**50 steps. Somewhere around 2**45 it starts to slow down.

Try to make SECPK1 6x (six times) faster first.

The fastest secp256k1 implementation I have ever seen and used is already inside the Kangaroo tool.

https://github.com/JeanLucPons/Kangaroo/tree/master/SECPK1

I would like to see some new ideas here. Wink
newbie
Activity: 38
Merit: 0
Hypothetically, let's say someone were to crack puzzle #66 tomorrow. Are there any practical ways they could sweep the wallet, considering how many bots are monitoring the network ready to crack the weak public key as soon as it gets revealed?

It just feels to me like people are wasting their time with 66, as any attempt to take the prize will end in shambles. Would the only feasible way to spend the winnings be to mine a block yourself without broadcasting the transaction? Obviously not very practical for the average joe.

Interested in hearing your thoughts.

pseudospace.
I think the only way you can truly know is to test it yourself with a bot and a 66 bit private key address
newbie
Activity: 10
Merit: 1
Hypothetically, let's say someone were to crack puzzle #66 tomorrow. Are there any practical ways they could sweep the wallet, considering how many bots are monitoring the network ready to crack the weak public key as soon as it gets revealed?

It just feels to me like people are wasting their time with 66, as any attempt to take the prize will end in shambles. Would the only feasible way to spend the winnings be to mine a block yourself without broadcasting the transaction? Obviously not very practical for the average joe.

Interested in hearing your thoughts.

pseudospace.
member
Activity: 165
Merit: 26
Solving 130-bit is equivalent to solving 4 to 8 billion 66-bit puzzles

You're unprofessional if you compare 66 and 130, because BSGS is much faster than bruteforce. But I agree that p66 is shorter on same price HW.

WTF are you talking about? Who said anything about bruteforcing anything?
130-bit is solvable in time complexity of 2**64.5 steps
66-bit is solvable in time complexity 2**32.5 steps

The ratio is around 4 billion. BSGS or kangaroo, is irrelevant. No brute force. Brute force ratio between the two would be 2**64. What did I get wrong?
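The arithmetic behind those figures, as a small Python check. It assumes puzzle #n has its key in an interval of size 2**(n-1), which is what matches the 2**64.5 and 2**32.5 step counts quoted above; the function name is made up for this sketch.

```python
def sqrt_steps_log2(puzzle_bits):
    """log2 of sqrt(interval size), assuming the interval holds 2**(bits - 1) keys."""
    return (puzzle_bits - 1) / 2

steps_130 = sqrt_steps_log2(130)           # exponent for the 130-bit puzzle
steps_66 = sqrt_steps_log2(66)             # exponent for the 66-bit puzzle
sqrt_ratio = 2 ** (steps_130 - steps_66)   # ratio for square-root algorithms
brute_ratio = 2 ** (130 - 66)              # ratio for plain brute force

if __name__ == "__main__":
    print(steps_130, steps_66, sqrt_ratio, brute_ratio)
```

The square-root ratio comes out at 2**32, a bit over 4 billion, whichever square-root method is used.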


I'm asking for a proper solution from a community developer or a GitHub that everyone knows and can vouch for, like wandering/alberto/digaran etc....
It's obviously just the 137th clone of JLP Kangaroo, same skeleton, same architecture, same issues. If you want a "proper" solution don't expect it to land freely on GitHub. Knowledge and experience come with a price tag. I could brag that I managed to write from scratch a completely working Kangaroo that currently works 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because for one, I don't need to prove anything to anyone except myself, and secondly, it's really ok if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats on JLP's program are biased, the real speed is slower than the one printed on the screen (there's some nonsense speed-smoothing computation in there, and the computed time durations are bugged between worker threads and main thread). Real speed is 10% slower due to these bugs. Whatever.
jr. member
Activity: 115
Merit: 1
Solving 130-bit is equivalent to solving 4 to 8 billion 66-bit puzzles

You're unprofessional if you compare 66 and 130, because BSGS is much faster than bruteforce. But I agree that p66 is shorter on same price HW.

p.s. Hey, guys. Another discussion of stealing bot.
What if you (finder of p66)
make 2 transactions by yourself

1st - transfer BTC to your wallet with reasonable fee and RBF False.

and after several seconds you double-spend everything with fee == 6.6 BTC and RBF True

so, either you win or nobody does, right?
newbie
Activity: 13
Merit: 0
My question is what are you all scanning with for the 130 puzzle, even when JeanLucPons himself clearly states, twice, that you cannot find 130 with his Kangaroo:

Quote
This program is limited to a 125bit interval search
Quote
(Not possible with this program without modification)



There is Kangaroo-256-bit on github.  Fastest thing for CPU I've tried so far.
Puzzle 65 for ~50 seconds on AMD Ryzen 9 7950X

I'm not running or compiling a GitHub repo that came out of nowhere only 3 days ago, at the same time you registered here and pasted the link several times.
GitHub is just a repository; that does not mean there isn't a single line hidden somewhere in thousands of lines of code and folders which sends you the key once solved.


I'm asking for a proper solution from a community developer or a GitHub that everyone knows and can vouch for, like wandering/alberto/digaran etc....
member
Activity: 122
Merit: 11
My question is what are you all scanning with for the 130 puzzle, even when JeanLucPons himself clearly states, twice, that you cannot find 130 with his Kangaroo:

Quote
This program is limited to a 125bit interval search
Quote
(Not possible with this program without modification)



There is Kangaroo-256-bit on github.  Fastest thing for CPU I've tried so far.
Puzzle 65 for ~50 seconds on AMD Ryzen 9 7950X

Where is the Windows version ? not found

It's source code. You must compile it yourself. Anyway, on a 4-core CPU the average time to solve puzzle 130 with this program is about 90000 years, so ...
jr. member
Activity: 65
Merit: 1
34Sf4DnMt3z6XKKoWmZRw2nGyfGkDgNJZZ
My question is what are you all scanning with for the 130 puzzle, even when JeanLucPons himself clearly states, twice, that you cannot find 130 with his Kangaroo:

Quote
This program is limited to a 125bit interval search
Quote
(Not possible with this program without modification)



There is Kangaroo-256-bit on github.  Fastest thing for CPU I've tried so far.
Puzzle 65 for ~50 seconds on AMD Ryzen 9 7950X

Where is the Windows version ? not found
jr. member
Activity: 42
Merit: 0
My question is what are you all scanning with for the 130 puzzle, even when JeanLucPons himself clearly states, twice, that you cannot find 130 with his Kangaroo:

Quote
This program is limited to a 125bit interval search
Quote
(Not possible with this program without modification)



There is Kangaroo-256-bit on github.  Fastest thing for CPU I've tried so far.
Puzzle 65 for ~50 seconds on AMD Ryzen 9 7950X
newbie
Activity: 13
Merit: 0
My question is what are you all scanning with for the 130 puzzle, even when JeanLucPons himself clearly states, twice, that you cannot find 130 with his Kangaroo:

Quote
This program is limited to a 125bit interval search
Quote
(Not possible with this program without modification)

jr. member
Activity: 65
Merit: 1
34Sf4DnMt3z6XKKoWmZRw2nGyfGkDgNJZZ

What is this, that works with gpu?

This is kangaroo, you can find it in this repo:

https://github.com/JeanLucPons/Kangaroo

As you see, it works well with GPU.

So you couldn't use it to get 130?
How long will it take?

Avg 463.6y with single RTX 3090 GPU (Kangaroo-256-bit).
I need approximately 464 GPUs to solve the problem in 1 year.
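The GPU count follows directly from the single-card estimate, assuming the work splits evenly across identical cards with no coordination overhead (an idealization). A one-function Python sketch, with a made-up function name:

```python
import math

def gpus_for_deadline(single_gpu_years, deadline_years):
    """Identical GPUs needed to bring the average solve time under the deadline.

    Assumes perfectly linear scaling, which real setups only approximate.
    """
    return math.ceil(single_gpu_years / deadline_years)

if __name__ == "__main__":
    print(gpus_for_deadline(463.6, 1))
```

Note that 463.6 years is an average, so even with that many cards the one-year mark is an expectation and not a guarantee.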

Shall I ask something? You said that if we run 1 3090 graphics card with the kangaroo program, it will be found in 463 years. Well, isn't it possible to find this earlier? What if we have a lot of luck?  For example, can't we run the kangaroo program and find it within 1 hour?



As a mathematician, not a programmer, I am deeply familiar with the challenges posed by very large numbers. The algorithm we are discussing is probabilistic, meaning its runtime can vary depending on chance, though on average, it follows a predictable pattern. The difficulty of this problem is directly related to the size of the numbers involved.

In this algorithm, "kangaroos" (which represent random walks) attempt to land on the target (the solution) within this galactically vast numerical space. The likelihood of finding the solution within an hour, or any short period, is exceedingly low, though not entirely zero.

Given the complexity and the size of the numbers, it is more plausible that we will need to invent a new algorithm or a combination of different algorithms to solve such a problem, rather than relying on luck to solve Puzzle 130.


You approached the issue very professionally; you are absolutely right.
But the people looking for the puzzle here are all poor. They run the kangaroo program with 1 graphics card and wait for luck to come their way. But we should not forget this: LUCK BEATS MATHEMATICS.
Although it is very unlikely, puzzle 130 can be found with a single 3090 graphics card.
jr. member
Activity: 42
Merit: 0

What is this, that works with gpu?

This is kangaroo, you can find it in this repo:

https://github.com/JeanLucPons/Kangaroo

As you see, it works well with GPU.

So you couldn't use it to get 130?
How long will it take?

Avg 463.6y with single RTX 3090 GPU (Kangaroo-256-bit).
I need approximately 464 GPUs to solve the problem in 1 year.

Shall I ask something? You said that if we run 1 3090 graphics card with the kangaroo program, it will be found in 463 years. Well, isn't it possible to find this earlier? What if we have a lot of luck?  For example, can't we run the kangaroo program and find it within 1 hour?



As a mathematician, not a programmer, I am deeply familiar with the challenges posed by very large numbers. The algorithm we are discussing is probabilistic, meaning its runtime can vary depending on chance, though on average, it follows a predictable pattern. The difficulty of this problem is directly related to the size of the numbers involved.

In this algorithm, "kangaroos" (which represent random walks) attempt to land on the target (the solution) within this galactically vast numerical space. The likelihood of finding the solution within an hour, or any short period, is exceedingly low, though not entirely zero.

Given the complexity and the size of the numbers, it is more plausible that we will need to invent a new algorithm or a combination of different algorithms to solve such a problem, rather than relying on luck to solve Puzzle 130.
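To illustrate how the step count depends on chance, here is a toy kangaroo (Pollard's lambda) in Python. It is only a sketch under loud assumptions: it runs in the multiplicative group modulo the Mersenne prime 2**61 - 1 with base 3, not on secp256k1, it uses a single tame and a single wild walk with a trap instead of distinguished points, and the retry "salt" is a device invented for this example.

```python
P = 2**61 - 1   # toy modulus (a Mersenne prime), NOT the secp256k1 group
G = 3           # toy base, picked arbitrarily for this sketch

def kangaroo(h, a, b, g=G, p=P, max_tries=64):
    """Look for x in [a, b] with pow(g, x, p) == h.

    Returns (x, steps_taken), or None if every attempt misses.
    """
    width = b - a
    k = width.bit_length() // 2 + 2
    jumps = [1 << i for i in range(k)]           # jump distances 1, 2, 4, ...
    jump_pts = [pow(g, j, p) for j in jumps]     # g**jump, precomputed
    n_tame = 4 * (sum(jumps) // k)               # tame walk length
    steps = 0
    for salt in range(max_tries):                # a new salt gives a new walk
        # Tame kangaroo: starts at the known exponent b and sets a trap.
        y, d_tame = pow(g, b, p), 0
        for _ in range(n_tame):
            i = (y + salt) % k
            y, d_tame = y * jump_pts[i] % p, d_tame + jumps[i]
        trap = y
        steps += n_tame
        # Wild kangaroo: starts at h, whose exponent is the unknown x.
        y, d_wild = h, 0
        while d_wild <= width + d_tame:
            if y == trap:
                x = b + d_tame - d_wild
                if a <= x <= b and pow(g, x, p) == h:
                    return x, steps
                break
            i = (y + salt) % k
            y, d_wild = y * jump_pts[i] % p, d_wild + jumps[i]
            steps += 1
    return None

if __name__ == "__main__":
    a, b = 1 << 30, (1 << 30) + (1 << 20)
    for secret in (a + 1, a + 424242, b - 7):
        print(secret, kangaroo(pow(G, secret, P), a, b))
```

Running it for different secrets in the same 2**20-wide interval gives different step counts, and an occasional miss forces a retry. That spread around the average is the "luck" part; it scales with the square root of the interval, so it does not turn centuries into an hour.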
jr. member
Activity: 43
Merit: 1

What is this, that works with gpu?

This is kangaroo, you can find it in this repo:

https://github.com/JeanLucPons/Kangaroo

As you see, it works well with GPU.

So you couldn't use it to get 130?
How long will it take?

Avg 463.6y with single RTX 3090 GPU (Kangaroo-256-bit).
I need approximately 464 GPUs to solve the problem in 1 year.

Shall I ask something? You said that if we run 1 3090 graphics card with the kangaroo program, it will be found in 463 years. Well, isn't it possible to find this earlier? What if we have a lot of luck?  For example, can't we run the kangaroo program and find it within 1 hour?

it can be done in 1 second
jr. member
Activity: 65
Merit: 1
34Sf4DnMt3z6XKKoWmZRw2nGyfGkDgNJZZ

What is this, that works with gpu?

This is kangaroo, you can find it in this repo:

https://github.com/JeanLucPons/Kangaroo

As you see, it works well with GPU.

So you couldn't use it to get 130?
How long will it take?

Avg 463.6y with single RTX 3090 GPU (Kangaroo-256-bit).
I need approximately 464 GPUs to solve the problem in 1 year.

Shall I ask something? You said that if we run 1 3090 graphics card with the kangaroo program, it will be found in 463 years. Well, isn't it possible to find this earlier? What if we have a lot of luck?  For example, can't we run the kangaroo program and find it within 1 hour?
member
Activity: 122
Merit: 11
For 130 I have no idea what to do. I'm not a developer. But it is obvious that we need an acceleration of at least 100 times in the software itself. I don't know if that's possible. But you should think.
Solving 130-bit is equivalent to solving 4 to 8 billion 66-bit puzzles, so anyone can do their own math - do you have 8 billion seconds to spare? You can't "accelerate" something that runs at full speed and is already accelerated to the max. At most you can make things run 2x, 3x, 10x faster on the same hardware, using all possible optimization tricks. But there is still the limitation of the best known algorithm itself: software is just an implementation of theory, and you can never have software or hardware that solves a problem faster than the underlying algorithm. There is no known "acceleration" of the theory, otherwise the problem would have been solved long ago.

^^^^^^ Wise words ^^^^^^
sr. member
Activity: 653
Merit: 316
snip
Etar-Kangaroo : no result
snip
Because the minimal range is 2^32. Try something like:
-rb 2F633CBE3EC02B9401000000007000000 -re 2F633CBE3EC02B9408000000009000000
and there will be a result.
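A quick way to check a range before launching, as a Python sketch. The 2^32 minimum is taken from the reply above; the helper names are made up here and are not part of Etar-Kangaroo.

```python
MIN_RANGE_BITS = 32  # minimum search width quoted in the reply above

def range_width(rb_hex, re_hex):
    """Width of the search interval given the -rb and -re hex strings."""
    return int(re_hex, 16) - int(rb_hex, 16)

def wide_enough(rb_hex, re_hex, min_bits=MIN_RANGE_BITS):
    """True if the interval spans at least 2**min_bits keys."""
    return range_width(rb_hex, re_hex) >= (1 << min_bits)

if __name__ == "__main__":
    rb = "2F633CBE3EC02B9401000000007000000"
    re_ = "2F633CBE3EC02B9408000000009000000"
    print(wide_enough(rb, re_), range_width(rb, re_).bit_length())
```

The suggested range is far wider than the minimum, so the width check passes with room to spare.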
member
Activity: 165
Merit: 26
For 130 I have no idea what to do. I'm not a developer. But it is obvious that we need an acceleration of at least 100 times in the software itself. I don't know if that's possible. But you should think.
Solving 130-bit is equivalent to solving 4 to 8 billion 66-bit puzzles, so anyone can do their own math - do you have 8 billion seconds to spare? You can't "accelerate" something that runs at full speed and is already accelerated to the max. At most you can make things run 2x, 3x, 10x faster on the same hardware, using all possible optimization tricks. But there is still the limitation of the best known algorithm itself: software is just an implementation of theory, and you can never have software or hardware that solves a problem faster than the underlying algorithm. There is no known "acceleration" of the theory, otherwise the problem would have been solved long ago.
jr. member
Activity: 42
Merit: 0

I understand about the probability, but you all talk about that method as if it surely works: "in X seconds, the 66 will be cracked", "in Y seconds, the 67 will be cracked". I also thought that the method, BSGS, and this software were publicly available on GitHub; after all, the method is known.

But thank you.


On a better CPU, 66 can be solved in X seconds with Kangaroo. There's no need to buy a fancy GPU for the bot.

Yes, but I'm talking about 130 and higher... but OK.
I get it.

For 130 I have no idea what to do. I'm not a developer. But it is obvious that we need an acceleration of at least 100 times in the software itself. I don't know if that's possible. But you should think.