Topic: BitCrack - A tool for brute-forcing private keys - page 32. (Read 75623 times)

newbie
Activity: 26
Merit: 2
a.a
 Error: CL_INVALID_COMMAND_QUEUE: the specified command-queue is not a valid command-queue
 Error: CL_MEM_OBJECT_ALLOCATION_FAILURE: Failed to allocate memory for buffer object
jr. member
Activity: 77
Merit: 7
I am getting about 1100 MKey/s on my RTX 3070. I was wondering what settings other people use, as I recently saw a screenshot of someone running a 3060 Ti with very similar numbers, although the 3070 has more CUDA cores (3070: 5888 CUDA cores, 3060 Ti: 4864 CUDA cores).

Currently using: -b 128 -t 512 -p 756

So it is most likely a problem with my settings. What are other people with RTX 3070s using?
a.a
member
Activity: 126
Merit: 36
Could you please be a little more specific?

What errors do you get?
newbie
Activity: 26
Merit: 2
a.a, your release is not working
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Does somebody have an idea why there is a keyFinderKernelWithDouble in addition to keyFinderKernel? What is the point of this additional function?

After further inspection I found that the call chain trickles down like this:

keyFinderKernelWithDouble
|
v
doIterationWithDouble        -> CompleteBatchAddWithDouble
|
v
BeginBatchAddWithDouble

The same call chain appears for keyFinderKernel but without the "WithDouble" suffix.

keyFinderKernel and doIteration don't do anything different apart from calling these differently named functions; it's in Begin/CompleteBatchAdd where the interesting stuff happens.

Code:

__device__ __forceinline__ static void beginBatchAdd(const unsigned int *px, const unsigned int *x, unsigned int *chain, int i, int batchIdx, unsigned int inverse[8])
{
    // x = Gx - x
    unsigned int t[8];
    subModP(px, x, t);

    ...
}


__device__ __forceinline__ static void beginBatchAddWithDouble(const unsigned int *px, const unsigned int *py, unsigned int *xPtr, unsigned int *chain, int i, int batchIdx, unsigned int inverse[8])
{
    unsigned int x[8];
    readInt(xPtr, i, x);

    if(equal(px, x)) {
        addModP(py, py, x);
    } else {
        // x = Gx - x
        subModP(px, x, x);
    }
    ...
}

Notice the extra argument "py" in the WithDouble function (it appears to be the generator point's y-coordinate, according to the comments), which the non-double function doesn't have. When px equals x, the subtraction px - x (Gx - x) would come out as 0, so the WithDouble version stores 2*py there instead. The non-double function has no such protective measure.
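
The "chain" and "inverse" parameters in both signatures hint at why the add is split into a "begin" and a "complete" phase. A plausible reading, and it is only an assumption based on the names, is Montgomery's batch inversion trick: multiply all the denominators together, invert once, then peel the individual inverses back off the running product. A minimal Python sketch of that trick (batch_inverse is a name made up for illustration, not BitCrack's code):

```python
# secp256k1 field prime
P = 2**256 - 2**32 - 977

def batch_inverse(values, p=P):
    """Invert every element of `values` mod p with a single modular inversion.

    Hypothetical helper for illustration; assumes every value is non-zero mod p.
    """
    # Forward pass ("begin"): chain[i] = values[0] * ... * values[i]
    chain = []
    running = 1
    for v in values:
        running = (running * v) % p
        chain.append(running)

    # One expensive inversion of the total product (Fermat's little theorem)
    inv = pow(running, p - 2, p)

    # Backward pass ("complete"): peel off one inverse per element
    result = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        prefix = chain[i - 1] if i > 0 else 1
        result[i] = (inv * prefix) % p   # equals 1 / values[i]
        inv = (inv * values[i]) % p      # now the inverse of values[0..i-1]
    return result
```

If this reading is right, it would also explain the special case: a single zero denominator would zero the whole product and spoil every inverse in the batch.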

In CompleteBatchAddWithDouble the only different snippet is this:

Code:

if(equal(px, x)) {
    // currently s = 1 / 2y

    unsigned int x2[8];
    unsigned int tx2[8];

    // 3x^2
    mulModP(x, x, x2);
    addModP(x2, x2, tx2);
    addModP(x2, tx2, tx2);

    // s = 3x^2 * 1/2y
    mulModP(tx2, s);

    // s^2
    unsigned int s2[8];
    mulModP(s, s, s2);

    // Rx = s^2 - 2px
    subModP(s2, x, newX);
    subModP(newX, x, newX);

    // Ry = s(px - rx) - py
    unsigned int k[8];
    subModP(px, newX, k);
    mulModP(s, k, newY);
    subModP(newY, py, newY);

} else {

    unsigned int rise[8];
    subModP(py, y, rise);

    mulModP(rise, s);

    // Rx = s^2 - Gx - Qx
    unsigned int s2[8];
    mulModP(s, s, s2);

    subModP(s2, px, newX);
    subModP(newX, x, newX);

    // Ry = s(px - rx) - py
    unsigned int k[8];
    subModP(px, newX, k);
    mulModP(s, k, newY);
    subModP(newY, py, newY);
}

Specifically, this part is not in the non-double counterpart, while the rest is:

Code:
// 3x^2
mulModP(x, x, x2);
addModP(x2, x2, tx2);
addModP(x2, tx2, tx2);


// s = 3x^2 * 1/2y
mulModP(tx2, s);

// s^2
unsigned int s2[8];
mulModP(s, s, s2);

So it looks like the only changes are using the double of py in beginBatchAdd and the slope s = 3x^2 * 1/(2y) in completeBatchAdd (the code then squares s to get Rx).

In other words, we just double Gy and use that if our search stumbles upon the point (Gx, Gy) by chance, and we use the slope s = 3Gx^2 * 1/(2Gy) to calculate the next point (Rx, Ry).

This might also explain why the main code only calls keyFinderKernelWithDouble and never keyFinderKernel.
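
To make the difference concrete, here is a small Python sketch of affine point addition on secp256k1 that falls back to the tangent (doubling) slope when both inputs are the same point. It mirrors the branch structure quoted above but is not BitCrack's code; point_add and on_curve are names made up for this illustration, and the point at infinity is simply rejected.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def inv(a):
    """Modular inverse via Fermat's little theorem (a must be non-zero mod P)."""
    return pow(a % P, P - 2, P)

def on_curve(pt):
    """True if pt satisfies y^2 = x^3 + 7 (mod P)."""
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0

def point_add(p1, p2):
    """Affine addition with a doubling fallback (illustrative only)."""
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2:
        if (y1 + y2) % P == 0:
            raise ValueError("result is the point at infinity")
        # Same point: x2 - x1 would be 0, so use the tangent slope 3x^2 / 2y
        s = (3 * x1 * x1 * inv(2 * y1)) % P
    else:
        # Distinct x: ordinary chord slope (y2 - y1) / (x2 - x1)
        s = ((y2 - y1) * inv(x2 - x1)) % P
    rx = (s * s - x1 - x2) % P
    ry = (s * (x1 - rx) - y1) % P
    return rx, ry

G = (GX, GY)
G2 = point_add(G, G)    # takes the doubling branch
G3 = point_add(G2, G)   # takes the ordinary branch
```

Both results can be checked with on_curve; without the first branch, G + G would try to invert zero.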
a.a
member
Activity: 126
Merit: 36
Does somebody have an idea why there is a keyFinderKernelWithDouble in addition to keyFinderKernel? What is the point of this additional function?
member
Activity: 272
Merit: 20
the right steps towerds the goal
[2021-06-03.15:54:20] [Info] Compression: compressed
[2021-06-03.15:54:20] [Info] Starting at: 00000000000000000000000000000000000000000000000000000000CDB4C578
[2021-06-03.15:54:20] [Info] Ending at:   000000000000000000000000000000000000000000000000FFFFFFFFFFFFFFFF
[2021-06-03.15:54:20] [Info] Counting by: 0000000000000000000000000000000000000000000000000000000100000000
[2021-06-03.15:54:20] [Info] Initializing NVIDIA GeForce RTX 3060 Ti
[2021-06-03.15:54:21] [Info] Generating 39,845,888 starting points (1520.0MB)
[2021-06-03.15:54:24] [Info] 10.0%
[2021-06-03.15:54:24] [Info] 20.0%
[2021-06-03.15:54:25] [Info] 30.0%
[2021-06-03.15:54:25] [Info] 40.0%
[2021-06-03.15:54:25] [Info] 50.0%
[2021-06-03.15:54:25] [Info] 60.0%
[2021-06-03.15:54:25] [Info] 70.0%
[2021-06-03.15:54:26] [Info] 80.0%
[2021-06-03.15:54:26] [Info] 90.0%
[2021-06-03.15:54:26] [Info] 100.0%
[2021-06-03.15:54:26] [Info] Done
[2021-06-03.15:54:26] [Info] Loading addresses from 'd:/1.txt'
[2021-06-03.15:54:26] [Info] 105 addresses loaded (0.0MB)
[2021-06-03.15:54:26] [Info] Allocating bloom filter (0.0MB)
NVIDIA GeForce R 4684 / 8192MB | 105 targets 1000.03 MKey/s (3,506,438,144 total) [00:00:01]
[2021-06-03.15:54:30] [Info] Reached end of keyspace

with 1 3060ti = 1bk/s , 2^1 to 2^32, solve in 1 second
with 10 3060ti = 10bk/s , 2^1 to 2^36, solve in 1 second
with 100 3060ti = 100bk/s , 2^1 to 2^40, solve in 1 second
with 1000 3060ti = 1000bk/s , 2^1 to 2^44, solve in 1 second
with 10000 3060ti = 10000bk/s , 2^1 to 2^48, solve in 1 second
with 100000 3060ti = 100000bk/s , 2^1 to 2^52, solve in 1 second
with 1000000 3060ti = 1000000bk/s , 2^1 to 2^56, solve in 1 second
with 10000000 3060ti = 10000000bk/s , 2^1 to 2^60, solve in 1 second
with 100000000 3060ti = 100000000bk/s , 2^1 to 2^64, solve in 1 second

So in the end I would need 100 million 3060 Tis to solve puzzle 64 in 1 second, am I right?
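
A quick way to check the table above is to divide the number of keys by the combined rate. The sketch below assumes a steady 10^9 keys/s per card, as in the log, and treats puzzle 64 as the 2^63 keys between 2^63 and 2^64 - 1; seconds_to_scan is a made-up helper name.

```python
def seconds_to_scan(num_keys, cards, keys_per_second_per_card=10**9):
    """Time to walk num_keys keys once, assuming perfectly linear scaling."""
    return num_keys / (cards * keys_per_second_per_card)

# A few rows of the table: (number of cards, exponent of the keyspace size)
for cards, bits in [(1, 32), (10, 36), (100_000_000, 64)]:
    t = seconds_to_scan(2**bits, cards)
    print(f"{cards:>11,} cards, 2^{bits} keys: {t:8.1f} s")

# Puzzle 64 worst case with 100 million cards
puzzle64 = seconds_to_scan(2**63, 100_000_000)
print(f"puzzle 64, 100,000,000 cards: {puzzle64:.1f} s")
```

Under these assumptions one card needs about 4.3 s for 2^32 keys, and each row multiplies the cards by 10 but the keyspace by 16, so the times grow down the table. 100 million cards would need roughly 92 s for the full puzzle 64 range, not 1 second.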
a.a
member
Activity: 126
Merit: 36
I refactored some more. I am still trying to pin down your issue. Locally it works with my Quadro P620 and my Radeon Vega56. I now use Radeon Developer Tools, and there I get some similar errors. So I will try to fix those, hoping that this also solves the issues you mentioned.

I am now at about 95 MKeys/s. So I improved the performance by about 30 MKeys/s, which is about 45 % faster than the standard OpenCL BitCrack.

When running clBitCrack.exe -i addresses.txt --keyspace 1:fffffffffff -b 512 -t 256 -p 256 I now get about 360 MKeys/s.

EDIT:

After reading your remark again:
Actually this should be solved now. But the remarks in Radeon Developer Tools make me think, so I will fix them as well. Then the tool should be a piece of cake ;)


Also, I forgot to mention:
You now get a system beep when you find a key.
newbie
Activity: 9
Merit: 0
Hmm yeah I am on it. What gpu are you using?

RTX 3060Ti on home box

GTX 1080 on work box

Win 10 on both boxes, and the same error on both
a.a
member
Activity: 126
Merit: 36
Hmm yeah I am on it. What gpu are you using?
newbie
Activity: 9
Merit: 0
When you have time could you compile an .exe release please? Would like to test it vs. the original. Thanks.

Done:
https://github.com/Uzlopak/BitCrackOpenCL/releases/tag/v.0.4.0

I also tested the performance:
On my system with a Vega56, I initially got about 58 MKeys/s with the old clBitCrack. Now I get about 83 MKeys/s. Keep in mind: this is without any -b -t -p parameters specified, which increase the throughput significantly. With -p 5000 I get about 215 MKeys/s. With the old clBitCrack I get about 190 MKeys/s with -b 5000. If I only change -b to 512 I get about 130 MKeys/s.

So the 30 % improvement is significant when the GPU is under light load. Under heavy load I get "just" a 12 % improvement.

I read that you can get the bytecode of the OpenCL part that is compiled on the fly. I will probably implement a solution that stores the compiled OpenCL bytecode in the same folder and loads it when you run clBitCrack again. That should speed up startup, so testing for the best parameters would no longer take 20 seconds every time just because OpenCL is building.
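
The caching idea can be sketched independently of the OpenCL calls. In OpenCL itself the compiled binary is available through clGetProgramInfo with CL_PROGRAM_BINARIES and can be reloaded with clCreateProgramWithBinary; the Python below only shows the bookkeeping around that. load_or_build and the build callback are hypothetical names, and keying the cache on the kernel source plus a device/driver string is an assumption about what should invalidate it.

```python
import hashlib
from pathlib import Path

def load_or_build(source, device_id, cache_dir, build):
    """Return a compiled kernel binary, compiling only on a cache miss.

    source    -- kernel source text
    device_id -- string identifying device and driver (assumed cache key part)
    cache_dir -- directory where binaries are stored
    build     -- hypothetical callable: source text -> compiled bytes
    """
    # A changed source or a different device/driver yields a different file name
    key = hashlib.sha256((device_id + "\0" + source).encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"kernel-{key}.bin"

    if path.exists():
        return path.read_bytes()      # cache hit: skip the slow build

    binary = build(source)            # cache miss: compile once
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(binary)
    return binary
```

Including the device and driver in the key matters because a binary built for one GPU or driver version is generally not usable on another.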

EDIT:
With clBitCrack.exe -i addresses.txt --keyspace 1:fffffffffff -b 512 -t 256 -p 256 I get about 268 MKeys/s.

With the old clBitCrack I get about 210 MKeys/s in the same case.

So the performance gain is definitely there ;)

Got this error:

[2021-05-28.22:18:44] [Info] Compiling OpenCL kernels...
[2021-05-28.22:18:44] [Info] Error: :518:10: error: implicit conversion from address space "generic" to address space "private" is not supported when passing to parameter of destination type
                                 addc(&a[7], &P[7], &carry, &c[7]);
                                      ^~~~~
                             :316:25: note: passing argument to parameter 'a' here
                             void addc(unsigned int *a, unsigned int *b, unsigned int *carry, unsigned int *sum)
                                                     ^
                             :518:32: error: implicit conversion from address space "generic" to address space "private" is not supported when passing to parameter of destination type
                                 addc(&a[7], &P[7], &carry, &c[7]);
                                                            ^~~~~

"Original" BitCrack works normally.
Any suggestions?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I am looking for a C code snippet that covers the whole path from private key to compressed and uncompressed pubkey in a simple manner, or a code snippet that uses a C library for this.

I want to generate private keys and feed them directly into secp256k1 to get both pubkeys, but it seems that the ones I find are all optimized for point addition.

Any suggestions?

Better to make one ourselves that links with libsecp256k1 and uses our own structures for representing points and uint256s for maximum speed (let's raid the uint256 class in Bitcoind :D)
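
For reference, the whole private-key-to-public-key path is short when speed is not the goal. The Python sketch below uses plain double-and-add on secp256k1 and serializes both forms. It is far slower than libsecp256k1 (whose C API exposes this as secp256k1_ec_pubkey_create and secp256k1_ec_pubkey_serialize), it is not constant-time, and the helper names are made up for illustration.

```python
P = 2**256 - 2**32 - 977  # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G = (GX, GY)

def point_add(p1, p2):
    """Affine addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                   # P + (-P)
    if p1 == p2:
        s = 3 * x1 * x1 * pow(2 * y1, P - 2, P) % P   # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    rx = (s * s - x1 - x2) % P
    ry = (s * (x1 - rx) - y1) % P
    return rx, ry

def private_to_public(k):
    """Compute k*G with double-and-add (illustrative, not constant-time)."""
    if not 0 < k < N:
        raise ValueError("private key out of range")
    result, addend = None, G
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def serialize(pt, compressed=True):
    """SEC1 encoding: 33 bytes (02/03 || x) or 65 bytes (04 || x || y)."""
    x, y = pt
    if compressed:
        prefix = b"\x03" if y & 1 else b"\x02"
        return prefix + x.to_bytes(32, "big")
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
```

For example, serialize(private_to_public(1)).hex() gives the compressed encoding of G itself, and passing compressed=False gives the 65-byte form.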
a.a
member
Activity: 126
Merit: 36
I am looking for a C code snippet that covers the whole path from private key to compressed and uncompressed pubkey in a simple manner, or a code snippet that uses a C library for this.

I want to generate private keys and feed them directly into secp256k1 to get both pubkeys, but it seems that the ones I find are all optimized for point addition.

Any suggestions?
full member
Activity: 1148
Merit: 237
Shooters Shoot...
With the pool/program, you can't fake a range searched.

Well... I could do the same by just forking the client or sniffing the network communication and sending garbage to the server... There are shares you can create, like in pooled mining.
The pool creates an address within the range you are assigned to (along with the #64 address) and you must find that address' private key and send it back to the server before it will consider the range searched.

And most ask for more than 1 range at a time. For example, if you want to scan 16 ranges at once (to limit the startup time you would get from running just 1 small range), then you will have to find the 16 generated addresses within those ranges and send their keys back to the server.

So if your program sends back garbage, i.e. it doesn't send the correct private keys to the randomly generated addresses assigned to you/your range, then the range will not be considered searched.

So if you have a way to "guess" private keys to addresses correctly, without actually searching a range, well...you can find any private key to any address and you own Bitcoin.
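
The check described above can be sketched as follows. To keep the example self-contained, a SHA-256 of the key stands in for real address derivation (an assumption; a real pool would derive an actual Bitcoin address), and all function names here are hypothetical.

```python
import hashlib
import secrets

def fake_address(private_key):
    """Stand-in for address derivation. NOT a real Bitcoin address."""
    return hashlib.sha256(private_key.to_bytes(32, "big")).hexdigest()

def issue_challenge(start, end):
    """Pool side: hide a random key in [start, end], publish only its address."""
    secret = start + secrets.randbelow(end - start + 1)
    return secret, fake_address(secret)

def scan_range(start, end, targets):
    """Client side: walk the whole range, report every key that hits a target."""
    found = {}
    for key in range(start, end + 1):
        addr = fake_address(key)
        if addr in targets:
            found[addr] = key
    return found

def verify(found, challenge_address, secret):
    """Pool side: accept the range only if the hidden key was returned."""
    return found.get(challenge_address) == secret

# Example round trip over a tiny range
start, end = 1000, 5000
secret, challenge = issue_challenge(start, end)
report = scan_range(start, end, {challenge, "address-of-the-real-puzzle"})
accepted = verify(report, challenge, secret)
```

A client that sends back garbage fails verify, because producing the right key without scanning would mean being able to guess private keys from addresses.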

member
Activity: 170
Merit: 58
Puzzle 64 is between --keyspace D450000000000000:D46fffffffffffff Let's see who solves it first..

B*sh*t.

1) How do you know it is between D45...D47, not before D45 and not after D47?
2) You say that because you have found several addresses which start with 16jY7qLJ, the final address must be somewhere close. Of course not. An address is the result of a double hash of the public key, so you are saying that there is a range of private keys producing public keys which, double(!) hashed, produce similar addresses. Nonsense.

Show me proof that key CANNOT be outside of your range.
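
The point about hashing can be checked empirically. The sketch below uses a double SHA-256 of the raw key as a stand-in for the real address pipeline (public key, SHA-256, RIPEMD-160, Base58Check), which is an assumption made to stay within the standard library; the avalanche behaviour it shows is a property of the hash functions, not of this particular chain.

```python
import hashlib

def digest(key):
    """Double SHA-256 of a 256-bit integer (stand-in for address hashing)."""
    raw = key.to_bytes(32, "big")
    return hashlib.sha256(hashlib.sha256(raw).digest()).digest()

def bit_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")

def mean_neighbour_distance(start, count):
    """Average bit distance between the digests of key and key + 1."""
    total = 0
    for k in range(start, start + count):
        total += bit_distance(digest(k), digest(k + 1))
    return total / count

mean = mean_neighbour_distance(0xD450000000000000, 1000)
print(f"mean differing bits between neighbours: {mean:.1f} of 256")
```

Adjacent keys differ in about half of the 256 output bits on average, so a shared address prefix says nothing about how close two private keys are.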
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
With the pool/program, you can't fake a range searched.

Well... I could do the same by just forking the client or sniffing the network communication and sending garbage to the server... There are shares you can create, like in pooled mining.

Anyone who's going to make a pool must have a method for banning spamming clients so that the pool server doesn't get overloaded.

fail2ban paired with a firewall should do the trick. I've never been good at managing firewalls though, and I believe fail2ban uses iptables.
a.a
member
Activity: 126
Merit: 36
With the pool/program, you can't fake a range searched.

Well... I could do the same by just forking the client or sniffing the network communication and sending garbage to the server... There are shares you can create, like in pooled mining.
full member
Activity: 1148
Merit: 237
Shooters Shoot...
Would it be a good idea to open a new thread where we only post keyspaces that have already been scanned?

Like a public pool sharing the scanned keyspace.

If you have finished scanning a keyspace for a puzzle and did not find the key, post it to the forum.


I think that's a good idea. We would have to create a pool something like this one: https://bitcointalksearch.org/topic/m.52933921. Same idea, except that there the rewards are split according to each person's computing power, whereas our pool would be based on luck.
With no risk/reward you will just have trolls saying they searched x y and z ranges, just to troll or to keep one away from the range.

With the pool/program, you can't fake a range searched.

But you all can start the thread, just be careful with whom you trust.