Topic: BitCrack - A tool for brute-forcing private keys - page 66. (Read 77200 times)

full member
Activity: 1232
Merit: 242
Shooters Shoot...

may I try your modified VanitySearch with keyspace search?

I would honestly not recommend using unstable software. If I get it to work properly and understand the part that is going wrong, I'm happy to share. But at this moment, it needs front-running code to restart it. I would feel guilty and end up spending time helping people out instead of fixing the real problem. I was hoping to get someone to look at the part of the code that is going wrong, rather than having people run something unstable.
For staying in a certain range (it must be a small range): do you want the search to end, or to be pushed back into the range? Bitcrack ends, Kangaroo pushes back. Which route are you trying to go? To end, you need a last-key function...
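
A minimal sketch of the two behaviours asked about above, in Python for brevity. The names keys_until_end and push_back_into_range are hypothetical and are not taken from Bitcrack or Kangaroo, and "pushing back" is modelled here as a simple wrap-around, which is only an assumption about what is meant.

```python
from typing import Iterator


def keys_until_end(start: int, end: int, stride: int) -> Iterator[int]:
    """Yield keys from start in steps of stride and stop once end is passed."""
    key = start
    while key <= end:  # the "last key" check: nothing beyond end is visited
        yield key
        key += stride


def push_back_into_range(key: int, start: int, end: int) -> int:
    """Map a key that has left [start, end] back into the range (wrap-around)."""
    span = end - start + 1
    return start + (key - start) % span


if __name__ == "__main__":
    print(list(keys_until_end(10, 20, 4)))
    print(push_back_into_range(23, 10, 20))
```

The first variant needs to know the last key up front; the second never terminates on its own and keeps every walker inside the interval.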
jr. member
Activity: 36
Merit: 3
What in copyBigInt() is causing the error, and why? The same error appears in multiple programs written prior to the release of the RTX 30xx cards.

It's not copyBigInt() itself that's problematic (it's a simple element-wise assignment), but one of the arrays passed to it, which is not aligned. CUDA wants all arrays aligned to 32-bit boundaries, and one of the arrays that eventually reaches copyBigInt() comes from the "xp" and "x" pointer arguments of beginBatchAdd(). These are passed to SubModP(), and the result is stored in an 8-element int array that's then passed to MulModP() and from there to copyBigInt().

At first it wasn't clear to me where this error was coming from because the problem disappeared in debug mode, so I could not use the debugger. That's right, if you pass -g -G switches to NVCC, you get a working but extremely slow bitcrack binary.

I tried draconian measures in an attempt to fix this: unrolling the loop, changing the array assignment to memcpy(), qualifying it with the __restrict__ and __align__ keywords, and I even changed it to a #define statement. But the destination and source arrays just don't want to be accessed (since these arrays cannot even be used in the parent function, the problem lies deeper). More bafflingly, assigning a constant to an element in the dest array, or making a variable that's initialized to an element from src, works, but this obviously breaks the elliptic curve stuff.

This is supposed to be performance-critical code so I did not attempt to change the static array to malloc.



For the uninitiated: this is where the bug is: https://github.com/brichard19/BitCrack/blob/master/cudaMath/secp256k1.cuh

cudaMath/secp256k1.cuh: everything in here is an inline function.

We arrive here from CudaKeySearchDevice via beginBatchAdd() and beginBatchAddWithDouble(). Both of these functions call MulModP for point multiplication. Methods like that need to copy to and from temporary arrays. Somehow the arrays being passed are not on an alignment boundary, and I'm honestly not sure what to do. (Of course, rewriting the whole secp256k1 module is also an option but really...? That's like opening a nut with a sledgehammer.)
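
To make "not on an alignment boundary" concrete, here is a small Python sketch of the arithmetic only. It is not BitCrack code: the helper names and the example addresses are made up. It also assumes, without confirmation from this thread, that an optimized build may touch an 8-element int array in wider chunks than an element-wise debug build does, which would be one possible reason the error only shows up in release mode.

```python
from typing import List


def is_aligned(address: int, width: int) -> bool:
    """An access of `width` bytes is aligned when the address is a multiple of width."""
    return address % width == 0


def misaligned_accesses(base: int, n_bytes: int, access_width: int) -> List[int]:
    """Addresses that a copy of n_bytes, done in access_width chunks, would fault on."""
    return [
        base + offset
        for offset in range(0, n_bytes, access_width)
        if not is_aligned(base + offset, access_width)
    ]


ARRAY_BYTES = 8 * 4  # an 8-element array of 32-bit ints

if __name__ == "__main__":
    base = 0x1004  # hypothetical address: 4-byte aligned, but not 16-byte aligned
    print(misaligned_accesses(base, ARRAY_BYTES, 4))   # element-wise copy
    print(misaligned_accesses(base, ARRAY_BYTES, 16))  # 128-bit chunks (assumed)
```

With these made-up numbers the element-wise copy touches only aligned addresses, while the same array accessed in 16-byte chunks does not.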

I've been following your debugging by hand, since the debug build runs while the release build crashes. I'm nowhere near as close to the base function as you are, but it seems I'm hitting a different path. You're saying it starts from "beginBatchAdd".

I know the following breaks the code, but just for finding the issue: if you comment out the following part
https://github.com/brichard19/BitCrack/blob/master/CudaKeySearchDevice/CudaKeySearchDevice.cu#L179-L190
The code runs for me (of course it's broken now).

The interesting part is that leaving "doBatchInverse" in and running the loop that follows it makes it crash, even though that loop never hits "completeBatchAdd".

May be hitting a different issue? Or did you mean "completeBatchAdd"?


Edit:

Never mind, I hadn't undone my function overrides. It does indeed bubble up from subModP.
https://github.com/brichard19/BitCrack/blob/master/cudaMath/secp256k1.cuh#L646

We're on the same track (I think), thank god. *digging*

Edit 2:

Installed all the proper tools to debug simultaneous threads. The following breakpoint got hit.

That's it for now, time for sleep.

By the way: when running in legacy mode (old hardware compatible), it ran fine under Nsight. I'm not sure yet which flag that corresponds to on regular CUDA builds; I just pressed the wrong button and was waiting for it to crash, and it didn't. I will check tomorrow what speed it was running at, as it could be interesting as a quick fix.


For staying in a certain range (it must be a small range): do you want the search to end, or to be pushed back into the range? Bitcrack ends, Kangaroo pushes back. Which route are you trying to go? To end, you need a last-key function...


It just ends; it's not that complicated. The CUDA part is just a little too far above my understanding at the moment. The Bitcrack parts are easier for me to understand, at least.
full member
Activity: 431
Merit: 105

may I try your modified VanitySearch with keyspace search?

I would honestly not recommend using unstable software. If I get it to work properly and understand the part that is going wrong, I'm happy to share. But at this moment, it needs front-running code to restart it. I would feel guilty and end up spending time helping people out instead of fixing the real problem. I was hoping to get someone to look at the part of the code that is going wrong, rather than having people run something unstable.

I have time and I'm from Amsterdam, so just send me a PM. I would be able to do some testing in my spare time.
jr. member
Activity: 36
Merit: 3

may I try your modified VanitySearch with keyspace search?

I would honestly not recommend using unstable software. If I get it to work properly and understand the part that is going wrong, I'm happy to share. But at this moment, it needs front-running code to restart it. I would feel guilty and end up spending time helping people out instead of fixing the real problem. I was hoping to get someone to look at the part of the code that is going wrong, rather than having people run something unstable.
member
Activity: 282
Merit: 20
the right steps towerds the goal
I've modified it to do keyspace search on CUDA 11.2 on my RTX 30XX cards. It just keeps going out of bounds at random, so releasing it would just fill my issue tracker with "this doesn't work".
Using small grids, you could keep it running for a bit, but it still wouldn't be stable enough to put my name on it.

may I try your modified VanitySearch with keyspace search?
jr. member
Activity: 36
Merit: 3
When you say modified VanitySearch, what do you mean? How is it modified? Is it still searching for vanity prefixes, or doing a sequential search like BitCrack? VanitySearch, in general, is much faster than BitCrack.

I've modified it to do keyspace search on CUDA 11.2 on my RTX 30XX cards. It just keeps going out of bounds at random, so releasing it would just fill my issue tracker with "this doesn't work".
Using small grids, you could keep it running for a bit, but it still wouldn't be stable enough to put my name on it.
full member
Activity: 1232
Merit: 242
Shooters Shoot...
I really wonder if someone was able to run this against compute_75 and what speed bitcrack would hit. I've been running a modified VanitySearch, doing 4.6GK/s on a single 3090. Sadly, due to the 86k threads it is trying to fill, it goes out of bounds now and then (GPU/GPUCompute.h:54). I just cannot wrap my head around that funny one yet. But besides me trying to understand that and learning a lot, CUDA should be doing something near that speed on bitcrack too.

Yeah, cuda on bitcrack has this interesting problem on the new drivers. Will try with line info later, was just doing a quick run of your repo.

Code:
[2021-01-19.17:31:52] [Info] Error: misaligned address
========= Misaligned Shared or Local Address
=========     at 0x0000e610 in keyFinderKernelWithDouble(int, int)
=========     by thread (160,0,0) in block (0,0,0)
When you say modified VanitySearch, what do you mean? How is it modified? Is it still searching for vanity prefixes, or doing a sequential search like BitCrack? VanitySearch, in general, is much faster than BitCrack.
jr. member
Activity: 36
Merit: 3
I really wonder if someone was able to run this against compute_75 and what speed bitcrack would hit. I've been running a modified VanitySearch, doing 4.6GK/s on a single 3090. Sadly, due to the 86k threads it is trying to fill, it goes out of bounds now and then (GPU/GPUCompute.h:54). I just cannot wrap my head around that funny one yet. But besides me trying to understand that and learning a lot, CUDA should be doing something near that speed on bitcrack too.

Neat idea. I might give that a go and submit a pull request or fork BitCrack with that function. It should be possible.

Edit: my repo is at https://github.com/bitcoinforktech/BitCrack.git which will have some updates in the next few days.

Yeah, cuda on bitcrack has this interesting problem on the new drivers. Will try with line info later, was just doing a quick run of your repo.

Code:
[2021-01-19.17:31:52] [Info] Error: misaligned address
========= Misaligned Shared or Local Address
=========     at 0x0000e610 in keyFinderKernelWithDouble(int, int)
=========     by thread (160,0,0) in block (0,0,0)

Edit:
The most fascinating thing about this issue is that the debug exe runs my full test keyspace (400M), although of course very slowly, while the release build crashes with the error above.
jr. member
Activity: 32
Merit: 4
Quote from: NotATether
But it should be possible to brute-force bc1 addresses, since those also use private keys. If that's not implemented, it would make yet another good science fair project or even a Google Summer of Code project.

Neat idea. I might give that a go and submit a pull request or fork BitCrack with that function. It should be possible.

Edit: my repo is at https://github.com/bitcoinforktech/BitCrack.git which will have some updates in the next few days.
newbie
Activity: 18
Merit: 0
So what is your speed with 3090? If it's not doubling a 2080Ti, is it worth it? Meaning, it's great that it runs, but is it running as it should be, MH/s wise? I've only found one program that truly utilizes the new 30xx cards, on windows; but source code is not available.

Best I got was 1050MKey/sec
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
if I understood correctly you have P2SH addresses in the list
they start with "3"
BitCrack does not accept them

That's because "3" addresses are all P2SH addresses, which are the RIPEMD160 hashes of a script (the ones that don't encode a segwit script, that is). Bitcrack uses a bloom filter that can quickly check whether the hash derived from a candidate private key matches any of the RIPEMD160 hashes of the input addresses (that's why it's more efficient to put many addresses in the input file at once).
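
For readers who have not met a bloom filter before, a minimal Python sketch of the idea follows. This is a generic textbook filter, not Bitcrack's implementation: the class name, the bit count, the number of hash functions, and the use of SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Generic bloom filter over byte strings (illustrative parameters only)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 4) -> None:
        self.n_bits = n_bits          # must be a multiple of 8 in this sketch
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive n_hashes bit positions by salting SHA-256 with a counter byte.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not in the set"; True means "possibly in the set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    targets = [bytes([i]) * 20 for i in range(3)]  # stand-ins for 20-byte hashes
    bloom = BloomFilter()
    for target in targets:
        bloom.add(target)
    print(all(bloom.might_contain(t) for t in targets))
```

Each candidate costs the same few bit lookups no matter how many target hashes were loaded, which is the efficiency point made above. A positive answer can be a false positive, so a hit still has to be confirmed with an exact comparison.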

A script is not generated from random bytes like a private key (according to this pictograph); it's just a redeem script. If we somehow were to obtain the script for such addresses, or guess what kind of math problem someone would make into a redeem script, then only the solution to that problem (which is sometimes very easy) would have to be brute-forced to spend the input. Bitcrack is completely incapable of doing that because it works in terms of private keys.

But it should be possible to brute-force bc1 addresses, since those also use private keys. If that's not implemented, it would make yet another good science fair project or even a Google Summer of Code project.
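
The "1" versus "3" distinction above comes from the version byte inside the Base58Check encoding. The Python sketch below decodes a legacy address and reports that byte together with the 20-byte hash. The function names are made up for this example, only the two mainnet version bytes 0x00 and 0x05 are handled, and bc1 addresses are not covered because they use a different encoding (bech32). The two example addresses are widely published ones (the genesis block address and the usual documentation example for P2SH).

```python
import hashlib
from typing import Tuple

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
VERSIONS = {0x00: "P2PKH", 0x05: "P2SH"}  # mainnet version bytes


def base58check_decode(address: str) -> bytes:
    """Return version byte + payload after verifying the 4-byte checksum."""
    num = 0
    for ch in address:
        num = num * 58 + ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' character encodes one leading zero byte.
    pad = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * pad + body
    payload, checksum = raw[:-4], raw[-4:]
    if hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] != checksum:
        raise ValueError("bad checksum")
    return payload


def classify(address: str) -> Tuple[str, bytes]:
    """Return (address type, 20-byte hash) for a legacy Base58Check address."""
    payload = base58check_decode(address)
    return VERSIONS.get(payload[0], "unknown"), payload[1:]


if __name__ == "__main__":
    for addr in ("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
                 "3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"):
        kind, h160 = classify(addr)
        print(addr, kind, h160.hex())
```

In both cases the payload is a 20-byte hash, but only for the "1" type is it the hash of a public key that a private-key search can ever reproduce.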
full member
Activity: 1232
Merit: 242
Shooters Shoot...
Thanks for the link, I read up on the known issue. I then performed the test with the provided list of 18 addresses using Win64 + clbitcrack + 3090 + CUDA 11.2 and it found all keys in the list.
So what is your speed with 3090? If it's not doubling a 2080Ti, is it worth it? Meaning, it's great that it runs, but is it running as it should be, MH/s wise? I've only found one program that truly utilizes the new 30xx cards, on windows; but source code is not available.
newbie
Activity: 18
Merit: 0
https://github.com/brichard19/BitCrack/issues/81
This was the main reason I said that. Besides, you can easily test whether it works out of the box.

Thanks for the link, I read up on the known issue. I then performed the test with the provided list of 18 addresses using Win64 + clbitcrack + 3090 + CUDA 11.2 and it found all keys in the list.
full member
Activity: 1232
Merit: 242
Shooters Shoot...
I didn't change the compute_cap value and was still able to compile cubitcrack on Windows simply by updating the references from CUDA 10.1 to 11.2 and making sure project resources were in the correct locations. I'm not able to actually run it because of the "misaligned address" error, so I am using clbitcrack instead until someone is able to fix cubitcrack so that it runs again with CUDA 11.2+.

In the meantime, what issues should I expect clbitcrack to have running on Windows? I'm not working with any P2SH addresses. What other issues would cause clbitcrack to not find a private key, as @yoyodapro mentioned?
If you take away the last 8 or 9 characters, does the issue of "misaligned address" go away?
example:
original address 13x7a9384def882923xxxxxxxx
change to 13x7a9384def882923

Also, play with your driver version. I believe I was able to roll back down and use cubitcrack with a 3070. However, no matter which driver or CUDA, it still isn't optimized for 30xx series. The 3070 should be getting close or above a 2080Ti.
full member
Activity: 431
Merit: 105
I didn't change the compute_cap value and was still able to compile cubitcrack on Windows simply by updating the references from CUDA 10.1 to 11.2 and making sure project resources were in the correct locations. I'm not able to actually run it because of the "misaligned address" error, so I am using clbitcrack instead until someone is able to fix cubitcrack so that it runs again with CUDA 11.2+.

In the meantime, what issues should I expect clbitcrack to have running on Windows? I'm not working with any P2SH addresses. What other issues would cause clbitcrack to not find a private key, as @yoyodapro mentioned?

https://github.com/brichard19/BitCrack/issues/81
This was the main reason I said that. Besides, you can easily test whether it works out of the box.
newbie
Activity: 18
Merit: 0
clbitcrack has always had issues and still does. Did you change the compute_cap in your makefile?
Otherwise compiling cubitcrack won't succeed, and even if it does, it won't work with your hardware.
Change it according to your hardware.

I didn't change the compute_cap value and was still able to compile cubitcrack on Windows simply by updating the references from CUDA 10.1 to 11.2 and making sure project resources were in the correct locations. I'm not able to actually run it because of the "misaligned address" error, so I am using clbitcrack instead until someone is able to fix cubitcrack so that it runs again with CUDA 11.2+.

In the meantime, what issues should I expect clbitcrack to have running on Windows? I'm not working with any P2SH addresses. What other issues would cause clbitcrack to not find a private key, as @yoyodapro mentioned?
newbie
Activity: 26
Merit: 0
Apologies for the late reply to this comment, but @yoyodapro, what "multiplication issue" are you referring to? I've read this entire thread and do not recall an issue like that being mentioned. While I am using a 3090, I was able to compile the current build of bitcrack against CUDA 11.2 by updating the references from CUDA 10.1 to 11.2. While cubitcrack still gives the misaligned address error, clbitcrack seems to work fine, albeit at a slower rate. I'm curious which issue you are referring to so I can understand how it may or may not affect what I am working on.

if I understood correctly you have P2SH addresses in the list
they start with "3"
BitCrack does not accept them
full member
Activity: 431
Merit: 105
clbitcrack has always had issues and still does. Did you change the compute_cap in your makefile?
Otherwise compiling cubitcrack won't succeed, and even if it does, it won't work with your hardware.
Change it according to your hardware.
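
For anyone unsure which value matches their card, here is a small Python sketch that maps the GPUs mentioned in this thread to their compute capability and builds the matching nvcc architecture option. The capability numbers are NVIDIA's published values for these cards. How BitCrack's makefile turns compute_cap into compiler flags is not shown in this thread, so the helper name and the way the flag is assembled are assumptions for illustration only.

```python
# Compute capability (major and minor digits joined) for cards named in this thread.
COMPUTE_CAPABILITY = {
    "RTX 2080": "75",
    "RTX 2080 Ti": "75",
    "RTX 3070": "86",
    "RTX 3090": "86",
}


def gencode_flag(gpu: str) -> str:
    """Build an nvcc -gencode option targeting the given GPU (hypothetical helper)."""
    cc = COMPUTE_CAPABILITY[gpu]
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


if __name__ == "__main__":
    for gpu in ("RTX 2080 Ti", "RTX 3090"):
        print(gpu, "->", gencode_flag(gpu))
```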
newbie
Activity: 18
Merit: 0
I have compiled cuBitCrack using CUDA 11.2, confirmed working on 2080, 2080ti, and 3070 without the multiplication issue preventing private keys from being found.

Can anyone help me create a windows binary?

https://github.com/yoyodapro/BitCrack/releases/tag/v11.2-alpha

Apologies for the late reply to this comment, but @yoyodapro, what "multiplication issue" are you referring to? I've read this entire thread and do not recall an issue like that being mentioned. While I am using a 3090, I was able to compile the current build of bitcrack against CUDA 11.2 by updating the references from CUDA 10.1 to 11.2. While cubitcrack still gives the misaligned address error, clbitcrack seems to work fine, albeit at a slower rate. I'm curious which issue you are referring to so I can understand how it may or may not affect what I am working on.
jr. member
Activity: 32
Merit: 4
See this message. Some function is feeding bad pointers to beginBatchAdd() and beginBatchAddWithDouble(), and these are passed to MulModP() --> copyBigInt(), to the point where array access of either of the copyBigInt() parameters generates the Misaligned Access exception everyone here is getting.

It might be something in host code that's giving beginBatchAdd some pointer that's incremented by 1 and not aligned, or something like that. I didn't get a chance to check.

Thanks for that. I suspect it needs some alignment specifiers, e.g. __align__(16), for the data. I will have to compile it myself and test. I will come back here with the results when I've done that.

Cheers!
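
As a side note on what a 16-byte alignment specifier is meant to guarantee: the object ends up at an address that is a multiple of 16. The Python sketch below shows the usual round-up arithmetic for that. It illustrates the address calculation only and is not a tested fix for BitCrack; the function name and the example offsets are made up.

```python
def align_up(offset: int, alignment: int = 16) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    print(hex(align_up(0x1004)))  # an offset that is only 4-byte aligned
    print(hex(align_up(0x1010)))  # an offset that is already 16-byte aligned
```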