What in copyBigInt() is causing the error, and why? The same error appears in multiple programs written prior to the release of the RTX 30xx cards.
It's not copyBigInt() itself that's problematic (it's a simple element-wise assignment) but one of the arrays passed to it, which is not aligned. CUDA wants these arrays aligned to 32-bit boundaries, and one of the arrays that eventually reaches copyBigInt() comes from the "xp" and "x" pointer arguments of beginBatchAdd(): these are passed to SubModP(), the result is stored in an 8-element int array, and that array is then passed to MulModP() and from there to copyBigInt().
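For reference, copyBigInt() is essentially just this (paraphrased from the linked header, not quoted verbatim):

```cpp
// Paraphrase of copyBigInt() from cudaMath/secp256k1.cuh: an
// element-wise copy of a 256-bit integer held as eight 32-bit words.
__device__ void copyBigInt(const unsigned int src[8], unsigned int dest[8])
{
    for(int i = 0; i < 8; i++) {
        dest[i] = src[i];   // trivial on its own; the trouble is the
                            // alignment of the pointers handed in
    }
}
```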
At first it wasn't clear to me where this error was coming from, because the problem disappeared in debug mode, so I could not use the debugger. That's right: if you pass the -g -G switches to NVCC, you get a working but extremely slow BitCrack binary.
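Since the crash only shows up with optimizations on, my suspicion is that the optimizer widens the word-by-word copy into vector accesses that assume more alignment than the incoming pointers actually have. A minimal standalone sketch that triggers the same class of error deterministically (this reproduces the error class only; it is not BitCrack's code):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Standalone repro of the *class* of error: a 16-byte-wide access
// through a pointer that is only 4-byte aligned faults on the device
// with "misaligned address".
__global__ void copyWide(const unsigned int *src, unsigned int *dest)
{
    // uint4 loads/stores require 16-byte alignment -- the same kind of
    // widened access an optimizer may emit for an 8-word copy loop.
    const uint4 *s = reinterpret_cast<const uint4 *>(src);
    uint4 *d = reinterpret_cast<uint4 *>(dest);
    d[0] = s[0];
    d[1] = s[1];
}

int main()
{
    unsigned int *buf = nullptr;
    cudaMalloc(&buf, 64 * sizeof(unsigned int)); // well-aligned base pointer

    // buf + 1 is still 4-byte aligned but no longer 16-byte aligned,
    // so the kernel faults; buf + 32 (the destination) stays aligned.
    copyWide<<<1, 1>>>(buf + 1, buf + 32);
    printf("kernel status: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    cudaFree(buf);
    return 0;
}
```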
I tried draconian measures in an attempt to fix this, like unrolling the loop, changing the array assignment to memcpy(), qualifying it with the __restrict__ and __align__ keywords, and even turning it into a #define, but the destination and source arrays just don't want to be accessed (and since these arrays cannot even be used in the parent function, the problem stems from somewhere deeper). More bafflingly, assigning a constant to an element of the dest array, or initializing a local variable from an element of src, works, but that obviously breaks the elliptic curve math.
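Reconstructed from the description above (these are not the actual diffs), the attempted workarounds were along these lines:

```cpp
#include <cstring>

// Variants tried, reconstructed from the description above (not the
// actual diffs); the function name here is invented. All of them
// still crash in a release build.
__device__ void copyBigIntVariants(const unsigned int * __restrict__ src,
                                   unsigned int * __restrict__ dest)
{
    // 1. Manual unroll instead of the loop:
    dest[0] = src[0]; dest[1] = src[1]; dest[2] = src[2]; dest[3] = src[3];
    dest[4] = src[4]; dest[5] = src[5]; dest[6] = src[6]; dest[7] = src[7];

    // 2. memcpy instead of element-wise assignment:
    memcpy(dest, src, 8 * sizeof(unsigned int));

    // 3. __restrict__ / __align__ qualifiers on the signature, as above.

    // Diagnostics that DO run, but break the EC math:
    dest[0] = 0x12345678u;          // constant store into dest works
    unsigned int probe = src[0];    // plain load from src works
    (void)probe;
}

// 4. The same copy as a macro, expanded at each call site:
#define COPY_BIG_INT(dest, src) \
    do { for(int i = 0; i < 8; i++) (dest)[i] = (src)[i]; } while(0)
```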
This is supposed to be performance-critical code, so I did not attempt to change the static array to a malloc() allocation.
For the uninitiated, this is where the bug is:
https://github.com/brichard19/BitCrack/blob/master/cudaMath/secp256k1.cuh
Everything in this file is inline functions.
We arrive here from CudaKeySearchDevice via beginBatchAdd() and beginBatchAddWithDouble(). Both of these functions call MulModP() for the modular multiplications in the point arithmetic, and methods like that need to copy to and from temporary arrays. Somehow the arrays being passed are not on an alignment boundary, and I'm honestly not sure what to do. (Of course, rewriting the whole secp256k1 module is also an option, but really...? That's like cracking a nut with a sledgehammer.)
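A simplified sketch of the failing path as I understand it (function names from the repo, bodies paraphrased and heavily trimmed):

```cpp
// Paraphrased shape of the failing call chain; not verbatim repo code.
__device__ void beginBatchAdd(const unsigned int *xp, const unsigned int *x,
                              unsigned int *chain /* , ... */)
{
    unsigned int t[8];   // local 256-bit temporary

    subModP(xp, x, t);   // t = xp - x (mod p); reads the incoming pointers
    mulModP(chain, t);   // multiplies into the running product; internally
                         // this calls copyBigInt(), and that is where the
                         // misaligned access actually fires
}
```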
I've been following your debugging by hand, since the debug build runs while the release build crashes. I'm nowhere near as close to the base function as you are, but it seems I'm hitting a different path. You're saying it starts from beginBatchAdd().
I know the following breaks the functionality, but just for isolating the issue: if you comment out this part
https://github.com/brichard19/BitCrack/blob/master/CudaKeySearchDevice/CudaKeySearchDevice.cu#L179-L190
then the code runs for me (of course its output is broken now).
The interesting part is that doBatchInverse(), as well as the loop that follows it, will make it crash, while that loop never even hits completeBatchAdd().
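For orientation, the commented-out region follows the usual Montgomery batch-inversion shape; a paraphrase of the structure, not the repo's exact code:

```cpp
// Paraphrased structure of the batch addition: the Montgomery trick,
// one modular inverse per batch instead of one per point.
__device__ void batchAddStep(int batchSize)
{
    for(int i = 0; i < batchSize; i++) {
        beginBatchAdd(/* ... */);     // accumulate (xp - x) products
    }

    doBatchInverse(/* ... */);        // single modular inverse for the batch

    for(int i = batchSize - 1; i >= 0; i--) {
        completeBatchAdd(/* ... */);  // unwind the chain, finish each addition
    }
}
```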
Maybe I'm hitting a different issue? Or did you mean completeBatchAdd()?

Edit:
Nvm, I hadn't undone my function overwrites. It indeed bubbles up from subModP().
https://github.com/brichard19/BitCrack/blob/master/cudaMath/secp256k1.cuh#L646

We're on the same track (I think), thank god.
*digging*
Edit 2:
Installed all the proper tools to debug simultaneous threads. The following breakpoint got hit.
That's it for now, time for sleep.
Btw: when running in legacy mode (old-hardware compatible) under Nsight, it ran fine. I'm not sure yet which flag that corresponds to on regular CUDA builds; I just pressed the wrong button, was waiting for it to crash, and it totally didn't. I'll check tomorrow what speed it was running at; it could be interesting as a quick fix.
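If "legacy mode" boils down to compiling for an older compute capability, the regular-build equivalent would be an nvcc -gencode setting; my unverified guess is something like `-gencode arch=compute_35,code=compute_35`, which ships PTX that the driver JIT-compiles for the RTX 30xx (sm_86) at load time instead of natively tuned sm_86 SASS. The JIT may optimize the copy differently, which would fit the crash disappearing.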
For staying in a certain range (it must be a small range): do you want it to end, or to push back into the range? BitCrack ends, Kangaroo pushes back. Which route are you trying to go? To end, you need a last-key function...
Just end, it's not that complicated. The CUDA part is just a little too much above my understanding atm. The BitCrack parts are easier to understand, for me at least.
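To make the two options concrete, a minimal host-side sketch (names invented for illustration, nothing here is quoted from either codebase; keys shown as 64-bit for brevity, the real ones are 256-bit):

```cpp
#include <cstdint>

// BitCrack-style: report "done" as soon as the key walks past the
// end of the range, so the search simply ends.
bool shouldStop(uint64_t key, uint64_t rangeEnd)
{
    return key >= rangeEnd;
}

// Kangaroo-style: wrap the key back into [rangeStart, rangeEnd) and
// keep going.
uint64_t pushBack(uint64_t key, uint64_t rangeStart, uint64_t rangeEnd)
{
    if(key >= rangeEnd) {
        key = rangeStart + (key - rangeEnd) % (rangeEnd - rangeStart);
    }
    return key;
}
```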