
Topic: Large Bitcoin Collider (Collision Finders Pool) - page 29. (Read 193404 times)

sr. member
Activity: 378
Merit: 251
Quote
https://blockchain.info/address/1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF

HERE WE COMEEEEE!

https://blockchain.info/address/1JCe8z4jJVNXSjohjM4i9Hh813dLCNx2Sy
That's the one you should go for Cheesy
legendary
Activity: 1722
Merit: 1000
Thanks for the explanation, guys. And good luck with the collisions! Smiley
legendary
Activity: 1140
Merit: 1000
The Real Jude Austin
Quote
Oh yes - everyone new here: Meet becoin, our local mascot.

https://blockchain.info/address/1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF

HERE WE COMEEEEE!

legendary
Activity: 1120
Merit: 1037
฿ → ∞
Quote
That's the prime target of this so-called "project". The "white hat" hacker Rico, who doesn't have a single bitcoin in his wallet... Give me a break!

Oh yes - everyone new here: Meet becoin, our local mascot.


Rico
legendary
Activity: 3431
Merit: 1233
Quote
Imagine finding the private key for some of Satoshi's addresses...

That's the prime target of this so-called "project". The "white hat" hacker Rico, who doesn't have a single bitcoin in his wallet... Give me a break!
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Quote
Is this project also based in Switzerland? Like the Large Hadron Collider?  Cheesy

Servers are all over Europe; clients are all over the world.
Some are even in Switzerland - like the Large Hadron Collider. Wink

Quote
Wondering: if private keys are found for a loaded BTC address, what is going to happen to the funds?
Is this just for research, or is this actually trying to steal people's money?

See the thread. We have announced a specific, well-defined process for this. The main task is to search for collisions.
Clients get a regular finder's fee before funds are returned to their rightful owner - that's their motivation.
Blatantly stealing funds is frowned upon.

Quote
Imagine finding the private key for some of Satoshi's addresses...

While Satoshi may own some million BTC, it's a common misconception that these would all be on one address.
More probably, they are distributed among some thousands of early 50 BTC block rewards sitting on the blockchain.

And ... boy would it be cool if he showed up to claim his funds!  Smiley


Rico
legendary
Activity: 1722
Merit: 1000
Is this project also based in Switzerland? Like the Large Hadron Collider?  Cheesy

Wondering: if private keys are found for a loaded BTC address, what is going to happen to the funds?

Is this just for research, or is this actually trying to steal people's money?

Imagine finding the private key for some of Satoshi's addresses...
legendary
Activity: 1512
Merit: 1057
SpacePirate.io

Saw your Reddit post there this morning; in a few hours it will turn into an argument over who makes the best cheese sandwich in Dayton, OH  Wink  Server logs? I'm guessing it will be smtp.trump-email.com, nsa.gov, and kremlin.ru Tongue
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Quote
Why does it require a BF for each process? Couldn't the BF just be loaded to VRAM once and each process reference that one instance?
I am not questioning your work, just digging for information for a better understanding.

I'm sure one BF per GPU could be enough, and I probably could even figure out how to do it. I just want to avoid a multi-kernel setup for as long as possible; inter-kernel communication seems quite ... challenged.

I thought about a finer granularity of GPU kernels with a nifty producer/consumer pipeline, where the app would find out which kernel needed which resources/time and then do an optimal setup for the given GPU device(s).

Unfortunately, up to OpenCL 2.0 (which nVidia hardly supports), there is no way for kernels to communicate directly except by pushing their results back to the host and having the host push them again to the next kernel. This is like still having the Mach 2 Concorde, but on a flight from New York to San Diego refueling in Chicago, Kansas City, Dallas and Phoenix...

OpenCL 2.0 (and later) has something called pipes, but we can't rely on that yet - not without significantly reducing the number of devices LBC would run on. So as of now, I'm trying to build a single-kernel key generation stream. If it works, it's darn fast (and consequently the GPU is under hardly any load). If it works...
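To make the "refueling" overhead concrete, here is a minimal host-side sketch of the pre-2.0 hand-off pattern described above. The kernel and buffer names are made up for illustration, and context/queue/kernel setup is assumed to already exist:

Code:
/* Illustrative only, not LBC code: chaining two kernels without pipes.
   Each batch of results must bounce through host memory between them. */
#include <stdlib.h>
#include <CL/cl.h>

void chain_kernels(cl_command_queue q, cl_kernel producer, cl_kernel consumer,
                   cl_mem producer_out, cl_mem consumer_in,
                   size_t nbytes, size_t gws) {
  void *host_tmp = malloc(nbytes);                /* the "refueling stop" */

  clEnqueueNDRangeKernel(q, producer, 1, NULL, &gws, NULL, 0, NULL, NULL);
  /* GPU -> host: blocking read of the producer's results */
  clEnqueueReadBuffer(q, producer_out, CL_TRUE, 0, nbytes, host_tmp, 0, NULL, NULL);
  /* host -> GPU: push the very same bytes right back for the consumer */
  clEnqueueWriteBuffer(q, consumer_in, CL_TRUE, 0, nbytes, host_tmp, 0, NULL, NULL);
  clEnqueueNDRangeKernel(q, consumer, 1, NULL, &gws, NULL, 0, NULL, NULL);
  clFinish(q);

  free(host_tmp);
}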

Quote
And shit, those are nice workstations at a pretty decent price; kind of pissed I bought an MSI GS70 Stealth Pro... ~6 Mkeys/s

After posting this, I looked up the P51, which should be available in April 2017. It has a Quadro M2200M, which is about 40% faster than my M2000M. Intriguing...  Cool


Rico
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Quote
What is the performance cost of emulating 64-bit as 32-bit?

Does it double the cost? For example, does a 64-bit word emulated as 32-bit use 100% of the GPU while 32-bit words use 50%?

OK, let me elaborate on this a little and give you some numbers for a better estimate of where we are and where we're going:

In my CPU/GPU combination, one CPU core puts 8% load on the GPU, and that is a situation where a fairly strong CPU meets a midrange GPU (a 2.8 - 3.7 GHz Skylake E3 Xeon firing at a Quadro M2000M - see http://www.notebookcheck.net/NVIDIA-Quadro-M2000M.151581.0.html). With a stronger GPU (a 1080), it's quite possible the CPU can put only 5-6% load on the GPU.

The current development version of the generator gives me 9 Mkeys/s with all 4 physical cores running, whereas the published version (the one you can download from the FTP) gives 7.5 Mkeys/s.

The main difference is that the bloom filter search is done on the GPU in the development version, and the final affine->normalization->64 bytes step has also moved to the GPU, resulting in an overall speed improvement of about 375,000 keys/s per core.

Up to now, the GPU behaved like a magic wand: giving it the bloom filter work didn't raise GPU load, but it raised the key rate. This can be explained by the time the GPU needs to do the bloom filter search being basically the time it would otherwise need to transfer the hashed data back to the CPU (which does the bloom filter search in the current public version). Same with the affine transformation.

There is nothing left on the CPU except (heavily optimized) EC computations, so any further speed improvement needs to push those to the GPU.
In terms of time, one 16M block currently takes around 6.25 seconds on my machine (if I let it compute 8 blocks, it takes 50 seconds, which mitigates the startup cost).

So I thought I'd emulate what's going on on the CPU and move the code piece by piece. Going backwards, the step before the affine transformation is the Jacobian->affine transformation, where you need to compute the square and the cube of the (inverted) Jacobian Z coordinate and multiply X by the former and Y by the latter. All in all, one field element sqr and 3 FE mul operations.
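For reference, here is a compact sketch of that step, reusing the hrd256k1_fe_sqr/hrd256k1_fe_mul names that appear in the snippets further down; the wrapper name and the assumption that zi already holds 1/Z are mine:

Code:
/* Jacobian -> affine: x = X/Z^2, y = Y/Z^3, with zi = 1/Z precomputed.
   Exactly one field squaring and three multiplications, as stated above. */
static void jacobian_to_affine(hrd256k1_fe *x, hrd256k1_fe *y,
                               const hrd256k1_fe *jx, const hrd256k1_fe *jy,
                               const hrd256k1_fe *zi) {
  hrd256k1_fe zi2, zi3;
  hrd256k1_fe_sqr(&zi2, zi);        /* zi^2          (the 1 sqr) */
  hrd256k1_fe_mul(&zi3, &zi2, zi);  /* zi^3          (mul 1)     */
  hrd256k1_fe_mul(x, jx, &zi2);     /* x = X * zi^2  (mul 2)     */
  hrd256k1_fe_mul(y, jy, &zi3);     /* y = Y * zi^3  (mul 3)     */
}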

I did that with my 128-bit library (based on 64-bit data types) on the GPU and behold: GPU load went to 100% and the time per block went up to 16 seconds! Ugh. Operation successful, patient dead.
-> Back to the drawing board.

Now the same with 32-bit data types currently gives 12% GPU load and 5.4 seconds per block (per CPU core). Very promising, but I'm hitting a little/big-endianness brainwarp hell, so I have to figure out how to do it more elegantly.

Also, the new version will demand a more GPU-heavy approach before I can release it. As the bloom filter search is done on the GPU, an additional 512MB of GPU memory is used per process. Running 4 processes on my Maxwell GPU with its 4GB VRAM is just fine (and as the memory can be freed from the CPU part of the generator, it takes only 100MB of host memory), but I also experienced segmentation faults on the Kepler machines in the Amazon cloud.
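One illustrative way to fail gracefully instead of segfaulting (my sketch, not LBC code) is to query the device's global memory before committing to the per-process bloom filter allocations:

Code:
/* Illustrative guard, not LBC code: check that n processes, each with a
   512MB bloom filter, actually fit into the device's global memory. */
#include <stdio.h>
#include <CL/cl.h>

int bloom_filters_fit(cl_device_id dev, cl_ulong bloom_bytes, unsigned nprocs) {
  cl_ulong global_mem = 0;
  clGetDeviceInfo(dev, CL_DEVICE_GLOBAL_MEM_SIZE,
                  sizeof(global_mem), &global_mem, NULL);
  if ((cl_ulong)nprocs * bloom_bytes > global_mem) {
    fprintf(stderr, "%u bloom filters of %llu MB exceed device memory (%llu MB)\n",
            nprocs, (unsigned long long)(bloom_bytes >> 20),
            (unsigned long long)(global_mem >> 20));
    return 0;
  }
  return 1;
}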

So the goal is really to have one CPU core able to put at least 50% load on one GPU.

It's no small engineering feat, but at the moment LBC is the fastest key generator on the planet (some 20% faster than oclvanitygen), and I believe twice the speed of oclvanitygen is achievable. That's my goal and motivation, and I still have 65% of my GPU capacity left to tap to get there.

Quote
And am I wrong in assuming that even 32-bit is emulated, specifically on Pascal/Maxwell chips? I read the white paper and it says it does half integers too.

I'm not familiar in detail with the specific hardware internals. At the moment I have a Maxwell chip for my testing, and I will tend to support newer architectures/chip families rather than the old stuff. Another way to put it: I will not sacrifice any speed to support that "old" chip from 2009. ;-)

Sidenote:

If anyone wants to be at the true forefront of development and have a great workstation-replacement notebook, buy a Lenovo P50 (or maybe a P51 to be slightly ahead), because that's what I am developing on, and LBC will naturally be slightly tailored to it. E.g. it also has an Intel GPU, which I use for the display. So currently I can work with the notebook basically without any limitations (the Intel graphics are untouched, and as I have the 4 logical cores for my interaction, I can watch videos, browse etc.) while the notebook is churning out 9 Mkeys/s. OK, the fan noise is distracting, because normally the notebook is fine with passive cooling. Wink



Rico
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Quote
What happens when you try to just printf the line you commented out?

Also this: http://stackoverflow.com/questions/1255099/whats-the-proper-use-of-printf-to-display-pointers-padded-with-0s


So lessons learned and progress:

Never try to impose a data size on the GPU that it was not built for. Today's GPUs are 32-bit. Using 64-bit data types carries a performance penalty (the GPU internally transforms them into sequences of 32-bit operations). Moreover, defining your own 128-bit arithmetic library using 64-bit types on the GPU ... will eventually work, after you do things to the GPU that can only be described as brutal, but the GPU will not like it and will show performance consistent with its dislike...
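A minimal illustration of where the penalty comes from (illustrative OpenCL C, not LBC code): even a single 64-bit addition decomposes into two 32-bit additions plus explicit carry handling, and 64-bit multiplications expand into several 32-bit mul/mad instructions on top of that:

Code:
/* What a 32-bit GPU effectively does for one 64-bit add: two 32-bit adds
   plus a carry. (Illustrative OpenCL C; upsample() is the built-in that
   assembles (hi << 32) | lo into a ulong.) */
ulong add64_emulated(uint a_lo, uint a_hi, uint b_lo, uint b_hi) {
  uint lo    = a_lo + b_lo;
  uint carry = (lo < a_lo) ? 1u : 0u;  /* unsigned wraparound signals carry */
  uint hi    = a_hi + b_hi + carry;
  return upsample(hi, lo);
}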

Turns out there is a maximum number of assembler instructions per kernel, and of course I ran into it with my glorious 128-bit GPU library. When you hit it, the kernel simply crashes, or your host application gets a segmentation fault (from the OpenCL library), or nothing happens at all.

printf on the GPU is nothing but a straw for the desperate GPU developer to grasp at.


Back at the drawing board, I'm left with a highly optimized 64-bit ECC library on the CPU and the need for a (highly optimized) 32-bit library on the GPU - at least as long as parts of the computation are on the CPU and parts on the GPU. Sounds like Frankenstein's monster? It is!

Compute with 5x52 field elements on the CPU, push the data to the GPU, convert there from 5x52 to 10x26, then do the 32-bit computations.

But it is surprisingly fast - so far - as the conversion (I hope) is a mere:

Code:
/* Split each 52-bit limb into two 26-bit limbs for the GPU's 32-bit format.
   Note the pair-swapped order: the LOW 26 bits of in->n[0] land in out->n[1]
   and the high bits in out->n[0], and so on for each pair. */
static void hrd256k1_fe_52to26(hrd256k1_fe32 *out, const hrd256k1_fe *in) {
  out->n[1] = in->n[0] & 0x3FFFFFFUL;
  out->n[0] = in->n[0] >> 26;
  out->n[3] = in->n[1] & 0x3FFFFFFUL;
  out->n[2] = in->n[1] >> 26;
  out->n[5] = in->n[2] & 0x3FFFFFFUL;
  out->n[4] = in->n[2] >> 26;
  out->n[7] = in->n[3] & 0x3FFFFFFUL;
  out->n[6] = in->n[3] >> 26;
  out->n[9] = in->n[4] & 0x3FFFFFFUL;
  out->n[8] = in->n[4] >> 26;
}

And the subsequent fe_mul etc. are done using the GPU's native data format. We'll see.
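Results eventually have to travel the other way too. Assuming the same pair-swapped limb order and fully normalized 26-bit limbs (fe_mul outputs may carry overflow bits, so a normalize step would come first), the inverse is just as cheap - my sketch, not LBC code:

Code:
/* Illustrative inverse of hrd256k1_fe_52to26, assuming each in->n[i] is a
   fully normalized 26-bit limb; the cast keeps the shift in 64 bits. */
static void hrd256k1_fe_26to52(hrd256k1_fe *out, const hrd256k1_fe32 *in) {
  out->n[0] = ((ulong)in->n[0] << 26) | in->n[1];
  out->n[1] = ((ulong)in->n[2] << 26) | in->n[3];
  out->n[2] = ((ulong)in->n[4] << 26) | in->n[5];
  out->n[3] = ((ulong)in->n[6] << 26) | in->n[7];
  out->n[4] = ((ulong)in->n[8] << 26) | in->n[9];
}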


Rico
legendary
Activity: 1140
Merit: 1000
The Real Jude Austin
Quote
However, this code doesn't even run the printf when hrd256k1_fe_mul is in place. It does when I comment out the hrd256k1_fe_mul call. [...] Guess what? Same problem (doesn't even printf). Now if I comment out ANY of the r->n[i] = a->n[i] + b->n[i] lines, it works!

What happens when you try to just printf the line you commented out?

Also this: http://stackoverflow.com/questions/1255099/whats-the-proper-use-of-printf-to-display-pointers-padded-with-0s
legendary
Activity: 1120
Merit: 1037
฿ → ∞
Observe this code snippet from the GPU client. It is a small part of the Jacobian -> affine transformation.

I know that hrd256k1_fe_sqr and hrd256k1_fe_mul work correctly. I know that I am getting the right values into my GPU (az, jpubkey).
However, this code doesn't even run the printf when hrd256k1_fe_mul is in place. It does when I comment out the hrd256k1_fe_mul call:

Code:
  hrd256k1_fe_sqr(&zi2, &az);

  apubkey2.infinity = jpubkey.infinity;

  hrd256k1_fe_mul(&apubkey2.x, &jpubkey.x, &zi2);

  printf("GPU %d\nA:%016lx %016lx %016lx %016lx %016lx\nZ:%016lx %016lx %016lx %016lx %016lx\n---\n",
         idx,
         apubkey2.x.n[0],apubkey2.x.n[1],apubkey2.x.n[2],apubkey2.x.n[3],apubkey2.x.n[4],
         apubkey2.y.n[0],apubkey2.y.n[1],apubkey2.y.n[2],apubkey2.y.n[3],apubkey2.y.n[4]
         );

OK, a simple apubkey2 = jpubkey works. So what is it that causes this weird behavior? To investigate, I wrote a small synthetic hrd256k1_fe_mul2:

Code:
static void
hrd256k1_fe_mul2(hrd256k1_fe *r, const hrd256k1_fe *a, const hrd256k1_fe *b) {
  r->n[0] = a->n[0] + b->n[0];
  r->n[1] = a->n[1] + b->n[1];
  r->n[2] = a->n[2] + b->n[2];
  r->n[3] = a->n[3] + b->n[3];
  r->n[4] = a->n[4] + b->n[4];
}

Guess what? Same problem (doesn't even printf). Now if I comment out ANY of the r->n[i] = a->n[i] + b->n[i] lines, it works!
Even if I do

Code:
static void
hrd256k1_fe_mul2(hrd256k1_fe *r, const hrd256k1_fe *a, const hrd256k1_fe *b) {
  r->n[0] = a->n[0]; // + b->n[0];
  r->n[1] = a->n[1] + b->n[1];
  r->n[2] = a->n[2] + b->n[2];
  r->n[3] = a->n[3] + b->n[3];
  r->n[4] = a->n[4] + b->n[4];
}

It still works! What is going on???  Huh

Rico
member
Activity: 62
Merit: 10
Quote
Rico,

You mentioned that someone wanted a way to get notifications of found addresses...

Why not use Pushbullet?

I use it for some other stuff I do and I like it a lot.

Check it out: https://www.pushbullet.com/

Thanks,
Jude

I could use that if Rico could implement it in the main LBC client ... without creating the hook-find things.
legendary
Activity: 1120
Merit: 1037
฿ → ∞
The mechanism for setting or changing a password (=secret) is the same:

Code:
-s oldsecret:newsecret

Obviously, if you already had some password, you are changing it. If you had no password before, you are setting one.

"But what is oldsecret when I am setting?" you may ask.

Simple answer: anything!

So, as was mentioned here already: if you're setting your secret for the first time, just use x (or really anything else) for the oldsecret:

Code:
-s x:newsecret

and later you just give your
Code:
-s newsecret
to identify yourself to the server.



There is this guy from the Centre de Calcul el-Khawarizmi (CCK), Tunisia. The logs say he has tried (so far) 160 wrong passwords for his ID. May this short HowTo help him.


Rico
hero member
Activity: 583
Merit: 502
Thanks a lot, Jude and Shorena! I'll hope for the first one and try the second one Smiley