At the moment, the client precomputes a table with 4096 (uncompressed public keys) with basically this code:
int
hrdec_pubkey_batch_incr(uchar (*pub)[65],
uchar start[32]) {
unsigned int i;
hrdec_mult_gen2(&batchpj[0], start);
for (i = 1; i < 4096; ++i) {
hrdec_gej_add_ge(&batchpj[i], &batchpj[i-1], &incr_a); // increment public key
}
hrdec_ge_set_all_gej_new(batchpa, batchpj); // convert all jacobian coordinates to affine
for (i = 0; i < 4096; ++i) { // compute uncompressed public keys
pub[i][0] = 0x04;
secp256k1_fe_normalize_var(&batchpa[i].x);
hrdec_fe_get_b32(pub[i] + 1, &batchpa[i].x);
secp256k1_fe_normalize_var(&batchpa[i].y);
hrdec_fe_get_b32(pub[i] + 33, &batchpa[i].y);
}
................
for (i = 0; i < 4096; i++) {
r[i].infinity = a[i].infinity;
secp256k1_ge_set_gej_zinv(&r[i], &a[i], &az[i]);
}
free(az);
}
Although I have already optimized the sh*t out of this code, I still feel it's way too clumsy as is and potentially could be improved a lot.
If anyone has any suggestions, I'm all ears.
The
code from libsecp256k1 you are using has a different purpose than yours:
The process of cracking Bitcoin brain wallets is to repeatedly generate public keys using guessed passwords.
Key generation method as we described in 2.1.1, is to compute Q = dP . Here d is a SHA256 hash of the generated password,
P is a xed point which is the base point G for secp256k1...
The time cost for computing one public key given a random private key takes : 47.2 us.
sourceYou don't have to generate
one public key given a random private key, neither do you generate a lot of public keys from a lot of sparse random keys, but you have to generate a batch of "consecutive" public keys, a very more specific (and less expensive) task. Furthermore you don't have to worry about side-channel attack and other security issues.
How do you generate each public key (in batchpj)? And how do you convert jacobian coordinates to affine (hrdec_ge_set_all_gej_new(batchpa, batchpj))? How many operations (multiplication, inverse) are involved?
Once you have k*G , you have to add only 1*G in each step: k*G --> (k+1)*G --> (k+2)*G, .... so the effort should be focused on optimizing the addition '+G':
minimizing number of operations for each addition '+G'The computation of a multiplicative inverse is the most time consuming operation (1 inverse = 22 multiplication - see
table 2 ), so I would start cutting there:
goal: less #inverse for each public key or better: less #inverse for each address (IA)
---------------------------------------------------------------------------------------------------------------------------------
For example, given a,b elements of the field:
a*b, inv(a*b) --> inv(a)=b*inv(a*b), inv(b)=a*inv(a*b) --> 1 inverse + 3 multiplication instead of 2 inverse
So:
private key k --> k*G = (X1,Y1,Z1) (jacobian coordinates) no inverse
private key k+1 --> k*G
+ G = (X1,Y1,Z1)+
(XG,YG,1)=(X2,Y2,Z2) (jacobian coordinates) no inverse
Z=Z1*Z2, inv(Z) --> inv(Z1)=Z2*inv(Z), inv(Z2)=Z1*inv(Z) only 1 inverse
jacobian coordinates (X1,Y1,Z1) --> (X1*inv(Z1)^2, Y1*inv(Z1)^3) affine
jacobian coordinates (X2,Y2,Z2) --> (X2*inv(Z2)^2, Y2*inv(Z2)^3) affine
2 private keys --> 2 public keys (only 1 inverse) --> IA = 1/2 = 0.5
-----------------------------------------------------------------------------------------------------------------------------
better:
2 private keys --> 4 private keys (k, n-k, k+1, n-k-1) --> 4 uncompressed public keys (only 1 inverse) --> IA = 0.25
---------------------------------------------------------------------------------------------------------------------------------------------------
much better:
2 private keys --> 4 private keys --> 8 public keys, 4 compressed and 4 uncompressed, 8 addresses (only 1 inverse) --> IA=0.125