The risk with brain-wallets (eg BIP 038 with no EC multiply, or even with EC multiply if the manufacturer is not that trustworthy) where the ECSA private key is computed from password is that the passwords can be ground and if successful the funds can be stolen. So clearly its desirable to use key-stretching for brain-wallets and this is already done with Scrypt or PBKDF2. However a limitation with key stretching is it incurs computational load on the client, which maybe a smart-phone or single desktop class machine. eg 16384 scrypt iterations are suggested in BIP 038, chosen to be fast enough to tolerate in javascript.
So it would be desirable to have a secure server offloadable KDF, which means a kind of blindable deterministic proof-of-work. I described one such proof-of-work in this post (relating to blind-hashcash a different but in hind-sight related topic, where you in addition need a transferable publicly auditable proof of work):
An RSA based blindable (secure) work offload function:
public params:
n=pq (primes p & q deleted at setup)
g=shared generator
e=2^(2^w)-1 ie a big, big number w is work factor
y=g^e mod n (generated cheaply at setup, or computable one-off cost afterwards)
blind:
m=message
b=random blinding factor
r=g^b*m (broacast r to miners)
work:
s=r^e mod n (expensive because e is big and carm(n)=(p-1)(q-1)/2 is unknown)
unblind:
u=y^b (unblinding factor)
m^e = s/u (as s/y^b=r^e/g^{be}=g^{be}*m^e/g^{be})
So if we call that can be use as a blind deterministic password-based proof of work: we could set message = password, or H(password), blind by random factor g^b, and have the server compute blind-pbkdf( password ) for us with some large w that we cant afford to do on our smart-phone or laptop because itd be too slow.
The above work function is basically a blinded version of Rivest et al's time-lock puzzle (the time-lock puzzle desires non-parallelizabiiity as the idea is to intentionally encrypt something for the future, where you cant speed it up by using multiple cores.)
The fact that it is non-parallelizable is actually a disadvantage for a blind KDF, it means the speed up is limited to the fastest single-core offload server. Another simple parallelizable time-lock is proposed in the time-lock paper which is simply to symmetrically encrypt and discard say 40 of the key-bits (this is also the model used by Juels & Brainard for their client-puzzles proof of work which is somewhat hashcash-like but has no public auditability). However that is not blindable as its not an algebraic form.
But it is easy to make an intentionally parallelizable instance by say 128 server cores (16x 8 core servers, or an even more impressive core count GPU server farm), use a smaller e value to take eg 10 minutes on a 1Ghz GPU core (whatever your transaction delay tolerance is), then stretch the password using eg PBKDF2 and 1 iterations, null salt, into 128-values worth of pseudo-random output call those m1..m128. ie (m1||..||m128)=PBDF2(1,"",password). Now create 128 cryptographically random (non-deterministic RNG) challenges b1...b128, the ECDSA key is x=m1^e+...+m128^e mod n, which is fast to compute when you know d (before deleting p, q, and d). Each offload message is r_i = g^b_i*m1 mod n, and the respective unblind u_i=y^b_i mod n. and the key is k = s1/u1+...+s128/u128 mod n.
Unfortunately the user does need to retain g, y & n (or publish them in a hard to censor location, and keep the hash c=H(g,y,n) as a public fingerprint, or embed that in their coin on the block chain, because if the user relied on the offload server to provide g,y,n the server could provide a g,n where g has a small sub-group, allowing the search-space of blinding factor to be greatly reduced. The few remaining candidate password hashes could then be run through the KDF with the real n.
The user can even create a pre-signed message transferring ownership from key Q1=x1*G to new key Q2=x2*G, with bigger work parameters on x2, that it requests the server operator, or trusted party to release when the available compute farm sizes increase and compute becomes cheaper. If locktime worked properly, you could even broadcast that transaction with a locktime 1 year into the future, and rely on the bitcoin network to automatically update your security margin over time.
So a user protecting a $10,000 brain-wallet bitcoin investment might say use $10 worth of GPU time (at amazon prices of $2.10 per 400core tesla GPU hour thats 100,000 fermi core seconds or 714 fermi card seconds. According to the bitcoin wiki mining comparison an S2070 is 4 tesla cores, and does 750MH so say 187.5MH/tesla second is about 37-bits in entropy compared to PBKDF2 rounds. If you have a 40-bit entropy password that takes you from at risk of being GPU brain-wallet mined ($80 worth of grinding) to implausibly uneconomical border line not feasible with any resources for the mid-term. Of course there are faster (AMD not Nvidia) and cheaper ways to grind than amazon. Eg bitcoin mining pool payout probably charges a lot less. Maybe it would be something for CPU & GPU miners to do as an alternative to vanity address mining or primecoin/litecoin mining.
So in summary you in a javascript client, or puny cell processor can create an arbitrarily hard to undo KDF with no practical CPU cost at setup time. You can offload it securely later to a server, and you are not relying on the server to be around - the information is public (on the blockchain) the "server" is replaceable and stateless, and could even be a bitcoin core feature (pay CPU/GPU miners and other users small fees to help you decrypt your brainwallet). This is parallelizable so it should be easy to add 40-bits of key-stretching or more that would be really expensive even on a high end PC with top of the line graphics card, to do that from a smart phone.
Probably some more things that could be done eg combine with secret sharing so you can detect and eliminate defective work answers, or perhaps find a way to also have signed proof of work that somehow is easily verifiable without introducing a password verification crib.
Adam