This is a fundamental problem with any brainwallet algorithm. An attacker who has X times the computing power you have can try X possible passwords. If an attacker has a billion times the computing power you have, you need to make sure your passphrase isn't one of the first billion he would guess. This is, as far as I know, a fundamental limitation of a deterministic brainwallet algorithm that no known technique can avoid.
This can be partially mitigated by using an algorithm which is difficult to optimize. bcrypt attempts to achieve this.
IMHO SHA-256 is the worst possible choice. The user is likely generating the private key on a personal computer ill suited for a high number of rounds. Mining has created a great incentive in improving the hashing efficiency (MH/$ and MH/W) such that the multiple between a major hashing farm and the user is already huge. The rise of ASIC hardware will increase the potential resources of an attacker by another order of magnitude.
Given the user can wait for some significant but not unreasonably long period of time the use of something like bcrypt would seem to be preferable. GPU acceleration of bcrypt is generally poor (although FPGA acceleration is not as adversely affected). Scrypt attempts to hinder acceleration on both GPU and FPGAs.
TL/DR version. The attacker will always have more hashing power than the user. The goal is to keep the attackers throughput as low as possible (while still keeping the algorithm functional for end user). This can be achieved by:
a) more rounds. If the user only needs <1 private key per second then make it take a full second to generate.
b) salt. prevents parallel and precomputation attacks.
c) algorithm optimization. Use algorithm which is fast for user but not much faster (or optimally no faster) for the attacker.
Hypothetical numbers. Using SHA-256 the user selects a strength which will take <1 second to generate. That would be ~1 million iterations (assume implementation in code is roughly similar to throughput of CPU miners).
An attacker with a single GPU could attempt ~500 million to 1 billion passwords per second.
A major hashing farm (100 GH/s double bitcoin hashes) could attempt 200 billion passwords per second.
A 10TH/s (double bitcoin hashes) could attempt 10 trillion passwords per second.
Against that kind of brute force virtually all passphrase are going to fall. The throughput of the attacker must be reduced. We can use salt but if it is determined by user input it will have relatively low entropy. We can use more rounds but the multiple of attackers throughput is so high that even having user wait 10 seconds per private key "only" keeps a major ASIC hashing farm at 1 trillion passwords per second.
An algorithm which prevents re-use of the massive optimization in bitcoin mining for attacking passphrase is essential.While bcrypt can't protect painfully weak passwords the combination of deterministic salt derived from user responses and a passphrase requirement of at least 8 characters, and a large number of rounds can limit the attacker to an ineffective throughput (baring dedicated bcrypt processors) which makes brute force attacks infeasible or at least uneconomical.