Firstly regarding the change.
The current hash is actually 3 passes through the 64 stage hash function (2 x sha256, but the 1st sha256 is over 80 bytes so it requires 2 x 64 stages)
The first pass is unaffected by rolling the nonce value or any added data after it up to a certain size (and also would be unaffected by rolling the time value ... there you go hardware people - implement the Roll-NTime hack in there with a difficulty input also - but that would be risky ... and is short term)
Anyway, the size increase change would be to have a version 2 (or 3 or whatever is next at the time) block with the nonce field having 12 extra bytes (96 bits) added after the current nonce (which is currently on the end of the 80 bytes) making the block header 92 bytes.
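To put numbers on the pass count, here is a rough sketch of the sha256 padding arithmetic (the function names are just mine for illustration). sha256 works on 64 byte blocks and pads the message with one 0x80 byte plus an 8 byte length field, so both the 80 byte header and the proposed 92 byte header need 2 blocks, and the second sha256 over the 32 byte digest needs 1 more. The time value starts at byte 68 and the nonce at byte 76, so both sit in the second block and the first pass never sees them.

```python
def sha256_blocks(message_len: int) -> int:
    """Number of 64-byte compression passes SHA-256 needs for a message."""
    # Padding adds one 0x80 byte plus an 8-byte length field, then
    # zero-fills up to the next multiple of 64 bytes.
    return (message_len + 1 + 8 + 63) // 64


def passes_for_header(header_len: int) -> int:
    """Total passes for sha256(sha256(header))."""
    # The second sha256 hashes the 32-byte digest of the first.
    return sha256_blocks(header_len) + sha256_blocks(32)


# Largest header that still fits in two blocks for the first sha256.
MAX_TWO_BLOCK_HEADER = 2 * 64 - 1 - 8

print(passes_for_header(80))   # current 80 byte header
print(passes_for_header(92))   # proposed 92 byte header
print(MAX_TWO_BLOCK_HEADER)    # the "certain size" limit, in bytes
```

So the extra 12 bytes cost no extra passes, and the limit before a 4th pass is needed is a 119 byte header.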
The rolling would be one of:
1) anywhere the hardware likes in the 128 bits that suited the hardware design best
OR
2) a subset given by a pool to allow the pool to avoid having to do any other major work than generate new merkle trees every time the transaction list needs updating (all the time)
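The two options above can be sketched as one scheme. This is only an illustration under my own assumptions: the pool fixes the top bits of the 128 bit nonce as a prefix and the device rolls the rest, and the nonce is stored little-endian like the other header fields. Neither detail is part of any spec, and the names nonce_range and build_header are hypothetical. A prefix of 0 bits is option 1, where the hardware can roll anywhere.

```python
NONCE_BITS = 128  # 32 existing bits + 96 proposed extra bits


def nonce_range(prefix: int, prefix_bits: int) -> tuple:
    """Return the (first, last) nonce in the subset a pool hands out.

    Hypothetical scheme: the pool fixes the top `prefix_bits` bits and
    the device rolls everything below them.
    """
    free_bits = NONCE_BITS - prefix_bits
    first = prefix << free_bits
    last = first | ((1 << free_bits) - 1)
    return first, last


def build_header(prefix76: bytes, nonce: int) -> bytes:
    """Append a 16-byte nonce to the 76 header bytes that precede it."""
    if len(prefix76) != 76:
        raise ValueError("expected the 76 bytes before the nonce")
    # Little-endian byte order is an assumption for this sketch.
    return prefix76 + nonce.to_bytes(16, "little")


# A pool giving out 16-bit prefixes leaves each device 112 bits to roll.
first, last = nonce_range(prefix=5, prefix_bits=16)
header = build_header(bytes(76), first)
print(len(header))
```

With something like this the pool only has to hand out a different prefix per device for the same merkle root.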
I'd also add that the pool should be setting work difficulty high enough to appropriately (ask organofcorti) minimise work returned and work lifetime small enough to reduce transaction delays
What this change would mean is that ANY device designed with this option could be given work (and a difficulty setting) and 'could' be given a time frame, and then the device could hash up to the time frame and then stop, independent of the performance of the device (of course, as some of the hardware vendors know, returning valid nonces during hashing is the best way to give answers back)
Thus we also wouldn't have the other problem that the hacks cause:
Extra delays in Bitcoin transaction processing
since the time frames could be set to an appropriate value to minimise this effect (seconds) and ALL devices could guarantee that they could work for that (short) time frame and the mining software could spend that time processing results and setting up the next work ... as they all should (already)
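The time framed work described above could look something like this from the device side. It is a software sketch only, not how any real hardware or cgminer does it: the function name and parameters are hypothetical, and the 16 byte little-endian nonce is the same assumption as the proposed 92 byte header. It hashes until the time frame is up, whatever the speed of the device, and hands back each valid nonce as soon as it is found.

```python
import hashlib
import time


def hash_for_time_frame(prefix76, target, time_frame, first_nonce=0):
    """Yield nonces whose double-sha256 meets `target` until time runs out.

    `prefix76` is the 76 header bytes before the nonce; the nonce is the
    proposed 16-byte field, so each header hashed is 92 bytes.
    """
    deadline = time.monotonic() + time_frame
    nonce = first_nonce
    while time.monotonic() < deadline:
        header = prefix76 + nonce.to_bytes(16, "little")
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        # Bitcoin compares the digest as a little-endian 256-bit integer.
        if int.from_bytes(digest, "little") <= target:
            yield nonce  # report the result as soon as it is found
        # Wrap around rather than overflow the 128-bit field.
        nonce = (nonce + 1) % (1 << 128)


# Example: work for 2 seconds at a deliberately easy target.
easy_target = (1 << 256) >> 12
for found in hash_for_time_frame(bytes(76), easy_target, time_frame=2.0):
    pass  # the mining software would submit `found` here
```

A slow device just covers less of the nonce range in the same time frame, so the mining software never has to size the work to the device.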
---
Meanwhile, it is interesting to see people shift the problem around
e.g. slush saying it is the mining hardware's problem - the hardware should have a faster 'wire' to deal with the problem - but it could rightfully be argued that it is also the pool's problem - they should have faster hardware and better networks as required ... both are valid
This solution solves both
I will also give a very good example of how such arguments about the hardware ignore obvious limitations:
Xiangfu (anyone with an Icarus should know who he is) had 91 USB Icarus devices connected to a single computer and a single cgminer instance running (with plenty of CPU to spare) ... so with such a setup, you have to consider 2 orders of magnitude in performance ...
My solution is a long term solution that (of course) is not going to happen today, but it seems the dev fear of hard forks any time in the distant future (and the fact mentioned by someone else that this nonce issue was brought up 2 years ago) probably means it will never be fixed.
No idea if there is much else to say, but I guess if the discussion has no more merit (due to this last point) then this is as far as it will go.
---
Aside: there is a well known bug in the bitcoin difficulty calculation that gets it wrong (always), but since it's wrong by a small % and fixing it requires a hard fork, it has never been fixed. Yes, how much you get paid for your BTC mining work is actually always a fraction low
I make this comment so as to shed more light on other comments about hard forks