Handle much larger MH/s rigs : simply increase the nonce size

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: btharper on December 15, 2013, 07:27:35 PM

Quote from: TierNolan on December 14, 2013, 03:57:07 PM

Quote from: tacotime on December 14, 2013, 03:38:16 PM

So... has this issue been resolved? Will it still break all ASICs if they get too fast?

You can send much more complex info to miners now. The "getblocktemplate" rpc call allows the pool to give miners enough info to generate their own headers on the fly.

Seconded. Between Stratum and GBT there really wasn't a problem and one failed to materialize for no reason. Everything's moving along nicely now. Probably an idea worth revisiting in the future if there's another, more important, reason to do a hard-fork (like switching to SHA-512 or an SHA-3 family item) which would break current ASICs due to the massive change in protocol.

Stratum, GBT and "getblocktemplate" have not solved this in any way whatsoever.

As before, each single nonce work item that allows for O(9) tests, is sent to the device.

At the moment, still no miners do any more than this.

The solution I presented a year ago would allow the information sent to the device to be able to produce O(9) more results with just a 32 bit extension to the nonce size - i.e be future proof by simply increasing it by O(9) (32 bits) O(19) (64 bits) or even O(28) (48bits) easily enough

The solution you are implying (that doesn't exist) would be to code the stratum protocol into the mining device, rather than having the extreme simplicity of just having a counter that is larger.

GBT is not even relevant to the topic since having to send up to a megabyte of data to the mining device at least every 30s and delaying work restart after an LP until that data is sent is ridiculous.

btharper

sr. member

Activity: 389

Merit: 250

Quote from: TierNolan on December 14, 2013, 03:57:07 PM

Quote from: tacotime on December 14, 2013, 03:38:16 PM

So... has this issue been resolved? Will it still break all ASICs if they get too fast?

You can send much more complex info to miners now. The "getblocktemplate" rpc call allows the pool to give miners enough info to generate their own headers on the fly.

Seconded. Between Stratum and GBT there really wasn't a problem and one failed to materialize for no reason. Everything's moving along nicely now. Probably an idea worth revisiting in the future if there's another, more important, reason to do a hard-fork (like switching to SHA-512 or an SHA-3 family item) which would break current ASICs due to the massive change in protocol.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: tacotime on December 14, 2013, 03:38:16 PM

So... has this issue been resolved? Will it still break all ASICs if they get too fast?

You can send much more complex info to miners now. The "getblocktemplate" rpc call allows the pool to give miners enough info to generate their own headers on the fly.

tacotime

legendary

Activity: 1484

Merit: 1005

So... has this issue been resolved? Will it still break all ASICs if they get too fast?

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: Gavin Andresen on October 10, 2012, 10:43:56 AM

Quote from: beekeeper on October 10, 2012, 09:56:46 AM

My current rig could count those 4 billions in 30 ms or less. That's less than usb handshake to xfer the new payload will take in average.

... so transfer 1,000 headers with different extranonces in one handshake. Or don't handshake every time. Or modify the firmware that speaks whatever mining protocol you're sending at it to do the increment-extranonce-and-recompute-the-merkle-root-thing itself.

All of this discussion is useless; even if you could convince us core developers that we need A HARD FORK RIGHT THIS VERY MINUTE! there is absolutely zero chance we could make that happen before the ASICS start shipping.

So: plan accordingly.

Yes - plan ... that's the point of this thread ... but not of the "We can't ever do a hard fork" devs Tongue

... and that your post clearly shows a lack of understanding 'planning'.

Plan for a future hard fork to allow a 2nd version block header.
The issue is obvious, the cause is obvious, the path to ultimately fix it is obvious, but then you make this stupid statement
"A HARD FORK RIGHT THIS VERY MINUTE"
... who said that? (other than you)

Though - as has already been pointed out - this issue came up 2 years ago ... yeah a lot of foresight back then ignoring it Tongue

No doubt nothing was learned from that mistake.

slush

legendary

Activity: 1386

Merit: 1097

Quote from: beekeeper on October 10, 2012, 11:37:52 AM

Ofc, it can be done like that, but again, we are back at start, I have to overdesign chip controllers to cover extra bandwidth.

It is nice to see you agree that the only problem is sub-optimal software in miners.

a) Optimize wrongly designed mining software
or
b) Hard fork of whole bitcoin network?

I vote for b)

Btw it would be quite comfortable for me to agree with you. With bigger nonce range I won't need to switch to Stratum protocol, because getwork protocol would be good enough for ages. But still I decided that there's much easier solution than trying to do hard fork, so I just re-designed protocol. Btw it isn't easy, I'm working on it almost two months on full time.

Stop calling optimizations "work arounds" and let's go back to work.

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

Quote from: Gavin Andresen on October 10, 2012, 10:43:56 AM

Quote from: beekeeper on October 10, 2012, 09:56:46 AM

My current rig could count those 4 billions in 30 ms or less. That's less than usb handshake to xfer the new payload will take in average.

... so transfer 1,000 headers with different extranonces in one handshake. Or don't handshake every time. Or modify the firmware that speaks whatever mining protocol you're sending at it to do the increment-extranonce-and-recompute-the-merkle-root-thing itself.

All of this discussion is useless; even if you could convince us core developers that we need A HARD FORK RIGHT THIS VERY MINUTE! there is absolutely zero chance we could make that happen before the ASICS start shipping.

So: plan accordingly.

Bulk transactions are for video streaming. Anything derived from financial transactions should have every byte handshaked. Cheesy

Ofc, it can be done like that, but again, we are back at start, I have to overdesign chip controllers to cover extra bandwidth.
BTW: I never intended to start arguing, its rather, as I said, an annoyance I wanted to talk about.

btharper

sr. member

Activity: 389

Merit: 250

Quote from: kjj on October 10, 2012, 10:48:11 AM

Quote from: beekeeper on October 10, 2012, 09:56:46 AM

Quote from: gmaxwell on October 10, 2012, 09:22:06 AM

Quote from: beekeeper on October 10, 2012, 02:54:11 AM

Its not misdesign, its trade-off. The 32 bit nonce is old design, probably it looked great when only CPUs where doing mining, but right now it looks like a bottleneck.

It doesn't not look like a bottleneck. It provides a factor of four billion in reduction in whatever serial task exists outside of that. I have yet to see any evidence that it's a bottleneck.

My current rig could count those 4 billions in 30 ms or less. That's less than usb handshake to xfer the new payload will take in average.

You are getting 140+ Ghash/sec on a single device? Today? In reality?

+1

Most devices don't process one hash from 0 to 4 billion in (total hashrate/4 billion), they run several processes in parallel across several chips. So each of your 20 onboard chips are each (relatively) slowly chewing through work and needs a new piece of work every several hundred milliseconds.

Also as Gavin said, send more than one piece of work for each handshake. Between longpoll and everything else that's setup, I don't see any reason not to have several pieces of work queued up and ready to go on the device side of the USB cord.

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: beekeeper on October 10, 2012, 09:56:46 AM

Quote from: gmaxwell on October 10, 2012, 09:22:06 AM

Quote from: beekeeper on October 10, 2012, 02:54:11 AM

Its not misdesign, its trade-off. The 32 bit nonce is old design, probably it looked great when only CPUs where doing mining, but right now it looks like a bottleneck.

It doesn't not look like a bottleneck. It provides a factor of four billion in reduction in whatever serial task exists outside of that. I have yet to see any evidence that it's a bottleneck.

My current rig could count those 4 billions in 30 ms or less. That's less than usb handshake to xfer the new payload will take in average.

You are getting 140+ Ghash/sec on a single device? Today? In reality?

Gavin Andresen

legendary

Activity: 1652

Merit: 2316

Chief Scientist

Quote from: beekeeper on October 10, 2012, 09:56:46 AM

My current rig could count those 4 billions in 30 ms or less. That's less than usb handshake to xfer the new payload will take in average.

... so transfer 1,000 headers with different extranonces in one handshake. Or don't handshake every time. Or modify the firmware that speaks whatever mining protocol you're sending at it to do the increment-extranonce-and-recompute-the-merkle-root-thing itself.

All of this discussion is useless; even if you could convince us core developers that we need A HARD FORK RIGHT THIS VERY MINUTE! there is absolutely zero chance we could make that happen before the ASICS start shipping.

So: plan accordingly.

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

Quote from: gmaxwell on October 10, 2012, 09:22:06 AM

Quote from: beekeeper on October 10, 2012, 02:54:11 AM

Its not misdesign, its trade-off. The 32 bit nonce is old design, probably it looked great when only CPUs where doing mining, but right now it looks like a bottleneck.

It doesn't not look like a bottleneck. It provides a factor of four billion in reduction in whatever serial task exists outside of that. I have yet to see any evidence that it's a bottleneck.

My current rig could count those 4 billions in 30 ms or less. That's less than usb handshake to xfer the new payload will take in average.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: beekeeper on October 10, 2012, 02:54:11 AM

Its not misdesign, its trade-off. The 32 bit nonce is old design, probably it looked great when only CPUs where doing mining, but right now it looks like a bottleneck.

It doesn't not look like a bottleneck. It provides a factor of four billion in reduction in whatever serial task exists outside of that. I have yet to see any evidence that it's a bottleneck.

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

Quote from: gmaxwell on October 09, 2012, 10:41:32 AM

Quote from: beekeeper on October 09, 2012, 10:04:38 AM

kano is right. I did encounter this bandwidth issue, its somehow the same annoyance for hardware developers as it was diff1 and multi GHs miners network bandwidth consumption for pool operators. Unfortunately, solving hardware bandwidth issues is not as simple as a workaround in software, most of the time it requires changing design and adding extra costs.

Then don't mis-design your hardware in the first place. Seriously, if you can't do the small bit of multiplication to figure out what your bandwidth requirements will be then you _have no business making mining hardware_, operating pools, etc... Nothing suggested avoids changing design and adding costs. At best the suggestions externalize cost on future bitcoin users.

Its not misdesign, its trade-off. The 32 bit nonce is old design, probably it looked great when only CPUs where doing mining, but right now it looks like a bottleneck.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Depends on the NDA when I get one of the first ones

btharper

sr. member

Activity: 389

Merit: 250

Quote from: kano on October 10, 2012, 12:23:04 AM

Nope.

It's placing data in 96 bits of zeros ... as I said ... and it is still inside the 3rd round that already exists.
That's not a new round I'm adding - it's already there.
3 rounds is not a change, I'm simply pointing out what a lot of people don't even realise is there already.
... and ... that most of the data added in the 2nd round is 'zero'

If I am correct in understanding Inaba's comments about the BFL ASIC, they may even support this possibility already.
(i.e. if they do a proper full 64 round hash each time and the silicon doesn't decide how that is processed - rather the replaceable firmware)

---

Also, as I said, you already need to consider 2 orders of magnitude in dealing with the hardware ... before ASIC turns up.
ASIC is less than 2 months away and that's just version 1 of anyone's ASIC hardware.
... which is another order of magnitude at the absolute least.

Considering a forward planning change set to arrive in 2 years (or even one year) - who knows what the hashing performance will be by then.

Consider an old ATI 6950 graphics card - it has the equivalent of 1408 complex cores in it (integer shaders)
If version 1 of the ASIC has the equivalent of say only 100 and the complexity is well below that of an ATI core, then performance gains could easily be 128x in just the next generation of ASIC ... yeah I chose that 128 number on purpose

---

Unless you have something new to bring to the discussion, I think this one has ended.

What I was saying is that you could add just over 300 bits of to the current data structure without messing with the three total rounds performed (the same three we have now).

Right now the move to bigger and better is cost prohibitive. While BFL and competitors can crank out massive machines doing in excess of a TH/s, getting MORE will cost more. Unless a giant like AMD gets into things, it won't be as cheap as a $200 graphics card.

Also in two years we'll be halfway through the 25 BTC blocks. At a quarter of the current reward (in 4 years) mining will be much less profitable (assuming BTC value doesn't shoot up, and I don't think expecting it to double each time the reward halves is reasonable).

Things are getting faster. Much faster. So what? It just sounds like panic to me. I do think we've reached the limits of what we can argue about though. If nothing else we're not getting anywhere.

Any chance we can get more details on the flexibility of the new ASICS?

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: btharper on October 09, 2012, 11:39:58 PM

...
The example you give is probably the biggest reason this won't be put in there anytime soon. While allocating even another 300 or so bits (to keep things in three total rounds of SHA256) would be just fine by me, the difficulty involved in making such a change in very basic levels of the protocol. If we ever have to move away from SHA256 I think that would be a good time to do this, but right now the biggest problem is latency between pool servers and the other side of the USB cord. Getting things across the USB cord is becoming a problem and will need to be addressed by hardware manufactures. A few implementation details (GBT from getwork) will change, and things will keep on keeping on. Yea there's a bit of optimism in that, but I don't feel it's unwarranted.

Nope.

It's placing data in 96 bits of zeros ... as I said ... and it is still inside the 3rd round that already exists.
That's not a new round I'm adding - it's already there.
3 rounds is not a change, I'm simply pointing out what a lot of people don't even realise is there already.
... and ... that most of the data added in the 2nd round is 'zero'

If I am correct in understanding Inaba's comments about the BFL ASIC, they may even support this possibility already.
(i.e. if they do a proper full 64 round hash each time and the silicon doesn't decide how that is processed - rather the replaceable firmware)

---

Also, as I said, you already need to consider 2 orders of magnitude in dealing with the hardware ... before ASIC turns up.
ASIC is less than 2 months away and that's just version 1 of anyone's ASIC hardware.
... which is another order of magnitude at the absolute least.

Considering a forward planning change set to arrive in 2 years (or even one year) - who knows what the hashing performance will be by then.

Consider an old ATI 6950 graphics card - it has the equivalent of 1408 complex cores in it (integer shaders)
If version 1 of the ASIC has the equivalent of say only 100 and the complexity is well below that of an ATI core, then performance gains could easily be 128x in just the next generation of ASIC ... yeah I chose that 128 number on purpose

---

Unless you have something new to bring to the discussion, I think this one has ended.

btharper

sr. member

Activity: 389

Merit: 250

Quote from: kano on October 09, 2012, 06:28:48 PM

Firstly regarding the change.

The current hash is actually 3 passes though the 64 stage hash function (2 x sha256 but the 1st sha256 is 80 bytes so requires 2 x 64 stages)
The first pass is unaffected by rolling the nonce value or any added data after it up to a certain size (and also would be unaffected by rolling the time value ... there you go hardware people - implement the Roll-NTime hack in there with a difficulty input also - but that would be risky ... and is short term)

Anyway, the size increase change would be to have a version 2 (or 3 or whatever is next at the time) block with the nonce field having 12 extra bytes (96 bits) added after the current nonce (which is currently on the end of the 80 bytes) making the block header 92 bytes.

The rolling would one of:
1) anywhere the hardware likes in the 128 bits that suited the hardware design best
OR
2) a subset given by a pool to allow the pool to avoid having to do any other major work than generate new merkle trees every time the transaction list needs updating (all the time

)
I'd also add that the pool should be setting work difficulty high enough to appropriately (ask organofcorti) minimise work returned and work lifetime small enough to reduce transaction delays

What this change would mean is that ANY device designed with this option could be given work (and difficulty setting) and 'could' be given a time frame and then the device could hash up to the time frame and then stop, independent of the performance of the device (of course as some of the hardware vendors know, returning valid nonce's during hashing is the best way to give answers back)

Thus we also wouldn't have the other problem that the hacks cause:
Extra delays in Bitcoin transaction processing
since the time frames could be set to an appropriate value to minimise this effect (seconds) and ALL devices could guarantee that they could work for that (short) time frame and the mining software could spend that time processing results and setting up the next work ... as they all should (already Tongue

)

---

Meanwhile, it is interesting to see people shift the problem around
e.g. slush saying it is the mining hardware problem - the hardware should have a faster 'wire' to deal with the problem - but it could rightfully be argued that it is also the pools problem - they should have faster hardware and better networks as required ... both are valid

This solution solves both

I will also give a very good example of how such arguments about the hardware ignore obvious limitations:
Xiangfu (as anyone with an Icarus should know who he is) had 91 USB Icarus devices connected to a single computer and a single cgminer instance running (with plenty of CPU to spare

) ... so with such a setup, you have to consider 2 orders of magnitude in performance ...............

My solution is a long term solution that (of course) is not going to happen today, but it seems the dev fear of hard forks any time in the distant future (and the fact mentioned by someone else that this nonce issue was brought up 2 years ago) probably means it will never be fixed.

No idea if there is much else to say, but I guess if the discussion has no more merit (due to this last point) then this is a far as it will go.

---

Aside: there is a well known bug in the bitcoin difficulty calculation that gets it wrong (always) but since it's wrong by a small % and that it requires a hard fork, it has never been fixed. Yes, how much you get paid for your BTC mining work, is actually always a faction low

I make this comment so as to shed more light on other comments about hard forks

The example you give is probably the biggest reason this won't be put in there anytime soon. While allocating even another 300 or so bits (to keep things in three total rounds of SHA256) would be just fine by me, the difficulty involved in making such a change in very basic levels of the protocol. If we ever have to move away from SHA256 I think that would be a good time to do this, but right now the biggest problem is latency between pool servers and the other side of the USB cord. Getting things across the USB cord is becoming a problem and will need to be addressed by hardware manufactures. A few implementation details (GBT from getwork) will change, and things will keep on keeping on. Yea there's a bit of optimism in that, but I don't feel it's unwarranted.

slush

legendary

Activity: 1386

Merit: 1097

Slush is saying that it is mining *software* problem. Even with future ASIC miners it will be feasible to prepare block headers on decent computer. If there is any real problem, then it is just using obsolete protocols between computer and mining device.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Firstly regarding the change.

The current hash is actually 3 passes though the 64 stage hash function (2 x sha256 but the 1st sha256 is 80 bytes so requires 2 x 64 stages)
The first pass is unaffected by rolling the nonce value or any added data after it up to a certain size (and also would be unaffected by rolling the time value ... there you go hardware people - implement the Roll-NTime hack in there with a difficulty input also - but that would be risky ... and is short term)

Anyway, the size increase change would be to have a version 2 (or 3 or whatever is next at the time) block with the nonce field having 12 extra bytes (96 bits) added after the current nonce (which is currently on the end of the 80 bytes) making the block header 92 bytes.

The rolling would one of:
1) anywhere the hardware likes in the 128 bits that suited the hardware design best
OR
2) a subset given by a pool to allow the pool to avoid having to do any other major work than generate new merkle trees every time the transaction list needs updating (all the time

)
I'd also add that the pool should be setting work difficulty high enough to appropriately (ask organofcorti) minimise work returned and work lifetime small enough to reduce transaction delays

What this change would mean is that ANY device designed with this option could be given work (and difficulty setting) and 'could' be given a time frame and then the device could hash up to the time frame and then stop, independent of the performance of the device (of course as some of the hardware vendors know, returning valid nonce's during hashing is the best way to give answers back)

Thus we also wouldn't have the other problem that the hacks cause:
Extra delays in Bitcoin transaction processing
since the time frames could be set to an appropriate value to minimise this effect (seconds) and ALL devices could guarantee that they could work for that (short) time frame and the mining software could spend that time processing results and setting up the next work ... as they all should (already Tongue

)

---

Meanwhile, it is interesting to see people shift the problem around
e.g. slush saying it is the mining hardware problem - the hardware should have a faster 'wire' to deal with the problem - but it could rightfully be argued that it is also the pools problem - they should have faster hardware and better networks as required ... both are valid

This solution solves both

I will also give a very good example of how such arguments about the hardware ignore obvious limitations:
Xiangfu (as anyone with an Icarus should know who he is) had 91 USB Icarus devices connected to a single computer and a single cgminer instance running (with plenty of CPU to spare

) ... so with such a setup, you have to consider 2 orders of magnitude in performance ...............

My solution is a long term solution that (of course) is not going to happen today, but it seems the dev fear of hard forks any time in the distant future (and the fact mentioned by someone else that this nonce issue was brought up 2 years ago) probably means it will never be fixed.

No idea if there is much else to say, but I guess if the discussion has no more merit (due to this last point) then this is a far as it will go.

---

Aside: there is a well known bug in the bitcoin difficulty calculation that gets it wrong (always) but since it's wrong by a small % and that it requires a hard fork, it has never been fixed. Yes, how much you get paid for your BTC mining work, is actually always a faction low

I make this comment so as to shed more light on other comments about hard forks

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: beekeeper on October 09, 2012, 10:04:38 AM

kano is right. I did encounter this bandwidth issue, its somehow the same annoyance for hardware developers as it was diff1 and multi GHs miners network bandwidth consumption for pool operators. Unfortunately, solving hardware bandwidth issues is not as simple as a workaround in software, most of the time it requires changing design and adding extra costs.

Then don't mis-design your hardware in the first place. Seriously, if you can't do the small bit of multiplication to figure out what your bandwidth requirements will be then you _have no business making mining hardware_, operating pools, etc... Nothing suggested avoids changing design and adding costs. At best the suggestions externalize cost on future bitcoin users.

Topic: Handle much larger MH/s rigs : simply increase the nonce size (Read 10091 times)