[Announcement] Avalon ASIC Development Status [Batch #1] - page 20.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: fpf on December 19, 2012, 05:42:27 AM

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

The one (or more) out of the 4294967296 nonces that solve the work and can be submitted & can/will be accepted by the pool server for the task/workload received.
...

Well that's just using the term 'golden' to mean 'valid' or 'share'

(yes I have spent a lot of time looking at nonces :P)

I was meaning that: Was there a specific 'test' nonce that had been referred to as 'THE golden nonce'?
I take all the replies to mean the answer is 'no' - it's just being used as a term for a valid nonce.
Your device generates ~20,116 of them per day per GH/s - so 'golden' seems a bit of an overzealous name :P
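The ~20,116 figure can be checked with a little arithmetic. A minimal sketch, assuming a "valid nonce" means a difficulty-1 share (a hash whose top 32 bits are zero, so on average one hash in 2^32 qualifies); the function name is only illustrative:

```python
# Rough check of the "~20,116 per day per GH/s" figure.
# Assumption: a valid nonce is a difficulty-1 share, i.e. on average
# one hash in 2**32 qualifies.

NONCE_SPACE = 2**32              # 4294967296 possible nonce values
SECONDS_PER_DAY = 24 * 60 * 60


def valid_nonces_per_day(hashrate_hps: float) -> float:
    """Expected number of difficulty-1 nonces found per day."""
    return hashrate_hps * SECONDS_PER_DAY / NONCE_SPACE


if __name__ == "__main__":
    # One GH/s is 1e9 hashes per second.
    print(valid_nonces_per_day(1e9))
```

Under that assumption the result lands between 20,116 and 20,117 per day for 1 GH/s, which agrees with the number quoted.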

In Icarus, what happens to detect the device, is to send specific work to the device and expect a reply with an expected nonce value
Xiangfu called this nonce the 'golden_nonce'
What I did on the cgminer code was find a faster (better) one that took ~0.53ms to calculate so when Xiangfu had roughly 40 Icarus it didn't take long to test them all - only a few seconds.
I just didn't know that people referred to the valid values, in general, as 'golden' - I was wondering if there was a specific work+nonce that was called 'golden' due to some special attributes of it (which the answer is 'no')
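The detection idea described above (send known work, expect a known nonce back) can be sketched roughly as follows. This is not the cgminer code: `FakeDevice`, `detect`, and the work/nonce values are hypothetical stand-ins, and the timing helper assumes the device starts scanning at nonce 0 and tests nonces at a steady rate.

```python
from typing import Dict, Optional


def time_to_reach_nonce(nonce: int, hashrate_hps: float) -> float:
    """Seconds until a scan that starts at nonce 0 reaches `nonce`."""
    return nonce / hashrate_hps


class FakeDevice:
    """Hypothetical stand-in for a real serial mining device."""

    def __init__(self, answers: Dict[bytes, int]) -> None:
        self.answers = answers

    def send_work(self, work: bytes) -> Optional[int]:
        # A real device would hash; the fake one just looks the answer up.
        return self.answers.get(work)


def detect(device: FakeDevice, test_work: bytes, expected_nonce: int) -> bool:
    """Accept the device only if it returns the known nonce for the known work."""
    return device.send_work(test_work) == expected_nonce


if __name__ == "__main__":
    work = b"known-test-work"          # hypothetical placeholder payload
    good = FakeDevice({work: 100_000})
    bad = FakeDevice({})
    print(detect(good, work, 100_000), detect(bad, work, 100_000))
    # The smaller the expected nonce, the sooner the device can answer.
    print(time_to_reach_nonce(100_000, 200e6))
```

The second helper shows why picking test work with a small expected nonce shortens detection: the device reaches the answer after fewer hashes.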

tacotime

legendary

Activity: 1484

Merit: 1005

Quote from: Bogart on December 19, 2012, 11:14:06 PM

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

I think if you catch the golden nonce, it's worth 150 points and the match is over.

http://www.youtube.com/watch?v=HPVhmZodaLA#t=1m35s

PuertoLibre

legendary

Activity: 1890

Merit: 1003

Quote from: Dhomochevsky on December 20, 2012, 01:46:21 PM

I don't mean to interrupt this most interesting and totally not boring argument, but I feel like I need to second Frequency above: Are there any chances to get another update soon? Maybe one last update before Christmas? Also, as far as I understood it, there was a test/demo planned for the end of December. Is that still the case? Or has the demo been moved to January?

Thanks.

Seconded, great question.

Add to that, is the schedule looking good so far?

Dhomochevsky

sr. member

Activity: 242

Merit: 251

I don't mean to interrupt this most interesting and totally not boring argument, but I feel like I need to second Frequency above: Are there any chances to get another update soon? Maybe one last update before Christmas? Also, as far as I understood it, there was a test/demo planned for the end of December. Is that still the case? Or has the demo been moved to January?

Thanks.

hardcore-fs

full member

Activity: 196

Merit: 100

As regards the

Quote

Posting insults & very basic / unrelated facts (of actually any logic or programmable logic) doesn't help here.
"In the land of the blind, the one-eyed man is king"

Is the above even English?

you made a hardcore statement about 'nonces/golden nonces'

Quote

The first option would be to deduct a certain fixed value from the nonce, it's much more efficient for the mining software to do that than it's done in some of the current bitstreams.

It is irrelevant in "what" context it was made, no matter how you try to spin it. A VERY clear statement was made, and I cannot think of ANY situation where the "mining software" is going to be able to do that in a more efficient way than pure logic.
EVEN IF the value was subtracted at EVERY stage of the nonce calculation.
In fact in at least one public core IT IS done this way, and do you know what... the calculation as regards time is ABSOLUTELY FREE.

Quote

The 32-bit wide constant subtractor in that design limits the whole speed of the design, you can speed up the whole design by removing that subtractor and simply deduct the constant later from the "nonce" received from the chip to get the "golden nonce"...
It's simply about gaining some overhead, the weakest link breaks the chain.

nope!!!!, it does not "limit the speed", because the subtraction CAN be done at the SAME TIME INSIDE the logic, because you ALREADY KNOW THE VALUE OF THE NONCE BEFORE you start the SHA256(SHA256(x)) Hash.

Notice "nonce" & "currnonce" & clock: one lags the other by 131 in this case, because that is the depth of the current nonce calculations, but we can clearly see the calculation between the two nonces occurs & ends on the same clock cycles, making it effectively "free" to calculate the difference.

if you cannot understand the basic principles of digital logic, then I'm not going to waste my time trying to explain it further.

fpf

newbie

Activity: 20

Merit: 0

Quote from: 2112 on December 17, 2012, 10:26:55 AM

Of course Avalon's logic is secret, but I'm going to discuss the problem based on one of the open-source FPGA hashers. It had a critical timing path in the logic that latched the "golden nonce". Since the design was 125-deep pipelined, it had hardware that subtracted the constant 125 from the nonce counter before sending it out of the chip.

Now we have two ways to speed up the above design:

1) remove the 32-bit wide constant subtractor. This will gain a fraction of a nanosecond on every hash tried. It is very easy to subtract 125 in software from the nonce downloaded from the chip.

2) acknowledge that the timing violation may occur and the nonce latched may not be the exact one that solved the block, but a next one or previous one, depending on the details of the latching logic. It is somewhat more involved, but still easily doable in software: recompute the hashes for nonce values n-126,n-125,n-124 and use the one that solved the block. Again this will make the design more tolerant to overclocking for every hash tried inside the chip.

Obviously 1) cannot be applied to the ASIC chip or closed-source FPGA bitstream. But the method 2) remains applicable, just use a different set of test values.
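The two software-side methods in the list above can be sketched as follows. This is only an illustration, not code from Cgminer or from any bitstream: `is_golden` and `recover_nonce` are made-up names, the offset of 125 is taken from the quoted design, and "solves the work" is assumed to mean that the top 32 bits of the double SHA-256 are zero. The Bitcoin genesis block header is used purely as a known-good test vector.

```python
import hashlib
import struct
from typing import Optional

MASK32 = 0xFFFFFFFF


def is_golden(header76: bytes, nonce: int) -> bool:
    """True if SHA256(SHA256(header + nonce)) has its top 32 bits zero."""
    header = header76 + struct.pack("<I", nonce & MASK32)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # The hash is read as a little-endian number, so its most
    # significant 32 bits are the last four bytes of the digest.
    return digest[28:] == b"\x00\x00\x00\x00"


def recover_nonce(header76: bytes, reported: int,
                  offset: int = 125, slack: int = 1) -> Optional[int]:
    """Method 1: subtract the pipeline offset in software.
    Method 2: also re-hash the neighbours within +/- slack and
    keep whichever candidate actually verifies."""
    center = (reported - offset) & MASK32
    # Try the exact value first, then the nearest neighbours.
    for delta in sorted(range(-slack, slack + 1), key=abs):
        candidate = (center + delta) & MASK32
        if is_golden(header76, candidate):
            return candidate
    return None  # nothing verified: would be counted as a hardware error


if __name__ == "__main__":
    # First 76 bytes of the Bitcoin genesis block header (test vector).
    merkle = bytes.fromhex(
        "4a5e1e4baab89f3a32518a88c31bc87f"
        "618f76673e2cc77ab2127b7afdeda33b")[::-1]
    header76 = (struct.pack("<I", 1) + bytes(32) + merkle
                + struct.pack("<II", 1231006505, 0x1D00FFFF))
    golden = 2083236893
    # Pretend the chip latched one count too late (n + 126 instead of n + 125).
    print(recover_nonce(header76, golden + 126) == golden)
```

With `slack=1` the loop re-hashes exactly the three candidates n-126, n-125, n-124 mentioned above, so the CPU cost per reported nonce is at most three double hashes.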

@hardcore-fs
Please read the context in which things were said:

The 32-bit wide constant subtractor in that design limits the whole speed of the design, you can speed up the whole design by removing that subtractor and simply deduct the constant later from the "nonce" received from the chip to get the "golden nonce"...
It's simply about gaining some overhead, the weakest link breaks the chain.

If you can gain even more overhead by assuming that the latched nonce in the chip is not the "golden" one but very close by, as stated by him, a few nonce validity checks nearby will finally reveal the "golden nonce". This way you can push the chip in terms of clock and internal timings to its limits and even a bit beyond.

If this can improve the maximum speed that can be reached for the device significantly - in exchange for a bit of insignificant CPU time every nonce found (we talk about crosschecking only a handful of nonces here vs. the workload, and that roughly every 10.7 seconds for a 200 MHz core) then yes - it's an acceptable way.
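To put a number on "a bit of insignificant CPU time", here is a rough sketch. It assumes one hash per clock and a difficulty-1 target (one hit per 2^32 hashes on average); the function names are illustrative only. Under those assumptions a single 200 MHz core averages about 21.5 s between hits, and an interval near 10.7 s corresponds to roughly 400 MH/s in total (for example two such cores). Either way the re-check load is a fraction of one hash per second.

```python
NONCE_SPACE = 2**32


def seconds_between_hits(hashrate_hps: float) -> float:
    """Average time between difficulty-1 nonces at the given hash rate."""
    return NONCE_SPACE / hashrate_hps


def recheck_hashes_per_second(hashrate_hps: float, candidates: int = 3) -> float:
    """CPU double-hashes per second needed to cross-check each reported nonce."""
    return candidates / seconds_between_hits(hashrate_hps)


if __name__ == "__main__":
    for rate in (200e6, 400e6):
        print(rate, seconds_between_hits(rate), recheck_hashes_per_second(rate))
```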

Quote

You mean like adding lubricant to your tires so you can go downhill faster?

You need grip to make use of the car's engine, no point.. but IF the task is to get the car down the hill the fastest way possible with engaged brakes, without the need of being able to stop it, and all you have is an unlimited supply of lubricant - then yes, adding lubricant to both the street and the tires to accomplish the task is the way to go.

Quote

You sir are a fucking idiot.
FPGAs process in true parallel.
I can process thousands....(nay tens of thousands) of 32 bit subtractions in an FPGA, before you have even fucking read the numbers into your CPU registers.

Posting insults & very basic / unrelated facts (of actually any logic or programmable logic) doesn't help here.
"In the land of the blind, the one-eyed man is king"

Quote

For interest take a look at one of the ASICS floating about, they have given a proposed pinout showing 8 data lines and some strobes.
WTF.... even the nonce will require 4 CLK cycles just to get it out of the chip and they are claiming this design is good into the GH/S range?

Here we go 8)

a "truly parallel" 8bit Data-bus
Of course it's good into the GH/s range, the traffic is low since only the results ("golden nonces") need to be collected, everything else gets discarded already in the chip... The only way to make it faster than it is right now would be having a 32bit databus to get the whole nonce out of the chip in one CLK cycle, would it matter? no... waste of resources and space, 24 more pins/tracks to deal with for no real benefit... the same is true for getting the "work" to the chip of course (which is of course more than 4 bytes...)

Edit: Should be said for the sake of completeness, those 4 clock cycles needed to collect the nonce will be from an external controller and are not directly related to the internal clock used by the hashing chip, further, the clock for collecting the data will be slower than the internal clock used by the hashing chip. Changes nothing about the situation though.

nathanrees19

full member

Activity: 196

Merit: 100

Quote from: Bogart on December 19, 2012, 11:14:06 PM

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

I think if you catch the golden nonce, it's worth 150 points and the match is over.

They really ought to nerf that down to 15 points. Then the rest of the team would matter.

Bogart

legendary

Activity: 966

Merit: 1000

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

I think if you catch the golden nonce, it's worth 150 points and the match is over.

hardcore-fs

full member

Activity: 196

Merit: 100

Quote from: fpf on December 19, 2012, 06:19:19 AM

I know that, but the consequence of those changes is the fix/option in the mining software - Cgminer is just an example here. It's actually the best location for that task since all the functions are already implemented, there is just a special option needed to re-purpose those functions. (such as the nonce validation)

The first option would be to deduct a certain fixed value from the nonce, it's much more efficient for the mining software to do that than it's done in some of the current bitstreams. The next option would be allowing Cgminer a certain range for the nonce validation. (To do a few nonce calculations on the CPU is not a big deal and it could save that (only slightly) corrupted nonce from being ignored and regarded as hardware error...)

You mean like adding lubricant to your tires so you can go downhill faster?

Quote

The first option would be to deduct a certain fixed value from the nonce, it's much more efficient for the mining software to do that than it's done in some of the current bitstreams.

You sir are a fucking idiot.
FPGAs process in true parallel.
I can process thousands....(nay tens of thousands) of 32 bit subtractions in an FPGA, before you have even fucking read the numbers into your CPU registers.

Frequency

hero member

Activity: 540

Merit: 500

COINDER

Any real news yet or some project update?!

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

In my humble opinion..
There is a wall.. one side, regular BTC developer, the other side people who understand the hardware..
Stop trying to bridge it, it is not worth it..

fpf

newbie

Activity: 20

Merit: 0

I know that, but the consequence of those changes is the fix/option in the mining software - Cgminer is just an example here. It's actually the best location for that task since all the functions are already implemented, there is just a special option needed to re-purpose those functions. (such as the nonce validation)

The first option would be to deduct a certain fixed value from the nonce, it's much more efficient for the mining software to do that than it's done in some of the current bitstreams. The next option would be allowing Cgminer a certain range for the nonce validation. (To do a few nonce calculations on the CPU is not a big deal and it could save that (only slightly) corrupted nonce from being ignored and regarded as hardware error...)

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

Quote from: fpf on December 19, 2012, 05:56:47 AM

Quote from: 2112 on December 17, 2012, 10:26:55 AM

1) remove the 32-bit wide constant subtractor. This will gain a fraction of a nanosecond on every hash tried. It is very easy to subtract 125 in software from the nonce downloaded from the chip.

2) acknowledge that the timing violation may occur and the nonce latched may not be the exact one that solved the block, but a next one or previous one, depending on the details of the latching logic. It is somewhat more involved, but still easily doable in software: recompute the hashes for nonce values n-126,n-125,n-124 and use the one that solved the block. Again this will make the design more tolerant to overclocking for every hash tried inside the chip.

Yes, it would be great if Cgminer had such an option, where we could manually define such fixed nonce modifications or even a certain range for a "rescan" on the PC's CPU. As far as I know, Cgminer already checks if the "golden nonce" submitted to it is valid and, if not, counts it as a hardware error, so an implementation should be rather easy and also work with any hardware supported by Cgminer...

FPF

2112 is talking about modifications in FPGA bitstream, ofc software modification to cope with new bitstream would be trivial.

fpf

newbie

Activity: 20

Merit: 0

Quote from: 2112 on December 17, 2012, 10:26:55 AM

1) remove the 32-bit wide constant subtractor. This will gain a fraction of a nanosecond on every hash tried. It is very easy to subtract 125 in software from the nonce downloaded from the chip.

2) acknowledge that the timing violation may occur and the nonce latched may not be the exact one that solved the block, but a next one or previous one, depending on the details of the latching logic. It is somewhat more involved, but still easily doable in software: recompute the hashes for nonce values n-126,n-125,n-124 and use the one that solved the block. Again this will make the design more tolerant to overclocking for every hash tried inside the chip.

Yes, it would be great if Cgminer had such an option, where we could manually define such fixed nonce modifications or even a certain range for a "rescan" on the PC's CPU. As far as I know, Cgminer already checks if the "golden nonce" submitted to it is valid and, if not, counts it as a hardware error, so an implementation should be rather easy and also work with any hardware supported by Cgminer...

FPF

fpf

newbie

Activity: 20

Merit: 0

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

The one (or more) out of the 4294967296 nonces that solve the work and can be submitted & can/will be accepted by the pool server for the task/workload received.

About losing some nonces because of hardware errors, yes that can happen, and I don't think that is some new issue; it should also have been the case with GPUs, especially when you overclocked them...

@The one concerned (Forgot where I read this)
using & losing a clock cycle or multiple clock cycles to read out the found "golden" nonce from any dedicated mining hardware is totally no issue at the clock rates they run on...

A lot of the current hardware on the market loses a lot more time doing that than "4 clock cycles" and I really mean a loooot more.
Let's take the Icarus as an example: even if it can latch the result in one clock cycle, it will need a lot of actual clock cycles until the result is finally at its destination. At the end of the day, does it matter? No.... because for a full range scan you need 4294967296 clock cycles... (assuming the device does 1 nonce per clock cycle and scans the whole range)
4294967.296 clock cycles would cost 0.1% of the performance of the device.... It's still totally insignificant.

Icarus Response:
4 bytes + 4 x start and 8 x stop bits at 115200 bps (it uses 2 stop bits for each byte if I remember right), or twice this time in case the nonce came from the cascaded FPGA and the nonce gets "forwarded", + the delays by the USB to UART chip + the delays caused by the USB system itself (packet based) and the drivers for the USB to UART chip + the operating system + the actual miner software + the whole thing again, even longer this time, because uploading a new "task" takes even longer... At the end of the day, does it matter? No.... Now you know why even thousands of "clock cycles" are still totally insignificant for such devices...
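The serial part of that list is easy to quantify. A rough sketch, assuming the framing described above (1 start bit, 8 data bits, 2 stop bits per byte at 115200 bps) and, purely for illustration, a 200 MHz core that tests one nonce per clock; the USB, driver, and OS delays are not modelled and the function names are made up:

```python
NONCE_SPACE = 2**32


def uart_bits(n_bytes: int, data_bits: int = 8,
              start_bits: int = 1, stop_bits: int = 2) -> int:
    """Total bits on the wire for n_bytes, including framing."""
    return n_bytes * (start_bits + data_bits + stop_bits)


def uart_seconds(n_bytes: int, baud: int = 115200) -> float:
    """Time to shift n_bytes out over the serial line."""
    return uart_bits(n_bytes) / baud


def readout_cost(n_bytes: int, core_hz: float, baud: int = 115200):
    """Return (core clock cycles spent, fraction of one full 2**32 scan)."""
    cycles = uart_seconds(n_bytes, baud) * core_hz
    return cycles, cycles / NONCE_SPACE


if __name__ == "__main__":
    cycles, fraction = readout_cost(4, 200e6)
    print(uart_bits(4), uart_seconds(4), cycles, fraction)
```

Sending the 4-byte nonce takes 44 bit times, a bit under 0.4 ms, which is tens of thousands of core clock cycles and still a tiny fraction of one full range scan.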

About multiple cores on a single chip - there are 2 ways - either each of them get their own "work" or the range gets split, it's possible that there is more than 1 "golden nonce" to the "work" - however having more than one golden nonce at the exactly same clock cycle is very unlikely.
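The "split the range" option can be sketched like this. It is only an illustration: `split_nonce_range` is a made-up helper, and the collision estimate assumes each core independently hits with probability 2^-32 per clock (difficulty-1 target).

```python
from typing import List, Tuple

NONCE_SPACE = 2**32


def split_nonce_range(n_cores: int) -> List[Tuple[int, int]]:
    """Give each core a contiguous (start, end_exclusive) slice of 0..2**32."""
    base, extra = divmod(NONCE_SPACE, n_cores)
    ranges = []
    start = 0
    for i in range(n_cores):
        size = base + (1 if i < extra else 0)  # spread any remainder
        ranges.append((start, start + size))
        start += size
    return ranges


def same_clock_hit_probability() -> float:
    """Chance that two given cores both hit on the same clock cycle."""
    p = 1.0 / NONCE_SPACE
    return p * p


if __name__ == "__main__":
    print(split_nonce_range(2))
    print(same_clock_hit_probability())
```

With two cores the range splits at 2^31, and the chance of both cores reporting on the very same clock is about 2^-64 per clock under the stated assumption, which is why a single shared result path is usually considered enough.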

FPF

nathanrees19

full member

Activity: 196

Merit: 100

Quote from: hardcore-fs on December 18, 2012, 01:58:41 AM

losing a "nonce" is not just a case of hey, it's only .00000000016% of the nonces I generated

To an end-user, that's exactly what it is. If lost nonces (even if they happen in pairs) are a less common failure mode than internet dropouts, pools going offline and stale shares, then only logic-purists will care.

Quote from: hardcore-fs on December 18, 2012, 01:58:41 AM

you wasted the resources

Look at it this way: If a 1GH/W chip has a 1% nonce loss rate, and a 0.5GH/W chip has a 0% nonce loss rate, then one would be wasting power by using the latter chip, even if it is logically perfect.

Tldr;

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

Quote from: kano on December 18, 2012, 03:31:43 PM

Quote from: beekeeper on December 18, 2012, 12:01:18 PM

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

Lol man, were you ever curious enough to look over the HDL files from those public FPGA projects?

No, I just remember the golden nonce (that xiangfu put) in the Icarus code that I replaced with a better one - so I'm curious about there being some 'generic' golden nonce being referred to since it was indeed named above without reference.

One example ( fpgaminer_top.v):

Code:

		// Check to see if the last hash generated is valid.
		is_golden_ticket <= (hash2[255:224] == 32'h00000000) && !feedback_d1;
		if(is_golden_ticket)
		begin
			// TODO: Find a more compact calculation for this
			if (LOOP == 1)
				golden_nonce <= nonce - 32'd131;
			else if (LOOP == 2)
				golden_nonce <= nonce - 32'd66;
			else
				golden_nonce <= nonce - GOLDEN_NONCE_OFFSET;
		end
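The offset in that snippet can be illustrated with a toy cycle-level model. This is not a translation of fpgaminer_top.v: it simply assumes one nonce enters the pipeline per clock and that the result for a nonce becomes visible `depth` clocks later, by which time the counter has moved on. The function names are hypothetical.

```python
from collections import deque
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF


def golden_from_counter(counter: int, depth: int) -> int:
    """Undo the pipeline lag, with 32-bit wraparound."""
    return (counter - depth) & MASK32


def simulate(depth: int, winning_nonce: int,
             clocks: int) -> Optional[Tuple[int, int]]:
    """Return (counter value when the hit appears, recovered nonce)."""
    counter = 0
    in_flight = deque()          # nonces currently inside the pipeline
    for _ in range(clocks):
        in_flight.append(counter)
        if len(in_flight) > depth:
            finished = in_flight.popleft()   # result now at pipeline output
            if finished == winning_nonce:
                return counter, golden_from_counter(counter, depth)
        counter = (counter + 1) & MASK32
    return None


if __name__ == "__main__":
    print(simulate(131, 1000, 2000))
```

The subtraction yields the same value whether it is done on the chip or on the host; the disagreement in this thread is only about whether doing it on the chip costs any timing margin.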

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: beekeeper on December 18, 2012, 12:01:18 PM

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

Lol man, were you ever curious enough to look over the HDL files from those public FPGA projects?

No, I just remember the golden nonce (that xiangfu put) in the Icarus code that I replaced with a better one - so I'm curious about there being some 'generic' golden nonce being referred to since it was indeed named above without reference.

beekeeper

sr. member

Activity: 406

Merit: 250

LTC

Quote from: kano on December 18, 2012, 07:12:58 AM

Hmm, what is this 'golden nonce' ?

Lol man, were you ever curious enough to look over the HDL files from those public FPGA projects?

MrTeal

legendary

Activity: 1274

Merit: 1004

Quote from: hardcore-fs on December 18, 2012, 01:58:41 AM

I'm not having a go at anyone, but rather pointing out an issue that affects ANY multi-cored logic design.
The issue I have with all this, is that it went:

CPU->GPU->FPGA->ASIC

and now here we are talking about:
ASIC+CPU to correct for possible piss poor logic design, next it will be "HEY, I have an idea, let's use ASIC+GPU to find possible missed nonces when we might have dropped one"

We already have that kind of combination, and it's not like ASICs will directly interface with the network any time soon. Barring some pool op correcting me, I don't think any pools are using GPGPU code on their backend, and Bitcoind is obviously using CPU cycles to deal with the output of the mining device.

Anyway, regarding piss-poor design, I really can't speak on this case (and honestly neither can you). We don't know how Avalon has implemented their design; all we know is that they have a serial interface, and there's no reason to think that will be a bottleneck in their design, even with multiple devices sharing the bus.
