
Topic: [Announcement] Avalon ASIC Development Status [Batch #1] - page 22.

legendary
Activity: 3878
Merit: 1193
It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.

Why don't you spend less time "guessing" what your competition is doing, what stage they are at, and how far behind you they are, and spend more time answering the multitude of questions people have surrounding your shitty company?

This thread is about Avalon ASIC Development Status, and Inaba is staying on topic.

Inaba has no idea whatsoever how Avalon is doing things, so he has no useful info to add to the thread, therefore he's just trolling as usual.
legendary
Activity: 966
Merit: 1000
No, a solution is not provided every clock cycle. A mining logic block will drop non-solutions without requiring any communication with any external logic: it just has to check whether the high 32 bits are zero or not.

The end result is that a ~7.5 Ghash/sec chip, for example, is going to output a difficulty-1 solution every half second, on average. That's only a few hundred bytes transmitted every second. Hardly "rocket science".
I think I understand what hardcore-fs has in mind. He is saying that with multiple hashing pipelines you may miss a more valuable difficulty-n (n>1) share if your glue hardware is occupied with transmitting a difficulty-1 share that had just been found by another pipeline. This situation is probably infrequent, but he insists on a synchronous FIFO to handle it properly.

These chips crunch near a billion hashes per second. Losing a small handful of those each second is minuscule.

Mine along on your CPU if you wanna make up the difference and then some.
bce
sr. member
Activity: 756
Merit: 250
It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.



Why don't you spend less time "guessing" what your competition is doing, what stage they are at, and how far behind you they are, and spend more time answering the multitude of questions people have surrounding your shitty company?


This thread is about Avalon ASIC Development Status, and Inaba is staying on topic.
legendary
Activity: 2128
Merit: 1073
No, a solution is not provided every clock cycle. A mining logic block will drop non-solutions without requiring any communication with any external logic: it just has to check whether the high 32 bits are zero or not.

The end result is that a ~7.5 Ghash/sec chip, for example, is going to output a difficulty-1 solution every half second, on average. That's only a few hundred bytes transmitted every second. Hardly "rocket science".
I think I understand what hardcore-fs has in mind. He is saying that with multiple hashing pipelines you may miss a more valuable difficulty-n (n>1) share if your glue hardware is occupied with transmitting a difficulty-1 share that had just been found by another pipeline. This situation is probably infrequent, but he insists on a synchronous FIFO to handle it properly.

I had a similar problem back in school, where we had to handle quite improbable fault conditions but didn't want to lose track of them. We simply used asynchronous S/R flip-flops and interrupts. Software would single-step backtrack the faulty channels if more than one fault occurred nearly simultaneously.

I think the same approach can be used for a hashing chip: don't bother catching the exact nonce; since you know the order in which nonces are tried, you can check a couple of previous nonces in software. Even if the hashing chip cannot reliably use asynchronous S/R flip-flops, it could certainly use synchronous J/K flip-flops.

Basically, it is a hardware/software tradeoff in handling rare, but important, conditions.
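
A minimal software-side sketch of this backtracking idea, in Python. The header76 name and the 4-nonce window are illustrative assumptions, not anything Avalon has published:

Code:
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    # Bitcoin's hash function: SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def meets_difficulty_1(header76: bytes, nonce: int) -> bool:
    # A difficulty-1 share: the hash, read as a little-endian number,
    # has its high 32 bits zero, i.e. the digest ends in four zero bytes.
    digest = double_sha256(header76 + struct.pack("<I", nonce))
    return digest[28:] == b"\x00" * 4

def backtrack_nonce(header76: bytes, reported: int, window: int = 4):
    # The chip walks nonces in a known order, so an imprecisely
    # latched report can be corrected by re-hashing the few nonces
    # tried just before it.
    for back in range(window):
        candidate = (reported - back) & 0xFFFFFFFF
        if meets_difficulty_1(header76, candidate):
            return candidate
    return None  # spurious report; drop it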
full member
Activity: 196
Merit: 100
or are we just going to "pretend" there was only 1 result and discard the other solutions

Yes. I don't care if 0.000001% of results are lost.

Edit: What failure rate would be acceptable to a "serious" miner? I'm guessing that 0.1% would not even be noticeable.
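
For a rough sense of scale, a back-of-the-envelope sketch, assuming 4 cores at 100 MHz, one hash per core per cycle, and a loss only when two or more cores find a difficulty-1 share in the same cycle:

Code:
from math import comb

P_SHARE = 2.0 ** -32   # chance a single hash meets difficulty 1
CORES = 4
CLOCK_HZ = 100e6       # 100 MHz, one hash per core per cycle

# Probability that two or more cores find a share in the same cycle,
# summed term by term to avoid cancellation against 1.0:
p_collision = sum(
    comb(CORES, k) * P_SHARE ** k * (1 - P_SHARE) ** (CORES - k)
    for k in range(2, CORES + 1)
)

print(p_collision * CLOCK_HZ)   # ~3.3e-11 events/sec: about one per
                                # thousand years of continuous mining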
mrb
legendary
Activity: 1512
Merit: 1028
"because mining is an embarrassingly parallel workload that requires very little bandwidth, so it is trivial to design the interconnect so as to not make it a bottleneck."

Sorry, I would have to disagree with this. If you take a look at some of the RTL floating about, a solution is provided every clock cycle.
That solution has to be tested and extracted; therefore, the more engines you have working on solutions, the higher the probability of generating multiple nonces during the same clock phase that satisfy the rules you are looking for.

Let's say, for the sake of argument, that you have 4 cores running independently at 100 MHz (100 MH/s each) and all four cores produce a solution at the same time (rare, but it can happen).
The internal silicon must then be capable of dealing with those 4 results during the same clock cycle. How are you going to do that? Run the combiner logic at 4x the system clock, so that you can process the 4 results within a "single" 100 MHz clock cycle as 4 cycles at 400 MHz?
Yes, you could split the design into groups of two engines and process the results in parallel at 200 MHz, but eventually it all has to be combined to get it out of the chip. Now multiply that by the number of cores some of these designs are running (6?).
Or are we just going to "pretend" there was only 1 result and discard the other solutions?

Then you have to FIFO all this crap so that you can get it out of the chip. So the more cores you have on the chip, the more problems you have as regards raw silicon design, and that is before you even think about HOW you are going to get work into the chip.

For interest, take a look at one of the ASICs floating about: they have given a proposed pinout showing 8 data lines and some strobes.
WTF... even the nonce will require 4 CLK cycles just to get it out of the chip, and they are claiming this design is good into the GH/s range?

No, a solution is not provided every clock cycle. A mining logic block will drop non-solutions without requiring any communication with any external logic: it just has to check whether the high 32 bits are zero or not.

The end result is that a ~7.5 Ghash/sec chip, for example, is going to output a difficulty-1 solution every half second, on average. That's only a few hundred bytes transmitted every second. Hardly "rocket science".
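
The arithmetic behind that estimate, as a short sketch; the 8 bytes per reported share is an assumption (a 4-byte nonce plus framing), and the "few hundred bytes" figure presumably also covers work being pushed into the chip:

Code:
HASHRATE = 7.5e9                        # hashes per second
shares_per_sec = HASHRATE / 2 ** 32     # ~1.75 difficulty-1 shares/sec
seconds_per_share = 1 / shares_per_sec  # ~0.57 s between shares

BYTES_PER_SHARE = 8                     # assumed: 4-byte nonce plus framing
out_bandwidth = shares_per_sec * BYTES_PER_SHARE  # ~14 bytes/sec outbound

print(shares_per_sec, seconds_per_share, out_bandwidth)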
mem
hero member
Activity: 644
Merit: 501
Herp Derp PTY LTD
It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.



Why don't you spend less time "guessing" what your competition is doing, what stage they are at, and how far behind you they are, and spend more time answering the multitude of questions people have surrounding your shitty company?
full member
Activity: 196
Merit: 100
 "because mining is an embarrassingly parallel workload that requires very little bandwidth, so it is trivial to design the interconnect so as to not make it a bottleneck."

Sorry, I would have to disagree with this. If you take a look at some of the RTL floating about, a solution is provided every clock cycle.
That solution has to be tested and extracted; therefore, the more engines you have working on solutions, the higher the probability of generating multiple nonces during the same clock phase that satisfy the rules you are looking for.

Let's say, for the sake of argument, that you have 4 cores running independently at 100 MHz (100 MH/s each) and all four cores produce a solution at the same time (rare, but it can happen).
The internal silicon must then be capable of dealing with those 4 results during the same clock cycle. How are you going to do that? Run the combiner logic at 4x the system clock, so that you can process the 4 results within a "single" 100 MHz clock cycle as 4 cycles at 400 MHz?
Yes, you could split the design into groups of two engines and process the results in parallel at 200 MHz, but eventually it all has to be combined to get it out of the chip. Now multiply that by the number of cores some of these designs are running (6?).
Or are we just going to "pretend" there was only 1 result and discard the other solutions?

Then you have to FIFO all this crap so that you can get it out of the chip. So the more cores you have on the chip, the more problems you have as regards raw silicon design, and that is before you even think about HOW you are going to get work into the chip.

For interest, take a look at one of the ASICs floating about: they have given a proposed pinout showing 8 data lines and some strobes.
WTF... even the nonce will require 4 CLK cycles just to get it out of the chip, and they are claiming this design is good into the GH/s range?
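
For a sense of how rare the multi-result pile-up is, here is a sketch under assumed parameters: give each core a one-deep result latch and drain them with a round-robin scanner that visits one core every 100 cycles. A result is lost only if the same core hits again before its latch is drained:

Code:
P_SHARE = 2.0 ** -32   # per-hash probability of a difficulty-1 share
CORES = 4
SCAN_PERIOD = 100      # assumed: scanner visits one core every 100 cycles

# Worst case, a freshly latched result waits one full scan round:
worst_wait = SCAN_PERIOD * CORES

# Lost only if the same core finds another share while waiting:
p_lost = 1 - (1 - P_SHARE) ** worst_wait
print(p_lost)   # ~9.3e-8, i.e. roughly 0.00001% of shares lost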


legendary
Activity: 1890
Merit: 1003
@mrb

I'll try to do better next time.
mrb
legendary
Activity: 1512
Merit: 1028
But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)
This is nonsense. Overclocking by x% always brings a constant x% performance gain, whether it is 88 small chips or, say, 4 large chips.
Not exactly true; there are interconnect/bus issues and firmware issues that might stop that from being true.

In the context of Bitcoin mining, an x% overclock does bring an x% performance gain, because mining is an embarrassingly parallel workload that requires very little bandwidth, so it is trivial to design the interconnect so as to not make it a bottleneck. This specific argument is certainly not going to explain why 88 chips would perform better than 4 large chips when overclocked (if the interconnect were even an issue, it would be more of a problem for 88 chips than for 4 chips).

As long as you are dividing the heat load across a wider array of chips with more surface area [88 for example] and as long as your cooling is sufficient for all 88 (and you have enough space for all the chips), no single die should experience the same heat load as 1 in a group of 8. You can double the clock on a group of 88, but the heat is shared across a wider area.

Then you should have said "it is easier to overclock 88 small chips than 4 large chips" (which I agree with). Your sentence "the slightest overclocking of that group of chips would incur one hell of a performance gain" does not convey this idea at all. You need to communicate your ideas more clearly if you want to be understood.
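
mrb's linearity point as a toy calculation; the chip and core counts below are made up for illustration, the only assumption being one hash per core per clock cycle:

Code:
def total_hashrate(chips: int, cores_per_chip: int, clock_hz: float) -> float:
    # One hash per core per clock cycle assumed.
    return chips * cores_per_chip * clock_hz

# Two hypothetical partitions of the same total core count:
for chips, cores in ((88, 16), (4, 352)):
    base = total_hashrate(chips, cores, 100e6)
    oc = total_hashrate(chips, cores, 110e6)   # +10% clock
    print(chips, "chips:", round(100 * (oc / base - 1)), "% faster")
# Both partitions gain exactly 10%; the chip count cancels in the ratio.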
legendary
Activity: 1890
Merit: 1003
But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)

This is nonsense. Overclocking by x% always brings a constant x% performance gain, whether it is 88 small chips or, say, 4 large chips.
Not exactly true; there are interconnect/bus issues and firmware issues that might stop that from being true. If you have ever overclocked a normal CPU, you know that something other than the main chip itself might limit a decent performance gain. Chips are usually part of a system, not a standalone device.

Talking in terms of an ASIC being overclocked, it depends on a number of design decisions. Overclocking 8 massive chips from 60 to 120 GH/s is not exactly the same deal as overclocking 88 chips that subdivide the work.

As long as you are dividing the heat load across a wider array of chips with more surface area [88 for example] and as long as your cooling is sufficient for all 88 (and you have enough space for all the chips), no single die should experience the same heat load as 1 in a group of 8. You can double the clock on a group of 88, but the heat is shared across a wider area.

In modern computing the idea is to create hyper-efficient chips at a decent clock rate and stack them into as tiny a package as you possibly can. In fact, these days most CPU vendors are trying to pack as many cores as possible into one socket.

AMD has 32 cores per socket as an experimental design, while Intel is aiming for 50.

----------------

In the ASICs coming from the vendors, depending on the design decisions being made, you don't have to follow that logic. You can spread things out into clusters/modules with their own heatsinks (like bASIC did).

Anyway, overclocking is much more than just changing the rate of the clock if you are designing the hardware. Perhaps Avalon has gone with the "shotgun" approach, where the chips are all very inefficient but make up the difference by:

1) Perhaps their simplicity (reliable, easy-to-produce dies?)
2) Perhaps being so tiny that a lot of them can be packaged together, like a mini rig?

I dunno. But there is more than one way to build a system, as long as you change the principles of the design enough that it makes practical sense.

BFL went with the idea of creating dense "full custom" chips with high performance on a low-nm process. But they are no "Intel" or "AMD". God knows how many failures they might face per wafer if their fab bakes the chips just slightly off.

Intel and AMD have their fabs set up to try tons of different combinations in one go. As the fab run proceeds they get good data on what works great and what works terribly as the layers are checked and baked. Therefore the first chips out of their fabs are usually the worst, while the last runs are their best and most efficient chips (and highly overclockable).

etc...

mrb
legendary
Activity: 1512
Merit: 1028
But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)

This is nonsense. Overclocking by x% always brings a constant x% performance gain, whether it is 88 small chips or, say, 4 large chips.
legendary
Activity: 1890
Merit: 1003
I am sort of remembering that the 7.5*7.5 mm number for BFL was the size of the package and not the die size. I am not sure the die size was ever revealed, but it must be much smaller to fit into the package.

As I recall, the BFL package size was 11mm*11mm.

How would one organise 88 chips? Would it be a good idea to put them all on one PCB, or stack PCBs with 22 or 44 chips?
I doubt there are 88 chips. But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)
bce
sr. member
Activity: 756
Merit: 250
Inaba, when I read your post I couldn't believe you were not trolling.
I'm really happy about it.

I had to read it two/three times  Cheesy

That's nothing, Frizzzzzzzzzz has to read most everything 5-6 times  Cheesy

That's nothing.  I usually read things even more times - lots of times.  It's pretty easy to do, really.  Smiley   <---- positive vibes
legendary
Activity: 2212
Merit: 1001
Inaba, when I read your post I couldn't believe you were not trolling.
I'm really happy about it.

I had to read it two/three times  Cheesy

That's nothing, Frizzzzzzzzzz has to read most everything 5-6 times  Cheesy
sr. member
Activity: 473
Merit: 250
Sodium hypochlorite, acetone, ethanol
Inaba, when I read your post I couldn't believe you were not trolling.
I'm really happy about it.

I had to read it two/three times  Cheesy
legendary
Activity: 1176
Merit: 1001
Inaba, when I read your post I couldn't believe you were not trolling.
I'm really happy about it.
full member
Activity: 196
Merit: 100
I am sort of remembering that the 7.5*7.5 mm number for BFL was the size of the package and not the die size. I am not sure the die size was ever revealed, but it must be much smaller to fit into the package.

As I recall, the BFL package size was 11mm*11mm.

How would one organise 88 chips? Would it be a good idea to put them all on one PCB, or stack PCBs with 22 or 44 chips?
full member
Activity: 209
Merit: 101
FUTURE OF CRYPTO IS HERE!
I am sort of remembering that the 7.5*7.5 mm number for BFL was the size of the package and not the die size. I am not sure the die size was ever revealed, but it must be much smaller to fit into the package.
legendary
Activity: 1260
Merit: 1000
they would need 88 chips. That sounds very unlikely to me...
For Chinese designers, 88 would be a doubly prosperous number, or a joy number. Sounds likely to me...

http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture#Eight


Wow good point!  That would be a pretty cool thing to design into/around.