
Topic: [Announcement] Avalon ASIC Development Status [Batch #1] - page 18. (Read 155278 times)

sr. member
Activity: 336
Merit: 251
Avalon ASIC Team
PS: If I find a way to properly cool the ASIC unit, and a hack appears (which will come with time) that lets the ASIC be plugged into a regular Linux PC, I will take that route personally. This option will not be available from day one, though.

Why not? Take out the USB cord and plug it into your computer instead. I'm not sure you'd want to, though, considering OpenWrt runs the miners (e.g. cgminer) just fine. There are also airflow reasons not to modify the unit, unless you want to run a USB cord out of it to an external PC, for example.

I have a few TL-WR703Ns around already, with their 4MB of flash.  I suppose I could add USB storage if more is needed.

the openWRT controller is very similar to the TL-WR703n, you can plug in a USB storage if you want.

My guess is they'll have a configuration page where you can edit the parameters passed to cgminer. I do hope they'll let us ssh into it. If the past OpenWrt firmwares for Icarus mining are any indication, ngzhang has always gone the "be more open" route. I wonder if xiangfu is still around; he had a custom firmware for the 1043ND that I used and really enjoyed.

We are working with xiangfu, and you are correct. We have always practiced what we preach when it comes to open source. This time will be no different.
donator
Activity: 1419
Merit: 1015
I have a few TL-WR703Ns around already, with their 4MB of flash.  I suppose I could add USB storage if more is needed.

I guess it's possible to solo mine without having to download and store the entire blockchain somewhere.  I've just never seen it done.  I imagine Avalon will just hook up to a pool server or a regular bitcoind node, and not by itself be capable of solo mining, correct?

My guess is they'll have a configuration page where you can edit the parameters passed to cgminer. I do hope they'll let us ssh into it. If the past OpenWrt firmwares for Icarus mining are any indication, ngzhang has always gone the "be more open" route. I wonder if xiangfu is still around; he had a custom firmware for the 1043ND that I used and really enjoyed.
legendary
Activity: 966
Merit: 1000
I have a few TL-WR703Ns around already, with their 4MB of flash.  I suppose I could add USB storage if more is needed.

I guess it's possible to solo mine without having to download and store the entire blockchain somewhere.  I've just never seen it done.  I imagine Avalon will just hook up to a pool server or a regular bitcoind node, and not by itself be capable of solo mining, correct?
legendary
Activity: 1610
Merit: 1000
Yes, work is progressing on this end; I think I'll get you guys an update this weekend or so. If anything, we will release the OpenWrt image first and the updated miner later. It is not really any different from other routers with an Atheros AR7240 CPU or Atheros AR9331 chipset.

Any update on the Avalon's flash (4/8/16 MB would all be fine) and RAM (16/32/64/128 MB) sizes? (I guess there will be no HDD inside :)

The best would be flash > 4 MB and RAM >= 64 MB, in my personal opinion. With 8 MB of flash and 64 or 128 MB of RAM we will be able to do whatever we like: install PHP, a web server, a VPN or whatever comes in handy.
10X
PS: If I find a way to properly cool the ASIC unit, and a hack appears (which will come with time) that lets the ASIC be plugged into a regular Linux PC, I will take that route personally. This option will not be available from day one, though.
legendary
Activity: 4466
Merit: 1798
Linux since 1997 RedHat 4
Hurry up Avalon !  

ASICMiner has got their ASIC chip. Check this please:  https://bitcointalk.org/index.php?topic=91173.msg1422891#msg1422891

Huh, only just now? I thought they taped out as early as September; we actually expected them to be online and mining since November. Looks like we overestimated all of our competition.

If my memory is correct, Team Avalon said, that they will release system image for their hardware at the end of December. End of Dec. is here...

Yes, work is progressing on this end; I think I'll get you guys an update this weekend or so. If anything, we will release the OpenWrt image first and the updated miner later. It is not really any different from other routers with an Atheros AR7240 CPU or Atheros AR9331 chipset.
Will the USB be like Icarus (PL2303) or something else?
I've been (very slowly) rewriting the cgminer drivers to use libusb, done MMQ, working on BFL, then if there is also a need Icarus.
The new ASIC drivers will of course be based on these USB versions (when I write them)

kano,

if I'm not wrong, Avalon, is a stand alone unit with an ethernet connection, it is not externally controlled by a miner.

spiccioli

No, it's got a miner in it already :) Which is most likely to be cgminer.
legendary
Activity: 1378
Merit: 1003
nec sine labore
Hurry up Avalon !  

ASICMiner has got their ASIC chip. Check this please:  https://bitcointalk.org/index.php?topic=91173.msg1422891#msg1422891

Huh, only just now? I thought they taped out as early as September; we actually expected them to be online and mining since November. Looks like we overestimated all of our competition.

If my memory is correct, Team Avalon said, that they will release system image for their hardware at the end of December. End of Dec. is here...

Yes, work is progressing on this end; I think I'll get you guys an update this weekend or so. If anything, we will release the OpenWrt image first and the updated miner later. It is not really any different from other routers with an Atheros AR7240 CPU or Atheros AR9331 chipset.
Will the USB be like Icarus (PL2303) or something else?
I've been (very slowly) rewriting the cgminer drivers to use libusb, done MMQ, working on BFL, then if there is also a need Icarus.
The new ASIC drivers will of course be based on these USB versions (when I write them)

kano,

if I'm not wrong, Avalon, is a stand alone unit with an ethernet connection, it is not externally controlled by a miner.

spiccioli
legendary
Activity: 4466
Merit: 1798
Linux since 1997 RedHat 4
Hurry up Avalon ! 

ASICMiner has got their ASIC chip. Check this please:  https://bitcointalksearch.org/topic/m.1422891

Huh, only just now? I thought they taped out as early as September; we actually expected them to be online and mining since November. Looks like we overestimated all of our competition.

If my memory is correct, Team Avalon said, that they will release system image for their hardware at the end of December. End of Dec. is here...

Yes, work is progressing on this end; I think I'll get you guys an update this weekend or so. If anything, we will release the OpenWrt image first and the updated miner later. It is not really any different from other routers with an Atheros AR7240 CPU or Atheros AR9331 chipset.
Will the USB be like Icarus (PL2303) or something else?
I've been (very slowly) rewriting the cgminer drivers to use libusb, done MMQ, working on BFL, then if there is also a need Icarus.
The new ASIC drivers will of course be based on these USB versions (when I write them)
sr. member
Activity: 336
Merit: 251
Avalon ASIC Team
Hurry up Avalon ! 

ASICMiner has got their ASIC chip. Check this please:  https://bitcointalksearch.org/topic/m.1422891

Huh, only just now? I thought they taped out as early as September; we actually expected them to be online and mining since November. Looks like we overestimated all of our competition.

If my memory is correct, Team Avalon said, that they will release system image for their hardware at the end of December. End of Dec. is here...

Yes, work is progressing on this end; I think I'll get you guys an update this weekend or so. If anything, we will release the OpenWrt image first and the updated miner later. It is not really any different from other routers with an Atheros AR7240 CPU or Atheros AR9331 chipset.
full member
Activity: 137
Merit: 100
Hurry up Avalon ! 

ASICMiner has got their ASIC chip. Check this please:  https://bitcointalksearch.org/topic/m.1422891
full member
Activity: 196
Merit: 100
Wasn't there to be a demo of some sort also?

Can't wait for it.

Yep. If they're on track, we should expect it within the next week or so.
legendary
Activity: 966
Merit: 1000
Wasn't there to be a demo of some sort also?

Can't wait for it.
legendary
Activity: 1610
Merit: 1000
If my memory is correct, Team Avalon said, that they will release system image for their hardware at the end of December. End of Dec. is here...
+1
Your memory is perfectly correct :)

I would like it to come with quick instructions on how to build it from the OpenWrt CVS sources.
hero member
Activity: 798
Merit: 1000
If my memory is correct, Team Avalon said, that they will release system image for their hardware at the end of December. End of Dec. is here...
mem
hero member
Activity: 644
Merit: 501
Herp Derp PTY LTD
Survival of the most competent!? Wait, where does that leave Inaba?

It obviously leaves me so far ahead of you that you can't even see me flash my ass at you.


It leaves him gnashing his teeth, poor guy. So much ego and so little to justify it.
legendary
Activity: 2128
Merit: 1068
Put it through a pipeline the same length as the main calculation - no subtraction at all!
I know that the above was meant to be a joke, but it helps to explain some salient choices that the designer has to make.

Most of the FPGA designers for Bitcoin hashing used the XC6SLX150 chip that has about 150k "gates" and costs about $200.

hardcore-fs is working on XC5VLX110T chip that has about 110k "gates" and costs about $2000.

So where's the catch? Spartan-6 has far fewer "wires" than Virtex-5, so designs on Spartan-6 are quite often routing-constrained: there are enough "gates", but not enough "wires" to connect them. And even if there are enough "wires", the gate interconnections may be longer and slower than in a design that uses fewer "gates".

Check out the extreme example of the routing-resource limitation: eldentyrell started working on his "hand-placed, auto-routed" design in October'11. He complained about auto-routing failing and being forced to hand-route until about March'12 when he disclosed that he started using DSP slices for some adders to relieve the congestion of the routing for the general-purpose SLICEs.

https://bitcointalksearch.org/topic/m.793740

The very same conceptual limitations will apply to the ASIC synthesis. One can spend a lot of time optimizing performance for the particular design flow. Or one can accept most of the default choices to optimize the time it takes to start the manufacturing.
full member
Activity: 196
Merit: 100
Therefore you have AT LEAST 120 clk cycles to calculate the nonce correction (subtraction) before it is needed (if at all).

Put it through a pipeline the same length as the main calculation - no subtraction at all!
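
One way to read the suggestion above, as a rough software model rather than real HDL: carry each nonce through a delay line of the same depth as the hash pipeline, so the nonce that pops out on a given clock is already the one that belongs to the result popping out beside it. The function name `run_pipeline`, the depth of 125 and the toy `is_golden` test are all assumptions made for illustration; the hash work itself is collapsed into a single call.

```python
from collections import deque

def run_pipeline(depth, nonces, is_golden):
    """Model a hash pipeline with a nonce delay line of equal depth.

    Both shift registers advance once per clock. Whatever leaves the
    result register is paired with whatever leaves the nonce register,
    so no subtraction is needed to recover the matching nonce.
    """
    result_stages = deque([None] * depth, maxlen=depth)
    nonce_stages = deque([None] * depth, maxlen=depth)
    found = []
    # Feed every nonce, then `depth` idle clocks to flush the pipeline.
    for nonce in list(nonces) + [None] * depth:
        out_result = result_stages[0]   # oldest entry leaves this clock
        out_nonce = nonce_stages[0]
        # Simplification: the whole hash is evaluated on entry; only the
        # timing of when the result becomes visible is modelled here.
        result_stages.append(None if nonce is None else is_golden(nonce))
        nonce_stages.append(nonce)
        if out_result:
            found.append(out_nonce)
    return found

# Toy "golden" condition, purely for demonstration.
hits = run_pipeline(125, range(1000), lambda n: n % 333 == 7)
```

The cost of this approach is 32 extra flip-flops per stage, which is the trade-off against a single subtractor at the end.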
full member
Activity: 196
Merit: 100
These chips crunch nearly a billion hashes per second.  Losing a small handful of those each second is minuscule.

Mine along on your CPU if you wanna make up the difference and then some.
I get a feeling that a longer explanation is required for those unfamiliar with digital logic design.

The issue isn't really about losing one in billions of hashes. It is about gaining the timing margin (a.k.a. overclocking headroom) in the design.

Of course Avalon's logic is secret, but I'm going to discuss the problem based on one of the open-source FPGA hashers. It had a critical timing path in the logic that latched the "golden nonce". Since the design was pipelined 125 stages deep, it had hardware that subtracted the constant 125 from the nonce counter before sending it out of the chip.

Now we have two ways to speed up the above design:

1) remove the 32-bit wide constant subtractor. This will gain a fraction of a nanosecond on every hash tried. It is very easy to subtract 125 in software from the nonce downloaded from the chip.

2) acknowledge that a timing violation may occur and that the latched nonce may not be the exact one that solved the block, but the next or previous one, depending on the details of the latching logic. It is somewhat more involved, but still easily doable in software: recompute the hashes for nonce values n-126, n-125, n-124 and use the one that solved the block. Again, this will make the design more tolerant of overclocking for every hash tried inside the chip.

Obviously 1) cannot be applied to the ASIC chip or closed-source FPGA bitstream. But the method 2) remains applicable, just use a different set of test values.
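
A minimal sketch of the software side of 1) and 2), assuming the standard Bitcoin double SHA-256 over an 80-byte block header whose last 4 bytes are the little-endian nonce. The names `recover_nonce` and `header76` and the default offsets are hypothetical, chosen only for illustration; a real miner would derive the offsets from the actual pipeline depth of the chip.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA256(SHA256(data)), as used for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def hash_value(header76: bytes, nonce: int) -> int:
    """Hash the 76-byte header prefix plus nonce; return it as an integer."""
    header = header76 + struct.pack("<I", nonce & 0xFFFFFFFF)
    # The digest is compared against the target as a little-endian number.
    return int.from_bytes(double_sha256(header), "little")

def recover_nonce(header76, latched, target, offsets=(126, 125, 124)):
    """Undo the pipeline offset in software, tolerating an off-by-one latch.

    `latched` is the raw counter value read from the chip. Each candidate
    latched - offset is re-hashed, and the first one meeting the target is
    returned. Returns None if no candidate qualifies.
    """
    for off in offsets:
        candidate = (latched - off) & 0xFFFFFFFF  # 32-bit wrap-around
        if hash_value(header76, candidate) <= target:
            return candidate
    return None
```

Three extra double hashes per reported nonce are negligible next to the billions tried inside the chip, which is why moving this check to the host costs essentially nothing.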


Since it's a pipelined design, wouldn't removing the subtractor just reduce the latency of the pipeline instead of increasing the throughput?
Even if this subtractor would prevent the re-loading of the pipeline, then you could pipeline the pipeline and the subtractor.
Since the pipeline will not (I presume) produce a nonce to be latched on every clock, you have more than enough time to store the previous nonce on chip and subtract the number before sending it out to the controller.
At least I would make my 'store' circuit parallel to the actual pipeline so it can operate asynchronously.


For Christ's sake.
Why the hell do people assume you need to do a subtraction when a 'nonce' is found? This is C programming at its worst, by people incapable of thinking in parallel.
Once again for the noobs:
The nonce is calculated BEFORE the SHA256(SHA256(x)); the product of this function is what is evaluated and dictates IF the nonce is a golden value.
Therefore you have AT LEAST 120 clk cycles to calculate the nonce correction (subtraction) before it is needed (if at all).

The subtraction is only an issue for people that should not be programming logic chips in the first place.

If you have unused gates just "sitting around", please use them.
420
hero member
Activity: 756
Merit: 500
hero member
Activity: 840
Merit: 1000
Since it's a pipelined design, wouldn't removing the subtractor just reduce the latency of the pipeline instead of increasing the throughput?
The overall speed of a pipelined logic design is limited by the speed of the slowest stage. In the design I mentioned the last pipeline stage did what every other stage did plus it did the zero comparator, subtractor and a latch.
Even if this subtractor would prevent the re-loading of the pipeline, then you could pipeline the pipeline and the subtractor.
Since the pipeline will not (I presume) produce a nonce to be latched on every clock, you have more than enough time to store the previous nonce on chip and subtract the number before sending it out to the controller.
At least I would make my 'store' circuit parallel to the actual pipeline so it can operate asynchronously.
I don't think you've ever tried to use Xilinx ISE or something similar.
true enough..

Quote
The problem isn't to come up with a different, potentially faster design. The problem is to come up with a working design, one that the available tools are capable of synthesizing and of placing/routing sensibly. The overall structure of SHA-2 (which makes every output bit depend on every input bit in each round) is apparently hitting some worst-case behavior in the Xilinx toolchain. It takes close to a full day to run a single full implementation. And in many cases the toolchain either fails to converge to a working implementation or converges to something shamefully inefficient.
lol, sorry I even mentioned it. And there is no way to work around this?
Quote
On this board 2^256 is frequently thrown around as a number so high that nobody will be able to check all of them. Compare this with the work demanded from the Xilinx placing tool: the 23038 SLICEs in the XC6SLX150 can be permuted in 23038! ways (I'm making a gross simplification of the "place" step), which is about 10^90499. Obviously all digital synthesis tools have to take some heuristic shortcuts through that vast space of available solutions.
Well, that's why you program that thing, right? The information you give the synthesis tools reduces this space a great deal.

Anyway, it would be too off-topic to discuss it here. I take it that it is not an easy task.

Quote
So the human art required from the designer is to figuratively take the poor toolchain by the hand and lead it to some safe place.

The blackbox art of FPGA programming... :)
legendary
Activity: 2128
Merit: 1068
Since it's a pipelined design, wouldn't removing the subtractor just reduce the latency of the pipeline instead of increasing the throughput?
The overall speed of a pipelined logic design is limited by the speed of the slowest stage. In the design I mentioned the last pipeline stage did what every other stage did plus it did the zero comparator, subtractor and a latch.
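
To put rough numbers on that: the clock period has to cover the slowest stage, so one overloaded stage sets the speed of the whole chain. The delays below are made-up values, not measurements from any real design, and `max_clock_mhz` is just an illustrative helper.

```python
def max_clock_mhz(stage_delays_ns):
    """Highest usable clock: the period must cover the slowest stage."""
    return 1000.0 / max(stage_delays_ns)

# Hypothetical: 124 ordinary stages at 4 ns each, plus a last stage that
# also carries the comparator, subtractor and latch and needs 5 ns.
with_subtractor = max_clock_mhz([4.0] * 124 + [5.0])
# Same pipeline with the last stage trimmed back to 4 ns.
without_subtractor = max_clock_mhz([4.0] * 125)
```

With these assumed delays the clock goes from 200 MHz to 250 MHz, and since a pipelined hasher finishes one hash per clock, throughput rises by the same factor even though only one stage changed.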
Even if this subtractor would prevent the re-loading of the pipeline, then you could pipeline the pipeline and the subtractor.
Since the pipeline will not (I presume) produce a nonce to be latched on every clock, you have more than enough time to store the previous nonce on chip and subtract the number before sending it out to the controller.
At least I would make my 'store' circuit parallel to the actual pipeline so it can operate asynchronously.
I don't think you've ever tried to use Xilinx ISE or something similar. The problem isn't to come up with a different, potentially faster design. The problem is to come up with a working design, one that the available tools are capable of synthesizing and of placing/routing sensibly. The overall structure of SHA-2 (which makes every output bit depend on every input bit in each round) is apparently hitting some worst-case behavior in the Xilinx toolchain. It takes close to a full day to run a single full implementation. And in many cases the toolchain either fails to converge to a working implementation or converges to something shamefully inefficient.

On this board 2^256 is frequently thrown around as a number so high that nobody will be able to check all of them. Compare this with the work demanded from the Xilinx placing tool: the 23038 SLICEs in the XC6SLX150 can be permuted in 23038! ways (I'm making a gross simplification of the "place" step), which is about 10^90499. Obviously all digital synthesis tools have to take some heuristic shortcuts through that vast space of available solutions.
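
The size of that number can be checked without ever forming the factorial, using the log-gamma function, since ln(n!) = lgamma(n + 1). This is only a sanity check of the arithmetic, taking the slice count of 23038 as given; `log10_factorial` is an illustrative helper name.

```python
import math

def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)

placements_exp = log10_factorial(23038)   # exponent of the placement count
hashes_exp = 256 * math.log10(2)          # exponent of 2^256, for comparison
```

The first exponent comes out a little above 90499 and the second a little above 77, so the simplified placement space is larger than 2^256 by more than ninety thousand orders of magnitude.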

So the human art required from the designer is to figuratively take the poor toolchain by the hand and lead it to some safe place.