Topic: Klondike - 16 chip ASIC Open Source Board - Preliminary - page 145. (Read 435369 times)

sr. member
Activity: 350
Merit: 250
Bkk, query regards pcb plz

How many layers is it ?
And since Avalon chips pass heat underneath (docs state there is an air gap in the chip), do we need to use something like :-
http://www.dkthermal.co.uk/index.php?option=com_content&view=article&id=49&Itemid=63
for the pcb manufacture.....?

Also would it be worth adding some small heat fins on the underside, and mount vertically as avalon do their boards ?

cheers

Please read through the thread.

4 layer board with vias in the PCB to transfer heat to the heat sink.
Heat sinks are still to be determined; he has a few samples coming his way and will test them and report back.



Yeah, somehow I convinced myself it was 6 layer... this thread moves so fast I can't wade through all the posts after I go to sleep :/

thx for reply...
hero member
Activity: 924
Merit: 1000
Bkk, query regards pcb plz

How many layers is it ?
And since Avalon chips pass heat underneath (docs state there is an air gap in the chip), do we need to use something like :-
http://www.dkthermal.co.uk/index.php?option=com_content&view=article&id=49&Itemid=63
for the pcb manufacture.....?

Also would it be worth adding some small heat fins on the underside, and mount vertically as avalon do their boards ?

cheers

Please read through the thread.

4 layer board with vias in the PCB to transfer heat to the heat sink.
Heat sinks are still to be determined; he has a few samples coming his way and will test them and report back.

Check out the design on GitHub for details: https://github.com/bkkcoins/klondike
sr. member
Activity: 350
Merit: 250
Bkk, query regards pcb plz

How many layers is it ?
And since Avalon chips pass heat underneath (docs state there is an air gap in the chip), do we need to use something like :-
http://www.dkthermal.co.uk/index.php?option=com_content&view=article&id=49&Itemid=63
for the pcb manufacture.....?

Also would it be worth adding some small heat fins on the underside, and mount vertically as avalon do their boards ?

cheers
member
Activity: 80
Merit: 10
I think what we need is a 90 x 80 mm maybe 1.5 inches height and fins in one direction. Drilled in 4 places to accommodate 4-40 screws.
Also if the fins are narrow then drill the fins to accept a spring and long bolt on that side, a bit wider than the bolt. The heat sink should probably not be tightened directly to the board but have springs for pressure, much like CPU heat sinks usually have. I think wider fin spacing is likely better anyway as it will allow better air flow. I'm not an expert on thermals but I suspect that narrow fin spacing is better suited to convection (passive) use.

Ideally there would be a group of holes or non-thru hole dents made at the PCIe power connector location to allow for pin protrusion. I've allowed for an alternate Phoenix SMD power connector but that is not as standard or as easy to use as a PCIe connector. So if custom heat sink machining is done it makes sense to include this.
So in batches of 200 pcs
Price will be 3.62 EUR, including profile 35 mm x 165 mm, machining of holes and release for spring clamping and power connector nest, surface machining on the back.
As soon as I get the info I asked BKK for, I'll provide CAD of the heat sink.
Price is based on the profile available here in stock.


If I'm reading that correctly...35mm width x 35mm length x 165mm height ~(1.4 x 1.4 x 6.5 inches)? Made of aluminum? Where do I sign up for 200 of them?!? (I have 800 chips on order.)

Or is that 35x165mm to cover one side of the board (8 ASICs)?

Side note...would that be one full extruded aluminum piece and NOT two-part (base with fins mechanically attached)?


hero member
Activity: 728
Merit: 500
I think what we need is a 90 x 80 mm maybe 1.5 inches height and fins in one direction. Drilled in 4 places to accommodate 4-40 screws.
Also if the fins are narrow then drill the fins to accept a spring and long bolt on that side, a bit wider than the bolt. The heat sink should probably not be tightened directly to the board but have springs for pressure, much like CPU heat sinks usually have. I think wider fin spacing is likely better anyway as it will allow better air flow. I'm not an expert on thermals but I suspect that narrow fin spacing is better suited to convection (passive) use.

Ideally there would be a group of holes or non-thru hole dents made at the PCIe power connector location to allow for pin protrusion. I've allowed for an alternate Phoenix SMD power connector but that is not as standard or as easy to use as a PCIe connector. So if custom heat sink machining is done it makes sense to include this.
So in batches of 200 pcs
Price will be 3.62 EUR, including profile 35 mm x 165 mm, machining of holes and release for spring clamping and power connector nest, surface machining on the back.
As soon as I get the info I asked BKK for, I'll provide CAD of the heat sink.
Price is based on the profile available here in stock.
full member
Activity: 176
Merit: 100
To be sure there is some risk in ordering test boards relying on software data shifting without fully testing/scoping on a proto-board first. But everyone wants to get these asap, so shaving a few weeks off first boards, and allowing more testing once they arrive, seems like the best option now.

Thanks for your answers, you've really thought this through.  I'm trying to keep to the essentials here and not contribute to the noise.

It would be great to put a logic analyser on an Avalon to mitigate this risk, but I guess the chances are slim...
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
That makes sense.
So last chip in the chain starts to work after it receives its nonce.
How does the ASIC tell the difference between receiving its nonce and new work coming down?
I think once it has a nonce it starts work. The passing thru of data is an independent process. This means the chips would start work slightly staggered in time. New work always follows an IDLE period (both data lines high), while nonce data is still mid-stream with one or both lines low.

It's probable that it hashes on partially shifted nonces. It takes more effort to sync the hashing than to just ignore invalid hashes. If the relayed config data follows input by 32 bits, then when the nonce is "in", it can "bypass" the relay input stream to skip the nonce. Since each chip is identical and has no sync frame data between chips you have to assume they operate "free running", sync'd only by the initial config start bit (both lines go low), but delayed by 32 bits from the chip ahead of it.

I am assuming that since both CONFIG lines can be held low to indicate ongoing data (not idle) that clock stretching will work ok. If this isn't true then software based shifting will be a problem. ie. it's not really asynchronous data, since each data cell is clocked. It kind of makes no sense to use two data lines and then be dependent on fixed timing, ignoring the clocking.

When all the data is finished the data lines both return to 1 indicating idle. Either that triggers hashing start or they've already been running all along with spurious output ignored.

To be sure there is some risk in ordering test boards relying on software data shifting without fully testing/scoping on a proto-board first. But everyone wants to get these asap, so shaving a few weeks off first boards, and allowing more testing once they arrive, seems like the best option now.

sr. member
Activity: 378
Merit: 250
BkkCoins
Do you think BitSyncom has provided enough information for you to shift in the necessary data to get the K16 running?
Do you have to shift in multiple copies of the hash data and clock configuration?
Are you going to use the USART to shift data into the ASIC chain?
Thanks
It looks like enough info, though it's a bit slim I think it will do. Unless actual performance is much different from specs, then it may be challenging.

One copy of config + data plus one nonce per chip. I believe the way this works is each chip passes thru config data, keeps first nonce and passes remaining nonce data. It's not documented but you can infer that behaviour. This also implies non-rigid timing as chips not first in the chain would have timing holes.

I'll be using software to send the data but using the USART to receive data. This is because I have 2 chains to feed and even with the USART it would require extra circuitry. I'm expecting the ASIC data input is not so sensitive to timing as it has the clock in the data, which I control. I send almost the same data to both chains with only the nonces being different between them.


That makes sense.
So last chip in the chain starts to work after it receives its nonce.
How does the ASIC tell the difference between receiving its nonce and new work coming down?
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
BkkCoins
Do you think BitSyncom has provided enough information for you to shift in the necessary data to get the K16 running?
Do you have to shift in multiple copies of the hash data and clock configuration?
Are you going to use the USART to shift data into the ASIC chain?
Thanks
It looks like enough info, though it's a bit slim I think it will do. Unless actual performance is much different from specs, then it may be challenging.

One copy of config + data plus one nonce per chip. I believe the way this works is each chip passes thru config data, keeps first nonce and passes remaining nonce data. It's not documented but you can infer that behaviour. This also implies non-rigid timing as chips not first in the chain would have timing holes.

I'll be using software to send the data but using the USART to receive data. This is because I have 2 chains to feed and even with the USART it would require extra circuitry. I'm expecting the ASIC data input is not so sensitive to timing as it has the clock in the data, which I control. I send almost the same data to both chains with only the nonces being different between them.
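
A minimal sketch of the pass-through behaviour inferred above, written in Python rather than firmware. The function names and the byte/int representation are made up for illustration; the Avalon docs don't describe this, so treat it as a model of the guess, not of the silicon.

```python
from typing import List, Tuple


def chip_step(config: bytes, nonces: List[int]) -> Tuple[Tuple[bytes, int], List[int]]:
    """Model one chip: latch config and the first nonce, relay the remaining nonces."""
    latched = (config, nonces[0])
    return latched, nonces[1:]


def feed_chain(config: bytes, nonces: List[int]) -> List[Tuple[bytes, int]]:
    """Shift one config block plus one nonce per chip through the whole chain."""
    per_chip = []
    remaining = list(nonces)
    while remaining:
        # Each chip sees the same config but strips one nonce off the front.
        latched, remaining = chip_step(config, remaining)
        per_chip.append(latched)
    return per_chip
```

Feeding three nonces into a three-chip chain leaves the first chip with the first nonce, the second chip with the second, and so on, which is why only one copy of config + data needs to be sent.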

sr. member
Activity: 448
Merit: 250
Do you have to shift in multiple copies of the hash data and clock configuration?

No, you only shift in one copy of the clock data and hash data, and then one word per chip giving that chip's starting nonce. The starting nonces would be spaced 2^32/numchips apart.
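
As a rough illustration of that split (a Python sketch, not driver code; `starting_nonces` is a hypothetical helper name):

```python
from typing import List


def starting_nonces(num_chips: int, nonce_space: int = 2**32) -> List[int]:
    """Return one starting nonce per chip, evenly spaced across the nonce space."""
    step = nonce_space // num_chips  # width of each chip's range
    return [i * step for i in range(num_chips)]
```

For 16 chips the step is 0x10000000, so the chips would start at 0x00000000, 0x10000000, 0x20000000 and so on.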
sr. member
Activity: 378
Merit: 250
BkkCoins
Do you think BitSyncom has provided enough information for you to shift in the necessary data to get the K16 running?
Do you have to shift in multiple copies of the hash data and clock configuration?
Are you going to use the USART to shift data into the ASIC chain?
Thanks
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
I don't think bus contention on the report pins is an issue since each chip should be working on separate nonce ranges.

I've seen four valid results from one piece of work on my fpgas.  Multiple results are common in observation, I added a result fifo specifically to benefit from this (was more an issue over slow async).
It's still not clear from the docs how quickly the results are shifted out. It's no doubt derived from the main clock, but whether it's tightly set at 4Mbps or not I don't know. That's the suggested timing from the protocol diagram, but there's no comment on how it's derived from the main clock. If we fed 16MHz as main clock and used the PLL to scale up appropriately, which seems to be within specs, would the shift rate halve? Until some tests are done, or more docs come out, we can't be sure.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
W - start work (the meaning of life... )
I'm surprised the avalon asic provides no indication that it has finished searching the problem space.  Without this the s/w must estimate the work time and guess when to send new work.

In my miners  I use a single entry fifo in h/w to store the next piece of work, once the range has been exhausted the hw pops the next piece of work and starts working, the sw then reloads the fifo.  The fifo is flushed by the sw using long polling when a new block is found to reduce stale results.

Perhaps later on we can do the speed estimating in the PIC to reduce the load on the usb connection.
I think most likely in the PIC I'll set a timer tick interrupt and count them to determine when to feed new work. I can queue work to try and reduce lost cycles. I wasn't surprised that it doesn't know when to stop as that was how Icarus worked as well. I should be able to get very close so that the  lost time is predominantly the time to shift new work in.
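
A back-of-envelope sketch of the tick counting, in Python for readability. The per-chip hash rate and the tick frequency are placeholders I'm assuming for illustration, not measured or documented values, and `ticks_until_new_work` is a hypothetical name.

```python
def ticks_until_new_work(num_chips: int,
                         hashes_per_sec_per_chip: float,
                         tick_hz: float,
                         nonce_space: int = 2**32) -> int:
    """Timer ticks needed for each chip to exhaust its share of the nonce space."""
    nonces_per_chip = nonce_space // num_chips
    seconds = nonces_per_chip / hashes_per_sec_per_chip
    # Round down so new work is fed slightly early rather than late.
    return int(seconds * tick_hz)
```

With 16 chips, an assumed 2**28 hashes per second per chip and a 1 kHz tick, each chip covers its range in exactly one second, i.e. 1000 ticks. The real figure depends on the actual clock setting.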
full member
Activity: 176
Merit: 100
W - start work (the meaning of life... )
I'm surprised the avalon asic provides no indication that it has finished searching the problem space.  Without this the s/w must estimate the work time and guess when to send new work.

In my miners  I use a single entry fifo in h/w to store the next piece of work, once the range has been exhausted the hw pops the next piece of work and starts working, the sw then reloads the fifo.  The fifo is flushed by the sw using long polling when a new block is found to reduce stale results.

Perhaps later on we can do the speed estimating in the PIC to reduce the load on the usb connection.
full member
Activity: 176
Merit: 100
I don't think bus contention on the report pins is an issue since each chip should be working on separate nonce ranges.

I've seen four valid results from one piece of work on my fpgas.  Multiple results are common in observation, I added a result fifo specifically to benefit from this (was more an issue over slow async).
full member
Activity: 121
Merit: 100
I don't think bus contention on the report pins is an issue since each chip should be working on separate nonce ranges.
member
Activity: 117
Merit: 10
Can't wait to see his work go live with the chip and PCB working nicely together. Does anyone know if Avalon is going to send the BOM and communication protocol soon? Yifu, if you are reading, please help out; we are buying a lot of chips from you.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
How did you solve the precomputation?

Can't this still be done in the driver like with the Avalon miner?
Yes, this has to be done in the driver. One of the beauties of open source is we don't have to solve every problem as if it's new.

So I'll be using what I can from the Avalon driver, and modifying it to use a more cmd/reply protocol over the USB. The main reason to change anything is that it could reduce data transfer demands. Since we have a CPU on each board we don't need to repeat redundant data.

Firmware Overview

The firmware works with simple cmds for which I've already coded a skeleton. It's a compromise between readable text and compactness. I could have gone totally binary but I'd like to be able to see USB/I2C data if I need to spy on it for debugging.

[cmd char][address- 3 digits][binary data]
eg.
W003blablahdataherecannotread...

Current cmds I have laid out:

W - start work (the meaning of life... )
E - enable/disable work (for reducing power draw when no work available)
S - get status (eg. fan speed, temperature, chip count, board count, work state)
T - set temperature limit (a safe default will be coded)
F - firmware upgrade (send new program code, don't worry will need handshake to confirm)
(maybe more as needed)

The address is always 000 for the primary USB board, and >= 001 for subsequent chained boards. For chained boards the primary board simply relays the cmd to the I2C bus.
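
To make the framing concrete, here is a small host-side sketch in Python. It assumes the address is three ASCII decimal digits, as the W003 example suggests; `encode_cmd` and `decode_cmd` are hypothetical helper names, not part of any existing driver.

```python
from typing import Tuple


def encode_cmd(cmd: str, address: int, data: bytes = b"") -> bytes:
    """Build a frame: [cmd char][address, 3 digits][binary data]."""
    if len(cmd) != 1:
        raise ValueError("cmd must be a single character")
    if not 0 <= address <= 999:
        raise ValueError("address must fit in 3 decimal digits")
    return cmd.encode("ascii") + f"{address:03d}".encode("ascii") + data


def decode_cmd(frame: bytes) -> Tuple[str, int, bytes]:
    """Split a frame back into (cmd, address, data)."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    cmd = chr(frame[0])
    address = int(frame[1:4].decode("ascii"))
    return cmd, address, frame[4:]
```

So `encode_cmd("W", 3, payload)` yields a frame starting with the readable text `W003`, which is what makes the traffic easy to spot when spying on USB/I2C data.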

The primary board auto-detects what chained boards are present using "serial# as address" arbitration and assigns a unique I2C 7 bit address. So the possible range for destinations the host can talk to is 000-112 (some I2C addresses are reserved). This would be the max boards per chain. I didn't go with 10 bit addressing as it would probably be beyond the limits of what even 400kbps could support on I2C, and at first I'll be working with 100kbps until there's time to test faster. And there's no real need either - at the USB level having 100+ boards on each port of a hub is economically cheap, and more flexible.
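
The 112 figure follows from the 7-bit I2C address space: of 128 addresses, the 0000xxx and 1111xxx groups (16 in total) are reserved. A quick Python check, where the mapping from host address 001-112 to a bus address is purely my assumption for illustration:

```python
# Reserved 7-bit I2C address groups: 0000xxx (0x00-0x07) and 1111xxx (0x78-0x7F).
RESERVED = set(range(0x00, 0x08)) | set(range(0x78, 0x80))

# Everything else in the 7-bit space is usable for chained boards.
USABLE = [addr for addr in range(0x80) if addr not in RESERVED]


def host_to_i2c(host_address: int) -> int:
    """Hypothetical mapping: host address 001..112 -> n-th usable I2C address."""
    if not 1 <= host_address <= len(USABLE):
        raise ValueError("host address out of range for chained boards")
    return USABLE[host_address - 1]
```

`len(USABLE)` comes out at 112, running from 0x08 to 0x77. The firmware may well assign addresses in a different order.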

Each PIC also auto-detects how many chips are on board and adjusts its work splitting to match. The number of chips is relayed as status info for the host but the host doesn't really need to know - more for user overview, monitoring problems.

Just throwing this out there so everyone knows what I'm doing next: filling in the code needed to support the cmds above.

Someone told me there is a risk that the more chips are on a bus it could lead to collisions when 2 chips send a result at the same time. Is this risk valid?
Yes, remotely possible. Not boards per chain, but chips per board. I split the 16 into 2 banks primarily to reduce work load time but also it helps with this. What happens in the case of collision depends on how smart the ASIC reporting circuit is. It uses a wired-OR output logic, weak pull-up resistors, so that it is possible to detect a collision and back off. I don't know if it does that - probably not. A collision would be very rare and the only loss would be one share (even more rare to have a collision with a winning share).

The wired-OR scheme allows for arbitration of the data (its benefit over a tri-state bus), and the ASIC could detect a collision, but it may not bother due to how rare it would be. If it detects and backs off then no data corruption occurs, but if it doesn't back off then corruption can occur.
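
A toy Python simulation of the two cases. It assumes a low level is dominant (any chip pulling low wins over the weak pull-up) and that bits go out MSB first; both are my assumptions, as is the idea that a chip compares what it sends with what it reads back. Function names are hypothetical.

```python
from typing import List, Set, Tuple


def send_without_backoff(words: List[int]) -> int:
    """If nobody backs off, the bus carries the bitwise AND of all words."""
    result = words[0]
    for word in words[1:]:
        result &= word
    return result


def send_with_backoff(words: List[int], width: int = 32) -> Tuple[int, Set[int]]:
    """Chips send simultaneously; a chip that sends 1 but reads 0 backs off."""
    active = set(range(len(words)))
    bus_word = 0
    for bit in range(width - 1, -1, -1):
        sent = {i: (words[i] >> bit) & 1 for i in active}
        level = int(all(sent.values()))  # line stays high only if no one pulls low
        # Keep only the chips whose sent bit matches what the bus shows.
        active = {i for i in active if sent[i] == level}
        bus_word = (bus_word << 1) | level
    return bus_word, active
```

With two chips sending 0xF0 and 0x0F at once (8 bit width), the back-off version delivers 0x0F intact from the second chip, while the no-back-off version delivers 0x00, which matches neither result.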
legendary
Activity: 2674
Merit: 1083
Legendary Escrow Service - Tip Jar in Profile
Someone told me there is a risk that the more chips are on a bus it could lead to collisions when 2 chips send a result at the same time. Is this risk valid?
full member
Activity: 176
Merit: 100
How did you solve the precomputation?

Can't this still be done in the driver like with the Avalon miner?