
Topic: Klondike - 16 chip ASIC Open Source Board - Preliminary

legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
USART timing is on page 375 of the PIC datasheet. It's not what you were looking for in detail, but still interesting.

Pages 270 and 271:
Maximum settable baud rate:
Desired Baud Rate = FOSC / (4 * (SPBRGH:SPBRGL + 1))

48 MHz / 4 = 12 MHz - that's "fast"
Ok. So assuming we refill the UART close to optimally, we could probably get the dead zone to (32+8+16)/12 = 5, say 6 uS maybe. That should be a very, very tiny loss in nonce data, maybe around 0.0000015/5 = 0.0000003%. And maybe double that if your average work has 2 nonces, but it seems like most have 1 or 2, and a few have 3.

So it's probably not worth delaying nonces to avoid the dead zone. You could lose more due to block change if delayed. The main thing is it completely frees up the IRQ timing constraints.
OK, that sounds good :)

Your % aren't quite correct (when you divide the work up, the range for each is of course the division, not 2^32), but the numbers are small enough that indeed it appears it's better to lose a nonce once in a while than slow things down by more than that resolving the problem.

Average nonce per 2^32 is of course 1.
However, 4 isn't all that rare (hmm I think I'll add a stats counter for the BFLSC to see how often >1 does happen - that one's the easiest to do it)
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
USART timing is on page 375 of the PIC datasheet. It's not what you were looking for in detail, but still interesting.

Pages 270 and 271:
Maximum settable baud rate:
Desired Baud Rate = FOSC / (4 * (SPBRGH:SPBRGL + 1))

48 MHz / 4 = 12 MHz - that's "fast"
Ok. So assuming we refill the UART close to optimally, we could probably get the dead zone to (32+8+16)/12 = 5, say 6 uS maybe. That should be a very, very tiny loss in nonce data, maybe around 0.0000015/5 = 0.0000003%. And maybe double that if your average work has 2 nonces, but it seems like most have 1 or 2, and a few have 3.

So it's probably not worth delaying nonces to avoid the dead zone. You could lose more due to block change if delayed. The main thing is it completely frees up the IRQ timing constraints.
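As a rough check of that dead-zone figure, here is a minimal Python sketch. It assumes the 32 is the nonce bits and the 8 + 16 are extra clocks of per-read overhead (that split is a guess, the post only gives the sum), and the function name is made up for illustration.

```python
def dead_zone_us(uart_clock_mhz, nonce_bits=32, overhead_clocks=8 + 16):
    """Microseconds needed to clock one nonce plus overhead through the UART.

    The overhead_clocks default mirrors the (32+8+16) sum in the post;
    what the 8 and 16 stand for is an assumption, not something stated there.
    """
    if uart_clock_mhz <= 0:
        raise ValueError("UART clock must be positive")
    return (nonce_bits + overhead_clocks) / uart_clock_mhz


# 12 MHz is the FOSC/4 limit quoted from the datasheet for a 48 MHz PIC.
print(f"dead zone at 12 MHz: {dead_zone_us(12):.2f} uS")
```

At 12 MHz this comes out a little under 5 uS, which is where the "5, say 6 uS" figure comes from once some slack is added.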
newbie
Activity: 24
Merit: 0
USART timing is on page 375 of the PIC datasheet. It's not what you were looking for in detail, but still interesting.

Pages 270 and 271:
Maximum settable baud rate:
Desired Baud Rate = FOSC / (4 * (SPBRGH:SPBRGL + 1))

48 MHz / 4 = 12 MHz - that's "fast"
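
For anyone who wants to play with the numbers, the formula quoted from the datasheet can be sketched in Python like this. The function names are hypothetical, and the only inputs are FOSC and the 16-bit SPBRGH:SPBRGL value from the formula.

```python
def baud_rate(fosc_hz, spbrg):
    """Baud rate from the quoted formula: FOSC / (4 * (SPBRGH:SPBRGL + 1))."""
    if not 0 <= spbrg <= 0xFFFF:
        raise ValueError("SPBRGH:SPBRGL is a 16-bit value")
    return fosc_hz / (4 * (spbrg + 1))


def spbrg_for(fosc_hz, target_baud):
    """Nearest SPBRGH:SPBRGL value for a target rate, clamped to 16 bits."""
    value = round(fosc_hz / (4 * target_baud)) - 1
    return max(0, min(0xFFFF, value))


FOSC = 48_000_000
print(baud_rate(FOSC, 0))            # fastest setting, register value 0
print(spbrg_for(FOSC, 1_000_000))    # register value for a 1 MHz target
```

With the register at 0 the formula gives the 12 MHz mentioned above; anything slower is just a larger register value.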

hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
Sigh, this is (sort of) what the Icarus bitstream does also - if 2 nonces arrive at the same time, you lose one.
The only catch of course is that "at the same time" can be a long time frame with regards to hashing nonces ... which increases how often it happens.
Hopefully here "at the same time" is a VERY small window.

Just roughly figured this...

Assuming we accept a dead zone, reading by the PIC can be at its full speed using the UART input in master mode. I didn't check, but maybe 1 MHz is feasible, so roughly 32 uS for the dead zone, plus re-arm time, say 8 uS, so say 40 uS. Since a nonce takes 9.1 uS (@450 MHz clk / 128 output speed), that means potentially the 4 nonces after a result nonce are dead. But nonces could occur on any of 16 chips, so that's x16 = 64 nonces that could be found during a dead zone, making the probability of that happening 64/2^32, i.e. 0.0000015%. Something most users can live with in terms of income loss.

Not sure if that's right but my first look at it.
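
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-envelope model: it rounds 40/9.1 down to whole nonce slots as the post does, and the function names are invented here.

```python
import math


def nonce_time_us(core_clk_mhz, divider=128, bits=32):
    """Time to shift one 32-bit nonce out at core clock / divider."""
    return bits / (core_clk_mhz / divider)


def dead_zone_loss(dead_zone_us, core_clk_mhz, chips=16):
    """Fraction of the 2^32 nonce space that can fall inside one dead zone."""
    slots_per_chip = math.floor(dead_zone_us / nonce_time_us(core_clk_mhz))
    return slots_per_chip * chips / 2 ** 32


print(f"nonce time: {nonce_time_us(450):.1f} uS")
print(f"loss: {dead_zone_loss(40, 450) * 100:.7f} %")
```

Changing the dead zone or the clock shows how quickly the figure moves, but it stays tiny for any of the values discussed here.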

Also, I'm not sure what the max data rate on the UART is. It's not in the timing specs for the PIC. If it can handle 4 MHz then the dead zone is more like 12uS. The firmware gains mostly because it has control of handling data rather than being at the mercy of random ASIC output. Multiple nonces could be received by the SRAM when armed but when the PIC gets around to handling them the dead zone would occur.
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
The dead zone could be avoided if the nonce is read from the SRAM and the SRAM is then re-armed.
Only after this is the next hash sent to the ASIC.
(And not much "time" is lost, as not every hash gives a nonce?)

I will give it a try (already ordered the part)
Just realized the CS line probably needs to be used as well, though maybe it can be somehow merged with OE.

I'm not sure a dead zone after each nonce is a problem anyway as the chances of two nonces being sequentially close is quite low. The probability is likely low enough that losing the second nonce would have no noticeable effect on overall performance.

However, if the work units are fast (16 chips is pretty fast, 0.9 secs), then the delay of waiting for new work to grab nonces isn't long. The SRAM can capture all the nonces sequentially for a work unit. Then just before pushing new work we read the SRAM and push new work. That way the dead zone happens during the duplicate-nonce period, and duplicates are ignored anyway. So the tick counter would trigger the nonce read and then the work push. The nonces can be read until a zero word is read, and then the SRAM is rewritten with zeros and armed for write.

Sigh, this is (sort of) what the Icarus bitstream does also - if 2 nonces arrive at the same time, you lose one.
The only catch of course is that "at the same time" can be a long time frame with regards to hashing nonces ... which increases how often it happens.
Hopefully here "at the same time" is a VERY small window.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
The dead zone could be avoided if the nonce is read from the SRAM and the SRAM is then re-armed.
Only after this is the next hash sent to the ASIC.
(And not much "time" is lost, as not every hash gives a nonce?)

I will give it a try (already ordered the part)
Just realized the CS line probably needs to be used as well, though maybe it can be somehow merged with OE.

I'm not sure a dead zone after each nonce is a problem anyway as the chances of two nonces being sequentially close is quite low. The probability is likely low enough that losing the second nonce would have no noticeable effect on overall performance.

However, if the work units are fast (16 chips is pretty fast, 0.9 secs), then the delay of waiting for new work to grab nonces isn't long. The SRAM can capture all the nonces sequentially for a work unit. Then just before pushing new work we read the SRAM and push new work. That way the dead zone happens during the duplicate-nonce period, and duplicates are ignored anyway. So the tick counter would trigger the nonce read and then the work push. The nonces can be read until a zero word is read, and then the SRAM is rewritten with zeros and armed for write.
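
The read-until-zero idea can be sketched in Python with the SRAM modelled as a plain list of 32-bit words. This is a simulation under that assumption, not firmware, and `drain_nonces` is a made-up name.

```python
def drain_nonces(sram_words):
    """Read captured nonces up to the first zero word, then clear them.

    sram_words is a list standing in for the serial SRAM contents.
    Returns the nonces found and leaves every used slot zeroed so the
    buffer is ready to be armed for the next work unit.
    """
    nonces = []
    for index, word in enumerate(sram_words):
        if word == 0:          # zero word marks the end of captured data
            break
        nonces.append(word)
        sram_words[index] = 0  # rewrite with zeros before re-arming
    return nonces


sram = [0x1A2B3C4D, 0xDEADBEEF, 0, 0]
print([hex(n) for n in drain_nonces(sram)], sram)
```

One caveat of using zero as the terminator: a genuine nonce of 0x00000000 would be read as "end of data" and dropped along with anything after it.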
newbie
Activity: 24
Merit: 0
The dead zone could be avoided if the nonce is read from the SRAM and the SRAM is then re-armed.
Only after this is the next hash sent to the ASIC.
(And not much "time" is lost, as not every hash gives a nonce?)

I will give it a try (already ordered the part)
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
Regarding the CPLD:
There are no CPLDs available that have a small pin count but many macrocells.
The smallest I have found is from Lattice in QFN 32, which has enough space to fit an I2C slave and the receive logic.

But if it's possible to go without a CPLD that would be the better way, though it will need some rework of the ISR handling.
I have another idea I wanted to test, but due to workload and how long it takes me to get chips here I haven't pursued it. I wanted to use an 8-pin serial SRAM and 4 normal I/O pins on the PIC. In this setup the ASIC output needs to go through a NOR gate and a Schmitt-input tristate buffer with OE hooked to the PIC.

SI - goes to PIC and output of RES_N buffer
SO - goes to PIC
SCK - goes to PIC and output of NOR via buffer
OE (of buffer) - goes to PIC

In operation the PIC disables the ASIC output and sends data to the SRAM to configure it for writing. Then it enables the ASIC output and waits, monitoring the SI line for a change. When data comes from the ASIC it is written into the SRAM (sequential mode, auto-incrementing), which can handle up to 16 MHz. Once 32 bits are written, the PIC takes control, sends the signals to read the data back, and re-initializes the SRAM for write mode again.
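
The capture cycle described above can be modelled as a tiny state machine. The Python below is only a simulation of the sequence (arm, capture 32 bits, read back, re-arm); the class and method names are hypothetical, and MSB-first bit order is an assumption.

```python
from enum import Enum, auto


class Phase(Enum):
    WAITING = auto()    # armed: ASIC output enabled, SRAM in write mode
    READBACK = auto()   # ASIC output disabled, PIC owns the SRAM


class CaptureModel:
    NONCE_BITS = 32

    def __init__(self):
        self.dropped = 0
        self.arm()

    def arm(self):
        """Clear the buffer and enable ASIC output again."""
        self.bits = []
        self.phase = Phase.WAITING

    def asic_bit(self, bit):
        """Feed one result bit; returns False if it fell in the dead zone."""
        if self.phase is not Phase.WAITING:
            self.dropped += 1
            return False
        self.bits.append(bit & 1)
        if len(self.bits) == self.NONCE_BITS:
            self.phase = Phase.READBACK   # PIC takes control here
        return True

    def read_back(self):
        """Return the captured nonce (MSB first, assumed) and re-arm."""
        if self.phase is not Phase.READBACK:
            raise RuntimeError("no complete nonce captured yet")
        value = 0
        for b in self.bits:
            value = (value << 1) | b
        self.arm()
        return value


model = CaptureModel()
nonce = 0xDEADBEEF
for shift in range(31, -1, -1):
    model.asic_bit((nonce >> shift) & 1)
model.asic_bit(1)  # arrives before read-back finishes, so it is dropped
print(hex(model.read_back()), "dropped bits:", model.dropped)
```

Anything the ASIC sends between the 32nd bit and the re-arm is counted as dropped, which is the dead zone in this scheme.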

This method obviously has a brief dead zone after each nonce, but it allows high-speed capture with no timing issues, and uses only a small 8-pin part with no programming necessary. Regarding the dead zone, we have that with the current method too, since while the first nonce is being sent to the host the PIC won't be able to respond quickly to result interrupts. The dead zone could be removed by using 2 SRAM chips, but it starts getting complicated.

Here's a low cost serial SRAM - 0.57 in qty 100. That was the cheapest I found.
http://mouser.com/ProductDetail/Microchip-Technology/23K640-I-SN/?qs=sGAEpiMZZMs6Aik9Fp479oRJ8qzeMKM7vL%2fWRv1ed7o%3d

Currently, I have to rewrite code so that as much as possible is handled in the main polling loop, outside ISR time. This means, e.g., the I2C code will have to run in the main loop with state changes triggered by interrupt. Any interrupt needs to take only a couple of uS. USB will need to use the polling method.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
How would these K16s connect together exactly? I see the I2C pins, and I've seen diagrams, but are they locked together via plastic? Do they have to be connected via wires?
There's probably lots of ways to hook them up but the way I planned on when designing it was that boards would be side by side, either with standoffs or mounted on a larger heatsink. The klego 2x2 headers then line up next to each other and a 10 pin female header block with pins wired correctly can slip over them. The spacing from board edge is 0.127 so that both boards touching would be a 1 pin gap.

Also, I was going to make a touch screen 3.2" 320x240 display by interfacing it to I2C and having it sit on the bus as monitor/control device. But it could just as easily be interfaced to the RasPi and probably is more generically useful. The one I have here is below but I haven't had time to actually hook it up and write software to talk to it. It only cost $12 and just needs an 8 bit interface. Has touch controller and SD card on board. So much to do and no time. I could see this on the front of a rack or enclosure for a K16 array.

http://www.ebay.com/itm/3-2-inch-TFT-LCD-module-Display-touch-panel-SD-card-240x320-than-128x64-lcd-/200908823757?pt=LH_DefaultDomain_0&hash=item2ec7195ccd
sr. member
Activity: 1316
Merit: 254
That is a sidetrack. Start a new thread about the bitfury.

Bkk brought it up, not me :D I'm trying to remember what the reason was for delaying the signal? I haven't looked into the communication protocol too much, I just glanced over it. Is the plan to use two NOR gates on the final design? I haven't looked at any of the updated files; I may do that now.

:P

I know.

Damn him still drooling.

I am more interested in what hub I can bundle with a Raspberry Pi while still overclocking the K1 Nanos as far as possible, because I have a giant tub of mineral oil, a pretty heat exchanger, a cooling tower, some pumps and a beautiful baby blue fiberglass tank I want to dunk them in.

Yeah, did you end up posting that complete mineral oil solution you've got there in that other thread? I'm interested in knowing the parts I'd need.

Also, I plan on putting together a guide on turning the RPi into a miner host, with the added bonus of a $20 RGB 128x64px LCD for monitoring, with keypad. It's gonna be a fun project while I wait for this board to finish :)

I saw that RPi miner host project - looks like a blast. I was just filling my shopping cart on adafruit for that last night. Gotta go back and order it :)
http://learn.adafruit.com/piminer-raspberry-pi-bitcoin-miner/

I was inspired by that project as well, however I will be using the Graphic ST7565 Negative LCD (128x64) http://www.adafruit.com/products/438 instead. I want to be able to display moving graphs among other things :)

Hold fire boys, check this thread before buying........

https://bitcointalksearch.org/topic/m.2671338

Kano's on the job ;)
newbie
Activity: 24
Merit: 0
Regarding the CPLD:
There are no CPLDs available that have a small pin count but many macrocells.
The smallest I have found is from Lattice in QFN 32, which has enough space to fit an I2C slave and the receive logic.

But if it's possible to go without a CPLD that would be the better way, though it will need some rework of the ISR handling.

legendary
Activity: 2126
Merit: 1001
GREAT NEWS!

..just came home from the weekend, reading this!
You are awesome! :-)

Ente
cp1
hero member
Activity: 616
Merit: 500
Stop using branwallets
Electrically there must be some sort of header and cable.  Physically there's a standoff.
sr. member
Activity: 249
Merit: 250
How would these K16s connect together exactly? I see the I2C pins, and I've seen diagrams, but are they locked together via plastic? Do they have to be connected via wires?
sr. member
Activity: 297
Merit: 250

It's because there is no clock signal from the ASIC, but the clock is implicit in the data. So to allow a UART to capture it rather than using a CPLD or FPGA I use a single gate to extract the clock. The delay is so the PIC has enough setup time between the clock and data. Ultimately a CPLD would allow capturing at higher rates and then the PIC could clock the data in as slow as it needs. A more complicated capture could be designed in a CPLD but then we also have another device that needs programming, and you get into another programming tool and using the HDL environment etc. So I was trying to keep it simple as there's already enough to do.


I like your design. It's simple and works well. A 60 MHz span is good enough for most situations where the best hashing rate is selected for the long run. Geeks can change the capacitor anyway.

You use two NOR gates and an RC circuit to delay the clock and sharpen the edge. I didn't verify it, but I think you can move the RC circuit to pin 1 of the first NOR gate (delaying the signal at pin 1) and get rid of the second NOR gate. It may work and be simpler.
hero member
Activity: 924
Merit: 1000
Great job! I'm following this thread semi-closely and things seem to be going well. The small hashrate increase is a nice bonus.

Bkk, I want to thank you again for all the time and energy you are investing in this. One question... do you still plan on offering the boards and parts for sale once a stable design is completed?
sr. member
Activity: 249
Merit: 250
I was inspired by that project as well, however I will be using the Graphic ST7565 Negative LCD (128x64) http://www.adafruit.com/products/438 instead. I want to be able to display moving graphs among other things :)
cp1
hero member
Activity: 616
Merit: 500
Stop using branwallets
I wonder if it's possible to do a minimal parts setup without the NOR gates and just use the capture/compare or interrupt-on-change pins to read out from the ASICs. Not sure if they're fast enough to grab it bit by bit though. It would be cool to have something very DIY for people with leftover single chips.
sr. member
Activity: 294
Merit: 250
I saw that RPi miner host project - looks like a blast. I was just filling my shopping cart on adafruit for that last night. Gotta go back and order it :)
http://learn.adafruit.com/piminer-raspberry-pi-bitcoin-miner/
KS
sr. member
Activity: 448
Merit: 250
GREAT NEWS!

I've got it running at 300MHz and it's much more reliable than at lower speeds. I now have 2 chips doing 600 MH/s total with fairly long periods with zero HW errors.

I tried several capacitor values for result capture and the 30pF works well up to about 360 MHz. At 380 MHz it lost sync. I tried a brief run at 360 MHz and it worked ok. I need to get a fan working before I run extended tests at higher clocks.

I had to fiddle a lot with the work unit cycle timing. Seems I don't know what's going on because calculated times weren't right. By trial and error I adjusted it to get very few duplicates. I've also implemented code to only update clock cfg when it changes, not every work unit. But I haven't altered the result capture code yet - so even with no extra schmitt buffers and slow result code it's actually doing ok. At 300 MHz the result data comes out at 2.35 MHz.
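
The figures in this post hang together if you assume one hash per clock per chip and the clk / 128 result output speed mentioned elsewhere in the thread. A rough Python sketch under those assumptions (function names invented here, and work assumed to split evenly across chips):

```python
NONCE_SPACE = 2 ** 32


def result_bit_rate_mhz(core_clk_mhz, divider=128):
    """Result data rate, assuming the output runs at core clock / divider."""
    return core_clk_mhz / divider


def hash_rate_mhs(core_clk_mhz, chips):
    """Total MH/s, assuming one hash per clock per chip."""
    return core_clk_mhz * chips


def work_unit_seconds(core_clk_mhz, chips):
    """Time to sweep the full 2^32 nonce range split evenly across chips."""
    return NONCE_SPACE / (hash_rate_mhs(core_clk_mhz, chips) * 1e6)


print(f"{result_bit_rate_mhz(300):.2f} MHz result data")
print(f"{hash_rate_mhs(300, 2)} MH/s on 2 chips")
print(f"{work_unit_seconds(300, 16):.2f} s per work unit on 16 chips")
```

That gives roughly the 2.35 MHz result rate and 600 MH/s reported here, and about 0.9 seconds per work unit for a full 16 chip board. It doesn't explain why the measured work cycle timing was off from the calculated one.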

Here's a pic of the result data at 300 MHz:



I'm just letting it run for a while at 300 to see how it holds up. Chips and heat sink get fairly hot to touch but I have just convection cooling for now. According to IR thermometer the heat sink is about 54C and the chip may be about 63C.

:-*