
Topic: Klondike - 16 chip ASIC Open Source Board - Preliminary - page 81. (Read 435369 times)

hero member
Activity: 826
Merit: 1001
... If I added a 1-of-4 analog switch and used the 2 available lines then I could sequentially sense each of the 4 thermistors, allowing separate readings for each quad. I could then relay them as status values for the driver to decide what to do. Is this something that is worth the added cost of a switch chip?
I would say one sensor for 16 chips, in the center of the heatsink. The heatsink will spread the warmth more or less evenly, and a sensor in the middle will probably pick up close to the highest temp of all 16 chips.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
Could you pass back the results of all of them?
It's easy enough in the mining software to decide which ones to ignore, but also to take note of when just one of them gets high.
At the moment ckolivas displays both temp0 and temp2 in the avalon (coz temp1 is apparently always zero) but I display the max of all temps in the bflsc (since with x-link in the future there could be e.g. 40 of them)
Right now I only have 2 sensors. One is in the PIC chip and one is an external thermistor. I'm probably not going to use the PIC one. It may be ok but it's likely too far from any ASICs to be much use. I only have one available analog sense line, so if I connect multiple thermistors to that in series+parallel then I only get one reading. If I added a 1-of-4 analog switch and used the 2 available lines then I could sequentially sense each of the 4 thermistors, allowing separate readings for each quad. I could then relay them as status values for the driver to decide what to do. Is this something that is worth the added cost of a switch chip?

Currently in the driver I handle temperature two ways. For the stat line value I return the max of all the devices chained. For the API stats I return each device's individual temp values.

I have not currently added the code to detect critical temp and disable the device. But this function is also present in the PIC, so any action the driver takes is really just a backup.
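For anyone wondering what the max-vs-individual reporting boils down to in the driver, here's a minimal self-contained sketch (the struct and function names are just illustrative, not the actual klondike driver code): the status line would show the max across quads, while the API stats would list each quad value separately.

Code:
/* Illustrative sketch only - not the real klondike driver.
 * One reading per 4-chip quad; the status line shows the max,
 * while API stats would report each quad value individually. */
#define KLN_QUADS 4

struct kln_temps {
    float quad[KLN_QUADS];   /* degrees C, one per quad */
};

/* value for the one-line status display */
static float kln_status_temp(const struct kln_temps *t)
{
    float max = t->quad[0];
    for (int i = 1; i < KLN_QUADS; i++)
        if (t->quad[i] > max)
            max = t->quad[i];
    return max;
}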
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4

That temp reading is right but it's not mounted close to the ASICs. On the next board revision I'm going to place the thermistor in the center of a quad of 4 ASICs so it should get a closer approximation, and of course once I do some testing I'll add an offset adjustment. Yes, it's actually 34C at my work bench  Cry

If you put a thermistor per 4 ASICs and wired them series-parallel you could get an average for the entire board. How well does the heat conduct to the top layer? I wonder if adding some thermal vias to the thermistor would get it closer to the actual temp.
Good idea. I'll look into adding a via to help heat get to the thermistor. I was thinking about having sensors for each quad but didn't have enough inputs for individual monitoring. The idea of averaging via several is great because the thermistors are cheap. If I put two in series, in parallel with another two in series, then I get 4 thermistors with the same total resistance. I do see some issues with this for situations where not all ASICs are installed, or when one gets to critical temp while the others are fine. In the latter case you're no worse off than just having one, for the 3 not monitored, but in the case with fewer ASICs the average of the installed ones would be impacted.

Also, I do have 2 pins available but they cannot be connected to resistors until after programming, and only if I have a bootloader present to do updates. They could be connected to an analog selector switch for choosing 1 of 4 thermistors.
Could you pass back the results of all of them?
It's easy enough in the mining software to decide which ones to ignore, but also to take note of when just one of them gets high.
At the moment ckolivas displays both temp0 and temp2 in the avalon (coz temp1 is apparently always zero) but I display the max of all temps in the bflsc (since with x-link in the future there could be e.g. 40 of them)
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ

That temp reading is right but it's not mounted close to the ASICs. On the next board revision I'm going to place the thermistor in the center of a quad of 4 ASICs so it should get a closer approximation, and of course once I do some testing I'll add an offset adjustment. Yes, it's actually 34C at my work bench  Cry

If you put a thermistor per 4 ASICs and wired them series-parallel you could get an average for the entire board. How well does the heat conduct to the top layer? I wonder if adding some thermal vias to the thermistor would get it closer to the actual temp.
Good idea. I'll look into adding a via to help heat get to the thermistor. I was thinking about having sensors for each quad but didn't have enough inputs for individual monitoring. The idea of averaging via several is great because the thermistors are cheap. If I put two in series, in parallel with another two in series, then I get 4 thermistors with the same total resistance. I do see some issues with this for situations where not all ASICs are installed, or when one gets to critical temp while the others are fine. In the latter case you're no worse off than just having one, for the 3 not monitored, but in the case with fewer ASICs the average of the installed ones would be impacted.

Also, I do have 2 pins available but they cannot be connected to resistors until after programming, and only if I have a bootloader present to do updates. They could be connected to an analog selector switch for choosing 1 of 4 thermistors.
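To spell out the series-parallel arithmetic for anyone following along: with four identical thermistors of nominal resistance R, each series pair is 2R, and two 2R strings in parallel give (2R × 2R) / (2R + 2R) = R. So the sense input sees the same nominal resistance as a single thermistor while, to first order for small temperature differences, effectively averaging all four readings.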
member
Activity: 70
Merit: 10

That temp reading is right but it's not mounted close to the ASICs. On the next board revision I'm going to place the thermistor in the center of a quad of 4 ASICs so it should get a closer approximation, and of course once I do some testing I'll add an offset adjustment. Yes, it's actually 34C at my work bench  Cry

If you put a thermistor per 4 ASICs and wired them series-parallel you could get an average for the entire board. How well does the heat conduct to the top layer? I wonder if adding some thermal vias to the thermistor would get it closer to the actual temp.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
...
But with many devices attached it would probably be nice to give an array for each value eg.

"Temp": [ 65,54,65,78 ],
You can reply with a string or a number, but you can of course make your string look like an array of numbers Smiley

Quote
I noticed that using,

echo -n "{\"command\":\"stats\"}" | nc 127.0.0.1 4028

appends a trailing \0 to output. Is that expected/desired?
It breaks piping to a JSON parser. I use this,

echo -n "{\"command\":\"stats\"}" | nc 127.0.0.1 4028 |tr -d "\000" | jsgrep

to strip the \0 and it prints pretty with jsgrep.

The \0 is there on purpose.

You can of course
java API stats
which is also pretty Cheesy
Also reasonably pretty (but messes up if the data has a comma)
echo -n stats | nc 127.0.0.1 4028 | tr "|," "\n"

If you check the php and java I use the '\0' to identify the completion of data transfer.
It's a character that cannot be used anywhere else so it works well.

(a good example of complete failure in this regard is the BFL SC MCU where it uses a newline to terminate USB transfers ... except ... sometimes it's OK and sometimes it can be ... either ... with x-link which is a PITA Tongue)
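For anyone writing their own client, the read loop just needs to keep going until it sees that '\0'. A minimal sketch in C (it assumes you've already connected the socket; this is not taken from the cgminer tree, just an illustration):

Code:
#include <sys/types.h>
#include <string.h>
#include <unistd.h>

/* Read a cgminer API reply until the terminating '\0' (or the socket closes). */
static ssize_t read_api_reply(int sock, char *buf, size_t len)
{
    size_t off = 0;

    while (off < len - 1) {
        ssize_t n = read(sock, buf + off, len - 1 - off);
        if (n <= 0)
            break;                      /* error or connection closed */
        off += (size_t)n;
        if (memchr(buf, '\0', off))     /* '\0' marks end of transfer */
            break;
    }
    buf[off] = '\0';                    /* harmless if '\0' was already seen */
    return (ssize_t)off;
}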
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
Are you using current git?
ckolivas has considered the hashmeter the bane of his existence for a while but he recently updated it to be quite reliable.

Of course, the hash rate is dependent on how you supply the results back, so it depends on which driver you are using as the basis.
(and again there have been a LOT of changes in the drivers also recently - the AMU has sort of been a nightmare due to the problems that have shown up because of it, so there have been a lot of changes related to that)

However, there is another way that they will be different.
My hotplug code now sets a start time for each device rather than using the cgminer start time.
Thus the device average for a hotplugged device does read at the expected performance of the device rather than below it due to the initial amount of time it wasn't connected.
But the overall average is of course since cgminer started.
Thanks for your input.

I pulled 3.3.1 a couple days ago, so not using git right now.

I return the hashcount difference for each call to scanwork, but I have a wait of 200 in there, and it doesn't ask the devices for an update. It depends on get_stats being called for recent device status, and this seems to be every 3 seconds roughly. So overall it probably gets a new hashcount every 3 seconds, which causes quite a bit of variance in the 5 sec hashmeter value.

I used bflsc as basis but cut out a lot of stuff and simplified what I could for klondike.

The API stats call now returns Clock, Temp, Fan %, Fan RPM. I numbered them for each device on a klondike chain, starting with 0 for the master, e.g. using a JSON call:

Code:
{
    "Fan RPM 0": 418,
    "STATS": 1,
    "Clock 0": 0.0,
    "Calls": 0,
    "Min": 99999999.0,
    "Max": 0.0,
    "USB Delay": "r0 0.000000 w0 0.000000",
    "USB Pipe": "0",
    "Elapsed": 1625,
    "Fan Percent 0": 0,
    "Temp 0": 38.25,
    "ID": "KLN0",
    "Wait": 0.0
},
But with many devices attached it would probably be nice to give an array for each value eg.

"Temp": [ 65,54,65,78 ],

I noticed that using,

echo -n "{\"command\":\"stats\"}" | nc 127.0.0.1 4028

appends a trailing \0 to output. Is that expected/desired?
It breaks piping to a JSON parser. I use this,

echo -n "{\"command\":\"stats\"}" | nc 127.0.0.1 4028 |tr -d "\000" | jsgrep

to strip the \0 and it prints pretty with jsgrep.
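The hashcount-difference part mentioned above is simple enough that a tiny self-contained sketch shows the idea (names are made up here, not the actual driver):

Code:
#include <stdint.h>

/* The device keeps a running hash total; each scanwork call returns
 * only the hashes done since the previous call. */
struct kln_state {
    uint64_t hashcount;        /* running total, refreshed by get_stats */
    uint64_t last_hashcount;   /* value at the previous scanwork call */
};

static int64_t kln_hashes_since_last(struct kln_state *s)
{
    int64_t delta = (int64_t)(s->hashcount - s->last_hashcount);
    s->last_hashcount = s->hashcount;
    return delta;              /* what scanwork hands back to cgminer */
}

Since the total only refreshes roughly every 3 seconds, most of the 200 ms scanwork calls return 0 and one returns the whole 3-second batch at once, which is exactly the lumpiness showing up in the 5 sec hashmeter value.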
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
As I'm sure you know, the amount of each nonce range processed doesn't matter.
ckolivas also solves the problem of not getting exact hash statistics by simply counting the number of nonces found - since of course on average, you'll find one valid 1diff nonce per full nonce range processed.
Although over a short time frame this isn't very accurate, if you are hashing in the 10s of GH/s (or more) this is still good enough.
Is that also why the avg hash rates usually don't seem to add up to the total? Different calc methods.

http://i.imgur.com/aVgHFv6.png
...
Are you using current git?
ckolivas has considered the hashmeter the bane of his existence for a while but he recently updated it to be quite reliable.

Of course, the hash rate is dependent on how you supply the results back, so it depends on which driver you are using as the basis.
(and again there have been a LOT of changes in the drivers also recently - the AMU has sort of been a nightmare due to the problems that have shown up because of it, so there have been a lot of changes related to that)

However, there is another way that they will be different.
My hotplug code now sets a start time for each device rather than using the cgminer start time.
Thus the device average for a hotplugged device does read at the expected performance of the device rather than below it due to the initial amount of time it wasn't connected.
But the overall average is of course since cgminer started.

Quote
That temp reading is right but it's not mounted close to the ASICs. On the next board revision I'm going to place the thermistor in the center of a quad of 4 ASICs so it should get a closer approximation, and of course once I do some testing I'll add an offset adjustment. Yes, it's actually 34C at my work bench  Cry
Well ... every other mining device sux with regards to its temperature measurements, so if you can make it more accurate, you'll be ahead of everyone else Cheesy
hero member
Activity: 924
Merit: 1000
3) In my API stats I've added 2 new fields: "USB Pipe" and "USB Delay"
If "USB Pipe" is non-zero then there are USB problems happening that could also be causing errors.
"USB Delay" shows if there are timing 'issues' occurring in the code (cps fixes these and reports them in "USB Delay")
I did a bit of debugging on API support with klondike and noticed now that after running for a while USB Pipe and USB Delay both remain at 0.

USB Pipe=0,USB Delay=r0 0.000000 w0 0.000000

I haven't been having USB problems lately anyway.

I noticed that the Erupter (AMU 0) has: USB Delay=r14 0.008319 w0 0.000000

I don't seem to be getting my api stats values added there. Maybe I'm doing something wrong. I'm using plain text stats cmd with,

echo -n stats | nc 127.0.0.1 4028

edit: oh, duh - I wasn't returning the root after adding my items.
...
I guess I should have explained 'issues' Smiley
The issues are simply that it delays writing data until the expected cps time of the previous written data has occurred.
That's not a program problem as such, simply an easy way to completely avoid the issue of transferring data faster than the cps (characters per second) of the chip.
If you want to ensure this problem isn't happening you turn it on with (e.g. in driver-icarus.c)
        usb_set_cps(icarus, baud / 10);
        usb_enable_cps(icarus);
in the initialise() function before doing any control transfers (it's not really a debug thing, it's a permanent solution to leave on)
ckolivas ended up going to a lot of effort timing the access to the avalon (in the avalon hardware this really isn't very good) but I wonder now if the cps code I added would have made that a little simpler.
It's completely optional - only usb_enable_cps() turns it on.

You will probably also have noticed that I added a lot of extra USB locking recently.
(it seems libusb isn't as thread safe as it says it is Tongue)
This caused havoc for the Avalon and the 60GH/s BFL until ckolivas adjusted the locking strategies to ensure each thread got its chance.
If you are still going with a low performance device to do the mining, this will become even more important - look at the "sched_yield();"s in miner.h
Either way, if you are making devices that hash in the 50GH/s+ arena, you'll need to test them at that speed and see that the locking isn't starving threads of CPU when required (and that you are using the appropriate lock/unlock mechanisms to avoid these problems)


+1 I'll definitely be donating to ckolivas and Kano for all they are doing here to help, once I get my boards and burn them in. Keep up the great work gentlemen.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
3) In my API stats I've added 2 new fields: "USB Pipe" and "USB Delay"
If "USB Pipe" is non-zero then there are USB problems happening that could also be causing errors.
"USB Delay" shows if there are timing 'issues' occurring in the code (cps fixes these and reports them in "USB Delay")
I did a bit of debugging on API support with klondike and noticed now that after running for a while USB Pipe and USB Delay both remain at 0.

USB Pipe=0,USB Delay=r0 0.000000 w0 0.000000

I haven't been having USB problems lately anyway.

I noticed that the Erupter (AMU 0) has: USB Delay=r14 0.008319 w0 0.000000

I don't seem to be getting my api stats values added there. Maybe I'm doing something wrong. I'm using plain text stats cmd with,

echo -n stats | nc 127.0.0.1 4028

edit: oh, duh - I wasn't returning the root after adding my items.
...
I guess I should have explained 'issues' Smiley
The issues are simply that it delays writing data until the expected cps time of the previous written data has occurred.
That's not a program problem as such, simply an easy way to completely avoid the issue of transferring data faster than the cps (characters per second) of the chip.
If you want to ensure this problem isn't happening you turn it on with (e.g. in driver-icarus.c)
        usb_set_cps(icarus, baud / 10);
        usb_enable_cps(icarus);
in the initialise() function before doing any control transfers (it's not really a debug thing, it's a permanent solution to leave on)
ckolivas ended up going to a lot of effort timing the access to the avalon (in the avalon hardware this really isn't very good) but I wonder now if the cps code I added would have made that a little simpler.
It's completely optional - only usb_enable_cps() turns it on.

You will probably also have noticed that I added a lot of extra USB locking recently.
(it seems libusb isn't as thread safe as it says it is Tongue)
This caused havoc for the Avalon and the 60GH/s BFL until ckolivas adjusted the locking strategies to ensure each thread got its chance.
If you are still going with a low performance device to do the mining, this will become even more important - look at the "sched_yield();"s in miner.h
Either way, if you are making devices that hash in the 50GH/s+ arena, you'll need to test them at that speed and see that the locking isn't starving threads of CPU when required (and that you are using the appropriate lock/unlock mechanisms to avoid these problems)
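To put numbers on the cps pacing described above: at 8N1 each character is 10 bits (start + 8 data + stop), so the chip can absorb roughly baud / 10 characters per second, which is why usb_set_cps() is given baud / 10. As an example, at 115200 baud that's 11520 cps, so after a 64-byte write the driver should hold off roughly 64 / 11520 ≈ 5.6 ms before the next write (the 115200 figure is just an illustration, not necessarily what the Klondike uses).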
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
As I'm sure you know, the amount of each nonce range processed doesn't matter.
ckolivas also solves the problem of not getting exact hash statistics by simply counting the number of nonces found - since of course on average, you'll find one valid 1diff nonce per full nonce range processed.
Although over a short time frame this isn't very accurate, if you are hashing in the 10s of GH/s (or more) this is still good enough.
Is that also why the avg hash rates usually don't seem to add up to the total? Different calc methods.



That temp reading is right but it's not mounted close to the ASICs. On the next board revision I'm going to place the thermistor in the center of a quad of 4 ASICs so it should get a closer approximation, and of course once I do some testing I'll add an offset adjustment. Yes, it's actually 34C at my work bench  Cry
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
If you have 5 chips you could put 4 of them on the K16 board and it should work quite well.
For example, using the current design you would place the ASICs at U6, U8, U9, U11.
Maybe use the extra chip in a K1.
 
That's right, but also if you add the 5th chip it will break nonce ranges as if 6 are present, which will still work but will leave 1/6th of the ranges unchecked. This doesn't cause problems, and since any range is as good as another, as long as each chip is checking unique ranges it's still efficient. A job will finish slightly more often (4-way split vs 6-way split), so that's the only timing overhead.
...
As I'm sure you know, the amount of each nonce range processed doesn't matter.
ckolivas also solves the problem of not getting exact hash statistics by simply counting the number of nonces found - since of course on average, you'll find one valid 1diff nonce per full nonce range processed.
Although over a short time frame this isn't very accurate, if you are hashing in the 10s of GH/s (or more) this is still good enough.
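To put numbers on the nonce-counting method: each valid 1-diff nonce represents 2^32 ≈ 4.295 billion hashes on average, so hash rate ≈ nonces_found × 2^32 / elapsed_seconds. As an example calculation, 70 nonces in 60 seconds works out to about 5 GH/s; at tens of GH/s you collect enough nonces even within a 5-second window for the estimate to settle, which is why it's good enough there but noisy for small devices.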
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
3) In my API stats I've added 2 new fields: "USB Pipe" and "USB Delay"
If "USB Pipe" is non-zero then there are USB problems happening that could also be causing errors.
"USB Delay" shows if there are timing 'issues' occurring in the code (cps fixes these and reports them in "USB Delay")
I did a bit of debugging on API support with klondike and noticed now that after running for a while USB Pipe and USB Delay both remain at 0.

USB Pipe=0,USB Delay=r0 0.000000 w0 0.000000

I haven't been having USB problems lately anyway.

I noticed that the Erupter (AMU 0) has: USB Delay=r14 0.008319 w0 0.000000

I don't seem to be getting my api stats values added there. Maybe I'm doing something wrong. I'm using plain text stats cmd with,

echo -n stats | nc 127.0.0.1 4028

edit: oh, duh - I wasn't returning the root after adding my items.

Does any of the reference material on the Avalon state what the debug pins output?
Not that I've seen, but it's been a while since I looked over everything and they could have updated it, or I may have missed it. I did scope them a couple of times and got nothing out. If it was a PLL lock or lost-lock indicator then it would only show up sporadically. It would be really useful to get a shift-out of actual data captured during the shift-in, but so far I haven't seen anything on them.
member
Activity: 70
Merit: 10
Does any of the reference material on the Avalon state what the debug pins output?
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
In Xilinx FPGAs the PLLs are very picky about the multiplier, e.g. the multiplied clock must be between 800 MHz and 1600 MHz. With a 100 MHz input you would never use a multiplier of 2 and a divider of 1 to get 200 MHz; you would use a multiplier of 10 and a divider of 5 to put the multiplied clock into the valid range. It might be worth sweeping the multiplier range to see if certain values perform better.
After I get a heat sink mounted I'll do some testing like that.

full member
Activity: 176
Merit: 100
PLL might be a good place to start looking. Just make sure your PLL maintains a good lock.
I wouldn't be at all surprised if now and then the PLL unlocks and the clock shifting causes error results. I would also guess that is more likely to occur closer to the clock limit values, e.g. ~512 where I am now, and by moving up to 600 it has more stability (@half clock is 128 vs 150).

The next board revision will have that ferrite bead on the PLL power inputs, which can only help. I don't know if one of the debug outputs maybe indicates PLL lock, but I otherwise have no way to know. It would be nice to know what the 2 debug outputs are but I haven't seen any docs about them.

In Xilinx FPGAs the PLLs are very picky about the multiplier, e.g. the multiplied clock must be between 800 MHz and 1600 MHz. With a 100 MHz input you would never use a multiplier of 2 and a divider of 1 to get 200 MHz; you would use a multiplier of 10 and a divider of 5 to put the multiplied clock into the valid range. It might be worth sweeping the multiplier range to see if certain values perform better.
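To make that concrete (the 800-1600 MHz window is the Xilinx figure above; the Avalon chip's own VCO limits may well differ): f_VCO = f_in × M has to land inside the window, and f_out = f_in × M / D. With a 100 MHz input, M = 2, D = 1 gives f_VCO = 200 MHz (out of range), while M = 10, D = 5 gives f_VCO = 1000 MHz with the same 200 MHz output. That's why sweeping M/D pairs, rather than just the output frequency, can turn up settings that behave better.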
hero member
Activity: 658
Merit: 500
CCNA: There i fixed the internet.

How difficult/possible would it be to rework the FW to do 1 job per chip?
That would require a set of data inputs and outputs for each chip, which is a lot, and isn't possible with the current design, i.e. 4 lines per chip and a lot more firmware overhead to handle it. The chips are designed to chain, splitting jobs, and there isn't any efficiency gain by running a job per chip.

If you have 5 chips you could put 4 of them on the K16 board and it should work quite well.
For example, using the current design you would place the ASICs at U6, U8, U9, U11.
Maybe use the extra chip in a K1.
 
That's right, but also if you add the 5th chip it will break nonce ranges as if 6 are present, which will still work but will leave 1/6th of the ranges unchecked. This doesn't cause problems, and since any range is as good as another, as long as each chip is checking unique ranges it's still efficient. A job will finish slightly more often (4-way split vs 6-way split), so that's the only timing overhead.

I haven't verified the code yet but during device init it sends known good data to each chip with a specific nonce count such that a known delay will produce a result. It then counts the results that come back sequentially, and uses this to determine chip count. Due to the chaining connections (hardware) the chips must be installed in a given order, and optimally it's best to balance between banks. I'll be testing this chip count code when I get more chips on board.

Cool Beans. Fixed any issues I would have had by adding 3 more chips to my GB order  Grin

Thanks again BKK and fasmax d=^_^=b
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
PLL might be a good place to start looking. Just make sure your PLL maintains a good lock.
I wouldn't be at all surprised if now and then the PLL unlocks and the clock shifting causes error results. I would also guess that is more likely to occur closer to the clock limit values, e.g. ~512 where I am now, and by moving up to 600 it has more stability (@half clock is 128 vs 150).

The next board revision will have that ferrite bead on the PLL power inputs, which can only help. I don't know if one of the debug outputs maybe indicates PLL lock, but I otherwise have no way to know. It would be nice to know what the 2 debug outputs are but I haven't seen any docs about them.

How difficult/possible would it be to rework the FW to do 1 job per chip?
That would require a set of data inputs and outputs for each chip, which is a lot, and isn't possible with the current design, i.e. 4 lines per chip and a lot more firmware overhead to handle it. The chips are designed to chain, splitting jobs, and there isn't any efficiency gain by running a job per chip.

If you have 5 chips you could put 4 of them on the K16 board and it should work quite well.
For example, using the current design you would place the ASICs at U6, U8, U9, U11.
Maybe use the extra chip in a K1.
 
That's right, but also if you add the 5th chip it will break nonce ranges as if 6 are present, which will still work but will leave 1/6th of the ranges unchecked. This doesn't cause problems, and since any range is as good as another, as long as each chip is checking unique ranges it's still efficient. A job will finish slightly more often (4-way split vs 6-way split), so that's the only timing overhead.

I haven't verified the code yet but during device init it sends known good data to each chip with a specific nonce count such that a known delay will produce a result. It then counts the results that come back sequentially, and uses this to determine chip count. Due to the chaining connections (hardware) the chips must be installed in a given order, and optimally it's best to balance between banks. I'll be testing this chip count code when I get more chips on board.
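For the curious, the range splitting described above amounts to something like the sketch below (a self-contained illustration, not the actual PIC firmware; the firmware works in whatever per-chip nonce-count units the ASICs expose). With 5 chips installed but the space split as if 6 were present, the last range simply never gets dispatched.

Code:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint32_t splits = 6;                    /* firmware splits as if 6 chips */
    const uint32_t installed = 5;                 /* chips actually fitted */
    const uint64_t span = (1ULL << 32) / splits;  /* nonces per range */

    for (uint32_t i = 0; i < splits; i++) {
        uint64_t start = (uint64_t)i * span;
        if (i < installed)
            printf("chip %u: 0x%08llx .. 0x%08llx\n", i,
                   (unsigned long long)start,
                   (unsigned long long)(start + span - 1));
        else
            printf("range %u: 0x%08llx .. never dispatched (the unchecked 1/6th)\n",
                   i, (unsigned long long)start);
    }
    return 0;
}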
sr. member
Activity: 378
Merit: 250
I don't remember if anyone has asked this before. I've been silently watching in the background...

Anyway, for the PIC firmware, I remember you stating that it subdivides the nonce range by n chips and pushes those ranges to the chips.

How difficult/possible would it be to rework the FW to do 1 job per chip?

This is just out of curiosity since I put in an order for 5 chips in a group buy + board (once finalized [TY, T13Hydra]).

-Taugeran
It would be more than a firmware change to do this for 5 chips.
The hardware would also need to be changed.
If you have 5 chips you could put 4 of them on the K16 board and it should work quite well.
For example, using the current design you would place the ASICs at U6, U8, U9, U11.
Maybe use the extra chip in a K1.
 
legendary
Activity: 966
Merit: 1000
So the pool protocol causes high HW errors?
Makes no sense, I know. And I'm not saying it does, but when I switched to stratum the rates dropped right down. Still scratching my head. I'm just letting both Erupter and Klondike run now. Klondike currently has A:99 R:0 HW:2 - which is the best it's been yet, though not as good as the Erupter at A:293 R:0 HW:3.

With my USB Asicminer Erupters I see higher HW errors when the pool misbehaves (goes offline) or when my internet connection misbehaves (goes down). So color me unsurprised. But it still makes no sense.

My USB BEs seem to report a HW error every time they restart, if they are ever allowed to go idle (where the green LED comes on solid).