Topic: Swedish ASIC miner company kncminer.com - page 94. (Read 3049514 times)

sr. member
Activity: 342
Merit: 250
here's the screen shot, trying your latest commit now

legendary
Activity: 2450
Merit: 1002
cool, I'll check it out

It looks good, however I did see one small issue. I raised the MHz on the flaky soft reset die to stress it and bfgminer wouldn't load. It got stuck in a hard reset loop when the die failed. I was able to change the advanced setting to get out of the loop, but the hard reset routine may need a delay of a few minutes so bfgminer can load when there is a bad die that's overclocked or not turned off

ok I just noticed something ... If I get a bunch of "Got nonce for unknown work in slot xx" errors scrolling then it goes straight to hard reset ... (this may tie in with the bfgminer not loading loop).

other than that everything is working fine

I'll get another cup of coffee & watch this for awhile, see if I can notice anything else that might help you troubleshoot it

...... ok this time the "Got nonce for unknown work in slot xx" errors didn't trigger an instant hard reset and it reset successfully, must have been coincidence earlier

ok, looks like I'm just getting random hard resets instead of it running through the soft reset loop. Most of the time it runs through the loop. If I disable the flaky die there aren't any problems

btw, that die is set at 150 MHz so it has less amp draw
So, for the die that's at 150 MHz ... is either of the DCDCs putting out less than 5 A for more than a few minutes? If so, that's why it's carrying out the hard reset. Paste the contents of /var/log/monitordcdc.log if you could =) thanks

If it turns out that one of the DCDCs is putting out current but the other is not, I may just change the comparison to DCDC1 AND DCDC2 less than threshold rather than OR, because both will be less than 5 A if the die is not working at all.
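
To make the OR vs AND difference concrete, here is a tiny Python sketch. The real monitor script is not Python, and the function names are made up for illustration; only the 5 A threshold comes from the posts above.

```python
THRESHOLD_AMPS = 5.0  # the 5 A figure mentioned in the thread


def low_current_or(dcdc1_amps, dcdc2_amps, threshold=THRESHOLD_AMPS):
    """Flag the die if EITHER DCDC is under the threshold (the OR comparison)."""
    return dcdc1_amps < threshold or dcdc2_amps < threshold


def low_current_and(dcdc1_amps, dcdc2_amps, threshold=THRESHOLD_AMPS):
    """Flag the die only if BOTH DCDCs are under the threshold (the proposed AND)."""
    return dcdc1_amps < threshold and dcdc2_amps < threshold


# Hypothetical reading: one DCDC reports zero amps while the other carries load.
one_sided = (0.0, 12.0)
print("OR flags die: ", low_current_or(*one_sided))
print("AND flags die:", low_current_and(*one_sided))
```

With a one-sided reading like that, the OR version counts the die as failing on every pass while the AND version leaves it alone. A die that is completely dead shows both DCDCs under 5 A, so either version would still catch it.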

I bet that's it. In advanced settings one die is showing zero amps and the other die isn't even listed. I know it's hashing because I turned off the good dies and got 10 MH/s in bfgminer for that bad die, about half what it's supposed to be

here's the end of the log, it's getting pretty big

Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "200" } }
STATUS=S,When=1439479313,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "200" } }
STATUS=S,When=1439479352,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
KnC: Frequency change FAILED! { "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "200" } }
Soft reset failed, initiatng hard reset
Stopping bfgminer.
Power cycling ASIC# 5
INFO: Attempt to power down dc/dc
INFO: Attempt to power UP dc/dc
Starting bfgminer.
[2015-08-13 15:23:46] Die 5- restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-13 15:28:51] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479743,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479781,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479818,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479857,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479897,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479936,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439479973,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
Attempting softreset of ASIC# 5 DIE# 1
KnC: Frequency change FAILED! { "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
Soft reset failed, initiatng hard reset


I was scrolling through the log and saw this: "Failed multiple soft reset attempts, performing hard reset". Above it there were 10 soft reset attempts like the ones above; however, there aren't any die configuration failed error messages in bfgminer

....  ok I just confirmed in bfgminer -- 10 successful soft restarts and the 11th goes straight to hard restart   :|

... it's not detecting a successful restart because of the low/zero amp issue??
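
For anyone who wants to check a big monitordcdc.log without scrolling, here is a rough Python sketch that counts how many soft reset attempts came before each hard reset. It only assumes the message texts quoted in this thread; the function name and the trimmed sample are made up for illustration.

```python
import re

ATTEMPT = re.compile(r"Attempting softreset of ASIC# (\d+) DIE# (\d+)")
# Both hard reset messages quoted in this thread
# (the "initiatng" typo is in the real log output).
HARD_RESET_PREFIXES = (
    "Soft reset failed",
    "Failed multiple soft reset attempts",
)


def attempts_before_hard_reset(lines):
    """Return a list of ((asic, die), attempts) tuples, one per hard reset."""
    results = []
    current_die = None
    attempts = 0
    for line in lines:
        line = line.strip()
        match = ATTEMPT.search(line)
        if match:
            die = (int(match.group(1)), int(match.group(2)))
            if die != current_die:
                # A different die started resetting: restart the count.
                current_die, attempts = die, 0
            attempts += 1
        elif current_die is not None and line.startswith(HARD_RESET_PREFIXES):
            results.append((current_die, attempts))
            current_die, attempts = None, 0
    return results


# Sample trimmed to the lines the parser cares about.
SAMPLE = """\
Attempting softreset of ASIC# 5 DIE# 1
Attempting softreset of ASIC# 5 DIE# 1
Attempting softreset of ASIC# 5 DIE# 1
Soft reset failed, initiatng hard reset
""".splitlines()

print(attempts_before_hard_reset(SAMPLE))  # [((5, 1), 3)]
```

To run it on a real log, pass `open("/var/log/monitordcdc.log")` instead of `SAMPLE`.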

Can you try my latest commit? I just uploaded it.
Let me know what happens. Also, could you paste a screenshot of that 150 MHz die from the advanced config screen?
Thanks
hero member
Activity: 686
Merit: 500
FUN > ROI
legendary
Activity: 2408
Merit: 1004
How can we mine Ethereum with a Titan?

Any details here?
legendary
Activity: 1610
Merit: 1003
"Yobit pump alert software" Link in my signature!
Guys, I've donated to GenTarkin as he has done a wonderful job. Has anyone else?

There will be no more updates from KNCminer. GenTarkin is the ONLY one, and he is doing a better job than KNCminer ever did!! Donate, so we don't lose him. @GenTarkin, please post your BTC address and/or LTC address. Thanks.

Vegas


sr. member
Activity: 440
Merit: 250
guys, I need some help

my controller's status page only says STOPPED
cgminer doesn't start

it's a Neptune machine
how can I fix it?
or does anyone have a full SD image to put on an SD card?

I already tested the recovery SD card but it doesn't work

any help please?


kncminer says I no longer have warranty

any help please
legendary
Activity: 2450
Merit: 1002
The hard reset most likely took place because that die probably could not be reconfigured via waas -s and that command failed. Whenever that command fails it will do a hard reset. That command should never return failed; if it does, that means soft resets won't bring it back.
The 2nd time it may have just been that the soft reset worked for it =)

As for your first issue ... again, if you raised the die to a MHz where it won't even acknowledge waas -s commands, or it returns failed, it will do a hard reset no matter what.
Can you post the contents of your log file where this all took place so I can gauge whether my guesses are correct?


I also had some interesting behaviour on my Titan this morning. One die went down, 10 soft resets failed, then it did a hard reset on the cube - worked as designed. Then another die went "down" shortly after and it completely failed the waas -s command, so it carried out the hard reset. Worked as designed as well =)
legendary
Activity: 2450
Merit: 1002
TXSteve, so the way these things work is: on every loop of the monitoring script it scans how much current is going through the DCDCs, and if under 5 amps is going through, it basically increments the value in /var/run/dieXX.
Once /var/run/dieXX goes over the threshold variable it takes action to reset the die. So... basically if the ASICs don't have work due to a shitty pool configuration, like when renting... that /var/run/dieXX will get incremented. If the pool is shitty enough and it increments over the threshold ... no matter what... one of the reset actions will be taken.
I'm not exactly sure what's a good way to recode this so it can differentiate between possible pool comm issues and an actual die needing a reset.
I could maybe implement 2 thresholds, one being a lower one for the pool issues and another maybe being double the first one? If that's reached then it performs a hard reset.
I don't know, what do you think?
Or anyone else care to chime in?
I'm just kinda doing a ton of trial and error here LOL!
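
A rough Python sketch of the counter and the two-threshold idea, just to make the discussion concrete. The real script keeps the count in /var/run/dieXX; here it is a plain integer. The threshold values, the function names, the choice to clear the count when current recovers, and the reading that the lower threshold triggers a soft reset are all assumptions, not something stated in the posts.

```python
LOW_CURRENT_AMPS = 5.0               # the 5 A figure from the thread
SOFT_THRESHOLD = 10                  # hypothetical value
HARD_THRESHOLD = 2 * SOFT_THRESHOLD  # "double the first one"


def update_count(count, amps, low=LOW_CURRENT_AMPS):
    """Return the new low-current count after one pass of the monitor loop.

    Assumption: the count is cleared as soon as the current recovers.
    """
    return count + 1 if amps < low else 0


def choose_action(count, soft=SOFT_THRESHOLD, hard=HARD_THRESHOLD):
    """Pick an action once the count goes OVER a threshold."""
    if count > hard:
        return "hard_reset"
    if count > soft:
        return "soft_reset"
    return "none"


# Simulate a die that draws no current for 21 monitor passes in a row.
count = 0
for _ in range(21):
    count = update_count(count, amps=0.0)
print(count, choose_action(count))
```

Note this still can't tell a bad pool from a bad die, since both just look like low current. It only buys the die more time before the hard reset kicks in.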


How this all worked originally (the way KNC set it up) was one threshold, and if that threshold was hit it just performed soft resets for an eternity.



In the short term, I was thinking, pool issues aside: now that we have this other stupid mode of failure where the waas command succeeds but doesn't really bring the die back to life .... I could do another loop which would attempt soft resets up to, say, 5x, sleep after each attempt, run the dcdc function to see if current is over 5 A, and each time it's not, of course increment /var/run/dieXX ... then inside that same loop, if /var/run/dieXX goes over the threshold, start a hard reset. I think that would solve the issue I ran into just today =)

how about this? An "auto on/auto off" button & a "manual reset" button. On problematic rigs we turn auto reset off, & use manual reset as needed. It still beats physically powering down the rig, and restarting it ...just a thought

I only have one die with these flaky soft resets, so it may not be a huge overall problem

Trying to keep this as automated and as non-invasive for the user as possible =). It's hard for me to code custom stuff for the webgui since I don't have much experience w/ all the crazy shit they have going on involving it. LOL

this v.93 seems to be running pretty good, even the flaky soft reset die seems to have stabilized after several hard resets

when bfgminer randomly shuts down I attribute that to hard resets



Yep, bfgminer can't be running for a proper dcdc power down / up ... don't know if it has to do w/ bus traffic or what, but it seems the dcdc power down / up does something but bfgminer will continually ignore them.

If those loops are too big a pain in the neck to implement I wouldn't worry about them

Well, I rewrote the soft / hard reset code =)
Basically, once it detects a die in error via /var/run/dieXX, it calls the reset die function.
It first attempts a soft reset... if that fails right off the bat via waas -s failing, then it performs a hard reset. (waas -s shouldn't fail because of pool comm errors.)
If waas -s succeeds then it calls bfgminer to perform its internal die reconfiguration update.
Then the script waits 30 seconds and measures the current output of the die in question.
If either of the DCDCs is below the current threshold then it increments the error count.
It will loop through the soft die resets up to 10 times; if it fails 10x then it performs a hard reset.
So, it gives the die roughly 5-6 mins to "work" via soft resets ... meaning have current flowing through it greater than the threshold.
If that fails it hard resets.
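
The flow above can be sketched in Python like this. It is not the actual script on github. The four callables are hypothetical stand-ins for the real commands (waas -s, the bfgminer reconfigure call, the DCDC current read and the power cycle), and only the 10 attempts, the 30 second wait and the 5 A threshold come from the description.

```python
import time


def reset_die(soft_reset, reconfigure, read_currents, hard_reset,
              max_attempts=10, wait_seconds=30, threshold_amps=5.0,
              sleep=time.sleep):
    """Return "soft" if a soft reset revived the die, otherwise "hard".

    soft_reset()    -> bool, stands in for `waas -s` succeeding or failing
    reconfigure()   -> stands in for bfgminer's internal die reconfiguration
    read_currents() -> (dcdc1_amps, dcdc2_amps) for the die in question
    hard_reset()    -> stands in for the stop / power cycle / start sequence
    """
    for _ in range(max_attempts):
        if not soft_reset():
            # The soft reset command itself failed: go straight to hard reset.
            hard_reset()
            return "hard"
        reconfigure()
        sleep(wait_seconds)
        dcdc1, dcdc2 = read_currents()
        if dcdc1 >= threshold_amps and dcdc2 >= threshold_amps:
            return "soft"  # current is flowing again, the die is back
        # Either DCDC is under the threshold: count it as a failed attempt.
    hard_reset()
    return "hard"
```

10 attempts with a 30 second wait each is 5 minutes of waiting, plus the time the commands themselves take, which lines up with the "roughly 5-6 mins" figure. Passing `sleep=lambda s: None` lets you dry-run the logic without waiting.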

What ya think?
I updated github w/ the changes if you wanna test.

I can say it seems the soft reset logic is working. I have yet to see a hard reset take place; I have to actually wait till my unit acts up to confirm hard reset functionality LOL!
sr. member
Activity: 342
Merit: 250
TXSteve, so the way these things work is every loop of the monitoring script it scans how much current is going through the DCDC's and if under 5 amps is going through it basically incriments the value /var/run/dieXX
once /var/run/dieXX reaches over the threshold variable it takes action to reset the die. So...basically if the ASICs dont have work due to a shitty pool configuration like when renting... that /var/run/dieXX will get incrimented, if the pool is shitty enough and it increments over threshold ... no matter what... one of the reset actions will be taken.
Im not exactly sure whats a good way to recode this so it can differentiate between possible pool comm issues vs an actual die needing reset.
I could impliment maybe 2 thresholds, one being a lower one for the pool issues then another maybe being double of the first one?Huh and if thats reached then it performs a hard reset.
I dont know, what do you think?
Or anyone else care to chime in?
Im just kinda doing a ton of trial and error here LOL!


How this all was originally(the way KNC set it up) was on threshold and if that threshold was hit it just performed a soft reset for an eternity.



In the short term, I was thinking, pool issues aside. Now since we have this other stupid mode of failure where the waas command succeeds but it really doesnt bring the die back to life .... I could do another loop which would attempt soft resets up to say 5x sleep after each attempt, run the dcdc function to see if current is over 5A and each incriment its not, of course increase /var/run/dieXX ... then inside that same loop if /var/run/dieXX goes over the threhold then start a hard reset. I think that would solve the issue I ran into just today =)

how about this? An "auto on/auto off" button & a "manual reset" button. On problematic rigs we turn auto reset off, & use manual reset as needed. It still beats physically powering down the rig, and restarting it ...just a thought

I only have one die with these flaky soft resets, so it may not be a huge overall problem

Trying to keep this as automated as possible and least user invasive as possible =), its hard for me to code custom stuff for webgui since I dont have much experience w/ all the crazy shit they have going on involving it. LOL

this v.93 seems to be running pretty good, even the flaky soft reset die seems to have stabilized after several hard resets

when bfgminer randomly shuts down I attribute that to hard resets



Yeahp, bfgminer cant be running in order for a proper dcdc power down / up ... dont know if it has to do w/ bus traffic or what, but seems the dcdc power down / up does something but bfgminer will continually ignore them.

If those loops are too big a pain the neck to implement I wouldn't worry about them
legendary
Activity: 2450
Merit: 1002
TXSteve, so the way these things work is every loop of the monitoring script it scans how much current is going through the DCDC's and if under 5 amps is going through it basically incriments the value /var/run/dieXX
once /var/run/dieXX reaches over the threshold variable it takes action to reset the die. So...basically if the ASICs dont have work due to a shitty pool configuration like when renting... that /var/run/dieXX will get incrimented, if the pool is shitty enough and it increments over threshold ... no matter what... one of the reset actions will be taken.
Im not exactly sure whats a good way to recode this so it can differentiate between possible pool comm issues vs an actual die needing reset.
I could impliment maybe 2 thresholds, one being a lower one for the pool issues then another maybe being double of the first one?Huh and if thats reached then it performs a hard reset.
I dont know, what do you think?
Or anyone else care to chime in?
Im just kinda doing a ton of trial and error here LOL!


How this all was originally(the way KNC set it up) was on threshold and if that threshold was hit it just performed a soft reset for an eternity.



In the short term, I was thinking, pool issues aside. Now since we have this other stupid mode of failure where the waas command succeeds but it really doesnt bring the die back to life .... I could do another loop which would attempt soft resets up to say 5x sleep after each attempt, run the dcdc function to see if current is over 5A and each incriment its not, of course increase /var/run/dieXX ... then inside that same loop if /var/run/dieXX goes over the threhold then start a hard reset. I think that would solve the issue I ran into just today =)

how about this? An "auto on/auto off" button & a "manual reset" button. On problematic rigs we turn auto reset off, & use manual reset as needed. It still beats physically powering down the rig, and restarting it ...just a thought

I only have one die with these flaky soft resets, so it may not be a huge overall problem

Trying to keep this as automated as possible and least user invasive as possible =), its hard for me to code custom stuff for webgui since I dont have much experience w/ all the crazy shit they have going on involving it. LOL

this v.93 seems to be running pretty good, even the flaky soft reset die seems to have stabilized after several hard resets

when bfgminer randomly shuts down I attribute that to hard resets



Yeahp, bfgminer cant be running in order for a proper dcdc power down / up ... dont know if it has to do w/ bus traffic or what, but seems the dcdc power down / up does something but bfgminer will continually ignore them.
sr. member
Activity: 342
Merit: 250
TXSteve, so the way these things work is every loop of the monitoring script it scans how much current is going through the DCDC's and if under 5 amps is going through it basically incriments the value /var/run/dieXX
once /var/run/dieXX reaches over the threshold variable it takes action to reset the die. So...basically if the ASICs dont have work due to a shitty pool configuration like when renting... that /var/run/dieXX will get incrimented, if the pool is shitty enough and it increments over threshold ... no matter what... one of the reset actions will be taken.
Im not exactly sure whats a good way to recode this so it can differentiate between possible pool comm issues vs an actual die needing reset.
I could impliment maybe 2 thresholds, one being a lower one for the pool issues then another maybe being double of the first one?Huh and if thats reached then it performs a hard reset.
I dont know, what do you think?
Or anyone else care to chime in?
Im just kinda doing a ton of trial and error here LOL!


How this all was originally(the way KNC set it up) was on threshold and if that threshold was hit it just performed a soft reset for an eternity.



In the short term, I was thinking, pool issues aside. Now since we have this other stupid mode of failure where the waas command succeeds but it really doesnt bring the die back to life .... I could do another loop which would attempt soft resets up to say 5x sleep after each attempt, run the dcdc function to see if current is over 5A and each incriment its not, of course increase /var/run/dieXX ... then inside that same loop if /var/run/dieXX goes over the threhold then start a hard reset. I think that would solve the issue I ran into just today =)

how about this? An "auto on/auto off" button & a "manual reset" button. On problematic rigs we turn auto reset off, & use manual reset as needed. It still beats physically powering down the rig, and restarting it ...just a thought

I only have one die with these flaky soft resets, so it may not be a huge overall problem

Trying to keep this as automated as possible and least user invasive as possible =), its hard for me to code custom stuff for webgui since I dont have much experience w/ all the crazy shit they have going on involving it. LOL

this v.93 seems to be running pretty good, even the flaky soft reset die seems to have stabilized after several hard resets

when bfgminer randomly shuts down I attribute that to hard resets

legendary
Activity: 2450
Merit: 1002
Trying to keep this as automated and as minimally user-invasive as possible =). It's hard for me to code custom stuff for the webgui since I don't have much experience w/ all the crazy shit they have going on in it. LOL
sr. member
Activity: 342
Merit: 250
How about this? An "auto on/auto off" button & a "manual reset" button. On problematic rigs we turn auto reset off & use manual reset as needed. It still beats physically powering down the rig and restarting it ... just a thought

I only have one die with these flaky soft resets, so it may not be a huge overall problem
legendary
Activity: 2450
Merit: 1002
TXSteve, so the way these things work is: every loop of the monitoring script, it scans how much current is going through the DCDCs, and if under 5 amps is going through, it increments the value in /var/run/dieXX.
Once /var/run/dieXX goes over the threshold variable, it takes action to reset the die. So basically, if the ASICs don't have work due to a shitty pool configuration, like when renting, /var/run/dieXX will get incremented, and if the pool is shitty enough that it increments over the threshold, one of the reset actions will be taken no matter what.
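The counting described above can be sketched in a few lines of Python. This is only an illustration, not the actual monitoring script: `monitor_pass`, `THRESHOLD` and its value of 3 are made-up names and numbers, the counter is passed in as a plain integer instead of being read from /var/run/dieXX, and clearing the count on a healthy reading is an assumption the thread does not confirm. The "either DCDC is low" test mirrors the OR comparison mentioned earlier in the thread.

```python
LOW_CURRENT_AMPS = 5.0  # from the post: under 5 A counts as a bad reading
THRESHOLD = 3           # hypothetical; the thread does not give the real value


def monitor_pass(counter, dcdc_amps):
    """One pass of the monitoring loop for a single die.

    counter   -- current low-current count (the value kept in /var/run/dieXX)
    dcdc_amps -- output current readings for the die's DC/DC converters
    Returns (new_counter, needs_reset).
    """
    if any(amps < LOW_CURRENT_AMPS for amps in dcdc_amps):
        counter += 1   # low current: count this pass against the die
    else:
        counter = 0    # assumption: a healthy reading clears the count
    return counter, counter > THRESHOLD


# A die that draws almost nothing for four passes in a row.
counter = 0
for reading in [(0.2, 0.1)] * 4:
    counter, needs_reset = monitor_pass(counter, reading)
print(counter, needs_reset)
```

Note that a pool that stops sending work looks exactly the same to this logic as a dead die, which is the problem raised in the post.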
I'm not exactly sure what's a good way to recode this so it can differentiate between possible pool comm issues and an actual die needing a reset.
I could maybe implement 2 thresholds: a lower one for the pool issues, then another that's maybe double the first one, and if that's reached it performs a hard reset.
I don't know, what do you think?
Or anyone else care to chime in?
I'm just kinda doing a ton of trial and error here LOL!
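One possible reading of the two-threshold idea, sketched in Python. The function name `choose_action`, the action labels, and the example threshold of 3 are all hypothetical, and treating the lower threshold as the soft reset trigger is an interpretation of the post rather than something it states.

```python
def choose_action(counter, soft_threshold):
    """Map a die's low-current count to an action name."""
    hard_threshold = 2 * soft_threshold  # "double of the first one"
    if counter > hard_threshold:
        return "hard_reset"   # count kept climbing: escalate
    if counter > soft_threshold:
        return "soft_reset"   # first stage: try the gentle option
    return "none"             # below both thresholds: leave the die alone


for count in (2, 4, 7):
    print(count, choose_action(count, soft_threshold=3))
```

With this split, a short pool outage would at worst cause a soft reset, and only a die that stays low for twice as long gets a hard reset. It still cannot tell a long pool outage from a dead die.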


How this all worked originally (the way KNC set it up) was one threshold, and if that threshold was hit it just performed a soft reset for an eternity.



In the short term, I was thinking, pool issues aside: now that we have this other stupid mode of failure where the waas command succeeds but doesn't really bring the die back to life, I could do another loop which would attempt soft resets up to, say, 5x, sleep after each attempt, and run the dcdc function to see if current is over 5A. Each time it's not, it would of course increment /var/run/dieXX, and inside that same loop, if /var/run/dieXX goes over the threshold, start a hard reset. I think that would solve the issue I ran into just today =)
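A rough Python sketch of that retry loop, under stated assumptions: `recover_die` and its parameters are hypothetical, the soft reset, current reading and hard reset are passed in as plain callables so the example runs without any hardware, and the wait time defaults to zero only to keep the demo fast.

```python
import time


def recover_die(soft_reset, read_amps, hard_reset, counter, threshold,
                max_attempts=5, wait_s=0.0, low_amps=5.0):
    """Try soft resets; escalate to a hard reset once counter passes threshold.

    Returns (outcome, counter), where outcome is "recovered", "hard_reset"
    or "gave_up" (all attempts used without crossing the threshold).
    """
    for _ in range(max_attempts):
        soft_reset()
        time.sleep(wait_s)            # give the die time to come back
        if read_amps() > low_amps:
            return "recovered", counter
        counter += 1                  # still low: count it, as /var/run/dieXX would
        if counter > threshold:
            hard_reset()
            return "hard_reset", counter
    return "gave_up", counter


# Stub hardware: a die whose soft reset "succeeds" but never draws current again.
calls = []
outcome, counter = recover_die(
    soft_reset=lambda: calls.append("soft"),
    read_amps=lambda: 0.0,
    hard_reset=lambda: calls.append("hard"),
    counter=0, threshold=3)
print(outcome, counter, calls)
```

Checking the current after each soft reset is what catches the case where the reset command reports success but the die stays dead.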