Topic: Avalon ASIC users thread - page 117. (Read 438516 times)

legendary
Activity: 1246
Merit: 1002
July 16, 2013, 09:40:05 AM
Yesterday my Avalon started showing a super high number of HW errors: almost 75% of the accepted diff1a shares.
For instance, 100000 shares and 75000 HW. Any ideas on what to check?

I had a maximum temp of 51C last night, and my HW values were almost as high as my accepted.  The past week, it has stayed below 50 during routine operation.

It has been stable around 343-347 frequency.  I had --avalon-auto set to 285-375.  I just changed it to 285-350.

My output is conveniently shown at http://eligius.st/~wizkid057/newstats/userstats.php/18bLcVkviErQi75zB8X39jZXxHNpSZggdC

At the start of 16 Jul I moved the unit from a warm upstairs room where the fans fairly consistently ran at 3800 to the basement where they have run closer to 2200.  The dip is from the time it took to move the machine.  The stability and speed both seemed to improve, but during the night the basement started warming up from its initial 69°F.

I also put a filter in front of the fans, and the fan speed seems to be a little unstable now.  Without the filter, their speed stays pretty constant.
sr. member
Activity: 342
Merit: 250
July 16, 2013, 08:37:23 AM
Yesterday my Avalon started showing a super high number of HW errors: almost 75% of the accepted diff1a shares.
For instance, 100000 shares and 75000 HW. Any ideas on what to check?
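To put those numbers in perspective, here is a minimal sketch of the arithmetic. It assumes "shares 100000" means accepted diff1 shares and "hw 75000" means hardware errors; the function name is made up for illustration. It reports the error count both relative to accepted shares (the 75% quoted above) and relative to everything the hardware returned, which is the figure to hold against the <2% hardware error target mentioned elsewhere in this thread.

```python
from typing import Tuple


def hw_error_rates(diff1_accepted: int, hw_errors: int) -> Tuple[float, float]:
    """Return (HW as % of accepted diff1 shares, HW as % of all returned results)."""
    if diff1_accepted <= 0:
        raise ValueError("diff1_accepted must be positive")
    if hw_errors < 0:
        raise ValueError("hw_errors must not be negative")
    vs_accepted = 100.0 * hw_errors / diff1_accepted
    vs_total = 100.0 * hw_errors / (diff1_accepted + hw_errors)
    return vs_accepted, vs_total


# Figures from the post: 100000 shares, 75000 HW.
vs_accepted, vs_total = hw_error_rates(100_000, 75_000)
print(f"{vs_accepted:.1f}% of accepted, {vs_total:.1f}% of all results")
```

Either way you count it, a rate this far above a couple of percent points at the unit itself (clock, cooling, power) rather than at the pool.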
legendary
Activity: 1610
Merit: 1000
July 16, 2013, 03:54:08 AM
today I experienced a "bigger problem" for the second time since I got the machine

the machine was unresponsive on the web interface, so I thought it had crashed completely and would need a physical restart, which is very bad for me

however, I tried to connect over SSH and somehow it worked, but extremely slowly; it took me a couple of minutes, but finally I was able to type the reboot command and reboot the machine, and everything is OK again

so, if someone has a totally unresponsive machine like this, try SSH and be very, very patient, it may come up

funny thing is that it did not stop hashing until I rebooted it, i.e. even in this strange unresponsive state it was doing its job; only the web interface could not be reached, and SSH only with great difficulty

since everything on the machine was impossibly slow, I could not analyze any logs before the reboot, so I have no idea what happened
Out of memory? What does free show?
Is WiFi disabled, yes or no?
What does dmesg show?
Just a few basic commands will tell you what went wrong. Make sure you post the output next time ;)
sr. member
Activity: 277
Merit: 254
July 16, 2013, 02:23:41 AM
today I experienced a "bigger problem" for the second time since I got the machine

the machine was unresponsive on the web interface, so I thought it had crashed completely and would need a physical restart, which is very bad for me

however, I tried to connect over SSH and somehow it worked, but extremely slowly; it took me a couple of minutes, but finally I was able to type the reboot command and reboot the machine, and everything is OK again

so, if someone has a totally unresponsive machine like this, try SSH and be very, very patient, it may come up

funny thing is that it did not stop hashing until I rebooted it, i.e. even in this strange unresponsive state it was doing its job; only the web interface could not be reached, and SSH only with great difficulty

since everything on the machine was impossibly slow, I could not analyze any logs before the reboot, so I have no idea what happened
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
July 16, 2013, 12:06:08 AM

I get a REALLY large number of invalid shares with my Avalon.

I'm not sure what's going on. It's not consistent.

One day I'll be mining and I'll get 99+% valid shares and the next day or two it will only be 88% valid shares.

I've tried mining on Ozcoin and 50btc and it seems to be bad on both.

I don't think it is my network because my jalapenos only get <1% invalid shares.

I'm using the dynamically adjusting frequency and it usually settles at about 352 MHz or so and 82 GH/s on average. Of course, according to the pool, I'm only getting 70 GH/s or so due to all the invalid shares.

I have an AC feeding directly into the intake on the Avalon and no temperature gets above 48 or 49 or so (very rarely 50).

What could be wrong?
This is normal on stratum. Every time the block changes, you get invalids. Some days burn more blocks than others.

It seems like I have way more than other Avalon users, at least from what I've seen on Ozcoin. Most of them have only a couple percent invalid shares at most, while I will have 12 or 13% invalid shares. Right now I'm standing at about 94% efficiency on average, that's with 400,000 invalid shares in the less than a week I've been mining there.


That is definitely far more than you should be getting. There is a good chance you're submitting heaps of duplicates which may also be a different form of instability that the auto mode can't check for. Try setting a lower maximum speed if you're using auto mode because clearly you're not doing 82GH of useful work.
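The gap between the local and pool-side figures can be checked with a little arithmetic. This is a rough sketch using only the numbers quoted above (82 GH/s locally, about 70 GH/s at the pool, 12 to 13% invalid shares); the function names are illustrative, and it assumes the pool's estimate simply excludes rejected shares.

```python
def implied_loss_percent(local_ghs: float, pool_ghs: float) -> float:
    """Share of locally reported hashrate that the pool does not credit."""
    if local_ghs <= 0:
        raise ValueError("local_ghs must be positive")
    return 100.0 * (local_ghs - pool_ghs) / local_ghs


def useful_hashrate(local_ghs: float, invalid_percent: float) -> float:
    """Hashrate left over once the given percentage of shares is rejected."""
    if not 0.0 <= invalid_percent <= 100.0:
        raise ValueError("invalid_percent must be between 0 and 100")
    return local_ghs * (1.0 - invalid_percent / 100.0)


# Numbers from the posts above.
print(f"implied loss: {implied_loss_percent(82.0, 70.0):.1f}%")
print(f"useful rate at 12.5% invalid: {useful_hashrate(82.0, 12.5):.2f} GH/s")
```

The two views roughly agree: losing 12 to 13% of 82 GH/s lands close to the 70 GH/s the pool reports, which fits the point that the unit is not doing 82 GH/s of useful work.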
hero member
Activity: 546
Merit: 500
July 15, 2013, 05:33:55 PM

I get a REALLY large number of invalid shares with my Avalon.

I'm not sure what's going on. It's not consistent.

One day I'll be mining and I'll get 99+% valid shares and the next day or two it will only be 88% valid shares.

I've tried mining on Ozcoin and 50btc and it seems to be bad on both.

I don't think it is my network because my jalapenos only get <1% invalid shares.

I'm using the dynamically adjusting frequency and it usually settles at about 352 MHz or so and 82 GH/s on average. Of course, according to the pool, I'm only getting 70 GH/s or so due to all the invalid shares.

I have an AC feeding directly into the intake on the Avalon and no temperature gets above 48 or 49 or so (very rarely 50).

What could be wrong?

This is normal on stratum. Every time the block changes, you get invalids. Some days burn more blocks than others.

It seems like I have way more than other Avalon users, at least from what I've seen on Ozcoin. Most of them have only a couple percent invalid shares at most, while I will have 12 or 13% invalid shares. Right now I'm standing at about 94% efficiency on average, that's with 400,000 invalid shares in the less than a week I've been mining there.

hero member
Activity: 658
Merit: 500
July 15, 2013, 05:11:32 PM

I get a REALLY large number of invalid shares with my Avalon.

I'm not sure what's going on. It's not consistent.

One day I'll be mining and I'll get 99+% valid shares and the next day or two it will only be 88% valid shares.

I've tried mining on Ozcoin and 50btc and it seems to be bad on both.

I don't think it is my network because my jalapenos only get <1% invalid shares.

I'm using the dynamically adjusting frequency and it usually settles at about 352 MHz or so and 82 GH/s on average. Of course, according to the pool, I'm only getting 70 GH/s or so due to all the invalid shares.

I have an AC feeding directly into the intake on the Avalon and no temperature gets above 48 or 49 or so (very rarely 50).

What could be wrong?

This is normal on stratum. Every time the block changes, you get invalids. Some days burn more blocks than others.
hero member
Activity: 546
Merit: 500
July 15, 2013, 05:06:55 PM

I get a REALLY large number of invalid shares with my Avalon.

I'm not sure what's going on. It's not consistent.

One day I'll be mining and I'll get 99+% valid shares and the next day or two it will only be 88% valid shares.

I've tried mining on Ozcoin and 50btc and it seems to be bad on both.

I don't think it is my network because my jalapenos only get <1% invalid shares.

I'm using the dynamically adjusting frequency and it usually settles at about 352 MHz or so and 82 GH/s on average. Of course, according to the pool, I'm only getting 70 GH/s or so due to all the invalid shares.

I have an AC feeding directly into the intake on the Avalon and no temperature gets above 48 or 49 or so (very rarely 50).

What could be wrong?
legendary
Activity: 966
Merit: 1000
July 15, 2013, 03:53:24 PM
It's not the pool connection. Something happens and all work ends up "discarded" until cgminer is restarted. What would cause that?
It's got nowhere to go. It prepares work and then no device is working to take it so it just discards it.

That's why I have a backup pool and use failover mode.  I even have eloipool running locally, set as the secondary backup, so even if the internet fails completely, it can still mine against that (until it becomes out-of-sync if the internet failure lasts long enough).


I can watch MHS5s slowly drop to 0, and eventually cgminer-monitor will restart cgminer. Why is cgminer stopping hashing?
Common reasons:
Wifi kernel problem
Overdoing the overclocking
Pool failure and the cgminer-monitor watchdog is trigger happy and kills cgminer when all it's doing is waiting for a pool to come back online.
FPGA failure in the avalon.

If the fans stop running entirely, does that indicate FPGA failure?

My Batch 1 unit does this sometimes.  I now have a script that queries the API, and if it has a zero hashrate for too long, it will call the connected Web Power Switch to cycle the power on the unit, which always brings it back up.  (I do have to have it leave the power off for a full 30 seconds.)
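A watchdog along those lines could look roughly like the sketch below. It is not the poster's script. It assumes cgminer's JSON API is enabled on port 4028 and that the summary reply carries an "MHS 5s" field; the class and function names, the 10-minute grace period and the 30-second poll interval are all made up for illustration. The power switch is passed in as two plain callables because its real interface is not described in the thread.

```python
import json
import socket
import time
from typing import Callable, Optional


def query_mhs_5s(host: str, port: int = 4028, timeout: float = 10.0) -> float:
    """Return the 5-second hashrate from cgminer's JSON API.

    Assumption: the API is enabled and the summary reply has an "MHS 5s" key.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": "summary"}).encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    reply = b"".join(chunks).rstrip(b"\x00").decode()
    return float(json.loads(reply)["SUMMARY"][0]["MHS 5s"])


class ZeroHashWatchdog:
    """Tracks how long the hashrate has been zero."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.zero_since: Optional[float] = None

    def observe(self, mhs: float) -> bool:
        """Record one reading; return True when a power cycle is due."""
        now = self.clock()
        if mhs > 0:
            self.zero_since = None
            return False
        if self.zero_since is None:
            self.zero_since = now
        if now - self.zero_since >= self.grace_seconds:
            self.zero_since = None  # start counting afresh after the cycle
            return True
        return False


def power_cycle(switch_off: Callable[[], None], switch_on: Callable[[], None],
                off_seconds: float = 30.0,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Cut power, wait the full 30 seconds the post calls for, restore power."""
    switch_off()
    sleep(off_seconds)
    switch_on()


def run(host: str, switch_off: Callable[[], None], switch_on: Callable[[], None],
        grace_seconds: float = 600.0, poll_seconds: float = 30.0) -> None:
    """Poll forever; cycle power when the hashrate stays at zero too long."""
    watchdog = ZeroHashWatchdog(grace_seconds)
    while True:
        try:
            mhs = query_mhs_5s(host)
        except (OSError, ValueError, KeyError, IndexError):
            mhs = 0.0  # design choice: an unreachable API counts as not hashing
        if watchdog.observe(mhs):
            power_cycle(switch_off, switch_on)
        time.sleep(poll_seconds)
```

Keep the grace period well above the time a pool outage or a cgminer-monitor restart normally takes, otherwise this becomes exactly the trigger-happy watchdog described in the list of common reasons above.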
legendary
Activity: 1764
Merit: 1002
July 15, 2013, 02:34:17 PM
hmm, so not even deleting WWAN solved error -71 for me
despite a possible correlation between temperature and the occurrence of these errors, it just happened at 44 C, so it definitely occurs here even when "cold"

but should I worry if that happens like 10-15x per week? can it cause any real damage in the long term?

Check to see if the f1 fuse is still onboard.

It is one of the last batch #2 models, so I hope these older problems are not present. I just hope and do not check since I have it in server housing without an easy physical access to the machine, so I would do such check only if there is a serious danger if this is not done. I hope there is no such danger.

you never know.  the avalon that has given me the most trouble was the last one received about a week and a half ago.

the deletion of the WWAN has so far worked to eliminate the -71 and i haven't seen one for over 3 days now.
sr. member
Activity: 277
Merit: 254
July 15, 2013, 02:02:31 PM
hmm, so not even deleting WWAN solved error -71 for me
despite a possible correlation between temperature and the occurrence of these errors, it just happened at 44 C, so it definitely occurs here even when "cold"

but should I worry if that happens like 10-15x per week? can it cause any real damage in the long term?

Check to see if the f1 fuse is still onboard.

It is one of the last batch #2 models, so I hope these older problems are not present. I just hope and do not check since I have it in server housing without an easy physical access to the machine, so I would do such check only if there is a serious danger if this is not done. I hope there is no such danger.
legendary
Activity: 1064
Merit: 1000
July 15, 2013, 11:50:16 AM
Con, what is the appropriate NMW/MWC ratio?
No such thing as "appropriate" as it's just another hardware error. If your hashrate is okay, it's okay. You can read a few pages ago about how I chose the "appropriate" hardware error target rate of <2%.

why does ASIC count only read 10?

That's how many ASIC chips there are on each of the 24 miners .. =)

what do the other 14 represent?

Each module has 8 boards connected to a backpanel, so with 3 modules you have 24 miners in total, each with 10 ASIC chips.

Easy visualization: https://bitcointalksearch.org/topic/avalon-clone-assembly-service-europe-210186
legendary
Activity: 1764
Merit: 1002
July 15, 2013, 11:36:09 AM
Con, what is the appropriate NMW/MWC ratio?
No such thing as "appropriate" as it's just another hardware error. If your hashrate is okay, it's okay. You can read a few pages ago about how I chose the "appropriate" hardware error target rate of <2%.

what will make a Stat1 revert back to Stat0?   a hard reboot?
legendary
Activity: 1764
Merit: 1002
July 15, 2013, 11:35:00 AM
Con, what is the appropriate NMW/MWC ratio?
No such thing as "appropriate" as it's just another hardware error. If your hashrate is okay, it's okay. You can read a few pages ago about how I chose the "appropriate" hardware error target rate of <2%.

why does ASIC count only read 10?

That's how many ASIC chips there are on each of the 24 miners .. =)

what do the other 14 represent?
legendary
Activity: 2450
Merit: 1002
July 15, 2013, 10:52:42 AM
Con, what is the appropriate NMW/MWC ratio?
No such thing as "appropriate" as it's just another hardware error. If your hashrate is okay, it's okay. You can read a few pages ago about how I chose the "appropriate" hardware error target rate of <2%.

why does ASIC count only read 10?

That's how many ASIC chips there are on each of the 24 miners .. =)
legendary
Activity: 1764
Merit: 1002
July 15, 2013, 09:40:05 AM
Con, what is the appropriate NMW/MWC ratio?
No such thing as "appropriate" as it's just another hardware error. If your hashrate is okay, it's okay. You can read a few pages ago about how I chose the "appropriate" hardware error target rate of <2%.

why does ASIC count only read 10?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
July 15, 2013, 08:15:01 AM
Con, what is the appropriate NMW/MWC ratio?
No such thing as "appropriate" as it's just another hardware error. If your hashrate is okay, it's okay. You can read a few pages ago about how I chose the "appropriate" hardware error target rate of <2%.
legendary
Activity: 1764
Merit: 1002
July 15, 2013, 05:24:18 AM
what's the difference between the different "[Stats0,1]" entries?
You found it then. At some stage the avalon disconnects due to some USB issue and cgminer automatically hotplugs it. Since it's newly hotplugged it comes up as a new device (ava1 instead of ava0).

Con, what is the appropriate NMW/MWC ratio?
newbie
Activity: 55
Merit: 0
July 15, 2013, 05:13:51 AM
what's the difference between the different "[Stats0,1]" entries?
You found it then. At some stage the avalon disconnects due to some USB issue and cgminer automatically hotplugs it. Since it's newly hotplugged it comes up as a new device (ava1 instead of ava0).
Most 3 module systems running with --avalon-auto mode run perfectly stable.

I have one 4 module test system and that runs into this problem every few hours; the longest uptime I've seen is 18 hours.

Symptoms are: no mining activity on the pool, the TP-LINK web page displays value 0 for all measurements like temp and fan RPM, and the only counter increasing (rapidly) is HW errors. The yellow LED on the back is off, so it thinks it's mining, and it still consumes power (and produces heat) like it's running at 300 MHs

By the way, the 2 module system I have now also ran at a low speed when I set the config to miner count: 16, I had to set it to 24 (even though there are only 16 miner PCBs).

The stats look kind of funny:

Code:
[STATS0] =>
(
   [STATS] => 0
   [ID] => AVA0
   [Elapsed] => 227763
   [Calls] => 0
   [Wait] => 0.000000
   [Max] => 0.000000
   [Min] => 99999999.000000
   [baud] => 115200
   [miner_count] => 24
   [asic_count] => 10
   [timeout] => 37
   [frequency] => 340
   [fan1] => 2400
   [fan2] => 3960
   [fan3] => 3840
   [temp1] => 27
   [temp2] => 41
   [temp3] => 46
   [temp_max] => 47
   [no_matching_work] => 9471
   [match_work_count1] => 181353
   [match_work_count2] => 181108
   [match_work_count3] => 181894
   [match_work_count4] => 181359
   [match_work_count5] => 181096
   [match_work_count6] => 181561
   [match_work_count7] => 181514
   [match_work_count8] => 180062
   [match_work_count9] => 181354
   [match_work_count10] => 181132
   [match_work_count11] => 181108
   [match_work_count12] => 180555
   [match_work_count13] => 181171
   [match_work_count14] => 181569
   [match_work_count15] => 181338
   [match_work_count16] => 181537
   [match_work_count17] => 182
   [match_work_count18] => 158
   [match_work_count19] => 85
   [match_work_count20] => 114
   [match_work_count21] => 57
   [match_work_count22] => 156
   [match_work_count23] => 130
   [match_work_count24] => 126
   [USB Pipe] => 0
   [USB Delay] => r0 0.000000 w0 0.000000
)

Is there phantom work being done on non-existent mining PCBs? ;-)
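For anyone who wants to spot this automatically, here is a rough sketch that parses a stats dump in the format above and flags miner slots whose match count is tiny compared to the rest. It also computes the NMW/MWC ratio asked about elsewhere in this thread, taken here as no_matching_work divided by the sum of all match_work_count values. The function names and the cutoff (1% of the median count) are arbitrary assumptions, and the sample is only an excerpt of the dump, so the ratio it prints is not the real one for this unit.

```python
import re
from statistics import median
from typing import Dict, List

# Matches lines such as "   [match_work_count17] => 182".
LINE = re.compile(r"\[(?P<key>[^\]]+)\][ \t]*=>[ \t]*(?P<value>.*)")


def parse_stats(text: str) -> Dict[str, str]:
    """Turn "[key] => value" lines into a dict of strings."""
    return {m.group("key"): m.group("value").strip() for m in LINE.finditer(text)}


def match_counts(stats: Dict[str, str]) -> Dict[int, int]:
    """Extract match_work_countN entries as {N: count}."""
    counts = {}
    for key, value in stats.items():
        m = re.fullmatch(r"match_work_count(\d+)", key)
        if m:
            counts[int(m.group(1))] = int(value)
    return counts


def idle_miners(counts: Dict[int, int], fraction: float = 0.01) -> List[int]:
    """Miner slots whose count is below `fraction` of the median count."""
    if not counts:
        return []
    cutoff = fraction * median(counts.values())
    return sorted(slot for slot, count in counts.items() if count < cutoff)


def no_match_percent(stats: Dict[str, str]) -> float:
    """no_matching_work as a percentage of all matched work."""
    total = sum(match_counts(stats).values())
    if total == 0:
        raise ValueError("no match_work_count entries found")
    return 100.0 * int(stats["no_matching_work"]) / total


# Excerpt of the dump above (not the full 24 counters).
SAMPLE = """
   [miner_count] => 24
   [no_matching_work] => 9471
   [match_work_count1] => 181353
   [match_work_count2] => 181108
   [match_work_count16] => 181537
   [match_work_count17] => 182
   [match_work_count24] => 126
"""

stats = parse_stats(SAMPLE)
print("suspect slots:", idle_miners(match_counts(stats)))
print(f"NMW/MWC for this excerpt: {no_match_percent(stats):.2f}%")
```

On the excerpt this flags slots 17 and 24; fed the full dump it should flag 17 through 24, which are the slots of the module that is not installed.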
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
July 15, 2013, 01:31:40 AM
what's the difference between the different "[Stats0,1]" entries?
You found it then. At some stage the avalon disconnects due to some USB issue and cgminer automatically hotplugs it. Since it's newly hotplugged it comes up as a new device (ava1 instead of ava0).