
Topic: Random lockups/freezes on mining rig. 0.1 BTC bounty for a fix (Read 3243 times)

full member
Activity: 213
Merit: 100
Dumb as this may sound... put a powered USB hub between the main board and the WiFi dongles. Make sure it can actually supply 500mA per USB port. I've been looking into making a 'solo wifi miner' board and keep finding that the cheap WiFi dongles tend to use more power than they claim. If you have mounting brackets for the 'front panel USB' connectors, you could try using them instead, with one dongle per USB channel. I had a similar lockup due to power when I started with Erupters and had the power bar powering the adapter turn off (PC to the wall). Sometimes the working solution is the connections. Then again, I've also had the power adapters for my Radeon HD 4850 melt one of the 12V lines and still look OK in spite of the lack of connection.
sr. member
Activity: 420
Merit: 250
200 watts for baseline system parts? lol

My Celeron G540, MSI Z77A-G45, 4GB RAM, hard drive, and USB keyboard + mouse use about 25 watts at the wall. No mining machine will use 200 watts at idle baseline with no GPU, unless you have like 20 fans, an 8-core CPU, 32GB RAM, and 8 hard drives, and even then, that's maybe 100-150 watts.
I actually don't contest his 200W figure for baseline as a safety precaution. One may be interested in CPU coin mining too, or make genuine mistakes like not undervolting cards.

Quote from: Slander
I have been down this road. Read carefully and do NOT discount my input.
Sorry for being unclear. I do not discount anyone's input and will test ALL suggestions as far as I'm able to in this thread, in a week or so. However I'm counter-arguing while I'm able to post.

Merry Christmas everyone, and thanks.

My Baseline at the wall is around 130 watts
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Thanks for everyone's help. Nothing effectively solved the problem.

At least, both the Asrock MB and the 7870 video card are faulty. The store is sending the MB to RMA (it locks up running prime95 with my and their Phenom CPU) and I'll now get an ASUS instead. Sending the 7870 card to RMA is a pain, though. I _may_ be able to stabilize it at a higher voltage and use it for gaming or sell it at a discount.
sr. member
Activity: 252
Merit: 250
Sentinel
Hm, assuming we're talking about a freshly installed Win7...
Beware that with default settings after installation, Win7 schedules energy saving/hibernation after 30 minutes of no user activity. That needs to be disabled manually.
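
For anyone who prefers scripting this over clicking through the Control Panel, here is a minimal sketch, assuming the stock Windows `powercfg` tool and an elevated prompt. The function names are made up for illustration, and the script only prints the commands unless you explicitly turn off the dry run on a Windows machine.

```python
import subprocess
import sys


def power_saving_off_commands():
    """Return powercfg invocations that turn off sleep, hibernation and disk spin-down."""
    return [
        ["powercfg", "-change", "-standby-timeout-ac", "0"],    # never sleep on AC power
        ["powercfg", "-change", "-hibernate-timeout-ac", "0"],  # never hibernate on AC power
        ["powercfg", "-change", "-disk-timeout-ac", "0"],       # never spin down disks
        ["powercfg", "-hibernate", "off"],                      # disable hibernation altogether
    ]


def apply_power_settings(dry_run=True):
    """Print each command; run it only on Windows and only when dry_run is False."""
    for cmd in power_saving_off_commands():
        print(" ".join(cmd))
        if not dry_run and sys.platform == "win32":
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_power_settings(dry_run=True)
```

A timeout of 0 means "never" for these settings. Check the active power plan afterwards, since a vendor utility can switch plans behind your back.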

Also, only use WHQL/release drivers for your cards; AMD's beta drivers often focus only on specific game improvements, while sometimes having less than stellar stability for other applications.

Other than that, I've seen cases where simply plugging the system into another (not co-located) power outlet, ideally one running off a different fuse in your house wiring, actually helped.
While other, typical devices may work without issues, a system with a relatively high PSU load and a requirement for binary accuracy may not ride out even small power surges (which are normal) unaffected.
PSUs vary in quality when it comes to surge tolerance and how long their internal capacitors can stabilize a surging power input.

I remember running a lot of rigs (long before the bitcoin era) off several multi-plugs connected to different power outlets with extension cords, which I thought was good load distribution.
One day, however, I discovered that a lot of the outlets I used in different rooms were actually fed by one power line going to a single 16A fuse in my house, which was simply under very high load at full operation.
Powering up the rigs too quickly in a row, or switching one of the multi-plugs ON with all the PSU power switches already ON, often caused a pretty good initial peak load, with the expected results ;)
Powering some rigs on or off within that setup also sometimes caused other systems to lock up.

The mystery was only solved when that single fuse tripped and I realized the above. After changing the electrical distribution by feeding half the computer fleet through its own line with its own fuse, these problems vanished (after months of searching for the otherwise inexplicable cause).

It's also possible in theory that other high-consumption devices kicking in or out cause a short general power surge in your house (enough to freak out a loaded computer PSU) if the electrical wiring isn't well done. Large refrigerators or electric stoves are known to do that; 2000W+ blow-dryers or heaters could also easily do it, depending on which electrical line they run on. If they share the same line/fuse that your rig is running on, that's a good possible reason for issues.
If you still have classic light bulbs in your house, seeing them change intensity periodically or at random is often a good indicator of such issues.
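
To put rough numbers on the single-fuse situation described above, here is a small sketch. The 16A fuse is from the post; the 230V mains voltage and the example loads are assumptions for illustration only, and the function names are made up. It covers steady-state load only, not the switch-on peaks mentioned earlier.

```python
def circuit_capacity_watts(fuse_amps, mains_volts):
    """Continuous power a circuit can carry before its fuse is at its rated current."""
    return fuse_amps * mains_volts


def circuit_headroom_watts(loads_watts, fuse_amps=16.0, mains_volts=230.0):
    """Remaining capacity on the circuit; a negative value means it is overloaded.

    mains_volts=230.0 is an assumption, adjust it for your country.
    """
    return circuit_capacity_watts(fuse_amps, mains_volts) - sum(loads_watts)


# Hypothetical loads on one line: four rigs at 700 W each plus a 2000 W heater.
example_loads = [700, 700, 700, 700, 2000]
headroom = circuit_headroom_watts(example_loads)
print(f"capacity: {circuit_capacity_watts(16.0, 230.0):.0f} W")
print(f"headroom: {headroom:.0f} W")
```

With these made-up loads the line is over its rating by more than a kilowatt, which is the kind of result that only shows up once you know which outlets share a fuse.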
legendary
Activity: 2688
Merit: 1468
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

Here is my BTC address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
Thanks for the troubleshooting tip, however that is not a solution because this is not a video card lockup; it's at CPU or chipset level. I've run with and without CGWatcher. It also happened when an Nvidia 6200TC was providing video, or with just the TeamViewer virtual display driver. The keyboard LEDs don't toggle and the wireless connections drop from my router.

wipeout,

Are you using bfgminer/cgminer? If so, disable ADL functionality. Best is to recompile without ADL support, set your clocks and fans with MSI AB at the machine, and use Remote Desktop Connection to start cgminer.
If you start cgminer from RDC, ADL will not be accessible, so cgminer won't be able to load/use the ADL libraries.

You are crashing AMD drivers (freezes), most likely because of bfg/cgminer ADL code.

Or try reaper miner.  

legendary
Activity: 1260
Merit: 1000
World Class Cryptonaire
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

Here is my BTC address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
Thanks for the troubleshooting tip, however that is not a solution because this is not a video card lockup; it's at CPU or chipset level. I've run with and without CGWatcher. It also happened when an Nvidia 6200TC was providing video, or with just the TeamViewer virtual display driver. The keyboard LEDs don't toggle and the wireless connections drop from my router.

Yep, I had the same problem... feel free to do all the other troubleshooting you wish, but in the end give my suggestion a try: mine a single coin for 24 hours and see if there are any crashes. Good luck :)
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

Here is my BTC address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
Thanks for the troubleshooting tip, however that is not a solution because this is not a video card lockup; it's at CPU or chipset level. I've run with and without CGWatcher. It also happened when an Nvidia 6200TC was providing video, or with just the TeamViewer virtual display driver. The keyboard LEDs don't toggle and the wireless connections drop from my router.
legendary
Activity: 1260
Merit: 1000
World Class Cryptonaire
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

Here is my BTC address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Part of the hardware was returned to the store (Asrock 970, CPU FX-6100 and RAM). In a week they'll report.

I've read all suggestions and I'm in the stage of trying this again. The 2nd base system is now solid (knock on wood) with a Sempron 190 and a single 7970 @ 1.06V. I will add another card tomorrow, either the 6950 or the 7870, dunno yet.

That said, I suspect that it's a combination of 2 or more problems (including what some mentioned): the PSU is not sufficient for these 3 cards + the FX-6100, the FX-6100 has problems with the chipset, the cards don't play well together, and the 7870 has crappy voltage regulators (I can't undervolt it much or, even worse, it stresses the PSU).
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Bump!

I'm now using a Sempron 190 dual-core instead of the FX-6100. We shall see in a few days, but I suspect we have a winner :)
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
200 watts for baseline system parts? lol

My Celeron G540, MSI Z77A-G45, 4GB RAM, hard drive, and USB keyboard + mouse use about 25 watts at the wall. No mining machine will use 200 watts at idle baseline with no GPU, unless you have like 20 fans, an 8-core CPU, 32GB RAM, and 8 hard drives, and even then, that's maybe 100-150 watts.
I actually don't contest his 200W figure for baseline as a safety precaution. One may be interested in CPU coin mining too, or make genuine mistakes like not undervolting cards.

Quote from: Slander
I have been down this road. Read carefully and do NOT discount my input.
Sorry for being unclear. I do not discount anyone's input and will test ALL suggestions as far as I'm able to in this thread, in a week or so. However I'm counter-arguing while I'm able to post.

Merry Christmas everyone, and thanks.
full member
Activity: 171
Merit: 100
I have been down this road. Read carefully and do NOT discount my input.

Does your rig run ok with ONE card installed?  I bet it does (try it)

I have found that different cards do not play well together. I can get 7870 and 7970 to work just fine, throw another different card in the mix and the whole thing craps out.

IF you cannot get ONE card to work then you have a serious fault somewhere else. Could be many places, ram, mobo, hd, etc. Might be as simple as a faulty cable, but which one?

Suggest you get ONE card working and mining, pick one, and get it running for 24 hours. This way you will know your rig is sound at least.
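
The one-card-first approach above can be written down as a test plan so no combination gets skipped. This is only a sketch: the card names are the ones mentioned in this thread, the 24-hour soak time is from the post, and `isolation_plan` is a made-up helper name.

```python
from itertools import combinations


def isolation_plan(cards, soak_hours=24):
    """Return (cards_installed, soak_hours) steps, single cards first, then larger sets."""
    plan = []
    for size in range(1, len(cards) + 1):
        for combo in combinations(cards, size):
            plan.append((combo, soak_hours))
    return plan


cards = ["7970", "7870", "6950"]
for step, (combo, hours) in enumerate(isolation_plan(cards), start=1):
    print(f"step {step}: install {' + '.join(combo)}, mine for {hours} h")
```

If a single card already fails, stop there and look at RAM, mobo, PSU and cables before moving on to the pairs.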
legendary
Activity: 1344
Merit: 1004
200 watts for baseline system parts? lol

My Celeron G540, MSI Z77A-G45, 4GB RAM, hard drive, and USB keyboard + mouse use about 25 watts at the wall. No mining machine will use 200 watts at idle baseline with no GPU, unless you have like 20 fans, an 8-core CPU, 32GB RAM, and 8 hard drives, and even then, that's maybe 100-150 watts.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Gigabyte mobos tend to have issues with RAM. Before you do anything extreme, update the BIOS to the latest version and make sure the RAM is stable at factory specs. If it is, then you can start to overclock it. It could also be a faulty stick or socket, just to name a few possible issues.
The BIOS is updated to the latest version, and so is the one on the Asrock motherboard (there I also tried experimental BIOSes and downgrades), and except for the GPUs (for obvious hash and efficiency reasons) I don't want to use out-of-spec settings. That's precisely the problem: it doesn't work at stock. Nor does it work underclocked, with more voltage, or with both, whether on the CPU, HT, NB, PCI-e PLL or whatever else I can't remember. It doesn't work with the RAM underclocked, overvolted and at 11-11-11-36 timings, with either the G.Skill Ripjaws in dual channel or the Kingston ValueRAM in single channel. No combination of BIOS settings that I had the patience to try, "Auto" or manually tweaked, solves the problem. No combination of parts (and I have redundant everything, except the CPU, as one can see in the OP) solves the lockup.

At this stage, I'm not even sure if some settings make the rig hold on longer than others...

I do however have 2 hints of instability: the GPU-Z readings on the 7870 show temperature spikes on the VRMs, sitting at 37°C and then spiking to 74°C. That's double, and it could be a reading error. There is also an "Aux" reading on the Asrock MB, measured by SpeedFan. There is no such issue on the Gigabyte, but on the Asrock it was 125°C!
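
One way to tell a real VRM temperature problem from a sensor glitch is to log the readings and flag sudden jumps between consecutive samples. A minimal sketch follows; the sample values are made up around the 37°C and 74°C figures above, the 20°C threshold is an arbitrary assumption, and `find_spikes` is a hypothetical helper, not part of GPU-Z or SpeedFan.

```python
def find_spikes(readings_c, max_jump_c=20.0):
    """Return (index, previous, current) for every sample that jumps by more than max_jump_c."""
    spikes = []
    for i in range(1, len(readings_c)):
        prev, cur = readings_c[i - 1], readings_c[i]
        if abs(cur - prev) > max_jump_c:
            spikes.append((i, prev, cur))
    return spikes


# Made-up log: steady readings with one isolated outlier.
samples = [37, 38, 37, 74, 38, 37]
for index, prev, cur in find_spikes(samples):
    print(f"sample {index}: {prev} -> {cur} degC")
```

A jump up that is followed by an equally large jump down on the very next sample, as in this made-up log, looks more like a read error than real heating, since a VRM cannot cool that fast.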
donator
Activity: 1218
Merit: 1015
I can't get around random lockups and freezes.
.
.
- I ended up modding the BIOS of the 3 video cards, however the rig locks up even if the GPU cores are down at 500 MHz.

From this I understand it's not random freezes; they happen only when the video cards start working,
meaning a bad OC or not enough power.

Remove all cards and test them one by one.
If you can get max hash rate from each of them separately, that means they don't have enough power together.

Or, if you have another power supply, try connecting the 6950+7870 to the other power supply (at least 600W) and the 7970+system to your 850W.

7970~350w
6950~300w
7870~250w
system~200w
=
1100w
This post makes me think a bit more, however the 6950 was mining for 2 months 24/7 on an Intel / Asus desktop machine and I did test each card separately. Both the Gigabyte and Asrock boards lock up with that (or any other) card mining. I also said in the OP:
Quote
... I used one GPU at a time. I switched slots.

Regarding power, I tested the power usage of each card at the "wall" with a power meter. I don't quite remember the values, but the 7870 at 900MHz undervolted to 1V uses a bit above 100W, far, far from the 250W you mention. I do remember, however, the power draw reported by GPU-Z from the onboard VRMs: it pulled ~5A at 12V, while the 7970 undervolted to 1.03V pulled ~12.5A. That would mean 60W for the 7870 and 150W for the 7970.
I'm not completely sure, but I think the 20(+4) pins on MoBo operate on a different voltage, with the PCI-e slot providing 75-125W to cards (don't quite remember). Using just the 12V line wouldn't be accurate (I think...!). 850W seems really, really low. I have rigs with just 2 270s and minimal hardware pulling right around 500W at the wall.
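
For reference, the arithmetic behind the two sets of numbers in this exchange can be sketched as below. The wattage estimates, the 850W PSU and the GPU-Z current readings are the ones quoted above; the helper name `rail_watts` is made up. As noted, the 12V VRM reading is only part of what a card draws, so treat the second set as a lower bound.

```python
def rail_watts(amps, volts=12.0):
    """Power drawn on one rail: P = V * I."""
    return amps * volts


# Estimated per-component draw from the quoted post, in watts.
estimated = {"7970": 350, "6950": 300, "7870": 250, "system": 200}
psu_watts = 850

total = sum(estimated.values())
shortfall = total - psu_watts
print(f"estimated total: {total} W, PSU: {psu_watts} W, shortfall: {shortfall} W")

# GPU-Z VRM current readings from the OP, converted to watts on the 12 V rail.
print(f"7870: {rail_watts(5.0):.0f} W")
print(f"7970: {rail_watts(12.5):.0f} W")
```

The gap between roughly 60W measured on the VRMs and the 250W estimate for the 7870 is why a wall meter reading per card is the more useful number to settle the argument.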
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
I can't get around random lockups and freezes.
.
.
- I ended up modding the BIOS of the 3 video cards, however the rig locks up even if the GPU cores are down at 500 MHz.

From this I understand it's not random freezes; they happen only when the video cards start working,
meaning a bad OC or not enough power.

Remove all cards and test them one by one.
If you can get max hash rate from each of them separately, that means they don't have enough power together.

Or, if you have another power supply, try connecting the 6950+7870 to the other power supply (at least 600W) and the 7970+system to your 850W.

7970~350w
6950~300w
7870~250w
system~200w
=
1100w
This post makes me think a bit more, however the 6950 was mining for 2 months 24/7 on an Intel / Asus desktop machine and I did test each card separately. Both the Gigabyte and Asrock boards lock up with that (or any other) card mining. I also said in the OP:
Quote
... I used one GPU at a time. I switched slots.

Regarding power, I tested the power usage of each card at the "wall" with a power meter. I don't quite remember the values, but the 7870 at 900MHz undervolted to 1V uses a bit above 100W, far, far from the 250W you mention. I do remember, however, the power draw reported by GPU-Z from the onboard VRMs: it pulled ~5A at 12V, while the 7970 undervolted to 1.03V pulled ~12.5A. That would mean 60W for the 7870 and 150W for the 7970.
sr. member
Activity: 349
Merit: 250
“Blockchain Just Entered The Real World”
Gigabyte mobos tend to have issues with RAM. Before you do anything extreme, update the BIOS to the latest version and make sure the RAM is stable at factory specs. If it is, then you can start to overclock it. It could also be a faulty stick or socket, just to name a few possible issues.
sr. member
Activity: 252
Merit: 250
I can't get around random lockups and freezes.
.
.
- I ended up modding the BIOS of the 3 video cards, however the rig locks up even if the GPU cores are down at 500 MHz.

From this I understand it's not random freezes; they happen only when the video cards start working,
meaning a bad OC or not enough power.

Remove all cards and test them one by one.
If you can get max hash rate from each of them separately, that means they don't have enough power together.

Or, if you have another power supply, try connecting the 6950+7870 to the other power supply (at least 600W) and the 7970+system to your 850W.

7970~350w
6950~300w
7870~250w
system~200w
=
1100w
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Disable Catalyst Control Center and install MSI Afterburner instead. CCC causes many headaches and has been known to cause the type of problems you are talking about.


If that works 13E5sY63tZutcgPtZbPdppZoTfgt2N2YnT
I've tested with CCC, MSI Afterburner, Sapphire Trixx, Atitray, and with none of that software installed. Thanks.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Actually, I have had situations where a USB device can indeed lock up the whole machine, on both Linux and Windows... I'd be leaning towards that. If you have a gaming wireless adapter (i.e. one that receives wireless and converts it to wired) or a wireless repeater with wired ports on it, you might try one of those to rule out the USB adapters.
I see. That's news to me. I will try without using USB, with an IDE/SATA HD and the ethernet port.