Author

Topic: Random lockups/freezes on mining rig. 0.1 BTC bounty for a fix (Read 3288 times)

full member
Activity: 213
Merit: 100
Dumb as this may sound... put a powered USB hub between the main board and the WiFi dongles. Make sure it can actually supply 500mA per USB port. I've been looking into making a 'solo wifi miner' board and keep finding that the cheap WiFi dongles tend to use more power than they claim. If you have mounting brackets for the 'front panel USB' connectors, you could try using them instead, with one dongle per USB channel. I had a similar lockup due to power when I started with Erupters and had the power bar powering the adapter turn off (PC to the wall). Sometimes the working solution is the connections. Then again, I've also had the power adapters for my Radeon HD 4850 melt one of the 12V lines and still look OK in spite of the lack of connection.
sr. member
Activity: 420
Merit: 250
200 watts for baseline system parts? lol

My celeron G540, msi z77a-g45, 4gb ram, hard drive, and usb keyboard + mouse uses about 25 watts at the wall. No mining machine will use 200 watts at idle baseline with no GPU, unless you have like 20 fans, 8 core cpu, 32gb ram, and 8 hard drives, and even then, that's maybe 100-150 watts.
I actually don't contest his 200w baseline figure as a safety precaution. One may be interested in CPU coin mining too, or make genuine mistakes like not undervolting cards.

Quote from: Slander
I have been down this road. Read carefully and do NOT discount my input.
Sorry for being unclear. I do not discount anyone's input and will test ALL suggestions as far as I'm able to in this thread, in a week or so. However I'm counter-arguing while I'm able to post.

Merry Christmas everyone, and thanks.

My Baseline at the wall is around 130 watts
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Thanks for everyone's help. Nothing effectively solved the problem.

At the very least, both the Asrock MB and the 7870 videocard are faulty. The store is sending the MB to RMA (it locks up running prime95 with both my Phenom CPU and theirs) and I'll now get an ASUS instead. Sending the 7870 card to RMA is a pain, though. I _may_ be able to stabilize it at higher voltage and use it for gaming or sell it at a discount.
sr. member
Activity: 252
Merit: 250
Sentinel
Hm, assuming we're talking brand new installed Win7...
Beware that on default settings after installation, Win7 schedules energy saving/hibernation after 30 Minutes of no user activity. That needs to be manually disabled.

Also, only use WHQL/Release Drivers for your cards, AMD's Beta Drivers often focus only on specific game improvements, while sometimes having less than stellar stability for other applications.

Other than that, I've seen cases where plugging the System simply into another (not co-located) Power Plug, ideally running off another fuse in your house electrics, actually helped.
While other, typical devices may work without issues, such a System with relatively high PSU load and requirement for binary accuracy may not take even smaller power surges (which are normal) without effect.
PSUs vary in quality concerning surge tolerance and the time their internal capacitors can stabilize surging power input.

I remember running a lot of rigs (long before the bitcoin era) off several Multi-Plugs connected to different power plugs w/ extension cords - that, I thought, was a good load distribution.
One day however I discovered that a lot of these plugs I used in different rooms were actually fed by one power line going to a single 16A fuse in my house, which simply was under very high load @ full operations.
Powering up the rigs too fast in a row, or throwing the switch on one of the Multi-Plugs to ON with the PSU power switches all set to ON, often caused a pretty good initial peak load, with the expected results ;)
Powering some on or off within that setup also caused other systems to lock up sometimes.

The mystery was only solved when that single fuse tripped and I realized the above. After changing the electrical distribution by feeding half the computer fleet through their own line with its own fuse, these problems vanished (after months of searching for the otherwise inexplicable cause).

It's also possible in theory that other high-consumption devices are kicking in/out and causing a short general power surge in your house (enough to freak out a loaded computer PSU), if its electrical wiring isn't well done. Large refrigerators or electrical stoves are known to do that, and 2000W+ blow-dryers or heaters could also easily do it depending on which electrical wires they operate on; if they share the same line/fuse that your rig is running on, that's a good possible reason for issues.
If you still have classic light bulbs in your house, seeing them change intensity periodically or at random is often a good indicator of such issues.
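For anyone who wants to sanity-check a shared line like the one described above, here is a rough sketch of the arithmetic. It assumes 230 V mains (the post only gives the 16 A fuse rating), and the example loads are made-up placeholders, not measurements from this thread. It only covers steady-state load; the power-on peaks mentioned above are not modelled.

```python
# Rough check of how much of a fused circuit's nominal capacity a set of loads uses.
# ASSUMPTION: 230 V mains; the post only states a 16 A fuse.

def circuit_capacity_w(fuse_amps, mains_volts=230.0):
    """Nominal power the circuit can carry at the fuse's rated current."""
    return fuse_amps * mains_volts


def load_fraction(loads_w, fuse_amps, mains_volts=230.0):
    """Fraction of nominal capacity used by the loads (1.0 means at the limit)."""
    return sum(loads_w) / circuit_capacity_w(fuse_amps, mains_volts)


if __name__ == "__main__":
    # Hypothetical loads in watts: three rigs plus a 2000 W heater on the same line.
    loads = [600, 600, 600, 2000]
    print(f"capacity: {circuit_capacity_w(16):.0f} W")
    print(f"load: {sum(loads)} W, {load_fraction(loads, 16):.0%} of capacity")
```

If the fraction comes out near or above 1.0, moving some machines to a line with its own fuse, as described above, is the obvious thing to try.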
legendary
Activity: 2702
Merit: 1468
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

here is my btc address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
Thanks for the troubleshooting tip, however that is not a solution because this is not a videocard lockup, it's at CPU or chipset level. I've run with and without CGWatcher. It also happened when an Nvidia 6200TC was providing video, or with just the TeamViewer virtual display driver. The keyboard LEDs don't toggle and the wireless connections drop from my router.

wipeout,

are you using bfgminer/cgminer ?  If so, disable ADL functionality.  Best is to recompile without ADL support and set your clocks, fans with MSI AB at the machine and use Remote Connection to start cgminer.
If you start cgminer from RDC, ADL will not be accessible so cgminer won't be able to load/use adl libraries.

You are crashing AMD drivers (freezes), most likely because of bfg/cgminer ADL code.

Or try reaper miner.  

legendary
Activity: 1260
Merit: 1000
World Class Cryptonaire
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

here is my btc address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
Thanks for the troubleshooting tip, however that is not a solution because this is not a videocard lockup, it's at CPU or chipset level. I've run with and without CGWatcher. It also happened when an Nvidia 6200TC was providing video, or with just the TeamViewer virtual display driver. The keyboard LEDs don't toggle and the wireless connections drop from my router.

Yep, I had the same problem... feel free to do all the other troubleshooting you wish, but in the end give my suggestion a try by mining a single coin for 24 hours and see if there are any crashes. Good luck :)
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

here is my btc address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
Thanks for the troubleshooting tip, however that is not a solution because this is not a videocard lockup, it's at CPU or chipset level. I've run with and without CGWatcher. It also happened when an Nvidia 6200TC was providing video, or with just the TeamViewer virtual display driver. The keyboard LEDs don't toggle and the wireless connections drop from my router.
legendary
Activity: 1260
Merit: 1000
World Class Cryptonaire
What pool do you mine on? If your pool switches coins, that easily locks up cards. The fix? Install CGWatcher and have it restart mining when cards go sick or cgminer freezes etc.

If you mine on a multicoin pool try mining only litecoin (for example) for 24 hours and see if you get a crash....if not then just use the fix I listed above.

here is my btc address if my solution works for you :)

1KnyGG1sySxmCGAD2AAukCWPT8T1o22rhK
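For readers unfamiliar with the restart-on-freeze idea above, here is a minimal sketch of a watchdog loop. It is not how CGWatcher itself is implemented; `progress_time` is a hypothetical callable you would have to supply (for example one returning the modification time of the miner's log file), and the timeouts are arbitrary. Note that it can only help when the miner process freezes or exits, not when the whole machine locks up.

```python
import subprocess
import time


def is_stalled(last_progress, now, stall_timeout_s=300.0):
    """True when no progress has been recorded for longer than the timeout."""
    return now - last_progress > stall_timeout_s


def supervise(command, progress_time, stall_timeout_s=300.0, poll_s=10.0, max_restarts=3):
    """Run `command`, restarting it whenever it exits or stops making progress.

    `progress_time` is a hypothetical callable returning the Unix time of the
    miner's last sign of life. Returns the number of restarts performed once
    `max_restarts` has been used up.
    """
    restarts = 0
    started = time.time()
    proc = subprocess.Popen(command)
    while True:
        time.sleep(poll_s)
        exited = proc.poll() is not None
        # Ignore progress timestamps that predate the current launch.
        last = max(progress_time(), started)
        if not exited and not is_stalled(last, time.time(), stall_timeout_s):
            continue
        if not exited:
            proc.kill()  # frozen process: terminate it before relaunching
            proc.wait()
        if restarts == max_restarts:
            return restarts
        restarts += 1
        started = time.time()
        proc = subprocess.Popen(command)
```

The restart cap is there so that a miner which dies immediately on every launch does not loop forever; the command line and the progress source are placeholders to adapt to your own setup.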
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Part of the hardware was returned to the store (Asrock 970, CPU FX-6100 and RAM). In a week they'll report.

I've read all suggestions and I'm in the stage of trying this again. The 2nd base system is now solid (knock on wood) with a Sempron 190 and a single 7970 @ 1.06V. I will add another card tomorrow, either the 6950 or the 7870, dunno yet.

That said, I suspect that it's a combination of 2 or more problems (including what some mentioned): the PSU is not sufficient for these 3 cards + the FX-6100, the FX-6100 has problems with the chipset, the cards don't play well together, and the 7870 has crappy voltage regulators (I can't undervolt it much or, even worse, it stresses the PSU).
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Bump!

I'm now using a Sempron 190 dual-core instead of the FX-6100. We shall see in a few days but I suspect we have a winner :)
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
200 watts for baseline system parts? lol

My celeron G540, msi z77a-g45, 4gb ram, hard drive, and usb keyboard + mouse uses about 25 watts at the wall. No mining machine will use 200 watts at idle baseline with no GPU, unless you have like 20 fans, 8 core cpu, 32gb ram, and 8 hard drives, and even then, that's maybe 100-150 watts.
I actually don't contest his 200w baseline figure as a safety precaution. One may be interested in CPU coin mining too, or make genuine mistakes like not undervolting cards.

Quote from: Slander
I have been down this road. Read carefully and do NOT discount my input.
Sorry for being unclear. I do not discount anyone's input and will test ALL suggestions as far as I'm able to in this thread, in a week or so. However I'm counter-arguing while I'm able to post.

Merry Christmas everyone, and thanks.
full member
Activity: 171
Merit: 100
I have been down this road. Read carefully and do NOT discount my input.

Does your rig run ok with ONE card installed?  I bet it does (try it)

I have found that different cards do not play well together. I can get 7870 and 7970 to work just fine, throw another different card in the mix and the whole thing craps out.

IF you cannot get ONE card to work then you have a serious fault somewhere else. Could be many places, ram, mobo, hd, etc. Might be as simple as a faulty cable, but which one?

Suggest you get ONE card working and mining, pick one, and get it running for 24 hours. This way you will know your rig is sound at least.
legendary
Activity: 1344
Merit: 1004
200 watts for baseline system parts? lol

My celeron G540, msi z77a-g45, 4gb ram, hard drive, and usb keyboard + mouse uses about 25 watts at the wall. No mining machine will use 200 watts at idle baseline with no GPU, unless you have like 20 fans, 8 core cpu, 32gb ram, and 8 hard drives, and even then, that's maybe 100-150 watts.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Gigabyte mobos tend to have issues with RAM. Before you do anything extreme, update the BIOS to the latest version and make sure the RAM is stable at factory specs. If it is, then you can start to overclock it. It could also be a faulty stick or socket, just to name a few possible issues.
The BIOS is updated to the latest version, and so is the one on the Asrock motherboard (there I also tried experimental BIOSes and downgrades), and except for the GPUs (for obvious hash and efficiency reasons) I don't want to use out-of-spec settings. That's precisely the problem: it doesn't work at stock. Nor does it work underclocked, with more voltage, or with both, on the CPU, HT, NB, PCI-e PLL and whatever else I can't remember. It doesn't work with the RAM underclocked, overvolted and at 11-11-11-36 timings, with either the Gskill Ripjaws in dual channel or the Kingston Value RAM in single. Whatever combination of BIOS settings I had the peace of mind to try, "Auto" or manually tweaked, does not solve the problem. No combination of parts (and I have redundant everything except the CPU, as one can see in the OP) solves the lockup.

At this stage, I'm not even sure if some settings make the rig hold on longer than others...

I have however 2 hints of instability. First, the GPU-Z readings on the 7870 show temperature spikes on the VRMs: sitting at 37ºC and then spiking to 74ºC. That's double, and it could be a reading error. Second, there is an "Aux" reading on the Asrock MB, measured by SpeedFan. There is no such issue on the Gigabyte, but on the Asrock it was 125ºC!
donator
Activity: 1218
Merit: 1015
I can't get around random lockups and freezes.
.
.
- I ended up modding the BIOS of the 3 videocards, however the rig locks up even if the GPU cores are down at 500 Mhz.

from this i understand its not random freezes but happen only when videocards start working
meaning bad oc or not enough power

remove all cards and test them one by one,
if you can get max hash rate from each of them separately, that means they don't have enough power when running together

or if you have another power supply try connecting 6950+7870 to other power supply (at least 600w) and 7970+system to your 850w

7970~350w
6950~300w
7870~250w
system~200w
=
1100w
This post makes me think a bit more, however the 6950 was mining for 2 months 24/7 on an Intel / Asus desktop machine and I did test each card separately. Both the Gigabyte and Asrock boards lockup with that (or any other )card mining. I also said in the OP:
Quote
... I used one GPU at a time. I switched slots.

Regarding power, I tested the power usage of each card at the "wall" with a power meter. I don't quite remember the values, but the 7870 at 900Mhz undervolted to 1v, uses a bit above 100w, far far from the 250w you mention. I remember however the power draw measured by Gpu-Z from the VRMs onboard and it pulled ~5A at 12V, while the 7970 undervolted to 1.03V pulled ~12.5A. That would mean 60w on the 7870 and 150w for the 7970.
I'm not completely sure, but I think the 20(+4) pins on MoBo operate on a different voltage, with the PCI-e slot providing 75-125W to cards (don't quite remember). Using just the 12V line wouldn't be accurate (I think...!). 850W seems really, really low. I have rigs with just 2 270s and minimal hardware pulling right around 500W at the wall.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
I can't get around random lockups and freezes.
.
.
- I ended up modding the BIOS of the 3 videocards, however the rig locks up even if the GPU cores are down at 500 Mhz.

from this i understand its not random freezes but happen only when videocards start working
meaning bad oc or not enough power

remove all cards and test them one by one,
if you can get max hash rate from each of them separately, that means they don't have enough power when running together

or if you have another power supply try connecting 6950+7870 to other power supply (at least 600w) and 7970+system to your 850w

7970~350w
6950~300w
7870~250w
system~200w
=
1100w
This post makes me think a bit more, however the 6950 was mining for 2 months 24/7 on an Intel / Asus desktop machine and I did test each card separately. Both the Gigabyte and Asrock boards lockup with that (or any other )card mining. I also said in the OP:
Quote
... I used one GPU at a time. I switched slots.

Regarding power, I tested the power usage of each card at the "wall" with a power meter. I don't quite remember the values, but the 7870 at 900Mhz undervolted to 1v, uses a bit above 100w, far far from the 250w you mention. I remember however the power draw measured by Gpu-Z from the VRMs onboard and it pulled ~5A at 12V, while the 7970 undervolted to 1.03V pulled ~12.5A. That would mean 60w on the 7870 and 150w for the 7970.
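The 60w and 150w figures above are just volts times amps on the 12V rail. A small sketch of that arithmetic follows. The amp readings are the GPU-Z values quoted above; the 85% PSU efficiency used for the at-the-wall estimate is purely an assumption for illustration, since no efficiency figure was given in this thread.

```python
# DC power drawn by a card, from the 12 V current reported by GPU-Z.

def watts(volts, amps):
    """Electrical power in watts: P = V * I."""
    return volts * amps


def wall_estimate_w(dc_watts, psu_efficiency=0.85):
    """Approximate draw at the wall for a given DC load.

    ASSUMPTION: 85% PSU efficiency is a placeholder, not a measured value.
    """
    return dc_watts / psu_efficiency


# Amps at 12 V as quoted in the post (undervolted cards).
VRM_INPUT_AMPS = {"7870": 5.0, "7970": 12.5}

if __name__ == "__main__":
    for card, amps in VRM_INPUT_AMPS.items():
        dc = watts(12.0, amps)
        print(f"{card}: {dc:.0f} W DC, roughly {wall_estimate_w(dc):.0f} W at the wall")
```

These readings only cover what the VRM sensors report, so the wall meter is still the more trustworthy number when sizing a PSU.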
sr. member
Activity: 349
Merit: 250
“Blockchain Just Entered The Real World”
Gigabyte mobos tend to have issues with RAM. Before you do anything extreme, update the BIOS to the latest version and make sure the RAM is stable at factory specs. If it is, then you can start to overclock it. It could also be a faulty stick or socket, just to name a few possible issues.
sr. member
Activity: 252
Merit: 250
I can't get around random lockups and freezes.
.
.
- I ended up modding the BIOS of the 3 videocards, however the rig locks up even if the GPU cores are down at 500 Mhz.

from this i understand its not random freezes but happen only when videocards start working
meaning bad oc or not enough power

remove all cards and test them one by one,
if you can get max hash rate from each of them separately, that means they don't have enough power when running together

or if you have another power supply try connecting 6950+7870 to other power supply (at least 600w) and 7970+system to your 850w

7970~350w
6950~300w
7870~250w
system~200w
=
1100w
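The budget above can be written out as a quick script. This is only a sketch using the rough per-part estimates from this post (not measured values), checked against the 850w PSU and the suggested split with a second 600w unit.

```python
# Estimated full-load draw in watts, copied from the figures in the post.
ESTIMATED_DRAW_W = {"7970": 350, "6950": 300, "7870": 250, "system": 200}


def total_draw(parts, names=None):
    """Sum the estimated draw of the named parts (all parts if names is None)."""
    names = parts.keys() if names is None else names
    return sum(parts[name] for name in names)


def headroom(psu_watts, parts, names=None):
    """Watts left on the PSU after the named parts; negative means overloaded."""
    return psu_watts - total_draw(parts, names)


if __name__ == "__main__":
    print("everything on the 850 W PSU:", headroom(850, ESTIMATED_DRAW_W), "W headroom")
    # The suggested split: 6950 + 7870 on a second PSU, 7970 + system on the 850 W.
    print("second 600 W PSU:", headroom(600, ESTIMATED_DRAW_W, ["6950", "7870"]), "W headroom")
    print("850 W PSU:", headroom(850, ESTIMATED_DRAW_W, ["7970", "system"]), "W headroom")
```

Swapping in figures measured at the wall is the obvious next step, since stock estimates like these can be far above what undervolted cards actually pull.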
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
disable catalyst control center and install MSI afterburner instead.  CCC causes many headaches and has been known to cause the type of problems you are talking about.


If that works 13E5sY63tZutcgPtZbPdppZoTfgt2N2YnT
I've tested with CCC, MSI Afterburner, Sapphire Trixx, ATITray, and with no such software installed. Thanks.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Actually, I have had situations where a usb device can indeed lock up the whole machine, on both linux and windows...  I'd be leaning towards that.  If you have a gaming wireless adapter (i.e. receive wireless and convert to wired) or a wireless repeater with wired ports on it, you might try one of those to rule out the USB adapters.
I see. That's news to me. I will try without using USB, with an IDE/SATA HD and the ethernet port.
sr. member
Activity: 246
Merit: 250
Team Heritage Motorsports
disable catalyst control center and install MSI afterburner instead.  CCC causes many headaches and has been known to cause the type of problems you are talking about.


If that works 13E5sY63tZutcgPtZbPdppZoTfgt2N2YnT
sr. member
Activity: 280
Merit: 250
Helperizer
Actually, I have had situations where a usb device can indeed lock up the whole machine, on both linux and windows...  I'd be leaning towards that.  If you have a gaming wireless adapter (i.e. receive wireless and convert to wired) or a wireless repeater with wired ports on it, you might try one of those to rule out the USB adapters.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Well, a Google search shows that people had/have issues with both of them, although most people having issues with the Ralink 2870 chipset were using Linux.

When you say lockup/freeze, is it the entire machine or just the mining software?
Thanks man, but I guess a USB adapter is not able to freeze a machine like this. What's more, I had them working fine on 2 Intel / Asus systems (Desktop and Laptop).

It is the entire machine. I need to power cycle it to bring it back (with Always On after power failure set in the BIOS). However, now that you mention it, I'm not sure if it locks up when it isn't mining. When I come back I will leave it powered on for 24h without doing anything. That can help in troubleshooting, but obviously defeats the purpose.

To make matters worse, the store where I purchased the components has a strict policy. They are *this* close to losing my further investment.
newbie
Activity: 27
Merit: 0
Well, a Google search shows that people had/have issues with both of them, although most people having issues with the Ralink 2870 chipset were using Linux.

When you say lockup/freeze, is it the entire machine or just the mining software?
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Try updating your ethernet drivers, my bfgminer stops working each time the network screws up
I'm not using ethernet, but 2 wifi usb adapters. One is a generic dongle with Ralink 2870 chipset. The other is an Alfa AWUS036H
newbie
Activity: 27
Merit: 0
Try updating your ethernet drivers, my bfgminer stops working each time the network screws up
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Maybe you have a bad GPU. Test each one individually.
I already did that too.
sr. member
Activity: 490
Merit: 251
Maybe you have a bad GPU. Test each one individually.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
Drop the CPU to 2 cores. I read that the 990 needs further tweaking due to the wider range of unlockable CPUs.
Thanks, I had tried it already. I also tried one core per unit, i.e. triple. They are also downclocked to 800 Mhz. With dual I managed 2 hours of continuous operation and then a lockup.

Can you suggest the cheapest (even used) CPU that works with these motherboards?

That said, one should not be forced to tweak the default BIOS settings, or any setting, for stability. Slow or lacking features, sure, but not unstable.
member
Activity: 146
Merit: 10
Drop the CPU to 2 cores. I read that the 990 needs further tweaking due to the wider range of unlockable CPUs.
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
nobody ever does after they get their solution! :P
Be certain that I will keep my word and give away the 0.1 BTC if a solution is found. That's nothing compared with how wasteful this has been and the peace of mind I lost.

Edit: Another lockup just happened... >:(
sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
So, I have a Gigabyte 990FXA-UD3, an AMD FX-6100 CPU, an XFX 850W PSU and 1x4 GB Kingston Value RAM, currently mining with a 7970, a 6950 and a 7870.

I can't get around random lockups and freezes.

At least this was attempted:

- I swapped the PSU with a Corsair 650 (using just 1 card), the RAM with 2x2GB Gskill Ripjaws and the motherboard with an Asrock 970 Extreme4.
- I've disabled every single feature on the motherboard not useful for mining: Sata, firewire, sound, power saving states, turbo core and others. Likewise on Windows.
- Underclocked down to half the speed and/or overvolted by 5 to 10% every component in the BIOS. I relaxed timings on RAM and ran passes of memtest86 without errors.
- I ended up modding the BIOS of the 3 videocards, however the rig locks up even if the GPU cores are down at 500 Mhz.
- I removed the hard drive and I'm booting from a USB pen. I tested Win7 32 and 64 bit. I used one GPU at a time and switched slots.
- I've read threads online of people complaining about lockups and freezes on this platform.
- It gives me the impression that the rig locks up when left on its own. I don't remember it locking up even once while I was interacting with it.
- I updated the bios of the motherboards and even used a suggested bios by someone from Asrock.
- With the autoruns.exe utility, I disabled unneeded device drivers. I tested with and without MB drivers installed.
- I've read plenty of threads. For example http://forums.tweaktown.com/asrock/50970-asrock-970-extreme4-fx-8350-unstable-stock-settings-8.html and even http://www.overclock.net/t/1140459/bulldozer-overclocking-guide-performance-scaling-charts-max-ocs-ln2-results-coming
- Temperatures are fine and ventilation is good. GPUs below 60ºC and everything else below 40ºC, even down to 10. I've checked all temperature sensors available via software. Fans are working fine. The CPU cooler has Arctic MX-4 thermal paste applied.

I'm pretty sure that I experimented with more stuff than this. The point is that I need to do something substantially different to fix this crap.
Could someone advise on how to solve this? Thanks.


Edit: I will leave for a few days to be with my family. Please keep replying to this thread, even if I cannot post from there.

Edit2: I can send 0.1 BTC to an escrow if you like. However the machine has to be stable mining 24/7 for at least a week