Rebooting 2x Per Day | Bitcointalksearch.org

rwtrader

newbie

Activity: 32

Merit: 0

Finally figured out the problem (I think). I pulled all the cables and one of the pins pulled out of a connector. I swapped that cable and glued the back side of all the connectors. Up for a day so far. One more day and I start tweeting hash rates. Thanks for all your help, and I now have a system config I can post!

https://docs.google.com/document/d/1-j7dvV2r-WhJ1_iG9posVMKZ8WI4B7iyC_SNZWviHzA

rwtrader

newbie

Activity: 32

Merit: 0

I understand, especially about adequately describing the rig. Let me put something together that I can post with each question. Your time is much too valuable to spend running down my problem piece by piece. Give me a day. I really do appreciate your help, and this has been a great community!

cryptocoinfarmer

member

Activity: 154

Merit: 10

Can you answer a few questions to help determine your problem?

1.) What kind of Nvidia cards do you have?
2.) Are the cards overclocked or not?
3.) What riser model do you use? (Or just add a photo.)
4.) How is everything wired together? (A diagram or a photo will do.)
5.) What OS do you use?
6.) Did you try swapping the risers?
7.) Did you try connecting only one card to the PCI-e x16 slot and running the mining process?
8.) Did the problem start as soon as the rig was built, or did the rig work for some time and then suddenly start restarting?

kapipindot

full member

Activity: 168

Merit: 100

Usually a PC will reboot if there is faulty RAM, an overheating processor, an overheating GPU, or a faulty power supply. You could try mining with 3 cards first and then add 1 more every day until you get a reboot. You should always check GPU and CPU temps; maybe your setup, or the place where you keep your rig, has poor ventilation. As for the risers, that is one hell of a troubleshooting job. If possible, you may want to change all of the risers.
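For the temperature check, here is a rough Python sketch, assuming Nvidia cards and that `nvidia-smi` is on the PATH. The 80 C limit and the function names are placeholders of mine, not a recommendation from anyone in this thread.

```python
import csv
import io
import subprocess

# Assumed alert threshold in degrees Celsius; pick your own value.
TEMP_LIMIT_C = 80

# nvidia-smi query for one CSV row per GPU, without header or units.
QUERY = ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu,fan.speed",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        index, name, temp, fan = (f.strip() for f in fields)
        # Fan speed stays a string because some cards do not report a number.
        rows.append({"index": int(index), "name": name,
                     "temp_c": int(temp), "fan": fan})
    return rows


def hot_gpus(rows, limit=TEMP_LIMIT_C):
    """Return the GPUs at or above the temperature limit."""
    return [r for r in rows if r["temp_c"] >= limit]


def read_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)
```

Calling `hot_gpus(read_gpus())` every minute or so while mining, and writing the result to a file, would show whether a card was running hot shortly before a reboot. It says nothing about CPU temperature, which needs a separate tool.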

RedUkulele

newbie

Activity: 18

Merit: 0

Try speeding up your fans more, and not just by a percent or two: set them above 80% and leave them like that for a few hours. Either way, something is not right.

wacko

legendary

Activity: 1106

Merit: 1014

If you want any meaningful advice you'll need to at least properly describe your setup. After all this time you haven't even said what GPUs you run.

There are plenty of reasons for any computer to reboot by itself, and even more so for a mining rig, both hardware and software. Unless it's something very obvious, it's unlikely that someone will figure out what's going on with your rig, simply because you didn't bother to spend a few minutes and actually describe it: cards, risers, PSUs, how exactly it's all connected together, etc. All we know is that it's a tb250btc and you're using 1000 W and 500 W PSUs, one of which is an old Antec. That's like a riddle: you post only a few clues and then wait for someone to solve it by guessing everything else. Except obviously no one here cares whether your rig keeps rebooting or not.

You're the only party interested in solving this, yet you don't even want to bother with providing all the details.

The usual suspects in problems like this are the risers, the power (both how it is supplied and with what PSUs), the GPUs themselves and the system (motherboard/cpu/ram). You start by swapping the risers. If it doesn't help then you check the power, install another PSU, check whether all the connections are alright (so there's no nonsense like 3 risers sitting on a single cable from the PSU etc). Didn't help? Change the board. If it doesn't help either then test all the GPUs one by one. That's how it's done in general, it's just with more experience you're more likely to find the culprit faster, but the process remains more or less the same.

Tidsdilatation

sr. member

Activity: 518

Merit: 250

First of all I would switch to a Linux-based OS. Then I would check heat on ALL components, not only the GPUs. After that I would check the hardware: risers first, then RAM. Reboots could literally be anything.

rwtrader

newbie

Activity: 32

Merit: 0

Yes, it is in between the power strip and the wall. However unlikely, I tried that and still have the problem. I put it back and the frequency of reboots is gradually going down. I am down to 3-4 per day... Thanks

wacko

legendary

Activity: 1106

Merit: 1014

So you changed the risers and then the rig worked for 3 days without reboots. Then you changed the extension and it's rebooting every few hours? What exactly is "extension" — is that the power cable from the PSU to the outlet? Can you put the previous one back and see whether the rig is stable with that one? Just to make sure that it is indeed the cause for these reboots and not just a coincidence?

rwtrader

newbie

Activity: 32

Merit: 0

Absolutely! That's how I got a good 3 days before I manually shut it down.

wacko

legendary

Activity: 1106

Merit: 1014

Quote from: rwtrader on November 20, 2017, 08:41:44 PM

I will try swapping risers as I bring on card by card.

...

Quote from: rwtrader on November 26, 2017, 01:48:03 PM

Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.

Did you try to do what was suggested to you? Changing risers at least?

rwtrader

newbie

Activity: 32

Merit: 0

Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.

wacko

legendary

Activity: 1106

Merit: 1014

Quote from: rwtrader on November 20, 2017, 08:41:44 PM

I will try swapping risers as I bring on card by card.
If I were more comfortable with Linux I would give it a try.
I thought about the ram so I brought it up to 8. No effect...

Looking at your logs, I would say it's more likely to be a hardware problem, so adding more RAM or switching to Linux is not going to help. For now the main suspects are the board (less likely) and the risers (more likely).

rwtrader

newbie

Activity: 32

Merit: 0

I will try swapping risers as I bring on card by card.
If I were more comfortable with Linux I would give it a try.
I thought about the ram so I brought it up to 8. No effect...

cpmcgrat

member

Activity: 223

Merit: 21

DCAB

Quote from: fanatic26 on November 20, 2017, 07:42:19 PM

dump Windows and mine on linux if you want reliability and stability

I mine on both. For my Nvidia cards I prefer Windows, since it is incredibly difficult to overclock/undervolt them on Linux systems.

dagarair

sr. member

Activity: 847

Merit: 383

1 card - 24 hours
no reboot
2 cards 24 hours
etc etc
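The schedule above can be written down as a small Python sketch. The function names and the list of results are hypothetical; only the 24-hour step comes from the post.

```python
def burn_in_plan(card_count, hours_per_step=24):
    """Return one (cards_connected, hours) step per card, adding a card each time."""
    return [(n, hours_per_step) for n in range(1, card_count + 1)]


def first_failing_step(plan, survived):
    """Return the card count at which the first reboot happened, or None.

    survived[i] is True if step i ran its full duration without a reboot.
    """
    for (cards, _hours), ok in zip(plan, survived):
        if not ok:
            return cards
    return None
```

If `first_failing_step` returns 3, for example, the third card added, or its riser and power cable, is the first thing to swap out. A clean run at every step points away from any single card and more towards total power draw or heat.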

fanatic26

hero member

Activity: 756

Merit: 560

dump Windows and mine on linux if you want reliability and stability

cpmcgrat

member

Activity: 223

Merit: 21

DCAB

For me, my rig was rebooting a couple of times per day due to issues with memory sharding/overflow (my RAM was a barebones 4 GB w/ 16 GB swap). The error code reported to the error monitor was 0x116. I was able to solve this and increase stability by upgrading the rig to 16 GB of RAM (enough to hold any DAG files or buffer any I/O from the GPUs without having to use the SSD as swap). After doing this, my machine went from rebooting itself 1-2 times a day to being alive and well for the past 2 weeks straight.

If you're running Windows you can find the event logs at Event Viewer > Windows Logs > System. Below is the error I was seeing that tipped me off.

Quote

The computer has rebooted from a bugcheck. The bugcheck was: 0x00000116 (0xffffe1842ec0b250, 0xfffff802ff76f7d8, 0xffffffffc000009a, 0x0000000000000004). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: b139f2b1-3d17-48a3-a61f-493013377152.
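If you want to compare several of these entries, a minimal Python sketch like the one below pulls the bugcheck code and its four parameters out of the message text. The function name and the regular expression are my own and only assume the message layout shown in the quote. As far as I can tell from Microsoft's bug check reference, 0x116 is VIDEO_TDR_FAILURE (the display driver failed to recover from a timeout), and the low 32 bits of the third parameter here, 0xC000009A, are the NTSTATUS code STATUS_INSUFFICIENT_RESOURCES.

```python
import re

# Matches "The bugcheck was: 0x... (p1, p2, p3, p4)" as in the quoted event.
BUGCHECK_RE = re.compile(
    r"The bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(message):
    """Return (code, [params...]) as integers, or None if no bugcheck is found."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params
```

Running it over each reboot entry shows quickly whether the rig dies with the same code and status every time or with something different on each crash.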

dawidt

jr. member

Activity: 54

Merit: 10

check risers

wacko

legendary

Activity: 1106

Merit: 1014

Quote from: rwtrader on November 20, 2017, 04:59:13 PM

Win says
A connected hardware error has occurred
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function 0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400

It's hard to decode these, might be the motherboard, but more likely problems with the risers (one or more might be faulty). Try changing them if you have spares.
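One thing that can still be read from such an event is which root port reported it. Vendor ID 0x8086 is Intel, so the report comes from a root port on the board's chipset rather than from a GPU. A minimal Python sketch, assuming the event text looks like the quote above (the function names are hypothetical):

```python
import re
from collections import Counter

BDF_RE = re.compile(
    r"Bus Device Function\s+(0x[0-9A-Fa-f]+):(0x[0-9A-Fa-f]+):(0x[0-9A-Fa-f]+)")
IDS_RE = re.compile(
    r"VendorID:DeviceID:\s*(0x[0-9A-Fa-f]+):(0x[0-9A-Fa-f]+)")


def parse_aer_event(text):
    """Extract the reporting port and its IDs from one event, or return None."""
    bdf = BDF_RE.search(text)
    ids = IDS_RE.search(text)
    if bdf is None or ids is None:
        return None
    bus, device, function = (int(x, 16) for x in bdf.groups())
    vendor_id, device_id = (int(x, 16) for x in ids.groups())
    return {"bdf": f"{bus:02x}:{device:02x}.{function}",
            "vendor_id": vendor_id,
            "device_id": device_id}


def count_by_port(events):
    """Count how many parsed events each root port produced."""
    return Counter(e["bdf"] for e in events if e is not None)
```

If the same port shows up in every event, the riser and card on the slot wired to that port are the first ones to swap. Which physical slot that is depends on the board and cannot be told from the log alone.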

Topic: Rebooting 2x Per Day (Read 728 times)