
Topic: Rebooting 2x Per Day (Read 713 times)

newbie
Activity: 32
Merit: 0
November 30, 2017, 06:35:28 PM
#29
Finally figured out problem (I think). Pulled all cables and one of the pins pulled out of a connector. Traded cable and glued back side of all connectors. Up for a day so far. One more day and I start tweeting hash rates. Thanks for all your help and I now have a system config I can post!

https://docs.google.com/document/d/1-j7dvV2r-WhJ1_iG9posVMKZ8WI4B7iyC_SNZWviHzA
newbie
Activity: 32
Merit: 0
November 27, 2017, 02:15:03 PM
#28
I understand, especially about adequately describing rig. Let me put something together I can post with each question. Your time is much too valuable to be running down my problem piece by piece. Give me a day. I really do appreciate your help and this has been a great community!
member
Activity: 154
Merit: 10
November 27, 2017, 11:10:52 AM
#27
Can you answer a few questions to help determine your problem?

1.) What kind of Nvidia cards are they?
2.) Are the cards overclocked or not?
3.) Specify the riser model or just add a photo.
4.) Provide a wiring diagram showing how everything is connected, or add a photo.
5.) What OS do you use?
6.) Did you try swapping the risers?
7.) Did you try connecting only one card to the PCI-e x16 slot and running the mining process?
8.) Did the problem start as soon as the rig was built, or did the rig work for some time and then suddenly start restarting?
full member
Activity: 168
Merit: 100
November 27, 2017, 11:06:46 AM
#26
Usually a PC will reboot if there is faulty RAM, an overheating processor, an overheating GPU, or a faulty power supply. You may try to mine with 3 cards first and then add one more every day until a reboot occurs. You should always check GPU and CPU temps; maybe your setup or the place where you keep your rig has poor ventilation. As for the risers, that is one hell of a troubleshooting job. If possible, change all of the risers.
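For the temperature check on Nvidia cards, here is a minimal Python sketch that reads `nvidia-smi` in CSV mode and flags hot cards. The 80 C limit and the sample readings are made-up assumptions for illustration, not values from this thread; pick a limit that suits your cards.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
QUERY = "index,temperature.gpu,fan.speed,power.draw"

def _num(field):
    """Convert a CSV field to float; return None for values like [N/A]."""
    try:
        return float(field)
    except ValueError:
        return None

def parse_gpu_csv(output):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in output.strip().splitlines():
        idx, temp, fan, power = [f.strip() for f in line.split(",")]
        rows.append({"index": int(idx), "temp_c": _num(temp),
                     "fan_pct": _num(fan), "power_w": _num(power)})
    return rows

def read_gpus():
    """Run nvidia-smi once and return the parsed rows (needs the Nvidia driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)

def hot_gpus(rows, limit_c=80.0):
    """Return the indexes of GPUs at or above limit_c (assumed threshold)."""
    return [r["index"] for r in rows
            if r["temp_c"] is not None and r["temp_c"] >= limit_c]

# Hypothetical sample output, used here instead of a live nvidia-smi call.
sample = "0, 67, 55, 118.40\n1, 83, 90, 121.02\n2, 64, [N/A], 115.77\n"
sample_rows = parse_gpu_csv(sample)
print(hot_gpus(sample_rows))
```

On the rig itself you would call `read_gpus()` every minute or so and write the rows to a file, so the last readings before a reboot are still there afterwards.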
newbie
Activity: 18
Merit: 0
November 27, 2017, 10:52:05 AM
#25
Try running your fans faster, not just a percent or two higher but above 80%, and leave them like that for a few hours. Still, something is not right.
legendary
Activity: 1106
Merit: 1014
November 27, 2017, 10:43:35 AM
#24
If you want any meaningful advice you'll need to at least properly describe your setup. After all this time you haven't even said what GPUs you run. There are plenty of reasons for any computer to reboot by itself, and even more so for a mining rig. Both hardware and software reasons. Unless it's something very obvious, it's unlikely that someone will figure out what's going on with your rig simply because you didn't bother to spend a few minutes and actually describe it. Cards, risers, PSUs, how exactly it's all connected together etc. All we know is that it's tb250btc and you're using 1000w and 500w PSUs, one of which is an old Antec. That's like a riddle: you posted only a few clues and then you wait till someone solves it by guessing everything else. Except obviously no one here cares whether your rig will keep rebooting or not. You're the only party interested in solving this, yet you don't even want to bother with providing all the details.

The usual suspects in problems like this are the risers, the power (both how it is supplied and with what PSUs), the GPUs themselves and the system (motherboard/cpu/ram). You start by swapping the risers. If it doesn't help then you check the power, install another PSU, check whether all the connections are alright (so there's no nonsense like 3 risers sitting on a single cable from the PSU etc). Didn't help? Change the board. If it doesn't help either then test all the GPUs one by one. That's how it's done in general, it's just with more experience you're more likely to find the culprit faster, but the process remains more or less the same.
sr. member
Activity: 518
Merit: 250
November 27, 2017, 10:40:51 AM
#23
First of all I would switch to a Linux-based OS. Then I would check heat on ALL components, not only GPUs. After that I would check hardware: risers first, then RAM. Reboots could literally be anything.
newbie
Activity: 32
Merit: 0
November 27, 2017, 10:09:46 AM
#22
Yes, it is in between power strip and wall. However unlikely, tried that and still have the problem. Put it back and the frequency of reboots is gradually going down. I am down to 3-4 per day... Thanks
legendary
Activity: 1106
Merit: 1014
November 27, 2017, 09:44:47 AM
#21
So you changed the risers and then the rig worked for 3 days without reboots. Then you changed the extension and it's rebooting every few hours? What exactly is "extension" — is that the power cable from the PSU to the outlet? Can you put the previous one back and see whether the rig is stable with that one? Just to make sure that it is indeed the cause for these reboots and not just a coincidence?
newbie
Activity: 32
Merit: 0
November 27, 2017, 09:28:30 AM
#20
Absolutely! That's how I got a good 3 days before I manually shut it down.
legendary
Activity: 1106
Merit: 1014
November 26, 2017, 01:52:21 PM
#19
Quote
I will try swapping risers as I bring on card by card.

...

Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.

Did you try to do what was suggested to you? Changing risers at least?
newbie
Activity: 32
Merit: 0
November 26, 2017, 01:48:03 PM
#18
Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.
legendary
Activity: 1106
Merit: 1014
November 20, 2017, 08:58:25 PM
#17
Quote
I will try swapping risers as I bring on card by card.
If I were more comfortable with Linux I would give it a try.
I thought about the ram so I brought it up to 8.  No effect...

Looking at your logs, I would say it's more likely to be a hardware problem, so adding more RAM or switching to Linux is not going to help. For now the main suspects are the board (less likely) and the risers (more likely).
newbie
Activity: 32
Merit: 0
November 20, 2017, 08:41:44 PM
#16
I will try swapping risers as I bring on card by card.
If I were more comfortable with Linux I would give it a try.
I thought about the ram so I brought it up to 8.  No effect...
member
Activity: 223
Merit: 21
DCAB
November 20, 2017, 07:55:18 PM
#15
Quote
dump Windows and mine on linux if you want reliability and stability

I mine on both, for my nvidia cards I prefer windows since it is incredibly difficult to overclock/undervolt them on linux systems.
sr. member
Activity: 847
Merit: 383
November 20, 2017, 07:47:14 PM
#14
1 card - 24 hours
no reboot
2 cards 24 hours
etc etc
hero member
Activity: 756
Merit: 560
November 20, 2017, 07:42:19 PM
#13
dump Windows and mine on linux if you want reliability and stability
member
Activity: 223
Merit: 21
DCAB
November 20, 2017, 07:37:16 PM
#12
For me, my rig was rebooting a couple of times per day due to issues with memory sharding/overflow (my RAM was a barebones 4 GB with 16 GB of swap). The error code reported to the error monitor was 0x116. I was able to solve this and increase stability by upgrading the rig to 16 GB of RAM (enough to hold any DAG files or buffer any I/O from the GPUs without having to use the SSD as swap). After doing this, my machine went from rebooting itself 1-2 times a day to being alive and well for the past 2 weeks straight.

If you're running Windows you can find the event logs at Event Viewer > Windows Logs > System. Below is the error I was seeing that tipped me off.

Quote
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000116 (0xffffe1842ec0b250, 0xfffff802ff76f7d8, 0xffffffffc000009a, 0x0000000000000004). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: b139f2b1-3d17-48a3-a61f-493013377152.
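
If you copy that message text out of Event Viewer, a short Python sketch like the one below can pull out the bugcheck code and its four parameters so you can compare them from one reboot to the next. The regular expression is only an assumption based on the wording quoted above, and `parse_bugcheck` is a made-up helper name. Note that Microsoft's bug check reference lists 0x116 as VIDEO_TDR_FAILURE, a display driver timeout.

```python
import re

# Matches the "The bugcheck was: CODE (P1, P2, P3, P4)" part of the message.
BUGCHECK_RE = re.compile(r"The bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")

def parse_bugcheck(message):
    """Return (code, [params]) as ints, or None if the text does not match."""
    m = BUGCHECK_RE.search(message)
    if m is None:
        return None
    code = int(m.group(1), 16)
    params = [int(p.strip(), 16) for p in m.group(2).split(",") if p.strip()]
    return code, params

# The message text from the quote above.
sample = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000116 "
    "(0xffffe1842ec0b250, 0xfffff802ff76f7d8, 0xffffffffc000009a, "
    "0x0000000000000004). A dump was saved in: C:\\Windows\\MEMORY.DMP."
)
code, params = parse_bugcheck(sample)
print(hex(code), [hex(p) for p in params])
```

If every reboot shows the same code, that narrows things down a lot more than "it rebooted again".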
jr. member
Activity: 54
Merit: 10
November 20, 2017, 05:56:31 PM
#11
check risers
legendary
Activity: 1106
Merit: 1014
November 20, 2017, 05:07:02 PM
#10
Quote
Win says
A connected hardware error has occurred
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function  0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400

It's hard to decode these, might be the motherboard, but more likely problems with the risers (one or more might be faulty). Try changing them if you have spares.
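
One thing you can still get out of these entries is whether the same root port keeps showing up. Here is a rough Python sketch that extracts the bus/device/function and vendor/device IDs from the pasted text. `parse_pcie_error` is a made-up helper name and the patterns are assumptions based only on the lines quoted above. Vendor ID 0x8086 is Intel, so this entry points at a root port on the chipset rather than at a card. Which physical slot that port feeds depends on the board and is not something the log tells you.

```python
import re

HEX = r"(0x[0-9a-fA-F]+)"

# Accepts "Bus Device Function  0x0:0x1C:0x6" and "Bus:Device:Function: ...".
BDF_RE = re.compile(
    r"Bus\s*:?\s*Device\s*:?\s*Function\s*:?\s*" + HEX + ":" + HEX + ":" + HEX)
IDS_RE = re.compile(r"VendorID:DeviceID:\s*" + HEX + ":" + HEX)

def parse_pcie_error(text):
    """Return a dict of ints from the event text, or None if fields are missing."""
    bdf = BDF_RE.search(text)
    ids = IDS_RE.search(text)
    if bdf is None or ids is None:
        return None
    bus, device, function = (int(g, 16) for g in bdf.groups())
    vendor_id, device_id = (int(g, 16) for g in ids.groups())
    return {"bus": bus, "device": device, "function": function,
            "vendor_id": vendor_id, "device_id": device_id}

# The event text quoted above.
sample = """A connected hardware error has occurred
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function  0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400"""

info = parse_pcie_error(sample)
print(info)
```

If several events all report the same bus/device/function, start with the riser and card hanging off that port. If the port changes from one event to the next, power or the board becomes the more likely suspect.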