
Topic: Mining Hardware Mystery [127.5 C / 0 RPM] (Read 2724 times)

donator
Activity: 1057
Merit: 1021
February 10, 2012, 12:52:18 PM
#28
I am leaning toward the PCIe extender cables.  I restarted my troubled miner this morning and GPU 2 read 127.5 C and 0 RPM for about 15 seconds before it showed the correct temp/RPM.  I also checked the fan, and it was up to speed while it was reporting 0 RPM.

I am going to order some powered PCIe extenders and see if that makes a difference.

donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:39:26 AM
#27
I'm sure we both meant 8-bit Smiley

DOH, and with that I'm off to get some coffee before I hose up some databases.
hero member
Activity: 518
Merit: 500
February 10, 2012, 09:38:07 AM
#26
I'm sure we both meant 8-bit Smiley
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:26:56 AM
#25
I am sure you meant to type 16-bit. Smiley
hero member
Activity: 518
Merit: 500
February 10, 2012, 09:25:44 AM
#24
Why is it exactly 127.5 C every time?

Just a guess: the same reason old BIOSes are limited to 127.5 GB, 8-bit registers. The sensor stores its data in an 8-bit register, which gives 256 possible values, mapped from 0 C to 127.5 C with 0.5 C precision, with perhaps one value reserved for an error state or whatever. You would probably get the same value if you pulled a power cord from a running GPU, but I'd recommend not testing that theory.
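
A minimal sketch of the mapping being described, for anyone who wants to see it spelled out. The 0.5 C per step scale, the all-ones error value, and the function itself are assumptions for illustration, not anything read out of the real driver or sensor.
Code:
# Hypothetical decode of an 8-bit on-die temperature register.
# Assumption: raw 0..255 maps to 0.0..127.5 C in 0.5 C steps, and a failed or
# garbage read comes back as all ones (0xFF).

def decode_temp(raw_byte: int) -> float:
    """Convert a raw 8-bit sensor value to degrees Celsius (0.5 C per LSB)."""
    return (raw_byte & 0xFF) * 0.5

print(decode_temp(0x50))  # 40.0 C, a plausible normal reading
print(decode_temp(0xFF))  # 127.5 C, what an all-ones (failed) read would show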
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:18:55 AM
#23
Why is it exactly 127.5 C every time?

Also, I have seen reports from others who have seen this 127.5; interestingly enough, the other person who reported it here had almost an identical setup.

Garbage data it may be, but there has to be some significance to that exact number appearing; it must mean something...

Not really. The temp isn't stored internally as the floating point number "127.5"; it is in some binary numbering system.  127.5 could simply be all zeroes internally, and when the device fails due to voltage sag it goes to 00000000, which the driver interprets as 127.5 C.  I am not saying that is the exact reason, but if the card isn't getting enough power it isn't operating according to the assumptions made when designing it, and thus garbage in -> garbage out.

hero member
Activity: 896
Merit: 1000
February 10, 2012, 09:12:18 AM
#22
Why is it exactly 127.5 C every time?

Also, I have seen reports from others who have seen this 127.5; interestingly enough, the other person who reported it here had almost an identical setup.

Garbage data it may be, but there has to be some significance to that exact number appearing; it must mean something...
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:07:02 AM
#21
One mystery still eludes me: what is the significance of this magic 127.5 C?

I don't think there is any.  When electronic devices don't get enough power they can display all kinds of garbage data.

Quote
Last night I inspected the extender cables and they are in very poor shape. I re-soldered some of the connections that had come loose, and that allowed me to get the rig back up. One of the PCIe extender cables is beyond repair (it's amazing to me the thing works at all, considering there are at least 5 leads that are unconnected where the ribbon cable meets the PCIe slot), so on that extender cable I replaced the 5870 with a 5830 so there would be less power draw. This allowed me to get the rig back up until I get the new extender cables. It's obvious to me that these things are hand soldered by Chinese graduate students, and they do not last forever.

Yeah, if the PCIe extenders are in that bad a shape, the resistance is going to go up, the voltage at the card is going to sag, and the card is going to pull more current to compensate, possibly more than the MB can supply.
hero member
Activity: 896
Merit: 1000
February 10, 2012, 08:51:14 AM
#20
Well, for one thing, I am actually using only 3 extenders; the other two cards are directly on the motherboard.

It is a power load issue, obviously; it's just a question of which component is failing.

I am convinced now it is one or more of the PCIe extenders.

I swapped out the motherboard with a brand new motherboard (I use the MSI 890FXA-GD70 on all five rigs) and also swapped cards around with other rigs to see if it was the cards; the only things that stayed the same are the CPU and the PCIe extender cables.

Last night I inspected the extender cables and they are in very poor shape. I re-soldered some of the connections that had come loose, and that allowed me to get the rig back up. One of the PCIe extender cables is beyond repair (it's amazing to me the thing works at all, considering there are at least 5 leads that are unconnected where the ribbon cable meets the PCIe slot), so on that extender cable I replaced the 5870 with a 5830 so there would be less power draw. This allowed me to get the rig back up until I get the new extender cables. It's obvious to me that these things are hand soldered by Chinese graduate students, and they do not last forever.

One mystery still eludes me: what is the significance of this magic 127.5 C?
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 08:42:36 AM
#19
My guess is the load on the bus is too high.  P4man is right.  No two cards are exactly alike in terms of wattage, due to the "magic" of silicon fabrication.

5 non-Molex extenders is pushing it.  The spec allows 75 W per card, so that is 225 W; no way the MB can handle that.  The good thing is that a 5870 only draws ~30 W each through the slot, which puts you right at 150 W, probably the upper limit of what any MB designer considered likely.

Say the MB can't handle more than 150 W across the PCIe bus.  Maybe your other 4 rigs are at 142 W, 138 W, 137 W and 144 W, and this one is at 153 W.

I don't think it is the data lanes on the PCIe connectors but I do think it is the power being drawn through them.  My guess (while you wait for powered extenders) is you can find a clock speed (and thus wattage) which will work for all 5 cards.  Try 825 then 850, then 875, then 900.  There is some wattage point where all 5 cards will work and a little bit more wattage will kill the board.
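
To put rough numbers on that budget, here is a minimal sketch; the per-card slot draw and the board limit are the same guesses made above, not measured values.
Code:
# Hypothetical PCIe slot power budget check. The ~30 W per-card slot draw and
# the ~150 W board limit are guesses, not measurements.

SLOT_DRAW_W = 30.0     # guessed draw per 5870 through the PCIe slot
BOARD_LIMIT_W = 150.0  # guessed total the motherboard can feed its slots

def slot_budget(num_cards, per_card_w=SLOT_DRAW_W):
    total = num_cards * per_card_w
    headroom = BOARD_LIMIT_W - total
    status = "OK" if headroom >= 0 else "over budget"
    print(f"{num_cards} cards x {per_card_w} W = {total} W ({status}, headroom {headroom} W)")

slot_budget(5)         # 150.0 W, right at the guessed limit
slot_budget(5, 30.6)   # 153.0 W, the kind of small overshoot that could trip one rig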


Now if it were me (and I am crazy when it comes to needing to know), I would grab a clamp meter and an X-Acto knife.  I would cut the PCIe extender ribbon to separate the 12V wires (they are pins A2, A3, B1, B2, and B3); you can include A1 to make it require only one cut, because it is only used for presence detect and thus carries minimal or no current.  Essentially, split the cable enough to get a clamp around the 12V conductors and take a reading.


Just curious, what is the model of the MB?
hero member
Activity: 896
Merit: 1000
February 10, 2012, 08:33:11 AM
#18
Actually, the PSUs I use have no problem with 5 x 5870s; like I said, I have four other identical rigs that have been running for months with no problem.

Plus, I already tested the power theory just to see what would happen. I've got a monster Corsair AX1200 and the same problem happened.

At this point I am convinced it's the PCIe connector(s).

I took them out last night and looked at them; there are multiple problems on these things.

I ordered some new extenders and I will update the thread if I am able to fully clock these after I replace the extenders.
legendary
Activity: 3472
Merit: 1724
February 10, 2012, 07:43:21 AM
#17
1000 W can be too little for overclocked 5870s; I would try 1200 W. Use extenders only if powered (especially if the mobo is cheap). If you can, connect as many cards as possible directly through native 6-pin PCIe connectors, not Molex-to-PCIe (and if you do use those, use adapters that take two Molex plugs per PCIe connector) or SATA-to-PCIe.
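
A back-of-the-envelope version of that sizing argument; the per-card and overhead wattages below are assumptions for illustration only, not measurements from this rig.
Code:
# Rough total-system draw estimate for five overclocked 5870s.

CARD_W = 200      # guessed draw per overclocked HD 5870 under a mining load
OVERHEAD_W = 100  # guessed CPU, board, drives, fans and conversion losses

total = 5 * CARD_W + OVERHEAD_W
for psu_w in (1000, 1200):
    print(f"{psu_w} W PSU: estimated load {total} W, margin {psu_w - total} W")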
newbie
Activity: 43
Merit: 0
February 09, 2012, 09:14:02 PM
#16
Have you tried a different miner? cgminer hates one of my 5850s, yet its twin works fine; both cards mine perfectly using Phoenix.
donator
Activity: 1057
Merit: 1021
February 09, 2012, 05:57:30 PM
#15
I have 3 cards on the motherboard and 2 off the board with non-powered PCIe extender cables.  I bought the cables on eBay from different sellers.  They all came packaged the same way in static bags and they all look the same.
hero member
Activity: 896
Merit: 1000
February 09, 2012, 05:55:16 PM
#14
I had this exact same problem yesterday on one of my 5 miners.  Each of my miners has five 5870s.

GPU 2 went to 127.5 and 0 RPM on the fan, and cgminer crashed.  I could still ssh to the box but had to restart to get it to run again.

This happened twice yesterday.  127.5 and 0 RPM on GPU 2.

I shut the miner down, made sure all connections were tight, and restarted it.  It has been running for 24 hours now without a problem.  My conf file had 40-85 for the fans.  I changed it to 0-100 for GPU 2, but I don't know if that fixed it or if it was a loose connection.


890FXA-GD70
Sempron 145
2 GB ram
Inwin 1200W PS
4GB USB  - Ubuntu/cgminer

Wow, that's weird. Our setup is identical except I am running Windows and you are running Linux.
Just out of curiosity, are you using PCIe extender cables? If so, how many and what kind? What is your extender cable setup?

There has to be someone out there who can explain to us what this 127.5 / 0 thing is.

I find it really interesting that our hardware setups are identical, hrmmm.

Thanks for the post!
hero member
Activity: 497
Merit: 500
February 09, 2012, 05:51:03 PM
#13
Just my 2 cents. Have you looked in GPU-Z to see what voltage that card is getting and what kind of amps it is drawing? You may not have steady power to the card, or there may be a bad VRM on the card.
donator
Activity: 1057
Merit: 1021
February 09, 2012, 05:47:03 PM
#12
I had this exact same problem yesterday on one of my 5 miners.  Each of my miners has five 5870s.

GPU 2 went to 127.5 and 0 RPM on the fan, and cgminer crashed.  I could still ssh to the box but had to restart to get it to run again.

This happened twice yesterday.  127.5 and 0 RPM on GPU 2.

I shut the miner down, made sure all connections were tight, and restarted it.  It has been running for 24 hours now without a problem.  My conf file had 40-85 for the fans.  I changed it to 0-100 for GPU 2, but I don't know if that fixed it or if it was a loose connection.
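
A sketch of what that kind of per-GPU fan range looks like written out as a cgminer.conf; the five-entry ordering and the exact values are illustrative assumptions, so check the README of your cgminer version for the option syntax it actually supports.
Code:
import json

# Hypothetical cgminer.conf fragment: GPU 2 (the third card) gets the full
# 0-100 fan range, the others keep 40-85.
conf = {
    "gpu-fan": "40-85,40-85,0-100,40-85,40-85",
}

with open("cgminer.conf", "w") as f:
    json.dump(conf, f, indent=4)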


890FXA-GD70
Sempron 145
2 GB ram
Inwin 1200W PS
4GB USB  - Ubuntu/cgminer
hero member
Activity: 518
Merit: 500
February 09, 2012, 08:55:25 AM
#11
That's what I would do, but perhaps wait for someone else to chime in, as I have no personal experience with powered extenders.
hero member
Activity: 896
Merit: 1000
February 09, 2012, 08:52:43 AM
#10
I appreciate your feedback - thank you

BTW, 925 on all cards lasted about 5 mins.

On each rig I have two GPUs that are directly on the motherboard and 3 that use extenders.

Should I replace one of the extenders on each rig with a powered one?
hero member
Activity: 518
Merit: 500
February 09, 2012, 08:34:46 AM
#9
No two chips are 100% equal; there is always some variance in static and dynamic leakage, and therefore power consumption. That goes for the GPUs mostly, but the CPU or, heck, even the motherboard chipset could make the difference if you are sailing too close to the wind (and with 5 overclocked 5870s on unpowered extenders, you probably are).  Be happy it works on 4 of your rigs, and get some powered ones for the 5th Wink

I'd actually consider retrofitting the others too, if you can find the cables cheap enough, to reduce the chance of blowing up your motherboards. You may not need powered extenders on all cards, just a few per rig to reduce the load on the PCIe bus.

Anyway, that's just my 2 cents. Perhaps it's something different entirely, but I doubt it.
hero member
Activity: 896
Merit: 1000
February 09, 2012, 08:22:25 AM
#8
No, I'm not using powered extenders. The way I do it is two cards sitting directly on the motherboard and 3 cards via extenders.

You're making a logical argument, but this is what bothers me about it.

I have 5 rigs and they all have the EXACT same setup: same motherboard, same CPU, same GPUs, same PSUs, even the same exact brand names.

If I can replicate the same configuration 4 times and it works perfectly with no problems (and has been working for many months), the 5th one should be able to work the same way too, unless there is some sort of hardware problem.

I even swapped out the motherboard with a spare brand new GD70 I had as a backup. I am starting to lean towards some sort of problem with one of the PCIe extenders.

I don't think 925 x 5 will work. I tested several different setups just to keep the thing mining while I troubleshoot it, and I determined the best way to keep it going and maintain the highest speed possible was just to lower one single card down to 800 and keep the other 4 at 950.

I tried multiple times lowering every single card by a little bit, and when I lowered them enough to be stable I lost more MHash than just doing it the way previously described.

I am going to order some new PCIe cables today; maybe I will get one with a Molex connector just for fun.
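
For what it's worth, a rough comparison of those clock mixes, assuming hashrate scales roughly linearly with core clock and using the ~430 MHash at 950 figure from the first post; an approximation, not a measurement of this rig.
Code:
# Rough hashrate comparison for different clock mixes, assuming roughly linear
# scaling with core clock (~430 MH/s at 950 MHz, from the first post).

MHS_PER_MHZ = 430 / 950  # ~0.45 MH/s per MHz of core clock

def rig_mhash(clocks_mhz):
    return sum(c * MHS_PER_MHZ for c in clocks_mhz)

print(round(rig_mhash([950, 950, 950, 950, 800])))  # ~2082 MH/s, the current workaround
print(round(rig_mhash([925] * 5)))                  # ~2093 MH/s, the suggested 5 x 925 test
print(round(rig_mhash([900] * 5)))                  # ~2037 MH/s, lowering every card further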
hero member
Activity: 518
Merit: 500
February 09, 2012, 07:40:30 AM
#7
It's not a problem of me trying to clock them too high; I have 5 rigs that are exactly the same.

Same GPUs, same motherboard, same CPU, same PSU, same everything.

I clock them all at 950/180 with no problem. I have been mining with 5870s for almost a year; I know exactly how to clock them.

That's not my point. It would be worth checking whether 5x925 MHz works on your troublesome rig or not. Power consumption of that is higher than 4x950 + 1x800; if it works, it would make power delivery issues with the motherboard far less likely. If 5x925 does not work, that's what I would be betting on: too much power pulled from the PCIe bus. Are you using powered extenders?
hero member
Activity: 896
Merit: 1000
February 09, 2012, 07:23:53 AM
#6
It's not a problem of me trying to clock them too high; I have 5 rigs that are exactly the same.

Same GPUs, same motherboard, same CPU, same PSU, same everything.

I clock them all at 950/180 with no problem. I have been mining with 5870s for almost a year; I know exactly how to clock them.
hero member
Activity: 518
Merit: 500
February 09, 2012, 07:16:42 AM
#5
Have you tried 5x900 or 5x925 MHz?
legendary
Activity: 1344
Merit: 1004
February 09, 2012, 07:03:11 AM
#4
That's an interesting idea. Those extender cables are pretty chintzy; I have had to resolder several of them, and they are super fragile.

What boggles my mind is where the heck the 127.5 C and 0 RPM come from. Why that exact reading, which is of course completely incorrect? If you look at the GPU that's giving that reading, the fan is spinning fine and the card is not hot to the touch.

GPU Caps Viewer reports 127.5 C when my GPUs are subzero (HWiNFO64 and TriXX report it correctly with a minus sign), but that's probably not the case for you.
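
One way that subzero report could fit the same 8-bit picture: a small negative value read as if it were unsigned lands right on 127.5. A hypothetical illustration follows; the encoding is assumed, not taken from GPU Caps Viewer or HWiNFO64.
Code:
# If the sensor value is a signed 8-bit number in 0.5 C steps, a slightly
# negative temperature and an unsigned read of the same byte diverge wildly.

def encode_signed_half_deg(temp_c):
    """Store a temperature as a signed 8-bit value in 0.5 C steps."""
    return int(round(temp_c * 2)) & 0xFF

raw = encode_signed_half_deg(-0.5)                   # -0.5 C -> 0xFF (two's complement)
as_unsigned = raw * 0.5                              # 127.5 C if the sign is ignored
as_signed = (raw - 256 if raw > 127 else raw) * 0.5  # -0.5 C when read correctly

print(hex(raw), as_unsigned, as_signed)              # 0xff 127.5 -0.5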
hero member
Activity: 896
Merit: 1000
February 09, 2012, 06:57:31 AM
#3
That's an interesting idea. Those extender cables are pretty chintzy; I have had to resolder several of them, and they are super fragile.

What boggles my mind is where the heck the 127.5 C and 0 RPM come from. Why that exact reading, which is of course completely incorrect? If you look at the GPU that's giving that reading, the fan is spinning fine and the card is not hot to the touch.

legendary
Activity: 1344
Merit: 1004
February 09, 2012, 02:36:17 AM
#2
I've had issues with PCIe ribbon cables; it's really cold here in Wisconsin and I have a miner outside. I had one GPU causing system freezes whenever a load was applied to it. Figures that it was the PCIe cable. I switched it with another and it worked fine after. When I brought the cable inside, one of the wires had disconnected from the PCB. I tried to investigate where exactly it disconnected, and a bunch more wires ripped off the PCB. Gotta love that Chinese quality.
hero member
Activity: 896
Merit: 1000
February 08, 2012, 05:55:55 PM
#1
Alright, this is a mining hardware mystery for someone to prove they are smarter than I am (like that's hard, right? Tongue).


I like to consider that I have mastered the art of the 5 x 5870 Windows mining rig; in fact, I have 5 of them.

But this one rig, this one little snot-nosed, evil, cablepair-mocking rig, has a very strange problem.

On all my 5870 rigs I clock the cards at 950 core; this gets me about 430 MHash per card.

The rig in question will only allow me to clock 4 of the cards at 950, and the 5th one has to be clocked at 800. But it doesn't have to be the fifth one, it could be any one of them; in fact, 4 x 950 and 1 x 850 core is the limit. I can clock them all lower and the system will run fine.

Someone is going to say, "You don't have enough power!" That's what I thought at first as well. At first I had a Rosewill 1000 watt 80 Plus, which should be enough power, but I thought maybe there was something wrong with the PSU and it was not putting out enough power; at that time, if I clocked the cards too high as described above, it would just shut off.

So I switched the PSU to a Corsair HX1000 80 Plus, a PSU I knew for a fact could power 5 x 5870 because it is on my other rigs. Sure enough, I ran into problems clocking it higher than described above, except this time it's really strange: instead of just shutting off, it will mine for a minute and then all of a sudden one of the GPUs will display 127.5 C and 0 RPM in cgminer. How weird is that? Every single time it shows 127.5 C and 0 RPM in cgminer, exactly 127.5 degrees C. Also, it is not the GPU itself; I have swapped out all the GPUs with others that I know work, I have moved them around, I even changed out the motherboard with another one, as well as formatted and reinstalled Windows, so I know it's not a software issue.

To recap, here's a breakdown:

5 x 5870s
You can't clock them higher than 4 x 950 core + 1 x 800 core.
If you do, the system crashes after mining for a minute.

With the Rosewill PSU it just shuts off.
With the Corsair PSU it shows exactly 127.5 C / 0 RPM on one GPU, then freezes.

To try and find the problem I have:

Swapped PSUs (with PSUs that are known to be working and have enough power)
Swapped the motherboard
Swapped GPUs
Formatted / reinstalled - ruled out a software problem
Even tried not using a particular PCIe slot that the problem occurred on before; it just moves to a different GPU on a different PCIe slot

What's left?
Could the CPU be doing this?

I mean, really, what's left to rule out?

CPU, RAM, PCIe extender cables: those are the only variables left, and none of them make any sense to me.

Has anyone ever heard of this problem? Any clues as to what could be causing it? I have been racking my brain trying to figure it out and I can't for the life of me.

Thanks for taking the time to read this!!!!  Huh Huh Huh Huh Huh Cheesy Cheesy Cheesy Cheesy Cheesy