Topic: Mining Hardware Mystery [127.5 C / 0 RPM]

donator
Activity: 1057
Merit: 1021
February 10, 2012, 12:52:18 PM
#28
I am leaning toward the PCIe extender cables.  I restarted my troubled miner this morning and GPU 2 read 127.5 and 0 RPM for about 15 seconds, then it showed the correct temp/RPM.  I also checked the fan, and it was up to speed while the sensor said 0 RPM.

I am going to order some powered PCIe extenders and see if that makes a difference.

donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:39:26 AM
#27
Quote
I'm sure we both meant 8 bit :)

DOH, and with that I go to get some coffee before I hose some databases up.
hero member
Activity: 518
Merit: 500
February 10, 2012, 09:38:07 AM
#26
I'm sure we both meant 8 bit :)
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:26:56 AM
#25
I am sure you meant to type 16 bit. :)
hero member
Activity: 518
Merit: 500
February 10, 2012, 09:25:44 AM
#24
Quote
Why is it exactly 127.5 C every time?

Just a guess, but it's the same reason old BIOSes were limited to 127.5 GB: 8 bit registers. The sensor stores its reading in an 8 bit register, which gives 256 possible values, mapped from 0 C to 127.5 C with 0.5 C precision (perhaps with one value reserved for an error state or whatever). You would probably get the same value if you pulled a power cord from a running GPU, but I'd recommend not testing that theory.
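To make that decoding concrete, here is a minimal sketch in Python. The unsigned 8 bit register and the 0.5 C-per-step scale are assumptions taken from the guess above, not the actual driver code:

Code:
# Sketch: decode an assumed unsigned 8-bit temperature register, 0.5 C per step.
def decode_temp(raw_byte):
    """Map a raw sensor value (0..255) to degrees Celsius."""
    if not 0 <= raw_byte <= 255:
        raise ValueError("not an 8-bit value")
    return raw_byte * 0.5

print(decode_temp(0x54))   # 42.0  -- a plausible normal reading
print(decode_temp(0xFF))   # 127.5 -- all bits set, the largest encodable value

Under that assumption, 127.5 is simply the biggest number the register can express, which is why it shows up as the "stuck" reading.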
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:18:55 AM
#23
Quote
Why is it exactly 127.5 C every time?

Also, I have seen reports from others who have seen this 127.5; interestingly enough, the other one who reported it here had almost an identical setup.

Garbage data it may be, but there has to be some significance to that exact number appearing; it must mean something...

Not really. The temp isn't stored internally as the floating-point number "127.5"; it is some binary encoding.  127.5 could simply be all zeroes internally, and when the device fails due to voltage sag it goes to 00000000, which the driver interprets as 127.5 C.  I am not saying that is the exact reason, but if the card isn't getting enough power it isn't operating according to the assumptions made when it was designed, and thus garbage in -> garbage out.
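Purely as an illustration of that garbage in -> garbage out point (the encoding here is hypothetical, not AMD's actual driver logic), whatever fixed pattern a starved sensor returns still decodes to some number:

Code:
# Hypothetical decode (unsigned, 0.5 C per step) of two common stuck patterns.
for raw in (0x00, 0xFF):               # all zeroes, all ones
    print(f"raw={raw:#04x} -> {raw * 0.5} C")
# raw=0x00 -> 0.0 C, raw=0xff -> 127.5 C: the reported figure is just whatever
# the driver makes of the bits, so the exact value carries no deeper meaning.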

hero member
Activity: 896
Merit: 1000
Buy this account on March-2019. New Owner here!!
February 10, 2012, 09:12:18 AM
#22
Why is it exactly 127.5 C every time?

Also, I have seen reports from others who have seen this 127.5; interestingly enough, the other one who reported it here had almost an identical setup.

Garbage data it may be, but there has to be some significance to that exact number appearing; it must mean something...
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 09:07:02 AM
#21
Quote
One mystery still eludes me: what is the significance of this magic 127.5 C?

I don't think there is any.  When electronic devices don't get enough power they can display all kinds of garbage data.

Quote
Last night I inspected the extender cables and they are in very poor shape.  I re-soldered some of the connections that had come loose, which allowed me to get the rig back up.  One of the PCIe extender cables is beyond repair (it's amazing to me the thing works at all, considering there are at least 5 leads that are unconnected where the ribbon cable meets the PCIe slot), so on that extender cable I replaced the 5870 with a 5830 so there would be less power draw.  This allowed me to get the rig back up until I get the new extender cables.  It's obvious to me that these things are hand soldered by Chinese graduate students, and they do not last forever.

Yeah, if the PCIe extenders are in that bad a shape, the resistance is going to go up, and that is going to mean more current, possibly more than the MB can supply.
hero member
Activity: 896
Merit: 1000
Buy this account on March-2019. New Owner here!!
February 10, 2012, 08:51:14 AM
#20
Well, for one thing I am actually using only 3 extenders; the other two cards are directly on the motherboard.

It is obviously a power load issue; it's just a question of which component is failing.

I am now convinced it is one or more of the PCIe extenders.

I swapped out the motherboard with a brand new one (I use the MSI 890FXA-GD70 on all five rigs) and also swapped cards around with other rigs to see if it was the cards; the only things that stayed the same are the CPU and the PCIe extender cables.

Last night I inspected the extender cables and they are in very poor shape.  I re-soldered some of the connections that had come loose, which allowed me to get the rig back up.  One of the PCIe extender cables is beyond repair (it's amazing to me the thing works at all, considering there are at least 5 leads that are unconnected where the ribbon cable meets the PCIe slot), so on that extender cable I replaced the 5870 with a 5830 so there would be less power draw.  This allowed me to get the rig back up until I get the new extender cables.  It's obvious to me that these things are hand soldered by Chinese graduate students, and they do not last forever.

One mystery still eludes me: what is the significance of this magic 127.5 C?
donator
Activity: 1218
Merit: 1079
Gerald Davis
February 10, 2012, 08:42:36 AM
#19
My guess is the load on the bus is too high.  P4man is right: no two cards are exactly alike in terms of wattage, due to the "magic" of silicon fabrication.

5 non-Molex extenders is pushing it.  The spec allows 75W per card, so that is 375W across five slots.  No way the MB can handle that.  The good thing is 5870s only draw ~30W each through the slot, which puts you right at 150W, which is probably the upper limit of what any MB designer considered likely.

Say the MB can't handle more than 150W across the PCIe bus.  Maybe your other 4 rigs draw 142W, 138W, 137W, 144W, and this one draws 153W.

I don't think it is the data lanes on the PCIe connectors, but I do think it is the power being drawn through them.  My guess (while you wait for powered extenders) is you can find a clock speed (and thus wattage) which will work for all 5 cards.  Try 825, then 850, then 875, then 900.  There is some wattage point where all 5 cards will work, and a little bit more wattage will kill the board.
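As a rough back-of-the-envelope check of that idea, here is a minimal sketch in Python. The ~30W per-slot figure at stock clocks, the linear scaling of power with clock, and the 150W board limit are all just the assumptions from this post, not measured values:

Code:
# Rough estimate of total PCIe slot draw for 5 cards vs. an assumed board limit.
STOCK_CLOCK_MHZ = 850        # assumed reference clock for the ~30W slot draw
SLOT_W_AT_STOCK = 30.0       # assumed per-card draw through the slot at stock
BOARD_LIMIT_W = 150.0        # assumed upper limit the MB traces can handle

for clock in (825, 850, 875, 900, 925):
    per_card = SLOT_W_AT_STOCK * clock / STOCK_CLOCK_MHZ  # assume linear scaling
    total = 5 * per_card
    status = "within" if total <= BOARD_LIMIT_W else "over"
    print(f"{clock} MHz: ~{total:.0f}W total slot draw ({status} the assumed limit)")

The exact numbers don't matter; the point is that a small clock bump multiplied across five cards adds up fast on the shared slot-power budget.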


Now if it were me (and I am crazy when it comes to needing to know), I would grab a clamp meter and an X-Acto knife.  I would cut the PCIe extender ribbon to separate the 12V wires (they are pins A2, A3, B1, B2, B3); you can include A1 to make it require only one cut, because it is only used for presence detect and thus carries minimal or no current.  Essentially, split the cable enough to get a clamp around the 12V conductors and take a reading.
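If you do take that reading, converting it to watts is just multiplication (a sketch; the 12V nominal rail and the roughly 5.5A / 66W 12V allowance per x16 slot are my reading of the PCIe spec, so treat them as assumptions):

Code:
# Convert a clamp-meter reading on the bundled 12V conductors to slot power.
def slot_12v_watts(amps):
    return 12.0 * amps                 # nominal 12V rail

reading_amps = 4.8                     # hypothetical clamp reading, in amps
print(f"{slot_12v_watts(reading_amps):.0f}W on the 12V pins "
      "(the spec allows roughly 5.5A / 66W per x16 slot on 12V)")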


Just curious, what is the model of the MB?
hero member
Activity: 896
Merit: 1000
Buy this account on March-2019. New Owner here!!
February 10, 2012, 08:33:11 AM
#18
Actually, the PSUs I use have no problem with 5 x 5870s; like I said, I have four other identical rigs that have been running for months with no problems.

Plus, I already tested the power theory just to see what would happen; I've got a monster Corsair AX1200 and the same problem happened.

At this point I am convinced it's the PCIe connector(s).

I took them out last night and looked at them; there are multiple problems on these things.

I ordered some new extenders and I will update the thread if I am able to fully clock these cards after I replace the extenders.
legendary
Activity: 3472
Merit: 1724
February 10, 2012, 07:43:21 AM
#17
1000W can be too little for overclocked 5870s; I would try 1200W.  Use extenders only if they are powered (especially if the mobo is cheap).  If you can, connect as many cards as possible directly through 6-pin PCIe connectors rather than Molex-to-PCIe adapters (and if you do use adapters, use ones that take two Molex cables per PCIe connector) or SATA-to-PCIe adapters.
newbie
Activity: 43
Merit: 0
February 09, 2012, 09:14:02 PM
#16
Have you tried a different miner?  cgminer hates one of my 5850s, yet its twin works fine; both cards mine perfectly using Phoenix.
donator
Activity: 1057
Merit: 1021
February 09, 2012, 05:57:30 PM
#15
I have 3 cards on the motherboard and 2 off the board with non-powered PCIe extender cables.  I bought the cables off eBay from different sellers.  They all came packaged the same way in static bags, and they all look the same.
hero member
Activity: 896
Merit: 1000
Buy this account on March-2019. New Owner here!!
February 09, 2012, 05:55:16 PM
#14
Quote
I had this exact same problem yesterday on one of my 5 miners.  Each of my miners has five 5870s.

GPU 2 went to 127.5 and 0 RPM on the fan, and cgminer crashed.  I could still SSH to the box but had to restart it to get it to run again.

This happened twice yesterday.  127.5 and 0 RPM on GPU 2.

I shut the miner down, made sure all connections were tight, and restarted it.  It has been running for 24 hours now without a problem.  My conf file had 40-85 for the fans.  I changed it to 0-100 for GPU 2, but I don't know if that fixed it or if it was a loose connection.


890FXA-GD70
Sempron 145
2 GB ram
Inwin 1200W PS
4GB USB  - Ubuntu/cgminer

Wow, that's weird. Our setups are identical, except I am running Windows and you are running Linux.
Just out of curiosity, are you using PCIe extender cables? If so, how many and what kind? What is your extender cable setup?

There has to be someone out there who can explain to us what this 127.5 / 0 thing is.

I find it really interesting that our hardware setups are identical, hrmmm.

Thanks for the post!
hero member
Activity: 497
Merit: 500
February 09, 2012, 05:51:03 PM
#13
Just my 2 cents: have you looked in GPU-Z to see what kind of voltage that card is getting and what kind of current it is drawing?  You may not have steady power to the card, or the card may have a bad VRM.
donator
Activity: 1057
Merit: 1021
February 09, 2012, 05:47:03 PM
#12
I had this exact same problem yesterday on one of my 5 miners.  Each of my miners has five 5870s.

GPU 2 went to 127.5 and 0 RPM on the fan, and cgminer crashed.  I could still SSH to the box but had to restart it to get it to run again.

This happened twice yesterday.  127.5 and 0 RPM on GPU 2.

I shut the miner down, made sure all connections were tight, and restarted it.  It has been running for 24 hours now without a problem.  My conf file had 40-85 for the fans.  I changed it to 0-100 for GPU 2, but I don't know if that fixed it or if it was a loose connection.


890FXA-GD70
Sempron 145
2 GB ram
Inwin 1200W PS
4GB USB  - Ubuntu/cgminer
hero member
Activity: 518
Merit: 500
February 09, 2012, 08:55:25 AM
#11
That's what I would do, but perhaps wait for someone else to chime in, as I have no personal experience with powered extenders.
hero member
Activity: 896
Merit: 1000
Buy this account on March-2019. New Owner here!!
February 09, 2012, 08:52:43 AM
#10
I appreciate your feedback - thank you.

BTW, 925 on all cards lasted about 5 mins.

On each rig I have two GPUs that are directly on the motherboard and 3 that use extenders.

Should I replace one of the extenders on each rig with a powered one?
hero member
Activity: 518
Merit: 500
February 09, 2012, 08:34:46 AM
#9
No 2 chips are 100% equal; there is always a variance in static and dynamic leakage, and therefore power consumption. That goes for the GPUs mostly, but the CPU or, heck, even the motherboard chipset could make the difference if you are sailing too close to the wind (and with 5 overclocked 5870s on unpowered extenders, you probably are).  Be happy it works on 4 of your rigs, and get some powered ones for the 5th ;)

I'd actually consider retrofitting the others too if you can find the cables cheap enough, to reduce the chance of blowing up your motherboards. You may not need powered extenders on all cards, just a few per rig to reduce the load on the PCIe bus.

Anyway, that's just my 2 cents. Perhaps it's something different entirely, but I doubt it.