Topic: GPU Memory Errors

newbie
Activity: 70
Merit: 0
February 14, 2018, 04:57:44 PM
#12
It's fine. The errors that HWINFO reports are the ones caught by ECC. I have a 570 that's been spewing out errors since day 1, even when not overclocked, but it has no problem keeping up with all my other 570s. Its rejected share percentage isn't any higher either (around 0.2%).
sr. member
Activity: 476
Merit: 250
February 14, 2018, 03:47:17 PM
#11
Is HWINFO supposed to be able to pick up changes to clock speeds on the fly?

I had a couple of cards throwing out lots of memory errors so with HWINFO and the miner running I reduced the speeds but the memory errors kept increasing.  So I stopped the miner, set the lower clock speeds and no more memory errors.

To test it further I then increased the clock speeds back up to the high setting that was originally throwing errors, but this time no errors showed in HWINFO.

Is this an issue with HWINFO?

I have had exactly the same problem before. I believe it is by design in HWINFO, but you may want to confirm this with the developers of the software.
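
One way to check whether errors have really stopped after a clock change is to look at how fast the counter is climbing rather than at its total, since a large total only says that errors happened at some point. Here is a minimal sketch of that idea. It assumes you note down the counter by hand (or from a log) as pairs of time in seconds and cumulative error count; the function name and the sample readings are made up for illustration and are not anything HWINFO provides.

```python
from typing import List, Tuple


def error_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Return errors per second for each pair of consecutive readings.

    Each sample is (time_in_seconds, cumulative_error_count).
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # If the counter was reset between readings the difference is
        # negative; treat that interval as 0 rather than a negative rate.
        rates.append(max(c1 - c0, 0) / (t1 - t0))
    return rates


# Hypothetical readings: the counter climbs for 20 seconds, then stays flat.
readings = [(0, 1_000), (10, 251_000), (20, 501_000), (30, 501_000)]
print(error_rates(readings))
```

If the last few rates are 0, the card has stopped producing new errors at the current clocks, whatever the total says.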
newbie
Activity: 20
Merit: 0
February 14, 2018, 03:32:31 PM
#10
I am seeing the same thing on my RX580 rig with 6 cards.
Was getting a very stable 28.5 MH/s with mem clock at 2020 MHz.  No memory errors in HWINFO64.
Any higher in the memory clock, errors start to pile up fast.

But now I'm testing at 2100 MHz clock, getting very close to 31 MH/s now... 
Reported and effective hashrate is way up, 0 invalid shares, no change in stale shares.

Memory errors are through the roof. I'm talking billions per day. It doesn't seem to matter, though: I'm still seeing an improvement in profitability.
I hope it doesn't harm the cards.

jr. member
Activity: 76
Merit: 1
January 08, 2018, 03:40:43 PM
#9
I don't know. I moved that card to my "test Rig" and I still get a few errors, but not in the hundreds of millions that I was seeing.

Also, I'm not 100% sure about this, but I believe the memory errors are ones that have been corrected or that the card is trying to correct. Either way, I would rather it said 0, but it never made my Rig unstable, shut down, restart, or anything.

I have a lot ahead of me: I'm building 4 more rigs, and now more for others who don't have the time but have been sitting on the sidelines saving money to get something going. If I see any new errors on this card, I'll try to remember to update.

I am the OP. I had to change my username...
newbie
Activity: 18
Merit: 0
January 08, 2018, 12:46:50 PM
#8
Is HWINFO supposed to be able to pick up changes to clock speeds on the fly?

I had a couple of cards throwing out lots of memory errors so with HWINFO and the miner running I reduced the speeds but the memory errors kept increasing.  So I stopped the miner, set the lower clock speeds and no more memory errors.

To test it further I then increased the clock speeds back up to the high setting that was originally throwing errors, but this time no errors showed in HWINFO.

Is this an issue with HWINFO?
full member
Activity: 218
Merit: 100
December 08, 2017, 07:07:47 PM
#7
I have the same card and dropped the memory clock to 2000 from 2050 and have no more errors. I had a ton at 2050. Also the hashrate went from 28.5 to 28.3 dual mining.
newbie
Activity: 34
Merit: 0
December 08, 2017, 02:42:16 PM
#6
Your memory clock is likely just too high. On some cards the errors don't seem to matter as much and the effective hashrate stays up despite them. On others I've seen massive drops in effective hashrate, and eventually the rig would crash.

Personally I go for 0 memory errors. The added stability affects profitability more than the 1-3% extra hashrate.

I agree with this, and I would like to get to the 0 mark so I can have a stable Rig that I can (in theory) leave running and forget about for the most part. What method did you use to learn how to set the overclock properly?

I think what's kind of messing up my thinking, honestly, is that I have 6 cards in the rig and I am trying to work on just GPU 1. Should I unplug all the other cards and focus on this card until the errors are gone, or keep all the cards in the Rig and try to work on just this one card?
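
Whichever way the other cards are handled, the tuning of a single card can be treated as a step-down search: start at the clock that throws errors, lower it a step at a time, and stop at the first setting that shows no new errors over a test window. Below is a minimal sketch of that loop, not a tool that talks to any real card. The `errors_at` callback is hypothetical; in practice it would be you setting the clock, letting the miner run for a while, and reading how much the HWINFO counter grew. The 1750 MHz floor and 10 MHz step are assumptions for the example, and the stand-in card simply mirrors the report earlier in the thread of a card that was clean at 2000 MHz and not at 2050 MHz.

```python
from typing import Callable, Optional


def find_clean_memory_clock(start_mhz: int, floor_mhz: int, step_mhz: int,
                            errors_at: Callable[[int], int]) -> Optional[int]:
    """Step the memory clock down until a test window shows no new errors.

    Returns the first error-free clock found, or None if even the floor
    clock still produces errors.
    """
    clock = start_mhz
    while clock >= floor_mhz:
        if errors_at(clock) == 0:
            return clock
        clock -= step_mhz
    return None


def fake_card(clock_mhz: int) -> int:
    """Stand-in for a real measurement: errors appear above 2000 MHz."""
    return 0 if clock_mhz <= 2000 else (clock_mhz - 2000) * 1_000


print(find_clean_memory_clock(2100, 1750, 10, fake_card))
```

A longer test window at each step makes the result more trustworthy, at the cost of a slower search.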
newbie
Activity: 4
Merit: 4
December 08, 2017, 02:33:04 PM
#5
Your memory clock is likely just too high. On some cards the errors don't seem to matter as much and the effective hashrate stays up despite them. On others I've seen massive drops in effective hashrate, and eventually the rig would crash.

Personally I go for 0 memory errors. The added stability affects profitability more than the 1-3% extra hashrate.
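
To put a rough number on that trade-off: if an overclock adds a fraction g to the hashrate but the rig is down for a fraction d of the time, the output relative to a stable rig is (1 + g)(1 - d), which breaks even at d = g / (1 + g). The sketch below works that out for the 1-3% range. It assumes a crashed rig earns nothing until it is restarted and ignores everything else (power, stale shares), so treat it as a back-of-the-envelope check only.

```python
MINUTES_PER_DAY = 24 * 60


def break_even_downtime(gain: float) -> float:
    """Fraction of time a rig can be down before a hashrate gain is wiped out.

    Solves (1 + gain) * (1 - downtime) == 1 for downtime.
    """
    return gain / (1.0 + gain)


for gain in (0.01, 0.03):
    downtime = break_even_downtime(gain)
    print(f"+{gain:.0%} hashrate is cancelled by "
          f"{downtime * MINUTES_PER_DAY:.1f} minutes of downtime per day")
```

So under these assumptions, somewhere between about a quarter of an hour and three quarters of an hour of downtime per day is enough to cancel the extra hashrate.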
sr. member
Activity: 1414
Merit: 270
December 08, 2017, 02:32:17 PM
#4
Memory errors on a particular card may reduce the number of shares found. Reduce your memory OC.
newbie
Activity: 34
Merit: 0
December 08, 2017, 02:31:47 PM
#3
What is the error you see repeated the most?

It just says GPU Memory Errors (in HWiNFO64), and I see the number climbing pretty quickly (like 20,000-30,000 per second).
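
For a sense of scale, here is a quick arithmetic sketch that converts that per-second rate into a daily total and into the time it would take to reach the 600,000,000 errors mentioned in the first post. It assumes the rate stays constant, which a real card may not do.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def errors_per_day(errors_per_second: float) -> float:
    """Daily error count at a constant per-second rate."""
    return errors_per_second * SECONDS_PER_DAY


def hours_to_reach(total_errors: float, errors_per_second: float) -> float:
    """Hours needed to accumulate total_errors at a constant rate."""
    return total_errors / errors_per_second / 3600


for rate in (20_000, 30_000):
    print(f"{rate:,}/s -> {errors_per_day(rate):,.0f} per day, "
          f"600,000,000 errors after {hours_to_reach(600_000_000, rate):.1f} h")
```

At 20,000-30,000 errors per second the counter passes a billion well within a day, and 600,000,000 is only a few hours of running.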

I am not getting any errors in Claymore (mining ETH & DCR). When I do see the stale share message, the default message in Claymore comes up and says to check the overclock settings to make sure they're not set too high.

As I said earlier, I am still learning this, and if I don't understand or answer your question in the best way, I am sorry.
sr. member
Activity: 545
Merit: 251
December 08, 2017, 02:24:35 PM
#2
What is the error you see repeated the most?
newbie
Activity: 34
Merit: 0
December 08, 2017, 02:21:48 PM
#1
I have a 6 GPU Rig and I have 1 card throwing out a lot of memory errors. Yesterday I think I had about 600,000,000 errors on that 1 card before I even noticed it.

My Rig does not crash, and I have a very good Valid/Stale Share Ratio (approx. 1% or less sometimes). The only way I knew I was getting errors was because I have been trying to learn to overclock/undervolt more and more, so I downloaded HWiNFO and noticed it there.

Should I be worried about the GPU errors if my Rig is running very stable? The card is an XFX 570 4GB, hashing at about 28 MH/s.

I plan on trying to play with the card some tonight or tomorrow after I get some chores done but if anyone thinks of a possible solution I would be glad to give it a shot.

I am a very slow learner when I am reading and applying new things (alone) for the first time.

I am also looking for someone to do some small 1:1 lessons to possibly help accelerate the learning curve if you know someone or if you can teach it yourself, shoot me a PM. Payment will be sent for the lesson of course.