
Topic: AMD Vega 56 instability with multiple GPUs (Read 418 times)

newbie
Activity: 7
Merit: 0
November 30, 2017, 01:44:44 AM
#7
Check out this guide, http://vega.miningguides.com/, which helped me set up and get 1950 H/s with my Vega 56s modded to 64s. The key thing is the Blockchain drivers from Aug 23, and when you restart your computer, go into Device Manager and disable/re-enable every Vega GPU to get compute to actually work.

Perfect! Just applying the part of the guide that mentions a lower intensity of 1800*2 for the card with the HDMI/DP dongle already had a dramatic effect.
Simply as step 1, just by changing the intensity and affining the threads to a single dedicated CPU core, I went from a highly unstable hashrate in the range of 3800-4700 H/s to a steady:
HASHRATE REPORT - AMD
| ID |   10s |   60s |   15m | ID |   10s |   60s |   15m |
|  0 | 817.8 | 821.8 | 831.1 |  1 | 827.2 | 818.3 | 829.6 |
|  2 | 985.2 | 985.8 | 986.2 |  3 | 988.4 | 986.5 | 986.2 |
|  4 | 961.2 | 959.7 | 958.6 |  5 | 958.2 | 959.2 | 958.6 |
-----------------------------------------------------
Totals:   5538.0 5531.3 5550.3 H/s
Highest:  5738.4 H/s
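
For anyone who wants per-card numbers out of a report like this, here is a minimal Python sketch. It assumes two threads per GPU in index order (threads 0-1 on GPU0, 2-3 on GPU1, 4-5 on GPU2), which matches the `gpu_threads_conf` layout from post #1 but may not hold for other configs. The names `parse_report` and `per_gpu` are made up for illustration, and rows containing `(na)` are not handled.

```python
REPORT = """\
| ID | 10s |  60s |  15m | ID | 10s |  60s |  15m |
|  0 | 817.8 | 821.8 | 831.1 |  1 | 827.2 | 818.3 | 829.6 |
|  2 | 985.2 | 985.8 | 986.2 |  3 | 988.4 | 986.5 | 986.2 |
|  4 | 961.2 | 959.7 | 958.6 |  5 | 958.2 | 959.2 | 958.6 |
"""


def parse_report(text):
    """Return {thread_id: (avg_10s, avg_60s, avg_15m)} from the report rows."""
    threads = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Each row holds two threads, four cells each: ID, 10s, 60s, 15m.
        for i in range(0, len(cells) - 3, 4):
            if cells[i].isdigit():  # skips the header row ("ID")
                threads[int(cells[i])] = tuple(
                    float(v) for v in cells[i + 1:i + 4]
                )
    return threads


def per_gpu(threads, threads_per_gpu=2):
    """Sum the 10s averages per GPU (assumption: consecutive thread IDs share a GPU)."""
    totals = {}
    for thread_id, (avg_10s, _, _) in threads.items():
        gpu = thread_id // threads_per_gpu
        totals[gpu] = totals.get(gpu, 0.0) + avg_10s
    return {gpu: round(total, 1) for gpu, total in totals.items()}


if __name__ == "__main__":
    for gpu, rate in per_gpu(parse_report(REPORT)).items():
        print(f"GPU{gpu}: {rate} H/s")
```

Under that assumption the 10s column works out to 1645.0 H/s for GPU0, 1973.6 H/s for GPU1 and 1919.4 H/s for GPU2, which adds up to the 5538.0 H/s total in the report.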

There's still more to gain on that first card, though, so I'm gonna test enabling the onboard GPU and using the DP dongle there to see if the Vega cards show up (I didn't try that with the previous motherboard).
If that works and gives me an increased hashrate on GPU0, I'm gonna reinstall Windows and just follow the guide to the letter :)

Big thanks for pointing out this link @Xazax310
member
Activity: 246
Merit: 24
November 29, 2017, 07:39:45 AM
#6
Check out this guide, http://vega.miningguides.com/, which helped me set up and get 1950 H/s with my Vega 56s modded to 64s. The key thing is the Blockchain drivers from Aug 23, and when you restart your computer, go into Device Manager and disable/re-enable every Vega GPU to get compute to actually work.
newbie
Activity: 7
Merit: 0
November 29, 2017, 04:36:48 AM
#5
This is the result using xmr-stak. It seems more stable that way, but there is still one card that acts up.

HASHRATE REPORT - AMD
| ID |    10s |    60s |  15m | ID |   10s |   60s |  15m |
|  0 |  534.4 |  579.1 | (na) |  1 | 416.3 | 458.7 | (na) |
|  2 | 1036.9 | 1035.5 | (na) |  3 | 819.3 | 849.3 | (na) |
|  4 | 1016.8 | 1017.8 | (na) |  5 | 805.9 | 807.8 | (na) |
-----------------------------------------------------
Totals:   4629.6 4748.2 (na) H/s
Highest:  4914.2 H/s
[2017-11-29 10:33:54] : New block detected.

Did another "wipe" of drivers and reflashed the stock Vega 56 bios.
Until now the driver/cards/miner haven't crashed, but we'll see how long it lasts.

Funny thing is if I reboot, the card acting up can suddenly be GPU1 or GPU2.

Worth mentioning: with the previous motherboard I couldn't get the Vega card operational unless I had a "dummy plug" in the DisplayPort output on the Vega, so that's still inserted now on the card connected (via riser card) to the x16 slot closest to the CPU.
newbie
Activity: 7
Merit: 0
November 29, 2017, 03:42:06 AM
#4
cast_xmr gives:
[09:45:12] Shares: 18 Accepted, 0 Errors | Hash Rate Avg: 4426.6 H/s | Avg Search Time: 29.7 sec
[09:45:14] GPU0 | 54°C | Fan 3922 RPM | 706.6 H/s
[09:45:14] GPU1 | 57°C | Fan 3887 RPM | 1726.4 H/s
[09:45:14] GPU2 | 56°C | Fan 3928 RPM | 1728.1 H/s
[09:45:16] GPU1 Found Nonce, submitting...
[09:45:17] Share Accepted -> +1
[09:45:17] Shares: 19 Accepted, 0 Errors | Hash Rate Avg: 4424.1 H/s | Avg Search Time: 28.4 sec
[09:45:17] New job received. Avg Job Time: 44.9 sec
[09:45:17] GPU0 | 55°C | Fan 3921 RPM | 0.0 H/s
[09:45:17] GPU1 | 55°C | Fan 3886 RPM | 1721.4 H/s
[09:45:17] GPU2 | 56°C | Fan 3920 RPM | 1727.2 H/s
[09:45:19] GPU0 | 55°C | Fan 3922 RPM | 737.1 H/s
[09:45:19] GPU1 | 56°C | Fan 3883 RPM | 1722.2 H/s
[09:45:19] GPU2 | 56°C | Fan 3924 RPM | 1728.1 H/s
[09:45:20] New job received. Avg Job Time: 41.8 sec
[09:45:22] GPU0 | 55°C | Fan 3930 RPM | 0.0 H/s
[09:45:22] GPU1 | 55°C | Fan 3887 RPM | 1723.9 H/s
[09:45:22] GPU2 | 55°C | Fan 3919 RPM | 1729.7 H/s
[09:45:24] GPU0 | 55°C | Fan 3926 RPM | 702.5 H/s
[09:45:24] GPU1 | 55°C | Fan 3887 RPM | 1723.1 H/s
[09:45:24] GPU2 | 55°C | Fan 3922 RPM | 1729.7 H/s
[09:45:28] GPU0 | 55°C | Fan 3923 RPM | 0.0 H/s
[09:45:28] GPU1 | 56°C | Fan 3886 RPM | 1723.1 H/s
[09:45:28] GPU2 | 56°C | Fan 3919 RPM | 1727.6 H/s
[09:45:30] GPU0 | 55°C | Fan 3930 RPM | 693.4 H/s
[09:45:30] GPU1 | 55°C | Fan 3891 RPM | 1723.9 H/s
[09:45:30] GPU2 | 56°C | Fan 3919 RPM | 1728.1 H/s

Doesn't look right at all :)
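
To put a number on how often GPU0 drops out, a log like the one above can be tallied with a short Python sketch. The regular expression is an assumption based only on the line layout seen in this excerpt, not on any documented cast_xmr log format, and `summarize` is a made-up helper name. The sample is a subset of the lines above.

```python
import re
from collections import defaultdict

LOG = """\
[09:45:14] GPU0 | 54°C | Fan 3922 RPM | 706.6 H/s
[09:45:14] GPU1 | 57°C | Fan 3887 RPM | 1726.4 H/s
[09:45:14] GPU2 | 56°C | Fan 3928 RPM | 1728.1 H/s
[09:45:16] GPU1 Found Nonce, submitting...
[09:45:17] New job received. Avg Job Time: 44.9 sec
[09:45:17] GPU0 | 55°C | Fan 3921 RPM | 0.0 H/s
[09:45:17] GPU1 | 55°C | Fan 3886 RPM | 1721.4 H/s
[09:45:17] GPU2 | 56°C | Fan 3920 RPM | 1727.2 H/s
"""

# Matches only the per-GPU status lines; the hashrate is the last field.
GPU_LINE = re.compile(r"GPU(\d+) \|.*\| ([\d.]+) H/s")


def summarize(log):
    """Return {gpu: {"samples": n, "zeros": n, "mean": H/s}} for each GPU seen."""
    readings = defaultdict(list)
    for line in log.splitlines():
        match = GPU_LINE.search(line)
        if match:
            readings[int(match.group(1))].append(float(match.group(2)))
    return {
        gpu: {
            "samples": len(values),
            "zeros": values.count(0.0),
            "mean": round(sum(values) / len(values), 1),
        }
        for gpu, values in sorted(readings.items())
    }


if __name__ == "__main__":
    for gpu, stats in summarize(LOG).items():
        print(f"GPU{gpu}: {stats}")
```

In the full log above, GPU0 reports 0.0 H/s in three of its seven readings, each time right after a "New job received" line or a gap in the output, while GPU1 and GPU2 never drop.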
newbie
Activity: 7
Merit: 0
November 28, 2017, 04:30:40 PM
#3
Yeah, tried that as well. I've now also swapped the motherboard for the one from my desktop computer, and it's the same thing.
Tried flashing all 3 cards again with the Vega 64 BIOS, but still the same.
2 cards run at 1750 H/s and the 3rd at 1100-1300 H/s, and during each refresh the last card drops to 0 and then goes back to the same range.
Then after a while the miner stops hashing altogether.

Very strange..
legendary
Activity: 1106
Merit: 1014
November 28, 2017, 10:33:25 AM
#2
Seeing how you get interface glitches with AMD Settings, I'd suggest reinstalling the drivers after a complete cleanup using something like DDU. That is, if you haven't already; I don't see it mentioned in your post.
newbie
Activity: 7
Merit: 0
November 28, 2017, 03:57:35 AM
#1
So, about a month ago I bought a Vega 56 to use in my 2nd mining rig because of its very nice Monero hashrate.
I installed the Vega 64 BIOS on it just to see how it performed with overclocks. It wasn't very stable, so I ended up with these WattMan settings: -30% GPU freq, 950 mem, -20% power.
After some struggle using a riser with it (got ~1300 H/s), I ended up putting it directly in the PCIe x16 slot, and it has been hashing at ~1800 H/s for a good month now.

The time to expand came, so I bought 2 more Vega 56s along with an EVGA G2 1600W PSU to prepare for a future expansion of 3 more Vega 56s.
I got the PSU and cards, flashed the Vega 64 BIOS on both new 56s, configured HBCC, raised the Windows virtual memory to 60GB, and set the same WattMan settings as mentioned above.
Then I fired up xmr-stak-amd-notls with these settings:
Code:
"gpu_threads_conf" : [
    { "index" : 0, "intensity" : 2016, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 0, "intensity" : 1600, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 1, "intensity" : 2016, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 1, "intensity" : 1600, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 2, "intensity" : 2016, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 2, "intensity" : 1600, "worksize" : 8, "affine_to_cpu" : false },
],
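
As a quick sanity check on a fragment like this, the sketch below loads it in Python and lists the intensities per card. It assumes the fragment can be turned into strict JSON simply by dropping trailing commas and wrapping it in braces, which works for the snippet above but is not a general parser for xmr-stak config files. `load_fragment` and `intensity_per_gpu` are made-up names.

```python
import json
import re

FRAGMENT = """
"gpu_threads_conf" : [
    { "index" : 0, "intensity" : 2016, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 0, "intensity" : 1600, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 1, "intensity" : 2016, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 1, "intensity" : 1600, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 2, "intensity" : 2016, "worksize" : 8, "affine_to_cpu" : false },
    { "index" : 2, "intensity" : 1600, "worksize" : 8, "affine_to_cpu" : false },
],
"""


def load_fragment(text):
    """Parse a brace-less config fragment with trailing commas (assumed format)."""
    body = text.strip().rstrip(",")
    # Remove any comma that directly precedes a closing bracket or brace.
    body = re.sub(r",\s*([\]}])", r"\1", body)
    return json.loads("{" + body + "}")


def intensity_per_gpu(conf):
    """Return {gpu_index: [intensity of each thread on that GPU]}."""
    per_gpu = {}
    for thread in conf["gpu_threads_conf"]:
        per_gpu.setdefault(thread["index"], []).append(thread["intensity"])
    return per_gpu


if __name__ == "__main__":
    for gpu, intensities in intensity_per_gpu(load_fragment(FRAGMENT)).items():
        print(f"GPU{gpu}: threads={len(intensities)} "
              f"intensities={intensities} sum={sum(intensities)}")
```

For the settings above this shows two threads per card with a combined intensity of 3616 on each of the three cards.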

The first thing I noticed is that 2 of the cards jump right to just below 1800 H/s while the 3rd sits down at ~1300 H/s, which I'm assuming is the card connected to the PCIe x16 slot mentioned above.
I thought this was a power issue with the previous PSU, but now I guess it points more to the motherboard I had lying around and decided to use until I reach 3 GPUs.

However, the real issue here is stability. After a couple of hours (sometimes just 15 minutes) of hashing, it just stops. The miner is seemingly alive; it just doesn't do anything.
If I press Ctrl-C it asks if I want to end the batch job, which I guess indicates the process itself isn't hung.
I tried different miners with the same result, lastly the cast-xmr miner.
I also tried flashing stock Vega 56 bios, running with no OC settings at all, always the same issue.

I'm leaning towards doing an early switch of the motherboard (it has just 1 PCIe x16 and 2 PCIe x1 slots, so it's not exactly future-ready for the rig), but I'd like others' input on other things to check.
Thanks! :)

Extra info:
OS: Windows 10 Pro
RAM: 16GB
SSD: 120GB
Driver: August 23 blockchain driver

Edit: It just happened again and I tried to check AMD Settings, but now that window is just blurred, and even if I stop the program and restart it, it continues to be just a blurred window on the screen.
Event viewer says:
Message 1:
Code:
Fault bucket , type 0
Event Name: LiveKernelEvent
Response: Not available
Cab Id: 0
 
Problem signature:
P1: 141
P2: ffffe501a85ef010
P3: fffff809a6d4f7d8
P4: 0
P5: 190c
P6: 10_0_16299
P7: 0_0
P8: 256_1
P9:
P10:
 
Attached files:
\\?\C:\Windows\LiveKernelReports\WATCHDOG\WATCHDOG-20171128-0946.dmp
\\?\C:\Windows\TEMP\WER-2943937-0.sysdata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERF45C.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERF46C.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERF46D.tmp.txt
 
These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Kernel_141_56df233ee0d876b17be48e5447a258e0bd4b64ed_00000000_cab_1938f47b
 
Analysis symbol:
Rechecking for solution: 0
Report Id: 4f0ff9fb-15c1-4fa4-afee-73c70644f2f8
Report Status: 4
Hashed bucket:

Message 2:
Code:
Fault bucket LKD_0x141_Tdr:6_IMAGE_atikmpag.sys, type 0
Event Name: LiveKernelEvent
Response: Not available
Cab Id: 091d0664-c644-4cbc-bbf7-02dcf25d9a03
 
Problem signature:
P1: 141
P2: ffffe501a85ef010
P3: fffff809a6d4f7d8
P4: 0
P5: 190c
P6: 10_0_16299
P7: 0_0
P8: 256_1
P9:
P10:
 
Attached files:
\\?\C:\Windows\LiveKernelReports\WATCHDOG\WATCHDOG-20171128-0946.dmp
\\?\C:\Windows\TEMP\WER-2943937-0.sysdata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERF45C.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERF46C.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERF46D.tmp.txt
\\?\C:\Windows\Temp\WER1718.tmp.WERDataCollectionStatus.txt
 
These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportArchive\Kernel_141_56df233ee0d876b17be48e5447a258e0bd4b64ed_00000000_cab_1b911aa1
 
Analysis symbol:
Rechecking for solution: 0
Report Id: 4f0ff9fb-15c1-4fa4-afee-73c70644f2f8
Report Status: 268435456
Hashed bucket:
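
The fault bucket string in Message 2 can be picked apart with a few lines of Python. The layout assumed here (`LKD_<code>_<detail>_IMAGE_<driver>`) is inferred from this one string only, and `parse_bucket` is a made-up helper. The name attached to code 0x141 is Microsoft's bug check name for that value, and atikmpag.sys is part of the AMD display driver.

```python
import re

BUCKET = "LKD_0x141_Tdr:6_IMAGE_atikmpag.sys"

# Bug check name for the code seen in this report (P1: 141).
KNOWN_CODES = {0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED"}


def parse_bucket(bucket):
    """Split a live-kernel-dump bucket string into code, detail and image."""
    match = re.fullmatch(r"LKD_(0x[0-9A-Fa-f]+)_(.+)_IMAGE_(.+)", bucket)
    if match is None:
        return None  # layout differs from the one assumed here
    code, detail, image = match.groups()
    value = int(code, 16)
    return {
        "code": value,
        "name": KNOWN_CODES.get(value, "unknown"),
        "detail": detail,
        "image": image,
    }


if __name__ == "__main__":
    print(parse_bucket(BUCKET))
```

Read this way, both messages describe the same event: a GPU engine timeout (TDR) reported against the AMD display driver, which fits the symptom of the miner staying alive while the cards stop doing work.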