Custom RAM Timings for GPU's with GDDR5 - DOWNLOAD LINKS - UPDATED - page 42.

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: nerdralph on March 22, 2017, 08:30:00 AM

Quote from: Wolf0 on March 21, 2017, 09:32:06 PM

Quote from: nerdralph on March 21, 2017, 07:43:29 PM

Quote from: Wolf0 on March 21, 2017, 05:22:02 PM

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

I've seen you mention loosening the CAS timings. I tried bumping up tCL by 1, but still get crashes on the K4G4 at 2100. So is it just loosening tCL that usually does the trick, or something else too?

You have to loosen it on the DRAM, too - you're loosening the tCL on the ASIC, but not the DRAM, throwing them off.

Interesting. So the memory controller (or driver) isn't smart enough to take tCL from SEQ_CAS_TIMING and use same value for MR0 Cas Latency?
edit: I don't even understand how this would work at all. If the controller is expecting the data 22 cycles after the read, but MR0 is programmed for 21, then wouldn't that cause a 100% error rate?

It actually seems to have a tolerance of one value up or down before it stops working entirely.

nerdralph

sr. member

Activity: 588

Merit: 251

For the strap mod wannabes, I've made a basic strap mod utility. It just sets RRD to 5, ACTRD to 16 (0x10), and zeros FAW.
https://github.com/nerdralph/strapread/blob/master/strapmod.py

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: Wolf0 on March 24, 2017, 08:52:07 AM

Quote from: nerdralph on March 24, 2017, 08:44:58 AM

Agreed, though I find 1625 to be about average for what I can OC the RAM on Tonga cards. The best I had was a MSI R9 380 gaming that had heat spreaders on Samsung RAM, which was stable at 1700Mhz. The worst is a MSI Armor2X with cooling for only 3 of the 8 Elpida RAM chips, and is not stable over 1550.

Bump VDDCI, drop temps any way you can?

I re-routed the fan power cable to try to improve airflow, but it didn't really help. A copper heat spreader over the RAM would likely help, but I've just focused on getting new cards that have good RAM cooling. I took a gamble with the Asus Strix Rx470 since I couldn't find any teardowns before I bought it. It turns out its RAM cooling isn't any better than the MSI Armor2X's, and clocking the Hynix RAM over 1875 is difficult. My 2nd Rx470 is the Sapphire ref version (I think you mentioned you have one as well), which is rock solid at 2100.

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: nerdralph on March 22, 2017, 07:28:22 AM

Quote from: Wolf0 on March 22, 2017, 07:04:36 AM

How about going in the other direction (NSFW): https://ottrbutt.com/tmp/ethwolf-03212017.jpg ?

See GPU 4.

Is that running AMDGPU-Pro 16.60 on kernel 4.10?

It is indeed.

deadsix

hero member

Activity: 751

Merit: 517

Fail to plan, and you plan to fail.

Quote from: Wolf0 on March 24, 2017, 06:29:01 AM

I wrote a program to do it: https://github.com/OhGodACompany/OhGodATool -- OhGodAnOffset

Quote from: Eliovp on March 24, 2017, 07:13:21 AM

Check out Overclock.net forum.
There are some threads there that'll get you a long way to be able to add global offset.
You basically have to find the voltage table (you can find this easily with atombiosreader)
Open your rom in a hexeditor, find that table, (For a lot of Sapphire models, the VDDC offset is already there) 00 8D 00 ** 00

And the masters come to the rescue. Thank you guys for all that you do for the community

Give a man a fish, and you feed him for a day, teach him to fish, and he's fed for life.

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: Wolf0 on March 23, 2017, 08:23:35 PM

Quote from: nerdralph on March 23, 2017, 08:17:15 PM

Glad to hear you're working on a new kernel tuned for Polaris, as you're still better than me when it comes to writing optimized kernels. Tonga seems to have a limit of 92-93% of the peak memory bandwidth for eth mining, so for a R9 380 clocked at 840/1500, that's just over 22Mh. For people that haven't read all my old posts, it's physically impossible to get more than 24Mh (192GB/s memory bandwidth / 8KB ethash reads).

For a Rx 480 clocked at 2Ghz, 92% efficiency would be 29.5Mh, and that should be doable with a core clock of 1100. However I don't recall seeing anyone getting over 29Mh without going over 2Ghz for the memory clock.

Better to bring the mem clock up a lot more, after loosening the timings a bit. And you can get more than 24MH/s - just obviously not at 1500Mhz memclk.

Agreed, though I find 1625 to be about average for what I can OC the RAM on Tonga cards. The best I had was a MSI R9 380 gaming that had heat spreaders on Samsung RAM, which was stable at 1700Mhz. The worst is a MSI Armor2X with cooling for only 3 of the 8 Elpida RAM chips, and is not stable over 1550.

Eliovp

legendary

Activity: 1050

Merit: 1294

Huh?

Quote from: deadsix on March 24, 2017, 06:13:18 AM

Quote from: jstefanop on March 23, 2017, 06:17:20 PM

Yea I got a batch of Samsung based reference saph 470s and manually added the offsets, but for some reason the gpu would not post with the offset added. Weird cause offsets on Hynix/micron cards worked fine. Never had time to investigate this.

What location were you adding the offsets at? Because I don't think they can be added at A992 for cards that don't already have a programmed offset built in.
Maybe the learned folk on this thread could shed some light on how to go about adding an offset location to a bios that doesn't have one.

Check out Overclock.net forum.

There are some threads there that'll get you a long way to be able to add global offset.

You basically have to find the voltage table (you can find this easily with atombiosreader)
Open your rom in a hexeditor, find that table, (For a lot of Sapphire models, the VDDC offset is already there) 00 8D 00 ** 00

If you want to add it you'll need to modify the table by adding those values in that table.
You then have to update the length of the VoltageObjectInfo (VOI) table & also update the length of the i2c table in VOI table <-- (something a lot of people tend to forget, i remember Wolf having issues with this as well until i told him that he had to change this value too (I2C table)).

After that you have to remove the amount of bytes you added in the legacy ROM section (or the size of the rom is incorrect).

That's not all, after doing that you have to modify the master table as well (there's a calculator for this, also on overclock.net).

It's not "that" difficult but the risk of bricking your card is A LOT higher here than screwing around with timings.

Greetings!
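
For anyone who wants to check whether their ROM already carries that byte pattern before opening a hex editor, here is a rough read-only sketch. It only searches for the `00 8D 00 ** 00` sequence quoted above and reports where the wildcard byte sits. The function name and the synthetic test data are made up for illustration, and a match is only a candidate: the same five bytes could appear elsewhere in a ROM, so confirm against the voltage table location reported by atombiosreader.

```python
import re

# Pattern quoted in the post: 00 8D 00 ** 00, where ** is the offset byte.
# DOTALL lets "." match any byte value, including 0x0A.
OFFSET_PATTERN = re.compile(rb"\x00\x8d\x00(.)\x00", re.DOTALL)


def find_offset_candidates(rom: bytes):
    """Return (position_of_wildcard_byte, raw_value) for every pattern match."""
    return [(m.start(1), m.group(1)[0]) for m in OFFSET_PATTERN.finditer(rom)]


# Synthetic bytes for illustration only; a real ROM would be loaded with
# open(path, "rb").read(). Nothing is written back.
fake_rom = bytes(16) + bytes.fromhex("008D000400") + bytes(8)

for position, raw in find_offset_candidates(fake_rom):
    print(f"candidate offset byte 0x{raw:02X} at 0x{position:X}")
```

This does none of the table-length or master-table fixes described above; it only helps locate an existing offset.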

deadsix

hero member

Activity: 751

Merit: 517

Fail to plan, and you plan to fail.

Quote from: jstefanop on March 23, 2017, 06:17:20 PM

Yea I got a batch of Samsung based reference saph 470s and manually added the offsets, but for some reason the gpu would not post with the offset added. Weird cause offsets on Hynix/micron cards worked fine. Never had time to investigate this.

What location were you adding the offsets at? Because I don't think they can be added at A992 for cards that don't already have a programmed offset built in.
Maybe the learned folk on this thread could shed some light on how to go about adding an offset location to a bios that doesn't have one.

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

How about going in the other direction (NSFW): https://ottrbutt.com/tmp/ethwolf-03212017.jpg ?

See GPU 4.

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: Wolf0 on March 23, 2017, 05:41:37 PM

You'll be happy to know your comment about a better Eth implementation (a long time ago) not increasing speed if not core-limited was correct. I finally got around to writing something really nice for Polaris (still tweaking it) - and while at low memclocks it does almost nothing, when you bump memclk and drop core for undervolting, it's faster.

Glad to hear you're working on a new kernel tuned for Polaris, as you're still better than me when it comes to writing optimized kernels. Tonga seems to have a limit of 92-93% of the peak memory bandwidth for eth mining, so for a R9 380 clocked at 840/1500, that's just over 22Mh. For people that haven't read all my old posts, it's physically impossible to get more than 24Mh (192GB/s memory bandwidth / 8KB ethash reads).

For a Rx 480 clocked at 2Ghz, 92% efficiency would be 29.5Mh, and that should be doable with a core clock of 1100. However I don't recall seeing anyone getting over 29Mh without going over 2Ghz for the memory clock.
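
The bandwidth arithmetic in this post can be sketched as below. The assumptions are mine, not from the post: a 256-bit bus, four transfers per memory clock for GDDR5, and "8KB" taken as 8000 bytes, which is what reproduces the 24Mh figure. Using 8192 bytes per hash (64 accesses of 128 bytes) gives a slightly lower ceiling of about 23.4Mh. The function names are hypothetical.

```python
def peak_bandwidth_bytes(mem_clock_mhz, bus_width_bits=256, transfers_per_clock=4):
    """Peak memory bandwidth in bytes/s (assumed: GDDR5, 4 transfers per clock)."""
    return mem_clock_mhz * 1e6 * transfers_per_clock * bus_width_bits / 8


def ethash_ceiling_mh(mem_clock_mhz, efficiency=1.0, bytes_per_hash=8000):
    """Upper bound on Ethash rate in MH/s if memory bandwidth is the only limit."""
    return peak_bandwidth_bytes(mem_clock_mhz) * efficiency / bytes_per_hash / 1e6


# R9 380 at 1500 memory clock: theoretical ceiling, then 92% efficiency.
print(f"{ethash_ceiling_mh(1500):.2f}")
print(f"{ethash_ceiling_mh(1500, efficiency=0.92):.2f}")
# Rx 480 at 2000 memory clock with 92% efficiency.
print(f"{ethash_ceiling_mh(2000, efficiency=0.92):.2f}")
```

The core clock is left out on purpose, since the point of the estimate is the memory-bound limit.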

jstefanop

legendary

Activity: 2188

Merit: 1401

Quote from: Eliovp on March 22, 2017, 04:37:24 PM

Quote from: deadsix on March 22, 2017, 04:09:41 PM

Kinda OT since this is a RAM Timings thread, but I'll ask this anyway:

For Sapphire RX 470 4GB (Ref) cards, the GPU core voltage offset is at A992, correct?
Now for the cards with Hynix memory, I find the default 04, which is 4 x 6.25 or +25mV, which seems legit.
But for the cards with Samsung memory, the value at A992 is FF, which is -1 x 6.25 or -6.25mV, so something looks off.
Do different memory versions of the card have different default offset values? Or is the location different?

Any help/guidance would be appreciated.

Seems like the samsung one doesn't have global offset.

Stock roms with global offset have either '03' +18.75mV value or '04' +25mV as VDDC offset.
I've never seen something else, or rather negative offset.. and i've opened up a lot of them ;-) 'But correct me if i'm wrong..'

Greetings.

Yea I got a batch of Samsung based reference saph 470s and manually added the offsets, but for some reason the gpu would not post with the offset added. Weird cause offsets on Hynix/micron cards worked fine. Never had time to investigate this.
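
The offset arithmetic quoted above (6.25mV per step, with FF read as -1) can be sketched like this. Treating the byte as a signed 8-bit two's complement value is an assumption that matches how the quoted post interprets FF; the function name is hypothetical.

```python
STEP_MV = 6.25  # millivolts per step, as stated in the quoted posts


def decode_offset_mv(raw: int) -> float:
    """Interpret one byte as a signed 8-bit step count and convert to mV."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("raw must be a single byte (0..255)")
    signed = raw - 256 if raw >= 0x80 else raw
    return signed * STEP_MV


for raw in (0x03, 0x04, 0xFF):
    print(f"0x{raw:02X} -> {decode_offset_mv(raw):+.2f} mV")
```

A value of FF decoding to -6.25mV fits the suggestion in the quote that the Samsung ROM simply has no global offset at that location, so the byte found there probably belongs to something else.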

phobosq

member

Activity: 66

Merit: 10

Thanks, it helped me identify a bug in my strap encoder, 10th word was zeroed on write

it will be better now, I already improved and I'm doing ~340 sol/s @1300/2000.

laik2

sr. member

Activity: 652

Merit: 266

Quote from: nerdralph on March 23, 2017, 01:44:16 PM

Quote from: phobosq on March 23, 2017, 01:08:14 PM

I'm trying to make some experiments as well, but so far all failed. Every time I get to desktop after flashing a BIOS with custom straps, I get "Thread stuck in device driver" BSOD in 10-30 secs from OS load. I have Sapphire Nitro RX 480 4 GB with Samsungs, OS is Windows 10 x64. I tried 1625 strap with TRRD 5, 1750 strap with TRRD 5 and 1750 with TFAW/T32AW = 0, all with the same result. I suspect that injecting custom straps into BIOS using Polaris BIOS Editor might be the cause here. Anyone experienced similar problems?

I've been using Polaris BIOS editor, copy the file to my Linux box, and use atiflash. Works like a charm.

WL = 3, CL = 6, TM = 0, WR = 7, BA0 = 0, BA1 = 0, BA2 = 0, BA3 = 0
Here's the main reason

There are others but first u need to fix this.
# Fixed only MR0 & MR8 #
555000000000000022CC1C00CE595B40D0570C152DCB2409004007000B0414207A8900A003000000170F2D35922A3217

# 00300000 # This part is very important and unless you plan to lower your CLmrs below 20 and 21 for WR you shouldn't touch it.
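
To see exactly what changed between two straps, a word-by-word comparison helps. The sketch below splits a 96-character strap into twelve 4-byte words and lists the ones that differ, using the first 2000 strap phobosq posted and the fixed one above. The function names are hypothetical, words are numbered from 1, and no meaning is assigned to any word here.

```python
def strap_words(strap_hex: str):
    """Split a strap into twelve 8-hex-digit words, ignoring any whitespace."""
    cleaned = "".join(strap_hex.split()).upper()
    raw = bytes.fromhex(cleaned)
    if len(raw) != 48:
        raise ValueError(f"expected 48 bytes, got {len(raw)}")
    return [cleaned[i:i + 8] for i in range(0, 96, 8)]


def diff_straps(a: str, b: str):
    """Return (word_number, word_in_a, word_in_b) for each differing word."""
    pairs = zip(strap_words(a), strap_words(b))
    return [(n, wa, wb) for n, (wa, wb) in enumerate(pairs, start=1) if wa != wb]


# Straps from the posts, written as 8-digit words for readability.
PHOBOSQ_2000 = ("55500000" "00000000" "22CC1C00" "CE595B40" "D0570C15" "2DCB2409"
                "00400700" "0B031420" "7A8900A0" "00000000" "170F2D35" "922A3217")
FIXED = ("55500000" "00000000" "22CC1C00" "CE595B40" "D0570C15" "2DCB2409"
         "00400700" "0B041420" "7A8900A0" "03000000" "170F2D35" "922A3217")

for number, before, after in diff_straps(PHOBOSQ_2000, FIXED):
    print(f"word {number}: {before} -> {after}")
```

Only two words differ, which lines up with the "Fixed only MR0 & MR8" note and with the 10th word having been zeroed on write.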

phobosq

member

Activity: 66

Merit: 10

I flashed with yours from three posts ago and it works.... so either my decoding/encoding fails or I don't really know how it works. Probably the latter.
Maybe I share some of my creations and you can tell me if it looks OK

for Samsung 4 GB at 2000:
555000000000000022CC1C00CE595B40D0570C152DCB2409004007000B0314207A8900A000000000170F2D35922A3217 or
555000000000000022CC1C00CE595B40D0570F1531CB2409004007000B0314207A8900A000000000170F2D35922A3217

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: phobosq on March 23, 2017, 01:08:14 PM

I'm trying to make some experiments as well, but so far all failed. Every time I get to desktop after flashing a BIOS with custom straps, I get "Thread stuck in device driver" BSOD in 10-30 secs from OS load. I have Sapphire Nitro RX 480 4 GB with Samsungs, OS is Windows 10 x64. I tried 1625 strap with TRRD 5, 1750 strap with TRRD 5 and 1750 with TFAW/T32AW = 0, all with the same result. I suspect that injecting custom straps into BIOS using Polaris BIOS Editor might be the cause here. Anyone experienced similar problems?

I've been using Polaris BIOS editor, copy the file to my Linux box, and use atiflash. Works like a charm.

phobosq

member

Activity: 66

Merit: 10

I'm trying to make some experiments as well, but so far all failed. Every time I get to desktop after flashing a BIOS with custom straps, I get "Thread stuck in device driver" BSOD in 10-30 secs from OS load. I have Sapphire Nitro RX 480 4 GB with Samsungs, OS is Windows 10 x64. I tried 1625 strap with TRRD 5, 1750 strap with TRRD 5 and 1750 with TFAW/T32AW = 0, all with the same result. I suspect that injecting custom straps into BIOS using Polaris BIOS Editor might be the cause here. Anyone experienced similar problems?

laik2

sr. member

Activity: 652

Merit: 266

Quote from: nerdralph on March 23, 2017, 11:12:22 AM

Quote from: laik2 on March 23, 2017, 01:37:13 AM

Hint: ARB_DRAM_TIMING.ACTRD/ACTWR can go lower

I think I've figured out ACTRD. It's the delay between successive READ commands to the same row (page). It doesn't seem to be mentioned anywhere in the Hynix datasheet, nor in any of the GDDR product briefs I've looked at. I had incorrectly assumed that back-to-back reads from the same row were possible. With ETH mining (on AMD cards) each DAG access results in 2x 32-byte reads from 2 GDDR chips (128 bytes total).
The delay between ACTIVATE and READ is RCDR, and the default Samsung straps use ACTRD = RCDR + 1. Knowing the way DRAM works, ACTRD should be much lower than RCDR, but by how much? A paper by nVIDIA suggests ~50%. https://www.cs.utah.edu/~dkopta/papers/DRAM-SIGGRAPH14_post.pdf

I tried reducing ACTRD to 16, and while it seems stable so far, the speed improvement is tiny (less than 0.5%). So is this Samsung strap as good as it gets for Eth mining?
777000000000000022CC1C00CE615C45C0571016B30CD50900400700140514207A8900A00300000010103139962C3617

In this version, lowering ACTRD for eth won't give you much (it may even get worse... just keep an eye on the gap between ACTRD and ACTWR).
There are a few more things that can gain you additional hash, but not much. The best I did for my MSI Armor 4G 470 was 30.2 at 1148/2075, but it was unstable due to high mem temp (no heatsinks on the modules...). You might try lowering tRCDR, but based on my experience it will throw more errors. Lowering tRC a few bits "might" improve things by 0.1%, which is irrelevant (why risk stability to gain such a small improvement). Try setting WR (write recovery) 1 bit after CLmrs... or don't... Testing, testing, testing.

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: laik2 on March 23, 2017, 01:37:13 AM

Hint: ARB_DRAM_TIMING.ACTRD/ACTWR can go lower

I think I've figured out ACTRD. It's the delay between successive READ commands to the same row (page). It doesn't seem to be mentioned anywhere in the Hynix datasheet, nor in any of the GDDR product briefs I've looked at. I had incorrectly assumed that back-to-back reads from the same row were possible. With ETH mining (on AMD cards) each DAG access results in 2x 32-byte reads from 2 GDDR chips (128 bytes total).
The delay between ACTIVATE and READ is RCDR, and the default Samsung straps use ACTRD = RCDR + 1. Knowing the way DRAM works, ACTRD should be much lower than RCDR, but by how much? A paper by nVIDIA suggests ~50%. https://www.cs.utah.edu/~dkopta/papers/DRAM-SIGGRAPH14_post.pdf

I tried reducing ACTRD to 16, and while it seems stable so far, the speed improvement is tiny (less than 0.5%). So is this Samsung strap as good as it gets for Eth mining?
777000000000000022CC1C00CE615C45C0571016B30CD50900400700140514207A8900A00300000010103139962C3617
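
For readers who want to check the ACTRD value in a strap without a full decoder, here is a minimal sketch. It assumes the strap is 48 bytes read as twelve little-endian 32-bit words, that ARB_DRAM_TIMING is the 11th word, and that ACTRD and ACTWR are its two lowest bytes. That layout is an assumption, not something taken from a datasheet; it is at least consistent with the strap above, where the ACTRD position reads 0x10 (16). The function name is hypothetical.

```python
import struct


def arb_dram_timing(strap_hex: str):
    """Read ACTRD/ACTWR from a strap under the assumed layout described above."""
    raw = bytes.fromhex("".join(strap_hex.split()))  # tolerate pasted whitespace
    if len(raw) != 48:
        raise ValueError(f"expected 48 bytes, got {len(raw)}")
    words = struct.unpack("<12I", raw)   # twelve little-endian dwords
    arb = words[10]                      # assumed: 11th word is ARB_DRAM_TIMING
    return {"ACTRD": arb & 0xFF, "ACTWR": (arb >> 8) & 0xFF}


# The strap from this post, written as 8-digit words for readability.
STRAP = ("77700000" "00000000" "22CC1C00" "CE615C45" "C0571016" "B30CD509"
         "00400700" "14051420" "7A8900A0" "03000000" "10103139" "962C3617")

print(arb_dram_timing(STRAP))
```

Only the byte positions are read here; RCDR and the other SEQ_* fields are packed at bit level and are left to a proper decoder.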

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: nerdralph on March 21, 2017, 07:43:29 PM

Quote from: Wolf0 on March 21, 2017, 05:22:02 PM

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

I've seen you mention loosening the CAS timings. I tried bumping up tCL by 1, but still get crashes on the K4G4 at 2100. So is it just loosening tCL that usually does the trick, or something else too?

You have to loosen it on the DRAM, too - you're loosening the tCL on the ASIC, but not the DRAM, throwing them off.

laik2

sr. member

Activity: 652

Merit: 266

Quote from: Truthchanter on March 23, 2017, 10:50:51 AM

Quote from: Eliovp on March 23, 2017, 10:30:44 AM

Quote from: Truthchanter on March 23, 2017, 10:24:34 AM

Quote from: laik2 on March 22, 2017, 05:29:27 PM

Quote from: nerdralph on March 22, 2017, 05:21:51 PM

Quote from: Truthchanter on March 22, 2017, 04:40:17 PM

Are there any publicly available custom memory strap timings for the rx 400 series? (samsung, elpida, or hynix)

Here's the Samsung strap I'm currently working on:
777000000000000022CC1C0010625C49D0571016B50BD50900400700140514207A8900A0030000000191131399D2C3617

It's the 1750 strap with RRD=5, FAW&32AW=0. It's stable at 2100 on my Sapphire Rx470. The previous custom strap I tried was based on the 1625 strap, and I would start getting a lot of errors over 2000. I just started working on it today, so there's more tweaking to do (like trying a lower tRC for RAS).

Nicely done

Let's see the final results.

Small adjustments to your timing based on my experience: 777000000000000022CC1C00CE615C45C0571016B30CD50900400700140514207A8900A0030000000151031399D2C3617

Thanks! Nice improvement over the regular 1750 strap. Now the question is... will this 1750 samsung strap work with 480s with samsung (and then both 4gb and 8gb?)

If you have one of the first 4G samsung batches, it will work and even improve even more. (because you can unlock those to 8G)
If you have a newer batch of those 4G samsung's, it will most probably run but won't run stable.

Greetings

Interesting, thanks! I have the old ones and actually used one of your old public bios on em

The idea of custom timings is to replace only certain straps, not the whole vbios. I always use the OEM vbios the GPU arrives with (unless the vendor has pushed a new one, but that's very rare).
As Wolf confirmed in a PM, Hynix doesn't have a defined bit for CLmrs above 20. It's not actually reverse engineering; mostly you decode the highest strap and assume that CLmrs and tCL are at their highest bit.
