
Topic: Custom RAM Timings for GPU's with GDDR5 - DOWNLOAD LINKS - UPDATED - page 42. (Read 155485 times)

member
Activity: 81
Merit: 1002
It was only the wind.
Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

I've seen you mention loosening the CAS timings.  I tried bumping up tCL by 1, but still get crashes on the K4G4 at 2100.  So is it just loosening tCL that usually does the trick, or something else too?


You have to loosen it on the DRAM, too - you're loosening the tCL on the ASIC, but not the DRAM, throwing them off.

Interesting.  So the memory controller (or driver) isn't smart enough to take tCL from SEQ_CAS_TIMING and use the same value for the MR0 CAS latency?
edit: I don't even understand how this would work at all.  If the controller is expecting the data 22 cycles after the read, but MR0 is programmed for 21, then wouldn't that cause a 100% error rate?


It actually seems to have a tolerance of one value up or down before it stops working entirely.
sr. member
Activity: 588
Merit: 251
For the strap mod wannabes, I've made a basic strap mod utility.  It just sets RRD to 5, ACTRD to 16 (0x10), and zeros FAW.
https://github.com/nerdralph/strapread/blob/master/strapmod.py
sr. member
Activity: 588
Merit: 251
Agreed, though I find 1625 to be about average for what I can OC the RAM to on Tonga cards.  The best I had was an MSI R9 380 Gaming that had heat spreaders on Samsung RAM, which was stable at 1700MHz.  The worst is an MSI Armor2X with cooling for only 3 of the 8 Elpida RAM chips, which is not stable over 1550.


Bump VDDCI, drop temps any way you can?

I re-routed the fan power cable to try to improve airflow, but it didn't really help.  A copper heat spreader over the RAM would likely help, but I've just focused on getting new cards that have good RAM cooling.  I took a gamble with the Asus Strix Rx470 since I couldn't find any teardowns before I bought it.  It turns out its RAM cooling isn't any better than the MSI Armor2X, and clocking the Hynix RAM over 1875 is difficult.  My 2nd Rx470 is the Sapphire ref version (I think you mentioned you have one as well), which is rock solid at 2100.
member
Activity: 81
Merit: 1002
It was only the wind.
How about going in the other direction (NSFW): https://ottrbutt.com/tmp/ethwolf-03212017.jpg ?

See GPU 4.

Is that running AMDGPU-Pro 16.60 on kernel 4.10?


It is indeed.
hero member
Activity: 751
Merit: 517
Fail to plan, and you plan to fail.
I wrote a program to do it: https://github.com/OhGodACompany/OhGodATool -- OhGodAnOffset

Check out Overclock.net forum.
There are some threads there that'll get you a long way to be able to add global offset.
You basically have to find the voltage table (you can find this easily with atombiosreader)
Open your ROM in a hex editor and find that table (for a lot of Sapphire models, the VDDC offset is already there): 00 8D 00 ** 00

And the masters come to the rescue. Thank you guys for all that you do for the community :) Give a man a fish, and you feed him for a day; teach him to fish, and he's fed for life.
sr. member
Activity: 588
Merit: 251
Glad to hear you're working on a new kernel tuned for Polaris, as you're still better than me when it comes to writing optimized kernels.  Tonga seems to have a limit of 92-93% of the peak memory bandwidth for eth mining, so for a R9 380 clocked at 840/1500, that's just over 22Mh.  For people that haven't read all my old posts, it's physically impossible to get more than 24Mh (192GB/s memory bandwidth / 8KB ethash reads).

For a Rx 480 clocked at 2Ghz, 92% efficiency would be 29.5Mh, and that should be doable with a core clock of 1100.  However I don't recall seeing anyone getting over 29Mh without going over 2Ghz for the memory clock.


Better to bring the mem clock up a lot more, after loosening the timings a bit. And you can get more than 24MH/s - just obviously not at 1500Mhz memclk.

Agreed, though I find 1625 to be about average for what I can OC the RAM to on Tonga cards.  The best I had was an MSI R9 380 Gaming that had heat spreaders on Samsung RAM, which was stable at 1700MHz.  The worst is an MSI Armor2X with cooling for only 3 of the 8 Elpida RAM chips, which is not stable over 1550.
legendary
Activity: 1050
Merit: 1293
Huh?
Yeah, I got a batch of Samsung-based reference Sapphire 470s and manually added the offsets, but for some reason the GPU would not POST with the offset added. Weird, because offsets on Hynix/Micron cards worked fine. Never had time to investigate this.

What location were you adding the offsets at? Because I don't think they can be added at A992 for cards that don't already have a programmed offset built in.
Maybe the learned folk on this thread could shed some light on how to go about adding an offset location to a bios that doesn't have one.

Check out Overclock.net forum.

There are some threads there that'll get you a long way to be able to add global offset.

You basically have to find the voltage table (you can find this easily with atombiosreader)
Open your ROM in a hex editor and find that table (for a lot of Sapphire models, the VDDC offset is already there): 00 8D 00 ** 00

If you want to add it you'll need to modify the table by adding those values in that table.
You then have to update the length of the VoltageObjectInfo (VOI) table and also the length of the I2C table inside the VOI table. A lot of people tend to forget that second one; I remember Wolf having issues with this as well until I told him he had to change the I2C table value too.

After that you have to remove the same number of bytes you added from the legacy ROM section (otherwise the size of the ROM is incorrect).

That's not all, after doing that you have to modify the master table as well (there's a calculator for this, also on overclock.net).

It's not "that" difficult but the risk of bricking your card is A LOT higher here than screwing around with timings.

Greetings!
hero member
Activity: 751
Merit: 517
Fail to plan, and you plan to fail.
Yeah, I got a batch of Samsung-based reference Sapphire 470s and manually added the offsets, but for some reason the GPU would not POST with the offset added. Weird, because offsets on Hynix/Micron cards worked fine. Never had time to investigate this.

What location were you adding the offsets at? Because I don't think they can be added at A992 for cards that don't already have a programmed offset built in.
Maybe the learned folk on this thread could shed some light on how to go about adding an offset location to a bios that doesn't have one.
member
Activity: 81
Merit: 1002
It was only the wind.
How about going in the other direction (NSFW): https://ottrbutt.com/tmp/ethwolf-03212017.jpg ?

See GPU 4.
sr. member
Activity: 588
Merit: 251
You'll be happy to know your comment about a better Eth implementation (a long time ago) not increasing speed if not core-limited was correct. I finally got around to writing something really nice for Polaris (still tweaking it) - and while at low memclocks it does almost nothing, when you bump memclk and drop core for undervolting, it's faster.

Glad to hear you're working on a new kernel tuned for Polaris, as you're still better than me when it comes to writing optimized kernels.  Tonga seems to have a limit of 92-93% of the peak memory bandwidth for eth mining, so for a R9 380 clocked at 840/1500, that's just over 22Mh.  For people that haven't read all my old posts, it's physically impossible to get more than 24Mh (192GB/s memory bandwidth / 8KB ethash reads).

For a Rx 480 clocked at 2Ghz, 92% efficiency would be 29.5Mh, and that should be doable with a core clock of 1100.  However I don't recall seeing anyone getting over 29Mh without going over 2Ghz for the memory clock.
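
The arithmetic behind those numbers can be sketched in a few lines. This is only a rough model: it assumes a 256-bit bus and GDDR5 moving 4 bits per pin per memory-clock cycle (neither is stated in the post), and it takes "8KB" as 8000 bytes because that is what reproduces the 24Mh figure. The function names are made up for illustration.

```python
# Sketch: bandwidth-bound Ethash hashrate ceiling.
# Assumptions (not from the thread): 256-bit memory bus, GDDR5 transferring
# 4 bits per pin per memory-clock cycle, 1 GB = 1e9 bytes.
# bytes_per_hash=8000 matches the "8KB" used in the post.

def peak_bandwidth_gbs(mem_clock_mhz, bus_width_bits=256, transfers_per_clock=4):
    """Peak memory bandwidth in GB/s for a given memory clock in MHz."""
    bits_per_second = mem_clock_mhz * 1e6 * transfers_per_clock * bus_width_bits
    return bits_per_second / 8 / 1e9


def hashrate_ceiling_mh(mem_clock_mhz, efficiency=1.0, bytes_per_hash=8000):
    """Upper bound on MH/s if memory bandwidth is the only limit."""
    bandwidth_bytes = peak_bandwidth_gbs(mem_clock_mhz) * 1e9
    return bandwidth_bytes * efficiency / bytes_per_hash / 1e6


if __name__ == "__main__":
    for clock in (1500, 2000):
        print(clock,
              round(peak_bandwidth_gbs(clock), 1),
              round(hashrate_ceiling_mh(clock), 2),
              round(hashrate_ceiling_mh(clock, efficiency=0.92), 2))
```

If you count a hash as 64 accesses of 128 bytes (8192 bytes), pass bytes_per_hash=8192 and the ceilings come out a little lower.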


legendary
Activity: 2117
Merit: 1397
Kinda OT since this is a RAM timings thread, but I'll ask this anyway:

For Sapphire RX 470 4GB (Ref) cards, the GPU core voltage offset is at A992, correct?
Now for the cards with Hynix memory, I find the default 04, which is 4 × 6.25 mV = +25 mV, which seems legit.
But for the cards with Samsung memory, the value at A992 is FF, which is -1 × 6.25 mV = -6.25 mV, so something looks off.
Do different memory versions of the card have different default offset values? Or is the location different?

Any help/guidance would be appreciated.

Seems like the Samsung one doesn't have a global offset.

Stock ROMs with a global offset have either '03' (+18.75 mV) or '04' (+25 mV) as the VDDC offset.
I've never seen anything else, let alone a negative offset, and I've opened up a lot of them ;-) But correct me if I'm wrong.
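
For anyone checking their own ROM, the byte-to-millivolt conversion used in these posts can be sketched like this. It assumes the byte is a two's-complement signed count of 6.25 mV steps, which is how FF gets read as -1 above; the function name is made up, and a value like FF may simply mean there is no offset at that location.

```python
# Sketch: interpret a VDDC offset byte as a signed number of 6.25 mV steps.
# Assumption: the byte is two's-complement signed, as implied by reading
# FF as -1 in the posts above.

STEP_MV = 6.25


def vddc_offset_mv(raw_byte):
    """Convert a raw offset byte (0x00..0xFF) to millivolts."""
    if not 0 <= raw_byte <= 0xFF:
        raise ValueError("expected a single byte")
    steps = raw_byte - 256 if raw_byte >= 0x80 else raw_byte
    return steps * STEP_MV


if __name__ == "__main__":
    for raw in (0x03, 0x04, 0xFF):
        print(f"{raw:02X} -> {vddc_offset_mv(raw):+.2f} mV")
```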

Greetings.

Yeah, I got a batch of Samsung-based reference Sapphire 470s and manually added the offsets, but for some reason the GPU would not POST with the offset added. Weird, because offsets on Hynix/Micron cards worked fine. Never had time to investigate this.
member
Activity: 66
Merit: 10
Thanks, it helped me identify a bug in my strap encoder: the 10th word was zeroed on write :) It will be better now. I've already improved things and I'm doing ~340 sol/s @1300/2000.
sr. member
Activity: 652
Merit: 266
I'm trying to make some experiments as well, but so far all failed. Every time I get to desktop after flashing a BIOS with custom straps, I get "Thread stuck in device driver" BSOD in 10-30 secs from OS load. I have Sapphire Nitro RX 480 4 GB with Samsungs, OS is Windows 10 x64. I tried 1625 strap with TRRD 5, 1750 strap with TRRD 5 and 1750 with TFAW/T32AW = 0, all with the same result. I suspect that injecting custom straps into BIOS using Polaris BIOS Editor might be the cause here. Anyone experienced similar problems?

I've been using Polaris BIOS Editor, copying the file to my Linux box, and flashing with atiflash.  Works like a charm.


    WL = 3,  CL = 6,  TM = 0,  WR = 7,  BA0 = 0,  BA1 = 0,  BA2 = 0,  BA3 = 0
Here's the main reason :)
There are others, but first you need to fix this.
# Fixed only MR0 & MR8 #
555000000000000022CC1C00CE595B40D0570C152DCB2409004007000B0414207A8900A003000000170F2D35922A3217

# 00300000 # This part is very important, and unless you plan to lower your CLmrs below 20 and WR below 21, you shouldn't touch it.
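
A quick way to see exactly what changed between two straps is to split them into 32-bit words and compare. The sketch below assumes a strap is a sequence of little-endian 32-bit words (the thread only talks about "words", so the byte order is my assumption), and the helper names are made up. It compares the fixed strap above with the first strap from the post being answered.

```python
import struct


def strap_words(strap_hex):
    """Split a hex strap into 32-bit words (assumed little-endian).

    Whitespace inside the string is ignored, so straps pasted with a
    stray space from forum line wrapping still parse.
    """
    raw = bytes.fromhex("".join(strap_hex.split()))
    if len(raw) % 4:
        raise ValueError("strap length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


def differing_words(strap_a, strap_b):
    """Return the 1-based indices of the words that differ."""
    a, b = strap_words(strap_a), strap_words(strap_b)
    if len(a) != len(b):
        raise ValueError("straps have different lengths")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


FIXED = ("555000000000000022CC1C00CE595B40D0570C152DCB2409"
         "004007000B0414207A8900A003000000170F2D35922A3217")
ORIGINAL = ("555000000000000022CC1C00CE595B40D0570C152DCB2409"
            "004007000B0314207A8900A000000000170F2D35922A3217")

if __name__ == "__main__":
    for idx in differing_words(FIXED, ORIGINAL):
        fixed_word = strap_words(FIXED)[idx - 1]
        orig_word = strap_words(ORIGINAL)[idx - 1]
        print(f"word {idx}: {orig_word:#010x} -> {fixed_word:#010x}")
```

Only two words differ, which lines up with the "Fixed only MR0 & MR8" note.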
member
Activity: 66
Merit: 10
I flashed with yours from three posts ago and it works... so either my decoding/encoding fails or I don't really know how it works. Probably the latter.
Maybe I'll share some of my creations and you can tell me if they look OK :) For Samsung 4 GB at 2000:
555000000000000022CC1C00CE595B40D0570C152DCB2409004007000B0314207A8900A000000000170F2D35922A3217 or
555000000000000022CC1C00CE595B40D0570F1531CB2409004007000B0314207A8900A000000000170F2D35922A3217
sr. member
Activity: 588
Merit: 251
I'm trying to make some experiments as well, but so far all failed. Every time I get to desktop after flashing a BIOS with custom straps, I get "Thread stuck in device driver" BSOD in 10-30 secs from OS load. I have Sapphire Nitro RX 480 4 GB with Samsungs, OS is Windows 10 x64. I tried 1625 strap with TRRD 5, 1750 strap with TRRD 5 and 1750 with TFAW/T32AW = 0, all with the same result. I suspect that injecting custom straps into BIOS using Polaris BIOS Editor might be the cause here. Anyone experienced similar problems?

I've been using Polaris BIOS Editor, copying the file to my Linux box, and flashing with atiflash.  Works like a charm.
member
Activity: 66
Merit: 10
I'm trying to make some experiments as well, but so far all failed. Every time I get to desktop after flashing a BIOS with custom straps, I get "Thread stuck in device driver" BSOD in 10-30 secs from OS load. I have Sapphire Nitro RX 480 4 GB with Samsungs, OS is Windows 10 x64. I tried 1625 strap with TRRD 5, 1750 strap with TRRD 5 and 1750 with TFAW/T32AW = 0, all with the same result. I suspect that injecting custom straps into BIOS using Polaris BIOS Editor might be the cause here. Anyone experienced similar problems?
sr. member
Activity: 652
Merit: 266
Hint: ARB_DRAM_TIMING.ACTRD/ACTWR can go lower :)

I think I've figured out ACTRD.  It's the delay between successive READ commands to the same row (page).  It doesn't seem to be mentioned anywhere in the Hynix datasheet, nor in any of the GDDR product briefs I've looked at.  I had incorrectly assumed that back-to-back reads from the same row were possible.  With ETH mining (on AMD cards) each DAG access results in 2x 32-byte reads from 2 GDDR chips (128 bytes total).
The delay between ACTIVATE and READ is RCDR, and the default Samsung straps use ACTRD = RCDR + 1.  Knowing the way DRAM works, ACTRD should be much lower than RCDR, but by how much?  A paper by NVIDIA suggests ~50%. https://www.cs.utah.edu/~dkopta/papers/DRAM-SIGGRAPH14_post.pdf

I tried reducing ACTRD to 16, and while it seems stable so far, the speed improvement is tiny (less than 0.5%).  So is this Samsung strap as good as it gets for Eth mining?
777000000000000022CC1C00CE615C45C0571016B30CD50900400700140514207A8900A00300000010103139962C3617

In this version, lowering ACTRD for Eth won't give you much (it may even get worse, so keep an eye on the gap between ACTRD and ACTWR).
There are a few more things that can gain you additional hash, but it won't be much. The best I did for my MSI Armor 4G 470 was 30.2 at 1148/2075, but it was unstable due to high memory temps (no heatsinks on the modules). You might try lowering tRCDR, but in my experience it will throw more errors. Lowering tRC a few bits "might" improve things by 0.1%, which is irrelevant (why risk stability for such a small gain?). Try setting WR (write recovery) 1 bit after CLmrs... or don't. Testing, testing, testing.
sr. member
Activity: 588
Merit: 251
Hint: ARB_DRAM_TIMING.ACTRD/ACTWR can go lower :)

I think I've figured out ACTRD.  It's the delay between successive READ commands to the same row (page).  It doesn't seem to be mentioned anywhere in the Hynix datasheet, nor in any of the GDDR product briefs I've looked at.  I had incorrectly assumed that back-to-back reads from the same row were possible.  With ETH mining (on AMD cards) each DAG access results in 2x 32-byte reads from 2 GDDR chips (128 bytes total).
The delay between ACTIVATE and READ is RCDR, and the default Samsung straps use ACTRD = RCDR + 1.  Knowing the way DRAM works, ACTRD should be much lower than RCDR, but by how much?  A paper by NVIDIA suggests ~50%. https://www.cs.utah.edu/~dkopta/papers/DRAM-SIGGRAPH14_post.pdf

I tried reducing ACTRD to 16, and while it seems stable so far, the speed improvement is tiny (less than 0.5%).  So is this Samsung strap as good as it gets for Eth mining?
777000000000000022CC1C00CE615C45C0571016B30CD50900400700140514207A8900A00300000010103139962C3617
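
To check what ACTRD/ACTWR a given strap actually carries, something like the sketch below may help. It rests on an assumption the thread does not spell out: that ARB_DRAM_TIMING is the 11th 32-bit word of the strap, with ACTRD in its first byte and ACTWR in the second. That layout does give ACTRD = 16 for the strap above, matching the post, but treat it as a guess and verify against a proper decoder before flashing anything. The function name is made up.

```python
def actrd_actwr(strap_hex, word_index=11):
    """Read ACTRD and ACTWR from a strap.

    Assumed layout (not confirmed in the thread): ARB_DRAM_TIMING is the
    11th 32-bit word, ACTRD is its first byte, ACTWR its second byte.
    Whitespace in the hex string is ignored.
    """
    raw = bytes.fromhex("".join(strap_hex.split()))
    start = (word_index - 1) * 4
    word = raw[start:start + 4]
    if len(word) != 4:
        raise ValueError("strap is too short for the requested word")
    return {"ACTRD": word[0], "ACTWR": word[1]}


STRAP = ("777000000000000022CC1C00CE615C45C0571016B30CD509"
         "00400700140514207A8900A00300000010103139962C3617")

if __name__ == "__main__":
    timings = actrd_actwr(STRAP)
    print(timings, "gap:", timings["ACTRD"] - timings["ACTWR"])
```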
member
Activity: 81
Merit: 1002
It was only the wind.
Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

I've seen you mention loosening the CAS timings.  I tried bumping up tCL by 1, but still get crashes on the K4G4 at 2100.  So is it just loosening tCL that usually does the trick, or something else too?



You have to loosen it on the DRAM, too - you're loosening the tCL on the ASIC, but not the DRAM, throwing them off.
sr. member
Activity: 652
Merit: 266
Are there any publicly available custom memory strap timings for the rx 400 series? (samsung, elpida, or hynix)

Here's the Samsung strap I'm currently working on:
777000000000000022CC1C0010625C49D0571016B50BD50900400700140514207A8900A003000000191131399D2C3617

It's the 1750 strap with RRD=5, FAW&32AW=0.  It's stable at 2100 on my Sapphire Rx470.  The previous custom strap I tried was based on the 1625 strap, and I would start getting a lot of errors over 2000.  I just started working on it today, so there's more tweaking to do (like trying a lower tRC for RAS).

Nicely done :)
Let's see the final results.

Small adjustments to your timing based on my experience: 777000000000000022CC1C00CE615C45C0571016B30CD50900400700140514207A8900A003000000151031399D2C3617


Thanks! Nice improvement over the regular 1750 strap. Now the question is... will this 1750 samsung strap work with 480s with samsung (and then both 4gb and 8gb?)

If you have one of the first 4G Samsung batches, it will work and improve even more (because you can unlock those to 8G).
If you have a newer batch of those 4G Samsungs, it will most probably run, but it won't be stable.

Greetings

Interesting, thanks! I have the old ones and actually used one of your old public BIOSes on 'em :D
The idea of custom timings is to replace only certain straps, not the whole VBIOS. I always use the OEM VBIOS the GPU arrives with (unless the vendor has pushed a new one, but that's very rare).
As Wolf confirmed in a PM, Hynix doesn't have a defined bit for CLmrs above 20. It's not actually reverse engineering; mostly you decode the highest strap and assume that CLmrs and tCL are at their highest bit.