
Topic: Custom RAM Timings for GPU's with GDDR5 - DOWNLOAD LINKS - UPDATED - page 46. (Read 155645 times)

member
Activity: 126
Merit: 10
Yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING, because one is 31 bits and the other is 32.

The 3 highest bits are unused anyway (so the difference between 31 and 32 is irrelevant).

And what's the correct structure for MC_SEQ_MISC_TIMING according to your decoding tool for the RX series?

As stated in atom_rom_timings.py in git.
member
Activity: 126
Merit: 10
Yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING, because one is 31 bits and the other is 32.

I think the Linux kernel asic reg headers are misleading.  As far as I can tell, the straps are copied into 32-bit registers, and therefore the mask and offset definitions have no functional effect.
Some of the old register names can't even be found in the GDDR5 datasheets.  For example, you won't find tR2R in the Hynix datasheet, but you will find tCCDL and tCCDS.  I suspect what the Linux headers refer to as tR2R may actually be tCCDS.

Well, you could be right.
But the linked Hynix H5GQ2H24AFR datasheet (a part last seen in the R9 290) is dated 2009, and
the Linux header is more recent (although whether the data there is up to date is questionable),
so from my point of view it comes down to which one is more out of date.
sr. member
Activity: 652
Merit: 266
Yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING, because one is 31 bits and the other is 32.

The 3 highest bits are unused anyway (so the difference between 31 and 32 is irrelevant).

And what's the correct structure for MC_SEQ_MISC_TIMING according to your decoding tool for the RX series?
sr. member
Activity: 588
Merit: 251
Yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING, because one is 31 bits and the other is 32.

I think the Linux kernel asic reg headers are misleading.  As far as I can tell, the straps are copied into 32-bit registers, and therefore the mask and offset definitions have no functional effect.
Some of the old register names can't even be found in the GDDR5 datasheets.  For example, you won't find tR2R in the Hynix datasheet, but you will find tCCDL and tCCDS.  I suspect what the Linux headers refer to as tR2R may actually be tCCDS.
member
Activity: 126
Merit: 10
Yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING, because one is 31 bits and the other is 32.

The 3 highest bits are unused anyway (so the difference between 31 and 32 is irrelevant).
sr. member
Activity: 588
Merit: 251
I've started doing the detailed analysis on memory timing for Eth mining.

With tRRD=6, tRC=62, tCL=21 and 2000 mem clock, I can get almost 27Mh/s mining eth.  Each hash takes 64 random DAG reads of 128 bytes each, and since they are random, each read should be from a different page.  As well, the L2 cache hit rate should be near 0, so each DAG access requires a read from GDDR (2x32-byte reads from 2 GDDR chips).

Before reading, a page (row) has to be activated (opened), so 27Mh * 64 activates = 1728M activates per second.  The Rx470/480 has 4 independent cache controllers, so a single GDDR5 chip will open 432M pages per second.  With a 2GHz mem clock, that's about 5 (4.73) clocks per activate.  The closer that gets to 4, the better.  Lower than 4 is not possible with Eth mining, since it takes 4 clocks to transfer 64 bytes (half a DAG entry).  Note that if tRRD=6 means 6 clocks, some other timing factor must be allowing the RAM to sustain <5 clocks per activate.

I tried tRRD=5, and it only makes a small (~1%) improvement.  That makes sense, since RRD is the delay between 2 activate commands when they are going to different banks.  With only 16 banks, the memory controller has lots of opportunity to batch activate commands together in the same bank.  However, tRC is defined as "the minimum time interval between two successive ACTIVE commands on the same bank".  With tRC=62, the fastest access pattern would be to spread the accesses across different banks rather than batching them in the same bank.

So it seems I'm missing something about how the RAM timing works.  I know there are multiple clocks for GDDR5, and some run at double data rate (i.e. WCK).  If tRRD=6 means six DDR address clocks, that would be 3 SDR command clocks (2GHz is the command clock rate).
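
The clocks-per-activate estimate above can be redone in a few lines. This is only a back-of-the-envelope sketch using the rounded figures quoted in the post (27 MH/s, 64 DAG reads per hash, 4 independent controllers, 2 GHz command clock); the variable names are made up for illustration. With these rounded inputs the result comes out near 4.63 rather than the 4.73 quoted, presumably because the original figure used an unrounded hash rate.

```python
# Back-of-the-envelope check of the activate rate, using the rounded
# figures quoted in the post (assumptions, not measurements).
HASHRATE = 27e6        # hashes per second ("almost 27Mh/s")
READS_PER_HASH = 64    # random DAG reads per hash, each opening a new page
CONTROLLERS = 4        # independent controllers assumed in the post
MEM_CLOCK_HZ = 2e9     # command clock rate assumed in the post

activates_per_sec = HASHRATE * READS_PER_HASH
activates_per_chip = activates_per_sec / CONTROLLERS
clocks_per_activate = MEM_CLOCK_HZ / activates_per_chip

print(f"{activates_per_sec / 1e6:.0f}M activates/s in total")
print(f"{activates_per_chip / 1e6:.0f}M activates/s per chip")
print(f"{clocks_per_activate:.2f} clocks per activate")
```

Either way the figure sits between 4 and 5, which is the point of the post: the RAM is sustaining fewer clocks per activate than tRRD=6 would suggest if it were counted in command clocks.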



hero member
Activity: 2548
Merit: 626
Yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING, because one is 31 bits and the other is 32.
member
Activity: 81
Merit: 1002
It was only the wind.
Hello all,

Wolf0 and I have, today, released OhGodATool, OhGodADecode and OhGodACsumFixer.

Currently, they are without barebones documentation - I don't have the time right now with work, but once I do have a spare moment, I will update it.

You can download OhGodATool here: https://github.com/OhGodACompany/OhGodATool/releases/
You can download OhGodADecode here: https://github.com/OhGodACompany/OhGodADecode/releases/
You can download OhGodACsumFixer here: https://github.com/OhGodACompany/OhGodACsumFixer/releases

Enjoy.

Thanks.  Where did you find the updated MC_SEQ_MISC_TIMING?


We worked it out by checking the stuff at runtime - modify, check, modify, check, until we got it right.
member
Activity: 126
Merit: 10
Expected format:
Quote
     5:0  TRP_WRA = 0x0
    13:8  TRP_RDA = 0x0
   19:15  TRP = 0x0
   28:20  TRFC = 0x0
I must admit that I think OhGodAGirl's format looks more like it.
I think this one is correct, but only for the pre-RX series. At least, the corresponding header in the Linux kernel is dated well before RX.

Here is the additional hint (produced by my tool).
Code:
RX480(Elpida EDW4032BABG)
 20000 0 999000000000000022aa1c0060881107c0540b078f82c00000204100150014209a8840a1000004c0030105070c0a100c [TRP_WRA=015,TRP_RDA=005,unused1=000,TRP=002,unused2=000,TRFC=012,unused3=000]
 40000 0 999000000000000022aa1c006094120fd0540c0815449101002041001d0314209a8880a2000004c006010a0f190e160d [TRP_WRA=021,TRP_RDA=008,unused1=000,TRP=005,unused2=000,TRFC=025,unused3=000]
 80000 0 999000000000000022aa1c00a5ac351f10550e0c21c73203004482003d0914202a8900a5000004c00c06141a33182210 [TRP_WRA=033,TRP_RDA=014,unused1=000,TRP=011,unused2=000,TRFC=051,unused3=000]
100000 0 777000000000000022aa1c002939572750550d0fa68803040068c200540c1420aa8900a6000004c00f0a191e401e2712 [TRP_WRA=038,TRP_RDA=017,unused1=000,TRP=014,unused2=000,TRFC=064,unused3=000]
125000 0 777000000000000022aa1c00ad49593270550e12ad8a14050068c300640f1420ba8980a7000004c0130e202551242e13 [TRP_WRA=045,TRP_RDA=021,unused1=000,TRP=018,unused2=000,TRFC=081,unused3=000]
137500 0 777000000000000022aa1c00ef516a3790550f14b20b9505006ae40074021420ca89c0a8020004c01510232859283315 [TRP_WRA=050,TRP_RDA=023,unused1=000,TRP=020,unused2=000,TRFC=089,unused3=000]
142500 0 777000000000000022aa1c0010d66a3990550f14344cc505006ae40074031420ca8900a9020004c0161124295c293515 [TRP_WRA=052,TRP_RDA=024,unused1=000,TRP=021,unused2=000,TRFC=092,unused3=000]
150000 0 777000000000000022aa1c00315a6b3ca0550f15b68c1506006ae4007c041420ca8980a9020004c01712262b612b3715 [TRP_WRA=054,TRP_RDA=025,unused1=000,TRP=022,unused2=000,TRFC=097,unused3=000]
162500 0 777000000000000022aa1c0073627c41b0551016ba0d9606006c060104061420ea8940aa030004c01914292e692e3b16 [TRP_WRA=058,TRP_RDA=027,unused1=000,TRP=024,unused2=000,TRFC=105,unused3=000]
175000 0 777000000000000022aa1c00b56a7d46c0551017be8e1607006c07010c081420fa8900ab030004c01b162c3171313f17 [TRP_WRA=062,TRP_RDA=029,unused1=000,TRP=026,unused2=000,TRFC=113,unused3=000]
200000 0 999000000000000022aa1c0018f77e4fd055121946501708006c07011d0c1420fa8980ac030004c01e19323781364718 [TRP_WRA=070,TRP_RDA=032,unused1=000,TRP=029,unused2=000,TRFC=129,unused3=000]
R9390(Elpida EDW4032BABG)
 20000 0 999133200000000060881107c0540b060f05c1000020410022aa1c08150014209a8840a1000000c0030105070c0a100c [TRP_WRA=015,unused1=000,TRP_RDA=005,unused2=000,TRP=002,TRFC=012,unused3=000]
 40000 0 99913320000000006094120fd0540c07158892010020410022aa1c081d0314209a8880a2000000c006010a0f190e160d [TRP_WRA=021,unused1=000,TRP_RDA=008,unused2=000,TRP=005,TRFC=025,unused3=000]
 80000 0 9991332000000000a5ac351f10550e0b218e35030044820022aa1c083d0914202a8900a5000000c00c06141a33182210 [TRP_WRA=033,unused1=000,TRP_RDA=014,unused2=000,TRP=011,TRFC=051,unused3=000]
100000 0 77713320000000002939572750550d0e261107040068c20022aa1c08540c1420aa8900a6000000c00f0a191e401e2712 [TRP_WRA=038,unused1=000,TRP_RDA=017,unused2=000,TRP=014,TRFC=064,unused3=000]
125000 0 7771332000000000ad49593270550e102d1519050068c30022aa1c08640f1420ba8980a7000000c0130e202551242e13 [TRP_WRA=045,unused1=000,TRP_RDA=021,unused2=000,TRP=018,TRFC=081,unused3=000]
137500 0 7771332000000000ef516a3790550f1232179a05006ae40022aa1c0874021420ca89c0a8020000c01510232859283315 [TRP_WRA=050,unused1=000,TRP_RDA=023,unused2=000,TRP=020,TRFC=089,unused3=000]
142500 0 777133200000000010d66a3990550f123498ca05006ae40022aa1c0874031420ca8900a9020000c0161124295c293515 [TRP_WRA=052,unused1=000,TRP_RDA=024,unused2=000,TRP=021,TRFC=092,unused3=000]
150000 0 7771332000000000315a6b3ca0550f1336191b06006ae40022aa1c087c041420ca8980a9020000c01712262b612b3715 [TRP_WRA=054,unused1=000,TRP_RDA=025,unused2=000,TRP=022,TRFC=097,unused3=000]
162500 0 777133200000000073627c41b05510143a1b9c06006c060122aa1c0804061420ea8940aa030000c01914292e692e3b16 [TRP_WRA=058,unused1=000,TRP_RDA=027,unused2=000,TRP=024,TRFC=105,unused3=000]
175000 0 7771332000000000b56a7d46c05510153e1d1d07006c070122aa1c080c081420fa8900ab030000c01b162c3171313f17 [TRP_WRA=062,unused1=000,TRP_RDA=029,unused2=000,TRP=026,TRFC=113,unused3=000]
200000 0 999133200000000018f77e4f0054121a06a01e08006c070122aa1c08350c1420fa8980ac030000c01e1932378139471a [TRP_WRA=006,unused1=000,TRP_RDA=032,unused2=000,TRP=029,TRFC=129,unused3=000]
RX480(Hynix H5GC4H24AJR)
 40000 0 555000000000000022dd1c0084941212f0540b0795847102002041001b0414209a8800a00000312006050d0e270f160e [TRP_WRA=021,TRP_RDA=009,unused1=000,TRP=006,unused2=000,TRFC=039,unused3=000]
 80000 0 777000000000000022dd1c00e7ac352210550d0a20c7f20400248100340914209a8800a0000031200c08171b4f172110 [TRP_WRA=032,TRP_RDA=014,unused1=000,TRP=011,unused2=000,TRFC=079,unused3=000]
 90000 0 777000000000000022dd1c002931462620550e0ba20793050026a2003c0a1420aa8800a0000031200d0a1a1d59192311 [TRP_WRA=034,TRP_RDA=015,unused1=000,TRP=012,unused2=000,TRFC=089,unused3=000]
100000 0 777000000000000022dd1c0029b5462930550e0c244823060026a200440b1420aa8800a0000031200e0a1c20621b2511 [TRP_WRA=036,TRP_RDA=016,unused1=000,TRP=013,unused2=000,TRFC=098,unused3=000]
112500 0 777000000000000022ff1c006bbd572f40550f0d28c9f3060048c5004c0d14205a8900a000003120100c20246f1e2912 [TRP_WRA=040,TRP_RDA=018,unused1=000,TRP=015,unused2=000,TRFC=111,unused3=000]
125000 0 777000000000000022ff1c008cc5583460550f0f2c4ab4070048c5005c0f14205a8900a000003120120d23287b222d13 [TRP_WRA=044,TRP_RDA=020,unused1=000,TRP=017,unused2=000,TRFC=123,unused3=000]
137500 0 777000000000000022339d00cecd593980551111ae8a84080048c6006c0014206a8900a002003120140f262b88252f15 [TRP_WRA=046,TRP_RDA=021,unused1=000,TRP=018,unused2=000,TRFC=136,unused3=000]
142500 0 777000000000000022339d00ce516a3b805511112fcbd408004ae6006c0014206a8900a002003120150f272d8d263015 [TRP_WRA=047,TRP_RDA=022,unused1=000,TRP=019,unused2=000,TRFC=141,unused3=000]
150000 0 777000000000000022339d00ce516a3d9055111230cb4409004ae600740114206a8900a002003120150f292f94273116 [TRP_WRA=048,TRP_RDA=022,unused1=000,TRP=019,unused2=000,TRFC=148,unused3=000]
162500 0 999000000000000022559d0010de7b4480551312b78c450a004c0601750414206a8900a00200312018112d34a42a3816 [TRP_WRA=055,TRP_RDA=025,unused1=000,TRP=022,unused2=000,TRFC=164,unused3=000]
175000 0 999000000000000022559d0031627c489055131339cdd50a004c06017d0514206a8900a00200312019123037ad2c3a17 [TRP_WRA=057,TRP_RDA=026,unused1=000,TRP=023,unused2=000,TRFC=173,unused3=000]
200000 0 bbb000000000000022889d0073ee8d53805515133ecf560c004e26017e0514206a8900a0020031201c143840c5303f17 [TRP_WRA=062,TRP_RDA=030,unused1=000,TRP=027,unused2=000,TRFC=197,unused3=000]
R9390(Hynix H5GC4H24AJR)
 40000 0 555133200000000084941212f0540b07150973020020410022dd1c081b0414209a8800a00000012006050d0e270f160e [TRP_WRA=021,unused1=000,TRP_RDA=009,unused2=000,TRP=006,TRFC=039,unused3=000]
 80000 0 7771332000000000e7ac352210550d0a208ef5040024810022dd1c08340914209a8800a0000001200c08171b4f172110 [TRP_WRA=032,unused1=000,TRP_RDA=014,unused2=000,TRP=011,TRFC=079,unused3=000]
 90000 0 77713320000000002931462620550e0b220f96050026a20022dd1c083c0a1420aa8800a0000001200d0a1a1d59192311 [TRP_WRA=034,unused1=000,TRP_RDA=015,unused2=000,TRP=012,TRFC=089,unused3=000]
100000 0 777133200000000029b5462930550e0c249026060026a20022dd1c08440b1420aa8800a0000001200e0a1c20621b2511 [TRP_WRA=036,unused1=000,TRP_RDA=016,unused2=000,TRP=013,TRFC=098,unused3=000]
112500 0 77713320000000006bbd572f40550f0d2892f7060048c50022ff1c084c0d14205a8900a000000120100c20246f1e2912 [TRP_WRA=040,unused1=000,TRP_RDA=018,unused2=000,TRP=015,TRFC=111,unused3=000]
125000 0 77713320000000008cc5583460550f0f2c94b8070048c50022ff1c085c0f14205a8900a000000120120d23287b222d13 [TRP_WRA=044,unused1=000,TRP_RDA=020,unused2=000,TRP=017,TRFC=123,unused3=000]
137500 0 7771332000000000cecd5939805511112e1589080048c60022339d086c0014206a8900a002000120140f262b88252f15 [TRP_WRA=046,unused1=000,TRP_RDA=021,unused2=000,TRP=018,TRFC=136,unused3=000]
142500 0 7771332000000000ce516a3b805511112f96d908004ae60022339d086c0014206a8900a002000120150f272d8d263015 [TRP_WRA=047,unused1=000,TRP_RDA=022,unused2=000,TRP=019,TRFC=141,unused3=000]
150000 0 7771332000000000ce516a3d9055111230964909004ae60022339d08740114206a8900a002000120150f292f94273116 [TRP_WRA=048,unused1=000,TRP_RDA=022,unused2=000,TRP=019,TRFC=148,unused3=000]
162500 0 999133200000000010de7b448055131237194b0a004c060122559d08750414206a8900a00200012018112d34a42a3816 [TRP_WRA=055,unused1=000,TRP_RDA=025,unused2=000,TRP=022,TRFC=164,unused3=000]
175000 0 999133200000000031627c4890551313399adb0a004c060122559d087d0514206a8900a00200012019123037ad2c3a17 [TRP_WRA=057,unused1=000,TRP_RDA=026,unused2=000,TRP=023,TRFC=173,unused3=000]
200000 0 bbb133200000000073ee8d53805515133e9e5d0c004e260122889d087e0514206a8900a0020001201c143840c5303f17 [TRP_WRA=062,unused1=000,TRP_RDA=030,unused2=000,TRP=027,TRFC=197,unused3=000]
The registers in RX and pre-RX are obviously at different offsets, but additionally there is no way to decode MISC with the same decoder and get
reasonably similar values for the same memory type on RX and R9 cards.

EDIT: For those who are wondering why TRP_WRA=006 for Elpida on the R9: my theory is that it is a bug in the BIOS (6 bits were designated for the field) and
the 64 from 70 = 64 + 6 was cut off.

EDIT: Thinking about it, I think the realignment of the MISC parts was done to give TRP one additional bit.
In Elpida at higher straps TRP=029, which is almost an overflow.
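
The RX rows in the dump above can be cross-checked with a few shifts and masks. This is a minimal sketch, not the code from atom_rom_timings.py: the bit positions are inferred from the printed values, the widths are simply the narrowest that fit them, and `decode_misc_rx` is a hypothetical helper name. It also assumes MC_SEQ_MISC_TIMING is the sixth little-endian 32-bit word (bytes 20 to 23) of the 48-byte RX strap.

```python
# Sketch: decode MC_SEQ_MISC_TIMING for RX-series straps using bit positions
# inferred from the dump (an assumption, not taken from any header).
def decode_misc_rx(word):
    return {
        "TRP_WRA": word & 0x7F,          # bits 6:0
        "TRP_RDA": (word >> 7) & 0x3F,   # bits 12:7
        "TRP": (word >> 14) & 0x1F,      # bits 18:14
        "TRFC": (word >> 20) & 0x1FF,    # bits 28:20
    }

# Bytes 20..23 of the two RX480 "200000" straps, read as little-endian words.
elpida = int.from_bytes(bytes.fromhex("46501708"), "little")
hynix = int.from_bytes(bytes.fromhex("3ecf560c"), "little")

elpida_fields = decode_misc_rx(elpida)
hynix_fields = decode_misc_rx(hynix)
print(elpida_fields)
print(hynix_fields)

# The truncation theory for the R9 Elpida strap: 70 stored in 6 bits.
truncated = 70 & 0x3F
print(truncated)
```

Both rows come out with the same TRP_WRA, TRP_RDA, TRP and TRFC values as printed in the dump, and 70 cut down to 6 bits is indeed 6, which is at least consistent with the truncation theory.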
full member
Activity: 190
Merit: 100
Gosh, this thread is becoming hotter and hotter; If only I weren't tied to work lately. Good luck folks.
member
Activity: 126
Merit: 10
Since everyone is sharing now, I suppose I'll put what I've come up with out here. Running a -125mV 470 Nitro Sapphire 8GB with Samsung memory, with ETH hitting between 28.5MH/s and 29.2MH/s @1140 core and @2100 mem, pulling around 920 watts at the wall with 6 GPUs per rig. On XMR, hitting 785h/s to 795h/s @1170 core and @2100 mem, pulling around 660 watts at the wall with 6 GPUs per rig. Also running ethOS 1.2.0. I'm here to learn more about the mistakes I made on the mod and see what others in the community have come up with.

Here is the strap I've put together:
777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17

The timings from wolf and OhGodAGirl's vbios decode tool release:
TRCDW = 16
TRCDWA = 16
TRCDR = 26
TRCDRA = 22
TRRD = 5
TRC = 71
Pad0 = 0

TRP_WRA = 48
Pad0 = 2
TRP_RDA = 12
TRP = 22
TRFC = 144

PA2RDATA = 0
Pad0 = 0
PA2WDATA = 0
Pad1 = 0
TFAW = 8
TCRCRL = 3
TCRCWL = 7
TFAW32 = 6

MC_SEQ_MISC1: 0x20140514

MC_SEQ_MISC3: 0xA00089FA

MC_SEQ_MISC8: 0x00000003

ACTRD = 25
ACTWR = 13
RASMACTRD = 47
RASMACTWR = 57

RAS2RAS = 157
RP = 45
WRPLUSRP = 46
BUS_TURN = 23

Looking forward to others' input!

TRCDR & TRCDRA should be the same.

And I don't think MISC was decoded properly. Or at least it is reasonable to expect Pad0 in
Code:
TRP_WRA = 48
Pad0 = 2
TRP_RDA = 12
TRP = 22
TRFC = 144
to be zero.
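
One way to test that hunch is to decode the same MISC word under two candidate layouts and look at the bits that fall outside the named fields. The sketch below is illustrative only: both layouts are inferred from values posted in this thread (layout A reproduces the tool output quoted above, layout B matches the RX rows in the earlier register dump), the field widths are guesses, and `decode` and `leftover` are hypothetical helpers.

```python
# Each layout maps field name -> (lowest bit, width). Both are inferred from
# values posted in the thread; neither is taken from an official header.
LAYOUT_A = {"TRP_WRA": (0, 6), "TRP_RDA": (8, 6), "TRP": (14, 6), "TRFC": (20, 9)}
LAYOUT_B = {"TRP_WRA": (0, 7), "TRP_RDA": (7, 6), "TRP": (14, 5), "TRFC": (20, 9)}

def decode(word, layout):
    """Extract every named field of `layout` from a 32-bit word."""
    return {name: (word >> lo) & ((1 << width) - 1)
            for name, (lo, width) in layout.items()}

def leftover(word, layout):
    """Bits of `word` not covered by any named field (padding/unused)."""
    covered = 0
    for lo, width in layout.values():
        covered |= ((1 << width) - 1) << lo
    return word & ~covered & 0xFFFFFFFF

# MISC word of the Samsung strap above: bytes B0 8C 05 09, little-endian.
word = int.from_bytes(bytes.fromhex("B08C0509"), "little")
for label, layout in (("A", LAYOUT_A), ("B", LAYOUT_B)):
    print(label, decode(word, layout), hex(leftover(word, layout)))
```

Under layout A the leftover is 0x80 (the Pad0 = 2 seen above); under layout B nothing is left over and TRP_RDA reads 25 instead of 12, which would fit the expectation that the padding should be zero.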
sr. member
Activity: 588
Merit: 251
Here is the strap I've put together:
777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17

The timings from wolf and OhGodAGirl's vbios decode tool release:

You should update to the version with my changes that show CAS timing.  I see you're using CL=22.  With Samsung CL=21 I was getting errors at 2100 (OK at 2000).  I'll give 22 a try.
Here's what I was using @2000:
555000000000000022CC1C00CE595B3ED0570F1531CB2409004007000B0314207A8900A003000000170F2E36922A3217

update: no luck with CL=22@2100; too many HW errors.  I'm pretty sure the Samsung RAM on these cards is rated for 1750, so getting stable results at 2000 is still pretty good.
legendary
Activity: 1050
Merit: 1293
Huh?
Since everyone is sharing now, I suppose I'll put what I've come up with out here. Running a -125mV 470 Nitro Sapphire 8GB with Samsung memory, with ETH hitting between 28.5MH/s and 29.2MH/s @1140 core and @2100 mem, pulling around 920 watts at the wall with 6 GPUs per rig. On XMR, hitting 785h/s to 795h/s @1170 core and @2100 mem, pulling around 660 watts at the wall with 6 GPUs per rig. Also running ethOS 1.2.0. I'm here to learn more about the mistakes I made on the mod and see what others in the community have come up with.

Here is the strap I've put together:
777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17

....

Looking forward to others' input!


Cleaned it up for you.

Code:
--> HEX strap: 777000000000000022CC1C00AD695D47C0570E16B08C05090048C70014051420FA8900A003000000190D2F399D2D2E17

--> MC_SEQ_WR_CTL_D0
    DAT_DLY = 7,   DQS_DLY = 7,  DQS_XTR = 0,  DAT_2Y_DLY = 0,  ADR_2Y_DLY = 0,    CMD_2Y_DLY = 0,  OEN_DLY = 7,  OEN_EXT = 0

--> MC_SEQ_WR_CTL_D1
    DAT_DLY = 0,   DQS_DLY = 0,  DQS_XTR = 0,  DAT_2Y_DLY = 0,  ADR_2Y_DLY = 0,    CMD_2Y_DLY = 0,  OEN_DLY = 0,  OEN_EXT = 0

--> MC_SEQ_PMG_TIMING
    TCKSRE = 2,  Pad0 = 0,  TCKSRX = 2,  Pad1 = 0,  TCKE_PULSE = 12,  TCKE = 12,  SEQ_IDLE = 7,  Pad2 = 0,  TCKE_PULSE_MSB = 0, SEQ_IDLE_SS = 0

--> MC_SEQ_RAS_TIMING
    TRCDW = 13,  TRCDWA = 13,  TRCDR = 26,  TRCDRA = 26,  TRRD = 5,  TRC = 71,  Pad0 = 0

--> MC_SEQ_CAS_TIMING
    TNOPW = 0,  TNOPR = 0,  TR2W = 28, TCCLD = 3,  TR2R = 5,  Pad0 = 0,  TW2R = 14,  TCL = 22,  Pad1 = 0

--> MC_SEQ_MISC_TIMING
    TRP_WRA = 48,  Pad0 = 2,  TRP_RDA = 12,  TRP = 22,  TRFC = 144

--> MC_SEQ_MISC_TIMING2
    PA2RDATA = 0,  Pad0 = 0,  PA2WDATA = 0,  Pad1 = 0,  FAW = 8,  TREDC = 2,  TWEDC = 7,  T32AW = 6,  Pad2 = 0,  TWDATATR = 0

--> MC_SEQ_MISC1
 -- MR0
    WL = 4,  CL = 23,  TM = 0,  WR = 25,  BA0 = 0,  BA1 = 0,  BA2 = 0,  BA3 = 0
 -- MR1
    DS = 0,  DT = 1,  ADR = 1,  CAL = 0,  PLL = 0,  RDBI = 0,  WDBI = 0,  ABI = 0,
    RES = 0,  BA0 = 0,  BA1 = 1,  BA2 = 0,  BA3 = 0

--> MC_SEQ_MISC3
 -- MR4
    EDCHP = 10,  CRC WL = 7,  CRC RL = 3,  RD CRC = 0,  WR CRC = 0,  EDCHPi = 1,  BA0 = 0,  BA1 = 0,  BA2 = 0,  BA3 = 1
 -- MR5
    LP1 = 0,  LP2 = 0,  LP3 = 0,  PLL/DLL BW = 0,  RAS = 0,  BA0 = 0,  BA1 = 1,  BA2 = 0,  BA3 = 1


--> MC_SEQ_MISC8
 -- MR8
    CLEHF = 1,  WREHF = 1,  RFU = 0,  BA0 = 0,  BA1 = 0,  BA2 = 0,  BA3 = 0
 -- MR7
    PLL Stby = 0,  PLL Fclk = 0,  PLL DelC = 0,  LF Mode = 0,  Auto Sync = 0,  DQ PreA = 0, Temp Sensor = 0, HVFRED = 0,
    VDD Range = 0,  RFU = 0,  BA0 = 0,  BA1 = 0,  BA2 = 0,  BA3 = 0


--> MC_ARB_DRAM_TIMING
    ACTRD = 25,  ACTWR = 13,  RASMACTRD = 47,  RASMACTWR = 57

--> MC_ARB_DRAM_TIMING2
    RAS2RAS = 157,  RP = 45,  WRPLUSRP = 46,  BUS_TURN = 23

Lots of options.. lots of things to fine tune..
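
For anyone who wants to reproduce part of this output, here is a rough Python sketch (not the OhGodADecode source). It assumes the 48-byte strap is twelve little-endian 32-bit words, with MC_SEQ_RAS_TIMING as the fourth word and the two MC_ARB_DRAM_TIMING registers last, and it assumes the field widths listed in the code; both assumptions simply happen to reproduce the values printed above. `bitfields` is a hypothetical helper.

```python
import struct

# The cleaned-up strap from the post above, in 8-byte chunks for readability.
STRAP = (
    "7770000000000000" "22CC1C00AD695D47" "C0570E16B08C0509"
    "0048C70014051420" "FA8900A003000000" "190D2F399D2D2E17"
)

def bitfields(word, fields):
    """Unpack (name, width) pairs from `word`, lowest bits first."""
    out = {}
    for name, width in fields:
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out

# Twelve little-endian 32-bit register images (assumed ordering).
words = struct.unpack("<12I", bytes.fromhex(STRAP))

ras = bitfields(words[3], [("TRCDW", 5), ("TRCDWA", 5), ("TRCDR", 5),
                           ("TRCDRA", 5), ("TRRD", 4), ("TRC", 7)])
arb = bitfields(words[10], [("ACTRD", 8), ("ACTWR", 8),
                            ("RASMACTRD", 8), ("RASMACTWR", 8)])
arb2 = bitfields(words[11], [("RAS2RAS", 8), ("RP", 8),
                             ("WRPLUSRP", 8), ("BUS_TURN", 8)])

print("MC_SEQ_RAS_TIMING  ", ras)
print("MC_ARB_DRAM_TIMING ", arb)
print("MC_ARB_DRAM_TIMING2", arb2)
print(f"MC_SEQ_MISC1: 0x{words[7]:08X}")
```

The same helper should work for the other registers once their field widths are known; only the three registers above were checked against the printed output.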
newbie
Activity: 19
Merit: 0
Since everyone is sharing now, I suppose I'll put what I've come up with out here. Running a -125mV 470 Nitro Sapphire 8GB with Samsung memory, with ETH hitting between 28.5MH/s and 29.2MH/s @1140 core and @2100 mem, pulling around 920 watts at the wall with 6 GPUs per rig. On XMR, hitting 785h/s to 795h/s @1170 core and @2100 mem, pulling around 660 watts at the wall with 6 GPUs per rig. Also running ethOS 1.2.0. I'm here to learn more about the mistakes I made on the mod and see what others in the community have come up with.

Here is the strap I've put together:
777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17

The timings from wolf and OhGodAGirl's vbios decode tool release:
TRCDW = 16
TRCDWA = 16
TRCDR = 26
TRCDRA = 22
TRRD = 5
TRC = 71
Pad0 = 0

TRP_WRA = 48
Pad0 = 2
TRP_RDA = 12
TRP = 22
TRFC = 144

PA2RDATA = 0
Pad0 = 0
PA2WDATA = 0
Pad1 = 0
TFAW = 8
TCRCRL = 3
TCRCWL = 7
TFAW32 = 6

MC_SEQ_MISC1: 0x20140514

MC_SEQ_MISC3: 0xA00089FA

MC_SEQ_MISC8: 0x00000003

ACTRD = 25
ACTWR = 13
RASMACTRD = 47
RASMACTWR = 57

RAS2RAS = 157
RP = 45
WRPLUSRP = 46
BUS_TURN = 23

Looking forward to others' input!
sr. member
Activity: 588
Merit: 251
Now that the strap tools are out, let's talk about how to optimize the timings.  I want to start with ETH since it is the simplest (and coincidentally the most profitable ATM).

Ethash is many 128-byte random DAG reads, 8KB of them per hash, so 20MH/s requires 160GB/s of random read bandwidth.  For AMD cards 128 bytes is 2 cache lines of 64 bytes each, and each cache line fill reads 32 bytes from 2 GDDR5 memory chips.  Each 32-byte GDDR5 read burst takes 2 clocks, so when the RAM is clocked at 2GHz, the data will be transferred in 1ns (each bit takes just 125ps!).

Here's a couple references to help the noobs get started:
https://www.micron.com/~/media/documents/products/technical-note/dram/tned01_gddr5_sgram_introduction.pdf
https://www.micron.com/~/media/documents/products/data-sheet/dram/gddr5/4gb_gddr5_sgram_brief.pdf

I'm not going to do one long post, so as to make this more readable.  For the more experienced folks, here's a tidbit of ideas to come: set tFAW and t32AW to 0.  Even Hynix's old H5GQ1H24AFR has FAW (23ns) =~ 4 * RRD (5.5ns), so virtually all modern GDDR5 should be able to work fine without FAW and 32AW limits.  I get 27.0Mh with sgminer on my Rx470/K4G4 clocked at 2GHz, tRRD=5, tFAW=0.  Zeroing t32AW gives a bump to 27.35Mh.
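
The figures in this post are easy to re-derive. The sketch below just redoes the arithmetic in Python with the numbers quoted; the 64 DAG reads per hash come from the earlier analysis in this thread, and the 32 data pins per GDDR5 chip is my assumption. Treat it as a sanity check rather than a model of the memory controller.

```python
# Re-deriving the numbers quoted in the post (inputs are the post's figures).
READS_PER_HASH = 64      # random DAG reads per hash
BYTES_PER_READ = 128     # size of one DAG entry
HASHRATE = 20e6          # 20 MH/s
MEM_CLOCK_HZ = 2e9       # 2 GHz command clock
BURST_BYTES = 32         # one GDDR5 read burst
BURST_CLOCKS = 2         # clocks per burst, as stated in the post
DATA_PINS = 32           # assumption: chip running in x32 mode

bytes_per_hash = READS_PER_HASH * BYTES_PER_READ   # 8 KB per hash
bandwidth = HASHRATE * bytes_per_hash              # bytes per second
burst_time = BURST_CLOCKS / MEM_CLOCK_HZ           # seconds per burst
bits_per_pin = BURST_BYTES * 8 // DATA_PINS        # bits each pin moves per burst
bit_time = burst_time / bits_per_pin               # seconds per bit

# The tFAW argument: four activates spaced tRRD apart vs. the tFAW window.
T_FAW_NS = 23.0
T_RRD_NS = 5.5
four_rrd_ns = 4 * T_RRD_NS

print(f"{bytes_per_hash} bytes per hash, {bandwidth / 1e9:.2f} GB/s at 20 MH/s")
print(f"burst: {burst_time * 1e9:.1f} ns, bit time: {bit_time * 1e12:.0f} ps")
print(f"4 * tRRD = {four_rrd_ns} ns vs tFAW = {T_FAW_NS} ns")
```

Since 4 * tRRD is within 1 ns of tFAW on that datasheet, tRRD alone already spaces four activates almost as far apart as tFAW requires, which is the reasoning behind trying tFAW=0.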

 
newbie
Activity: 31
Merit: 0
A kitten tamed the wolf :)
member
Activity: 81
Merit: 1002
It was only the wind.
I see at least a couple people have written strap decoding programs, but I can't find publicly released.  I was going to write one and release it publicly, but I figured if someone else has already written one...


So it's not as simple as using atombios.h to dump the fields in ATOM_MEMORY_TIMING_FORMAT_V2.
https://raw.githubusercontent.com/torvalds/linux/master/drivers/gpu/drm/radeon/atombios.h

Straps for GCN cards are 52 bytes long (3 bytes for memory clock, 1 byte for memory type, 48 bytes for strap), but sizeof(ATOM_MEMORY_TIMING_FORMAT_V2) = 40 bytes.

So is it just a matter of old-fashioned reverse engineering?  i.e. looking at different straps and reading through GDDR5 data sheets to figure out the strap offsets for different values?
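
The 52-byte record described above can be picked apart with a few lines of Python. This is only a sketch of the layout as stated in the post (3 bytes of memory clock, 1 byte of memory type, 48 bytes of strap); the little-endian byte order, the 10 kHz clock unit (so 200000 would mean 2000 MHz, as in the dump earlier in the thread) and the `parse_strap_record` name are my assumptions.

```python
def parse_strap_record(record):
    """Split one 52-byte timing record into (clock, mem_type, strap)."""
    if len(record) != 52:
        raise ValueError("expected a 52-byte record")
    clock = int.from_bytes(record[0:3], "little")  # assumed 10 kHz units
    mem_type = record[3]                           # 1 byte of memory type
    strap = record[4:52]                           # 48 bytes of register data
    return clock, mem_type, strap

# Example record built from the RX480 Elpida "200000" strap in the dump.
strap_hex = (
    "9990000000000000" "22aa1c0018f77e4f" "d055121946501708"
    "006c07011d0c1420" "fa8980ac030004c0" "1e19323781364718"
)
record = (200000).to_bytes(3, "little") + bytes([0]) + bytes.fromhex(strap_hex)

clock, mem_type, strap = parse_strap_record(record)
print(f"{clock / 100:.0f} MHz, type {mem_type}, strap {strap.hex()}")
```

This only splits the record; working out what the 48 strap bytes mean is the reverse-engineering part the question is about.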


Hah, you don't know the format and you're going to make a public tool? Your threats are like skate park swimming pools - empty :P

Looks like you haven't read the rest of the thread.  It took less than an hour to figure it out from the Linux drm code.


Not quite - they tell you part of the story - but look at MISC1, for example :3

Can you stop being a dick? You should be kind and considerate and thankful that people are working hard to document and unlock how all this works - knowledge should be shared, and distributed freely. This kind of optimization isn't just valuable to mining; it's valuable to a lot of operations (including scientific research, which requires a lot of compute power, and benefits quite heavily from this kind of optimization).

I've been understanding of how much you want to brag, because finding this information is hard work, but not everyone has the resources you do, through me.

nerdralph - you're doing really well. Keep it up. I'm really proud of you.

I didn't need to use all those resources - I decoded it a lot through trial and error, and public knowledge. But sure, I'll stop.
sr. member
Activity: 588
Merit: 251
So my first try at a custom strap didn't work (GPU crashed almost immediately when mining ETH).
custom 1900: 1500RAS, 1625CAS, MISC2, & ARB
777000000000000022CC1C00AD515A3ED0570F15B98CA50A004AE7001C0714207A8900A0030000001B11353F922A3217

A straight copy of the 1625 strap to 2000 works fine, while the 1500 strap gave errors even at 1900.  I tried taking the 1900 strap, RAS from the 1500, and CAS, MISC2 & ARB2 from the 1625 strap and using it for the 2000 strap.

My friend, you have a lot to learn... I was like you a few weeks ago; then I read all the documentation regarding GDDR5, and with a little help (well... not so little) I managed to understand what those timings actually do :)
Keep up the good work, by the way!
EDIT: I am also very keen to understand HBM/2 timings; if anyone has some knowledge of those (I already know the mode registers), any help via PM is highly appreciated!

I know how to make an optimized strap, just like I know how to re-shingle a shed.  But tying a tarp over the roof is a lot easier...