First - quick results for cn-trtl, w/ efficiency-focused settings, using timings mostly borrowed from others here, w/ some minor tweaks:
Vega 64 air, Ubuntu 18.04 + amdgpu-pro 18.50, TRM 0.4.3 (L18+18), 852 cclock (p0)/1107 mclock/818 mV, power readings at the wall
stock timings:
--CL 20 --RAS 33 --RCDRD 16 --RCDWR 10 --RC 47 --RP 14 --RRDS 4 --RRDL 6 --RFC 260 (--REF 3900)
18.5 kh/s @ 135w (137 h/w)
modded timings 1:
--CL 19 --RAS 28 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --RRDS 3 --RRDL 3 --RFC 248
19.75 kh/s @ 137w (144 h/w)
modded timings 2:
same as above, plus --REF 15600
20.71 kh/s @ 137w (151 h/w)
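To make the comparison above easy to re-run with your own numbers, here is a minimal sketch that stores the three timing sets as dicts, re-renders them as the same `--NAME value` flags, lists which timings changed vs stock, and recomputes h/w (hashes per second per watt). The values are copied from the results above; the dict layout and the helper names (`to_flags`, `changed`, `efficiency`) are hypothetical, just for illustration, and are not part of any miner or timing tool.

```python
from __future__ import annotations

# Timing sets and results copied from the post (Vega 64, cn-trtl, wall power).
# The dict layout and helper names are hypothetical, for illustration only.
STOCK = {"CL": 20, "RAS": 33, "RCDRD": 16, "RCDWR": 10, "RC": 47, "RP": 14,
         "RRDS": 4, "RRDL": 6, "RFC": 260, "REF": 3900}
MOD1 = {**STOCK, "CL": 19, "RAS": 28, "RCDRD": 12, "RCDWR": 5, "RC": 44,
        "RP": 12, "RRDS": 3, "RRDL": 3, "RFC": 248}  # REF stays at stock 3900
MOD2 = {**MOD1, "REF": 15600}                        # "same as above, plus --REF 15600"

# (label, timings, hashrate in h/s, wall power in W)
RUNS = [
    ("stock", STOCK, 18_500, 135),
    ("modded timings 1", MOD1, 19_750, 137),
    ("modded timings 2", MOD2, 20_710, 137),
]


def to_flags(timings: dict[str, int]) -> str:
    """Render a timing dict as '--NAME value' pairs, in insertion order."""
    return " ".join(f"--{name} {value}" for name, value in timings.items())


def changed(base: dict[str, int], new: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {name: (old, new)} for every timing that differs from base."""
    return {k: (base[k], new[k]) for k in base if base[k] != new[k]}


def efficiency(hashrate_hs: float, watts: float) -> float:
    """Hashes per second per watt (h/w)."""
    return hashrate_hs / watts


if __name__ == "__main__":
    base_hr = RUNS[0][2]
    for label, timings, hr, watts in RUNS:
        gain = 100 * (hr / base_hr - 1)
        print(f"{label}: {efficiency(hr, watts):.0f} h/w, {gain:+.1f}% h/r vs stock")
        print("   ", to_flags(timings))
    print("changed vs stock:", changed(STOCK, MOD2))
```

Keeping each set as "stock plus overrides" mirrors how the post describes them, so a typo in one timing shows up immediately in the `changed` output.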
Second - notes on power... I don't see any appreciable power difference - nor would I expect to. Clocks and voltages are untouched; we are simply transferring a bit more data. Even the 2w difference I'm reporting here is conservative - taking the natural fluctuations in my readings into account, my actual increase could be 1w or less. People seeing large power increases (at least on vega 64) seem to have something else going on.
Last - some conjecture / educated guessing re: THAT --REF THO!!! I'm assuming --REF is the refresh interval (the time between refreshes), in nanoseconds, so unlike most timings, a higher value (meaning less frequent refreshing) is better. Refreshes steal bandwidth, and AMD seems to have gone majorly conservative (aggressive?) on this, probably due to the super high temps of the HBM during normal/gaming use. As leakage increases w/ temps, more frequent refreshes would be required when running your GPU/HBM at high clocks/voltages. Since (efficient) miners tend to run cooler, the crazy high default refresh rate is really unnecessary. I found 4x the default to be around where returns quickly diminish, at least at my clocks - I can get maybe another 50 h/s (turtle) going to 4.5x. HOWEVER - if you run super aggressive for max h/r, or just aren't effectively cooled in general, you may want to dial this back, or you may start seeing mem errors / bad shares from data corrupted by insufficient refreshing / leakage.
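The "refreshes steal bandwidth, with quickly diminishing returns" reasoning can be sketched with a common back-of-the-envelope DRAM model: the fraction of time spent refreshing is roughly tRFC / tREFI. Everything here beyond the post's numbers is an assumption: that --RFC counts memory-clock cycles at the 1107 MHz mclock above, that --REF is the refresh interval in nanoseconds (the post's own guess), and that a memory-bound hashrate scales with the time not spent refreshing. Under those assumptions the predicted gain for 4x REF lands in the same ballpark as the observed 19.75 -> 20.71 kh/s (~+4.9%), and going from 4x to 4.5x adds only a fraction of a percent, which matches the shape of the diminishing returns described above. Treat it as a rough model, not how the memory controller actually schedules refreshes.

```python
# Rough refresh-overhead model: busy fraction ~= tRFC / tREFI.
# ASSUMPTIONS (not confirmed by the post):
#  - --RFC is in memory-clock cycles at the 1107 MHz mclock used above
#  - --REF is the refresh interval in nanoseconds (the post's own guess)
#  - hashrate scales with the fraction of time NOT spent refreshing
MCLK_MHZ = 1107
RFC_CYCLES = 248       # --RFC from modded timings 1/2
STOCK_REF_NS = 3900    # stock --REF


def refresh_overhead(rfc_cycles: int, ref_ns: float, mclk_mhz: float = MCLK_MHZ) -> float:
    """Fraction of time spent refreshing, approximated as tRFC / tREFI."""
    trfc_ns = rfc_cycles * 1000.0 / mclk_mhz  # convert cycles to ns
    return trfc_ns / ref_ns


def relative_hashrate(ref_ns: float, base_ref_ns: float = STOCK_REF_NS) -> float:
    """Predicted hashrate at ref_ns relative to base_ref_ns under this model."""
    busy_new = refresh_overhead(RFC_CYCLES, ref_ns)
    busy_base = refresh_overhead(RFC_CYCLES, base_ref_ns)
    return (1 - busy_new) / (1 - busy_base)


if __name__ == "__main__":
    for mult in (1, 2, 3, 4, 4.5, 6):
        ref = STOCK_REF_NS * mult
        overhead = 100 * refresh_overhead(RFC_CYCLES, ref)
        gain = 100 * (relative_hashrate(ref) - 1)
        print(f"REF {ref:>6.0f} ({mult}x default): "
              f"refresh overhead {overhead:.2f}%, predicted h/r {gain:+.2f}% vs 1x")
```

Each doubling of REF halves the remaining overhead, so most of the benefit is captured by the first few multiples; past that, any extra gain has to be weighed against the data-integrity risk mentioned above.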
After applying these timing mods to an 8x vega 64 rig, I have a better picture of power use, and my initial observation (no appreciable power difference) does not hold. On this 8x64 rig, I now see an avg increase of ~8-9w per card w/ my timings above, though that's still only a ~6% increase (for ~+12% h/r). However, I could see this delta possibly increasing for those running higher cclocks.
Also, tried reverting CL from 19 back to 20, and saw no appreciable change in h/r (maybe a 10 h/s drop).
With the revised power numbers, this now leaves me @ ~144 h/w for cn-trtl.
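The revised efficiency figure follows from simple arithmetic on the single-card numbers. This sketch assumes the ~8-9w rig average applies per card on top of the 135w stock reading, and uses the 20.71 kh/s from modded timings 2 (the small CL 20 vs 19 difference is ignored). The constant and function names are made up for illustration.

```python
# Efficiency trade-off using the rig-level power delta (per-card figures).
# ASSUMPTION: the ~8-9 W average increase applies per card on top of the
# 135 W stock reading; hashrate is the modded timings 2 result.
STOCK_HR, STOCK_W = 18_500, 135   # stock timings: h/s, wall W
MOD_HR = 20_710                   # modded timings 2 (--REF 15600): h/s


def pct(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return 100 * (new / old - 1)


def efficiency_after(delta_w: float) -> float:
    """h/w for the modded timings, given a per-card power increase."""
    return MOD_HR / (STOCK_W + delta_w)


if __name__ == "__main__":
    print(f"stock: {STOCK_HR / STOCK_W:.0f} h/w")
    for delta_w in (8, 9):
        print(f"+{delta_w} w: power {pct(STOCK_W + delta_w, STOCK_W):+.1f}%, "
              f"h/r {pct(MOD_HR, STOCK_HR):+.1f}%, "
              f"efficiency {efficiency_after(delta_w):.0f} h/w")
```

Since the hashrate gain (~12%) is roughly double the power increase (~6-7%), efficiency still improves over stock even with the higher power figure.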