Topic: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070 - page 27

full member
Activity: 243
Merit: 105
Why not use a vector store... damn it, a few more weeks and I'll be on the rails with OpenCL.

The vector store address must be aligned to 16 bytes, which is not possible in any round because of the different offsets.
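
To make the alignment constraint concrete, here is a minimal Python sketch. It is not SILENTARMY's code: the 32-byte slot size and the per-round payload offsets are made-up numbers, picked only to show how a byte offset that changes from round to round breaks 16-byte alignment.

```python
VECTOR_ALIGN = 16  # bytes needed for an aligned 128-bit vector store


def is_vector_aligned(address: int, align: int = VECTOR_ALIGN) -> bool:
    """Return True if a store at `address` could use an aligned vector write."""
    return address % align == 0


def store_address(base: int, slot: int, slot_size: int, offset: int) -> int:
    """Byte address of a slot's payload, `offset` bytes into the slot."""
    return base + slot * slot_size + offset


# Hypothetical layout: the payload starts at a different byte offset per round.
HYPOTHETICAL_OFFSETS = {1: 8, 2: 12, 3: 12, 4: 4}


def aligned_rounds(base: int = 0, slot: int = 5, slot_size: int = 32) -> dict:
    """Map each round to whether its store address is 16-byte aligned."""
    return {
        rnd: is_vector_aligned(store_address(base, slot, slot_size, off))
        for rnd, off in HYPOTHETICAL_OFFSETS.items()
    }
```

With these assumed offsets none of the rounds lands on a 16-byte boundary, while the same slot with offset 0 would.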
sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched vbios for 8 years...last time I modded vbios was when I got Radeon 9800Pro for PC and flashed it with Mac rom to put it in my G4 Smiley
So basically my knowledge of this is pretty much down to the minimum. Don't judge me too harshly... I just started mining (1/2 weeks ago) Smiley
lol, you will see me on that thread too. I was using that ROM for my Nitro 8G OCs. Thanks wolfO for the info.
Yeah, I edited my original BIOSes with Polarisbioseditor and replaced the 2000 MHz memory strap with the 1750 MHz one. Reflashed and overclocked the memory by 100 MHz. So I'm getting better results with the original overclocked BIOS... the only minus is power usage and fan noise. Still trying to find a way to undervolt vcore; all attempts unsuccessful for now.

Yeah, silentarmy does NOT take kindly to copied straps, I noticed. I had to re-do half of my 380's strap to get any improvements at all. This is likely because of the writes used - Eth you write once (more or less) and read frequently, just different pieces. Doing a read after a write incurs a delay, for example, so you have to be more creative.
For the "Ignorant" in this sphere, relying on "Creativity" equals "Stupidity", so better to wait and spare your hardware than to DO and waste your time AND hardware Smiley (referring to myself)

It's not really stupidity, I spent a long ass time learning more than I EVER, EVER wanted to know about GDDR5 SDRAM.
When you know what you are doing, it is actually fun... but without a clue it is pointless to even try (if you don't have spare hardware). All knowledge comes at a cost. If you are willing to pay it, you're on... else you simply use someone else's experiments and pray not to break anything. Look at the mass of miners (including me, well, I'm pretty new to this) using Claymore, optiminer, ccminer, sgminer and **miner without even looking into the code of these projects. For example, nicehash nhequiminer is still open source... some people just do cmake..make and then post stupid topics like "Why can't I specify a different stratum server other than nicehash?". Now I'm reading the Claymore threads... every second post is related to instability, broken systems, etc. Why? Simply because no one cares about their hardware if he/she sees big numbers on the screen. Let's take that 2.5% fee. It's really simple to gain it for yourself: just create a local proxy and add claymore's donation address as a worker, point the mining pool to your local address (e.g. 127.0.0.1) and you will have his 2.5% fee... Why am I writing this? Because sheep will never understand it and will laugh at me, or just because this is the right thing to do.

Hell, you could do even better - just tcpdump all the traffic for a few days, pull out any addresses that aren't yours and have a proxy (something like iptables) rewrite it such that the devfee address is changed to your own. I think there's a really simple tool for this, too - called netsed - but I didn't look really deeply into it.
Should I consider this sarcastic? Smiley
full member
Activity: 243
Merit: 105
I have been tweaking disassembled GCN codes of SA's kernels, and there seems to be quite a bit of room for performance enhancements, especially by optimizing global memory access by reordering flat_store_dword and s_waitcnt in ht_store(). @eXtremal, how are your next batch of optimizations coming along? If they are almost ready, I will wait for them. Otherwise, I will optimize the OpenCL kernel myself and then tweak the GCN code.

xor_and_store and ht_store must be rewritten and joined into one function.

Currently: unaligned 32-bit reads in xor_and_store -> join into 64 bits in half_aligned_long -> 64-bit xor in xor_and_store -> on rounds 2, 4, 6, 8 a 256-bit shift on xi0xi1xi2xi3 in xor_and_store -> 256-bit shift again in ht_store -> split into 32 bits and write in ht_store

This must be rewritten to:

unaligned 32-bit reads -> 32-bit xor -> 256-bit shift -> 32-bit, 64-bit, or vector store
or
64-bit reads -> 64-bit xor -> 256-bit shift in 64-bit words -> 64-bit or vector store
or
64- and 32-bit reads -> 64- and 32-bit xor -> mixed 256-bit shift -> 64-bit, 32-bit, or vector store

depending on the round
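
As a rough illustration of the first variant (32-bit reads, 32-bit xor, a single 256-bit shift, then the store), here is a Python sketch. It is not the kernel code: the word order (least-significant word first), the 8-word row, the shift amount, and the `table`/`slot` names are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF
WORDS = 8  # 8 x 32-bit words = 256 bits


def xor_words(a, b):
    """XOR two rows word by word (the 32-bit xor step)."""
    return [(x ^ y) & MASK32 for x, y in zip(a, b)]


def shift_right_256(words, bits):
    """Shift a 256-bit value, held as 8 words with the least-significant
    word first, right by `bits` (0 < bits < 32) in a single pass."""
    out = []
    for i in range(WORDS):
        low = words[i] >> bits
        # Bits shifted out of the next higher word fill the top of this one.
        high = (words[i + 1] << (32 - bits)) if i + 1 < WORDS else 0
        out.append((low | high) & MASK32)
    return out


def xor_shift_store(a, b, bits, table, slot):
    """Fused path: xor, shift once, store. No second shift at store time."""
    table[slot] = shift_right_256(xor_words(a, b), bits)
    return table[slot]


def to_int(words):
    """Reassemble the words into one Python integer (for checking only)."""
    return sum(w << (32 * i) for i, w in enumerate(words))
```

The point of the sketch is only that the shift happens once, between the xor and the store, instead of once in each of two separate functions.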

sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched vbios for 8 years...last time I modded vbios was when I got Radeon 9800Pro for PC and flashed it with Mac rom to put it in my G4 Smiley
So basically my knowledge of this is pretty much down to the minimum. Don't judge me too harshly... I just started mining (1/2 weeks ago) Smiley
lol, you will see me on that thread too. I was using that ROM for my Nitro 8G OCs. Thanks wolfO for the info.
Yeah, I edited my original BIOSes with Polarisbioseditor and replaced the 2000 MHz memory strap with the 1750 MHz one. Reflashed and overclocked the memory by 100 MHz. So I'm getting better results with the original overclocked BIOS... the only minus is power usage and fan noise. Still trying to find a way to undervolt vcore; all attempts unsuccessful for now.

Yeah, silentarmy does NOT take kindly to copied straps, I noticed. I had to re-do half of my 380's strap to get any improvements at all. This is likely because of the writes used - Eth you write once (more or less) and read frequently, just different pieces. Doing a read after a write incurs a delay, for example, so you have to be more creative.
For the "Ignorant" in this sphere, relying on "Creativity" equals "Stupidity", so better to wait and spare your hardware than to DO and waste your time AND hardware Smiley (referring to myself)

It's not really stupidity, I spent a long ass time learning more than I EVER, EVER wanted to know about GDDR5 SDRAM.
When you know what you are doing, it is actually fun... but without a clue it is pointless to even try (if you don't have spare hardware). All knowledge comes at a cost. If you are willing to pay it, you're on... else you simply use someone else's experiments and pray not to break anything. Look at the mass of miners (including me, well, I'm pretty new to this) using Claymore, optiminer, ccminer, sgminer and **miner without even looking into the code of these projects. For example, nicehash nhequiminer is still open source... some people just do cmake..make and then post stupid topics like "Why can't I specify a different stratum server other than nicehash?". Now I'm reading the Claymore threads... every second post is related to instability, broken systems, etc. Why? Simply because no one cares about their hardware if he/she sees big numbers on the screen. Let's take that 2.5% fee. It's really simple to gain it for yourself: just create a local proxy and add claymore's donation address as a worker, point the mining pool to your local address (e.g. 127.0.0.1) and you will have his 2.5% fee... Why am I writing this? Because sheep will never understand it and will laugh at me, or just because this is the right thing to do.
sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched vbios for 8 years...last time I modded vbios was when I got Radeon 9800Pro for PC and flashed it with Mac rom to put it in my G4 Smiley
So basically my knowledge of this is pretty much down to the minimum. Don't judge me too harshly... I just started mining (1/2 weeks ago) Smiley
lol, you will see me on that thread too. I was using that ROM for my Nitro 8G OCs. Thanks wolfO for the info.
Yeah, I edited my original BIOSes with Polarisbioseditor and replaced the 2000 MHz memory strap with the 1750 MHz one. Reflashed and overclocked the memory by 100 MHz. So I'm getting better results with the original overclocked BIOS... the only minus is power usage and fan noise. Still trying to find a way to undervolt vcore; all attempts unsuccessful for now.

Yeah, silentarmy does NOT take kindly to copied straps, I noticed. I had to re-do half of my 380's strap to get any improvements at all. This is likely because of the writes used - Eth you write once (more or less) and read frequently, just different pieces. Doing a read after a write incurs a delay, for example, so you have to be more creative.
For the "Ignorant" in this sphere, relying on "Creativity" equals "Stupidity", so better to wait and spare your hardware than to DO and waste your time AND hardware Smiley (referring to myself)
sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched vbios for 8 years...last time I modded vbios was when I got Radeon 9800Pro for PC and flashed it with Mac rom to put it in my G4 Smiley
So basically my knowledge of this is pretty much down to the minimum. Don't judge me too harshly... I just started mining (1/2 weeks ago) Smiley
lol, you will see me on that thread too. I was using that ROM for my Nitro 8G OCs. Thanks wolfO for the info.
Yeah, I edited my original BIOSes with Polarisbioseditor and replaced the 2000 MHz memory strap with the 1750 MHz one. Reflashed and overclocked the memory by 100 MHz. So I'm getting better results with the original overclocked BIOS... the only minus is power usage and fan noise. Still trying to find a way to undervolt vcore; all attempts unsuccessful for now.
sr. member
Activity: 728
Merit: 304
Miner Developer
I have been tweaking disassembled GCN codes of SA's kernels, and there seems to be quite a bit of room for performance enhancements, especially by optimizing global memory access by reordering flat_store_dword and s_waitcnt in ht_store(). @eXtremal, how are your next batch of optimizations coming along? If they are almost ready, I will wait for them. Otherwise, I will optimize the OpenCL kernel myself and then tweak the GCN code.
sr. member
Activity: 347
Merit: 255
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched vbios for 8 years...last time I modded vbios was when I got Radeon 9800Pro for PC and flashed it with Mac rom to put it in my G4 Smiley
So basically my knowledge of this is pretty much down to the minimum. Don't judge me too harshly... I just started mining (1/2 weeks ago) Smiley
lol, you will see me on that thread too. I was using that ROM for my Nitro 8G OCs. Thanks wolfO for the info.
legendary
Activity: 3248
Merit: 1070
ZEC, like ETH, wants you to max out the memory clock as much as you can on a 1070 to maximise hashrate - even if you have to drop the core clock some to keep the card stable.

 NVidia support for OpenCL is very much a grudging afterthought - they really want you to use CUDA 'cause they OWN CUDA.



That is strange, then, because I get more hash when I underclock my 1070 than when I overclock it; overclocking only helps with a very high OC, like +700 mem at least.
sr. member
Activity: 652
Merit: 266
Hey, which command can be used to reduce % to GPU ?
There is no such option right now.
hero member
Activity: 597
Merit: 500
Hey, which command can be used to reduce % to GPU ?
legendary
Activity: 1498
Merit: 1030
ZEC, like ETH, wants you to max out the memory clock as much as you can on a 1070 to maximise hashrate - even if you have to drop the core clock some to keep the card stable.

 NVidia support for OpenCL is very much a grudging afterthought - they really want you to use CUDA 'cause they OWN CUDA.

sr. member
Activity: 728
Merit: 304
Miner Developer
zawawa, on a 1070 what would be the optimal thing to do under clock or over clock the gpu?  

how much more do you think we can get out of the 1070's  

I don't own a 1070, so I cannot give you a definite answer. If my experience with 980 Ti is applicable, I would bump up the memory clock. I am pretty sure other people have better answers, though.

As for the second question, I would venture to guess that there is some room for performance improvements. Generally speaking, NVIDIA's OpenCL implementations suck balls, so CUDA ports should be much faster than the original OpenCL program, unless you take advantage of dynamic, run-time compilation of the kernel, which is only possible with OpenCL. We will see.
sr. member
Activity: 430
Merit: 254
Right now, I'm focusing on making cards hash more on memory-using algos across the board with VBIOS mods.

If you have any timings for Hynix BFR ICs that are similar in performance to The Stilt's Hynix AFR and Elpida BBBG timings I'd love to test them out Cheesy

I've not yet experimented with values enough to find out EXACTLY how they interact for best performance.

Oh well it was worth a shot. Cheesy I guess it is a very time-consuming thing to figure out. Stilt said himself it took 50 hours to do the timing set for one card.
hero member
Activity: 494
Merit: 500
zawawa, on a 1070 what would be the optimal thing to do under clock or over clock the gpu? 

how much more do you think we can get out of the 1070's 
sr. member
Activity: 728
Merit: 304
Miner Developer
Wolf, what is your private ZEC miner capable of ?

You're assuming I have one. Right now, I'm focusing on making cards hash more on memory-using algos across the board with VBIOS mods.

A quick question.
4 Identical RX 480 Sapphire OC 8GB. 2 of them handle well vdrop+ by ElioVP the other 2 make a lot of gpu faults... Any idea?
P.S. First 2 were clocked to 1236Mhz Core the other to 1306Mhz default.
Thanks in advance.
@zawawa
Any chance to gain access to your modifications?


I am sure I will release my code when eXtremal's new optimization comes out.
legendary
Activity: 1764
Merit: 1024


Scrypt GPU mining ended in the fall of '14 without private kernels. x11 started up shortly thereafter and became unprofitable at the beginning of winter. Gridseeds weren't ASICs either; the first ones weren't very profitable or good. You may have just remembered those little USB things coming out and thought 'well those were ASICs', they weren't. There were a lot of really bad ASICs. Gridseeds were never a good deal.

Unless you were running private kernels yourself, it wasn't happening.

What other algo are you looking at that's mature? Dagger doesn't count. That's a very niche scenario and it's bound almost exclusively by bus width. The GPUs never get a chance to even be close being fully utilized.

R9-290 has a 512bit bus as was already mentioned.

Who tests GPUs on sha-256? How about trying something remotely relevant to the discussion like say NeoS, Lyra2v2, or even x11. People haven't made optimized miners for Sha in years. As mentioned before if you're talking about 'theoretical usage' scenarios, video games are a very good example of that as GPUs are made to run as fast as possible on them.

Memory usage doesn't need to be about bandwidth or bus width, it could just be the total memory usage as well. Not just that, it doesn't need to be restricted JUST to throughput, it can utilize memory and still do a lot of processing on GPUs. At this point though you're just making shit up and theorycrafting again.

You can blame latency all you want, but Fury not only has a 4096 bit bus, but also gobs of memory bandwidth, and it's not eight times faster than the R9-290 or even twice as fast. It's not just all about memory speeds here or even latency.

 The Gridseed 3355 WAS in fact an ASIC - and on scrypt it was more efficient than anything GPU based at the time by quite a bit. A single side of an "80 blade" would pull 2.5 Mhash/sec at 40 watts where the best GPUs of the time were pulling less than half that at a LOT more power (7990 was an exception with its pair of cores, it could actually manage a bit more than half the hashrate but pulled a TON more power to do so).

 Dagger (ETH) isn't "bus width limited", it's memory access limited - NOT the same thing, or the RX 480 wouldn't even be close to matching the R9 290 on hashrate.

 For MOST usage, the Fury is a LOT faster than the R9 290 - but on ETH it's barely in the same ballpark despite the much higher "in theory" memory bandwidth. *SOMETHING* certainly keeps it uncompetitive with much older cards with lower rated memory bandwidth.



ASIC in name, not in the ability to disrupt network difficulty. They were worthless when it came to a cost/performance ratio. Only thing they had was efficiency, but they hashed so little it didn't even matter.

Dagger is definitely limited by bus width with the exception of the outliers such as HBM and GDDR5-X. Cards are different, but you can sum up general performance by that. You're welcome to offer some of your amazing specs to dispute that if you wish.


 (cores can't be compared across generations of cards or chip makers).

 AMD cores in the GCN generations have been pretty consistent in their performance, if anything they've gotten a hair MORE efficient with generational changes.

 Comparing GCN to Terrascale cores or to NVidia cores (which I've NOT DONE AT ALL, strawman comment there) is a lot more problematical.

AMD reused chips across multiple card generations, such as 2XX and 3XX. You can't compare compute cores between a Fury and a Hawaii or to different manufacturers like Nvidia. Most of the time the architecture changes with each new generation, but AMD did rebrands. I'm talking particularly about different generations of chips, not rebrands. if you want to read into that and argue rebrands go for it.

Let's talk about it, then. Most 290Xs (I've had three, and others report the same) clock to 1500 on the memory. 20% OC on the clock.
Most 480s can BARELY get to 2250 - which, over a stock clock of 2000, is only a 12.5% increase. Oh, yeah, and that's still on a smaller bus.
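
The percentages above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 1250 MHz stock memory clock and 512-bit bus for the 290X and the 256-bit bus for the 480 are the reference specs as I understand them (the thread itself only gives the overclocked figures and the 480's 2000 stock), and the bandwidth formula assumes GDDR5 moving 4 bits per pin per memory clock.

```python
def oc_percent(stock_mhz: float, oc_mhz: float) -> float:
    """Overclock headroom as a percentage over the stock clock."""
    return (oc_mhz / stock_mhz - 1.0) * 100.0


def gddr5_bandwidth_gbs(mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s, assuming 4 transfers per pin per memory clock."""
    bits_per_second = mem_clock_mhz * 1e6 * 4 * bus_width_bits
    return bits_per_second / 8 / 1e9


# Assumed reference specs: (stock memory MHz, overclocked MHz, bus width in bits).
CARDS = {
    "290X": (1250, 1500, 512),
    "RX 480": (2000, 2250, 256),
}

for name, (stock, oc, bus) in CARDS.items():
    print(name, round(oc_percent(stock, oc), 1), "% OC,",
          round(gddr5_bandwidth_gbs(oc, bus)), "GB/s peak at the overclock")
```

Under those assumptions the 290X gains 20% and the 480 gains 12.5%, and the wider bus still leaves the overclocked 290X ahead on peak bandwidth.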

Yup. I didn't even hit on that one, but there definitely isn't more room on AMDs newer models. They made sure to tighten that up as much as possible
sr. member
Activity: 430
Merit: 254
Right now, I'm focusing on making cards hash more on memory-using algos across the board with VBIOS mods.

If you have any timings for Hynix BFR ICs that are similar in performance to The Stilt's Hynix AFR and Elpida BBBG timings I'd love to test them out Cheesy
sr. member
Activity: 652
Merit: 266
Quote
Well... erm... dunno how to say this, but not only is vdrop+.rom not edited in Heliox's style, it doesn't seem to be edited for voltage AT ALL - even in the ways that DON'T work. All the DPM states on vdrop+.rom point stock (into the voltage table, which has not been changed). The core clocks have been dropped pretty much across the board, though. Default memory clock was changed to 2080, but the rest of the memory states are untouched.

Heliox/Eliovp would (in a low-power ROM) have added a new VID for the initialization of the regulator in VoltageObjectInfo (changing the length of the table), as well as a value for it, which allows for global core undervolts that apply to every power state.
https://forum.ethereum.org/discussion/9650/sapphire-rx-480-nitro-oc-8gb-11260-01-20g-modded-bios-29-mh-downvolt
Here is where I got these bioses. I haven't touched vbios for 8 years...last time I modded vbios was when I got Radeon 9800Pro for PC and flashed it with Mac rom to put it in my G4 Smiley
So basically my knowledge of this is pretty much down to the minimum. Don't judge me too harshly... I just started mining (1/2 weeks ago) Smiley

Heh, don't worry about it, I won't. I'll just tell you what's there. Or not there. Looking it over now.
I really appreciate your opinion and help.
Thank you. PM your T address and I will put one VM core to mine on it 24/7.

Okay, so, the regulator initialization hasn't been touched, so there's no global core under/over volting going on - at least not that way. But... what is this? Unlike the other two ROMs you posted, this does undervolt the states... in a really bizarre fashion. Instead of updating the pointers in the DPM states into the voltage table to all point to DPM state 3... it has written DPM state 3's voltage value into the voltage table for entries 4, 5, 6, and 7. I can only guess at the cause of this oddity, but it might be something Polaris BIOS Editor does.
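
To spell out the difference between the two undervolting styles described above, here is a small Python sketch. The table layout and the millivolt values are placeholders I made up; a real VBIOS stores these in binary tables, not Python lists.

```python
# Hypothetical voltage table (mV), entries 0..7. Values are placeholders.
STOCK_TABLE = [800, 850, 900, 950, 1000, 1050, 1100, 1150]
# Hypothetical stock pointers: DPM state i points at table entry i.
STOCK_POINTERS = list(range(8))


def effective_voltages(pointers, table):
    """Voltage each DPM state ends up with after following its pointer."""
    return [table[p] for p in pointers]


def undervolt_by_repointing(pointers, cap_state=3):
    """Leave the table alone; states above `cap_state` point at its entry."""
    return [min(p, cap_state) for p in pointers]


def undervolt_by_overwriting(table, cap_state=3):
    """Leave the pointers alone; copy entry `cap_state` into higher entries."""
    return [v if i <= cap_state else table[cap_state]
            for i, v in enumerate(table)]


repointed = effective_voltages(undervolt_by_repointing(STOCK_POINTERS), STOCK_TABLE)
overwritten = effective_voltages(STOCK_POINTERS, undervolt_by_overwriting(STOCK_TABLE))
```

In this toy model both routes give states 4 through 7 the same voltage as state 3; they differ only in which table gets modified, which is the oddity being pointed out.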

The 1625 strap has been copied up to 1750 and 2000 - I would HIGHLY advise against this for stability reasons on Samsung memory. Elpida and Hynix are happy even taking the 1500 strap all the way up to 2000, but Samsung hates it in my experience, and while it may look good at first, keeping the cards running will probably be hell.

The clocks in this VBIOS have been edited for most (if not all) of the DPM states - similar to your v4.rom and vdrop+.rom. Default memclock on it is 2080 for state 1 (performance) - the idle state is untouched.

The name of the VBIOS image file implies it was dumped from a Sapphire Nitro+, but it doesn't look like it to me. Sapphire Nitro+ cards (all 470s and 480s I've seen, regardless of mem type) have an offset in the voltage regulator init that does a (by default) overvolt of +25mV to core across all power states. This VBIOS does not. But it does go to a card with the same connectors - 2 DisplayPort and 2 HDMI Type A, and one DVI-D.

You don't need to donate, don't worry about it.
Ok, so your advice is to return them to stock until a valid and tested vbios is released for the OC version?


I would recommend copying the 1750 strap to 2000 for stability - you should also be able to clock higher. As for the voltage mods... honestly, not entirely sure they will take effect (at least not on both Windows and Linux), but if they drop wattage for you and are stable, then use 'em.
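
For anyone unsure what copying a strap means, here is a toy Python model. It assumes the common description of the strap table (each entry covers memory clocks up to an upper bound, and the first entry that covers the clock is used); the timing values are placeholder strings, not real timings.

```python
import bisect

# Hypothetical strap table: (upper clock bound in MHz, placeholder timings).
STRAPS = [(1500, "T1500"), (1625, "T1625"), (1750, "T1750"), (2000, "T2000")]


def strap_for_clock(straps, clock_mhz):
    """Pick the first strap whose upper bound covers the memory clock."""
    bounds = [bound for bound, _ in straps]
    i = bisect.bisect_left(bounds, clock_mhz)
    if i == len(straps):
        raise ValueError("clock is above the highest strap")
    return straps[i][1]


def copy_strap(straps, src_mhz, dst_mhz):
    """Return a new table where the `dst_mhz` entry reuses `src_mhz` timings."""
    timings = dict(straps)
    timings[dst_mhz] = timings[src_mhz]
    return sorted(timings.items())


modded = copy_strap(STRAPS, 1750, 2000)
```

In this model, a card running at 2000 on `modded` uses the 1750 timings, while the original table is left untouched.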
Sticking to this config for now I guess then. I don't see any gpu faults for now.
When my new 480s arrive I will play with one of them... just now I need them to mine for a while to be able to pay the power bill with this freefall of ZEC Smiley
Thank you for the time spent on my issues.
legendary
Activity: 1176
Merit: 1015

You don't need to donate, don't worry about it.

I really love this guy.

Most of the time he just puts some gems in front of our noses.