Topic: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner - page 2. (Read 877850 times)

member
Activity: 81
Merit: 1002
It was only the wind.
What is the optimal Graphics Card for this currently?
Needing a new graphics card any way, may as well let mining decide for me as I don't play many modern games.
R9 Nano, R9 3xx, R9 Fury etc?
Electricity is 8 cents per kWh.

For "this." No algo specified, no coin specified, we'll just read it from your mind, then.
member
Activity: 81
Merit: 1002
It was only the wind.
This is the exact error message I get trying to build sgminer 5.2.1

Code:
  CCLD     sgminer
sgminer-algorithm.o:(.data+0x1058): undefined reference to `lyra2rev2_regenhash'
collect2: error: ld returned 1 exit status



This is fixable! Give me a minute, I'll clone NH's repo on Freya, check out the 5.2.1 tag, and do it myself so I can give clear, exact instructions on fixing it - I'll do a quick test on mining with it, too.

Thanks! Managed to FU my linux install trying to put on the full fglrx drivers from the .run file. Seems to be a problem with the xorg file but I can't work out what. Would be good to have ADL to OC the AMD card.

Didn't test mining, but add this to the end of Makefile.am and redo autogen.sh:

Code:
sgminer_SOURCES += algorithm/lyra2rev2.c algorithm/lyra2rev2.h algorithm/lyra2v2.c algorithm/lyra2v2.h algorithm/spongev2.c algorithm/spongev2.h

member
Activity: 81
Merit: 1002
It was only the wind.
I hear several algos break pretty bad on Tonga cards - which there are quite a few of, now. 285, 380, 380X...

What do you mean by that? Certain algo optimizations for Tonga aren't giving expected results? If so, :( I have a few Tongas, but haven't played with them enough. Couldn't start the optimized win-32-bit shit at all; on stock 64-bit sgminer, Tonga hashes way slower compared to Tahiti... although temps rarely exceed 50 degrees.

No, the optimizations for Tonga were never sold to NiceHash from whoever ripped off Kachur - so it's less that they're not giving expected results... more like they're giving EXACTLY the expected results.
member
Activity: 81
Merit: 1002
It was only the wind.
How does one build a linux binary from a release tarball and not directly from the latest git clone?
I've found the windows 5.3.0 release binary to have a detrimental impact on the qubit algo compared to the 5.2.1 version. Not tried others yet.
Both the linux release binary and a linux build core dump on most algos on my linux system, hence I would like to try to build the 5.2.1 version on linux.

If your question is "How do I get the sgminer-5-x-x-optimized version built on linux and use those optimized bins" you can't. I've reverse engineered the darkcoin-mod GPU binary (although it took like 2min) in order to figure out how to patch sgminer (the host code) to run that binary - but any binary that's pre-supplied by NiceHash may or may not ONLY be compatible with the Windows binary provided by NiceHash.

Before you try to tell them they have to provide the sources for at least the SGMiner host code for the binary Windows executables they distribute - don't bother. I tried that on this thread, along with someone else - they claim they don't have the sources, and I actually believe them. Kachur obviously didn't sell it to them, so they bought it basically stolen off someone Kachur sold it to - hence the lack of support for several cards as there are no bins, and the fact that NH would have nothing to gain hiding the sources of the binaries if they had them.
member
Activity: 81
Merit: 1002
It was only the wind.
Together with the release of new features on NiceHash (see https://www.nicehash.com/index.jsp?p=news&id=64) we also released a new version of sgminer with some bugfixes and Blake256 algorithms integration, see Release here: https://github.com/nicehash/sgminer/releases/tag/5.3.0

Keep on hashing ;)
In the latest NiceHash Miner 1.3.0, on a Radeon 7850 with Lyra2REv2, sgminer 5.1.0 creates a ckolivas bin and all shares are rejected. Maybe it's an error with the algo name which NiceHash sends to sgminer. Runme.bat works fine with this algo. And maybe a suggestion for the next NiceHash release: do not hide the miner windows during benchmark (option in settings?). Sometimes it's interesting to know what is happening at any moment.

I hear several algos break pretty bad on Tonga cards - which there are quite a few of, now. 285, 380, 380X... I would be willing to bet the same happens for Fiji cards.

EDIT: Come to think of it, that will likely happen for ALL cards with instruction sets incompatible with GCN 1.0/1.1. They were only sold stolen Pitcairn and Tahiti bins, apparently - I just did a diff on the darkcoin-mod ones for "SGMiner 5.1.1-optimized" that they package with NiceHash Miner, and they're exactly the same. Thus Hawaii may have some room left for optimizations, seeing as they don't even have GCN 1.1 code, just GCN 1.0, which happens to be compatible.

I also happened to notice not only did they NOT compile SGMiner statically for Windows, but they used shared libs in such a stupid way that I'm forced to assume they don't know how to do a static build. The cURL lib, pdcurses lib, and several others are actually DUPLICATED in the different folders - the point of shared libraries is, you know, to share them among executables. On top of this, there's no merging of ccminer code at all. Hey, just put in three ccminer binaries totaling 50MB and fuck it, it's all good. Three separate cpuminer binaries, too - for different CPU feature levels. I guess because calling cpuid to figure it out and using if/else is too damned hard. All the GPU miners are 32-bit PE executables, except ethminer? Then the CPU miners are PE32+ (x64) binaries, too. I can get that they want to support GPU mining on 32-bit OSes, too, but then wtf is with ethminer? Fuck it, I don't understand the whole thing.
member
Activity: 81
Merit: 1002
It was only the wind.
Wolf0, still poking at the miner's kernels :D

By the way, I looked at the new sgminer code and saw a new neoscrypt kernel made by you.
I tried to use it but it makes my miner hang... Do we still require the 14.6 drivers?

15.7 should be fine, but don't blame me for whatever they may have done on the host side.
member
Activity: 81
Merit: 1002
It was only the wind.
I speak to him regularly; he told me he did it.

Oh, so he implemented it in sgminer? I'm interested, where can I see the commits? Would love to pull this into our fork, please let me know if you get that info.

Secondly, why do you care one iota if the block header is rebuilt or not? As long as there is a response to stratum reconnect messages, it will work for any rig rentals.

Yeah, but NiceHash is hash rental and not rig rental ... we do share validation and other stuff and we highly depend on full strict stratum implementation...

Best regards,
kenshirothefist

He didn't release it - which is why I said talk to him. You'll have to probably buy it off of him.

Also, I get this - but this would also mean that you're unable to support coins with an uncommon block header, due to trying to calculate a new block header based on the extranonce and failing because you don't know what the header is supposed to look like.

If you found it would be profitable enough, you'd do it. It wouldn't even be hard - instead of rebuilding a block header, you simply put in the 12 bytes of nonce and hash. The pool has one nonce that's immutable to the miner to prevent duplicate work among workers - the rest you can go nuts with. Simply increment and hash - no need to do all that bullshit.
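
A minimal sketch of the increment-and-hash idea described above, not anyone's actual miner code. The nonce offset, the 4/8 split between pool-owned and miner-owned bytes, and the use of `hashlib.blake2s` as a stand-in hash are all assumptions for illustration; Decred uses 14-round BLAKE-256, which Python's standard library does not provide.

```python
import hashlib

NONCE_OFFSET = 140   # hypothetical offset of the 12-byte nonce field
POOL_BYTES = 4       # hypothetical: bytes fixed by the pool for this worker
MINER_BYTES = 8      # hypothetical: bytes the miner may change freely


def search(header: bytes, pool_nonce: bytes, target: int, max_tries: int):
    """Increment the miner-owned nonce bytes and hash; never rebuild the header."""
    if len(pool_nonce) != POOL_BYTES:
        raise ValueError("pool nonce must be POOL_BYTES long")
    if len(header) < NONCE_OFFSET + POOL_BYTES + MINER_BYTES:
        raise ValueError("header too short for the assumed nonce field")
    buf = bytearray(header)
    # The pool-owned part is written once and never touched again.
    buf[NONCE_OFFSET:NONCE_OFFSET + POOL_BYTES] = pool_nonce
    start = NONCE_OFFSET + POOL_BYTES
    for counter in range(max_tries):
        buf[start:start + MINER_BYTES] = counter.to_bytes(MINER_BYTES, "little")
        digest = hashlib.blake2s(bytes(buf)).digest()  # stand-in, NOT BLAKE-256
        if int.from_bytes(digest, "little") <= target:
            return counter, digest
    return None
```

The miner only needs the header bytes as delivered and the position of the nonce field; it does not need to understand the rest of the layout.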

Either you think Decred isn't worth the effort/dev costs, or the devs you've chosen are woefully incapable... I got curious and checked the commit history when I tried the miner as soon as it came out against Decred and it didn't work.
member
Activity: 81
Merit: 1002
It was only the wind.
You mean besides yiimp and Suprnova? I could do it, but tpruvot/Epsylon3 already did - ask him for it.

Well, unfortunately this is not stratum, really. It's just getwork over stratum. Still waiting for "real" stratum support, both in sgminer/ccminer and pools.

I could do it, but tpruvot/Epsylon3 already did - ask him for it.

tpruvot only works on ccminer, there is a lack of development on sgminer.

I speak to him regularly; he told me he did it. Secondly, why do you care one iota if the block header is rebuilt or not? As long as there is a response to stratum reconnect messages, it will work for any rig rentals.
member
Activity: 81
Merit: 1002
It was only the wind.
Together with the release of new features on NiceHash (see https://www.nicehash.com/index.jsp?p=news&id=64) we also released a new version of sgminer with some bugfixes and Blake256 algorithms integration, see Release here: https://github.com/nicehash/sgminer/releases/tag/5.3.0

Keep on hashing ;)

Implemented Blake-256 14 round and broken for Decred, the ONLY coin worth mining using it. *slow clap*

We are focused on algos that can be offered on NiceHash. Decred still hasn't got any stratum support. But Wolf0, you're really very welcome to submit a pull request to add support for the "Blake-256 14 Decred" algorithm. Can you do this?

You mean besides yiimp and Suprnova? I could do it, but tpruvot/Epsylon3 already did - ask him for it.
member
Activity: 81
Merit: 1002
It was only the wind.
Hi guys,

I decided to upgrade my 7950's to 280x's but I'm having a mare with AMD drivers.

Also I can't remember the relationship between driver and SDK.

What driver and SDK should I install on Windows 8.1 ?

Many thanks

Lee


Newest version of both will work fine :)

I always wanted to ask that - why do you even need SDK?
I don't have it installed and mining just fine.

You don't, unless you're building the miner.
member
Activity: 81
Merit: 1002
It was only the wind.
I hope WOLF0 will make a good public algo for Decred coin when the mining starts.

PS: Wolf0, you must know that you are my favorite. You prove your work with facts and not with many sauces. THANKS FOR ALL!

I hope Wolf0 will prove me wrong, but I fear there is not a lot to optimise on blake 14 rounds.

I might be able to prove you half wrong. Got a 6970?

No, smallest card I have is a 280x

Too bad, I could probably make that thing do wonders on Blake. I've got an idea!

Maybe a 5750 (juniper) will do? :-)

VLIW5, is it? Perhaps.

I hope WOLF0 will make a good public algo for Decred coin when the mining starts.

PS: Wolf0, you must know that you are my favorite. You prove your work with facts and not with many sauces. THANKS FOR ALL!

I hope Wolf0 will prove me wrong, but I fear there is not a lot to optimise on blake 14 rounds.

I might be able to prove you half wrong. Got a 6970?

No, smallest card I have is a 280x

Too bad, I could probably make that thing do wonders on Blake. I've got an idea!

Maybe a 5750 (juniper) will do? :-)

Got a 78** Series. Maybe that'll do?

GCN based. No go.
member
Activity: 81
Merit: 1002
It was only the wind.
I hope WOLF0 will make a good public algo for Decred coin when the mining starts.

PS: Wolf0, you must know that you are my favorite. You prove your work with facts and not with many sauces. THANKS FOR ALL!

I hope Wolf0 will prove me wrong, but I fear there is not a lot to optimise on blake 14 rounds.

I might be able to prove you half wrong. Got a 6970?

No, smallest card I have is a 280x

Too bad, I could probably make that thing do wonders on Blake. I've got an idea!

Maybe on Fiji as well?

Nope, Fiji is GCN based. No go.
member
Activity: 81
Merit: 1002
It was only the wind.
Wolf0 I'm curious to know if you tried that technique (split to multiple work items) on a kernel and how was the outcome.

Tried it with JH, it worked well, but was kinda underwhelming. I used LDS and not shuffle, though - additionally, it would probably benefit a lot more if the throughput was lower and MOST X11 kernels were properly done to handle multiple hashes per work-item. Would lessen the LDS usage on JH, at least, gaining waves in flight.
member
Activity: 81
Merit: 1002
It was only the wind.
That shows the importance of simd. Now you know where to work ;-)
Thanks for the information, please keep us updated on the progress!
Please tell me if I can be of any help.

I don't know - I know one thing for certain now, though - Kachur's Blake was about the same speed as mine, his BMW, however, needed a bit of work. Mine improved overall X11 hash by 1.35% (remember that BMW itself is quite a small part of X11, so the raw improvement in my BMW over his is much larger.)
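
To make the parenthetical concrete, here is a rough back-of-the-envelope sketch. It assumes the X11 stages run one after another, so total time is the sum of the stage times. The 5% time share for BMW is a made-up figure for illustration, not a measurement from this thread.

```python
def kernel_speedup(overall_gain: float, time_share: float) -> float:
    """Speedup of one stage implied by an overall hashrate gain.

    overall_gain: e.g. 0.0135 for a 1.35% higher overall hashrate.
    time_share:   fraction of the original total time spent in that stage
                  (hypothetical input; it has to be measured on a real rig).
    """
    # A hashrate that is (1 + g) times higher means the total time per hash
    # shrank by 1 - 1/(1 + g) of its original value.
    time_saved = 1.0 - 1.0 / (1.0 + overall_gain)
    if time_saved >= time_share:
        raise ValueError("stage is too small to explain that gain")
    return time_share / (time_share - time_saved)


# With an assumed 5% share, a 1.35% overall gain implies the BMW stage
# alone became roughly 1.36x faster.
print(round(kernel_speedup(0.0135, 0.05), 2))
```

The smaller the stage's real share of the total, the larger the raw improvement needed to produce the same overall gain.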

EDIT: search2, originally Groestl-512, did not take to a simple kernel replacement and will have to be investigated further (manual study of the disassembly.) Skipping for now.

EDIT2: search2 may not have been fucked because of a difference in output, but in where the bloody constants are in global. For JH, I'm going to make an all new test kernel which takes a read-only buffer for JH's constants rather than trying to reference constant memory right now. Simpler. I should then be able to put that in place of the Kachur JH and modify SGMiner to pass a constant buffer on that kernel.

Is search2 faster than yours? or is it just simd?
Maybe Kachur has found a way to make AES-like algos better...
BTW I wouldn't mind a frankenbin if it's faster and stable ;-)

I can't tell - without straight up replacement of a kernel, I dunno if he's done some kind of fuckery with part of a hash in one kernel, and part in another, for example. What I suspect is SIMD has been cut into two parts (at least.)

Now, even if his Groestl is faster than mine, my current Groestl is outdated anyways. My R & D area has a bitsliced Groestl that I have not yet played too much with - parallelization using 4 work items like it's done in CUDA should be possible. I can drop to GCN ASM for ds_swizzle_b32 - limits me to a 4-way, as it's not a 32-way shuffle like CUDA, but it's enough for me. I've just got a lot to do atm - maybe there is something we could work on together... a Groestl, perhaps? If you could look at the code and see if you could split it over multiple work-items and use LDS for the data sharing, I could probably remove said LDS usage by disassembling and modifying the kernel before reassembling it?

SIMD: tonight I was thinking about it and slicing into two parts is the natural way of doing it; I think I could try that. The only little annoyance is that the data to be passed between the (at least two) parts won't just be a hash but a bigger set of data, so the standard sgminer searchX() system wouldn't work.

GROESTL (and similar): I always had the idea that nvidia had to do the bitslice thing because shared memory was slower than on GCN; in fact nvidia bitsliced is on par with GCN LDS. As a logical consequence, I think that if bitslice on GCN is presumed to be slower than on nvidia, I wouldn't even try it.

You might not be looking at the big picture with Groestl - look at that fucking shitty amount of waves in flight you get due to LDS (ab)use.

That's an issue with <= tahiti only, hence why I hate optimizing for those chips ;-)

Not the case - two waves in flight, and your kernel is STILL not actually using the GPU's parallelism like it's supposed to be. One Groestl-512 hash is a big job, and it's parallelizable. If you're doing a throughput of 64 hashes per local workgroup, then use 256 for Groestl, and do 4 work-items per actual hash. Tune to taste.
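
The 64-hashes, 256-work-items arrangement can be sketched as an index mapping. This is a plain Python model of the arithmetic a kernel would do with its group and local IDs. Putting the four partners in neighbouring lanes is my assumption (it is the layout a 4-way swizzle would want), and the function name is hypothetical.

```python
HASHES_PER_GROUP = 64   # throughput per workgroup, from the post above
ITEMS_PER_HASH = 4      # work-items cooperating on one hash
LOCAL_SIZE = HASHES_PER_GROUP * ITEMS_PER_HASH  # 256 work-items


def map_work_item(group_id: int, local_id: int):
    """Return (nonce_index, part) for one work-item of a 256-wide workgroup."""
    if not 0 <= local_id < LOCAL_SIZE:
        raise ValueError("local_id out of range")
    # Assumed layout: adjacent work-items share a hash, so lanes 0..3 work
    # on the first hash, lanes 4..7 on the second, and so on.
    hash_in_group, part = divmod(local_id, ITEMS_PER_HASH)
    nonce_index = group_id * HASHES_PER_GROUP + hash_in_group
    return nonce_index, part
```

"Tune to taste" then amounts to changing `ITEMS_PER_HASH` and seeing what the hardware prefers.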

I understand what you mean: it's like the good old cgminer "vector size". I will think about it.
Besides, I haven't worked on groestl for a long while, but on whirlpool and variants I can easily get 3 waves on >= hawaii.
It's a lighter job, I know, but I haven't had any interest in developing groestl recently.

No, it is the OPPOSITE of vector size. You don't get how the GPU is ACTUALLY supposed to solve issues, I don't think - it really doesn't fucking like large code size, or very complex problems in one work-item - you know this.

Vectors were profitable before because of the old architectures - VLIW based. GCN abolished hardware vectors, and instead made VGPRs 4 bytes. Why, you may ask? Occupancy! This way, if you need to work on a problem that can't be efficiently vectorized like that, you don't waste most of your VGPR.
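
The occupancy argument can be put in numbers. A rough sketch, assuming the commonly published GCN figures (256 32-bit VGPRs per lane, at most 10 wavefronts per SIMD, VGPRs handed out in blocks of 4) and ignoring SGPR and LDS limits entirely; the function name is made up for this example.

```python
MAX_WAVES_PER_SIMD = 10   # assumed GCN cap on resident wavefronts per SIMD
VGPRS_PER_LANE = 256      # assumed 32-bit VGPRs available to each lane
VGPR_GRANULARITY = 4      # assumed allocation block size


def waves_in_flight(vgprs_per_work_item: int) -> int:
    """Wavefronts per SIMD allowed by VGPR usage alone."""
    if not 1 <= vgprs_per_work_item <= VGPRS_PER_LANE:
        raise ValueError("VGPR count out of range")
    # Round the request up to the next allocation block.
    blocks = -(-vgprs_per_work_item // VGPR_GRANULARITY)
    allocated = blocks * VGPR_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


# 32 live values held as uint4 (4 VGPRs each) versus as plain uints:
print(waves_in_flight(32 * 4))  # 128 VGPRs
print(waves_in_flight(32))      # 32 VGPRs
```

Under these assumptions the vectorized version is limited to 2 waves while the scalar one gets 8, which is the waste being described.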

But, but, but... mah parallelism! GCN has you covered - you just need to think of the shit differently. Instead of parallelizing in vectors, do it in work-items. To give you the cleanest example I've worked with demonstrating this (in X11), take Echo-512.

You have a 256 byte state which I'll now refer to as W. W can be represented as an array of 16 uint4s. If you're looking at the shitty darkcoin-mod.cl trying to visualize this, just look at the 64-bit W vars and imagine them as 32-bit, and an array. Now, if I was going to demonstrate this technique with Echo - I have an array of 4 uint4s. This is my W. To figure out which part of the hash you are, you can choose two ways: launch the kernel with throughput * 4, 1, 1 local size, or do throughput, 4, 1 local size. Since the latter is cleaner, I'll assume that notation: lid = get_local_id(0), and hashid = get_local_id(1).

If hashid is < 2 (i.e. 0 or 1), we fill up W with (512, 0, 0, 0) (uint4, remember) over all four array indices. If hashid == 2, W becomes the input (input being 16 uints, this may be represented as 4 uint4s as well), and if hashid == 3, we fill up W with the odds & ends - for X11, these are (0x80, 0, 0, 0) for W[0], (0, 0, 0, 0) for W[1], (0, 0, 0, 0x02000000) for W[2], and (512, 0, 0, 0) for W[3]. Now, go pull up darkcoin-mod.cl, and look at it until this and the previous paragraph make sense.
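
The fill described in the last two paragraphs can be modelled on the host in a few lines. This is only a Python sketch of which work-item owns which quarter of W, with uint4s written as 4-tuples; it is not the kernel, and the function names are mine.

```python
def initial_w(hashid: int, message_words):
    """Return the 4 uint4 words of W owned by work-item `hashid` (0..3)."""
    if len(message_words) != 16:
        raise ValueError("input must be 16 uints (one 64-byte hash)")
    if hashid in (0, 1):
        # First half of the state: (512, 0, 0, 0) in every slot.
        return [(512, 0, 0, 0)] * 4
    if hashid == 2:
        # Third quarter: the 16 input uints packed as 4 uint4s.
        return [tuple(message_words[i:i + 4]) for i in range(0, 16, 4)]
    if hashid == 3:
        # Last quarter: the odds & ends listed in the post for X11.
        return [(0x80, 0, 0, 0), (0, 0, 0, 0),
                (0, 0, 0, 0x02000000), (512, 0, 0, 0)]
    raise ValueError("hashid must be 0..3")


def full_state(message_words):
    """Concatenate the four quarters into the 16-uint4 (256-byte) state."""
    return [w for hashid in range(4) for w in initial_w(hashid, message_words)]
```

Each work-item only ever materialises its own four words; `full_state` exists here just to check that the four quarters add up to the 16 uint4s of the original W.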

I'll continue with rounds and output calculation in another post in just a bit.