
Topic: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner - page 3. (Read 877877 times)

member
Activity: 81
Merit: 1002
It was only the wind.
That shows the importance of simd. Now you know where to work ;-)
Thanks for the information, please keep us updated on the progress!
Please tell me if I can be of any help.

I don't know - I know one thing for certain now, though - Kachur's Blake was about the same speed as mine; his BMW, however, needed a bit of work. Mine improved overall X11 hash by 1.35% (remember that BMW itself is quite a small part of X11, so the raw improvement in my BMW over his is much larger).
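
To put a rough number on the BMW remark: Amdahl's law relates the overall 1.35% gain to the speedup of the single kernel, given the share of X11 runtime that kernel takes. The 5% share below is purely an assumed figure for illustration (the post does not state it), and the function names are hypothetical.

```python
def overall_speedup(fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-chain speedup when only `fraction` of the runtime gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def kernel_speedup_needed(fraction: float, overall: float) -> float:
    """Invert Amdahl's law: how much faster one kernel must be to yield `overall`."""
    remaining = 1.0 / overall - (1.0 - fraction)
    if remaining <= 0.0:
        raise ValueError("kernel share is too small to explain this overall gain")
    return fraction / remaining


# The 1.35% overall gain is from the post; the 5% BMW share is an ASSUMPTION.
bmw_share = 0.05
bmw_gain = kernel_speedup_needed(bmw_share, 1.0135)
print(f"BMW alone would be about {100 * (bmw_gain - 1):.0f}% faster")
```

The smaller the assumed share, the larger the kernel-level gain needed to move the overall number, which is the point being made about BMW.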

EDIT: search2, originally Groestl-512, did not take to a simple kernel replacement and will have to be investigated further (manual study of the disassembly.) Skipping for now.

EDIT2: search2 may not have been fucked because of a difference in output, but in where the bloody constants are in global. For JH, I'm going to make an all new test kernel which takes a read-only buffer for JH's constants rather than trying to reference constant memory right now. Simpler. I should then be able to put that in place of the Kachur JH and modify SGMiner to pass a constant buffer on that kernel.

Is search2 faster than yours? or is it just simd?
Maybe Kachur has found a way to make AES-like algos better...
BTW I wouldn't mind a frankenbin if it's faster and stable ;-)

I can't tell - without straight up replacement of a kernel, I dunno if he's done some kind of fuckery with part of a hash in one kernel, and part in another, for example. What I suspect is SIMD has been cut into two parts (at least.)

Now, even if his Groestl is faster than mine, my current Groestl is outdated anyways. My R & D area has a bitsliced Groestl that I have not yet played too much with - parallelization using 4 work items like it's done in CUDA should be possible. I can drop to GCN ASM for ds_swizzle_b32 - limits me to a 4-way, as it's not a 32-way shuffle like CUDA, but it's enough for me. I've just got a lot to do atm - maybe there is something we could work on together... a Groestl, perhaps? If you could look at the code and see if you could split it over multiple work-items and use LDS for the data sharing, I could probably remove said LDS usage by disassembling and modifying the kernel before reassembling it?

SIMD: tonight I was thinking about it and slicing into two parts is the natural way of doing it; I think I could try that. The only little annoyance is that the data to be passed between the (at least two) parts won't just be a hash but a bigger set of data, so the standard sgminer searchX() system wouldn't work.

GROESTL (and similar): I always had the idea that nvidia had to do the bitslice thing because shared memory was slower than on GCN; in fact nvidia bitsliced is on par with GCN LDS. As a logical consequence, I think that if bitslice on GCN is presumed to be slower than on nvidia, I wouldn't even try it.

You might not be looking at the big picture with Groestl - look at that fucking shitty amount of waves in flight you get due to LDS (ab)use.

That's an issue with <= tahiti only, hence why I hate optimizing for those chips ;-)

Not the case - two waves in flight, and your kernel is STILL not actually using the GPU's parallelism like it's supposed to be. One Groestl-512 hash is a big job, and it's parallelizable. If you're doing a throughput of 64 hashes per local workgroup, then use 256 for Groestl, and do 4 work-items per actual hash. Tune to taste.
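
The "256 for Groestl, 4 work-items per hash" suggestion boils down to an index mapping. The sketch below shows only that mapping, in plain Python rather than kernel code; keeping the four cooperating work-items adjacent is an assumption (it fits a 4-way swizzle), and the names are hypothetical.

```python
HASHES_PER_GROUP = 64   # hashes finished per workgroup, from the post
ITEMS_PER_HASH = 4      # work-items cooperating on one hash, from the post
WORKSIZE = HASHES_PER_GROUP * ITEMS_PER_HASH  # 256 work-items per workgroup


def map_work_item(group_id: int, local_id: int) -> tuple:
    """Return (hash_index, lane) for one work-item.

    ASSUMPTION: the four lanes of a hash are consecutive local ids,
    so lanes 0..3 of the same hash sit next to each other.
    """
    if not 0 <= local_id < WORKSIZE:
        raise ValueError("local_id outside the workgroup")
    hash_in_group, lane = divmod(local_id, ITEMS_PER_HASH)
    return group_id * HASHES_PER_GROUP + hash_in_group, lane


# Work-items 4..7 of workgroup 2 all work on the same hash, as lanes 0..3.
for lid in range(4, 8):
    print(lid, map_work_item(2, lid))
```

With this layout the nonce for a work-item follows from `hash_index`, while `lane` selects which quarter of the Groestl state that work-item handles.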
member
Activity: 81
Merit: 1002
It was only the wind.
Wolf0, how fast do you think your public bins would run if replacing simd with kachur version? Would it be faster than full kachur? If yes, it might be the best kernel for public consumption.

I know it would be if I worked on it some - I tested my Echo implementation vs. Kachur's, mine is faster. I feel kinda let down.

member
Activity: 81
Merit: 1002
It was only the wind.
I finally had time to look at the NiceHash binary-only distribution of SGMiner - well, actually, just the GPU binaries (whether or not the SGMiner binary packaged with it contains malicious code is irrelevant to me) - and I'm... kinda disappointed. A *very* cursory scan of the disassembly tells me that this code isn't using any of the better ideas I've been cooking up since my bins came out - it's simply decent code plus a well-done implementation of the only hash I didn't touch due to it being insanely tedious: SIMD. I'm almost disappointed. Using it without the NiceHash miner (and Windows with it) was child's play; the differences are really rather slight when it comes to host-side calling.

Here's Mithra mining X11 on Linux (NSFW): https://ottrbutt.com/miner/wolfx11-01142016.png

Extending to other X algos should be a minor bit of work - really only a minor modification (or hell, outright replacement, if you're not too good with GCN ASM) of the Echo kernel to ensure the entire output is stored to the hashes buffer, and the output/target can be dropped from Echo - after that, you can just write up the additional hashes (or use the shitty SPH versions). Once done, disassemble, append, reassemble, and ensure you call the resulting binary correctly.
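
As a rough illustration of the chaining idea (not GPU code): every stage reads and writes a full 64-byte state, and only the last stage in the chain checks the target, so appending stages means the former last stage must store its whole output instead. The stage functions below are stand-ins from Python's hashlib, NOT the actual X11/X13 hashes, and the little-endian target comparison is an assumption.

```python
import hashlib

HASH_BYTES = 64  # each stage hands a full 512-bit state to the next one


def stage_a(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()  # stand-in for an earlier stage


def stage_b(data: bytes) -> bytes:
    # stand-in for the former final stage (the role Echo plays in X11)
    return hashlib.blake2b(data, digest_size=HASH_BYTES).digest()


def stage_c(data: bytes) -> bytes:
    return hashlib.sha3_512(data).digest()  # stand-in for an appended hash


def run_chain(header: bytes, stages) -> bytes:
    """Feed each stage's full output into the next, like a shared hashes buffer."""
    state = header
    for stage in stages:
        state = stage(state)
        assert len(state) == HASH_BYTES
    return state


def meets_target(final_hash: bytes, target: int) -> bool:
    """Target check, done only after the last stage (ASSUMED little-endian, 256 bits)."""
    return int.from_bytes(final_hash[:32], "little") <= target


header = bytes(80)                      # dummy 80-byte block header
short_chain = [stage_a, stage_b]        # target check used to follow stage_b
extended = short_chain + [stage_c]      # now stage_b only stores its output
final = run_chain(header, extended)
print(meets_target(final, (1 << 256) - 1))
```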
member
Activity: 81
Merit: 1002
It was only the wind.
Still selling my Hawaii binary for myr-groestl.

Here are the specs for your reference:

Optimised kernel: myriad-groestl (myr-groestl, groestl512 + sha256) for digibyte, myriad, saffroncoin, joincoin, trinity and others.
Speed: 63 Mh/s on r9 290x @1100/150. It is compatible with stock sgminer and includes free future upgrades.

PM for details.

Oh, I thought you were an opensource kind of guy. Guess I misread that. Grin

Supporting opensource doesn't mean all my work must be open. I've spent a lot of time (and I mean thousands of hours) on opensource, supporting linux (since kernel version 0.99), etc.
Oh well I shouldn't need to explain anything!

I can has? Full disclosure: I intend to disassemble it. But I won't make any results public.

Funny question: Wolf0, how old are you?

21, why?
member
Activity: 81
Merit: 1002
It was only the wind.
Nicehash, where's the sources to the exe's you pack into NiceHash Miner? The license stipulates it be available at least upon request. Not the bins, but the SGMiner executables.

The sources for sgminer-5.2.1-general are here: https://github.com/nicehash/sgminer/tree/windows. Unfortunately we don't have sources for sgminer-5.1.0-optimized and sgminer-5.1.1-optimized, because kachur never releases his sources. You all know that we are big open source supporters: we contribute to open source software, we release our own software as open source, and we support developers as well (we have tipped many developers in the past). Now, regarding the closed-source sgminer that we're redistributing with NHM: yes, we know that this is not the fully rightful thing to do, but this way we can give the optimized miners to the community and anybody can mine on any pool with these miners. Moreover, before we distribute any closed-source miner we always make a full network sniff of the working miner to make sure that no fake shares, hidden pool or any other suspicious stuff is plugged into the miner. Hopefully that's OK with you folks; that's the best we can do. Keep on mining! Wink

Best regards,
NiceHash team.

anyone can mine? ...

i cant - you dont have a linux version - closed or open source ... amd or nvidia ...

ill back the opensource side any day - and will do what i can on my side for the devs that are pushing that side of it ...

wolf and pallas have points that cannot be ignored here - and in such a market as this - closed source miners are doing nothing but getting a jump start ...

i have ALL the donation links to the developers pointing to nicehash as the sole site for miners to donate hashrate to ... whether miners do so or not is at their discretion ... there are much bigger plans for the design of the hashing system ill be introducing in the following months - and i need to know that nicehash will also be backing the linux community as much as the windows community ...

just because you do all the peripheral testing before you release closed source miners does not mean that you are doing the right thing ... opensource is not about doing the right thing AFTER you do the wrong thing by not supplying the source code upon request ...

so if there has been no request for it - here it is ... can you supply the code used for the sgminer app that you distribute as closed source software? ...

btw - your service that you provide really is second to none ... but really mate - do the right thing for ALL miners - not just windows based miners ...

btw - just to clarify something here ... 'fully right' does not exist ... you are either right or wrong - there is no half way ... you are either driving a car or you are not - there is no 'sort of driving' a car ...

Smiley ...

tanx mate ...

#crysx

I've looked at the disassembly of the X11 Tahiti bin a little bit; if I work at it a while, I MAY be able to work out what they're expecting and write compatible host code for them.
member
Activity: 81
Merit: 1002
It was only the wind.
About the new optimised kernels: I assume they are precompiled bins as usual, but what about the miner part that runs on cpu? At least for quark some kind of host code is needed, for it to hash that fast. Is that code opensource and merged into sgminer main branch?
It's source; mine.

Exactly. sgminer is more or less unmaintained, but you can find neoscrypt.cl in the kernel folder of sgminer-5.2.1-general in the NiceHash Miner. We prefer to keep all opensource, so you're welcome to reuse it anywhere. And of course Wolf has all the credits for the optimized version (but his work was decently paid by us).

Only neoscrypt is opensource. That's because Wolf was generous, I assume.

The others (Quark, X11, etc.) are not open source: pre-compiled binaries. A shame, not able to run on unix based systems.

That's what I was asking: I wanted to run it on linux and have a look at the host code.
bummer :-/

Nicehash, where's the sources to the exe's you pack into NiceHash Miner? The license stipulates it be available at least upon request. Not the bins, but the SGMiner executables.

True, I forgot about the gpl for a moment! I want that too!

I've already gone to work disassembling the GPU binaries - his Echo is very slightly slower than mine, but what appears to be SIMD is interesting.
I'm also working on a 4-way Echo without using LDS - using assembly for shuffle.
member
Activity: 81
Merit: 1002
It was only the wind.
Quark, X11 it does not work Wolf  Wink

I don't understand what you mean.
member
Activity: 81
Merit: 1002
It was only the wind.
Dear users,

Yet another new version of NiceHash Miner has just been released. The new version 1.2.2.0 brings highly optimized AMD GPU mining for NeoScrypt (up to +300%) and a few bugfixes:

https://www.nicehash.com/index.jsp?p=news&id=56

Keep on mining! Wink


Best regards,
NiceHash team

About the new optimised kernels: I assume they are precompiled bins as usual, but what about the miner part that runs on cpu? At least for quark some kind of host code is needed, for it to hash that fast. Is that code opensource and merged into sgminer main branch?

It's source; mine.
newbie
Activity: 1
Merit: 0
I got the same errors. Have you already fixed them?
jr. member
Activity: 64
Merit: 1
Hey Folks, with the help of TRM (SGMiner compatible miner) I made a SGMiner (and compatible) monitoring and alerting system app for iOS. It's FREE for 1 miner with local and remote monitoring - check it out and let me know what you think!

MINERTRON™ Comprehensive Miner Monitoring & Alerting System App for iOS: https://apps.apple.com/us/app/minertron/id1541270467

Bitcoin talk thread: https://bitcointalk.org/index.php?topic=5298800.new#new
full member
Activity: 686
Merit: 100
Altcoinlog about Sgminer



Guidance on tuning and optimization. Basic commands and frequent beginner problems. Ready Bat files for mining.

Article in Russian

https://altcoinlog.com/sgminer-obzor-nasroika-bat/
Any chance a translation to English will happen? ...

It seems to be interesting and we would like to have such details as part of our KnowledgeBase at CWI.

#crysx
You can use Google Translate in Chrome. It does quite a decent job translating from Russian to English.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Altcoinlog about Sgminer



Guidance on tuning and optimization. Basic commands and frequent beginner problems. Ready Bat files for mining.

Article in Russian

https://altcoinlog.com/sgminer-obzor-nasroika-bat/
Any chance a translation to English will happen? ...

It seems to be interesting and we would like to have such details as part of our KnowledgeBase at CWI.

#crysx
member
Activity: 194
Merit: 29
Altcoinlog about Sgminer



Guidance on tuning and optimization. Basic commands and frequent beginner problems. Ready Bat files for mining.

Article in Russian

https://altcoinlog.com/sgminer-obzor-nasroika-bat/
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Probably the default stratum difficulty is too high for a GPU, as that algo has been mined with ASICs for a while now.
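
The "difficulty too high" explanation can be sanity-checked with a little arithmetic: the expected time to find one share is roughly difficulty times hashes-per-difficulty-1 divided by hashrate. The sketch assumes the Bitcoin-style convention of about 2^32 hashes per difficulty-1 share (pools for other algorithms may scale this differently), a hypothetical pool difficulty, and the 63 Mh/s figure quoted earlier in the thread for a 290x, not a measured Vega 64 rate.

```python
def expected_seconds_per_share(difficulty: float, hashrate_hs: float,
                               hashes_per_diff1: float = 2.0 ** 32) -> float:
    """Average seconds between shares; the 2**32 default is an ASSUMED convention."""
    if difficulty <= 0 or hashrate_hs <= 0:
        raise ValueError("difficulty and hashrate must be positive")
    return difficulty * hashes_per_diff1 / hashrate_hs


gpu_rate = 63e6      # 63 Mh/s, figure from the thread (r9 290x), used as a stand-in
pool_diff = 65536    # HYPOTHETICAL share difficulty tuned for ASICs

seconds = expected_seconds_per_share(pool_diff, gpu_rate)
print(f"about {seconds / 86400:.1f} days per share on average")
```

If the result is days rather than seconds, a miner showing neither accepted nor rejected shares is simply not finding any, and asking the pool for a lower difficulty is the first thing to try.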
newbie
Activity: 301
Merit: 0
Hi guys,

I'm trying to mine digibyte using sgminer-5.6.1 or 5.5.5, and when I connect to the pool I get no submitted or rejected shares.
I'm using the latest AMD drivers and a Vega 64 for testing.

Got this:
https://i.ibb.co/yYG3Q5Y/digi.png

my conf file is:

Code:
{
  "pools" : [
    {
      "url" : "stratum+tcp://dgbg.suprnova.cc:7978",
      "user" : "nept.vega64",
      "pass" : "x"
    }
  ],
  "intensity" : "20",
  "worksize" : "256",
  "kernel" : "myriadcoin-groestl",
  "lookup-gap" : "2",
  "thread-concurrency" : "8192",
  "shaders" : "0",
  "gpu-threads" : "2",
  "gpu-engine" : "0-0",
  "gpu-fan" : "0-0",
  "gpu-memclock" : "0",
  "gpu-memdiff" : "0",
  "gpu-powertune" : "0",
  "gpu-vddc" : "0.000",
  "temp-cutoff" : "95",
  "temp-overheat" : "85",
  "temp-target" : "75",
  "api-mcast-port" : "4028",
  "api-port" : "4028",
  "expiry" : "28",
  "failover-switch-delay" : "60",
  "gpu-dyninterval" : "7",
  "gpu-platform" : "0",
  "log" : "5",
  "no-pool-disable" : true,
  "queue" : "1",
  "scan-time" : "7",
  "tcp-keepalive" : "30",
  "temp-hysteresis" : "3",
  "shares" : "0",
  "kernel-path" : "/usr/local/bin"
}
Can anyone help?

Thanks

