[ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner - page 43.

Xmm

full member

Activity: 440

Merit: 100

Quote from: chrysophylax on September 15, 2015, 10:24:05 PM

Quote from: Xmm on September 15, 2015, 10:19:44 PM

4.86M on x13 (W0lf's bin), is that OK?
The whole thread is about x11.
Thank You

is that with 7970 / 280x? ...

i can test on the cards when i get back to the office and let you know what we get if you like? ...

but id be guessing that wouldnt be a bad hashrate for x13 ...

#crysx

Edited my post.
280x of course, and 5.7M with a 290.

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: Xmm on September 15, 2015, 10:19:44 PM

4.86M on x13 (W0lf's bin), is that OK?
The whole thread is about x11.
Thank You

is that with 7970 / 280x? ...

i can test on the cards when i get back to the office and let you know what we get if you like? ...

but id be guessing that wouldnt be a bad hashrate for x13 ...

#crysx

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: tanoury on September 15, 2015, 09:22:30 PM

Quote from: chrysophylax on September 15, 2015, 09:07:00 PM

Quote from: tanoury on September 15, 2015, 07:46:30 PM

Are you kidding? 3700 Mh/s is smokin' HOT!!!

in what algo? ... x11? ... not a chance ... that is basic sgminer rate without the optimized wolf bins ...

if you think that is 'smokin hot' - try putting the bins in place and get another 3MH on top of that using the SAME settings ...

#crysx

I have wolf's bins and get 6.6M per HD 7970 card using x11. But 3700M??

hahaha ... i missed the 'M' ... in that case - im with you all the way ...

im off to get another coffee - day almost over and STILL not awake ... either that - or i just cant read anymore ...

:P

...

btw - i have gigabyte 7970 oc and gigabyte 280x oc cards - and they all do the same hashrate as yours using gpu / mem clocks of 1100 / 1500 ...

nice work ... though wolf has increased the hashrate by a little bit with his own private miner optimizations ...

#crysx

Xmm

full member

Activity: 440

Merit: 100

Ati 280x: 4.86M on x13 (W0lf's bin), is that OK?
The whole thread is about x11.
And what is a normal speed for the 290?
Thank You

tanoury

full member

Activity: 235

Merit: 100

Quote from: chrysophylax on September 15, 2015, 09:07:00 PM

Quote from: tanoury on September 15, 2015, 07:46:30 PM

Are you kidding? 3700 Mh/s is smokin' HOT!!!

in what algo? ... x11? ... not a chance ... that is basic sgminer rate without the optimized wolf bins ...

if you think that is 'smokin hot' - try putting the bins in place and get another 3MH on top of that using the SAME settings ...

#crysx

I have wolf's bins and get 6.6M per HD 7970 card using x11. But 3700M??

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: tanoury on September 15, 2015, 07:46:30 PM

Are you kidding? 3700 Mh/s is smokin' HOT!!!

in what algo? ... x11? ... not a chance ... that is basic sgminer rate without the optimized wolf bins ...

if you think that is 'smokin hot' - try putting the bins in place and get another 3MH on top of that using the SAME settings ...

#crysx

tanoury

full member

Activity: 235

Merit: 100

Are you kidding? 3700 Mh/s is smokin' HOT!!!

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: logistika on September 15, 2015, 07:38:57 PM

i have simple questions, would be nice to get answers.

Windows 7 64bit
GPU 280x
Drivers: amd-catalyst-15.7.1-with-dotnet45-win7-64bit
sgminer from Nicehash: sgminer_v5.1_2015-03-09-win32.zip
sgminer.conf 1050/1500

Can not get above 3700 Mh/s , and i see that there is 7000 Mh/s out there

Then i downloaded Wolf0's optimized bins for X11-X13-X15 algorithms, renamed one of those bins to match the bin already created in my sgminer folder, and replaced it.

Now sgminer does not even start, some error occurs.

What am i doing wrong?
First i want to get those default 6000 Mb/h

Can you help me with some config file? Or some advice ?
thanks.

when you place a question like this - its VERY relevant to also include the commandline you are using to get these figures - so that everyone can see what EXACTLY you are doing ...

if you have performed the correct procedure with wolfs bins - you should be getting around the 6.5MH or more on x11 ...

the info you have provided IS relevant - but not enough ...

#crysx

logistika

newbie

Activity: 39

Merit: 0

i have simple questions, would be nice to get answers.

Windows 7 64bit
GPU 280x
Drivers: amd-catalyst-15.7.1-with-dotnet45-win7-64bit
sgminer from Nicehash: sgminer_v5.1_2015-03-09-win32.zip
sgminer.conf 1050/1500

Can not get above 3700 Mh/s , and i see that there is 7000 Mh/s out there

Then i downloaded Wolf0's optimized bins for X11-X13-X15 algorithms, renamed one of those bins to match the bin already created in my sgminer folder, and replaced it.

Now sgminer does not even start, some error occurs.

What am i doing wrong?
First i want to get those default 6000 Mb/h

Can you help me with some config file? Or some advice ?
thanks.

vgo

legendary

Activity: 2072

Merit: 1019

Quote from: ?? on ??

Tonga wasn't out when I released the bins, so there's not one.

Thanks Wolf0.

And thanks rednoW https://bitcointalksearch.org/topic/m.11875320

:o

6.6 Mh/s

Tonga!!!!!

xexulcm

newbie

Activity: 36

Merit: 0

Hi all, I want to mine x13, but this line gives me HW errors. I have an ATI Radeon 6950 2GB GDDR5.

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k x13mod -o stratum+tcp://hashpower.co:3633 -u 1D88AdpkVDnigQEhuUoGFvPGoNAGZucVab -p x -I 15 -w 64

vgo

legendary

Activity: 2072

Merit: 1019

The process is correct... but with Tahiti.bin (940 KB); there is no Tonga.bin.

OC via GPU Tweak: 1100 core / 1357 mem MHz. Drivers: CCC 15.8 Beta.

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k darkcoin-mod -o stratum+tcp://umine.co.uk:4640 -u bL2WMZfNmGtChg5nmw5Y -p x --xintensity 160 -w 64 --vectors 1 -g 2 --lookup-gap 2

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: vgo on September 14, 2015, 06:18:38 PM

Quote from: ?? on ??

Quote from: serg_25 on July 08, 2015, 07:41:51 AM

Hi Wolf, as far as I can see, the new generation of AMD cards (R9 380) is already in all the shops. Looking around the forum, I see that the hashing result for this model is ~3 MH/s (x11), which seems too low.
Do you have any results for this card with your private miner on the x11 algo?

I'm interested in buying these cards and (if you have good results) your miner as well.

Thanks.

R9 380 is a rebrand of the R9 285. As far as I can see, the stock clocks for the 380 are 1000/1550. Test results incoming.

EDIT:

(NSFW): https://ottrbutt.com/miner/x11wolf-07082015.png

wolf-x11Tahitigw64l8ku0.bin ??

Only 3.6Mh/s Asus R9 285 2gb @ R9 380 Bios 1100/1375mhz. CCC 15.8 Beta

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k darkcoin-mod -o stratum+tcp://umine.co.uk:4640 -u bL2WMZfNmGtChg5nmw5Y -p x --xintensity 160 -w 64 --vectors 1 -g 2 --lookup-gap 2

what procedure did you go through to set this up? ...

many have incorrectly initiated the process AND in your setup - there are no oc adjustments ( gpu and mem clocks ) ...

let us know ...

#crysx

vgo

legendary

Activity: 2072

Merit: 1019

Quote from: ?? on ??

Quote from: serg_25 on July 08, 2015, 07:41:51 AM

Hi Wolf, as far as I can see, the new generation of AMD cards (R9 380) is already in all the shops. Looking around the forum, I see that the hashing result for this model is ~3 MH/s (x11), which seems too low.
Do you have any results for this card with your private miner on the x11 algo?

I'm interested in buying these cards and (if you have good results) your miner as well.

Thanks.

R9 380 is a rebrand of the R9 285. As far as I can see, the stock clocks for the 380 are 1000/1550. Test results incoming.

EDIT:

(NSFW): https://ottrbutt.com/miner/x11wolf-07082015.png

wolf-x11Tahitigw64l8ku0.bin ??

Only 3.6Mh/s Asus R9 285 2gb @ R9 380 Bios 1100/1375mhz. CCC 15.8 Beta

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k darkcoin-mod -o stratum+tcp://umine.co.uk:4640 -u bL2WMZfNmGtChg5nmw5Y -p x --xintensity 160 -w 64 --vectors 1 -g 2 --lookup-gap 2

MaxDZ8

hero member

Activity: 672

Merit: 500

There's no need to CPU-sync for JHA/Quark: use an append/consume buffer. This is especially the case if you don't pipeline work.
Have two atomics and a buffer, filling it from head and tail depending on what path you need to follow.
Then dispatch both and have them branch out when the count is exhausted; this will naturally produce N-1 fully coherent wavefronts with no sync required.

As a side note: DirectCompute11 even has helpers to do this in the API, with DispatchIndirect letting you skip the branch-out... in theory, under certain circumstances, maybe. Have you tried OpenCL pipes?
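For anyone trying to picture the head/tail buffer suggested above, here is a minimal host-side simulation in Python. It is only a sketch of one reading of the post: a plain list stands in for the GPU buffer, the two integer counters stand in for the two atomics, and every name (`fill_head_tail`, `dispatch`, `path_a`, `path_b`) is hypothetical rather than taken from sgminer or any real API.

```python
# Simulation only: a Python list models the device buffer, and the two
# integer counters model the two atomics. All names are hypothetical.

def fill_head_tail(items, takes_first_path):
    """Fill one buffer from the head (first path) and the tail (second path)."""
    n = len(items)
    buf = [None] * n
    head = 0   # atomic #1: number of items appended from the front
    tail = 0   # atomic #2: number of items appended from the back
    for item in items:                 # each iteration models one work-item
        if takes_first_path(item):
            buf[head] = item
            head += 1
        else:
            tail += 1
            buf[n - tail] = item
    return buf, head, tail

def dispatch(buf, base, count, launch_size, kernel):
    """Launch launch_size work-items; those past count branch out at once."""
    results = []
    for gid in range(launch_size):
        if gid >= count:               # count exhausted: exit early
            continue
        results.append(kernel(buf[base + gid]))
    return results

items = list(range(32))
buf, head, tail = fill_head_tail(items, lambda x: x % 3 == 0)
n = len(items)

# Both dispatches use the full launch size, so the host never has to read
# the counters back; each kernel consumes only its own end of the buffer.
first = dispatch(buf, 0, head, n, lambda x: ("path_a", x))
second = dispatch(buf, n - tail, tail, n, lambda x: ("path_b", x))
```

Because every item goes exactly one way, head and tail never collide, and a single buffer of N entries is enough for both paths.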

Foss

full member

Activity: 125

Merit: 100

Quote from: kopam on September 09, 2015, 11:17:24 AM

Hey guys, was away from mining for some time and now coming back.

Can someone please tell me what a good hashrate is nowadays for 280x and 7950 on X11?

Thank you ! Appreciate it !

--xintensity 128 -w 64 --vectors 1 -g 2 --lookup-gap 2 -s 0 --expiry 10 --queue 0 --gpu-engine 1100,1100 --gpu-memclock 1500,1500 --gpu-powertune 20 --gpu-fan 58,58

kopam

hero member

Activity: 518

Merit: 500

Hey guys, was away from mining for some time and now coming back.

Can someone please tell me what a good hashrate is nowadays for 280x and 7950 on X11?

Thank you ! Appreciate it !

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: ?? on ??

Pallas, we're not so different. While you believe in open source and I sometimes do not, I still believe in sharing knowledge, even to my detriment. If someone's reasonably intelligent and willing to work at it, I'll never refuse to help them learn.

Everybody needs a source of income, and we both know how open source has shaped computers over the years.
Actually, it's inherent in them, it's always been there. I remember a vic-20 cassette in 1984 with "public domain" software in it and a brief introduction to the term.
Even the most active open source programmers must have their income, and most of the time it's in computing as well.

MehZhure

newbie

Activity: 34

Merit: 0

Quote from: ?? on ??

Quote from: MehZhure on September 08, 2015, 02:59:18 PM

Quote from: ?? on ??

Quote from: pallas on September 08, 2015, 09:23:38 AM

Quote from: ?? on ??

I might release the Quark bins (and a miner able to run them) for a smaller amount of BTC... problem is that SGMiner is infected with the GPL, and I really don't want to release the host code in this case.

maybe we could update the public sgminer sources with a simple multi-kernel version of quark, then you'd sell your optimized version to use with it.
that is, unless there are fine tricks in the C part ;-)

There are indeed. You see, for every branch in Quark, I read 4 bytes from the GPU on the host side - also, I run both sides of the branch in parallel, then block using OpenCL events until both complete before continuing. The key to Quark is how you do the general structure - the current source is beyond stupid. While I'll explain in detail (even publicly) how it's done if you want, I loathe to have to release working source for it. If people want it, they need to at least read the description and implement it themselves.

I would be interested in that description. Just saying.

All right. First, it's necessary to understand WHY the original Quark is so stupid - it deals with how GPUs tend to handle branching, that is, if/else decisions. They more or less don't. It'll execute both sides of the branch - whichever result isn't needed it will toss away. This means for every branch in Quark (there are three), the stock miner executes one extra hash function that it didn't need to.

Now, you certainly can't read back the whole output of the hash for every work-item - it'd be fuck slow - so the host needs to be able to execute ONLY the hash that needs to be done for a given branch, WITHOUT branching on the GPU. I thought about this for a good while, and here is what I came up with. The host may not be able to read and process every hash for every work item in a reasonable timeframe, but what if I needed to read two 32-bit integers from the GPU only, and act on them, regardless of how many work items are run? With num_branches as the number of decisions on different hashes to run in the algo, and branch_possibilities being the number of possible ways a branch can go, allocate num_branches * branch_possibilities buffers. For Quark and Animecoin, this is three branches times two possible outcomes per branch, so six buffers. Each buffer should hold one nonce-sized entry per global work-item, plus one extra entry at the end.

Here's how you use them. Do the first hash in Quark, Blake-512, and then zero out the LAST INDEX of the buffers for the outcomes of the first branch. An index here means one nonce-sized entry. This is done for speed - the rest of each buffer can be filled with garbage, as long as that last entry is zero.

The second hash in Quark is BMW-512, and it decides whether Groestl-512 is run, or Skein is. Pass it the hash states, the two buffers for our first branch, and the number of global work items (also the size of the branch buffers minus one; can't be gotten from OpenCL device code.) Do the hash, and the decision in Quark is whether the fourth bit of the output is set. If it is, Groestl is run, IIRC, otherwise, Skein. Now, if it's set, atomically read and then increment the last index of the first branch buffer, and store the nonce in the index you read. This is a counter, and how nonces are stored as well - this is why we zeroed it. If it's NOT set, do the same for the other buffer.

Now, in your host code, you simply read out both numbers, each one being the number of work-items that went to each possibility, and you launch that hash's kernel with the appropriate branch nonce buffer, and the number of global work-items as the number of branches that went that way. Inside this kernel, the global ID is then used as an index into the branch nonces buffer - it'll pull out each and every nonce that is appropriate for that branch. Since we use the original global ID as not only a nonce, but an index into the hash states - the hash states may be indexed with the number you pulled out of the nonces buffer to fetch and store the state for that work.

Final note for optimization - I lied. You actually need to read only ONE nonce-sized entry from a branch buffer; I said two because it makes the process easier to understand. Basic algebra - since we know the global work-item count already, all work items branch SOME way, and they can only go two ways, the counts for the two outcomes, added together, will equal the work-item count. i.e. GlobalWorkItems - Branch1LeftCount = Branch1RightCount. So... read only one of those entries denoting the number of nonces stored in a branch buffer (doesn't matter which), and subtract it from the amount of global work items to get the other.

So, that's my technique for the overall structure of Quark. Any questions?

Pallas, we're not so different. While you believe in open source and I sometimes do not, I still believe in sharing knowledge, even to my detriment. If someone's reasonably intelligent and willing to work at it, I'll never refuse to help them learn.

Hmmm. Well, that definitely changes how I think about GPU processes. I did not realize they execute both branches. Also, thanks for the detailed description. I greatly appreciate it.
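For anyone who wants to try the branch-buffer description quoted above, here is a rough host-side simulation of the first Quark branch in Python. It is a sketch under stated assumptions, not the actual miner code: Python lists stand in for OpenCL buffers, a plain read-then-increment stands in for the atomic, `decide()` is a made-up stand-in for "the fourth bit of the BMW-512 output", and all function names are hypothetical.

```python
# Simulation only; nothing here is real sgminer or OpenCL code.
NUM_BRANCHES = 3            # decisions in Quark, per the description
BRANCH_POSSIBILITIES = 2    # each decision can go two ways
BUFFERS_NEEDED = NUM_BRANCHES * BRANCH_POSSIBILITIES   # six in total

def decide(nonce):
    """Hypothetical stand-in for 'is the fourth bit of the hash output set?'."""
    return (((nonce * 2654435761) >> 16) & 8) != 0

def make_branch_buffer(global_work_items):
    # One nonce slot per work-item (contents may be garbage, modelled as
    # None) plus a counter in the LAST index, which must start at zero.
    return [None] * global_work_items + [0]

def decision_kernel(nonces, left, right):
    n = len(nonces)
    for nonce in nonces:               # each iteration models one work-item
        buf = left if decide(nonce) else right
        slot = buf[n]                  # atomic read-and-increment on a GPU
        buf[n] = slot + 1
        buf[slot] = nonce              # store the nonce at the index we read

def branch_kernel(buf, count, hash_fn):
    # Launched with only 'count' work-items; the global ID indexes the
    # branch nonce buffer, and the nonce would index the hash states.
    return [hash_fn(buf[gid]) for gid in range(count)]

nonces = list(range(1000, 1064))
n = len(nonces)
left = make_branch_buffer(n)
right = make_branch_buffer(n)
decision_kernel(nonces, left, right)

left_count = left[n]                   # the one value the host reads back
right_count = n - left_count           # derived by subtraction, not read

groestl_done = branch_kernel(left, left_count, lambda x: ("groestl", x))
skein_done = branch_kernel(right, right_count, lambda x: ("skein", x))
```

Only the first of the three branches is modelled; the other two would repeat the same pattern with their own pair of buffers.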

MehZhure

newbie

Activity: 34

Merit: 0

Quote from: ?? on ??

Quote from: pallas on September 08, 2015, 09:23:38 AM

Quote from: ?? on ??

I might release the Quark bins (and a miner able to run them) for a smaller amount of BTC... problem is that SGMiner is infected with the GPL, and I really don't want to release the host code in this case.

maybe we could update the public sgminer sources with a simple multi-kernel version of quark, then you'd sell your optimized version to use with it.
that is, unless there are fine tricks in the C part ;-)

There are indeed. You see, for every branch in Quark, I read 4 bytes from the GPU on the host side - also, I run both sides of the branch in parallel, then block using OpenCL events until both complete before continuing. The key to Quark is how you do the general structure - the current source is beyond stupid. While I'll explain in detail (even publicly) how it's done if you want, I loathe to have to release working source for it. If people want it, they need to at least read the description and implement it themselves.

I would be interested in that description. Just saying.

Topic: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner - page 43. (Read 877889 times)