
Topic: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner - page 43.

Xmm
full member
Activity: 440
Merit: 100
4.86 MH/s on X13 (Wolf0's bins), is that OK?
The whole thread is about X11.
Thank you.

is that with 7970 / 280x? ...

i can test on the cards when i get back to the office and let you know what we get if you like? ...

but i'd guess that wouldn't be a bad hashrate for x13 ...

#crysx
Edited my post: 280X, of course, and 5.7 MH/s with a 290.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
4.86 MH/s on X13 (Wolf0's bins), is that OK?
The whole thread is about X11.
Thank you.

is that with 7970 / 280x? ...

i can test on the cards when i get back to the office and let you know what we get if you like? ...

but i'd guess that wouldn't be a bad hashrate for x13 ...

#crysx
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Are you kidding? 3700 Mh/s is smokin' HOT!!!

in what algo? ... x11? ... not a chance ... that is the basic sgminer rate without wolf's optimized bins ...

if you think that is 'smokin hot' - try putting the bins in place and get another 3MH on top of that using the SAME settings ...

#crysx

I have wolf's bins and get 6.6M per HD 7970 card using x11. But 3700M??

hahaha ... i missed the 'M' ... in that case - i'm with you all the way ...

i'm off to get another coffee - day almost over and STILL not awake ... either that - or i just can't read anymore ...

:P ...

btw - i have gigabyte 7970 oc and gigabyte 280x oc cards - and they all do the same hashrate as yours using gpu / mem clocks of 1100 / 1500 ...

nice work ... though wolf has increased the hashrate a little with his own private miner optimizations ...

#crysx
Xmm
full member
Activity: 440
Merit: 100
ATI 280X: 4.86 MH/s on X13 (Wolf0's bins), is that OK?
The whole thread is about X11.
And what is a normal speed for a 290?
Thank you.
full member
Activity: 235
Merit: 100
Are you kidding? 3700 Mh/s is smokin' HOT!!!

in what algo? ... x11? ... not a chance ... that is the basic sgminer rate without wolf's optimized bins ...

if you think that is 'smokin hot' - try putting the bins in place and get another 3MH on top of that using the SAME settings ...

#crysx

I have wolf's bins and get 6.6M per HD 7970 card using x11. But 3700M??
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Are you kidding? 3700 Mh/s is smokin' HOT!!!

in what algo? ... x11? ... not a chance ... that is the basic sgminer rate without wolf's optimized bins ...

if you think that is 'smokin hot' - try putting the bins in place and get another 3MH on top of that using the SAME settings ...

#crysx
full member
Activity: 235
Merit: 100
Are you kidding? 3700 Mh/s is smokin' HOT!!!
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
I have some simple questions; it would be nice to get answers.

Windows 7 64-bit
GPU: 280X
Drivers: amd-catalyst-15.7.1-with-dotnet45-win7-64bit
sgminer from NiceHash: sgminer_v5.1_2015-03-09-win32.zip
sgminer.conf: 1050/1500

I cannot get above 3700 Mh/s, and I see that there are 7000 Mh/s out there.

Then I downloaded Wolf0's optimized bins for the X11/X13/X15 algorithms, renamed one of those bins to the name of the bin already created in my sgminer folder, and replaced it.

Now sgminer does not even start; some error occurs.

What am I doing wrong?
First I want to get those default 6000 Mh/s.

Can you help me with a config file, or some advice?
Thanks.


when you post a question like this - it's VERY relevant to also include the command line you are using to get these figures - so that everyone can see EXACTLY what you are doing ...

if you have performed the correct procedure with wolf's bins - you should be getting around 6.5MH or more on x11 ...

the info you have provided IS relevant - but not enough ...

#crysx
newbie
Activity: 39
Merit: 0
I have some simple questions; it would be nice to get answers.

Windows 7 64-bit
GPU: 280X
Drivers: amd-catalyst-15.7.1-with-dotnet45-win7-64bit
sgminer from NiceHash: sgminer_v5.1_2015-03-09-win32.zip
sgminer.conf: 1050/1500

I cannot get above 3700 Mh/s, and I see that there are 7000 Mh/s out there.

Then I downloaded Wolf0's optimized bins for the X11/X13/X15 algorithms, renamed one of those bins to the name of the bin already created in my sgminer folder, and replaced it.

Now sgminer does not even start; some error occurs.

What am I doing wrong?
First I want to get those default 6000 Mh/s.

Can you help me with a config file, or some advice?
Thanks.
vgo
legendary
Activity: 2072
Merit: 1019
Tonga wasn't out when I released the bins, so there's not one.

Thanks Wolf0.

And thanks rednoW https://bitcointalksearch.org/topic/m.11875320

:o :o :o   6.6 Mh/s   :o :o :o   Tonga!!!!!

newbie
Activity: 36
Merit: 0
Hi all, I want to mine X13, but it gives me HW errors with this command line. I have an ATI Radeon 6950 2GB GDDR5:


setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k x13mod  -o stratum+tcp://hashpower.co:3633 -u 1D88AdpkVDnigQEhuUoGFvPGoNAGZucVab -p x -I 15 -w 64
vgo
legendary
Activity: 2072
Merit: 1019
The process is correct... but using Tahiti.bin (940 KB); there is no Tonga.bin.

OC via GPU Tweak: 1100 core / 1357 mem MHz. Drivers: CCC 15.8 Beta.

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k darkcoin-mod  -o stratum+tcp://umine.co.uk:4640 -u bL2WMZfNmGtChg5nmw5Y -p x --xintensity 160 -w 64 --vectors 1 -g 2 --lookup-gap 2

 
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Hi Wolf, as far as I can see, the new generation of AMD cards (R9 380) is already in all the shops. Looking around the forum, I could see that the hashrate for this model is ~3 MH/s (X11), which seems too low.
Do you have any results for this card with your private miner on the X11 algo?

Just interested in buying these cards and (if you have good results) your miner as well.

Thanks.

R9 380 is a rebrand of the R9 285. As far as I can see, the stock clocks for the 380 are 1000/1550. Test results incoming.

EDIT:

(NSFW): https://ottrbutt.com/miner/x11wolf-07082015.png


wolf-x11Tahitigw64l8ku0.bin ??

Only 3.6 Mh/s: Asus R9 285 2GB with R9 380 BIOS, 1100/1375 MHz, CCC 15.8 Beta.

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k darkcoin-mod  -o stratum+tcp://umine.co.uk:4640 -u bL2WMZfNmGtChg5nmw5Y -p x --xintensity 160 -w 64 --vectors 1 -g 2 --lookup-gap 2



what procedure did you go through to set this up? ...

many have incorrectly initiated the process AND in your setup - there are no oc adjustments ( gpu and mem clocks ) ...

let us know ...

#crysx
vgo
legendary
Activity: 2072
Merit: 1019
Hi Wolf, as far as I can see, the new generation of AMD cards (R9 380) is already in all the shops. Looking around the forum, I could see that the hashrate for this model is ~3 MH/s (X11), which seems too low.
Do you have any results for this card with your private miner on the X11 algo?

Just interested in buying these cards and (if you have good results) your miner as well.

Thanks.

R9 380 is a rebrand of the R9 285. As far as I can see, the stock clocks for the 380 are 1000/1550. Test results incoming.

EDIT:

(NSFW): https://ottrbutt.com/miner/x11wolf-07082015.png


wolf-x11Tahitigw64l8ku0.bin ??

Only 3.6 Mh/s: Asus R9 285 2GB with R9 380 BIOS, 1100/1375 MHz, CCC 15.8 Beta.

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
sgminer.exe -k darkcoin-mod  -o stratum+tcp://umine.co.uk:4640 -u bL2WMZfNmGtChg5nmw5Y -p x --xintensity 160 -w 64 --vectors 1 -g 2 --lookup-gap 2

hero member
Activity: 672
Merit: 500
There's no need to CPU-sync for JHA/Quark; use an append/consume buffer. This is especially the case if you don't pipeline work.
Have two atomics and a buffer, filling it from the head and the tail depending on which path each work-item needs to follow.
Then dispatch both and have them branch out when the count is exhausted; this will naturally produce N-1 fully coherent wavefronts with no sync required.

As a side note: DirectCompute11 even has helpers to do this in the API, DispatchIndirect, allowing you to save the branch-out... in theory, under certain circumstances, maybe. Have you tried OpenCL pipes?
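A sequential Python model of the head/tail idea above, offered as a sketch under assumptions: one shared buffer of N slots, one counter growing from the head and one shrinking from the tail, and a 64-lane wavefront (as on AMD GCN). The function names are hypothetical, and the plain loop only stands in for work-items calling OpenCL `atomic_inc`/`atomic_dec`, which return the old counter value.

```python
# Sequential model of the "two atomics + one buffer" split (hypothetical helper
# names). On a GPU each loop iteration would be one work-item, and the counter
# updates would be OpenCL atomic_inc / atomic_dec, which return the OLD value.

def split_head_tail(nonces, goes_left):
    """Fill one buffer from the head (left path) and the tail (right path)."""
    n = len(nonces)
    buf = [None] * n
    head = 0  # counter 1: next free slot from the front
    tail = n  # counter 2: one past the next free slot from the back
    for nonce in nonces:
        if goes_left(nonce):
            slot = head  # atomic_inc(&head) returns the old value
            head += 1
        else:
            tail -= 1    # atomic_dec(&tail) returns the old value; slot is old - 1
            slot = tail
        buf[slot] = nonce
    return buf[:head], buf[tail:]


def wavefront_usage(count, wavefront=64):
    """Wavefronts launched for `count` work-items, and how many are full.

    Work-items whose ID is >= count exit immediately ("branch out"), so only
    the last wavefront can be partially occupied.
    """
    launched = -(-count // wavefront)  # ceiling division
    full = count // wavefront
    return launched, full
```

For example, 100 surviving work-items occupy two wavefronts, of which only the last one is partially filled; every other wavefront is fully coherent without any host-side sync.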
full member
Activity: 125
Merit: 100
Hey guys, I was away from mining for some time and am now coming back.

Can someone please tell me what a good X11 hashrate is nowadays for a 280X and a 7950?

Thank you! Appreciate it!

--xintensity 128 -w 64 --vectors 1 -g 2 --lookup-gap 2 -s 0 --expiry 10 --queue 0 --gpu-engine 1100,1100 --gpu-memclock 1500,1500 --gpu-powertune 20 --gpu-fan 58,58
hero member
Activity: 518
Merit: 500
Hey guys, I was away from mining for some time and am now coming back.

Can someone please tell me what a good X11 hashrate is nowadays for a 280X and a 7950?

Thank you! Appreciate it!
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Pallas, we're not so different. While you believe in open source and I sometimes do not, I still believe in sharing knowledge, even to my detriment. If someone's reasonably intelligent and willing to work at it, I'll never refuse to help them learn.

Everybody needs a source of income, and we both know how open source has shaped computers over the years.
Actually, it's inherent in them; it's always been there. I remember a VIC-20 cassette in 1984 with "public domain" software on it and a brief introduction to the term.
Even the most active open-source programmers must have an income, and most of the time it's in computing as well.
newbie
Activity: 34
Merit: 0
I might release the Quark bins (and a miner able to run them) for a smaller amount of BTC... problem is that SGMiner is infected with the GPL, and I really don't want to release the host code in this case.

maybe we could update the public sgminer sources with a simple multi-kernel version of quark, then you'd sell your optimized version to use with it.
that is, unless there are fine tricks in the C part ;-)

There are indeed. You see, for every branch in Quark, I read 4 bytes from the GPU on the host side - also, I run both sides of the branch in parallel, then block using OpenCL events until both complete before continuing. The key to Quark is how you do the general structure - the current source is beyond stupid. While I'll explain in detail (even publicly) how it's done if you want, I'm loath to release working source for it. If people want it, they need to at least read the description and implement it themselves.

I would be interested in that description.  Just saying.

All right. First, it's necessary to understand WHY the original Quark is so stupid - it deals with how GPUs tend to handle branching, that is, if/else decisions. They more or less don't. It'll execute both sides of the branch - whichever result isn't needed it will toss away. This means for every branch in Quark (there are three), the stock miner executes one extra hash function that it didn't need to.
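A toy model of the branching behavior described above, as a sketch only. Two assumptions are not from the post: a 64-lane wavefront (AMD GCN), and a branch bit that behaves like a fair coin because it comes from hash output. Under those assumptions, the chance that all lanes agree is 2 × 2^-64 = 2^-63, so practically every wavefront runs both sides.

```python
# Toy model of SIMD branch divergence. Assumptions (not from the post): a
# 64-lane wavefront, as on AMD GCN, and a branch bit that behaves like a fair
# coin because it is taken from hash output.

def sides_executed(lane_predicates):
    """Number of if/else sides a wavefront must run.

    If every lane agrees, only one side runs; if lanes disagree, the hardware
    runs both sides with inactive lanes masked off and discards their results.
    """
    return len({bool(p) for p in lane_predicates})


def p_uniform(lanes=64):
    """Probability that all lanes take the same side of a fair-coin branch."""
    return 2 * 0.5 ** lanes
```

With `p_uniform(64)` around 1e-19, the stock single-kernel approach effectively pays for both hashes at each of Quark's three branches.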

Now, you certainly can't read back the whole output of the hash for every work-item - it'd be fucking slow - so the host needs to be able to execute ONLY the hash that needs to be done for a given branch, WITHOUT branching on the GPU. I thought about this for a good while, and here is what I came up with. The host may not be able to read and process every hash for every work item in a reasonable timeframe, but what if I only needed to read two 32-bit integers from the GPU, and act on them, regardless of how many work items are run? With num_branches as the number of decisions between different hashes to run in the algo, and branch_possibilities as the number of possible ways a branch can go, allocate num_branches * branch_possibilities buffers. For Quark and Animecoin, this is three branches times two possible outcomes per branch. Each buffer should be the size of your nonce times (the number of global work-items plus one).
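A sketch of the buffer sizing just described, with Python lists standing in for OpenCL buffers. `NONCE_BYTES = 4` assumes a 32-bit nonce (consistent with reading 4 bytes per branch); the function and parameter names are hypothetical.

```python
# Sizing the branch buffers as described above. Python lists stand in for
# OpenCL buffers; NONCE_BYTES = 4 assumes a 32-bit nonce (the post mentions
# reading 4 bytes per branch). Function and parameter names are hypothetical.
NONCE_BYTES = 4


def allocate_branch_buffers(num_branches, branch_possibilities, global_work_items):
    """Return (buffers, bytes_per_buffer).

    There is one buffer per (branch, outcome) pair. Each holds one nonce per
    global work-item plus one extra slot at the end, used later as a counter.
    """
    slots = global_work_items + 1
    buffers = [[0] * slots for _ in range(num_branches * branch_possibilities)]
    return buffers, slots * NONCE_BYTES
```

For Quark (3 branches, 2 outcomes) with 1024 global work-items, this gives six buffers of 1025 slots, i.e. 4100 bytes each.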

Here's how you use them. Do the first hash in Quark, Blake-512, and then zero out the LAST INDEX of the buffers for the outcomes of the first branch (an index here being one nonce-sized element). This is done for speed - the rest of the buffers can be left filled with garbage, as long as that last index is zeroed. The second hash in Quark is BMW-512, and its output decides whether Groestl-512 or Skein is run. Pass it the hash states, the two buffers for our first branch, and the number of global work items (also the size of the branch buffers minus one; it can't be gotten from OpenCL device code). Do the hash; the decision in Quark is whether the fourth bit of the output is set. If it is, Groestl is run, IIRC; otherwise, Skein. Now, if it's set, atomically read and then increment the last index of the first branch buffer, and store the nonce at the index you read. This is a counter, and it is also how the nonces are stored - this is why we zeroed it. If it's NOT set, do the same for the other buffer. Now, in your host code, you simply read out both numbers, each one being the number of work-items that went to each possibility, and you launch that hash's kernel with the appropriate branch nonce buffer, using the number of work-items that went that way as the number of global work-items. Inside this kernel, the global ID is then used as an index into the branch nonces buffer - it'll pull out each and every nonce that is appropriate for that branch. Since we use the original global ID not only as a nonce, but also as an index into the hash states, the hash states may be indexed with the number you pulled out of the nonces buffer to fetch and store the state for that work.
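A sequential Python model of one branch as described, a sketch rather than the poster's actual code. Assumptions: each entry of `hash_states` is an int standing for the output word whose "fourth bit" (value 8) decides the branch, the global ID doubles as the nonce, and `hash_if_set`/`hash_if_clear` are hypothetical stand-ins for the two hash kernels. Each loop iteration stands in for one work-item.

```python
# Sequential model of one Quark branch (hypothetical names). Each loop
# iteration stands in for a work-item; in the post, the counter update is an
# atomic read-then-increment on the GPU. Assumption: hash_states holds one int
# per work-item standing for the word whose "fourth bit" (value 8) decides.

def branch_kernel(hash_states, buf_set, buf_clear, global_work_items):
    """Append each work-item's nonce to the buffer for the side it takes."""
    counter = global_work_items  # last index of each buffer is its counter
    for gid in range(global_work_items):
        target = buf_set if hash_states[gid] & 8 else buf_clear
        idx = target[counter]        # atomic read...
        target[counter] = idx + 1    # ...and increment
        target[idx] = gid            # the global ID doubles as the nonce


def follow_up_kernel(hash_fn, hash_states, branch_buf, count):
    """Launched with `count` work-items; gid indexes the nonce buffer."""
    for gid in range(count):
        nonce = branch_buf[gid]
        hash_states[nonce] = hash_fn(hash_states[nonce])


def run_branch(hash_states, hash_if_set, hash_if_clear):
    g = len(hash_states)
    buf_set, buf_clear = [0] * (g + 1), [0] * (g + 1)
    buf_set[g] = buf_clear[g] = 0  # only the counters must be zeroed
    branch_kernel(hash_states, buf_set, buf_clear, g)
    n_set, n_clear = buf_set[g], buf_clear[g]  # host reads both counters
    follow_up_kernel(hash_if_set, hash_states, buf_set, n_set)
    follow_up_kernel(hash_if_clear, hash_states, buf_clear, n_clear)
    return n_set, n_clear
```

Each state is hashed exactly once, by the kernel for the side it took. No work-item ever evaluates the other side's hash.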

Final note for optimization - I lied. You actually need to read only ONE nonce-sized entry from a branch buffer; I said two because it makes the process easier to understand. Basic algebra: we already know the global work-item count, all work items branch SOME way, and they can only go two ways, so the counts for the two sides add up to the work-item count, i.e. GlobalWorkItems - Branch1LeftCount = Branch1RightCount. So... read only one of those entries denoting the number of nonces stored in a branch buffer (doesn't matter which), and subtract it from the number of global work items to get the other.
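The one-counter trick written out, with G the global work-item count and L_b, R_b the numbers of work-items taking each side of branch b (notation introduced here for illustration):

```latex
L_b + R_b = G \quad\Longrightarrow\quad R_b = G - L_b, \qquad b = 1, 2, 3
```

With a 32-bit counter per branch, the host then reads 3 × 4 = 12 bytes per pass through Quark's three branches, instead of 24 if both counters were read.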

So, that's my technique for the overall structure of Quark. Any questions?

Pallas, we're not so different. While you believe in open source and I sometimes do not, I still believe in sharing knowledge, even to my detriment. If someone's reasonably intelligent and willing to work at it, I'll never refuse to help them learn.


Hmmm.  Well, that definitely changes how I think about GPU processes.  I did not realize they execute both branches.  Also, thanks for the detailed description.  I greatly appreciate it.
newbie
Activity: 34
Merit: 0
I might release the Quark bins (and a miner able to run them) for a smaller amount of BTC... problem is that SGMiner is infected with the GPL, and I really don't want to release the host code in this case.

maybe we could update the public sgminer sources with a simple multi-kernel version of quark, then you'd sell your optimized version to use with it.
that is, unless there are fine tricks in the C part ;-)

There are indeed. You see, for every branch in Quark, I read 4 bytes from the GPU on the host side - also, I run both sides of the branch in parallel, then block using OpenCL events until both complete before continuing. The key to Quark is how you do the general structure - the current source is beyond stupid. While I'll explain in detail (even publicly) how it's done if you want, I'm loath to release working source for it. If people want it, they need to at least read the description and implement it themselves.

I would be interested in that description.  Just saying.