
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 739. (Read 2347659 times)

legendary
Activity: 1400
Merit: 1050
The quark opensource kernel(nicehash package) does at least 16MH on a 280x
I remember my fury x doing 25Mh at stock settings....

Nicehash is using closed source binaries for the kernel code.

Take a look at:

https://github.com/nicehash/sgminer/tree/master/kernel

Most of the kernels are 2 years old and slow.
Maybe you should stop spreading BS... It's OK not to know what is going on on the AMD side, but making stuff up is just boring...
legendary
Activity: 1050
Merit: 1294
Huh?
Sure, in a way you're right, it's not "really" opensource, but it's available to everyone. So next time if you compare, compare with what is available to the general public.  Smiley
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
He likes to compare the garbage code on AMD to newer code on his, so his looks faster Tongue

My opensource does 27MHASH on the 980ti. I compare opensource with opensource..
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
The quark opensource kernel(nicehash package) does at least 16MH on a 280x
I remember my fury x doing 25Mh at stock settings....

Nicehash is using closed source binaries for the kernel code.

Take a look at:

https://github.com/nicehash/sgminer/tree/master/kernel

Most of the kernels are 2 years old and slow.
legendary
Activity: 1050
Merit: 1294
Huh?
As I mentioned before, GPU mining hash has grown about 30% in the last two weeks... Maybe closer to 50% as Eth has gained another 300Mh since then.

The x11 algo used to have 500GHASH.
The Quark algo used to have 200-300 GHASH
the lyra2v2 algo used to have 100GHASH etc..

People move their rigs to mine Ethereum. If Ethereum goes PoS, they will move their rigs back to what they used to mine.

ps. The quark algo is now paying above 0.5BTC/GHASH @ http://www.zpool.ca (0,015234BTC / Day ($6.4))

The 980ti should do 30MHASH with my buyable private miner (0.1BTC) with a little overclock.
The quark opensource kernel does 2.5MHASH on the R9 280x






The quark opensource kernel(nicehash package) does at least 16MH on a 280x
I remember my fury x doing 25Mh at stock settings....
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
As I mentioned before, GPU mining hash has grown about 30% in the last two weeks... Maybe closer to 50% as Eth has gained another 300Mh since then.

The x11 algo used to have 500GHASH.
The Quark algo used to have 200-300 GHASH
the lyra2v2 algo used to have 100GHASH etc..

People move their rigs to mine Ethereum. If Ethereum goes PoS, they will move their rigs back to what they used to mine.

ps. The quark algo is now paying above 0.5BTC/GHASH @ http://www.zpool.ca (0,015234BTC / Day ($6.4))

The 980ti should do 30MHASH with my buyable private miner (0.1BTC) with a little overclock.
The quark opensource kernel does 2.5MHASH on the R9 280x
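The payout figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes the quoted 0,015234 BTC/day refers to the 30MHASH figure for the 980ti, and the function name is made up for illustration.

```python
def daily_payout_btc(rate_btc_per_ghash_day, hashrate_mhash):
    """Daily payout in BTC for a pool rate quoted per GHASH per day."""
    return rate_btc_per_ghash_day * hashrate_mhash / 1000.0

# Assumption: the quoted 0.015234 BTC/day is for a card hashing at 30 MHASH.
quoted_daily_btc = 0.015234
hashrate_mhash = 30

# Pool rate implied by those two numbers, in BTC per GHASH per day.
implied_rate = quoted_daily_btc / (hashrate_mhash / 1000.0)

# Payout at exactly 0.5 BTC/GHASH, the lower bound mentioned in the post.
floor_payout = daily_payout_btc(0.5, hashrate_mhash)

print(f"implied rate: {implied_rate:.4f} BTC/GHASH/day")
print(f"payout at 0.5 BTC/GHASH: {floor_payout:.6f} BTC/day")
```

Under that assumption the implied rate comes out slightly above 0.5 BTC/GHASH, which matches the "paying above 0.5BTC/GHASH" wording.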




legendary
Activity: 2590
Merit: 1022
Leading Crypto Sports Betting & Casino Platform
Also, if Ethereum goes PoS, another big coin will emerge, probably Decred, so a pump there would not be unexpected in the near future.
The money will always move one way or another, and the diff will follow.

Investors don't just decide to invest in a new coin when one goes PoS in order to feed miners money. If Eth dies, either by bottoming out or PoS, miners are more than likely SoL. Decred and Vanilla are the next closest things.

Before Eth it was Dash and Dash has been private kernels/ASIC for quite some time... three months ago we were making $.50 profit on a 970, today it's $6... This is definitely a high point and it shouldn't be expected it'll stay this way.

Money does not vanish; it always flows one way or another. There will always be a strong altcoin: it was Litecoin, then Doge, then Darkcoin, then Blackcoin, then Ethereum, and next it will be Decred, an Ethereum clone, or something new. This section will never die, as far as I understand.
legendary
Activity: 1526
Merit: 1026
I was sleeping while my 970 was mining. When I woke up, I found my PC in a comatose state. The PC was running, but there was no display and there was a burning smell from my CPU. I found a grease-like substance on the back of my GPU, and it died that way.

MY 550ti CARD--

I had purchased a new GTX 550ti just prior to really mining scrypt or any other algo.  I played games with it; it was a good card at the time.  One day, the screen went black while I was playing, and I had to shut down the system with the off-switch.  After taking the case apart, I finally discovered a thin plastic piece stuck in the fan of the GTX 550ti card.

The card shipped new in the box with a thin plastic protective covering that was supposed to be removed prior to use.  I had mistakenly left the protective plastic disk on the center of the GPU card fan; it had dislodged and stuck on the fan blade, preventing rotation.  As a result, the card overheated, and a failsafe switch shut down the card.  That is why the screen went black; there was a protective shutdown switch for overheating.

When I got everything plugged back together, I rebooted and the system was fine.  If your GPU overheated so much that the circuitry melted, you should RMA the card.  A GTX 970 should not simply self-destruct; it has the same or better protection built in.  RMA stands for "Return Merchandise Authorization", by the way.  I don't know, you may be living in another country with very hot weather, but the card should still have a warranty.       --scryptr

I tried to send it in for RMA, but burn damage voids the warranty in my country and the vendor refused to send the card for RMA. It was an MSI 4G Twin Frozr OC version.
legendary
Activity: 1797
Merit: 1028
I was sleeping while my 970 was mining. When I woke up, I found my PC in a comatose state. The PC was running, but there was no display and there was a burning smell from my CPU. I found a grease-like substance on the back of my GPU, and it died that way.

MY 550ti CARD--

I had purchased a new GTX 550ti just prior to really mining scrypt or any other algo.  I played games with it; it was a good card at the time.  One day, the screen went black while I was playing, and I had to shut down the system with the off-switch.  After taking the case apart, I finally discovered a thin plastic piece stuck in the fan of the GTX 550ti card.

The card shipped new in the box with a thin plastic protective covering that was supposed to be removed prior to use.  I had mistakenly left the protective plastic disk on the center of the GPU card fan; it had dislodged and stuck on the fan blade, preventing rotation.  As a result, the card overheated, and a failsafe switch shut down the card.  That is why the screen went black; there was a protective shutdown switch for overheating.

When I got everything plugged back together, I rebooted and the system was fine.  If your GPU overheated so much that the circuitry melted, you should RMA the card.  A GTX 970 should not simply self-destruct; it has the same or better protection built in.  RMA stands for "Return Merchandise Authorization", by the way.  I don't know, you may be living in another country with very hot weather, but the card should still have a warranty.       --scryptr
legendary
Activity: 1526
Merit: 1026
I was sleeping while my 970 was mining. When I woke up, I found my PC in a comatose state. The PC was running, but there was no display and there was a burning smell from my CPU. I found a grease-like substance on the back of my GPU, and it died that way.
legendary
Activity: 1400
Merit: 1050
In my opinion, the performance of a 970 is equal to 2.7x GTX 750ti. Which would be the smarter choice to start with: 1 GTX 970 or 3x 750ti?
Some points:
1. If the GPU somehow dies, with a 970 you lose $350, but a single 750ti costs $120.
2. In both cases the electricity bill will be almost the same.
3. For Eth, the 970 is the clear winner.

I am confused. Should I buy 1x 970 or 3x 750ti?

Also mention if you have another suggestion.
GPUs don't die like that unless you really don't take care of them, and you can always RMA them.
legendary
Activity: 1526
Merit: 1026
In my opinion, the performance of a 970 is equal to 2.7x GTX 750ti. Which would be the smarter choice to start with: 1 GTX 970 or 3x 750ti?
Some points:
1. If the GPU somehow dies, with a 970 you lose $350, but a single 750ti costs $120.
2. In both cases the electricity bill will be almost the same.
3. For Eth, the 970 is the clear winner.

I am confused. Should I buy 1x 970 or 3x 750ti?

Also mention if you have another suggestion.
legendary
Activity: 1764
Merit: 1024
Ethereum is much more profitable to mine, so this is pointless; I can mine Ethereum and buy more Decred than I would get by mining Decred.

And BTC used to be profitable for GPUs to mine. Things change. We're in a huge profit bubble right now and that can pop at any moment and then all hell is going to break loose when all that Eth hash hits all the other GPU coins.

It does not work like that. They dump? I'm fine: the diff will adjust = same profit as before.

Oh yeah? I don't think it works the way you're thinking. Why do you think profitability will be the same if Eth loses market value? No other coin is nearly as profitable, and Eth has far more hash than any other coin. If it starts to equalize, the other coins can't support the amount of hash.

As I mentioned before, GPU mining hash has grown about 30% in the last two weeks... Maybe closer to 50% as Eth has gained another 300Mh since then.

This is completely putting aside Eth can crash and it can go PoS, which means no more mining. They have talked about PoS already.

Decred has some pretty damned good profitability - it may not exceed Eth for all GPUs, but it comes fairly close.

Yeah, but quite fragile. Eth has a lot of hash and volume going for it. That isn't easily upset. If people from Eth all jumped on Decred it'd instantly bottom out.

Also, if Ethereum goes PoS, another big coin will emerge, probably Decred, so a pump there would not be unexpected in the near future.
The money will always move one way or another, and the diff will follow.

Investors don't just decide to invest in a new coin when one goes PoS in order to feed miners money. If Eth dies, either by bottoming out or PoS, miners are more than likely SoL. Decred and Vanilla are the next closest things.

Before Eth it was Dash and Dash has been private kernels/ASIC for quite some time... three months ago we were making $.50 profit on a 970, today it's $6... This is definitely a high point and it shouldn't be expected it'll stay this way.
legendary
Activity: 1470
Merit: 1114
I kinda doubt there's a documented and stable, supported method of doing so...
You can do it safely:

1. Put the compiled cubin in a ramdisk (virtual memory drive).
2. Poke the constant values with the CPU directly in the file. (The locations can be found with disassembly, and the offsets might change from compiler to compiler (CUDA versions).)
3. Call the CUDA API function cuModuleLoad.

https://www.cs.cmu.edu/afs/cs/academic/class/15668-s11/www/cuda-doc/html/group__CUDA__MODULE_g366093bd269dafd0af21f1c7d18115d3.html



I stand corrected.

Nice hack. I've always had a soft spot for self modifying code. I once implemented a switch/case that way because there
wasn't enough memory for a jump table. I didn't think it was still possible with modern cpus and all their protections.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I kinda doubt there's a documented and stable, supported method of doing so...
You can do it safely:

1. Put the compiled cubin in a ramdisk (virtual memory drive).
2. Poke the constant values with the CPU directly in the file. (The locations can be found with disassembly, and the offsets might change from compiler to compiler (CUDA versions).)
3. Call the CUDA API function cuModuleLoad.

https://www.cs.cmu.edu/afs/cs/academic/class/15668-s11/www/cuda-doc/html/group__CUDA__MODULE_g366093bd269dafd0af21f1c7d18115d3.html
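
A minimal sketch of the "poke the constant values" step, written in Python for brevity. Everything concrete here is a hypothetical stand-in: the 64-byte image, the offsets and the helper name poke_constants are invented for illustration, and real offsets would have to come from disassembling the actual cubin. It also assumes each constant is stored as a byte-aligned 32-bit little-endian word, which may not hold for real GPU instruction encodings where an immediate can sit at an arbitrary bit offset inside an instruction word. Loading the patched file (cuModuleLoad) is left out because it needs a GPU and the CUDA driver.

```python
import struct

def poke_constants(image, offsets, values):
    """Overwrite 32-bit little-endian words in `image` at the given byte offsets."""
    if len(offsets) != len(values):
        raise ValueError("need exactly one value per offset")
    for off, val in zip(offsets, values):
        if off < 0 or off + 4 > len(image):
            raise ValueError(f"offset {off} is outside the image")
        struct.pack_into("<I", image, off, val & 0xFFFFFFFF)
    return image

# Stand-in for a compiled cubin: filler bytes, not a real GPU binary.
image = bytearray(b"\xAA" * 64)

# Hypothetical byte offsets where the placeholder constants were found.
offsets = [8, 24, 40]

# Distinct placeholder constants compiled into the kernel, so each one is
# easy to locate in a disassembly.
placeholders = [0x01234567, 0x89ABCDEF, 0x0BADF00D]
poke_constants(image, offsets, placeholders)

# New midstate words arrive (new block or transaction): patch them in place.
new_midstate = [0x11111111, 0x22222222, 0x33333333]
poke_constants(image, offsets, new_midstate)

# Read the words back to confirm the patch landed where expected.
patched = [struct.unpack_from("<I", image, off)[0] for off in offsets]
print([hex(w) for w in patched])
```

The driver API also has cuModuleLoadData, which takes an in-memory image instead of a file name, so patching a buffer and loading it directly might avoid the ramdisk round trip. That is untested here.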

sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
OK. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then, every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique.
But I doubt you'll gain even 1% from it, likely less.
You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data such as 0x01234567, making sure every constant is different from the others. Compile, read the PTX, and count the lines before and after.
Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate and solves the 14-round Blake problem for that given midstate.
So, in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? That doesn't look optimal to me.

There is a faster way: poke the new constants directly into the GPU binary (self-modifying code). Once the binary has been made, only 24 (+) constant numbers need to be changed (on a new transaction or block); then the kernel needs to be reloaded to the GPU with a cache flush (CUDA device reset), or perhaps there is an API call that can load/reload a .cubin file directly.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
OK. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then, every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique.
But I doubt you'll gain even 1% from it, likely less.

You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data such as 0x01234567, making sure every constant is different from the others. Compile, read the PTX, and count the lines before and after.

Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate and solves the 14-round Blake problem for that given midstate.

So, in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? That doesn't look optimal to me.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
OK. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then, every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique.
But I doubt you'll gain even 1% from it, likely less.

You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data such as 0x01234567, making sure every constant is different from the others. Compile, read the PTX, and count the lines before and after.

Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate and solves the 14-round Blake problem for that given midstate.
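
The idea that work on constant data can be done once instead of per hash can be sketched on the CPU. The snippet below is an illustration in Python, not the miner's kernel: it uses the BLAKE-256 G mixing function, simplifies the message words to values already XORed with the round constants, and assumes (hypothetically) that the nonce only enters the second G call of the first column step. The names specialise and per_nonce are invented. On the GPU the compiler would do this folding itself when the midstate appears as immediates in the PTX.

```python
MASK = 0xFFFFFFFF

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def g(v, a, b, c, d, x, y):
    """BLAKE-256 G mix on state list v; x and y are message words that are
    assumed to be already XORed with their round constants."""
    v[a] = (v[a] + v[b] + x) & MASK
    v[d] = rotr(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], 12)
    v[a] = (v[a] + v[b] + y) & MASK
    v[d] = rotr(v[d] ^ v[a], 8)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], 7)

def column_step(v, m):
    """Generic first column step: four G calls on disjoint state columns."""
    v = list(v)
    g(v, 0, 4, 8, 12, m[0], m[1])
    g(v, 1, 5, 9, 13, m[2], m[3])   # assumption: m[3] is the nonce
    g(v, 2, 6, 10, 14, m[4], m[5])
    g(v, 3, 7, 11, 15, m[6], m[7])
    return v

def specialise(v, m):
    """Run once per midstate: evaluate the three columns that never see the
    nonce, and return a function that only does the remaining column."""
    v = list(v)
    g(v, 0, 4, 8, 12, m[0], m[1])
    g(v, 2, 6, 10, 14, m[4], m[5])
    g(v, 3, 7, 11, 15, m[6], m[7])
    fixed_x = m[2]

    def per_nonce(nonce):
        w = list(v)
        g(w, 1, 5, 9, 13, fixed_x, nonce)
        return w

    return per_nonce

# Arbitrary but distinct test data standing in for a midstate and message.
state = [(0x9E3779B9 * (i + 1)) & MASK for i in range(16)]
words = [(0x01234567 + 0x11111111 * i) & MASK for i in range(8)]

kernel = specialise(state, words)
nonce = 0xDEADBEEF
full = column_step(state, words[:3] + [nonce] + words[4:])
fast = kernel(nonce)
print("specialised result matches generic result:", full == fast)
```

In this sketch three of the four G calls in the first column step move out of the per-nonce path. How many instructions actually disappear in the real kernel depends on where the nonce sits in the message, which is why counting PTX lines before and after is the practical check.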
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
OK. I have found a way to do the optimal Decred kernel now.

http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly

So I will generate the PTX assembly with the midstate data included in the instructions. Then, every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.


Interesting technique.
But I doubt you'll gain even 1% from it, likely less.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
OK. I have found a way to do the optimal Decred kernel now.

http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly

So I will generate the PTX assembly with the midstate data included in the instructions. Then, every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Since the source code will be PTX assembly, I can also support Linux users. Since operations on constants can be precalculated, the compiler will reduce the number of instructions needed for you, so you end up with a kernel that uses fewer instructions than before.



Release #4 will be near optimal.