Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 180. (Read 214410 times)

sr. member
Activity: 273
Merit: 250
BD People Are Legend
But resorting to name calling and abuse when someone who does this for a living tells you you're incorrect is very...well, it's a poor way to handle it.

If someone doesn't tell me what is wrong, but just says that I am "talking out your arse," then I just praise his creative answer.
full member
Activity: 199
Merit: 108
Look, I'm really not that interesting. Promise.
I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or the like that can be used to queue solution rate AFAIK. Protocol specifics don't allow workarounds.

Yes, we cannot exceed hardware limits, but for now the miner uses 63% of MCU and 80% of power, so it does not utilize the full hardware capacity. Why? I suspect that the miner's computation has 2 phases: one when fetching from memory, where the cores wait (are blocked), and a second where the cores compute (and do not use memory).

In simple terms:
now threads do: [[external memory read to registers][comp][external memory write result]][[external memory read to registers][comp][external memory write result]]
so in one part the cores are waiting, and in the second part memory is not used...

I want (all at the same time):
thread1: [external memory read to cache][external memory read to cache][external memory read to cache]
thread2: [[read cache][comp][write cache]][[read cache][comp][write cache]][[read cache][comp][write cache]]
thread3: [external memory write result][external memory write result][external memory write result]

Yes, it has flaws in its logic, but why can't I discuss that?

You can discuss it - no one is saying otherwise.

But resorting to name calling and abuse when someone who does this for a living tells you you're incorrect is very...well, it's a poor way to handle it.

If you wish to have a discussion, you need to make sure to be polite. Ask WHY your theory is not possible, instead of just resorting to name calling.
legendary
Activity: 1274
Merit: 1000
Well guys, using Version 1.1 on suprnova on my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720 RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, so that would break down to an effective hashrate of 230.4h/s per card.

So I need to re-mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate.

Would you be willing to post your exact settings?

Above all, before someone misjudges me: I'm not complaining or against paying fees. That said, I know it takes a little bit longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; if you're willing and don't mind, cool, and if not, that's cool too.

My cards all have custom memory straps, and I use driver 16.6.2.
Power is +5
core runs at 1250
mem runs at 1720

The actual miner reports an average of 201h/s for each card, but when I set it to mine at a static difficulty of 1500 on suprnova I am getting an average hashrate on the pool of over 1150h/s for my 5-card rig.

I do have to say that I killed my 6th RX 470 finding the ideal memory straps for my PowerColor cards. The first thing to try is setting your difficulty to 1500 using d=1500 as your password on suprnova pools and seeing what you get.

I am not planning to share my straps as I would feel terrible if someone used them and fried their card.

I don't blame you on the not sharing part; I don't either, for the same reason... I use the Coinotron pool and two others, so I'm OK then. I get about the same at stock settings using the new AMD driver, Crimson ReLive Edition 17.1.1. I thought you had discovered something new :)
sr. member
Activity: 450
Merit: 255
Well guys, using Version 1.1 on suprnova on my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720 RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, so that would break down to an effective hashrate of 230.4h/s per card.

So I need to re-mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate.

Would you be willing to post your exact settings?

Above all, before someone misjudges me: I'm not complaining or against paying fees. That said, I know it takes a little bit longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; if you're willing and don't mind, cool, and if not, that's cool too.

My cards all have custom memory straps, and I use driver 16.6.2.
Power is +5
core runs at 1250
mem runs at 1720

The actual miner reports an average of 201h/s for each card, but when I set it to mine at a static difficulty of 1500 on suprnova I am getting an average hashrate on the pool of over 1150h/s for my 5-card rig.

I do have to say that I killed my 6th RX 470 finding the ideal memory straps for my PowerColor cards. The first thing to try is setting your difficulty to 1500 using d=1500 as your password on suprnova pools and seeing what you get.

I am not planning to share my straps as I would feel terrible if someone used them and fried their card.
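
For anyone who wants to redo the arithmetic in the post above, here is a minimal sketch in Python. The three input numbers are the ones quoted in the post; the variable names are just for illustration. Keep in mind that a pool-side hashrate is normally an estimate derived from submitted shares, so over a 1 hr window it can sit above or below what the miner itself reports.

```python
# Figures taken from the post above; names are illustrative only.
CARDS = 5
POOL_HASHRATE = 1152.0   # h/s, pool-side average over 1 hr
MINER_PER_CARD = 201.0   # h/s, average the miner reports per card

# Pool-side rate split evenly across the rig.
pool_per_card = POOL_HASHRATE / CARDS

# What the miner itself claims for the whole rig.
miner_total = MINER_PER_CARD * CARDS

# How far the pool-side figure sits above the miner-side figure.
gap_percent = (pool_per_card / MINER_PER_CARD - 1.0) * 100.0

print(f"pool-side per card: {pool_per_card:.1f} h/s")
print(f"miner-side total:   {miner_total:.0f} h/s")
print(f"pool vs miner:      {gap_percent:+.1f}%")
```

With these inputs the pool-side figure is roughly 15% above the miner-side one, so comparing over a longer window than 1 hr would be a reasonable next step before drawing conclusions.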
sr. member
Activity: 273
Merit: 250
BD People Are Legend
I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or the like that can be used to queue solution rate AFAIK. Protocol specifics don't allow workarounds.

Yes, we cannot exceed hardware limits, but for now the miner uses 63% of MCU and 80% of power, so it does not utilize the full hardware capacity. Why? I suspect that the miner's computation has 2 phases: one when fetching from memory, where the cores wait (are blocked), and a second where the cores compute (and do not use memory).

In simple terms:
now threads do: [[external memory read to registers][comp][external memory write result]][[external memory read to registers][comp][external memory write result]]
so in one part the cores are waiting, and in the second part memory is not used...

I want (all at the same time):
thread1: [external memory read to cache][external memory read to cache][external memory read to cache]
thread2: [[read cache][comp][write cache]][[read cache][comp][write cache]][[read cache][comp][write cache]]
thread3: [external memory write result][external memory write result][external memory write result]

Yes, it has flaws in its logic, but why can't I discuss that?
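
The two schedules sketched in the post above can be compared with a toy timing model, written here in Python. This is only a sketch under loud assumptions: the stage costs are hypothetical numbers in arbitrary units, not measurements from any miner, and the model ignores that a GPU normally keeps many wavefronts in flight and already overlaps memory waits with compute to some degree.

```python
def serial_time(n_items, read, compute, write):
    """Each item does read -> compute -> write with no overlap between items."""
    return n_items * (read + compute + write)


def pipelined_time(n_items, read, compute, write):
    """Ideal 3-stage pipeline: after the first item fills the pipe,
    one item finishes every max(read, compute, write) time units."""
    if n_items == 0:
        return 0
    return (read + compute + write) + (n_items - 1) * max(read, compute, write)


# Hypothetical stage costs in arbitrary time units (assumptions, not data).
READ, COMPUTE, WRITE = 6, 3, 1
N = 1000

serial = serial_time(N, READ, COMPUTE, WRITE)
pipelined = pipelined_time(N, READ, COMPUTE, WRITE)
speedup = serial / pipelined

print(f"serial:    {serial}")
print(f"pipelined: {pipelined}")
print(f"speedup:   {speedup:.2f}x")
```

In this model the speedup can never exceed (read + compute + write) / max(read, compute, write), so if memory reads dominate, overlapping the stages still leaves the job limited by the memory stage.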
full member
Activity: 199
Merit: 108
Look, I'm really not that interesting. Promise.
There's nothing to do with creative thinking here - there's a limit to how much data can fit on the cache. That's it. You can't add more. You're not being creative, you're being illogical.

You didn't read, and neither did he: a 4KB table can fit in cache.

I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or the like that can be used to queue solution rate AFAIK. Protocol specifics don't allow workarounds.

You're correct.
full member
Activity: 254
Merit: 100
vcruntime140.dll is missing on both Win7 and Win10.
I tried reinstalling the VC Redist and downloading the missing lib,

but it's not working... Any tips on how to run this on Windows?

That's pretty weird... Are you using a 32-bit version of Windows by any chance?
64-bit on both... Do you have a release with the full DLL dependencies?

https://www.microsoft.com/en-us/download/confirmation.aspx?id=48145
hero member
Activity: 789
Merit: 501
vcruntime140.dll is missing on both Win7 and Win10.
I tried reinstalling the VC Redist and downloading the missing lib,

but it's not working... Any tips on how to run this on Windows?

That's pretty weird... Are you using a 32-bit version of Windows by any chance?
64-bit on both... Do you have a release with the full DLL dependencies?
sr. member
Activity: 652
Merit: 266
There's nothing to do with creative thinking here - there's a limit to how much data can fit on the cache. That's it. You can't add more. You're not being creative, you're being illogical.

You didn't read, and neither did he: a 4KB table can fit in cache.

I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or the like that can be used to queue solution rate AFAIK. Protocol specifics don't allow workarounds.
sr. member
Activity: 273
Merit: 250
BD People Are Legend
There's nothing to do with creative thinking here - there's a limit to how much data can fit on the cache. That's it. You can't add more. You're not being creative, you're being illogical.

You didn't read, and neither did he: a 4KB table can fit in cache.
full member
Activity: 199
Merit: 108
Look, I'm really not that interesting. Promise.
No problem, you've clearly expressed that you are talking out your arse.

And you have expressed that you cannot think creatively, and that you prefer to reject any new thoughts.

There's nothing to do with creative thinking here - there's a limit to how much data can fit on the cache. That's it. You can't add more. You're not being creative, you're being illogical.
sr. member
Activity: 273
Merit: 250
BD People Are Legend
No problem, you've clearly expressed that you are talking out your arse.

And you have expressed that you cannot think creatively, and that you prefer to reject any new thoughts.
sr. member
Activity: 588
Merit: 251
So tell me, wise one, how can any developer get >40MB of data to fit into the L2 cache on something like an RX 480?  Do you even know how big the cache is?

Who said anything about getting all the data at once? Just do it asynchronously: while the table is being filled, it is being used at the same time, like a buffer. Right now memory is used only when the cores need it, but they don't need it all the time, which is why the MCU is not fully used. With a buffer table, it would be filled all the time by a separate process, even in the gaps when the cores don't need memory, so more data would be ready to process later when the cores want it. It could be a table of only a few KB, but the benefits would be faster access and less waiting. It would require redesigning the whole working process, though: every calculation would need to push data to one table and get the result from a second one, and each unit would do only its own job, and only when data is in the table. It would not be simple.

Sorry, my English is not good and I cannot express everything I want.

No problem, you've clearly expressed that you are talking out your arse.
sr. member
Activity: 273
Merit: 250
BD People Are Legend
So tell me, wise one, how can any developer get >40MB of data to fit into the L2 cache on something like an RX 480?  Do you even know how big the cache is?

Who said anything about getting all the data at once? Just do it asynchronously: while the table is being filled, it is being used at the same time, like a buffer. Right now memory is used only when the cores need it, but they don't need it all the time, which is why the MCU is not fully used. With a buffer table, it would be filled all the time by a separate process, even in the gaps when the cores don't need memory, so more data would be ready to process later when the cores want it. It could be a table of only a few KB, but the benefits would be faster access and less waiting. It would require redesigning the whole working process, though: every calculation would need to push data to one table and get the result from a second one, and each unit would do only its own job, and only when data is in the table. It would not be simple.

Sorry, my English is not good and I cannot express everything I want.
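
To put rough numbers on the cache question quoted above, here is a small Python sketch. It assumes the accesses are spread uniformly at random over the working set and cannot be predicted ahead of time; whether that holds for the miner is not established in this thread. The 2 MiB L2 size is an assumption picked for illustration, the 40 MiB working set rounds the ">40MB" from the question, and the 4 KiB table is the size mentioned elsewhere in the thread.

```python
def expected_hit_rate(cache_bytes, working_set_bytes):
    """Chance that one uniformly random access lands in the cached subset."""
    return min(1.0, cache_bytes / working_set_bytes)


MIB = 1024 * 1024
WORKING_SET = 40 * MIB   # ">40MB" from the quoted question, taken as 40 MiB
ASSUMED_L2 = 2 * MIB     # assumption for illustration, not a figure from this thread
SMALL_TABLE = 4 * 1024   # the "4kb table" mentioned in the thread

l2_hit = expected_hit_rate(ASSUMED_L2, WORKING_SET)
table_hit = expected_hit_rate(SMALL_TABLE, WORKING_SET)

print(f"assumed L2 hit rate:  {l2_hit:.2%}")
print(f"4 KiB table hit rate: {table_hit:.4%}")
```

Under those assumptions nearly every access still goes out to external memory, whatever the buffer size. A prefetch buffer like the one proposed only helps if the addresses are known early enough to fetch them ahead of the compute step, which is the part that would need to be shown.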
legendary
Activity: 1274
Merit: 1000
Well guys, using Version 1.1 on suprnova on my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720 RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, so that would break down to an effective hashrate of 230.4h/s per card.

So I need to re-mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate.

Would you be willing to post your exact settings?

Above all, before someone misjudges me: I'm not complaining or against paying fees. That said, I know it takes a little bit longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; if you're willing and don't mind, cool, and if not, that's cool too.
sr. member
Activity: 728
Merit: 304
Miner Developer
I was playing with the cryptonight kernel for a change and was able to get it to work on GTX 1060 with "--gpu-threads 1". I also dug out a NeoScrypt kernel I optimized a while back, which runs at 780kh/s on RX 480. I will include them in the next version.
sr. member
Activity: 450
Merit: 255
Well guys, using Version 1.1 on suprnova on my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720 RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, so that would break down to an effective hashrate of 230.4h/s per card.
sr. member
Activity: 728
Merit: 304
Miner Developer
It might not be the best place to ask, but what skills are used in making such mining software? What are the core subjects one needs a good grasp of to have that low-level (I assume) knowledge?

I am pretty much self-taught as far as programming is concerned, so my approach to it is fairly idiosyncratic. I was originally interested in the internal workings of operating systems, device drivers, and compilers, and that background has definitely helped me so far. Now if only I could get this assembly version right...
sr. member
Activity: 728
Merit: 304
Miner Developer
vcruntime140.dll is missing on both Win7 and Win10.
I tried reinstalling the VC Redist and downloading the missing lib,

but it's not working... Any tips on how to run this on Windows?

That's pretty weird... Are you using a 32-bit version of Windows by any chance?
hero member
Activity: 747
Merit: 502
It might not be the best place to ask, but what skills are used in making such mining software? What are the core subjects one needs a good grasp of to have that low-level (I assume) knowledge?