Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 180.

chronek

sr. member

BD People Are Legend

Quote from: OhGodAGirl on January 18, 2017, 11:38:20 AM

But resorting to name calling and abuse when someone who does this for a living tells you you're incorrect is very...well, it's a poor way to handle it.

If someone doesn't tell me what is wrong, but just says that I am "talking out your arse", then I just praise his creative answer.

OhGodAGirl

full member

Look, I'm really not that interesting. Promise.

Quote from: chronek on January 18, 2017, 11:18:25 AM

Quote from: laik2 on January 18, 2017, 10:40:11 AM

I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or anything similar that can be used to queue the solution rate, AFAIK. Protocol specifics don't allow workarounds.

Yes, we cannot exceed hardware limits, but right now the miner runs at 63% MCU (memory controller utilization) and 80% of power, so it does not use the full hardware capacity. Why? I suspect the miner's computation has two phases: one where data is fetched from memory while the cores wait (are blocked), and a second where the cores compute (and don't use memory).

Put simply:
Now the threads do: [[external memory read to registers][comp][external memory write result]][[external memory read to registers][comp][external memory write result]]
so for one part the cores are waiting, and for the other part the memory is not used...

I want (all at the same time):
thread1: [external memory read to cache][external memory read to cache][external memory read to cache]
thread2: [[read cache][comp][write cache]][[read cache][comp][write cache]][[read cache][comp][write cache]]
thread3: [external memory write result][external memory write result][external memory write result]

Yes, it has flaws in its logic, but why can't I discuss that?

You can discuss it - no one is saying otherwise.

But resorting to name calling and abuse when someone who does this for a living tells you you're incorrect is very...well, it's a poor way to handle it.

If you wish to have a discussion, you need to make sure you're polite. Ask WHY your theory is not possible, instead of just resorting to name calling.

toptek

legendary

Quote from: cryptominer420 on January 18, 2017, 11:20:34 AM

Quote from: toptek on January 18, 2017, 01:16:58 AM

Quote from: cryptominer420 on January 17, 2017, 10:04:17 PM

Well guys, using version 1.1 on Suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250 MHz core and 1720 MHz RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152 H/s over 1 hr, so that would break down to an effective hashrate of 230.4 H/s per card.

So I need to re-mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate.

Would you be willing to post your exact settings?

Above all, before someone misjudges me: I'm not complaining or against paying fees. That said, I know it is taking a little bit longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; if you're willing and don't mind, great, and if not, that's cool too.

My cards all have custom memory straps. I use driver 16.6.2.
Power is +5
Core runs at 1250
Mem runs at 1720

The miner itself reports an average of 201 H/s for each card, but when I set it to mine at a static difficulty of 1500 on Suprnova, I get an average pool-side hashrate of over 1150 H/s for my 5-card rig.

I do have to say that I killed my 6th RX 470 finding the ideal memory straps for my PowerColor cards. The first thing to try is setting your difficulty to 1500 by using d=1500 as your password on Suprnova pools, and seeing what you get.

I am not planning to share my straps, as I would feel terrible if someone used them and fried their card.

I don't blame you for not sharing; I don't share mine either, for the same reason... I use the Coinotron pool and two others, so I'm OK then: I get about the same at stock settings using the new AMD Crimson ReLive Edition 17.1.1 drivers. I thought you had discovered something new.

cryptominer420

sr. member

Quote from: toptek on January 18, 2017, 01:16:58 AM

Quote from: cryptominer420 on January 17, 2017, 10:04:17 PM

Well guys, using version 1.1 on Suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250 MHz core and 1720 MHz RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152 H/s over 1 hr, so that would break down to an effective hashrate of 230.4 H/s per card.

So I need to re-mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate.

Would you be willing to post your exact settings?

Above all, before someone misjudges me: I'm not complaining or against paying fees. That said, I know it is taking a little bit longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; if you're willing and don't mind, great, and if not, that's cool too.

My cards all have custom memory straps. I use driver 16.6.2.
Power is +5
Core runs at 1250
Mem runs at 1720

The miner itself reports an average of 201 H/s for each card, but when I set it to mine at a static difficulty of 1500 on Suprnova, I get an average pool-side hashrate of over 1150 H/s for my 5-card rig.

I do have to say that I killed my 6th RX 470 finding the ideal memory straps for my PowerColor cards. The first thing to try is setting your difficulty to 1500 by using d=1500 as your password on Suprnova pools, and seeing what you get.

I am not planning to share my straps, as I would feel terrible if someone used them and fried their card.

chronek

sr. member

BD People Are Legend

Quote from: laik2 on January 18, 2017, 10:40:11 AM

I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or anything similar that can be used to queue the solution rate, AFAIK. Protocol specifics don't allow workarounds.

Yes, we cannot exceed hardware limits, but right now the miner runs at 63% MCU (memory controller utilization) and 80% of power, so it does not use the full hardware capacity. Why? I suspect the miner's computation has two phases: one where data is fetched from memory while the cores wait (are blocked), and a second where the cores compute (and don't use memory).

Put simply:
Now the threads do: [[external memory read to registers][comp][external memory write result]][[external memory read to registers][comp][external memory write result]]
so for one part the cores are waiting, and for the other part the memory is not used...

I want (all at the same time):
thread1: [external memory read to cache][external memory read to cache][external memory read to cache]
thread2: [[read cache][comp][write cache]][[read cache][comp][write cache]][[read cache][comp][write cache]]
thread3: [external memory write result][external memory write result][external memory write result]

Yes, it has flaws in its logic, but why can't I discuss that?
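A minimal sketch of the three-thread arrangement described above, written as a CPU-side Python analogy with bounded queues standing in for the "cache" tables. The stage functions `fetch` and `compute`, the squaring workload, and `buffer_size` are hypothetical placeholders, not anything from Gateless Gate. On real GPUs, memory latency is normally hidden by the scheduler switching between many in-flight wavefronts rather than by dedicated reader/writer threads, so treat this only as an illustration of the idea being argued.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream for each stage


def fetch(i):
    # Hypothetical stand-in for "[external memory read to cache]".
    return i


def compute(x):
    # Hypothetical stand-in for "[read cache][comp][write cache]".
    return x * x


def reader(items, out_q):
    # thread1: keeps the buffer filled independently of the compute stage.
    for i in items:
        out_q.put(fetch(i))  # blocks only when the buffer is full
    out_q.put(SENTINEL)


def worker(in_q, out_q):
    # thread2: only computes, and only on data already in the buffer.
    while (x := in_q.get()) is not SENTINEL:
        out_q.put(compute(x))
    out_q.put(SENTINEL)


def writer(in_q, results):
    # thread3: "[external memory write result]".
    while (y := in_q.get()) is not SENTINEL:
        results.append(y)


def run_pipeline(items, buffer_size=4):
    fetched = queue.Queue(maxsize=buffer_size)   # bounded input buffer
    computed = queue.Queue(maxsize=buffer_size)  # bounded result buffer
    results = []
    threads = [
        threading.Thread(target=reader, args=(items, fetched)),
        threading.Thread(target=worker, args=(fetched, computed)),
        threading.Thread(target=writer, args=(computed, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(6)))  # [0, 1, 4, 9, 16, 25]
```

The bounded queues let the reader run ahead of the compute stage by at most `buffer_size` items, so the stages overlap instead of taking turns. Overlap can only fill idle gaps; it cannot push throughput past the slowest stage, which is the hardware-limit point raised in the replies.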

OhGodAGirl

full member

Look, I'm really not that interesting. Promise.

Quote from: laik2 on January 18, 2017, 10:40:11 AM

Quote from: chronek on January 18, 2017, 07:07:48 AM

Quote from: OhGodAGirl on January 18, 2017, 06:44:39 AM

This has nothing to do with creative thinking. There's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.

You didn't read it, and neither did he: a 4 KB table can fit in cache.

I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or anything similar that can be used to queue the solution rate, AFAIK. Protocol specifics don't allow workarounds.

You're correct.

joaocha

full member

Quote from: th00ber on January 18, 2017, 10:43:24 AM

Quote from: zawawa on January 17, 2017, 08:16:23 PM

Quote from: th00ber on January 17, 2017, 07:21:18 PM

vcruntime140.dll is missing on both Win7 and Win10.
I tried to reinstall the VC Redist / download the missing lib.

But it's not working... any tips on how to run this on Windows?

That's pretty weird... Are you using a 32-bit version of Windows by any chance?

64-bit on both... Do you have a release with the full DLL dependencies?

https://www.microsoft.com/en-us/download/confirmation.aspx?id=48145

th00ber

hero member

Quote from: zawawa on January 17, 2017, 08:16:23 PM

Quote from: th00ber on January 17, 2017, 07:21:18 PM

vcruntime140.dll is missing on both Win7 and Win10.
I tried to reinstall the VC Redist / download the missing lib.

But it's not working... any tips on how to run this on Windows?

That's pretty weird... Are you using a 32-bit version of Windows by any chance?

64-bit on both... Do you have a release with the full DLL dependencies?

laik2

sr. member

Quote from: chronek on January 18, 2017, 07:07:48 AM

Quote from: OhGodAGirl on January 18, 2017, 06:44:39 AM

This has nothing to do with creative thinking. There's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.

You didn't read it, and neither did he: a 4 KB table can fit in cache.

I may not understand much of OpenCL or graphics at all, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above.
There is no queue mechanism or anything similar that can be used to queue the solution rate, AFAIK. Protocol specifics don't allow workarounds.

chronek

sr. member

BD People Are Legend

Quote from: OhGodAGirl on January 18, 2017, 06:44:39 AM

This has nothing to do with creative thinking. There's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.

You didn't read it, and neither did he: a 4 KB table can fit in cache.

OhGodAGirl

full member

Look, I'm really not that interesting. Promise.

Quote from: chronek on January 18, 2017, 06:42:12 AM

Quote from: nerdralph on January 18, 2017, 06:26:34 AM

No problem, you've clearly expressed that you are talking out your arse.

And you have expressed that you cannot think creatively, and that you prefer to reject any new ideas.

This has nothing to do with creative thinking. There's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.

chronek

sr. member

BD People Are Legend

Quote from: nerdralph on January 18, 2017, 06:26:34 AM

No problem, you've clearly expressed that you are talking out your arse.

And you have expressed that you cannot think creatively, and that you prefer to reject any new ideas.

nerdralph

sr. member

Quote from: chronek on January 18, 2017, 02:41:19 AM

Quote from: nerdralph on January 17, 2017, 04:39:59 PM

So tell me, wise one, how can any developer get >40MB of data to fit into the L2 cache on something like an RX 480? Do you even know how big the cache is?

Who said anything about getting all the data at once? Just do it asynchronously: while the table is being filled, it is being used at the same time, like a buffer. Right now memory is used when the cores need it, but they don't need it all the time, which is why the miner doesn't use all of the MCU. With a buffer table, the table would be filled all the time by a separate process, even in the gaps when the cores don't need memory, and more data would be ready to process later when the cores want it. It could be a table of only a few KB, and the benefits would be faster access and less waiting. But it would need a redesign of the whole working process: every calculation would need to push data into one table and get its result from a second one, with each unit doing only its own job, and only when data is in the table. It would not be simple.

Sorry, my English is not good and I cannot express everything I want.

No problem, you've clearly expressed that you are talking out your arse.

chronek

sr. member

BD People Are Legend

Quote from: nerdralph on January 17, 2017, 04:39:59 PM

So tell me, wise one, how can any developer get >40MB of data to fit into the L2 cache on something like an RX 480? Do you even know how big the cache is?

Who said anything about getting all the data at once? Just do it asynchronously: while the table is being filled, it is being used at the same time, like a buffer. Right now memory is used when the cores need it, but they don't need it all the time, which is why the miner doesn't use all of the MCU. With a buffer table, the table would be filled all the time by a separate process, even in the gaps when the cores don't need memory, and more data would be ready to process later when the cores want it. It could be a table of only a few KB, and the benefits would be faster access and less waiting. But it would need a redesign of the whole working process: every calculation would need to push data into one table and get its result from a second one, with each unit doing only its own job, and only when data is in the table. It would not be simple.

Sorry, my English is not good and I cannot express everything I want.
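Both sides of this argument fit a simple, idealized timing model (an assumption for illustration, not a measurement of any real miner). If each of n work items needs t_mem of memory time and t_comp of compute time, doing the phases strictly in turn costs n * (t_mem + t_comp), while perfectly overlapping them costs about t_mem + t_comp + (n - 1) * max(t_mem, t_comp). The sketch below evaluates both; the per-item timings are made-up numbers.

```python
def serial_time(n, t_mem, t_comp):
    # Each item reads memory, then computes; the two phases never overlap.
    return n * (t_mem + t_comp)


def pipelined_time(n, t_mem, t_comp):
    # Idealized two-stage pipeline: after the first item fills the pipe,
    # one item completes every max(t_mem, t_comp) time units.
    if n == 0:
        return 0
    return t_mem + t_comp + (n - 1) * max(t_mem, t_comp)


def memory_utilization(total_time, n, t_mem):
    # Fraction of wall-clock time during which the memory bus is busy.
    return n * t_mem / total_time


if __name__ == "__main__":
    n, t_mem, t_comp = 1000, 6.0, 4.0  # hypothetical per-item timings
    s = serial_time(n, t_mem, t_comp)
    p = pipelined_time(n, t_mem, t_comp)
    print(s, p, round(s / p, 3))
    print(round(memory_utilization(s, n, t_mem), 3),
          round(memory_utilization(p, n, t_mem), 3))
```

With these made-up timings the overlapped version comes out about 1.67 times faster, and memory goes from busy 60% of the time to nearly 100%. Past that point the total can never drop below n * max(t_mem, t_comp), so overlap only reclaims idle gaps; it cannot exceed the bandwidth the hardware actually has.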

toptek

legendary

Quote from: cryptominer420 on January 17, 2017, 10:04:17 PM

Well guys, using version 1.1 on Suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250 MHz core and 1720 MHz RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152 H/s over 1 hr, so that would break down to an effective hashrate of 230.4 H/s per card.

So I need to re-mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate.

Would you be willing to post your exact settings?

Above all, before someone misjudges me: I'm not complaining or against paying fees. That said, I know it is taking a little bit longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; if you're willing and don't mind, great, and if not, that's cool too.

zawawa

sr. member

Miner Developer

I was playing with the CryptoNight kernel for a change and was able to get it working on a GTX 1060 with "--gpu-threads 1". I also dug out a NeoScrypt kernel I optimized a while back, which runs at 780 kH/s on an RX 480. I will include them in the next version.

cryptominer420

sr. member

Well guys, using version 1.1 on Suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250 MHz core and 1720 MHz RAM, if I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152 H/s over 1 hr, so that would break down to an effective hashrate of 230.4 H/s per card.
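The per-card figure above is just the pool-side average divided by the number of cards. A small sketch of that arithmetic, also comparing it with the 201 H/s per card that the miner itself reports (a figure given later in the thread); the function name is illustrative. Keep in mind that a pool-side hashrate is estimated from accepted shares, so a one-hour average carries statistical noise and can land above or below the true rate.

```python
def per_card_rate(pool_total, cards):
    # Effective per-card hashrate implied by the pool-side total.
    return pool_total / cards


pool_total = 1152.0     # H/s, pool-side average over 1 hr (from the post)
cards = 5               # RX 470s in the rig
miner_reported = 201.0  # H/s per card, as reported by the miner itself

effective = per_card_rate(pool_total, cards)
print(effective)                                 # 230.4
print(round(effective / miner_reported - 1, 3))  # relative gap vs. the miner
```

The pool-side figure works out to roughly 15% above what the miner reports locally, which is the gap the replies are trying to explain.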

zawawa

sr. member

Miner Developer

Quote from: Jdope on January 17, 2017, 07:47:43 PM

It might not be the best place to ask, but what skills are used in making such mining software? What are the core subjects one needs a good grasp of to have that low-level (I assume) knowledge?

I am pretty much self-taught as far as programming is concerned, so my approach to it is fairly idiosyncratic. I was originally interested in the internal workings of operating systems, device drivers, and compilers, and that background has definitely helped me so far. Now if only I could get this assembly version right...

zawawa

sr. member

Miner Developer

Quote from: th00ber on January 17, 2017, 07:21:18 PM

vcruntime140.dll is missing on both Win7 and Win10.
I tried to reinstall the VC Redist / download the missing lib.

But it's not working... any tips on how to run this on Windows?

That's pretty weird... Are you using a 32-bit version of Windows by any chance?

Jdope

hero member

It might not be the best place to ask, but what skills are used in making such mining software? What are the core subjects one needs a good grasp of to have that low-level (I assume) knowledge?
