CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 17.

Kodaman

jr. member

X25X is the one and only GPU-only algo, not just for all NVIDIA GPUs but also for cards with 6 GB or less, because it doesn't require memory-hard operations.
All NVIDIA cards from the 1050 Ti 2 GB up to the 1080 Ti can mine it without fear of ASICs and FPGAs.
T-Rex has the algo optimised to the maximum; that is why there are no private miners or newer, faster miners for X25X.
Today is a good information day lol

Kodaman

jr. member

Quote from: joblo on September 24, 2019, 02:52:41 PM

If you don't produce a source it usually means you're starting the rumour.

iBeLink in California is the ASIC provider. Is the source good enough? ;)

joblo

legendary

If you don't produce a source it usually means you're starting the rumour.

Kodaman

jr. member

Maybe this place has the info. There are rumors that X16Rv2 already has ASICs.
Any feedback?

sp_

legendary

Team Black developer

With a compiled kernel, the GPU can execute 15000 RandomX instructions in 15 cycles per hash at 2000 MHz.
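Taking the post's figures at face value (15000 instructions, 15 cycles, 2000 MHz), the implied per-cycle throughput and time per hash work out as below. This is a back-of-the-envelope sketch only; it ignores scratchpad memory traffic, kernel launch overhead and everything else a real miner pays for.

```python
# Figures quoted in the post; everything else is derived arithmetic.
INSTRUCTIONS_PER_HASH = 15_000
CYCLES_PER_HASH = 15
CLOCK_HZ = 2_000e6  # 2000 MHz

instructions_per_cycle = INSTRUCTIONS_PER_HASH / CYCLES_PER_HASH  # 1000.0
seconds_per_hash = CYCLES_PER_HASH / CLOCK_HZ                     # 7.5e-09
instructions_per_second = instructions_per_cycle * CLOCK_HZ       # 2e12

print(f"{instructions_per_cycle:.0f} instructions/cycle")
print(f"{seconds_per_hash * 1e9:.1f} ns per hash")
print(f"{instructions_per_second:.2e} instructions/s")
```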

pallas

legendary

Black Belt Developer

True, but it's also true that you can fill the FPGA with custom-made cores, each executing RandomX instructions. FPGAs are plenty flexible, much more than GPUs; it just takes much more time to optimise.

sp_

legendary

Team Black developer

Quote from: pallas on September 14, 2019, 12:47:38 AM

The FPGA doesn't make N multiplications per cycle. It does N hashes per cycle, with N integer > 0 or, in the case of complex algorithms, 1/N.

Yes, but in RandomX the FPGA needs to do a memory read per cycle to determine the instruction to be executed, so the N hashes per cycle doesn't apply. The new limit is then N instructions per cycle, where N is bounded by the number of memory accesses the chip can do per cycle. In older FPGA designs it was normal to have dedicated ASIC multipliers you could use to speed up multiplications (e.g. Altera Cyclone IV). The multiplication could also be done in code.
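One way to read this argument is as a simple roofline-style bound: if every instruction costs one memory read (the instruction fetch), an FPGA can issue at most as many instructions per cycle as it has memory accesses per cycle, and multiplies are further capped by the number of hard multipliers. The sketch assumes a fixed fraction of multiply instructions in the program; the function name and parameters are hypothetical and not taken from any miner.

```python
def fpga_instructions_per_cycle(mem_reads_per_cycle: int,
                                multipliers: int,
                                mul_fraction: float) -> float:
    """Upper bound on instructions issued per cycle (simplified model).

    Assumes each instruction costs one memory read (its fetch) and that a
    fraction `mul_fraction` of instructions each need one hard multiplier.
    """
    fetch_bound = float(mem_reads_per_cycle)
    if mul_fraction <= 0:
        return fetch_bound
    # If N instructions issue per cycle, N * mul_fraction of them are
    # multiplies, so N * mul_fraction <= multipliers.
    mul_bound = multipliers / mul_fraction
    return min(fetch_bound, mul_bound)
```

For example, with 32 memory reads and 32 multipliers per cycle and a quarter of the instructions being multiplies, the fetch limit (32) is the binding one.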

pallas

legendary

Black Belt Developer

The FPGA doesn't make N multiplications per cycle. It does N hashes per cycle, with N integer > 0 or, in the case of complex algorithms, 1/N.

sp_

legendary

Team Black developer

Quote from: joblo on September 13, 2019, 05:37:05 PM

The language would have to be complex enough (in the CISC sense) that the FPGA can't decode it with a simple table lookup. That's a hell of a lot of work.

The FPGA has limits on memory accesses and multipliers. Let's say the FPGA can do 32 multiplications and 32 memory accesses per cycle; then you might be able to run 32 instructions per cycle @ 500 MHz.

RandomX on the GPU doesn't need any memory access because the code is compiled, and you can run 1024 threads at 2000 MHz.

So the GPU can do 1024 instructions per cycle @ 2000 MHz.
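The comparison in this post can be checked with plain arithmetic using the post's own assumed figures (32 instructions/cycle at 500 MHz for the FPGA, 1024 threads at 2000 MHz for the GPU, one instruction per thread per cycle). These are the poster's estimates, not measurements.

```python
def instructions_per_second(per_cycle: float, clock_mhz: float) -> float:
    """Instructions per second for a given issue rate and clock in MHz."""
    return per_cycle * clock_mhz * 1e6


fpga = instructions_per_second(32, 500)     # 1.6e10
gpu = instructions_per_second(1024, 2000)   # 2.048e12
print(f"FPGA: {fpga:.3g} instr/s, GPU: {gpu:.4g} instr/s, ratio {gpu / fpga:.0f}x")
```

Under these assumptions the GPU comes out 128 times ahead; the ratio scales directly with whatever per-cycle and clock figures you plug in.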

joblo

legendary

Quote from: sp_ on September 13, 2019, 03:24:57 PM

Doesn't need to be PTX. If you run on NVIDIA hardware you convert the random stream of instructions to PTX. RandomX could be very profitable on NVIDIA hardware with a proper implementation...

Precisely. You can build an NVIDIA-only proof of concept, but a real product will need
its own pseudo language that can be compiled to PTX/CUDA, OpenCL, and x86 native instructions
producing identical functionality. The language would have to be complex enough (in the CISC sense)
that the FPGA can't decode it with a simple table lookup. That's a hell of a lot of work.
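For contrast, here is what a "simple table lookup" decoder looks like for a fixed-width, RandomX-style encoding: each 8-byte instruction is an opcode byte, destination and source register bytes, a modifier byte and a 32-bit immediate, and the opcode byte indexes a 256-entry table. The instruction names and frequency weights are hypothetical placeholders, not the real RandomX table; the point is only that decoding of this kind is a single lookup, which is the cheap decode step joblo argues a more CISC-like language would have to defeat.

```python
import struct

# Hypothetical instruction mix: (name, weight). Weights must sum to 256 so
# that every possible opcode byte maps to exactly one instruction type.
FREQUENCIES = [("IADD_RS", 64), ("IMUL_R", 64), ("IXOR_R", 64), ("IROR_R", 64)]


def build_decode_table(freqs):
    """Expand (name, weight) pairs into a 256-entry opcode -> name table."""
    table = []
    for name, weight in freqs:
        table.extend([name] * weight)
    if len(table) != 256:
        raise ValueError("weights must sum to 256")
    return table


DECODE_TABLE = build_decode_table(FREQUENCIES)


def decode(word: bytes):
    """Decode one 8-byte instruction: opcode, dst, src, mod (1 byte each), imm32."""
    opcode, dst, src, mod, imm32 = struct.unpack("<BBBBI", word)
    # Register fields are reduced modulo 8 (eight integer registers assumed).
    return DECODE_TABLE[opcode], dst % 8, src % 8, mod, imm32
```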

sp_

legendary

Team Black developer

Quote from: joblo on September 13, 2019, 11:33:06 AM

Quote from: sp_ on September 13, 2019, 02:03:54 AM

The point with PTX is that it's a unified language for all NVIDIA GPU architectures.

The point is that it's only NVIDIA GPU architectures. No ASIC, no FPGA, no Radeon, no CPU.

Doesn't need to be PTX. If you run on NVIDIA hardware you convert the random stream of instructions to PTX. RandomX could be very profitable on NVIDIA hardware with a proper implementation...

joblo

legendary

Quote from: sp_ on September 13, 2019, 02:03:54 AM

The point with PTX is that it's a unified language for all NVIDIA GPU architectures.

The point is that it's only NVIDIA GPU architectures. No ASIC, no FPGA, no Radeon, no CPU.

sp_

legendary

Team Black developer

Quote from: pallas on September 13, 2019, 02:13:02 AM

Yeah, no PTX, that's what I was saying.
==> RandomX

So to make a fast RandomX miner on NVIDIA, you can convert the RandomX code to PTX before execution (create a new PTX kernel for each block).
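A minimal sketch of the "new PTX kernel per block" idea: derive a random program from the block seed and emit one PTX instruction per program instruction. Python's `random.Random` stands in for the real program generator (RandomX itself uses Blake2b- and AES-based generators), the three-instruction set is a hypothetical subset, and the output is only a kernel body fragment; a real miner would wrap it in a complete `.entry` function with register declarations and hand it to the CUDA driver for JIT compilation.

```python
import random

# Hypothetical three-op subset; each maps to a single 64-bit PTX instruction.
PTX_TEMPLATES = {
    "IADD": "add.u64 %rd{d}, %rd{d}, %rd{s};",
    "IMUL": "mul.lo.u64 %rd{d}, %rd{d}, %rd{s};",  # low 64 bits of the product
    "IXOR": "xor.b64 %rd{d}, %rd{d}, %rd{s};",
}
OPS = tuple(PTX_TEMPLATES)


def generate_program(block_seed: int, length: int):
    """Deterministically derive (op, dst, src) triples from the block seed."""
    rng = random.Random(block_seed)
    return [(rng.choice(OPS), rng.randrange(8), rng.randrange(8))
            for _ in range(length)]


def to_ptx_body(program) -> str:
    """Translate the program into PTX text, one instruction per line."""
    return "\n".join(PTX_TEMPLATES[op].format(d=d, s=s) for op, d, s in program)


program = generate_program(block_seed=123456, length=15_000)
ptx_body = to_ptx_body(program)
```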

Without optimizations, the NVIDIA cards are losing to the CPU.

RandomX benchmarks:

https://bitcointalksearch.org/topic/randomx-benchmarks-httpsmonerobenchmarksinfo-5176747

GPU	Cryptonight-R	RandomX
AMD
Vega 64	2200 H/s	1225 H/s
RX 480/580	960-1000 H/s	400-410 H/s
RX 560 4GB (1400/2200 MHz)	495 H/s	260 H/s
NVIDIA/EVGA
RTX 2080 Ti (1915/13600 MHz)	960-1000 H/s	400-410 H/s
GTX 1080 Ti (2037/11800 MHz)	927 H/s	1122 H/s
GTX 1070 Ti (1900/7600 MHz)	625 H/s	769 H/s

For CPUs:

CPU	Cryptonight-R	RandomX
AMD 3900X (4.25GHZ ALL CORE, 3600MHZ RAM)	1335 H/s	13330 H/s
RYZEN 3700X	1018 H/s	6853 H/s
RYZEN 5 3600	803 H/s	6580 H/s
INTEL I9 9900K	630 H/s	2102 H/s
2X XEON E5 2670 V2	930 H/s	5815 H/s
INTEL I7 7700K	350 H/s	2100 H/s
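Using only the RandomX column of the tables above, a few lines make the "NVIDIA is losing to the CPU" point concrete. The RTX 2080 Ti row is entered at the upper end of its quoted range (410 H/s); all other numbers are copied from the tables.

```python
NVIDIA_RANDOMX_HS = {"RTX 2080 Ti": 410, "GTX 1080 Ti": 1122, "GTX 1070 Ti": 769}
CPU_RANDOMX_HS = {
    "AMD 3900X": 13330,
    "RYZEN 3700X": 6853,
    "RYZEN 5 3600": 6580,
    "INTEL I9 9900K": 2102,
    "2X XEON E5 2670 V2": 5815,
    "INTEL I7 7700K": 2100,
}

best_gpu = max(NVIDIA_RANDOMX_HS, key=NVIDIA_RANDOMX_HS.get)
best_cpu = max(CPU_RANDOMX_HS, key=CPU_RANDOMX_HS.get)
ratio = CPU_RANDOMX_HS[best_cpu] / NVIDIA_RANDOMX_HS[best_gpu]
# CPUs in the table that out-hash the fastest NVIDIA card listed.
faster_cpus = [cpu for cpu, hs in CPU_RANDOMX_HS.items()
               if hs > NVIDIA_RANDOMX_HS[best_gpu]]

print(f"{best_cpu} vs {best_gpu}: {ratio:.1f}x")
print(f"{len(faster_cpus)} of {len(CPU_RANDOMX_HS)} CPUs beat the best NVIDIA card")
```

Every CPU in the list beats the fastest NVIDIA card, from roughly 1.9x for the i7 7700K up to about 11.9x for the 3900X.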

pallas

legendary

Black Belt Developer

Quote from: sp_ on September 13, 2019, 02:00:57 AM

Quote from: joblo on September 12, 2019, 09:24:32 PM

By using PTX you're essentially using a proprietary language to prevent anything but an NVIDIA product
or an NVIDIA-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution.
The CPU miner would have to parse this language and create a proper native binary before execution (create the instructions in memory, flush the caches, then execute). CPU verification is important for the pool/wallet/exchanges.

Yeah, no PTX, that's what I was saying.
==> RandomX

sp_

legendary

Team Black developer

Quote from: pallas on September 13, 2019, 01:19:24 AM

what will happen when cards compatible with the language are no longer produced?

The point with PTX is that it's a unified language for all NVIDIA GPU architectures. The PTX is compiled to the native GPU language by the NVIDIA driver before execution. If NVIDIA decides to replace PTX with SPTX, you simply need your miner software to convert the random hashing function into SPTX.

sp_

legendary

Team Black developer

Quote from: joblo on September 12, 2019, 09:24:32 PM

By using PTX you're essentially using a proprietary language to prevent anything but an NVIDIA product
or an NVIDIA-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution.
The CPU miner would have to parse this language and create a proper native binary before execution (create the instructions in memory, flush the caches, then execute). CPU verification is important for the pool/wallet/exchanges.
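On the verification side, the simplest (if slow) way for a pool or wallet to check a result is to interpret the same pseudo-assembly directly instead of generating a native binary as described above. The sketch assumes a hypothetical three-op instruction set on eight 64-bit registers with wrap-around arithmetic; it is not RandomX's actual instruction set or VM.

```python
MASK64 = (1 << 64) - 1


def run_reference(program, registers):
    """Interpret (op, dst, src) triples on eight 64-bit registers.

    Arithmetic wraps modulo 2**64, matching what 64-bit add, low-64-bit
    multiply and xor produce on the GPU/CPU, so results compare bit-for-bit.
    """
    regs = [r & MASK64 for r in registers]
    if len(regs) != 8:
        raise ValueError("expected 8 registers")
    for op, dst, src in program:
        a, b = regs[dst], regs[src]
        if op == "IADD":
            regs[dst] = (a + b) & MASK64
        elif op == "IMUL":
            regs[dst] = (a * b) & MASK64
        elif op == "IXOR":
            regs[dst] = a ^ b
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs
```

An interpreter is far slower than generated native code, but for verification it only has to run once per submitted share.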

pallas

legendary

Black Belt Developer

Quote from: sp_ on September 12, 2019, 05:06:44 PM

Quote from: pallas on September 12, 2019, 01:56:05 AM

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times can you access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX (CUDA will compile and cache the code before execution, and it will take a few milliseconds). After the compilation has been done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow or impossible to generate VHDL out of random instructions and run it without timing bugs.

what will happen when cards compatible with the language are no longer produced?
maybe you are planning a pump and dump coin so you don't care :-D

bensam1231

legendary

Quote from: pallas on September 12, 2019, 01:56:05 AM

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

If you're talking about FKs, they aren't even being shipped and they don't even talk about what algos they'll support. Either way, as I mentioned, FPGAs with memory (specifically fast memory) are in the extreme minority. They aren't everywhere.

Anti-FPGA effort is not a silver bullet. FPGAs are a lot more expensive to produce, so if you make something that is extremely expensive to implement on them, there has to be a huge reward on the other side or it's not worth it. Looking at a 2-3 year ROI on a lot of FPGAs, even if they produce a lot of hashrate, makes them very unpalatable. There is an opportunity cost associated with everything, and a lot of people don't consider that. FPGAs also become obsolete, and obsolescence has to be considered. So even if you have an FPGA that will ROI in 3 years, there can, and more than likely will, be newer ones out that make it obsolete.
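The ROI argument can be made concrete with a simple payback calculation that folds in opportunity cost (modelled here as a monthly return forgone on the capital) and a useful life after which the hardware is assumed obsolete. The function names and all numbers in the example are hypothetical.

```python
import math


def payback_months(hardware_cost, monthly_revenue, monthly_power_cost,
                   monthly_opportunity_rate=0.0):
    """Months to recover the hardware cost; math.inf if it never pays back.

    Opportunity cost is modelled as a monthly return forgone on the capital.
    """
    net = (monthly_revenue - monthly_power_cost
           - hardware_cost * monthly_opportunity_rate)
    return math.inf if net <= 0 else hardware_cost / net


def pays_off(useful_life_months, **kwargs):
    """True if the hardware recovers its cost before it becomes obsolete."""
    return payback_months(**kwargs) <= useful_life_months


# Hypothetical FPGA card: $3600, $150/month revenue, $50/month power.
print(payback_months(3600, 150, 50))        # 36.0
print(payback_months(3600, 150, 50, 0.01))  # 56.25 with 1%/month opportunity cost
print(pays_off(24, hardware_cost=3600, monthly_revenue=150,
               monthly_power_cost=50))      # False
```

A 3-year payback already fails a 2-year useful life, and adding the opportunity cost pushes it out further still.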

Chinese don't respect licenses or IP rights unless it's some megacorp and you have millions to throw at it with lawyers.

joblo

legendary

Quote from: sp_ on September 12, 2019, 05:06:44 PM

So the next question is how many times can you access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX (CUDA will compile and cache the code before execution, and it will take a few milliseconds). After the compilation has been done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow or impossible to generate VHDL out of random instructions and run it without timing bugs.

By using PTX you're essentially using a proprietary language to prevent anything but an NVIDIA product
or an NVIDIA-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

sp_

legendary

Team Black developer

Quote from: pallas on September 12, 2019, 01:56:05 AM

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times can you access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX (CUDA will compile and cache the code before execution, and it will take a few milliseconds). After the compilation has been done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow or impossible to generate VHDL out of random instructions and run it without timing bugs.
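The time-budget argument reduces to one ratio: the share of each block interval left for hashing after the per-block preparation step (JIT compilation on the GPU, regenerating and loading a bitstream on an FPGA). The 5 ms compile time and 10-minute FPGA rebuild time below are hypothetical placeholders; the post only says "a few milliseconds" and that the FPGA cannot do it within 15 seconds.

```python
def hashing_fraction(block_time_s: float, prep_time_s: float) -> float:
    """Fraction of a block interval available for hashing after per-block prep."""
    return max(0.0, block_time_s - prep_time_s) / block_time_s


BLOCK_TIME_S = 15.0
gpu_jit_s = 0.005        # hypothetical "few milliseconds" PTX JIT compile
fpga_rebuild_s = 600.0   # hypothetical synthesis + flash time

print(f"GPU:  {hashing_fraction(BLOCK_TIME_S, gpu_jit_s):.2%} of each block")
print(f"FPGA: {hashing_fraction(BLOCK_TIME_S, fpga_rebuild_s):.2%} of each block")
```

Any rebuild longer than the block time leaves nothing, which is the post's reason for expecting an FPGA to fall back to emulating the instruction stream instead.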
