
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 17. (Read 2347632 times)

jr. member
Activity: 189
Merit: 2
X25X is the one and only GPU-only algo, not just for all Nvidia GPUs but also for cards that have 6 GB or less, because it doesn't require memory-hard operations.
All the Nvidia cards from the 1050 Ti 2 GB up to the 1080 Ti can mine it without fear of ASICs and FPGAs being around.
T-Rex already has the algo optimised to the maximum, which is why there are no private miners and no new faster miners for X25X.
Today is a good day for information, lol.
jr. member
Activity: 189
Merit: 2
If you don't produce a source it usually means you're starting the rumour.

iBeLink in California is the ASIC provider. Is the source good enough? ;)
legendary
Activity: 1470
Merit: 1114
If you don't produce a source it usually means you're starting the rumour.
jr. member
Activity: 189
Merit: 2
Maybe this place has the info. There are rumors that x16rv2 already has ASICs.
Any feedback?
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
With a compiled kernel, the GPU can execute 15000 RandomX instructions in 15 cycles per hash at 2000 MHz.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
True, but it's also true that you can fill the FPGA with custom made cores each executing RandomX instructions. FPGAs are plenty flexible, much more than GPUs, it only takes much more time to optimise.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
The FPGA doesn't make N multiplications per cycle. It does N hashes per cycle, with N integer > 0 or, in the case of complex algorithms, 1/N.

Yes, but in RandomX the FPGA needs to do a memory read per cycle to determine the instruction to be executed, so the N hashes per cycle doesn't apply. The new limit is then N instructions per cycle, where N is limited by the number of memory accesses the chip can do per cycle. In older FPGA designs it was normal to have ASIC multipliers you could use to speed up multiplications (e.g. Altera Cyclone IV). The multiplication could also be done in code.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
The FPGA doesn't make N multiplications per cycle. It does N hashes per cycle, with N integer > 0 or, in the case of complex algorithms, 1/N.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
The language would have to be complex enough (in the CISC sense) that the FPGA can't decode it with a simple table lookup. That's a hell of a lot of work.

The FPGA has limits on memory accesses and multipliers. Let's say the FPGA can do 32 multiplications and 32 memory accesses per cycle; then you might be able to run 32 instructions per cycle at 500 MHz.

RandomX on the GPU doesn't need any memory access to decode instructions because the code is compiled, and you can run with 1024 threads at 2000 MHz.

So the GPU can do 1024 instructions per cycle at 2000 MHz.
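
As a rough check of the figures in this post, the sketch below multiplies instructions per cycle by clock rate for both devices. It assumes the idealised model used in the post (one instruction per multiplier/memory port or per GPU thread per cycle, no stalls). The constant and function names are made up for illustration, and the 15000-instruction program size is taken from elsewhere in the thread.

```python
import math


def instructions_per_second(per_cycle: int, clock_hz: float) -> float:
    """Peak throughput: instructions issued per cycle times clock rate."""
    return per_cycle * clock_hz


# Figures quoted in the post; they are the poster's estimates, not measurements.
FPGA_MULTIPLIERS = 32
FPGA_MEM_ACCESSES = 32
FPGA_CLOCK_HZ = 500e6
GPU_THREADS = 1024
GPU_CLOCK_HZ = 2000e6
PROGRAM_LENGTH = 15000  # instructions per hash, as mentioned elsewhere in the thread

# The FPGA is limited by whichever resource runs out first.
fpga_per_cycle = min(FPGA_MULTIPLIERS, FPGA_MEM_ACCESSES)
fpga_rate = instructions_per_second(fpga_per_cycle, FPGA_CLOCK_HZ)
gpu_rate = instructions_per_second(GPU_THREADS, GPU_CLOCK_HZ)
gpu_advantage = gpu_rate / fpga_rate

# Cycles needed to retire one program if every thread issues one instruction per cycle.
cycles_per_hash = math.ceil(PROGRAM_LENGTH / GPU_THREADS)

print(f"FPGA: {fpga_rate:.3e} instr/s")
print(f"GPU:  {gpu_rate:.3e} instr/s")
print(f"GPU advantage: {gpu_advantage:.0f}x, cycles per hash: {cycles_per_hash}")
```

Under those assumptions the GPU comes out 128 times ahead, and 15000 instructions spread over 1024 threads take about 15 cycles. Real hardware will not reach these peak numbers.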
legendary
Activity: 1470
Merit: 1114
Doesn't need to be PTX. If you run on NVIDIA hardware you convert the random stream of instructions to PTX. RandomX could be very profitable on NVIDIA hardware with a proper implementation...

Precisely. You can build an Nvidia-only proof of concept, but a real product will need
its own pseudo-language that can be compiled to PTX/CUDA, OpenCL, and native x86 instructions,
producing identical functionality. The language would have to be complex enough (in the CISC sense)
that the FPGA can't decode it with a simple table lookup. That's a hell of a lot of work.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
The point with PTX is that it's a unified language for all NVIDIA GPU architectures.
The point is that it's only Nvidia GPU architectures. No ASIC, no FPGA, no Radeon, no CPU.

Doesn't need to be PTX. If you run on NVIDIA hardware you convert the random stream of instructions to PTX. RandomX could be very profitable on NVIDIA hardware with a proper implementation...
legendary
Activity: 1470
Merit: 1114
The point with PTX is that it's a unified language for all NVIDIA GPU architectures.

The point is that it's only Nvidia GPU architectures. No ASIC, no FPGA, no Radeon, no CPU.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Yeah, no PTX, that's what I was saying.
==> RandomX

So to make a fast RandomX miner on NVIDIA you can convert the RandomX code to PTX before execution (creating a new PTX kernel for each block).

Without optimizations the NVIDIA cards are losing to the CPU.

randomx benchmarks:

https://bitcointalksearch.org/topic/randomx-benchmarks-httpsmonerobenchmarksinfo-5176747

GPU                             Cryptonight-R   RandomX
AMD
Vega 64                         2200 H/s        1225 H/s
RX 480/580                      960-1000 H/s    400-410 H/s
RX 560 4GB (1400/2200 MHz)      495 H/s         260 H/s
NVIDIA/EVGA
RTX 2080 Ti (1915/13600 MHz)    960-1000 H/s    400-410 H/s
GTX 1080 Ti (2037/11800 MHz)    927 H/s         1122 H/s
GTX 1070 Ti (1900/7600 MHz)     625 H/s         769 H/s

For CPUs:
CPU                                            Cryptonight-R   RandomX
AMD 3900X (4.25 GHz all core, 3600 MHz RAM)    1335 H/s        13330 H/s
Ryzen 3700X                                    1018 H/s        6853 H/s
Ryzen 5 3600                                   803 H/s         6580 H/s
Intel i9 9900K                                 630 H/s         2102 H/s
2x Xeon E5 2670 v2                             930 H/s         5815 H/s
Intel i7 7700K                                 350 H/s         2100 H/s
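
To make the "losing to the CPU" point concrete, here is a small sketch that compares each device's RandomX rate with its Cryptonight-R rate. The numbers are copied from a few rows of the benchmark lists above; the dictionary and function names are only illustrative.

```python
# (Cryptonight-R H/s, RandomX H/s) for a few rows of the benchmark lists above.
BENCHMARKS = {
    "Vega 64": (2200, 1225),
    "GTX 1080 Ti": (927, 1122),
    "GTX 1070 Ti": (625, 769),
    "AMD 3900X": (1335, 13330),
    "Ryzen 3700X": (1018, 6853),
    "Intel i7 7700K": (350, 2100),
}


def randomx_gain(device: str) -> float:
    """RandomX hashrate divided by Cryptonight-R hashrate for one device."""
    cryptonight_r, randomx = BENCHMARKS[device]
    return randomx / cryptonight_r


# Devices ordered by raw RandomX hashrate, fastest first.
by_randomx = sorted(BENCHMARKS, key=lambda name: BENCHMARKS[name][1], reverse=True)

for name in by_randomx:
    print(f"{name:15s} RandomX {BENCHMARKS[name][1]:6d} H/s  gain x{randomx_gain(name):.2f}")
```

With these rows, every CPU gains several times over its Cryptonight-R rate, while the listed GPUs gain little or lose hashrate.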

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution.
The CPU miner would have to parse this language and create a proper native binary before execution (create instructions in memory, flush the caches, then execute). CPU verification is important for the pool/wallet/exchanges.

Yeah, no PTX, that's what I was saying.
==> RandomX
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
what will happen when cards compatible with the language are no longer produced?

The point with PTX is that it's a unified language for all NVIDIA GPU architectures. The PTX is compiled to the native GPU language by the NVIDIA driver before execution. If NVIDIA decides to replace PTX with SPTX, you simply need your miner software to convert the random hashing function into SPTX.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution.
The CPU miner would have to parse this language and create a proper native binary before execution (create instructions in memory, flush the caches, then execute). CPU verification is important for the pool/wallet/exchanges.
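
A minimal sketch of that idea, under loud assumptions: the three-opcode pseudo-assembly, the register count, and all function names below are hypothetical and not part of RandomX or any existing miner. It derives a per-block instruction stream from a block hash, emits the matching PTX instruction lines (only the instruction body, not a complete PTX module), and runs the same stream through a plain interpreter as a stand-in for the CPU verification path. A real CPU miner would generate native code instead of interpreting, and a real consensus rule would need a precisely specified generator rather than Python's `random`.

```python
import hashlib
import random

# Hypothetical pseudo-assembly opcodes mapped to real PTX integer instructions.
OPCODE_TO_PTX = {
    "ADD": "add.u32",
    "MUL": "mul.lo.u32",
    "XOR": "xor.b32",
}
NUM_REGISTERS = 8
MASK32 = 0xFFFFFFFF


def generate_program(block_hash: bytes, length: int):
    """Derive a deterministic instruction stream (op, dst, src_a, src_b) from a block hash."""
    seed = int.from_bytes(hashlib.sha256(block_hash).digest(), "big")
    rng = random.Random(seed)
    opcodes = sorted(OPCODE_TO_PTX)
    return [
        (
            rng.choice(opcodes),
            rng.randrange(NUM_REGISTERS),
            rng.randrange(NUM_REGISTERS),
            rng.randrange(NUM_REGISTERS),
        )
        for _ in range(length)
    ]


def to_ptx(program):
    """Translate each pseudo-instruction into one PTX instruction line."""
    return [f"{OPCODE_TO_PTX[op]} %r{dst}, %r{a}, %r{b};" for op, dst, a, b in program]


def run_on_cpu(program, registers):
    """Reference interpreter with 32-bit wraparound, for checking results on a CPU."""
    regs = list(registers)
    for op, dst, a, b in program:
        if op == "ADD":
            value = regs[a] + regs[b]
        elif op == "MUL":
            value = regs[a] * regs[b]
        else:  # XOR
            value = regs[a] ^ regs[b]
        regs[dst] = value & MASK32
    return regs


program = generate_program(b"example block hash", 6)
for line in to_ptx(program):
    print(line)
print(run_on_cpu(program, range(1, NUM_REGISTERS + 1)))
```

Because the stream depends only on the block hash, every miner and verifier generates the same program for the same block.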
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times can you access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, which takes a few milliseconds.) After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.

what will happen when cards compatible with the language are no longer produced?
maybe you are planning a pump and dump coin so you don't care :-D
legendary
Activity: 1764
Merit: 1024
There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

If you're talking about FKs they aren't even being shipped and they don't even talk about what algos they'll support. Either way, as I mentioned FPGAs with memory (specifically fast memory) are in the extreme minority. They aren't everywhere.

Anti-FPGA effort is not a silver bullet. FPGAs are a lot more expensive to produce, so if you make something that makes them extremely expensive to build, there has to be a huge reward on the other side or it's not worth it. Looking at a 2-3 year ROI on a lot of FPGAs, even if they produce a lot of hashrate, makes them very unpalatable. There is an opportunity cost associated with everything, and a lot of people don't consider that. FPGAs also become obsolete, and obsolescence has to be considered. So even if you have an FPGA that will ROI in 3 years, there can and more than likely will be newer ones out that will obsolete it.

Chinese don't respect licenses or IP rights unless it's some megacorp and you have millions to throw at it with lawyers.
legendary
Activity: 1470
Merit: 1114
So the next question is how many times can you access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, which takes a few milliseconds.) After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times can you access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, which takes a few milliseconds.) After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.
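
The time budget in this proposal can be sketched as follows. The 15-second block time comes from the post; the 5 ms compile time is only an assumed stand-in for "a few milliseconds", and the function names are illustrative. Any measured preparation time (GPU compile, or an FPGA generate/compile/flash cycle) can be passed in the same way.

```python
def mining_window(block_time_s: float, prepare_time_s: float) -> float:
    """Seconds left for hashing after the per-block program is prepared (never negative)."""
    return max(0.0, block_time_s - prepare_time_s)


def usable_fraction(block_time_s: float, prepare_time_s: float) -> float:
    """Share of the block time that can actually be spent hashing."""
    return mining_window(block_time_s, prepare_time_s) / block_time_s


BLOCK_TIME_S = 15.0    # block time from the proposal
GPU_COMPILE_S = 0.005  # assumption: "a few milliseconds" taken as 5 ms

print(f"GPU window: {mining_window(BLOCK_TIME_S, GPU_COMPILE_S):.3f} s")
print(f"GPU usable fraction: {usable_fraction(BLOCK_TIME_S, GPU_COMPILE_S):.4f}")

# Any toolchain that needs at least one full block time leaves no window at all.
print(f"Window with 15 s of preparation: {mining_window(BLOCK_TIME_S, 15.0):.1f} s")
```

The design point is that the preparation step must be a small fraction of the block time; once it reaches the block time, the device never gets to hash the current program.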