
Topic: [XPM] [ANN] Primecoin Release - First Scientific Computing Cryptocurrency - page 34. (Read 688812 times)

hero member
Activity: 560
Merit: 500


I am shell shocked about two things that I did not know just a couple of days ago:
1) Primecoin mining is currently more profitable than Bitcoin mining for the same amount of expenses (GTX 580 vs ASIC pricing).


Does that include electricity costs?
sr. member
Activity: 278
Merit: 250
My objective is not to code an Nvidia GPU Primecoin miner, that part is functional but not completely optimized. The objective is to find an algorithmic enhancement for reducing the overall run-time complexity. If we keep the same complexity but execute the algorithm faster on the same hardware, I call it an optimization. But if we reduce the overall run-time complexity of the algorithm, I call it a breakthrough.

Now with that said, it seems to me that people are more interested in the Nvidia GPU Primecoin miner. But I say, be careful what you ask for: just follow the programming guideline I posted earlier and you will have a working miner that blows away the current high performance CPU miner implementation. Additionally, I know nothing about AMD GPU programming and have never owned an AMD card, therefore I do not know how powerful they are for multiprecision arithmetic.

There are several enhancements which account for the drastic increase in performance over the current CPU implementation:
1) Montgomery Reduction is used.
2) The size of the multiprecision arithmetic is fixed.
3) An optimized sieve is running on the GPU.
4) An optimized primorial search is running on the GPU (double SHA-256).
5) An exploitation of the difficulty (Sunny King knows what I am referring to here, just ask Sunny).

If you apply the same enhancements to the CPU code, then the gap will close to about 7x in favor of the GPU (Nvidia GTX 580 vs AMD Phenom II X6 1100T).

Please note that I am not accepting any donations for the research; the XPMs from the GPUs will fully fund the research and the results will be made publicly available. Someone from this forum made an attempt to program an Nvidia card, but after looking at the source code (https://github.com/primedigger/primecoin/blob/master/src/cuda/mainkernel.cu), I am surprised that it even made a difference at all.
sr. member
Activity: 278
Merit: 250
Quote
I suspect you are in fact trying to subtly manipulate the markets by stirring up the "IMPENDING GPU MINER" hysteria (again) that serves to create silliness in the altcoin exchange trollboxes and market instability in general.

I am putting this down to another piss-poor attempt at market manipulation until some code is released and verified as working by other miners.

Haters Gonna Hate

Sorry, the source code is proprietary (IP) closed source. However, I have outlined the exact steps that you need to take to verify the results.

Please consult the following documentation for further assistance:

The Billion-Mulmod-Per-Second PC
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.4576&rep=rep1&type=pdf

Usable assembly language for GPUs
https://eprint.iacr.org/2012/137.pdf
hero member
Activity: 546
Merit: 500
Quote
Unfortunately, carrying out the research to analyze collected data will require funding. I have been running "Primecoin version v0.1.2.0xpm-hp11-unk-beta" for 4 days now on a 32/64 core/thread server and I have only mined a block (primemeter  43807850 prime/h 663367364 test/h  180 7-chains/h 8.075400 chain/d). I just cannot see how anyone, other than people who have access to botnets, can be profitable at mining Primecoin.

8.0754 * 0.8 * 0.04 = 0.26 blocks per day (no variance) = those results are completely as expected = deal with it!

Edit: Apparently this old method of estimating blocks/day is outdated: ref: https://bitcointalksearch.org/topic/m.3487226

My GTX 580 disagrees with that mathematical formula and it did not even enroll in any engineering classes at MIT. You need to actually count the number of 9-chains at a fixed precision for a given amount of time before you can estimate your probability of finding a chain that meets the requirement.

My comment above has nothing to do with the discussion about GPUs. You are confusing two different discussions going on in this thread simultaneously. The formula I provided was determined by the work of mikaelh, who is the author of the HP series of miners. I realize you are fairly new and may not have read all 170+ pages of this thread; you would have come across it many times if you had. The formula provides an accurate representation of the mining reward when using a CPU and assuming zero variance, i.e. an average over a long period of time.
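
As a rough check of the estimate quoted above, the arithmetic can be written out. This is only a sketch: the 0.8 and 0.04 factors are copied from the post as given (their derivation is not stated here), the function name is made up for illustration, and the edit note above says this method has since been called outdated.

```python
def estimate_blocks_per_day(chains_per_day, factor_a=0.8, factor_b=0.04):
    """Zero-variance estimate: primemeter chain/d scaled by the two factors from the post."""
    return chains_per_day * factor_a * factor_b


# chain/d value taken from the quoted primemeter line
blocks_per_day = estimate_blocks_per_day(8.0754)
days_per_block = 1.0 / blocks_per_day

print(f"{blocks_per_day:.2f} blocks/day, about {days_per_block:.1f} days per block")
```

On this estimate, one block in roughly four days is what the formula predicts on average, which is the point of the "completely as expected" remark.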



Quote
(30 blocks/day) x (60 days) x (10 XPMs) x (6 GTX 580 Hydro's) x (1.17 speed-up over stock ) = 126k XPMs

I suspect you are in fact trying to subtly manipulate the markets by stirring up the "IMPENDING GPU MINER" hysteria (again) that serves to create silliness in the altcoin exchange trollboxes and market instability in general.

I am putting this down to another piss-poor attempt at market manipulation until some code is released and verified as working by other miners.
legendary
Activity: 2940
Merit: 1090
(30 blocks/day) x (60 days) x (10 XPMs) x (6 GTX 580 Hydro's) x (1.17 speed-up over stock ) = 126k XPMs

Single GPU will be finding 30 blocks/day!!!!!!!!!!???

There must be something wrong with your assumption. Previously you used numbers like: "3 blocks/day", "7x faster than CPU". How did you get from that to 30 blocks/day now?

He said he got five blocks in four hours using one GPU. If that can be done consistently it would be 30 blocks per GPU per day.

-MarkM-
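
The extrapolation discussed above can be multiplied out step by step. This is a sketch with made-up function names; it assumes, as the quoted formula implicitly does, that the observed rate, the 10 XPM reward, and the difficulty all hold steady for the full 60 days. A 4-hour sample is a very small basis for that.

```python
def blocks_per_day(blocks_found, hours_observed):
    """Scale an observed block count to a 24-hour rate."""
    return blocks_found / hours_observed * 24.0


def projected_xpm(rate_per_day, days, xpm_per_block, gpus, speedup):
    """Multiply out the terms of the quoted projection."""
    return rate_per_day * days * xpm_per_block * gpus * speedup


rate = blocks_per_day(5, 4)  # "five blocks in four hours" on one GPU
total = projected_xpm(rate, days=60, xpm_per_block=10, gpus=6, speedup=1.17)

print(rate, round(total))
```

The product comes to about 126,360 XPM, which matches the "126k XPMs" figure in the quote.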
sr. member
Activity: 321
Merit: 250
I have five (5) GTX 580's Hydro Copper collecting dust and one (1) in my development PC. Sometime within the next couple of weeks, I will use all six (6) for Primecoin mining and let them run for about two months.
(30 blocks/day) x (60 days) x (10 XPMs) x (6 GTX 580 Hydro's) x (1.17 speed-up over stock ) = 126k XPMs

Single GPU will be finding 30 blocks/day!!!!!!!!!!???

There must be something wrong with your assumption. Previously you used numbers like: "3 blocks/day", "7x faster than CPU". How did you get from that to 30 blocks/day now?

You forgot to quote the base for the calculation. I fixed that for you (and even highlighted the important parts) ;)
sr. member
Activity: 301
Merit: 250
still can't change my profile pic
(30 blocks/day) x (60 days) x (10 XPMs) x (6 GTX 580 Hydro's) x (1.17 speed-up over stock ) = 126k XPMs

Single GPU will be finding 30 blocks/day!!!!!!!!!!???

There must be something wrong with your assumption. Previously you used numbers like: "3 blocks/day", "7x faster than CPU". How did you get from that to 30 blocks/day now?
hero member
Activity: 516
Merit: 500
CAT.EX Exchange
I would estimate somewhere between 15-20 hours of work, but finding the free time for me at this time of the year is more difficult than doing the actual work. Early next year I will have time and the botnets can enjoy Primecoin mining for a couple of months longer before the knockout punch comes to them.

Despite this https://bitcointalksearch.org/topic/xpm-working-on-a-gpu-miner-for-primecoin-new-thread-273637 many may want to donate/invest again if you lay your case down clearly -- it's been good so far.
sr. member
Activity: 278
Merit: 250
I'm sure that if you invent something that really works you could be a millionaire. Inventing a GPU miner could cause the reward to drop very quickly and make the price higher. This is the race, so move on. You can find some funds; just try to find an investor in your area.

If you really don't have the time/energy to do this, maybe you can give some tips to other devs on a public forum.

At this stage, funding from the general public is no longer necessary for the research; it is optional. I have five (5) GTX 580's Hydro Copper collecting dust and one (1) in my development PC. Sometime within the next couple of weeks, I will use all six (6) for Primecoin mining and let them run for about two months.

(30 blocks/day) x (60 days) x (10 XPMs) x (6 GTX 580 Hydro's) x (1.17 speed-up over stock ) = 126k XPMs
Early next year I will offer XPM/BTC to those mathematicians who are interested in helping out with the research; and knowledge gained from the research will be made publicly available.

I am shell shocked about two things that I did not know just a couple of days ago:
1) Primecoin mining is currently more profitable than Bitcoin mining for the same amount of expenses (GTX 580 vs ASIC pricing).
2) An optimized sieve of Eratosthenes on the GPU makes enough of a difference to increase your probability of finding a valid block.

Here is a link to the CUDA implementation of the sieve that I based mine on. My code is a little bit more optimized, but it is a good start for those who are interested:
https://sites.google.com/site/bbuhrow/home/cuda-sieve-of-eratosthenes
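
For readers who want the basic idea before reading the CUDA code, here is a plain textbook sieve of Eratosthenes in Python. It is only a sketch of the underlying technique: it is not the linked CUDA implementation, not my optimized version, and it does not include any of the Primecoin-specific candidate sieving.

```python
def sieve_of_eratosthenes(limit):
    """Return a list of all primes <= limit."""
    if limit < 2:
        return []
    # One flag byte per integer 0..limit; 1 means "still possibly prime".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p (smaller ones are already done).
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(count)
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(sieve_of_eratosthenes(30))
```

A GPU version splits the range into segments so that many threads can cross off multiples in parallel, which is where the memory access rules matter.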

Here is a good guide for implementing your own modular arithmetic on the GTX 580:
http://infoscience.epfl.ch/record/180450/files/jb_lowlatency.pdf

I am not an expert Nvidia GPU programmer by any means, but I am an FPGA and x64 assembly guy. So I am left with a couple of theories:
1) Nobody who knows about Primecoin mining took the time to implement an optimized modular multiplication and prime sieving on the GPU (very hard to believe).
2) Somebody out there already has an efficient implementation for the GPU and is keeping it to themselves (very easy to believe).
legendary
Activity: 1610
Merit: 1000
Well hello there!
Hey fellas. Just downloaded 1.2-beta win32 client from sourceforge and trying to get it up and running but seemingly not finding any peers. Anybody else having this problem? Is the preferred scenario setting up a linux box and compiling from source?
Same problem here. The advice from page 172 helped me; I did the following steps, and after a few minutes was synchronizing blocks with the network:

  • closed primecoin-qt
  • deleted peers.dat from %AppData%\Primecoin
  • created primecoin.conf in that folder, and put this line in it: seednode=primeseed.muuttuja.org
  • loaded primecoin-qt, and after a few moments the Help->Debug Window showed an increasing block count and date

HTH


It appears as though it just took several hours to find a peer. Looks like I've got it now. Next step, try to make a coin or two :)

Thanks for the reply
newbie
Activity: 15
Merit: 0
Hey fellas. Just downloaded 1.2-beta win32 client from sourceforge and trying to get it up and running but seemingly not finding any peers. Anybody else having this problem? Is the preferred scenario setting up a linux box and compiling from source?
Same problem here. The advice from page 172 helped me; I did the following steps, and after a few minutes was synchronizing blocks with the network:

  • closed primecoin-qt
  • deleted peers.dat from %AppData%\Primecoin
  • created primecoin.conf in that folder, and put this line in it: seednode=primeseed.muuttuja.org
  • loaded primecoin-qt, and after a few moments the Help->Debug Window showed an increasing block count and date

HTH
legendary
Activity: 1610
Merit: 1000
Well hello there!
Hey fellas. Just downloaded 1.2-beta win32 client from sourceforge and trying to get it up and running but seemingly not finding any peers. Anybody else having this problem? Is the preferred scenario setting up a linux box and compiling from source?
member
Activity: 93
Merit: 10
I'm sure that if you invent something that really works you could be a millionaire. Inventing a GPU miner could cause the reward to drop very quickly and make the price higher. This is the race, so move on. You can find some funds; just try to find an investor in your area.

If you really don't have the time/energy to do this, maybe you can give some tips to other devs on a public forum.
sr. member
Activity: 278
Merit: 250
Supercomputing: can you tell us if a GPU could mine 10x faster than a CPU in the near future? Is it possible? Or could other FPGA/ASIC hardware be used to mine Primecoin?

Montgomery multiplication - Coarsely integrated operand scanning (CIOS) @ 256-bit
An Nvidia GTX 580 (reference design) is about 7x faster than an AMD Phenom II X6 1100T using all six cores

Montgomery multiplication for GPU - must use hand optimized PTX (unrolled)
Montgomery multiplication for CPU - must use hand optimized x64 assembly (unrolled)

I would put the GPU at about 7x that of the CPU since modular multiplication is the bottleneck and not sieving.


FPGAs will only be useful for primorial searching; they are not cost effective for modular multiplication.

Are you saying that you (or someone else) can create a GPU miner which is 7x faster than a CPU (for the same price)? How long could it take?

Yes, that is correct when using the GPU and CPU below for the baseline.

For most desktop CPUs, it will be more than 7x faster. Also, the AMD Bulldozer and the Nvidia GTX 6xx series took a step backwards when it comes to integer arithmetic throughput (for multiplication). Intel's Sandy Bridge, Ivy Bridge, and Haswell processors are also very good. However, AMD's K10 series is still king for the CPUs, and the GTX Titan is still king for the GPUs.

For me, I am still at the proof of concept stage and it is looking very good. I found 5 blocks (9-chain) in the last 4 hours while off-loading the Fermat tests to the GPU (single GTX 580). There is still a lot of work left to do before reaching the point where a single GTX 580 can aid in finding 3 blocks (10-chains) within 24 hours. The primorial search needs to be off-loaded to a second GPU; it is just as important as sieving for finding 10-chains faster at 320-bit. I would estimate somewhere between 15-20 hours of work, but finding the free time for me at this time of the year is more difficult than doing the actual work. Early next year I will have time and the botnets can enjoy Primecoin mining for a couple of months longer before the knockout punch comes to them.
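
To make the "Fermat test" and "9-chain" terms concrete, here is a small Python sketch. It is illustrative only and is not the GPU code: it uses a base-2 Fermat test and counts the length of a Cunningham chain of the first kind (n, 2n+1, 4n+3, ...). Primecoin's actual validity rules cover more chain types and are not reproduced here, and the function names are made up.

```python
def fermat_probable_prime(n, base=2):
    """Base-2 Fermat test: True if n is 2, or odd with base^(n-1) == 1 (mod n)."""
    if n < 3 or n % 2 == 0:
        return n == 2
    return pow(base, n - 1, n) == 1


def chain_length_first_kind(start):
    """Count consecutive Fermat-probable primes in start, 2*start+1, 4*start+3, ..."""
    length = 0
    n = start
    while fermat_probable_prime(n):
        length += 1
        n = 2 * n + 1
    return length


# 89, 179, 359, 719, 1439, 2879 are all prime; the next term 5759 = 13 * 443 is not.
print(chain_length_first_kind(89))
```

A Fermat test can be fooled by pseudoprimes (341 = 11 * 31 passes base 2), so it is a probable-prime test, not a proof. The expensive part is the modular exponentiation, which is why the modular multiplication underneath it is the piece worth off-loading.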

Baseline GPU: Nvidia GTX 580 (utilizing all 16 processors, 512 ALUs)
Baseline CPU: AMD Phenom II X6 1100T (utilizing all 6 cores, SIMD)


Primecoin Miner off-load GPU (must follow these rules for the multiprecision arithmetic implementation):

  • Minimize thread divergence.
  • Global memory access must be coalesced.
  • Use shared memory for data exchange.
  • Precision must be fixed at compile time: e.g. 320-bit.
  • Use Montgomery Reduction (CIOS).
  • Must use unrolled PTX coding for Montgomery Reduction (madc, mad.cc, addc, add.cc, etc).
  • Compile and benchmark with different values for maxrregcount.
  • Compile and benchmark with different grid, block, and thread organizations.
  • Compile to .cubin format and profile the code.
  • Optimize the code using the profiling data.

If the above guideline is not followed, an x64 CPU will most likely outperform the GPU.
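
For anyone unfamiliar with CIOS, the following Python sketch shows the limb-level structure of Montgomery multiplication at a fixed 320-bit width (ten 32-bit limbs). It is a reference model under stated assumptions, not the PTX code: the names (`mont_mul_cios`, `to_mont`, `from_mont`) and the 32-bit limb size are chosen for illustration, the modulus must be odd and below 2^320, and Python 3.8+ is needed for `pow(n, -1, m)`. On the GPU, each inner loop would be unrolled into mad.cc/madc carry chains.

```python
W = 32                      # limb width in bits (assumed)
LIMBS = 10                  # 10 x 32 = 320 bits, fixed up front
MASK = (1 << W) - 1
R = 1 << (W * LIMBS)        # Montgomery radix


def to_limbs(x):
    """Split x into LIMBS little-endian W-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(LIMBS)]


def from_limbs(limbs):
    """Recombine little-endian limbs into an integer."""
    return sum(v << (W * i) for i, v in enumerate(limbs))


def mont_mul_cios(a, b, n):
    """Return a*b*R^-1 mod n for odd n < R and a, b < n (CIOS method)."""
    al, bl, nl = to_limbs(a), to_limbs(b), to_limbs(n)
    n0_inv = (-pow(n, -1, 1 << W)) & MASK      # -n^-1 mod 2^W
    t = [0] * (LIMBS + 2)
    for i in range(LIMBS):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(LIMBS):
            cs = t[j] + al[j] * bl[i] + carry
            t[j], carry = cs & MASK, cs >> W
        cs = t[LIMBS] + carry
        t[LIMBS], t[LIMBS + 1] = cs & MASK, cs >> W
        # Reduction step: add m*n so the low limb becomes zero, then shift down one limb
        m = (t[0] * n0_inv) & MASK
        carry = (t[0] + m * nl[0]) >> W
        for j in range(1, LIMBS):
            cs = t[j] + m * nl[j] + carry
            t[j - 1], carry = cs & MASK, cs >> W
        cs = t[LIMBS] + carry
        t[LIMBS - 1], carry = cs & MASK, cs >> W
        t[LIMBS] = t[LIMBS + 1] + carry
    result = from_limbs(t[:LIMBS + 1])
    return result - n if result >= n else result   # final conditional subtraction


def to_mont(x, n):
    """Convert x to Montgomery form x*R mod n."""
    return (x * R) % n


def from_mont(x, n):
    """Convert back from Montgomery form by multiplying with 1."""
    return mont_mul_cios(x, 1, n)


n = (1 << 319) + 12345      # arbitrary odd 320-bit modulus for the demo
a, b = pow(3, 150, n), pow(7, 100, n)
product = from_mont(mont_mul_cios(to_mont(a, n), to_mont(b, n), n), n)
print(product == (a * b) % n)
```

Fixing `LIMBS` up front is what allows the loops to be fully unrolled, and keeping values in Montgomery form across a whole modular exponentiation means the conversion cost is paid only once at each end.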
sr. member
Activity: 278
Merit: 250
Quote
Unfortunately, carrying out the research to analyze collected data will require funding. I have been running "Primecoin version v0.1.2.0xpm-hp11-unk-beta" for 4 days now on a 32/64 core/thread server and I have only mined a block (primemeter  43807850 prime/h 663367364 test/h  180 7-chains/h 8.075400 chain/d). I just cannot see how anyone, other than people who have access to botnets, can be profitable at mining Primecoin.

8.0754 * 0.8 * 0.04 = 0.26 blocks per day (no variance) = those results are completely as expected = deal with it!

Edit: Apparently this old method of estimating blocks/day is outdated: ref: https://bitcointalksearch.org/topic/m.3487226

My GTX 580 disagrees with that mathematical formula and it did not even enroll in any engineering classes at MIT. You need to actually count the number of 9-chains at a fixed precision for a given amount of time before you can estimate your probability of finding a chain that meets the requirement.
hero member
Activity: 812
Merit: 1000
Really interesting...Primecoin is doing alright, all things considered.
I'm in for the long haul.
hero member
Activity: 560
Merit: 500
Supercomputing: can you tell us if a GPU could mine 10x faster than a CPU in the near future? Is it possible? Or could other FPGA/ASIC hardware be used to mine Primecoin?

Montgomery multiplication - Coarsely integrated operand scanning (CIOS) @ 256-bit
An Nvidia GTX 580 (reference design) is about 7x faster than an AMD Phenom II X6 1100T using all six cores

Montgomery multiplication for GPU - must use hand optimized PTX (unrolled)
Montgomery multiplication for CPU - must use hand optimized x64 assembly (unrolled)

I would put the GPU at about 7x that of the CPU since modular multiplication is the bottleneck and not sieving.


FPGAs will only be useful for primorial searching; they are not cost effective for modular multiplication.

Are you saying that you (or someone else) can create a GPU miner which is 7x faster than a CPU (for the same price)? How long could it take?
No, he is saying that the current bottleneck can be made 7x faster. This is only one part of a more complicated pathway, so one might expect a fresh bottleneck to become apparent. For example:
The sieve (of Atkin) will remain on the CPU to maximize performance.
If you have a slow CPU, this may become the next bottleneck. If you have a powerful, many-core processor, the next bottleneck may be in another part of the calculation. How would we tell, since collecting all the required data on miners' hardware and software environments is next to impossible?
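
The gap between "the bottleneck is 7x faster" and "the miner is 7x faster" can be illustrated with Amdahl's law. The sketch below is hypothetical: the 80% figure for the share of run time spent in modular multiplication is an assumed number for illustration only, not a measurement from anyone in this thread.

```python
def overall_speedup(accelerated_fraction, part_speedup):
    """Amdahl's law: whole-program speedup when only one fraction of the run time gets faster."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / part_speedup)


# Hypothetical split: 80% of the time in modular multiplication, made 7x faster.
print(round(overall_speedup(0.80, 7.0), 2))
```

With that assumed split the whole miner gets only about 3.2x faster, because the untouched 20% (for example a sieve left on the CPU) now dominates the run time.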

So now, there are two possible directions for the research of finding longer chains faster. The first direction is to continue making incremental improvements to the sieving and modular exponentiation arithmetic (brute force). And the second direction (which I prefer) is to analyze all of the data collected so far and come up with better arithmetic for finding longer chains. I assume that this was the reason that made Primecoin unique, otherwise it is no different from Bitcoin mining.

Unfortunately, carrying out the research to analyze collected data will require funding.
member
Activity: 93
Merit: 10
Supercomputing: can you tell us if a GPU could mine 10x faster than a CPU in the near future? Is it possible? Or could other FPGA/ASIC hardware be used to mine Primecoin?

Montgomery multiplication - Coarsely integrated operand scanning (CIOS) @ 256-bit
An Nvidia GTX 580 (reference design) is about 7x faster than an AMD Phenom II X6 1100T using all six cores

Montgomery multiplication for GPU - must use hand optimized PTX (unrolled)
Montgomery multiplication for CPU - must use hand optimized x64 assembly (unrolled)

I would put the GPU at about 7x that of the CPU since modular multiplication is the bottleneck and not sieving.


FPGAs will only be useful for primorial searching; they are not cost effective for modular multiplication.

Are you saying that you (or someone else) can create a GPU miner which is 7x faster than a CPU (for the same price)? How long could it take?
hero member
Activity: 546
Merit: 500
Quote
Unfortunately, carrying out the research to analyze collected data will require funding. I have been running "Primecoin version v0.1.2.0xpm-hp11-unk-beta" for 4 days now on a 32/64 core/thread server and I have only mined a block (primemeter  43807850 prime/h 663367364 test/h  180 7-chains/h 8.075400 chain/d). I just cannot see how anyone, other than people who have access to botnets, can be profitable at mining Primecoin.

8.0754 * 0.8 * 0.04 = 0.26 blocks per day (no variance) = those results are completely as expected = deal with it!

Edit: Apparently this old method of estimating blocks/day is outdated: ref: https://bitcointalksearch.org/topic/m.3487226
sr. member
Activity: 278
Merit: 250
Supercomputing: can you tell us if a GPU could mine 10x faster than a CPU in the near future? Is it possible? Or could other FPGA/ASIC hardware be used to mine Primecoin?

Montgomery multiplication - Coarsely integrated operand scanning (CIOS) @ 256-bit
An Nvidia GTX 580 (reference design) is about 7x faster than an AMD Phenom II X6 1100T using all six cores

Montgomery multiplication for GPU - must use hand optimized PTX (unrolled)
Montgomery multiplication for CPU - must use hand optimized x64 assembly (unrolled)

I would put the GPU at about 7x that of the CPU since modular multiplication is the bottleneck and not sieving.


FPGAs will only be useful for primorial searching; they are not cost effective for modular multiplication.