Topic: Kepler vs Radeon [GCN] Architecture, any 680 GTX reviews on mining...? (Read 4949 times)

full member
Activity: 242
Merit: 100
I expect less than 280 MH/s (2x GTX 580).
full member
Activity: 126
Merit: 100
I did not copy the quote properly. BSN states that the current Kepler reaches 3.09 TFLOPS in single precision, and possibly 1/8 of that in double. Not much about GPGPU performance is being discussed yet (at least not until the GPU Technology Conference in May). If it can be unlocked (says BSN), Kepler should reach between 1.54 and 2.32 TFLOPS in double precision. I wonder how true these numbers are in practice.

I wonder how AMD will reconsider its pricing on GCN, and how Intel's possible upcoming graphics card might perform in GPGPU.
full member
Activity: 126
Merit: 100
Bright Side of News did a neat analysis of the architecture of the GPU released today:

http://www.brightsideofnews.com/news/2012/3/22/nvidia-kepler-analysis-another-masterpiece-from-the-architects-of-g80-or.aspx?pageid=1

Quote
...By looking at raw numbers, Kepler features higher efficiency than Fermi but there are some parts which remain the same, such as the LD/ST (Load/Store) units inside each SMX unit. This results in 64-bit operations being executed in much the same way as on Fermi - at least on the consumer side of things. The most important bit is the number of instructions per clock: while Fermi (GF100/110) executed up to 1024 instructions in a single clock, i.e. 1.58 trillion instructions per second, Kepler (GK104) executes 2048 instructions per clock, i.e. 2.06 trillion instructions per second. If there was any doubt why GeForce GTX 670 Ti became GeForce GTX 680, this is it.

GPU Computing & Double Precision: Yes, the GeForce Kepler is Castrated...

Edit:
I did not copy the quote properly. BSN states that the current Kepler reaches 3.09 TFLOPS in single precision, and possibly 1/8 of that in double. Not much about GPGPU performance is being discussed yet (at least not until the GPU Technology Conference in May). If it can be unlocked (says BSN), Kepler should reach between 1.54 and 2.32 TFLOPS in double precision. I wonder how well these numbers hold up in practice for mining.

I wonder how AMD will reconsider its pricing on GCN, and how Intel's possible upcoming graphics card might perform in GPGPU.
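
As a rough sanity check on the figures quoted from BSN, peak rates are just units times clock. The sketch below assumes clocks that are not stated in this thread: a 1006 MHz base clock for the GTX 680 and a 1544 MHz shader clock for the GTX 580. It also assumes one fused multiply-add (2 FLOPs) per core per clock, which is the usual convention for peak FLOPS. The function names are made up for illustration.

```python
# Back-of-the-envelope peak rates. The clocks are assumptions taken from the
# published GTX 680 / GTX 580 specs; they are not stated in this thread.
GTX680_CLOCK_HZ = 1006e6    # assumed GTX 680 base clock
GTX580_SHADER_HZ = 1544e6   # assumed GTX 580 shader clock
GTX680_CORES = 1536         # CUDA core count mentioned in this thread


def peak_tflops(cores, clock_hz, flops_per_clock=2):
    """Peak FP32 TFLOPS, counting one fused multiply-add as 2 FLOPs."""
    return cores * flops_per_clock * clock_hz / 1e12


def instr_per_second(instr_per_clock, clock_hz):
    """Instructions per second from a per-clock issue rate."""
    return instr_per_clock * clock_hz


sp = peak_tflops(GTX680_CORES, GTX680_CLOCK_HZ)
fermi_ips = instr_per_second(1024, GTX580_SHADER_HZ)
kepler_ips = instr_per_second(2048, GTX680_CLOCK_HZ)

print(f"GTX 680 peak FP32: {sp:.2f} TFLOPS")
print(f"1/2 and 3/4 of that: {sp / 2:.3f} and {sp * 0.75:.3f} TFLOPS")
print(f"Fermi:  {fermi_ips / 1e12:.2f} trillion instructions/s")
print(f"Kepler: {kepler_ips / 1e12:.2f} trillion instructions/s")
```

Under these assumptions the "unlocked" 1.54 to 2.32 TFLOPS range looks like roughly one half to three quarters of the single-precision peak, and the per-second instruction rates come out in the trillions (about 1.58 and 2.06). None of this says anything about integer throughput, which is what mining needs.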
legendary
Activity: 1274
Merit: 1004
1) nVidia purposely severely caps double-precision (FP64) performance of "gaming" cards, so they don't diminish the sales of Quadro and Tesla boards. Single-precision (FP32) is uncapped, double-precision should be around 1/2 of single-precision performance (on Quadros and Teslas). Nevertheless, Bitcoin uses integer math, not floating-point.

Not in GK104. In each SMX there are 192 normal CUDA cores and 8 FP64 CUDA cores. The 8 special cores run FP64 at the same speed as FP32, but the vast majority of the CUDA cores are incapable of running FP64 at any speed.
From Anandtech's review
Quote
The other change coming from GF114 is the mysterious block #15, the CUDA FP64 block. In order to conserve die space while still offering FP64 capabilities on GF114, NVIDIA only made one of the three CUDA core blocks FP64 capable. In turn that block of CUDA cores could execute FP64 instructions at a rate of ¼ FP32 performance, which gave the SM a total FP64 throughput rate of 1/12th FP32. In GK104 none of the regular CUDA core blocks are FP64 capable; in its place we have what we’re calling the CUDA FP64 block.

The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams. These CUDA cores can only do and are only used for FP64 math. What's more, the CUDA FP64 block has a very special execution rate: 1/1 FP32. With only 8 CUDA cores in this block it takes NVIDIA 4 cycles to execute a whole warp, but each quarter of the warp is done at full speed as opposed to ½, ¼, or any other fractional speed that previous architectures have operated at. Altogether GK104’s FP64 performance is very low at only 1/24 FP32 (1/6 * ¼), but the mere existence of the CUDA FP64 block is quite interesting because it’s the very first time we’ve seen 1/1 FP32 execution speed. Big Kepler may not end up resembling GK104, but if it does then it may be an extremely potent FP64 processor if it’s built out of CUDA FP64 blocks.

It's not an artificial limit; GK104 really is incapable of doing FP64 at any reasonable speed. There probably won't be any Kepler Quadro or Tesla cards until GK110 is released.
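
The ratios in the quoted review can be checked with a few lines of arithmetic. This is only a sketch of the numbers given above (one of three blocks at 1/4 rate on GF114; 8 full-rate FP64 cores next to 192 FP32 cores per SMX on GK104); the variable names are illustrative, and the 32-thread warp size is the standard NVIDIA value rather than something stated in the thread.

```python
from fractions import Fraction

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs (not stated in the thread)

# GF114, per the quoted review: one of three CUDA core blocks is
# FP64-capable, and that block runs FP64 at 1/4 of its FP32 rate.
gf114_ratio = Fraction(1, 3) * Fraction(1, 4)

# GK104: 8 dedicated full-rate FP64 cores next to 192 FP32 cores per SMX.
FP32_CORES_PER_SMX = 192
FP64_CORES_PER_SMX = 8
gk104_ratio = Fraction(FP64_CORES_PER_SMX, FP32_CORES_PER_SMX)

# A 32-thread warp pushed through 8 cores needs 4 passes.
cycles_per_fp64_warp = WARP_SIZE // FP64_CORES_PER_SMX

print("GF114 FP64:FP32 =", gf114_ratio)    # 1/12
print("GK104 FP64:FP32 =", gk104_ratio)    # 1/24
print("cycles per FP64 warp on GK104 =", cycles_per_fp64_warp)  # 4
```

So the 1/24 figure falls straight out of the core counts, which fits the point that the limit is in the silicon rather than a driver cap.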
Vbs
hero member
Activity: 504
Merit: 500
The Kepler architecture whitepaper should clarify those issues, though I don't think they have released it yet.

(...) (Unless an expert CUDA programmer can map out bitcoin mining effectively over 1536 nVidia streams?)

The CUDA compiler should take care of that successfully.

Now we just need someone with a GTX 680 to run Ufasoft's miner; it will use CUDA by default on nVidia hardware.

EDIT: http://www.geforce.com/Active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf Still, I could see no specific integer references; we need real performance numbers.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Even with CUDA it is unlikely it would come close to a 7970 and likely not even a 7950.

AMD has two very valuable instructions which allow it to save a large number of clock cycles each round. 500 series NVIDIA cards lack any equivalent. Since these instructions are useful in many forms of encryption, one would expect that if the 680 had added them we would have seen it, or NVIDIA would have mentioned it, or it would be reflected in integer-op benchmarks.
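
The post does not name the two instructions. A common reading, and an assumption here, is that they are a native 32-bit rotate and a bitwise select (BFI_INT on AMD hardware). The Python sketch below only illustrates why such instructions would matter for SHA-256: without them a rotate costs two shifts and an OR, and the Ch function costs several logic operations, where each could be a single instruction. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotr32_shifts(x, n):
    """Rotate right built from two shifts and an OR (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch_plain(x, y, z):
    """SHA-256 Ch as specified: (x AND y) XOR (NOT x AND z)."""
    return ((x & y) ^ (~x & z)) & MASK32


def bitselect(mask, a, b):
    """Take bits of a where mask is 1, else bits of b.

    Stands in for what a single select instruction would compute.
    """
    return (b ^ (mask & (a ^ b))) & MASK32


def big_sigma1(e):
    """SHA-256 Sigma1: three rotates, evaluated in every round."""
    return rotr32_shifts(e, 6) ^ rotr32_shifts(e, 11) ^ rotr32_shifts(e, 25)


x, y, z = 0x510E527F, 0x9B05688C, 0x1F83D9AB
# Ch is exactly a bitwise select with x as the mask.
assert ch_plain(x, y, z) == bitselect(x, y, z)
print(hex(big_sigma1(x)), hex(ch_plain(x, y, z)))
```

With 64 rounds per SHA-256 compression and several rotates plus a Ch in each, saving a couple of operations per rotate or select adds up quickly, which is the kind of per-round saving the post describes.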
full member
Activity: 126
Merit: 100
Given the fact that no reviews so far mention bitmining as an application for the 680 (7970 benched for mining at release) and the history of nVidia's cards not being comparable to the Radeon line-up in mining, is it safe to say that the 680 is suited only to be a gaming card? (Unless an expert CUDA programmer can map out bitcoin mining effectively over 1536 nVidia streams?)
donator
Activity: 1218
Merit: 1079
Gerald Davis
Nope. Single, double and float all refer to floating-point math. NVidia has always had great floating-point processing power. Of course, that is completely useless for hashing. You want something dealing with integers (whole numbers).

https://bitcointalksearch.org/topic/m.811192
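
To make the integer point concrete: Bitcoin's proof of work is a double SHA-256 over an 80-byte block header, with the result compared as an integer against a target. The sketch below uses an all-zero header prefix and an artificially easy target, both hypothetical and chosen only so the loop finishes quickly; `find_nonce` is an illustrative helper, not any real miner's API.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def find_nonce(header76: bytes, target: int, max_nonce: int = 1 << 20):
    """Try nonces until the hash, read as a little-endian integer, is below target."""
    for nonce in range(max_nonce):
        header = header76 + struct.pack("<I", nonce)  # 76 + 4 = 80 bytes
        h = int.from_bytes(double_sha256(header), "little")
        if h < target:
            return nonce, h
    return None


dummy_header = bytes(76)   # hypothetical all-zero header prefix
easy_target = 1 << 244     # hypothetical target: about 1 in 4096 hashes pass
result = find_nonce(dummy_header, easy_target)
print(result)
```

Everything in the inner loop is 32-bit integer adds, rotates, shifts and bitwise logic inside SHA-256, followed by an integer comparison. No floating-point unit is involved, so FP32 and FP64 TFLOPS figures do not predict hash rate.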
full member
Activity: 126
Merit: 100
I just glanced over the article again. The site used SiSoft Sandra 2012, and the graphs comparing both cards are labeled "Double Shader Mpix/s" and "Float Shader Mpix/s" instead of double and single precision. Does this difference in terminology matter?
Vbs
hero member
Activity: 504
Merit: 500
1) nVidia purposely severely caps double-precision (FP64) performance of "gaming" cards, so they don't diminish the sales of Quadro and Tesla boards. Single-precision (FP32) is uncapped, double-precision should be around 1/2 of single-precision performance (on Quadros and Teslas). Nevertheless, Bitcoin uses integer math, not floating-point.

2) Any bench using OpenCL is pointless, since nVidia doesn't really care much about OpenCL. It does care a lot about CUDA, and to a lesser degree DirectCompute, since it is being used more and more by games for advanced effects (realistic depth-of-field, etc).
donator
Activity: 1218
Merit: 1079
Gerald Davis
Double or single precision refers to floating-point math. I think there is a Tom's Hardware article which has some OpenCL integer math comparison, and the 680 is totally outgunned by the 7970. It is even marginally outgunned by the 7950.
full member
Activity: 126
Merit: 100
There are already numerous topics in this forum about the upcoming 680 GTX. So far I have not seen any reviews investigating bitcoin mining on this new GPU, and nVidia's website does not mention mining as an application; the card is presented merely as a gaming card. Has anyone found any solid numbers? The 680 has triple the number of streams of the 580. Despite the architectural differences, what does this translate to versus the 7970?

Perhaps the closest benchmark posted on the net so far concerning GPGPU performance is on Bright Side of News: this article compares single and double precision performance between the 680 and 7970, and the 590 vs the 680 in CUDA (the 590 wins in double precision). The 7970 significantly "smokes" the 680 in double precision in cryptography, yet the 680 "smokes" the 7970 back in single precision. Another article analyzing the 680's architecture in more depth is yet to be published on this website.

Article: 

http://www.brightsideofnews.com/news/2012/3/22/nvidia-gtx-680-reviewed-a-new-hope.aspx?pageid=4