Topic: GeForce GTX 680 are now available! Please post hashing results here. (Read 20138 times)

legendary
Activity: 1190
Merit: 1000
www.bitcointrading.com
How do you guys think the K10/K20 Teslas will do? K10 is dual 680s and K20 is big Kepler.

Fail again most likely. Nvidia are a bunch of retards for letting AMD and their shitty drivers be the only mining GPUs.
I think the mining community is too small for nVidia to focus on.  Thankfully!  If the whole world was mining the difficulty would be pretty darn hard.
full member
Activity: 238
Merit: 100
★YoBit.Net★ 350+ Coins Exchange & Dice
There's a table of operations per clock cycle per SMX

Given that K20 has 15 SMX and K10 has 16 SMX and GTX680 has 8 SMX and we all know how fast GTX680 is at mining.... it doesn't look hopeful.
hero member
Activity: 518
Merit: 500
How do you guys think the K10/K20 Teslas will do? K10 is dual 680s and K20 is big Kepler.

Fail again most likely. Nvidia are a bunch of retards for letting AMD and their shitty drivers be the only mining GPUs.
full member
Activity: 238
Merit: 100
Yup, it's the integer performance that is killing the hashing results. The GTX580 is capable of almost 1,000,000 Miops and it does ~180 MH/s.

http://forums.nvidia.com/index.php?showtopic=192076
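
As a rough sanity check on those two figures, here is a small Python sketch of the integer-operation budget per hash that they imply. It takes the numbers quoted above (about 1,000,000 Miops and about 180 MH/s for the GTX580) at face value. The function names are made up for illustration, and a synthetic peak figure like this gives an upper bound on operations per hash, not a measured instruction count.

```python
def int_ops_per_hash(peak_miops, hashrate_mhs):
    """Integer ops available per hash, given peak Miops and a hashrate in MH/s.

    Both inputs are in millions per second, so the 1e6 factors cancel.
    """
    return peak_miops / hashrate_mhs


def projected_mhs(peak_miops, ops_per_hash):
    """Hashrate (MH/s) another card would reach at the same ops-per-hash budget.

    Assumption: the miner is limited only by integer throughput, which is
    a simplification.
    """
    return peak_miops / ops_per_hash


if __name__ == "__main__":
    budget = int_ops_per_hash(1_000_000, 180)  # GTX580 figures from the post
    print(f"about {budget:.0f} integer ops per hash")
    # A card whose CUDA-Z integer result is half the GTX580's would, under
    # this assumption, land at roughly half the hashrate.
    print(f"{projected_mhs(500_000, budget):.0f} MH/s")
```

Plugging a GTX680's CUDA-Z integer result into `projected_mhs` would show whether its observed hashrate is in line with its raw integer throughput or well below it.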
hero member
Activity: 846
Merit: 1000
The One and Only
I'm getting set up for doing some test mining on my dual 680's now.

Does anyone have a GTX680 here? I would be interested in the results of CUDA-Z.

http://cuda-z.sourceforge.net/

If the integer performance there is good, it's probably an optimization issue. If it is bad, then more than likely the hardware is just not as good as Fermi in some way.

Tested. Here are the results:
It gives more stable performance on the second card in my SLI setup.
http://dl.dropbox.com/u/20040127/CUDA-Z.txt

I'll be interested to see performance with CUDA 5.
legendary
Activity: 1190
Merit: 1000
www.bitcointrading.com
edit: the VGX has only 768 CUDA cores total across all 4 GPUs?  SAD!  no longer excited.

It seems more reasonable to me that it would be 768 cores per GPU.

Yeah but it says

GPU Specifications
Number of GPUs    4
Total NVIDIA CUDA® Cores    768
Shader Perf (TFLOPS)    1.3
Power (W)    150

So.. Total CUDA cores = 768?  What a rip!
full member
Activity: 238
Merit: 100
edit: the VGX has only 768 CUDA cores total across all 4 GPUs?  SAD!  no longer excited.

It seems more reasonable to me that it would be 768 cores per GPU.
legendary
Activity: 1190
Merit: 1000
www.bitcointrading.com
the thing that got me excited was the quad-GPU VGX

edit: the VGX has only 768 CUDA cores total across all 4 GPUs?  SAD!  no longer excited.



and the endless racks FULL of GPUs



and since they offer enterprise cloud computing, wondering if we could get all those GPUs hashing?  EXCEPT the fact that nVidia sucks balls at mining.
hero member
Activity: 518
Merit: 500
any updates on 680 hashrates?

Nope it looks like Nfail again.

Too bad because I would have loved to get rid of this shitty ATI mining monopoly game.

legendary
Activity: 1190
Merit: 1000
www.bitcointrading.com
any updates on 680 hashrates?
legendary
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
Quote
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.

Except 6970 FP64 performance is even higher than Tesla.

People don't buy Tesla for its FP64 performance. They buy it because it is a dynamically-scheduled architecture (which tends to extract a ton of thread-level parallelism), with ECC support (important if you are into HPC), and TCC mode.

Quote
As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.

I'm talking about Fermi vs Kepler
The 7900 series has ECC support.
full member
Activity: 238
Merit: 100
Does anyone have a GTX680 here? I would be interested in the results of CUDA-Z.

http://cuda-z.sourceforge.net/

If the integer performance there is good, it's probably an optimization issue. If it is bad, then more than likely the hardware is just not as good as Fermi in some way.
legendary
Activity: 1274
Merit: 1004
Quote
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.

Except 6970 FP64 performance is even higher than Tesla.

People don't buy Tesla for its FP64 performance. They buy it because it is a dynamically-scheduled architecture (which tends to extract a ton of thread-level parallelism), with ECC support (important if you are into HPC), and TCC mode.
That, and the better support NVIDIA offers to HPC developers. Interestingly, the hardware scheduler is gone in GK104.

Quote
As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.

I'm talking about Fermi vs Kepler
For that, I have no idea. I would imagine it's a lack of optimizations for the new arch, but I don't know much about the CUDA miners.
full member
Activity: 238
Merit: 100
Quote
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.

Except 6970 FP64 performance is even higher than Tesla.

People don't buy Tesla for its FP64 performance. They buy it because it is a dynamically-scheduled architecture (which tends to extract a ton of thread-level parallelism), with ECC support (important if you are into HPC), and TCC mode.

Quote
As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.

I'm talking about Fermi vs Kepler
legendary
Activity: 1274
Merit: 1004
Edit: Anyways if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.

I don't buy the "new architecture = no tuned miner" theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data-parallel tasks like hashing.

It's not 1/3 blocks. There are 192 normal CUDA cores in an SMX and only 8 FP64 cores. That's why performance is so poor.

Ahh, I misread and equated "blocks" with "SMs". Apologies.

But as far as I know, mining is integer performance, which should go with the normal CUDA cores right? Why is the performance so poor then?

As for FP performance, most scientific applications should use FP32 to save memory/time (although with the number of lazy programmers I deal with every day, that may not be the case....)

Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.

As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.
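
To make the "couple of instructions" point concrete, here is a small Python sketch of two SHA-256 building blocks that mining kernels evaluate constantly: the 32-bit rotate and the Ch (choose) function. The post does not name the instructions; in mining discussions they are usually given as AMD's BIT_ALIGN_INT (rotate) and BFI_INT (bit select), so treat those names as an assumption here. The sketch only shows the several generic integer operations that each step costs when no single-instruction form exists.

```python
MASK32 = 0xFFFFFFFF  # keep results in 32 bits, as a GPU register would


def rotr32(x, n):
    """Rotate right built from two shifts and an OR (three integer ops).

    Hardware with a native rotate or bit-align instruction does this in one.
    """
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch_naive(x, y, z):
    """SHA-256 Ch as written in the spec: AND, NOT, AND, XOR."""
    return ((x & y) ^ (~x & z)) & MASK32


def ch_bitselect(x, y, z):
    """The same function seen as a bit select: take y where x is 1, else z.

    A single bit-select instruction can compute this directly; here it is
    spelled out with XOR/AND only to show the two forms agree.
    """
    return (z ^ (x & (y ^ z))) & MASK32


def big_sigma0(x):
    """SHA-256 Sigma0: three rotates per call, used in every round."""
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)


if __name__ == "__main__":
    x, y, z = 0xFFFF0000, 0x12345678, 0x9ABCDEF0
    print(hex(rotr32(0x12345678, 8)))
    print(ch_naive(x, y, z) == ch_bitselect(x, y, z))
```

Because every round of SHA-256 leans on these operations, a chip that executes each in one instruction needs noticeably fewer instructions per hash, which is why shaders*clocks alone is a poor predictor across vendors.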
full member
Activity: 238
Merit: 100
Edit: Anyways if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.

I don't buy the "new architecture = no tuned miner" theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data-parallel tasks like hashing.

It's not 1/3 blocks. There are 192 normal CUDA cores in an SMX and only 8 FP64 cores. That's why performance is so poor.

Ahh, I misread and equated "blocks" with "SMs". Apologies.

But as far as I know, mining is integer performance, which should go with the normal CUDA cores right? Why is the performance so poor then?

As for FP performance, most scientific applications should use FP32 to save memory/time (although with the number of lazy programmers I deal with every day, that may not be the case....)
legendary
Activity: 1274
Merit: 1004
Edit: Anyways if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.

I don't buy the "new architecture = no tuned miner" theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data-parallel tasks like hashing.

It's not 1/3 blocks. There are 192 normal CUDA cores in an SMX and only 8 FP64 cores. That's why performance is so poor.
legendary
Activity: 1274
Merit: 1004
Mining doesn't use FP -_- Also not sure why you quoted the part on GF114. The GTX480 and Quadro/Tesla are GF100, not GF114.
Only the first sentence really talks about GF114, and it's just there to give context as to the difference in approach between GF114 and GK104.

As for mining, that's true. However, Tesla cards are not sold for mining. Most scientific computing uses FP, so when discussing whether the new 7B transistor Tesla card is 2x3.5B GK104 or GK110, floating point performance is a huge factor.
full member
Activity: 238
Merit: 100
Mining doesn't use FP -_- Also not sure why you quoted the part on GF114. The GTX480 and Quadro/Tesla are GF100, not GF114.

Edit: Anyways if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.

I don't buy the "new architecture = no tuned miner" theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data-parallel tasks like hashing.
legendary
Activity: 1274
Merit: 1004
Apparently nVidia is releasing a Kepler version of Tesla on May 14.  What seems very interesting is that it is a 7 billion transistor chip, the GTX 680 is 3.5 billion transistors.

http://www.brightsideofnews.com/news/2012/4/20/nvidia-to-launch-a-7-billion-transistor-kepler-gpgpu-tesla-boards-on-may-14.aspx

Interesting. With the number it would be tempting to think 2xGK104, but given how crippled GK104 is in FP64 performance I can't see that happening in a Tesla product. If this actually is BigK, it's way ahead of when most people expected it to ship.

Actually NVIDIA is most likely just crippling FP64 in the drivers, like they did for the GTX480 vs Quadro 6000/Tesla C2050. Anyways I don't think GK104 has ECC support, so it is likely this new card will be GK110.

You're basing this on what? Every review that talks about it is pretty emphatic that FP64 performance just isn't there, which is one of the reasons NVIDIA was able to pack so much performance in such a little die.

From Anandtech
Quote
The other change coming from GF114 is the mysterious block #15, the CUDA FP64 block. In order to conserve die space while still offering FP64 capabilities on GF114, NVIDIA only made one of the three CUDA core blocks FP64 capable. In turn that block of CUDA cores could execute FP64 instructions at a rate of ¼ FP32 performance, which gave the SM a total FP64 throughput rate of 1/12th FP32. In GK104 none of the regular CUDA core blocks are FP64 capable; in its place we have what we’re calling the CUDA FP64 block.

The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams. These CUDA cores can only do and are only used for FP64 math. What's more, the CUDA FP64 block has a very special execution rate: 1/1 FP32. With only 8 CUDA cores in this block it takes NVIDIA 4 cycles to execute a whole warp, but each quarter of the warp is done at full speed as opposed to ½, ¼, or any other fractional speed that previous architectures have operated at. Altogether GK104’s FP64 performance is very low at only 1/24 FP32 (1/6 * ¼), but the mere existence of the CUDA FP64 block is quite interesting because it’s the very first time we’ve seen 1/1 FP32 execution speed. Big Kepler may not end up resembling GK104, but if it does then it may be an extremely potent FP64 processor if it’s built out of CUDA FP64 blocks.
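
The ratios in the quote can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the figures given in the thread and the quote (192 CUDA cores and 8 FP64 cores per SMX on GK104; one of three blocks at 1/4 rate on GF114); the helper name is made up for illustration.

```python
from fractions import Fraction


def fp64_to_fp32_ratio(fp64_share, relative_rate=Fraction(1)):
    """FP64 throughput as a fraction of FP32 throughput.

    fp64_share: fraction of the SM's FP32 lanes that can issue FP64.
    relative_rate: speed of an FP64 op relative to FP32 on those lanes.
    """
    return fp64_share * relative_rate


# GK104: 8 dedicated FP64 cores next to 192 regular cores, running at 1/1 rate.
gk104 = fp64_to_fp32_ratio(Fraction(8, 192))

# GF114: one of three CUDA core blocks is FP64 capable, at 1/4 FP32 rate.
gf114 = fp64_to_fp32_ratio(Fraction(1, 3), Fraction(1, 4))

if __name__ == "__main__":
    print(gk104)  # 1/24
    print(gf114)  # 1/12
    # The quote's "(1/6 * 1/4)" is the same 1/24 figure written another way.
    print(Fraction(1, 6) * Fraction(1, 4) == gk104)  # True
```

None of this affects hashing directly, since mining is integer work, but it does show why a 2xGK104 Tesla would be a weak FP64 part compared with a chip built from full-rate FP64 blocks.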