Topic: Ufasoft Miner - Windows/Linux, x86/x64, SSE2/OpenCL, Open Source - page 50. (Read 631117 times)

newbie
Activity: 21
Merit: 0
Excited about the update!

Ufa: is there a place to go (other than here in the forums) to keep up on your miner updates?

Thanks
full member
Activity: 126
Merit: 100
0.4 is about 10% faster but affects my GPU mining (by 7Mh/s, from 274 to 267)
sr. member
Activity: 404
Merit: 251
There appears to be a small memory leak somewhere in this program. I haven't looked at the code yet to try and find it.
Version 0.4 uploaded:
  • Memory leak fixed.
  • ATI/AMD GPU adapters now supported, but performance is not better than on other OpenCL miners
full member
Activity: 126
Merit: 100
After a computer restart I'm getting 20Mh/s lol 0.75 more than before
newbie
Activity: 18
Merit: 0
Went from 3 MH/s to 10 on an AMD Phenom X4. But it only seems to work when I use 127.0.0.1. When I try to point it over the LAN at a local server at 192.168.0.5, it doesn't work. Also, why do I have to connect to port 8332 on 127.0.0.1? Doesn't Bitcoin use 8333?

Thanks for the great work!
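
On the port question: in the stock Bitcoin client, 8333 is the default peer-to-peer port, while 8332 is the default port of the JSON-RPC interface that miners poll for work. Below is a minimal sketch of the kind of request a miner sends, assuming the `getwork` RPC method and placeholder credentials (`rpcuser` and `rpcpass` are hypothetical). It only builds the request and does not send anything.

```python
import base64
import json
import urllib.request

RPC_PORT = 8332  # default JSON-RPC port; 8333 is the peer-to-peer port


def build_getwork_request(host, user, password, port=RPC_PORT):
    """Build (but do not send) a JSON-RPC 'getwork' request."""
    body = json.dumps({"method": "getwork", "params": [], "id": 1}).encode("utf-8")
    # HTTP basic auth: base64 of "user:password" (placeholder credentials)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}:{port}/",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_getwork_request("127.0.0.1", "rpcuser", "rpcpass")
print(req.full_url, req.get_method())
```

Pointing the same request at 192.168.0.5 only changes the host argument; whether that server accepts RPC connections from other machines depends on its own configuration.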
full member
Activity: 126
Merit: 100
I get a constant 19.2~19.3Mh/s (while running a GPU miner as well) with my i7 920 @ 4GHz. Unfortunately it also lowers my GPU mining by 10Mh/s (260 to 250), and my CPU consumes 200W while my GPU draws <175W.
newbie
Activity: 57
Merit: 0
However, wanted to let people know that my ugly code is now in jgarzik's tree. You can verify it at:

https://github.com/jgarzik/cpuminer/tree/
Cool, just sent you the other 25 BTC. Thanks for your great work.
newbie
Activity: 40
Merit: 0
Yeah, this code really just runs full bore on my i5. I've got a massive heatsink (Scythe Mugen 2) on mine, so hitting all 4 cores at once isn't that bad. The biggest issue is the sound of my electric meter going WHIRRRRRR over the fans.... So I could see not-so-good cooling solutions not coping (this code runs the CPU at 100%, and most thermal solutions seem to be designed for a lower % utilization case to save on weight/cost...).

However, wanted to let people know that my ugly code is now in jgarzik's tree. You can verify it at:

https://github.com/jgarzik/cpuminer/tree/

I'll probably be making some minor changes every so often, and I'll put in pull requests when features are ready. The biggest x86 updates I'm looking at are support for the Sandy Bridge hardware, porting the code to NEON (which has the same rough structure as the X86_64 code), and possibly porting the code to x86_32 if anyone has SSE2 and only a 32-bit machine.

However, I have a bunch of other projects I'm interested in I want to get to first...
sr. member
Activity: 373
Merit: 250
Thanks, triple speed-up here on an Intel Core 2 T9550 2.66GHz.

Update: weird, the CPU in my Lenovo T500 laptop hits 90 degrees centigrade and the machine switches off, twice in an hour. Going back to cpu miner, the temperature stays at 80 degrees, and that worked for weeks.

--
mtve

Yeah, this program runs my CPU (T9300 2.50 GHz so similar to yours) far hotter than any other application I've used, so I restrict it to one core which keeps it at about 70 degrees, and give it ample breathing room around the ventilation ports. 
newbie
Activity: 3
Merit: 0
Oh. Sandy Bridge...

...if there are willing testers, I could try to port the code to those chips. If I recall, the SB'es have 256-bit-wide YMM registers, so we may be able to use those for an 8-way hash. However, I don't have the hardware available to me.

Yup, Intel AVX.

I'd be glad to try it out.
newbie
Activity: 40
Merit: 0
Oh. Sandy Bridge...

...if there are willing testers, I could try to port the code to those chips. If I recall, the SB'es have 256-bit-wide YMM registers, so we may be able to use those for an 8-way hash. However, I don't have the hardware available to me.
newbie
Activity: 3
Merit: 0
Well, that's a boost: up to 24Mh/s from 8Mh/s with a [email protected].
newbie
Activity: 9
Merit: 0
Thanks, triple speed-up here on an Intel Core 2 T9550 2.66GHz.

Update: weird, the CPU in my Lenovo T500 laptop hits 90 degrees centigrade and the machine switches off, twice in an hour. Going back to cpu miner, the temperature stays at 80 degrees, and that worked for weeks.

--
mtve
newbie
Activity: 57
Merit: 0
I'll give 50 BTC to the person that implements this code into jgarzik's miner. First 25 BTC when working code is released, the other 25 BTC when it's pushed into jgarzik's cpuminer git repository.

Please take a look at https://github.com/chromicant/cpuminer/tree/sse2

[...]

Please, if you like my work, donate at the address in my sig!
Great work, thanks a lot! I've sent you the first 25 BTC.
full member
Activity: 238
Merit: 100
Interesting discussion about AMD and Intel differences. But it does not explain why the compiled 4way code is so much faster on AMD than on Intel. If you compare the best code for AMD and the best code for Intel, they are very close in terms of MH/s per GHz.
newbie
Activity: 1
Merit: 0
Is there a way of using this software through a proxy such as Tor? I am behind an ISP NAT router that blocks port 8332.
newbie
Activity: 40
Merit: 0
Thanks.

OK, so here are two instruction-level benchmarks of the ufasoft code, one on a Core i5, the other on an AMD Phenom.

A super quick glance at it seems to indicate that bitwise integer SSE ops (psrld and friends) are dirt cheap on Intel chips and rather heavy on AMD.

Download the profile here:

profile.tar.bz2



Thanks for the data! The information is enlightening.

It looks like you get a (serious) stall if you access memory then try to use the register on AMD, and the rotates seem to be killer...which is quite a shock. The memory loads impact Intel as well, but it seems to be not as much...

I can definitely clean up some of the memory loads, but something makes me think that there's not much I can do to help the AMD case. Anyone know why you see impacts on the instruction pipeline due to a ps(r/l)ld? Is there a better instruction for this?
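
For context on why the shifts matter so much: SSE2 has no packed 32-bit rotate instruction, so every rotate in SHA-256 is typically built from a right shift (psrld), a left shift (pslld), and an OR. Here is a scalar sketch of what each 32-bit lane computes; the function names are illustrative and not taken from the miner's source.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like one SSE lane


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32) via two shifts and an OR."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x):
    """SHA-256 Sigma0: three rotates XORed together, i.e. six shifts per call."""
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)


print(hex(rotr32(0x12345678, 8)))  # 0x78123456
```

Since the SHA-256 round function and message schedule are made almost entirely of such rotates, a CPU where packed shifts are slow pays that cost many times per hash.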
hero member
Activity: 540
Merit: 500
A big thanks to chromicant! My hashing speed has more than doubled!

Here are all the steps I followed on a Debian (testing) system to obtain the compiled version:
Code:
sudo apt-get install git automake1.7 libc6-dev libcurl4-openssl-dev

#YASM : http://pkgs.org/download/debian-sid/multimedia-main-amd64/yasm_1.1.0-0.0_amd64.deb.html
wget http://ftp.br.debian.org/debian-multimedia/pool/main/y/yasm/yasm_1.1.0-0.0_amd64.deb
sudo dpkg -i yasm_1.1.0-0.0_amd64.deb

git clone https://github.com/chromicant/cpuminer.git
cd cpuminer
git checkout remotes/origin/sse2
cd x86_64/
./build.sh
cd ..
./autogen.sh
./configure
make

Here is the compiled binary: http://dl.free.fr/tjbLyHclU

Hope it will be useful :p
newbie
Activity: 40
Merit: 0
Thanks.

Someone sent me the output of Intel's compiler on the 4way code. One thing that is different between the sse2_64 and 4way code is that the SSE2 core loop is unrolled. Also, the sse2_64 code is just SHA-256, which means you have to call it twice to get the hash you want.

This may be leading to some overhead.
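
To illustrate the "call it twice" point: Bitcoin's proof-of-work hash is SHA-256 applied to the SHA-256 digest of the 80-byte block header. Here is a minimal sketch using Python's standard `hashlib`; the all-zero header is a placeholder, not real block data.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256: hash the data, then hash the 32-byte digest again."""
    first = hashlib.sha256(data).digest()
    return hashlib.sha256(first).digest()


# Placeholder 80-byte header (all zeros), only to show the call shape.
header = bytes(80)
digest = sha256d(header)
print(len(digest), digest.hex())
```

A miner repeats this for every nonce it tries, so any per-call overhead in a plain SHA-256 routine is paid twice per candidate.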
full member
Activity: 238
Merit: 100
The chromicant code gives 1.18 Mhash/s per 1 GHz per physical core on an Intel Core i5 when utilizing 4 threads (2 physical cores, 2 virtual), as opposed to 0.89 Mhash/s in the 4way version. The difference is almost 100% for a single-threaded run, but apparently multithreading can make up for some of the 4way inefficiencies.

On K10 AMD, the new code gives 0.93 Mhash/s per 1 GHz per physical core, compared to 1.13 Mhash/s for the 4way version of jgarzik's cpuminer compiled with the Intel Compiler (icc). It seems that the 4way code compiled with icc is the fastest for the K10 architecture.
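
For anyone who wants to compare their own numbers on the same scale, the normalization is just total hash rate divided by clock speed and physical core count. Here is a small sketch; the function name is made up, and the example plugs in the i7 920 figure reported earlier in this thread (19.2 Mh/s at 4 GHz), assuming all 4 physical cores were mining.

```python
def mhash_per_ghz_per_core(total_mhash, clock_ghz, physical_cores):
    """Normalize a total hash rate to Mhash/s per 1 GHz per physical core."""
    return total_mhash / (clock_ghz * physical_cores)


# i7 920 @ 4 GHz reporting 19.2 Mh/s, assuming 4 physical cores in use
rate = mhash_per_ghz_per_core(19.2, 4.0, 4)
print(round(rate, 2))  # 1.2
```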