
Topic: Ufasoft Miner - Windows/Linux, x86/x64, SSE2/OpenCL, Open Source - page 31.

newbie
Activity: 46
Merit: 0
I already did that, and Ufasoft miner was using about 80%. But now it doesn't really matter: I tested again, and cpuminer with 1 thread + affinity set to core 0 takes only 50% of the memory, as it should.
full member
Activity: 126
Merit: 100
OK, so why doesn't the user just pop open Task Manager and see which process is using those resources?
hero member
Activity: 807
Merit: 500
Brain salad.  I'll edit that post to say what I meant.
sr. member
Activity: 378
Merit: 250
To be clear, does cgminer show 90% processor usage, or is 90% processor usage shown in total, with cgminer showing 50% CPU usage?  There are no system-related tasks using the processor as cgminer.
Why are we talking about cgminer here?  This is the Ufasoft thread.
hero member
Activity: 807
Merit: 500
To be clear, does bitcoin-miner itself show 90% processor usage, or is 90% the total, with bitcoin-miner showing 50% CPU usage?  No system-related tasks run as bitcoin-miner, and in my experience it has practically no overhead.  So if it shows 90%, something is wrong in this particular instance, and the cause is not other system tasks but the miner itself: either it is not limited to one thread, or it has far more overhead than typical.
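The question above is whether 90% is the miner's own share or the machine total. A minimal C++ sketch that measures one busy thread both ways, assuming a POSIX system where `std::clock()` reports CPU time for the whole process; `CpuShare` and `measure_busy_loop` are hypothetical names, not part of any miner:

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Result of one measurement (hypothetical type for this sketch).
struct CpuShare {
    double cores_busy;        // process CPU time / wall time (about 1.0 for one spinning thread)
    double percent_of_total;  // cores_busy spread over all logical CPUs, as Task Manager shows it
};

// Spin the calling thread for `duration` and report how much CPU this process used.
// Assumes POSIX semantics for std::clock(): CPU time of the whole process.
CpuShare measure_busy_loop(std::chrono::milliseconds duration) {
    assert(duration.count() > 0);
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    volatile unsigned long long sink = 0;  // volatile keeps the busy loop from being optimized away
    while (std::chrono::steady_clock::now() - wall_start < duration) {
        sink = sink + 1;
    }

    const double cpu_s = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
    const double wall_s =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - wall_start).count();
    unsigned logical = std::thread::hardware_concurrency();
    if (logical == 0) logical = 1;  // hardware_concurrency() returns 0 when unknown

    CpuShare share;
    share.cores_busy = cpu_s / wall_s;
    share.percent_of_total = 100.0 * share.cores_busy / logical;
    return share;
}
```

On a dual-core machine, one spinning thread should report `cores_busy` near 1.0 and `percent_of_total` near 50; a single process showing 90% of a dual core is using about 1.8 cores' worth of CPU.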
newbie
Activity: 46
Merit: 0
The system-related tasks always take 3-4% when I'm not mining, and I don't play any games that would use resources.
full member
Activity: 126
Merit: 100
I still think it's normal: one core is hogged by mining, and the other is doing system-related tasks.
newbie
Activity: 46
Merit: 0
Yes, that's my point...
sr. member
Activity: 378
Merit: 250
Well, at first glance, I see a large number of back-and-forth copies between xmm0 and the 32-bit registers.  You might be able to store each 32-bit register value into one lane of a 128-bit register and then pull them all out at once when they're needed.
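A minimal C++ sketch of the packing idea using SSE2 intrinsics (assumes an x86/x64 target; `add4` is a hypothetical helper, not the miner's code): four 32-bit values go into the lanes of one 128-bit register, a single `paddd` updates all of them, and one store brings them back out instead of four separate `movd` round trips.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)

// Add four pairs of 32-bit values in one 128-bit operation.
void add4(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    assert(a && b && out);
    // _mm_set_epi32 lists lanes from highest to lowest, so a[0] lands in lane 0.
    __m128i va = _mm_set_epi32(static_cast<int>(a[3]), static_cast<int>(a[2]),
                               static_cast<int>(a[1]), static_cast<int>(a[0]));
    // Unaligned load of b[0..3] straight into one register.
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // paddd: four independent 32-bit adds, each wrapping modulo 2^32.
    __m128i sum = _mm_add_epi32(va, vb);
    // One 128-bit store writes all four results.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), sum);
}
```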
The next thing I'm about to propose is extremely radical, and I have no idea how it would turn out.  While the values are loaded into the 128-bit registers, you can perform horizontal math on them all at once and then pass the values on to the next step.  I suppose what I'm suggesting is taking more advantage of vectorization and what it can do now.  This will cut down greatly on the number of instructions needed and
I also don't see a single prefetch in the code.  I find it difficult to believe that the code could be so optimal that it doesn't need a prefetch.  Intel CPUs might fare well without it, but AMD CPUs would benefit the most.
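For reference, software prefetch is exposed in C++ as `_mm_prefetch`. A sketch with an assumed prefetch distance and an assumed 64-byte cache line (`sum_with_prefetch` is a hypothetical helper); prefetch is only a hint, so it never changes the result, and whether it helps the miner's inner loop would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0

constexpr std::size_t kPrefetchDistance = 64;  // assumed: 64 elements = 256 bytes ahead
constexpr std::size_t kWordsPerLine = 16;      // assumed 64-byte line / 4-byte uint32_t

// Sum an array while asking the CPU to start loading data further ahead.
uint64_t sum_with_prefetch(const uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // One prefetch per cache line is enough; more would just waste issue slots.
        if (i % kWordsPerLine == 0 && i + kPrefetchDistance < n) {
            _mm_prefetch(reinterpret_cast<const char*>(data + i + kPrefetchDistance), _MM_HINT_T0);
        }
        total += data[i];
    }
    return total;
}
```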
Also, I'm seeing a lot of movsd instructions.  One here and there isn't so bad, but I saw three in a row, which should probably be replaced with another instruction, likely one involving an xmm register.
Also, I see a movd from a 32-bit register into a 128-bit register, followed by a pshufd, and then a move back into a 32-bit register.  I could be wrong, but you MIGHT be able to take advantage of pshufb in a few of these cases to cut down on unneeded instructions.
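One place where `pshufb` can replace several instructions in SHA-256 code is byte-swapping message words, since SHA-256 reads big-endian 32-bit words on a little-endian CPU. A sketch, not the miner's code: it relies on the GCC/Clang-specific `target` attribute and `__builtin_cpu_supports`, `bswap4` is a hypothetical name, and a scalar fallback covers CPUs without SSSE3.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 loads/stores
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (pshufb)

// Scalar reference: reverse the bytes of each 32-bit word.
static void bswap4_scalar(const uint32_t in[4], uint32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        uint32_t x = in[i];
        out[i] = (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
    }
}

// pshufb version: one byte shuffle swaps all four words. The target attribute
// lets this function use SSSE3 without compiling the whole file with -mssse3.
__attribute__((target("ssse3")))
static void bswap4_pshufb(const uint32_t in[4], uint32_t out[4]) {
    // Destination byte k takes source byte mask[k]; _mm_set_epi8 lists bytes 15..0.
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_shuffle_epi8(v, mask));
}

// Use the SSSE3 path only when the running CPU reports support for it.
void bswap4(const uint32_t in[4], uint32_t out[4]) {
    assert(in && out);
    if (__builtin_cpu_supports("ssse3")) {
        bswap4_pshufb(in, out);
    } else {
        bswap4_scalar(in, out);
    }
}
```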

Now, I can't rewrite the code to take advantage of these, since the only disassembler I have is objconv, and it sort of sucks at generating asm files (the "YASM-compatible" asm files it generates aren't even YASM-compatible).  But I think you get the gist of what I'm talking about.  The 128-bit registers are capable of doing much more than what I've seen from a number of SHA256 programs.  Taking advantage of them will really help speed things up.
sr. member
Activity: 378
Merit: 250
I'll take a look at it.  But one thing I've noticed that helps most of the time with other code (I haven't disassembled this yet) is the use of non-temporal moves on larger sets of data.  Granted, this implies that the data will only be read once, but it helps a little when used right.
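A sketch of a non-temporal copy in C++ (`stream_copy` is a hypothetical helper, not the miner's code). It uses the store form, `movntdq` via `_mm_stream_si128`, which writes around the cache and therefore only pays off when the destination will not be read again soon. `movntdq` needs a 16-byte-aligned destination, so the unaligned head and the tail are copied with ordinary stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // _mm_loadu_si128, _mm_stream_si128 (movntdq), _mm_sfence

void stream_copy(uint32_t* dst, const uint32_t* src, std::size_t n) {
    assert((dst && src) || n == 0);
    std::size_t i = 0;
    // Scalar head until dst + i reaches a 16-byte boundary.
    while (i < n && (reinterpret_cast<std::uintptr_t>(dst + i) & 15) != 0) {
        dst[i] = src[i];
        ++i;
    }
    // Streaming body: 4 x uint32_t = 16 bytes per non-temporal store.
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_stream_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
    // Scalar tail.
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
    // Non-temporal stores are weakly ordered; fence before other code relies on dst.
    _mm_sfence();
}
```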
sr. member
Activity: 404
Merit: 251
I have looked at the other SSE instructions and have not found any that are useful for the SHA-2 algorithm. Please suggest ASM code snippets if you think they improve performance.
sr. member
Activity: 254
Merit: 250
Is there a way I can get the previous version which doesn't trigger the virus alert?
sr. member
Activity: 378
Merit: 250
Just a random thought, but might you be willing to add further SSE optimizations such as SSE3, SSSE3, SSE4.1, SSE4a, etc., which can be enabled based on the detected CPU?  I've seen modest improvements enabling some of these through assembly based on your code and was thinking you might like to try them out and see what happens.  Most notably, the use of non-temporal moves from memory to cache increased the hash rate by about 0.5, as I recall.  I'm not certain SSE3 optimizations will help, since they mainly focus on horizontal math.  However, some of the bit rotations could possibly be translated to additions that take place simultaneously if the rotations are small.  Say, for example, that you have a rotation of 3 to perform on 4 numbers; by using horizontal math, you can take those 4 instructions and shrink them down to 3.  Now, I don't know the cost of doing so, but the compiler will determine that.
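On the rotation point: SSE2 has no vector rotate, but rotating all four 32-bit lanes takes three instructions (two shifts and an OR), and SHA-256's Σ and σ functions are built from such rotates (plus one plain shift in each σ). Note that this is ordinary lane-wise SIMD, not SSE3 horizontal math. A sketch with hypothetical names, not the miner's code:

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// rotr(x, N) = (x >> N) | (x << (32 - N)), applied to all four lanes at once:
// psrld + pslld + por, versus four scalar ror instructions.
template <int N>
inline __m128i rotr4(__m128i x) {
    static_assert(N > 0 && N < 32, "rotation count must be 1..31");
    return _mm_or_si128(_mm_srli_epi32(x, N), _mm_slli_epi32(x, 32 - N));
}

// Array wrapper so the result is easy to inspect (hypothetical helper).
template <int N>
void rotr4_array(const uint32_t in[4], uint32_t out[4]) {
    assert(in && out);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), rotr4<N>(v));
}
```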

Just a thought.
sr. member
Activity: 462
Merit: 250
I heart thebaron
Ufasoft, I was wondering if you could make a build that uses only half of the computer's power, e.g. if it has 2 CPUs, use just 1?
I would like this low-power build because it would allow your computer to do other things while you are using the miner, without any lag.
You can already do this using CPU AFFINITY if you are using Guiminer as a front end.
Looks like I was wrong, although not entirely.

On my Core2Duo/Quad systems, I need to use the -t flag.

On my Core i7 systems, I can use the Guiminer CPU affinity to set the number of threads in real time, without the need for a stop/start.

On my Dual Xeon X5500 (i7 platform/LGA1366 Dual), I can watch each of its 16 threads come to life or die when switched on or off using Guiminer's affinity checkboxes; the same goes for my i7 920 8-thread systems.

I have nothing AMD-CPU-based to test.
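On Linux, affinity checkboxes like these map onto the `sched_setaffinity` system call. A minimal sketch (Linux/glibc only; `restrict_to_first_cpus` is a hypothetical helper, and on Windows the analogous call is `SetProcessAffinityMask`) that keeps only the first N CPUs the process is currently allowed to use, e.g. half of them for the "low build" requested above:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros; g++ usually defines it already
#endif
#include <cassert>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, cpu_set_t

// Keep only the first `keep` CPUs from the calling thread's current mask.
// Returns how many CPUs remain allowed, or -1 on error.
int restrict_to_first_cpus(int keep) {
    assert(keep > 0);
    cpu_set_t current;
    CPU_ZERO(&current);
    if (sched_getaffinity(0, sizeof(current), &current) != 0) return -1;

    cpu_set_t wanted;
    CPU_ZERO(&wanted);
    int kept = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && kept < keep; ++cpu) {
        if (CPU_ISSET(cpu, &current)) {
            CPU_SET(cpu, &wanted);
            ++kept;
        }
    }
    if (kept == 0) return -1;
    // pid 0 means "the calling thread"; threads it creates afterwards inherit this mask.
    if (sched_setaffinity(0, sizeof(wanted), &wanted) != 0) return -1;
    return kept;
}
```

Affinity only limits which CPUs the threads may run on; the number of mining threads is still set separately (the `-t` option in bitcoin-miner's help output).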
hero member
Activity: 807
Merit: 500
Maybe he expects 50%. That's what I get on my dual-core systems with one thread, and that is what he should be getting based on your post.  The 90% that he is getting is significantly higher than 50%.  I hope someone congratulates me on making a post as useful as yours...
full member
Activity: 126
Merit: 100
Then what did you expect? I have a quad core, so if I use one thread, that's only 25% of my CPU. For you, it will be 50%. Basically, that is why you have the so-called "high CPU usage".
newbie
Activity: 46
Merit: 0
Only 2 cores.
sr. member
Activity: 362
Merit: 250
Binary for Debian 5.0 "Lenny" (it may work on other Linux distros).
full member
Activity: 126
Merit: 100
Yes, I know. It is mining at half the speed, but it still uses 90% of the CPU power...
After you have specified the number of threads, set the CPU affinity (using Task Manager, if possible) to core X, or whichever core you choose. And that's basically it.
I have already set the affinity from Guiminer.
There's a slight possibility it's not working via Guiminer. Do it via Task Manager. Also, how many cores do you have?
sr. member
Activity: 362
Merit: 250
New error:

Code:
kanotix@Kanotix:~/ufasoft_bitcoin-miner-0.10$ ./bitcoin-miner
terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid
Aborted
kanotix@Kanotix:~/ufasoft_bitcoin-miner-0.10$ LC_ALL=C ./bitcoin-miner
bitcoin-miner 0.10  Copyright (c) 2011 Ufasoft  http://ufasoft.com/open/bitcoin
Usage: bitcoin-miner [-a seconds] [-g yes|no] [-t threads] [-v] [-o url] [-x proxy] -u user -p password
Options:
  -a time between getwork requests 1..60, default 15
  -g yes|no    set 'no' to disable GPU, default 'yes'
  -h           this help
  -o url       in form http://server.tld:port/path, by default http://127.0.0.1:8332
  -t Number of threads for CPU mining, by default is number of CPUs (Cores), 0 - disable CPU mining
  -v           Verbose output
  -x type=host:port   Use HTTP or SOCKS proxy. Examples: -x http=127.0.0.1:3128, -x socks=127.0.0.1:1080

kanotix@Kanotix:~/ufasoft_bitcoin-miner-0.10$

http://www.google.ru/search?sourceid=chrome&ie=UTF-8&q=locale%3A%3Afacet%3A%3A_S_create_c_locale+name+not+valid
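The `LC_ALL=C` run above works because the crash is a `std::runtime_error` thrown when a locale is built from a name the system does not recognise; the miner apparently constructs its locale from the environment (`LC_ALL`, `LC_*`, `LANG`). A C++ sketch of an in-program fallback to the classic "C" locale, with hypothetical helper names:

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Build a locale by name; if the name is not valid on this system
// (libstdc++ throws std::runtime_error, e.g. "locale::facet::_S_create_c_locale
// name not valid"), fall back to the classic "C" locale instead of aborting.
std::locale locale_or_classic(const char* name) {
    assert(name != nullptr);
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// "" asks for the user's preferred locale from the environment, the case
// that failed in the transcript above.
std::locale user_locale_or_classic() {
    return locale_or_classic("");
}
```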