Topic: New demonstration CPU miner available - page 6. (Read 386323 times)

-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 12, 2011, 06:42:39 PM
Now that I've got a meaningful total throughput counter, I can confirm that running number of threads == number of logical processors on i7 is actually faster than even carefully bound number of threads == number of physical cores. This means that the default behaviour of minerd with my modifications which chooses how many threads to start up will give you the highest throughput.

My cumulative changes are here till jgarzik pulls them if anyone's interested:
https://github.com/ckolivas/cpuminer
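
For readers who want to try the "threads == logical processors" default themselves, here is a minimal sketch of one way to pick that number. It is not taken from the patch above: the helper name default_thread_count is made up for illustration, and it assumes a system where sysconf() supports _SC_NPROCESSORS_ONLN (glibc on Linux, the BSDs and macOS do; it is not in the POSIX standard).

```c
#include <unistd.h>

/*
 * Hypothetical helper (not from cpuminer): number of logical processors
 * currently online, which includes HyperThread siblings.
 * Falls back to 1 if the value cannot be determined.
 */
static int default_thread_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n < 1 ? 1 : (int)n;
}
```

The count includes HT siblings, so on an i7 with four cores and HyperThreading it would return 8, which matches the configuration reported as fastest above.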
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 11, 2011, 11:22:25 AM
As a user I would expect that total Mhash/s were the sum of the khash/s of all threads—in your example it would be always around 12 Mhash/s (since each thread works at a consistent pace of nearly 3000 khash/s). That's not the case, so I guess the algorithm is different.

That was the miner just starting up. After a while it converges more and more.
legendary
Activity: 1974
Merit: 1030
June 11, 2011, 11:20:36 AM
As a user I would expect that total Mhash/s were the sum of the khash/s of all threads—in your example it would be always around 12 Mhash/s (since each thread works at a consistent pace of nearly 3000 khash/s). That's not the case, so I guess the algorithm is different.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 11, 2011, 11:06:32 AM
Hmm perhaps "solved" isn't quite the right word there for accepted blocks.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 11, 2011, 10:41:35 AM
Hi Jeff, et al.

I've made some modifications to the output to generate a total throughput counter since there was confusion with the multiple threads issue, cleaned up the output a little, and added a solution counter. I also dropped a lot of output when only one thread is in use. Please pull the changes into your tree if you agree with the changes.

It now looks like this:

[2011-06-12 01:37:26] [Total: 8.40 Mhash/sec] [thread 3: 109989796 hashes, 3075 khash/sec] [Solved: 0]
[2011-06-12 01:37:26] PROOF OF WORK RESULT: true (yay!!!)
[2011-06-12 01:37:47] [Total: 8.45 Mhash/sec] [thread 0: 183024176 hashes, 3090 khash/sec] [Solved: 1]
[2011-06-12 01:37:48] [Total: 9.89 Mhash/sec] [thread 1: 183024176 hashes, 3085 khash/sec] [Solved: 1]
[2011-06-12 01:37:48] [Total: 11.31 Mhash/sec] [thread 2: 183024176 hashes, 3082 khash/sec] [Solved: 1]
[2011-06-12 01:38:27] [Total: 9.72 Mhash/sec] [thread 3: 183316328 hashes, 3019 khash/sec] [Solved: 1]
[2011-06-12 01:38:50] [Total: 9.54 Mhash/sec] [thread 0: 186126280 hashes, 2969 khash/sec] [Solved: 1]
[2011-06-12 01:38:50] [Total: 10.52 Mhash/sec] [thread 1: 186126280 hashes, 2989 khash/sec] [Solved: 1]
[2011-06-12 01:38:51] [Total: 11.50 Mhash/sec] [thread 2: 186126280 hashes, 3007 khash/sec] [Solved: 1]

Thanks.
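
As a rough check on the numbers in that output, the steady-state total should approach the sum of the per-thread rates. The sketch below only shows that arithmetic; it is not how the patch computes its total, and the function name is invented for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: add up per-thread rates given in khash/sec
 * and return the combined rate in Mhash/sec.
 */
static double total_mhash_per_sec(const double *thread_khash, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += thread_khash[i];

    return sum / 1000.0; /* 1 Mhash = 1000 khash */
}
```

With four threads at roughly 3000 khash/sec each this gives about 12 Mhash/sec, which is the value the reported total (8.40 rising to 11.50 in the sample above) moves toward as the miner keeps running.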
newbie
Activity: 51
Merit: 0
June 11, 2011, 01:24:28 AM
I'm now discovering a different issue.

minerd.exe --algo cryptopp_asm32 --s 2 --url http://btcguild.com/ --userpass xxxx:xxx

This runs when I try it on deepbit, a local miner and a few others.

However, on btcguild I get the following error:

[2011-06-12 10:02:16] 1 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2011-06-12 10:02:20] JSON decode failed(1): '[' or '{' expected near '<'
[2011-06-12 10:02:20] json_rpc_call failed, retry after 30 seconds


It's only happening with btcguild though, not with any of the other mining pools I tested.

Has anyone come across this before?

Win7
Intel Dual Core
Nvidia GTX470OC
full member
Activity: 373
Merit: 100
June 10, 2011, 12:22:46 PM
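You need to redirect stderr; a plain ">" only redirects stdout. According to http://www.techtalkz.com/windows-xp/27452-redirect-stdout-stderr-windows-shell.html, "2>" will do that (for example, 2> miner.log at the end of the command line, where the file name is just an example).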
newbie
Activity: 51
Merit: 0
June 10, 2011, 11:28:05 AM
Is there a working flag for outputting to a log file when using the Windows binaries?

I've tried
--f
-f
>

with a full path, with just a file name, with an extension, without an extension, with "" "" and also without.

I think I've worn my fingers out doing this.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 10, 2011, 07:45:13 AM
Bouncing work from one CPU to another will decrease throughput a fair amount. The cost of that should not be discounted. Anyway feel free to try...
sr. member
Activity: 378
Merit: 250
June 10, 2011, 07:40:01 AM
I believe there is one i7 processor out there that doesn't have a shared cache.  Each core on it has its own dedicated 2M cache.  However, if you could detect the shared cache, you could also detect the unshared cache and end up with what I suggested above that I had no idea how to start coding for a similar reason.  But anyhow, if you're worried about the kernels having sys information in different locations, it's a simple if-else-else statement that you'll be using to find or not find it.
But, as I suggested, using the two cores sharing the same cache, you can have each core perform different portions of the same work.  While one has completed half of the equations, you can send the work off to the next core for completion and get the next work.  With each core doing a specific task, you can simplify the code for unrolled loops and pass fewer instructions to the processor I believe.  Fewer instructions generally means less overhead.
So a part of the problem is the first getwork.  Half as many threads will be running until half of the work is completed and passed to the next core.  I don't know a way around it, since I would think the SHA equation can only be done according to the order of operations (parentheses, exponents, multiply/divide...).  Actually, now that I think about it, if a matrix calculation comes into play, that would have its own unique optimizations...eep digressing!  But yeah, how to split the cores to keep them from doing redundant work is the biggest issue here.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 09, 2011, 04:21:46 PM
Actually that's not quite true. There was one more complex setup that produced higher throughput. If I set the number of mining threads to the total number of cores only, and then bound each worker thread to all the logical CPUs that shared cache, the throughput was a bit better again. However, which cores share threads is even harder to detect reliably without digging around in /sys, and the format has changed between kernels. Furthermore, on my i7, the logical CPUs that share a cache aren't even numbered sequentially, so binding threads to sequential logical CPUs was worse (i.e. CPUs 0 and 2 shared a cache, as did 1 and 3, and so on).
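
To illustrate the "digging around in /sys" part: recent Linux kernels expose, per cache level, a shared_cpu_list file under /sys/devices/system/cpu/cpuN/cache/indexM/ containing a list such as "0,2" or "0-3". As noted above, the layout has not been stable across kernels, so treat the path as an assumption. The sketch below only covers the portable piece, parsing such a list into CPU numbers; parse_cpu_list is a made-up name, not something from cpuminer.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a CPU list such as "0,2" or "0-3,8"
 * (optionally ending in a newline) into out[].
 * Returns the number of CPUs written, or -1 on malformed input
 * or if more than max entries would be needed.
 */
static int parse_cpu_list(const char *s, int *out, int max)
{
    int count = 0;

    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi;

        if (end == s || lo < 0)
            return -1;
        hi = lo;

        if (*end == '-') {              /* a range like "0-3" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }

        for (long c = lo; c <= hi; c++) {
            if (count >= max)
                return -1;
            out[count++] = (int)c;
        }

        if (*end == ',')
            end++;                      /* move on to the next entry */
        else if (*end != '\0' && *end != '\n')
            return -1;
        s = end;
    }
    return count;
}
```

For the i7 described above, reading the file for CPU 0 and parsing "0,2" would yield the pair {0, 2}, which is the set a worker thread would then be bound to.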
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 09, 2011, 04:00:01 PM
Actually Con's patch is rather simplified -- you want the number of cores, not the total number of processors (which might include HyperThread siblings).

If you use all cores + HT, then your hash performance is slower than cores alone.



I tested total threads vs total cores and got slightly more with total threads on i7. Plus there is no particularly easy and reliable way to detect cores versus threads. So total "processors" actually generated more in my testing.
member
Activity: 98
Merit: 13
June 09, 2011, 01:46:44 PM
Actually Con's patch is rather simplified -- you want the number of cores, not the total number of processors (which might include HyperThread siblings).

If you use all cores + HT, then your hash performance is slower than cores alone.

full member
Activity: 373
Merit: 100
June 09, 2011, 01:16:47 PM
I agree with your changes.  I think the default should be the maximum number of CPUs on board.  The default wasn't the SSE2_64?  Hmm, good thing I set that then.
But I don't think I've ever seen Via padlock not crash the program on start.  Does it actually have a use?  If no, it could be edited out until then.
I only changed the usage text, no "proper" code. SSE2_64 is default on 64bit Linux since the last commit, therefore the usage text was wrong, so I wrote a patch to amend that.
sr. member
Activity: 378
Merit: 250
June 09, 2011, 12:58:58 PM
I agree with your changes.  I think the default should be the maximum number of CPUs on board.  The default wasn't the SSE2_64?  Hmm, good thing I set that then.
But I don't think I've ever seen Via padlock not crash the program on start.  Does it actually have a use?  If no, it could be edited out until then.
full member
Activity: 373
Merit: 100
June 09, 2011, 12:45:07 PM
The new changes seem to work rather well on my 64bit linux. However, since some of the usage texts are now incorrect, I'd like to suggest the following patch:
Code:
--- cpuminer-git/cpu-miner.c    2011-06-09 16:58:43.137777002 +0200
+++ cpuminer_build/cpu-miner.c  2011-06-09 19:25:52.087777001 +0200
@@ -140,24 +140,28 @@
          "(-h) Display this help text" },
 
        { "config FILE",
-         "(-c FILE) JSON-format configuration file (default: none)\n"
+         "(-c FILE) JSON-format configuration file (default: none)\n\t"
          "See example-cfg.json for an example configuration." },
 
        { "algo XXX",
          "(-a XXX) Specify sha256 implementation:\n"
-         "\tc\t\tLinux kernel sha256, implemented in C (default)"
+#ifdef WANT_X8664_SSE2
+         "\t\tc\t\tLinux kernel sha256, implemented in C"
+#else
+         "\t\tc\t\tLinux kernel sha256, implemented in C (default)"
+#endif
 #ifdef WANT_SSE2_4WAY
-         "\n\t4way\t\ttcatm's 4-way SSE2 implementation"
+         "\n\t\t4way\t\ttcatm's 4-way SSE2 implementation"
 #endif
 #ifdef WANT_VIA_PADLOCK
-         "\n\tvia\t\tVIA padlock implementation"
+         "\n\t\tvia\t\tVIA padlock implementation"
 #endif
-         "\n\tcryptopp\tCrypto++ C/C++ implementation"
+         "\n\t\tcryptopp\tCrypto++ C/C++ implementation"
 #ifdef WANT_CRYPTOPP_ASM32
-         "\n\tcryptopp_asm32\tCrypto++ 32-bit assembler implementation"
+         "\n\t\tcryptopp_asm32\tCrypto++ 32-bit assembler implementation"
 #endif
 #ifdef WANT_X8664_SSE2
-         "\n\tsse2_64\t\tSSE2 implementation for x86_64 machines"
+         "\n\t\tsse2_64\t\tSSE2 implementation for x86_64 machines (default)"
 #endif
          },
 
@@ -191,7 +195,11 @@
 #endif
 
        { "threads N",
+#ifdef WIN32
          "(-t N) Number of miner threads (default: 1)" },
+#else
+         "(-t N) Number of miner threads (default: #available cpus)" },
+#endif
 
        { "url URL",
          "URL for bitcoin JSON-RPC server "
@@ -753,7 +761,7 @@
                struct option_help *h;
 
                h = &options_help[i];
-               printf("--%s\n%s\n\n", h->name, h->helptext);
+               printf("--%s\n\t%s\n\n", h->name, h->helptext);
        }
 
        exit(1);

Summary of changes:
  • tabs before description items to improve readability
  • the default algorithm is now marked correctly
  • on Linux, it now says "default: #available cpus" instead of just "default: 1" (if anyone knows how to insert the actual number here, be my guest)
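
On the last point, one possible way to show the actual number is to build that help line at runtime instead of keeping it as a string literal. This is only a sketch under assumptions: format_threads_help is a hypothetical helper, ncpus is assumed to come from wherever the miner already decides its default thread count, and the options_help table in the patch would need a writable buffer for this entry.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the "threads" help line with the detected
 * CPU count filled in. Returns the snprintf() result, i.e. the length
 * the full text needs (excluding the terminating NUL).
 */
static int format_threads_help(char *buf, size_t len, int ncpus)
{
    if (ncpus < 1)
        ncpus = 1; /* never advertise fewer than one thread */

    return snprintf(buf, len,
                    "(-t N) Number of miner threads (default: %d)", ncpus);
}
```

Called with a 128-byte buffer and ncpus set to 8, this produces "(-t N) Number of miner threads (default: 8)".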
sr. member
Activity: 378
Merit: 250
June 09, 2011, 11:35:37 AM
Actually, yes it was.  However, I noticed very little difference in the speeds of the two.  But kudos!  It compiles and runs.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 09, 2011, 09:26:00 AM
Is this with the version I posted? The CPU affinity that I added can have quite strong effects on certain hardware and kernels (comparing different linux kernels and windows especially) if the kernels aren't particularly good at soft affinity effects. Having spent most of my spare time coding CPU schedulers I obsess about these things...
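
For anyone curious what hard CPU affinity looks like in code, here is a minimal Linux-only sketch using sched_setaffinity() from glibc. It is not the code from the version discussed here; bind_to_cpus is an invented name, and the CPU numbers passed in are whatever the caller has decided on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc needs this for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: restrict the calling thread to the given
 * logical CPUs. Returns 0 on success, -1 on failure.
 */
static int bind_to_cpus(const int *cpus, size_t n)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            return -1;
        CPU_SET(cpus[i], &set);
    }

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each worker thread would call this once at startup with its own CPU list; whether that helps or hurts depends on the hardware and kernel, as described above.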
sr. member
Activity: 378
Merit: 250
June 09, 2011, 09:19:37 AM
Oh, I meant self-compiled vs. precompiled.  The repository version is 2.0.1 as of now under synaptic (when you add the repository to the list).  So the problem is either with 2.0.1 or updating the code to allow for 2.0.1 to be used.  It looks like jansson was wanting some extra commands that weren't required previously or just had a different syntax.  But it's unimportant since a compatible version is included with cpuminer anyway.
On a semi-related note, I'm achieving an entire Mhash/sec higher than I was in Windows using bitcoinminer which appeared to have the fastest rate on a Winbox via CPU usage at the time.  I don't know what effects including -march=core2 -mmmx -msse -msse3 -mssse3 and -fomit-frame-pointer has had on my compilation, but the program is working flawlessly so far.  I'll try a comparison to see if there's any difference in hashes/sec with different options outside of -march=native/core2.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 09, 2011, 08:57:35 AM
Yeah, when I said your own version, I hadn't really envisioned that you'd programmed your own jansson library ;)

Glad to see you got it going. Now see how you go :)