
Topic: [XPM] Working on a GPU miner for Primecoin, new thread :) - page 13.

member
Activity: 104
Merit: 10
The issue isn't with jansson; it's with libblkmaker. It seems you have the normal (i.e. non-primecoin) version. You have to download libblkmaker from https://dl.dropboxusercontent.com/u/55025350/bitcoin-libblkmaker.zip. It is the libblkmaker prime branch, with a couple of primecoin-specific things added. Those auxdata parameters are important; without them the miner won't work. Smiley
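
For comparison, here's the submit call as it appears in reaperprime's App.cpp (the same lines show up in the patch quoted further down the page) next to the stock-libblkmaker form. This is just a sketch of the signature difference, not a drop-in snippet:

Code:
// prime-branch libblkmaker: the last two arguments carry the primecoin
// auxiliary data along with the nonce
json_t* readyblock = blkmk_submit_jansson(tmpl, &w.data[0], w.dataid, NONCE,
                                          &w.auxdata[0], w.auxdata.size());

// stock libblkmaker has no auxdata parameters, so a block submitted this
// way is missing the primecoin-specific data:
// json_t* readyblock = blkmk_submit_jansson(tmpl, &w.data[0], w.dataid, NONCE);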
newbie
Activity: 23
Merit: 0
I didn't have jansson installed when I built reaperprime, so I downloaded the newest one (v2.4).  But when I used that one, reaperprime ran into a compilation error.  Maybe the API changed?

Here's the patch.  It compiles and runs.  Fingers crossed that it still works -- I have NO idea Smiley

Code:
--- App.cpp.orig 2013-09-08 09:53:17.022105436 -0500
+++ App.cpp 2013-09-08 09:53:17.126105439 -0500
@@ -98,7 +98,7 @@
  blktemplate_t* tmpl = app.templates[w.templateid];
  uint NONCE = EndianSwap(*(uint*)&w.data[76]);
 
- json_t* readyblock = blkmk_submit_jansson(tmpl, &w.data[0], w.dataid, NONCE, &w.auxdata[0], w.auxdata.size());
+ json_t* readyblock = blkmk_submit_jansson(tmpl, &w.data[0], w.dataid, NONCE);
  char *s = json_dumps(readyblock, JSON_INDENT(2));
  str = s;
  free(s);
hero member
Activity: 812
Merit: 1000
Interesting that it can work on Nvidia...seems like something for Titan/780/580 owners to do.

Nice Smiley
hero member
Activity: 517
Merit: 501
If you have a newer Nvidia card (with a "compute capability version" < 2.0 according to http://en.wikipedia.org/wiki/CUDA#Supported_GPUs ), try to set worksize 512 and see what this gives you.

Do you mean less than or more than?  The way you wrote it it reads "less than 2.0", but that doesn't really make any sense in the context.

Edit: also the code you pasted seems to cut off on the right hand side on a couple of the lines.

Err, you are right of course. I meant >= 2.0. I corrected the original post, also regarding the cut-off lines.
member
Activity: 104
Merit: 10
BTW, Maybe it's time to put the code on GitHub... mtrlt?
Yeah, the week is almost up.
sr. member
Activity: 363
Merit: 250
If you have a newer Nvidia card (with a "compute capability version" < 2.0 according to http://en.wikipedia.org/wiki/CUDA#Supported_GPUs ), try to set worksize 512 and see what this gives you.

Do you mean less than or more than?  The way you wrote it it reads "less than 2.0", but that doesn't really make any sense in the context.

Edit: also the code you pasted seems to cut off on the right hand side on a couple of the lines.
hero member
Activity: 517
Merit: 501
BTW, Maybe it's time to put the code on GitHub... mtrlt?
hero member
Activity: 517
Merit: 501
Here's the patch to make it work with OpenCL 1.1 (and therefore Nvidia cards).

Replace function OpenCL::WriteBufferPattern in file AppOpenCL.cpp with the following code:

Code:
void OpenCL::WriteBufferPattern(uint device_num, string buffername, size_t data_length, void* pattern, size_t pattern_length)
{
    _clState& GPUstate = GPUstates[device_num];
    if (GPUstate.buffers[buffername] == NULL)
        cout << "Buffer " << buffername << " not found on GPU #" << device_num << endl;
#ifdef CL_VERSION_1_2
    cl_int status = clEnqueueFillBuffer(GPUstate.commandQueue, GPUstate.buffers[buffername], pattern, pattern_length, 0, data_length, 0, NULL, NULL);
#else
    // OpenCL 1.1 fallback: replicate the pattern on the host, then write it in one go.
    // Heap-allocated (needs <vector>) because data_length can be several MB, and the
    // counter is size_t so it can't overflow for large buffers like vPrimes.
    std::vector<uint8_t> buffer(data_length);
    for (size_t i = 0; i < data_length / pattern_length; i++)
        memcpy(&buffer[i*pattern_length], pattern, pattern_length);
    cl_int status = clEnqueueWriteBuffer(GPUstate.commandQueue, GPUstate.buffers[buffername], CL_TRUE, 0, data_length, &buffer[0], 0, NULL, NULL);
#endif
    if (globalconfs.coin.config.GetValue("opencldebug"))
        cout << "Write buffer pattern " << buffername << ", " << pattern_length << " bytes. Status: " << status << endl;
}

This runs for me, but I am getting
Code:
0 fermats/s, 0 gandalfs/s.
0 TOTAL
most likely because my card is too old and I had to set worksize 64 in primecoin.conf.

If you have a newer Nvidia card (with a "compute capability version" >= 2.0 according to http://en.wikipedia.org/wiki/CUDA#Supported_GPUs ), try to set worksize 512 and see what this gives you.
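
For reference, the setting is just a plain key-value line in primecoin.conf; as far as I can tell it's the same style as the rest of reaper's config, with 512 being the value suggested above:

Code:
worksize 512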
legendary
Activity: 1764
Merit: 1000
Many people around have had trouble getting reaper to work correctly (especially the primecoin-based fork) and are getting errors. Others also want to log their reaper output so they can analyze it, or to filter out the messages that spam the console ("GPU stuff"). To that end, I made a quick-and-dirty little program which runs in Java and works with the Windows version of reaper, letting you filter reaper's output and capture logs of it. It also lets you quickly combine your reaper.conf and primecoin.conf files into your output log.

http://www.theopeneffect.com/reaperreader.jar

Published under creative commons.

Thanks for sharing this program Vorsholk
legendary
Activity: 1764
Merit: 1000
Hope to see an update on the appcrash issue Wink
legendary
Activity: 1713
Merit: 1029
Many people around have had trouble getting reaper to work correctly (especially the primecoin-based fork) and are getting errors. Others also want to log their reaper output so they can analyze it, or to filter out the messages that spam the console ("GPU stuff"). To that end, I made a quick-and-dirty little program which runs in Java and works with the Windows version of reaper, letting you filter reaper's output and capture logs of it. It also lets you quickly combine your reaper.conf and primecoin.conf files into your output log.

http://www.theopeneffect.com/reaperreader.jar

Published under creative commons.
hero member
Activity: 812
Merit: 1000
Tested the beta2 on my Windows 7 SP1 64-bit (Pro) PC and it still behaves the same way: it crashes after a minute of running, and changing settings only prolongs the agony, so to speak. Sad
sr. member
Activity: 406
Merit: 250
Any word on getting this to work on NVIDIA cards? From what I understand it's because the nvidia cards don't support opencl 1.2 (yet?). Any potential workarounds on windows or linux?

NVIDIA is poor at doing anything GPGPU.

Wha..?!  No way!  NVIDIA has a huge advantage over AMD in many aspects.  Just look at how well their software works compared w/AMD's.  You still need an X server running to do computation with AMD GPUs and that totally blows.

NVIDIA made a poor (IMO) strategic decision by abandoning OCL but you still have to give them the credit for creating it!  I think they were afraid to abandon their early adopter CUDA customers and decided they didn't have the throughput to support both.

I think eventually they'll reverse their position on OCL.  But to a lot of folks doing GPGPU they don't care about OCL and they're using CUDA and loving it.  So it's not fair to say "NVIDIA is poor at doing anything GPGPU" IMO.

The OpenCL trademarks belong to Apple. I don't think Nvidia made OpenCL.

They might be good at GPGPU, but only on the GPUs that specialize in it, i.e. their Tesla series. The consumer GPUs they make aren't as good... but they are also the vast majority.

Idk.

All I know is that the GPGPU software I've seen out there runs tons faster on ATI cards than it does on NVIDIA cards.
CUDA would work very well for this type of computing. Over on Mersenne.org, they have had CUDA-based programs to run the Lucas-Lehmer tests for quite some time now, while the OpenCL crowd has barely gotten one functioning, and at nowhere near the speed of CUDA.

In trial factoring work, a GTX 590 using CUDA puts out 681.6 GHz-days of work per day compared to a 7990 using OpenCL putting out 748.7. On SHA-256 the 590 is at ~190 to the 7990's 1200+. Porting the OpenCL code to CUDA will not be an easy task, but I'd bet the result would surprise you.

It's not much of a surprise. I realize how different architectures can specialize in different types of tasks and have significant advantages with them. I was just going by the little information I have seen about it, which I guess did not fully explain the situation.
hero member
Activity: 532
Merit: 500
Any word on getting this to work on NVIDIA cards? From what I understand it's because the nvidia cards don't support opencl 1.2 (yet?). Any potential workarounds on windows or linux?

NVIDIA is poor at doing anything GPGPU.

Wha..?!  No way!  NVIDIA has a huge advantage over AMD in many aspects.  Just look at how well their software works compared w/AMD's.  You still need an X server running to do computation with AMD GPUs and that totally blows.

NVIDIA made a poor (IMO) strategic decision by abandoning OCL but you still have to give them the credit for creating it!  I think they were afraid to abandon their early adopter CUDA customers and decided they didn't have the throughput to support both.

I think eventually they'll reverse their position on OCL.  But to a lot of folks doing GPGPU they don't care about OCL and they're using CUDA and loving it.  So it's not fair to say "NVIDIA is poor at doing anything GPGPU" IMO.

The OpenCL trademarks belong to Apple. I don't think Nvidia made OpenCL.

They might be good at GPGPU, but only on the GPUs that specialize in it, i.e. their Tesla series. The consumer GPUs they make aren't as good... but they are also the vast majority.

Idk.

All I know is that the GPGPU software I've seen out there runs tons faster on ATI cards than it does on NVIDIA cards.
CUDA would work very well for this type of computing. Over on Mersenne.org, they have had CUDA-based programs to run the Lucas-Lehmer tests for quite some time now, while the OpenCL crowd has barely gotten one functioning, and at nowhere near the speed of CUDA.

In trial factoring work, a GTX 590 using CUDA puts out 681.6 GHz-days of work per day compared to a 7990 using OpenCL putting out 748.7. On SHA-256 the 590 is at ~190 to the 7990's 1200+. Porting the OpenCL code to CUDA will not be an easy task, but I'd bet the result would surprise you.
hero member
Activity: 517
Merit: 501
I then ran into another weird problem when compiling the kernels

For the record, here's the error:

Code:
Write buffer vPrimes, 6302644 bytes. Status: 0
Compiling kernel... this could take up to 2 minutes.
ptxas error   : Entry function 'CalculateMultipliers' uses too much shared data (0x5078 bytes + 0x10 bytes system, 0x4000 max)

What GPU? It seems it only has 16 kilobytes of local memory, whereas I've programmed the miner with the assumption of 32 kilobytes, which is what ~all AMD GPUs have.


It's a NVIDIA Corporation GT215 [GeForce GT 240]. It's a few years old, so might not be the best choice. Just happens the only one I can easily test on.

It seems that Nvidia cards with a "compute capability version" < 2.0 have only 16KB of local memory (what CUDA calls shared memory); cards at 2.0 and above have 48KB. See http://en.wikipedia.org/wiki/CUDA#Supported_GPUs for a list of which GPU has which compute capability version.
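
If you're unsure what your card actually provides, you can query it from OpenCL directly. Here's a minimal sketch (the function and variable names are my own, not reaper's) using clGetDeviceInfo:

Code:
#include <CL/cl.h>
#include <iostream>

// Print the local (shared) memory size and max work-group size of a device.
// 'device' is assumed to come from the usual clGetDeviceIDs enumeration.
void PrintDeviceLimits(cl_device_id device)
{
    cl_ulong localmem = 0;
    size_t maxworkgroup = 0;
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(localmem), &localmem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(maxworkgroup), &maxworkgroup, NULL);
    std::cout << "Local memory: " << (localmem / 1024) << " KB, max work-group size: "
              << maxworkgroup << std::endl;
}

A card that reports 16 KB here matches the 0x4000-byte limit in the ptxas error above.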
member
Activity: 104
Merit: 10
I then ran into another weird problem when compiling the kernels

For the record, here's the error:

Code:
Write buffer vPrimes, 6302644 bytes. Status: 0
Compiling kernel... this could take up to 2 minutes.
ptxas error   : Entry function 'CalculateMultipliers' uses too much shared data (0x5078 bytes + 0x10 bytes system, 0x4000 max)

What GPU? It seems it only has 16 kilobytes of local memory, whereas I've programmed the miner with the assumption of 32 kilobytes, which is what ~all AMD GPUs have.
hero member
Activity: 517
Merit: 501
I then ran into another weird problem when compiling the kernels

For the record, here's the error:

Code:
Write buffer vPrimes, 6302644 bytes. Status: 0
Compiling kernel... this could take up to 2 minutes.
ptxas error   : Entry function 'CalculateMultipliers' uses too much shared data (0x5078 bytes + 0x10 bytes system, 0x4000 max)
hero member
Activity: 517
Merit: 501
Any word on getting this to work on NVIDIA cards? From what I understand it's because the nvidia cards don't support opencl 1.2 (yet?). Any potential workarounds on windows or linux?

I hacked up a solution for the clEnqueueFillBuffer problem (which seems to be the only function Sunny used from OpenCL 1.2; the rest is 1.1 and thus well supported by Nvidia). I then ran into another weird problem when compiling the kernels, at which point I decided it's too much work, because a) I don't know anything about OpenCL and b) I don't even want to mine. Cheesy
sr. member
Activity: 406
Merit: 250
Wha..?!  No way!  NVIDIA has a huge advantage over AMD in many aspects.  Just look at how well their software works compared w/AMD's.  You still need an X server running to do computation with AMD GPUs and that totally blows.

NVIDIA made a poor (IMO) strategic decision by abandoning OCL but you still have to give them the credit for creating it!  I think they were afraid to abandon their early adopter CUDA customers and decided they didn't have the throughput to support both.

I think eventually they'll reverse their position on OCL.  But to a lot of folks doing GPGPU they don't care about OCL and they're using CUDA and loving it.  So it's not fair to say "NVIDIA is poor at doing anything GPGPU" IMO.

The OpenCL trademarks belong to Apple. I don't think Nvidia made OpenCL.

They might be good at GPGPU, but only on the GPUs that specialize in it, i.e. their Tesla series. The consumer GPUs they make aren't as good... but they are also the vast majority.

Idk.

All I know is that the GPGPU software I've seen out there runs tons faster on ATI cards than it does on NVIDIA cards.

Yeah, Apple owns the trademarks because they're the ones who brought everyone to the table. Apple loved CUDA but isn't dumb enough to sole-source any of their parts. So they told NVIDIA and ATI that they should all play nice and standardize CUDA. OpenCL was the result. It's only barely different from CUDA; the biggest differences are primarily in making CUDA fit a programming model similar to the shaders already used in OpenGL. NVIDIA wanted to win a contract with Apple and they had a huge head start on the competition. AMD's Brook and CAL/IL were mostly a flop, so they would happily jump on board with a Khronos standard.

If you look just at hashing (and now prime number computation), you're missing a much bigger part of the GPGPU marketplace.  Most of the GPGPU customers (in terms of units purchased) are running floating point computations of enormous matrices and using the interpolation hardware.  They're used in scientific applications, Oil&Gas, Medical stuff, etc.  In those applications, NVIDIA does very well, often better than AMD.

Any sources?

There are very few applications of GPGPU out there, and the few I have seen seem to indicate that ATI performs better, but I'm not sure. Especially at floating-point math. So I heard.
hero member
Activity: 812
Merit: 1000
Thanks, didn't know how that started Cheesy