Topic: Vanitygen: Vanity bitcoin address generator/miner [v0.22] - page 179. (Read 1152885 times)

legendary
Activity: 1680
Merit: 1035
Safe mode does the same thing (never finds anything)
Running oclvanitygen -d0 1 finds something instantly
Running oclvanitygen -d0 -vkr -w128 "^1B" returns
Quote
Device: Cypress
Vendor: Advanced Micro Devices, Inc. (1002)
Driver: CAL 1.4.1417 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)
Max compute units: 14
Max workgroup size: 256
Global memory: 838860800
Max allocation: 209715200
OpenCL compiler flags: -DDEEP_PREPROC_UNROLL -DVERY_EXPENSIVE_BRANCHES -DDEEP_VLIW -DAMD_BFI_INT
Loading kernel binary ce56d53554537b697380a368231a1646.oclbin
Grid size: 896x512
Modular inverse: 3584 threads, 128 ops each
WARNING: Using CPU pattern matcher
GPU idle: 99.34%
GPU idle: 99.75%

followed by more GPU idle lines. Not sure if that CPU matcher is right. Maybe that -r isn't working for me?
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
Though I'm noticing for simple keys, like 1pop for example, vanitygen64 finds a key almost instantly, while oclvanitygen seems to flash estimated time remaining, and then sit there, just counting up the total. What's preventing it from finding the key I wonder? Is it only useful for more difficult keys?

Oh no, it definitely shouldn't be doing this.

Ideally it would have a comprehensive test suite that you could run and validate that the OpenCL kernel is running correctly.  Today, there is something less comprehensive.  You can try:

oclvanitygen -d0 -vkr -w128 "^1B"

It should scroll a list of addresses fairly quickly.  If any of them don't start with 1B, then it's producing incorrect addresses on the GPU.  If this test passes, something is wrong with the GPU prefix checker.
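As a rough way to automate that check, here is a minimal Python sketch. It assumes the output was captured as text and that each result appears on a line beginning with "Address:". That line format and the function name are assumptions for illustration, not something confirmed in this thread.

```python
def find_bad_addresses(lines, prefix="1B", label="Address:"):
    """Return every reported address that does not start with `prefix`.

    `label` is an assumed marker for result lines; adjust it to match
    what the tool actually prints.
    """
    bad = []
    for line in lines:
        line = line.strip()
        if not line.startswith(label):
            continue  # skip device info, "GPU idle" lines, and so on
        address = line[len(label):].strip()
        if not address.startswith(prefix):
            bad.append(address)
    return bad


# Made-up sample lines, not real addresses.
sample = [
    "GPU idle: 99.34%",
    "Address: 1BexampleOnlyNotARealAddress",
    "Address: 1CexampleOnlyNotARealAddress",
]
print(find_bad_addresses(sample))
```

An empty list means every captured address carried the expected prefix.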

Edit: Does it also fail to find any matches to 1Pop in safe mode?
legendary
Activity: 1680
Merit: 1035
New version posted.  This includes a lot of changes.

  • Oclvanitygen #pragma unroll has been removed, for Rassah.  If you experienced a crash with a Radeon card, but safe mode (-S) worked, this should fix the problem.


Runs great without crashing now! Thanks!
Though I'm noticing for simple keys, like 1pop for example, vanitygen64 finds a key almost instantly, while oclvanitygen seems to flash estimated time remaining, and then sit there, just counting up the total. What's preventing it from finding the key I wonder? Is it only useful for more difficult keys?
I ran oclvanitygen -d0 -v 1Pop and output is
Quote
Prefix difficulty:                77178 1Pop
Difficulty: 77178
Device: Cypress
Vendor: Advanced Micro Devices, Inc. (1002)
Driver: CAL 1.4.1417 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)
Max compute units: 14
Max workgroup size: 256
Global memory: 838860800
Max allocation: 209715200
OpenCL compiler flags: -DDEEP_PREPROC_UNROLL -DVERY_EXPENSIVE_BRANCHES -DDEEP_VLIW -DAMD_BFI_INT
Loading kernel binary ce56d53554537b697380a368231a1646.oclbin
Grid size: 1792x1024
Modular inverse: 3584 threads, 512 ops each
Using OpenCL prefix matcher
GPU idle: 11.30%
GPU idle: 1.78%
GPU idle: 1.26%
[19.42 Mkey/s][total 776208384]
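
For scale, a back-of-the-envelope sketch of why that total looks wrong. It assumes the reported difficulty is the expected number of keys per match, so each key matches with probability 1/difficulty. That reading of "Difficulty" is an assumption here, and the function names are made up for illustration.

```python
import math


def expected_matches(keys_tried, difficulty):
    # Average number of matches if each key hits with probability 1/difficulty.
    return keys_tried / difficulty


def chance_of_no_match(keys_tried, difficulty):
    # (1 - 1/d)^n, computed in log space so it stays stable for large n.
    return math.exp(keys_tried * math.log1p(-1.0 / difficulty))


keys_tried = 776208384  # "total" from the output above
difficulty = 77178      # "Difficulty" from the output above
print(round(expected_matches(keys_tried, difficulty)))
print(chance_of_no_match(keys_tried, difficulty))
```

Under that assumption roughly ten thousand matches would be expected by this point, so a run with none points to a fault rather than bad luck.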
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
One question though: Why do I get 2*0.9MKeys/s when throwing two threads of oclvanitygen at the same GPU, and only 1.6MKeys/s when running one thread?

Edit: Adding one more thread (for a total of 3) gives me 2M+, but 4 threads only gets about 1.6M.

Interesting!

If you run with -v, does it report high GPU idle numbers?

This could be an issue with poor multithreading in oclvanitygen.  Currently the CPU based work is done in one thread, the OpenCL work is dispatched to the GPU by a separate thread, and there is too much synchronization between the two, to the point of adding a lot of turnaround latency to the dispatcher.  It's not surprising that multiple instances improve it.  But I'm not sure why three instances would show even more improvement, unless Apple is doing something very interesting like allowing simultaneous NDRange jobs to run on the GPU at the same time.
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
x86: 800-850
x64: 1300-1350

That's an impressive amount of hardware you threw at it!  What CPU gets you 1300??


Maybe you should analyse the base58 key, and print the hex private key if the base58 key is not valid

This is a good idea, running the encoded private key back through the decoder for verification.  I'll try making an online assertion test out of it.  At the very least, it's going into the offline test suite.
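
A minimal sketch of what such a round-trip check could look like, in Python rather than the project's own C. It assumes the standard Base58Check layout for an uncompressed private key (version byte 0x80, 32-byte key, 4-byte double-SHA256 checksum). The function names and the "hex:" fallback format are made up for illustration and are not taken from vanitygen.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
VERSION = 0x80  # assumed version byte for a mainnet private key


def _checksum(payload):
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def encode_privkey(key32):
    """Base58Check-encode a 32-byte private key (uncompressed form)."""
    data = bytes([VERSION]) + key32
    data += _checksum(data)
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    return out


def decode_privkey(text):
    """Decode and validate; raise ValueError if anything is off."""
    if len(text) != 51 or not text.startswith("5"):
        raise ValueError("unexpected length or first character")
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)  # ValueError on a bad character
    if num >> (37 * 8):
        raise ValueError("value does not fit in 37 bytes")
    data = num.to_bytes(37, "big")  # version + key + checksum
    body, check = data[:-4], data[-4:]
    if body[0] != VERSION or _checksum(body) != check:
        raise ValueError("bad version byte or checksum")
    return body[1:]


def verified_encode(key32):
    """Encode, then decode again and compare before trusting the output."""
    text = encode_privkey(key32)
    try:
        ok = decode_privkey(text) == key32
    except ValueError:
        ok = False
    # Fall back to hex so the key is not lost to an encoder bug.
    return text if ok else "hex:" + key32.hex()


print(verified_encode(bytes(range(1, 33))))
```

The length and first-character test mirrors the warning in the release notes; the checksum comparison catches anything that test alone would miss.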


Unfortunately it's not friendly towards my MBP any longer :/ First time I tried oclvanitygen with default settings after having downloaded it I got a segfault after some error output about running out of CL resources. It was repeatable. I then tried to "fix it" by playing around with the parameters. -w256 made no difference, and -w256 -g256x512 caused oclvanitygen (the CL compiler I assume, the only thing that got displayed was the difficulty) to hang my graphics card completely. Had to hard reboot.

After the reboot, even the default parameters hung the computer. Got two error lines but it never reached the segfault. I've now hard rebooted again and sorry to say won't try to get you the debug output. Or is this where I should try the safe switch?

The meaning of -w changed to be a bit more intuitive.  It specifies the number of addresses to calculate per hardware thread in a work unit.  It constrains the grid size and saves you from having to manually tweak it.  Try -w1 and increase in powers of two.
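
The verbose output quoted elsewhere in this thread seems consistent with that description: in both runs the grid area equals the "Modular inverse" thread count times the "ops each" figure. A small sketch of that check follows. The function name is made up for illustration, and the relationship is only inferred from those two logs.

```python
def grid_matches(grid_w, grid_h, threads, ops_each):
    # True when the grid covers exactly threads * ops_each work items.
    return grid_w * grid_h == threads * ops_each


# Figures copied from the two -v logs in this thread.
print(grid_matches(896, 512, 3584, 128))    # run with -w128
print(grid_matches(1792, 1024, 3584, 512))  # run without -w
```

Both checks hold for the quoted logs.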

If it still doesn't work, you can get the old behavior with -t256 -g256x512.

You make a good point about safe mode.  Its only effect right now is to work around issues with optimizations, but given that the work unit problem caused your computer to crash, the work unit problem deserves equal attention and safe mode should also constrain it.
hero member
Activity: 530
Merit: 500
New version posted.  This includes a lot of changes.

  • Oclvanitygen work size autoconfiguration now sets more reasonable limits for smaller GPUs.
  • Numerous performance tweaks to oclvanitygen, including AMD BFI_INT instruction patching.


Unfortunately it's not friendly towards my MBP any longer :/ First time I tried oclvanitygen with default settings after having downloaded it I got a segfault after some error output about running out of CL resources. It was repeatable. I then tried to "fix it" by playing around with the parameters. -w256 made no difference, and -w256 -g256x512 caused oclvanitygen (the CL compiler I assume, the only thing that got displayed was the difficulty) to hang my graphics card completely. Had to hard reboot.

After the reboot, even the default parameters hung the computer. Got two error lines but it never reached the segfault. I've now hard rebooted again and sorry to say won't try to get you the debug output. Or is this where I should try the safe switch?

(0.16 still works as before)

full member
Activity: 126
Merit: 100
New version posted.  This includes a lot of changes.

  • Bugfix for private key encoder.  It was possible for it to output a shorter-than-expected private key.  If you have a private key that does not start with a 5, or is less than 51 characters long, don't use it.  If you have such a private key with a bitcoin balance, contact me for assistance.
  • Oclvanitygen #pragma unroll has been removed, for Rassah.  If you experienced a crash with a Radeon card, but safe mode (-S) worked, this should fix the problem.
  • Oclvanitygen work size autoconfiguration now sets more reasonable limits for smaller GPUs.
  • Numerous performance tweaks to oclvanitygen, including AMD BFI_INT instruction patching.
  • Windows binary package now includes x64 build: vanitygen64.exe.  This binary runs about 50% faster, if you have a 64-bit edition of Windows.

I never knew just how slow the 32-bit Windows build was until recently, and must apologize for not posting a 64-bit build sooner.  64-bit arithmetic makes a huge difference.  The Windows binaries for 0.17 are also built and linked with more MSVC optimizations enabled, and now include statically-linked OpenSSL.  Despite all of this, the x64 Windows build seems to run about 10% slower than an equivalent Linux build.
Upgrade to OS X Lion fixed my problems, getting 1.6MKeys/s now with my Radeon 6490M.

One question though: Why do I get 2*0.9MKeys/s when throwing two threads of oclvanitygen at the same GPU, and only 1.6MKeys/s when running one thread?

Edit: Adding one more thread (for a total of 3) gives me 2M+, but 4 threads only gets about 1.6M.
sr. member
Activity: 350
Merit: 251
6870
gpu 940mhz



23.5-23.7 Mkeys/s

Code:
@echo off
oclvanitygen.exe -ik -p 0 -d 0 -o ocl 1
pause
legendary
Activity: 1176
Merit: 1233
May Bitcoin be touched by his Noodly Appendage
  • Bugfix for private key encoder.  It was possible for it to output a shorter-than-expected private key.  If you have a private key that does not start with a 5, or is less than 51 characters long, don't use it.  If you have such a private key with a bitcoin balance, contact me for assistance.
Maybe you should analyse the base58 key, and print the hex private key if the base58 key is not valid
sr. member
Activity: 350
Merit: 251
x86: 800-850
x64: 1300-1350

Code:
@echo off
vanitygen64.exe -qik -o file3 1ctoon6
pause

pretty good stuff

are Mkeys/s 1024 based or just 1000

and are hashes the same way, always wondered

1090T
8gigs of ram
windows 7 ultimate x64
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
New version posted.  This includes a lot of changes.

  • Bugfix for private key encoder.  It was possible for it to output a shorter-than-expected private key.  If you have a private key that does not start with a 5, or is less than 51 characters long, don't use it.  If you have such a private key with a bitcoin balance, contact me for assistance.
  • Oclvanitygen #pragma unroll has been removed, for Rassah.  If you experienced a crash with a Radeon card, but safe mode (-S) worked, this should fix the problem.
  • Oclvanitygen work size autoconfiguration now sets more reasonable limits for smaller GPUs.
  • Numerous performance tweaks to oclvanitygen, including AMD BFI_INT instruction patching.
  • Edit: Meaning of oclvanitygen -w flag has changed: It now specifies number of work items per hardware thread per job, and constrains the grid size.  -w1 now specifies a smaller overall work size.  The previous effect can be done with -t.
  • Windows binary package now includes x64 build: vanitygen64.exe.  This binary runs about 50% faster, if you have a 64-bit edition of Windows.

I never knew just how slow the 32-bit Windows build was until recently, and must apologize for not posting a 64-bit build sooner.  64-bit arithmetic makes a huge difference.  The Windows binaries for 0.17 are also built and linked with more MSVC optimizations enabled, and now include statically-linked OpenSSL.  Despite all of this, the x64 Windows build seems to run about 10% slower than an equivalent Linux build.
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
It worked with the -S flag. Getting 3.12Mkey/s on my 900Mhz 5830. I hope I'm not losing much speed. Still says [50% in 1.8d] for the address I chose with Difficulty: 888446610538 (better than ~50 days that I was getting with my CPU)

That sounds about right for safe mode.  If you manage to get it to run with the optimizations turned on, with a 900 clock, you should be able to get >20Mkey/s out of it.
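
As a rough cross-check of those numbers, here is a sketch of the usual estimate. It assumes each key matches independently with probability 1/difficulty, so the 50% point falls at about ln(2) times the difficulty. This is not necessarily how the program computes its own estimate, and the function name is made up for illustration.

```python
import math


def days_to_half_chance(difficulty, keys_per_second):
    # Keys needed for a 50% chance is about ln(2) * difficulty.
    seconds = math.log(2) * difficulty / keys_per_second
    return seconds / 86400.0


difficulty = 888446610538
print(days_to_half_chance(difficulty, 3.12e6))  # safe mode rate reported above
print(days_to_half_chance(difficulty, 20e6))    # rate hoped for with optimizations
```

Under this model the 50% point is a little over two days at 3.12 Mkey/s, the same order as the [50% in 1.8d] reported, and under half a day at 20 Mkey/s.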

On the topic of crashing, I managed to get similar behavior to what you described, with a Win7 x64 VM.  Obviously not with a GPU.  But, attempting to compile any sort of OpenCL kernel with a #pragma unroll would cause a crash.  This affected both KernelAnalyzer and oclvanitygen on the CPU device.  It initially had Catalyst 11.6 and APP SDK 2.4.  Upgrading didn't fix it.  I didn't connect it to your problem until a couple days ago when it needed to be reinstalled.  Lo and behold, with a clean install of APP SDK 2.5, #pragma unroll no longer causes everything to crash!  Now, to get it back to the broken state.
legendary
Activity: 1680
Merit: 1035
Code:
Driver: CAL 1.4.1417 (VM)

Well that's embarrassing, Radeon 5830 is supposed to be one of the best tested cards!  Unfortunately I'm not sure how well tested it is with Catalyst 11.6 on either Windows or Linux, which would appear to be the driver you're using?

I'm going to look into this.

There are two things you can try:
  • Safe mode (-S flag), which disables loop unrolling optimizations, but also kills performance.
  • Different versions of the Catalyst driver.  My test environment is 11.5 on Linux.

It worked with the -S flag. Getting 3.12Mkey/s on my 900Mhz 5830. I hope I'm not losing much speed. Still says [50% in 1.8d] for the address I chose with Difficulty: 888446610538 (better than ~50 days that I was getting with my CPU)
sr. member
Activity: 350
Merit: 251
i put the pthreadVC2.lib file in C:\pthreads-win32 from making pthreads

Quote
the vanitygen make file is the original. i am very confused

Your pthreads dir certainly looks complete.  Try renaming c:\pthreads-win32 to c:\pthreads-w32.

If you're up for it, send me a /msg on freenode, same nickname.

that worked

now i get
Code:
LINK : fatal error LNK1181: cannot open input file 'C:\pthreads-w32\pthread.lib'

this is probably from me not knowing how to change it to use the other lib file.
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
i put the pthreadVC2.lib file in C:\pthreads-win32 from making pthreads

Quote
the vanitygen make file is the original. i am very confused

Your pthreads dir certainly looks complete.  Try renaming c:\pthreads-win32 to c:\pthreads-w32.

If you're up for it, send me a /msg on freenode, same nickname.
sr. member
Activity: 350
Merit: 251
i put the pthreadVC2.lib file in C:\pthreads-win32 from making pthreads and it complains about
Code:
vanitygen.c(24) : fatal error C1083: Cannot open include file: 'pthread.h': No such file or directory

Code:
 Directory of C:\pthreads-win32
12/19/2006  10:00 PM            15,359 FAQ
08/06/2011  04:46 AM              include
08/06/2011  04:46 AM              lib
12/21/2006  10:00 PM            89,608 libpthreadGC2.a
12/21/2006  10:00 PM            89,614 libpthreadGCE2.a
12/19/2006  10:00 PM                99 MAINTAINERS
12/21/2006  10:00 PM               132 md5.sum
12/21/2006  10:00 PM            38,603 NEWS
12/19/2006  10:00 PM               142 PROGRESS
12/19/2006  10:00 PM            43,162 pthread.h
12/21/2006  10:00 PM            60,273 pthreadGC2.dll
12/21/2006  10:00 PM           112,556 pthreadGCE2.dll
12/21/2006  10:00 PM            86,070 pthreadVC2.dll
08/06/2011  05:18 AM            75,986 pthreadVC2.lib
12/21/2006  10:00 PM            77,879 pthreadVCE2.dll
12/21/2006  10:00 PM            29,400 pthreadVCE2.lib
12/21/2006  10:00 PM            86,071 pthreadVSE2.dll
12/21/2006  10:00 PM            29,400 pthreadVSE2.lib
12/19/2006  10:00 PM             4,844 sched.h
12/19/2006  10:00 PM             4,429 semaphore.h
12/19/2006  10:00 PM             7,904 WinCE-PORT

the vanitygen make file is the original. i am very confused
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
to get it to build i renamed pthreadVSE2.lib to pthread.lib i think. it complained that it could not find it so i just picked that one lol.

Quote
i removed  /DPTW32_STATIC_LIB and it builds, but still requires the dll.

This part is over my head, but I think you definitely want to leave the /DPTW32_STATIC_LIB flag in the vanitygen CFLAGS, as it sets static linkage for the pthread functions.

I figured out how to build pthreads-win32 using their nmake makefile.  First, edit the CFLAGS in the pthreads-win32 Makefile and remove the /MD.  Then build the static lib:

nmake clean VC-static

Then, point the vanitygen makefile at pthreadVC2.lib, and it should build and link with static pthreads.
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
Code:
Error loading CL kernel: No such file or directory
Press any key to continue . . .

Ah, the calc_addrs.cl file needs to be in the working directory.  I'll change that error message.

Also, regarding those command line options, you're going to run into one of oclvanitygen's weak spots -- it won't be able to produce more than 1-2 addresses per second.  CPU vanitygen is much better at this.  I'll add a warning message.
sr. member
Activity: 350
Merit: 251
Code:
@echo off
oclvanitygen.exe -kqi -d 0 -o file 1
pause

Code:
Error loading CL kernel: No such file or directory
Press any key to continue . . .

i probably am not passing the proper parameters to oclvanitygen.exe

some more stuff

to get it to build i renamed pthreadVSE2.lib to pthread.lib i think. it complained that it could not find it so i just picked that one lol.

Code:
CC = cl
OPENSSL_DIR = C:\OpenSSL-Win32
PTHREADS_DIR = C:\pthreads-w32
PCRE_DIR = C:\pcre-7.9-src
OPENCL_DIR = "C:\Program Files (x86)\AMD APP"
OPENCL_INCLUDE = /I$(OPENCL_DIR)\include
OPENCL_LIBS = $(OPENCL_DIR)\lib\x86\OpenCL.lib
CFLAGS = /D_WIN32 /DPTW32_STATIC_LIB /DPCRE_STATIC /I$(OPENSSL_DIR)\include /I$(PTHREADS_DIR) /I$(PCRE_DIR)
LIBS = $(OPENSSL_DIR)\lib\libeay32.lib $(PTHREADS_DIR)\pthread.lib $(PCRE_DIR)\pcre.lib ws2_32.lib
OBJS = vanitygen.obj oclvanitygen.obj pattern.obj winglue.obj

all: vanitygen.exe

vanitygen.exe: vanitygen.obj pattern.obj winglue.obj
    link /nologo /out:$@ $** $(LIBS)

oclvanitygen.exe: oclvanitygen.obj pattern.obj winglue.obj
    link /nologo /out:$@ $** $(LIBS) $(OPENCL_LIBS)

.c.obj:
    $(CC) $(CFLAGS) /c /Tp$< /Fo$@

oclvanitygen.obj: oclvanitygen.c
    $(CC) $(CFLAGS) $(OPENCL_INCLUDE) /c /Tpoclvanitygen.c /Fo$@

clean:
    del vanitygen.exe $(OBJS)

Code:
en.obj pattern.obj winglue.obj C:\OpenSSL-Win32\lib\libeay32.lib C:\pthreads-w32\pthread.lib C:\pcre-7.9-src\pcre.lib ws2_32.lib
        link /nologo /out:oclvanitygen.exe oclvanitygen.obj pattern.obj winglue.obj C:\OpenSSL-Win32\lib\libeay32.lib C:\pthreads-w32\pthread.lib C:\pcre-7.9-src\pcre.lib ws2_32.lib "C:\Program Files (x86)\AMD APP"\lib\x86\OpenCL.lib

i moved the src folder to the root to make it easier.

i removed  /DPTW32_STATIC_LIB and it builds, but still requires the dll.
full member
Activity: 140
Merit: 430
Firstbits: 1samr7
I can not find pthread.lib anywhere.

Quote
i managed to get a working .exe but it requires pthreadVSE2.dll to be in the same dir as vanitygen.exe, also i can not seem to get oclvanitygen.exe to work.

Wow ctoon6, I'm impressed!  Getting a Windows build environment set up is extremely difficult.

If it needs pthreadvse2.dll, it just means you built the DLL version of pthreads-win32.  You must have removed the /DPTW32_STATIC_LIB in order for it to have linked vanitygen.exe, and it should still work fine.  I didn't spend enough time reading the documentation for pthreads-win32, had a lot of problems with this, and ended up using their visual studio project to build a static pthread.lib.

What's wrong with your build of oclvanitygen?