Pages:
Author

Topic: VanitySearch (Yet another address prefix finder) - page 4. (Read 32072 times)

legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I did try to look through this to see if it has been answered, but attempting to find a GPU accelerated nostr vanity address generator, and this is as close as I've found so far.  Obviously not the same thing but, just trying to figure out if there is a branch of this that could do that, or something else made to do so?  Using CPU only is brutal.
You could go to project development : https://bitcointalk.org/index.php?board=12.0 there are some windows GPU versions, some are not open source, so do your research before using.

None of those are Nostr vanity generators. AFAIK, nobody has made one yet, which is not unreasonable when you consider that Nostr is a very recent invention.

[for the record, VanitySearch cannot find Taproot addresses, but it should be as simple as checking for an input prefix that is either bc1q or bc1p to allow that. I was able to use VanitySearch to generate arbitrary prefixes not starting with 1, 3, etc, in my experimental builds.]
copper member
Activity: 1330
Merit: 899
🖤😏
I did try to look through this to see if it has been answered, but attempting to find a GPU accelerated nostr vanity address generator, and this is as close as I've found so far.  Obviously not the same thing but, just trying to figure out if there is a branch of this that could do that, or something else made to do so?  Using CPU only is brutal.
You could go to project development : https://bitcointalk.org/index.php?board=12.0 there are some windows GPU versions, some are not open source, so do your research before using.
copper member
Activity: 60
Merit: 2
I did try to look through this to see if it has been answered, but attempting to find a GPU accelerated nostr vanity address generator, and this is as close as I've found so far.  Obviously not the same thing but, just trying to figure out if there is a branch of this that could do that, or something else made to do so?  Using CPU only is brutal.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
This is what I ended up settling with:

Code:
static bool char16_is_match(RegexNode *node, const char *orig, const char *cur,
                          const char **next)
{
        Char16Node char16 = node->chr16;
        int result;
        __m128i xmm0, xmm1, xmm2;

        *next = cur+16;
        xmm0 = _mm_loadu_si128((__m128i*) char16.chr);
        xmm1 = _mm_loadu_si128((__m128i*) cur);
        xmm2 = _mm_cmpeq_epi8(xmm0, xmm1);
        result = xmm2[0] & xmm2[1] & xmm2[2] & xmm2[3] & xmm2[4] & xmm2[5] & xmm2[6]
            & xmm2[7] & xmm2[8] & xmm2[9] & xmm2[10] & xmm2[11] & xmm2[12]
            & xmm2[13] & xmm2[14] & xmm2[15];

        return result;
}

The ugly AND operators are the direct result of Intel not having any kind of "AND bytes/words/dwords/quadwords inside XMM register" instruction. Indeed, there are no packed bitwise instructions operating on single SIMD registers at all - the kinds of instructions are there are mostly designed with video decoding algorithms in mind.

There's no real performance improvement from using this over the old code.

Yes, I know there is MOVMSKPS, but that is for floats, so it hardly suits byte inputs. Not only would I have to restrict inputs to 4 chars at a time, I would also have to pad them with zeroes which will negatively affect performance.

Hence this code has been pushed right now to the repo, but nothing calls it - this is despite coverage tests saying it got hit 16 million times (over the span of a few minutes) whereas the char_is_match other function would be (and still is) called several hundreds of millions of times.

So the only real performance improvement I pushed in this latest commit is to disable the UTF8 character matching and treat everything as ASCII - that did increase Mkey performance on 8x "Intel(R) Xeon(R) CPU E31240 @ 3.30GHz" by about 0.1MKey/s (up from 0.9 MKey/s average - VanitySearch reports large speeds for the first few seconds but then it comes down as the estimates become more accurate).

Meanwhile the non-regex mode (also in my fork) runs at 2.20MKey/s once the dust is settled - roughly the same as original VanitySearch.
newbie
Activity: 17
Merit: 7
Quote
Is this part of the base standard, or a C++11/14/17/20 extension?

hmm its actually in std::experimental so i guess that means its not officially in the standard yet. it was originally a third-party library that i had used in an old version of a program of mine, then a few years ago it got promoted to get absorbed into the c++ standard library. not sure if its on track to become official with c++23 or not. anyway i just know i really liked how relatively simple it made writing vectorized code 😎
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Quote
maybe put some SSE or AVX in there somehow

https://en.cppreference.com/w/cpp/experimental/simd

A zero-overhead abstraction for the high-level language you are already using is so much nicer than spending a bunch of time writing your own architecture-specific routines in assembly or compiler intrinsics (std::simd itself being a simple template library implemented using intrinsics). Generally speaking at least

This is very useful, thanks!

Is this part of the base standard, or a C++11/14/17/20 extension?

Meanwhile I have already made an optimized matcher for consecutive character matches:

edited for garbage code formatting pasted from my phone

Code:
static bool char16_is_match(RegexNode *node, const char *orig, const char *cur, const char **next)
{
    Char16Node char16 = node->chr16;
    int result;
    __asm__ __volatile__ (
        "movdqa xmm1, [%2] \n"
        "pcmpeqb xmm0, xmm1 \n"
        "movmskps %0, xmm0 \n"
        : "=r"(result)
        : "r" (char16.chr), "r" (cur)
        : "xmm0", "xmm1"
    );
    *next = cur+16;
    return result;
}

This actually only uses SSE2. I'm not sure why JLP compiled with SSSE3, is he using some of those newer instructions in his SECP256k1 class? Huh

Anyway, I haven't benchmarked this yet, but the way this is supposed to work, is that the regular expression is already represented as a linked list like this:


^1tryme.*

start --> char --> char --> char --> char --> char --> char --> quant
                                                                                              v
                                                                                             any

so you can actually go down in some of these nodes if necessary.

Now instead of comparing each of these char nodes as is presently done, which is inefficient, we can take advantage of the fact that XMM registers can hold up to 16 char values. So we can combine sets of 2-16 char nodes into a single char16 node, and pad it (as well as the input string!) with NULs if necessary :

start --> char16 --> quant
                                 v
                               any

This has roughly the same overhead as a single "char" old-school comparison. Considering that the "char" node match function is called hundreds of millions of times in the span of a few seconds, owing to the fact that most people will usually search for just a certain pattern of strings, this will translate to 6-8x less of these calls being made depending on the regex string.
newbie
Activity: 17
Merit: 7
Quote
maybe put some SSE or AVX in there somehow

https://en.cppreference.com/w/cpp/experimental/simd

A zero-overhead abstraction for the high-level language you are already using is so much nicer than spending a bunch of time writing your own architecture-specific routines in assembly or compiler intrinsics (std::simd itself being a simple template library implemented using intrinsics). Generally speaking at least
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
WOW you are really true code master man, how long do you think will this take you to have a working GPU build?

Well, I wouldn't call myself that.  Wink

citb0in actually asked me that question already, and we both thought it would take max. a week or two. This was in the beginning of January. But unfortunately (among other things) I discovered that adding the GPU support wasn't as simple as changing a few lines of code (and I refunded him his money last night).

So as it stands, there's only a working CPU build - just now I pushed the code that will enable you to run queries like "./VanitySearch ^1abc" to search for 1abc prefix at the beginning, and also you could already do ./VanitySearch 1abc$ to search for 1abc at the end of the address, and some other things I will list below.

Wildcard searches with *, +, and {m,n} work but they are not very useful as current hardware can only reasonably search for 6-7 characters at a time:

(difficulty estimation is currently wrong for {m,n} and will report much longer time than actual)


Character classes and quantifier example:

./VanitySearch -t 8 '^1[fF][iI].{2,4}[sS][hH]'

Code:
 time ./VanitySearch -t 8 '^1[fF][iI].{2,4}[sS][hH]'
VanitySearch v1.19
Benchmarking regex matching speed of prefix "^1[fF][iI].{2,4}[sS][hH]" (case sensitive) for 1 second, please wait... done
Difficulty: 9225725494
Search: ^1[fF][iI].{2,4}[sS][hH] [Compressed]
Start Tue Feb  7 12:12:39 2023
Base Key: CA21EDE5142917948E79FEFD98EBBB2E485ECA54E1C814B482C39A2C8B8D5B37
Number of CPU thread: 8

PubAddress: 1FiF7Sh6Ei8ezjMASNdED1ixyTZ9HY4DbP
Priv (WIF): p2pkh:Kxsq34CD5UBCqhRMZi4WNH1oQjnr3FHjzQqS2kEDb6CN3EpZYauo
Priv (HEX): 0x317261FB480D6C92F777B4F3B18176433CAED4D61E8A3E507A0414E3A6BCCAD4

PubAddress: 1FiZ9bmpedPk1ksHRtUtZEDZ2n7fNz68LP
Priv (WIF): p2pkh:L3zdVyH3kWaRtzJ4eLEc6FwcbrzBRpVgoPZjCC3Gq1d2ZCEENuM6
Priv (HEX): 0xCA21EDE5142917948E79FEFD98EBBB31485ECA54E1C814B782C39A2C8B8D6ACE
[0.20 Mkey/s][GPU 0.00 Mkey/s][Total 2^18.63][Prob 0.0%][50% in 08:45:41][Found 2]
PubAddress: 1FidbgsHsR9B2iEu5KkfYxRE7SPjadWSiS
Priv (WIF): p2pkh:L12uW3KSdEkLQzPutC9E6cpGJe6w9R9udHi4tLCrVEJKkCeikoJJ
Priv (HEX): 0x71CA8897DFCA52F8649A489E5C34216517D724D6E9710FE937FEE419F019AA9F
[0.19 Mkey/s][GPU 0.00 Mkey/s][Total 2^19.55][Prob 0.0%][50% in 09:15:06][Found 3]  ^C

real    0m8.447s
user    0m51.466s
sys     0m0.075s

Note: Make sure you put the regex in single quotes '' otherwise bash might mess them up!

Anyway, I'm going to try to optimize the CPU build first before working on the GPU (regex mode is currently 2x slower than non-regex mode), maybe put some SSE or AVX in there somehow.
copper member
Activity: 1330
Merit: 899
🖤😏
I am still working on my VanitySearch build by the way.

Over the past few days, I made the CPU core of the regex support work (It just needs a few lines of code change to allow you to lock the search to prefix or suffix, but that will be done in a few days), but apparently, there's a ton of work necessary to be done on the GPU side.

[Yes, I hate CUDA - what an abomination to debug - but if nobody else is going to do it, why let the idea of regex search go down the drain?]

EDIT: GPUEngine must be almost completely rewritten, including making a regex matcher in CUDA. Apparently, nobody has ever done this before (or at least finished it) so whatever I'm going to implement here might possibly be the first time someone attempts that.

Why does it need to be rewritten?

Because the GPUEngine was based on wildcards. It would basically take the ? and * and then calculate all the different possible combinations of the prefix, and the result would be a gigantic array which is indexed by a unique prefix ID, where ago find a match for an address, you'd run a string comparison on all of the corresponding prefixes, so see if any match.

There is two ways to insert regex at hat point:

1- shoehorn the other regex characters inside the GPUEngine (but this approach really only works with quantifier metacharacters), or
2) break the input string down into a tree, where each regex character corresponds to some part of the tree, and match all those parts in parallel.

2) is the approach I chose to take because it means I can use the existing CPU regex library as a reference for how to match stuff.

Either way, we're on track to make the first regex-enables VanitySearch, eventually.

PS. There is no longer a GPU performance hit, but it just seems to be sitting there idle ever since I ripped out the (pointless for this build, IMO) prefix input's anyway.
WOW you are really true code master man, how long do you think will this take you to have a working GPU build?
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I am still working on my VanitySearch build by the way.

Over the past few days, I made the CPU core of the regex support work (It just needs a few lines of code change to allow you to lock the search to prefix or suffix, but that will be done in a few days), but apparently, there's a ton of work necessary to be done on the GPU side.

[Yes, I hate CUDA - what an abomination to debug - but if nobody else is going to do it, why let the idea of regex search go down the drain?]

EDIT: GPUEngine must be almost completely rewritten, including making a regex matcher in CUDA. Apparently, nobody has ever done this before (or at least finished it) so whatever I'm going to implement here might possibly be the first time someone attempts that.

Why does it need to be rewritten?

Because the GPUEngine was based on wildcards. It would basically take the ? and * and then calculate all the different possible combinations of the prefix, and the result would be a gigantic array which is indexed by a unique prefix ID, where ago find a match for an address, you'd run a string comparison on all of the corresponding prefixes, so see if any match.

There is two ways to insert regex at hat point:

1- shoehorn the other regex characters inside the GPUEngine (but this approach really only works with quantifier metacharacters), or
2) break the input string down into a tree, where each regex character corresponds to some part of the tree, and match all those parts in parallel.

2) is the approach I chose to take because it means I can use the existing CPU regex library as a reference for how to match stuff.

Either way, we're on track to make the first regex-enables VanitySearch, eventually.

PS. There is no longer a GPU performance hit, but it just seems to be sitting there idle ever since I ripped out the (pointless for this build, IMO) prefix input's anyway.
legendary
Activity: 2534
Merit: 6080
Self-proclaimed Genius
May I?

I have a few questions if you don't mind.

1: how do we know if a custom address could exist and is valid?
As long as the address has a valid checksum, it's valid.
The checksum is the last 4Bytes of the address after decoding it in base58;
To be valid, it should match the first 4Bytes of the SHA256D hash of the rest of the characters (w/o the checksum).

In fact, even without the private key, you can make valid custom address that "could exist".
Here's a tool for example: https://gobittest.appspot.com/ProofOfBurn

Quote from: digaran
2: if I already know a vanity address's private key/ key range, would it be easier to generate that address if I could set a specific range for the tool to search only in that range? Would that reduce the difficulty and the time needed?
Yes, but why search if you already know the private key?

Take note that similar vanity addresses like 1banana12345...... and 1banana12344...... have no correlation to their private key's ranges.
Addresses are just the base58 encoding of the HASH160 (sha256 and then ripemd160) of the public key which is pseudorandom, plus some additional data.
Knowing the former's private key wont trivialize the search for the latter's private key.
copper member
Activity: 1330
Merit: 899
🖤😏
I have a few questions if you don't mind.

1: how do we know if a custom address could exist and is valid?
2: if I already know a vanity address's private key/ key range, would it be easier to generate that address if I could set a specific range for the tool to search only in that range? Would that reduce the difficulty and the time needed?
full member
Activity: 1162
Merit: 237
Shooters Shoot...
Quote
How did you input these prefixes? Using the command-line, or another way?

From input file, the original -i  flag; 1 per line

Good to know. Although it would still be handy to read prefixes from the command line too: VanitySearch 1abcd bc1qabcd for example.
Easy to do, but I wouldn't wanna put 32 million prefixes on cmd line LOL.

Just keep parsing over the CMD line...
hero member
Activity: 1554
Merit: 880
pxzone.online
Thanks for this, i just discovered this thread and eventually generate mine with native segwit wallet address (check my profile). Well it's was generated almost in instant since i only use 3 character in reference with my username. Smiley
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Quote
How did you input these prefixes? Using the command-line, or another way?

From input file, the original -i  flag; 1 per line

Good to know. Although it would still be handy to read prefixes from the command line too: VanitySearch 1abcd bc1qabcd for example.
full member
Activity: 1162
Merit: 237
Shooters Shoot...
Quote
How did you input these prefixes? Using the command-line, or another way?

From input file, the original -i  flag; 1 per line
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Ultimately, it will not only allow prefix searches, but also suffix searches and also for vanity characters in an arbitrary part of the address.

Does your version still support full address search?

I didn't modify that particular section so that should work just fine.

Quote
also, current program can search up to 32 million prefixes simultaneously (I haven't tried more than 32 million).

How did you input these prefixes? Using the command-line, or another way?

Original VanitySearch only reads the first prefix on the command line, but I tweaked it to read all of them.

Side note: Regex matching on GPU seems to work, but I am going to revert all the changes I made to GPU because there's an alternate way of matching strings using the "|" operator, which makes everything I made in that section obsolete (and is probably the cause of the 5x GPU slowdown anyway).
full member
Activity: 1162
Merit: 237
Shooters Shoot...
Guys I updated my VanitySearch clone to work on GPU as well as CPU, so you guys shouldn't have any problems with that. I'm still debugging some issues with ".*" and ".+" wildcards that citb0in spotted, and the GPU build has a nasty and unexpected 5x slowdown. Which is pretty weird since almost all of the GPU codebase is the same as the original.

What exactly is different about your clone versus the original? Thanks.

My clone has regex support instead of just the two wildcards ? and * (had to rip out all the legacy wildcard code out in the process because it was overcomplicating things) and it also support searching for multiple prefixes simultaneously.

It is available at https://github.com/ZenulAbidin/VanitySearch (warning: still experimental software).
Can you give me an example of how regex works/is different than current program?

current program can search strings like:

1NotATeth
1NotATeth*

How is regex different? I read your github page

Does your version still support full address search?

also, current program can search up to 32 million prefixes simultaneously (I haven't tried more than 32 million).

legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Guys I updated my VanitySearch clone to work on GPU as well as CPU, so you guys shouldn't have any problems with that. I'm still debugging some issues with ".*" and ".+" wildcards that citb0in spotted, and the GPU build has a nasty and unexpected 5x slowdown. Which is pretty weird since almost all of the GPU codebase is the same as the original.

What exactly is different about your clone versus the original? Thanks.

My clone has regex support instead of just the two wildcards ? and * (had to rip out all the legacy wildcard code out in the process because it was overcomplicating things) and it also support searching for multiple prefixes simultaneously.

It is available at https://github.com/ZenulAbidin/VanitySearch (warning: still experimental software).
hero member
Activity: 1438
Merit: 513
Guys I updated my VanitySearch clone to work on GPU as well as CPU, so you guys shouldn't have any problems with that. I'm still debugging some issues with ".*" and ".+" wildcards that citb0in spotted, and the GPU build has a nasty and unexpected 5x slowdown. Which is pretty weird since almost all of the GPU codebase is the same as the original.
Ive always noticed a slowdown on verbose wildcard vanities. specially ones like 1test*test.
usually, its indexes much slower with split key generation + wildcards toss in a split key +1test*test even with -c it seems to take a while longer than it should.

Pages:
Jump to: