[ANNOUNCE] Poolserverj WORKMAKER EDITION RELEASED - 0.4.0rc1 - page 7.

shads

I quite like that idea. Poolserverj already maintains an internal 'pseudoblocknumber' which is just the sum of all chains' block numbers. This could be used.

Though it would require miner support. And if I'm going to go down the road of cajoling miner devs to support another protocol adjustment, I wonder if I should just put the effort into getting uptake for a differential binary protocol... I'm going to post a proposed spec for discussion in the next few days. The spec I have in mind would eliminate the need for LP requests altogether.

makomk

Quote from: shads on November 08, 2011, 12:51:01 AM

When there is a double block change (i.e. btc and nmc solved by one solution) the daemons won't see the new block at exactly the same time. From what I've seen a delay of around 1-2 seconds is not unusual. What happens then is that one chain advances to the next block. PSJ checks the other chains to see if they've updated, sees they haven't, and so starts sending out LPs. Before the miner receives the LP and establishes a new LP connection the second chain advances, so PSJ starts sending out another batch of LPs. Those miners who haven't got their new LP connection registered before the second LP miss out, so they continue working on the old block for as long as they would normally (probably about a minute).

The solution I was going to suggest for this was adding an opaque X-Block-ID header to the getwork response that changes every time there's a new block on any of the chains (for example, it could be a counter that increments). Mining software can then be modified to send their own X-Block-ID header in their longpoll requests with the last value they saw, and if it doesn't match what you're expecting then the miner must've missed a block change and the long poll should return immediately.

The nice thing is that this can be combined with your existing workaround; clients that don't send X-Block-ID can get the early longpoll expiry whereas clients that send it don't need to be spammed with extra work.
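To make the proposal concrete, here is a rough sketch of the server-side check, written in Python purely for illustration. The class and method names are made up, and the counter is just one possible choice for the opaque ID; nothing here is existing poolserverj or miner code.

```python
from typing import Optional


class BlockIdTracker:
    """Hypothetical helper: an opaque ID that changes on every block change."""

    def __init__(self) -> None:
        self._counter = 0

    def on_block_change(self) -> None:
        # Called whenever ANY chain (btc, nmc, ...) advances to a new block.
        self._counter += 1

    def current_id(self) -> str:
        # Value the server would put in the X-Block-ID getwork response header.
        return str(self._counter)

    def longpoll_should_return_now(self, client_block_id: Optional[str]) -> bool:
        # Clients that don't send X-Block-ID get the normal longpoll behaviour
        # (and can fall back to the existing early-expiry workaround).
        if client_block_id is None:
            return False
        # A mismatch means the miner missed at least one block change.
        return client_block_id != self.current_id()
```

A miner that last saw ID "3" and opens a longpoll after the server has moved to "4" would get new work immediately instead of waiting for the next block.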

shads

I'm pretty excited about this release. It contains some very significant new features and improves overall performance by several hundred percent. The merged mining branch of poolserverj has been in alpha for some time and with the help and testing from several pool ops we've been able to stabilise it and iron out most of the MM specific glitches.

The WorkMaker feature represents a fundamental shift in the way poolserverj operates. The RPC bottleneck is gone forever so your bitcoin daemons can have a little rest.

I'll go through the major features one by one and the full changelog is at the end of this post. But first...

Please do this!

The config options have changed significantly. I highly recommend you start fresh with the sample properties file and transfer your settings over. Trying to do it the other way is going to be a very error-prone process. I also recommend you take the time to read the comments in the properties file in detail. They are the de facto poolserverj documentation.

Please also make sure you read the 'Default Donation' section of this post. This can be disabled but please be aware that it's there by default.

Recommended minor patch

I highly recommend making this very small patch to your daemons. This simply prevents your debug.log being spammed.

In the file rpc.cpp or bitcoinrpc.cpp search for this line (there may be extra strMethod's in there so search for "ThreadRPCServer method="):

Code:

if (strMethod != "getwork")
        printf("ThreadRPCServer method=%s\n", strMethod.c_str());

and change it to:

Code:

if (strMethod != "getwork" && strMethod != "getworkaux" && strMethod != "getauxblock"
        && strMethod != "buildmerkletree" && strMethod != "getblocknumber" && strMethod != "getmemorypool")
        printf("ThreadRPCServer method=%s\n", strMethod.c_str());

Now onto the new features...

Merged Mining Support

Poolserverj now has a complete native merged mining implementation. It handles all the functions of merged-mining-proxy internally and performs all the additional functions that previously required a merged-mining version of bitcoind. This means that the requirements for merged mining are much simpler:

   * No merged-mining-proxy required
   * A stock version of bitcoind that includes the getmemorypool patch (bitcoin 0.5.0 includes this)
   * namecoind merged mining version.
   * Optionally, if you want to take advantage of coinbasing on the namecoin chain, you can apply the getmemorypool patch to namecoind (which is very simple to do)

There are a few gotchas with merged mining we've all discovered over the past few weeks which poolserverj handles:

Partial Stales

When a single chain (e.g. namecoin) finds a new block but the other chain doesn't, it's possible for a share to be stale for one chain but not the other. Poolserverj detects this and sets our_result=1 and reason=partial_stale. Optionally you can also configure a BOOLEAN database column for each chain that will be marked 1 or 0, so you can calculate share credits on a per-chain basis. This is particularly a problem with cgminer clients (fixed in the latest source code though), which do not respect longpoll unless prev_block_hash has changed. That only changes when a bitcoin block is found, so cgminer clients can get partial stales for the namecoin chain quite frequently.
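As a rough illustration of the three outcomes (accepted, accepted partial-stale, stale), here is a small Python sketch. It is not the poolserverj code: the function name, the use of block heights to decide staleness, and the our_result=0 / reason=stale values for a fully stale share are all assumptions made for the example. Only our_result=1 with reason=partial_stale comes from the description above.

```python
from typing import Dict, Optional, Tuple


def classify_share(
    work_heights: Dict[str, int],
    current_heights: Dict[str, int],
) -> Tuple[int, Optional[str], Dict[str, bool]]:
    """Return (our_result, reason, per_chain_valid) for a submitted share.

    work_heights: block height per chain that the work was built on.
    current_heights: block height per chain that the pool is working on now.
    """
    # A share is still valid for a chain if that chain has not advanced
    # since the work was issued (assumed staleness rule for this sketch).
    per_chain = {
        chain: work_heights.get(chain) == height
        for chain, height in current_heights.items()
    }
    if all(per_chain.values()):
        return 1, None, per_chain
    if any(per_chain.values()):
        # Valid for at least one chain: accepted, but flagged.
        return 1, "partial_stale", per_chain
    return 0, "stale", per_chain
```

The per-chain dictionary corresponds to the optional BOOLEAN columns, so a front end could credit the share on the bitcoin chain only when namecoin has already moved on.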

Longpoll Passthrough

This addresses the double longpoll issue. There is a fundamental design clash between longpolling and merged mining. Basically it works like this: when there is a double block change (i.e. btc and nmc solved by one solution) the daemons won't see the new block at exactly the same time. From what I've seen a delay of around 1-2 seconds is not unusual. What happens then is that one chain advances to the next block. PSJ checks the other chains to see if they've updated, sees they haven't, and so starts sending out LPs. Before the miner receives the LP and establishes a new LP connection the second chain advances, so PSJ starts sending out another batch of LPs. Those miners who haven't got their new LP connection registered before the second LP miss out, so they continue working on the old block for as long as they would normally (probably about a minute).

This patch addresses this by setting a longpoll passthru period, which applies where two block changes happen within a specified period (10 seconds). After the 2nd block change and until 10 seconds after the 1st block change has passed, any longpolls received will have a short expiry of 1 second, after which they'll return new work to the worker. This will give any slow miners a few seconds to get their longpoll in and learn that there's a 2nd new block to work on.

The reason for the 1 second delay is to prevent longpoll spam. Most miners will immediately send another LP request as soon as they get a response. With 0 delay this sets up an LP spam loop. They will still spam 1/sec for up to 10 seconds, which is why this is called a workaround rather than a fix. There's really no way to fix this issue properly except to ditch either longpolling or merged mining altogether.
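The timing rule described above can be sketched in a few lines of Python. This is only an illustration of the window logic under the stated 10 second and 1 second values; the class name, method names and the way time is passed in are invented for the example and are not taken from the poolserverj source.

```python
from typing import Optional


class LongpollPassthru:
    """Hypothetical model of the passthru window described in the post."""

    def __init__(self, window: float = 10.0, short_expiry: float = 1.0) -> None:
        self.window = window              # seconds after the 1st block change
        self.short_expiry = short_expiry  # longpoll expiry while in passthru
        self._first_change: Optional[float] = None
        self._passthru_until: Optional[float] = None

    def on_block_change(self, now: float) -> None:
        if self._first_change is not None and now - self._first_change <= self.window:
            # 2nd (or later) change inside the window: passthru runs until
            # `window` seconds after the FIRST change.
            self._passthru_until = self._first_change + self.window
        else:
            # Isolated block change: start a fresh window, no passthru yet.
            self._first_change = now
            self._passthru_until = None

    def longpoll_expiry(self, now: float, normal_expiry: float) -> float:
        # Expiry (in seconds) to apply to a longpoll request received at `now`.
        if self._passthru_until is not None and now < self._passthru_until:
            return self.short_expiry
        return normal_expiry
```

With block changes at t=100.0 and t=101.5, a longpoll arriving at t=102.0 gets the 1 second expiry, while one arriving at t=110.0 or later is held as usual.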

Experimental SCrypt Chain Support

Currently only litecoin is supported but additional chains can be added relatively easily. All that's required to add new chain support is to define the chain in the source and add a few constants. This hasn't been tested in the field yet.

Database Fault Tolerance

Poolserverj is now highly tolerant to database failures. Connections are retried if they fail.

Workers will continue to be served from the cache in this case. Given the default cache expiry of 60 minutes (unless you explicitly flush the worker), the only worker impact is that changes made to a worker from the front end will not be propagated, and new workers will not be able to connect, until the DB connection is restored.

Shares will be serialized to disk in batches when the DB is not available. When it comes back online any shares on disk will then start being sent to the database. Shares are stored in batches as separate files so it is perfectly feasible to take these files and give them to a different poolserverj server to upload.
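For anyone curious what "batches as separate files" might look like, here is a minimal Python sketch of the spool-and-replay idea. The file naming, the JSON format and the function names are assumptions for illustration only; poolserverj's actual on-disk format is not described in this post.

```python
import json
import os
import uuid
from typing import Callable, List


def spool_batch(spool_dir: str, shares: List[dict]) -> str:
    """Write one batch of shares to its own file and return the file path."""
    os.makedirs(spool_dir, exist_ok=True)
    final_path = os.path.join(spool_dir, f"shares-{uuid.uuid4().hex}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(shares, fh)
    # Rename only after the write completes so a crash never leaves a
    # half-written batch where the replayer would pick it up.
    os.replace(tmp_path, final_path)
    return final_path


def replay_batches(spool_dir: str, insert_batch: Callable[[List[dict]], None]) -> int:
    """Send every spooled batch to the DB; return the number of shares sent."""
    replayed = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as fh:
            shares = json.load(fh)
        # If the DB is still down this raises and the file stays on disk.
        insert_batch(shares)
        os.remove(path)
        replayed += len(shares)
    return replayed
```

Because each batch is a self-contained file, the spool directory could be copied to another machine and replayed there, which matches the "give them to a different poolserverj server" case.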

The end result is your database can go offline for a significant period and poolserverj should happily carry on working with no data loss. The only impact will be that workers that have not connected for more than 1 hour will not be able to authenticate.

I have tested this by taking MySQL down for an hour with a stress test client submitting about 400 shares/sec. When the database was brought back online it took only a couple of minutes to flush all shares to the database.

WorkMaker

Note that all the variations of WorkMaker and Coinbasing have been tested and proven to work on both the bitcoin and namecoin testnets.

My biggest bugbear with merged mining was the additional overhead it put onto the server to keep track of everything. But... what merged mining taketh away WorkMaker giveth back tenfold.

WorkMaker is internal work generation. No more getwork rpc calls to bitcoin daemons, which means you can use a stock standard version of bitcoind (as long as it's a version that supports the getmemorypool rpc call).

Aside from enabling coinbasing functionality (which I'll discuss below) it offers huge performance benefits.

You may ask "aren't you just moving the CPU load from bitcoind to poolserverj?". Partially... There are two major ways this is a win though:

   1. You eliminate RPC/network latency overhead which is significant for what should be a microsecond operation.
   2. The bitcoin implementation uses a very inefficient algorithm for generating work. The majority of the CPU load comes from hashing. The default implementation requires ~ 2 * nTransactions hashes to generate a work. The poolserverj implementation requires about log2(nTransactions), rounded up. For an average block with 50 transactions this means 100 hashes vs 6. For a large block with say 200 transactions this means 400 hashes vs 8.
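One way to get down to roughly log2(nTransactions) hashes is to precompute the merkle branch for the coinbase transaction once per transaction set, then fold each new coinbase hash up that branch. The Python sketch below shows that general technique; whether poolserverj does exactly this internally is an assumption, the function names are made up, and byte-order details of real bitcoin hashes are ignored.

```python
import hashlib
from typing import List


def dhash(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Full rebuild: hashes every pair at every level of the tree."""
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [dhash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def coinbase_branch(txids: List[bytes]) -> List[bytes]:
    """Sibling hashes on the path from index 0 (the coinbase) to the root.

    Computed once per transaction set; it does not depend on txids[0].
    """
    branch = []
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[1])
        level = [dhash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return branch


def root_from_branch(coinbase_txid: bytes, branch: List[bytes]) -> bytes:
    """New merkle root for a new coinbase: one hash per branch entry."""
    h = coinbase_txid
    for sibling in branch:
        h = dhash(h + sibling)
    return h
```

For 50 transactions the branch has 6 entries and for 200 it has 8, which lines up with the 6 and 8 hash figures quoted above.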

Performance

Of course I did some benchmarking to prove the point so here are the numbers...

Raw generation tested by altering poolserverj to consume the work internally so that as soon as it's generated it will try to generate another one...

   * 0.3.0 with JK patched bitcoind daemon: ~2000 works/sec
   * WorkMaker: 24000 works/sec

Frontside getwork capacity - using a stress test client with 50 concurrent threads continuously issuing getwork requests. This measures the throughput including RPC overhead:

   * 0.3.0 with JK patched bitcoind daemon: ~1000 works/sec
   * WorkMaker: ~4000 works/sec

The highest frontside getwork rate I've seen in a production environment with 0.3.0 was on one of BTC Guild's servers: 4500 works/sec, so it's probably reasonable to guess that this server would be capable of ~15000/sec with WorkMaker.

This is only the first iteration and there are numerous ways this can be further optimised, which will happen in the future.

Coinbasing

So aside from performance what else does WorkMaker do for you? Because it generates the coinbase transaction internally (similar to luke-jr's coinbaser patch) we have a few options to play with.

Firstly you set the payout address in the properties file. This does not have to be associated with the bitcoind you are connected to. It could be an offline secure wallet address if you want. Or if you run multiple instances of poolserverj on different servers you can ensure all coinbase rewards go to a single wallet regardless of which server generated them.

Coinbase message string: There is an option to set a short coinbase message string. I have hardcoded this to be limited to 20 bytes as I don't want to encourage spam in the blockchain. You may want to use some sort of pool identifier or even a private UID.

Coinbasing can also work on namecoin or other aux chains but it requires the getmemorypool patch to be applied. This is a very simple patch to apply; even I was able to do it first go and I'm the biggest numpty around when it comes to C++.

Coinbase Donations

It is now possible to set an automatic donation to any address in the coinbase transaction. This can be calculated using 4 different methods:

   1. an absolute value in bitcoins (or fractions)
   2. a percentage of total block reward
   3. a percentage of total block reward excluding transaction fees
   4. a percentage of transaction fees only
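To show how the four methods differ, here is a small Python sketch that works in satoshis. The enum names, the function signature and the rounding (truncation) are all assumptions made for illustration; they are not the property names or the code poolserverj uses.

```python
from decimal import Decimal
from enum import Enum

COIN = 100_000_000  # satoshis per bitcoin


class DonationMethod(Enum):
    # Hypothetical names for the four methods listed above.
    ABSOLUTE = 1            # value is an amount in bitcoins
    PERCENT_OF_TOTAL = 2    # value is % of subsidy + fees
    PERCENT_OF_SUBSIDY = 3  # value is % of reward excluding fees
    PERCENT_OF_FEES = 4     # value is % of transaction fees only


def donation_amount(method: DonationMethod, value: str, subsidy: int, fees: int) -> int:
    """Return the donation in satoshis; subsidy and fees are in satoshis."""
    v = Decimal(value)  # Decimal avoids float rounding on amounts like 0.1
    if method is DonationMethod.ABSOLUTE:
        amount = int(v * COIN)
    elif method is DonationMethod.PERCENT_OF_TOTAL:
        amount = int((subsidy + fees) * v / 100)
    elif method is DonationMethod.PERCENT_OF_SUBSIDY:
        amount = int(subsidy * v / 100)
    else:
        amount = int(fees * v / 100)
    # Never donate more than the coinbase actually pays out.
    return min(amount, subsidy + fees)
```

For a block paying 50 BTC plus 0.5 BTC in fees, 1% of the total reward is 0.505 BTC, 1% excluding fees is 0.5 BTC, and 10% of fees only is 0.05 BTC.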

Note that I said ANY address. If you are donating and you use open source software from other developers in your pool please consider sending them a tip as well. You can set as many donation targets as you like.

Donations will work on aux chains as well if you set the chain to localCoinbasing=true but you MUST have the getmemorypool patch applied to the aux chain daemon.

Why did I put this feature in?

As many of you know I don't run a pool and I don't even own a mining rig. Donations for development work are my sole source of coins. This is good for poolserverj users because I am not distracted by the day to day stuff of running a pool and I can concentrate on developing the software. In full time hours I could probably measure the time I've spent on poolserverj in months. I never really expected donations but when a few started coming in it was rather nice. And I noticed it gave me a lot more motivation to keep improving the code. This is a simple no hassle way you can help keep me and other open source developers interested and motivated.

If you choose not to use this feature I have no problem with that. There are many reasons people may not (0 fee pools for example) and I'm not going to give any preferential support based on whether people do or don't donate. In fact there's no real way that I can see to tell where donations are coming from. I think the record has shown I've always been happy in the past to give support and advice without the expectation of donations.

Default Donation

The sample properties file is set up with a default donation. You can remove this simply by commenting out those lines. I realise this may be controversial. The reason I've chosen to do it this way is simply so that people have to make a conscious choice to NOT donate. This feature obviously interests me more than other people and I'm sure that if it wasn't the default option, many people who may have been happy to donate would not, simply because it never crosses their mind to look at the feature. This way everyone who uses it has to stop and think about it for a moment.

The first time you start this version of poolserverj you will be prompted with a warning that this default donation exists. If the file 'donation.ack' exists in the tmp directory you will not see the prompt and poolserverj will start normally.

If anyone genuinely feels they missed the message and donated unwittingly, contact me with the blocks involved and as long as I can verify the blocks belong to your pool I'll be happy to send the coins back to the same address that the main coinbase reward was paid to.

Future Development

I'll be maintaining the 0.3.x branch for a short time until the 0.4.x branch is considered stable at which time it will be merged.

Non-merged mining is quite possible with 0.4.0 but it hasn't been extensively tested.

A couple of the next major developments I plan to work on include:

   * Enable poolserverj to listen on multiple ports
   * Binary work protocol to replace frontside RPC. This can reduce bandwidth usage by 85% and will remove the need for longpolling and all its associated problems.

Full Changelog

From 0.3.0.FINAL to now:

[0.4.0rc1 WorkMaker]

Major Features:

Full merged mining support including longpoll for all aux chains.
Support for SCrypt chains (litecoin, tbx, fbx)
WorkMaker internal work generation (more than 10x faster than rpc with JK patched daemons)
Coinbasing to any payout address both for parent chain and aux chains (aux chain daemon must have getmemorypool patch applied to use this feature)
Donations via coinbase transaction. PLEASE READ THE SAMPLE CONFIG AS THERE IS A DEFAULT DONATION WHICH YOU CAN REMOVE.
DB fault tolerance: shares are serialized to disk if the DB connection is lost and sent to the server when the connection is reestablished. Along with worker caching, this means you can switch your DB off without losing any shares. The only impact will be that new workers will not be able to authenticate, and there will be higher than normal DB load when you switch it back on.

Detailed changes:

- added useragent as an optional column for share logging.
- added support for SCrypt as a proof of work hashing algorithm.
- Update to generate merged mining auxPoW internally. This completely removes the dependency on bitcoind-mm version and allows us to use stock bitcoind which is nifty since it contains getmemorypool and we need that. A point to note: hashes in the 2 merkle branches contained in auxpow are reversed compared to a transaction hash in the main block merkle tree.
- added db.connectionOptions property to allow users to add arbitrary connect parameters to the connection URL.
- Update to share logging to retry failed connections. If the connection is still failed then shares are serialized to disk. The ShareLogger thread periodically checks for shares on the disk and resubmits them to the database. This means shares can survive across a poolserverj restart if the DB is failed for that long. THIS IS AN API BREAKING CHANGE. Any db.engine.shareLogging plugins that inherit from DefaultPreparedStatementSharesDBFlushEngine will need to have their method signatures updated.
- Integrate local block generation for aux chains allowing coinbasing in the aux blocks
- cleaner thread now checks for dead longpoll connections, can only detect connections that have not been silently dropped by the miner.
- added timestamps to logging
- fix: missing block check interval property
- fix: with a single worksource the fetcher was blocking until a new block check was issued and forced all blocks to be marked in sync.
- block checker: change sleep() to wait() so it can be woken up with notify
- Refactored all JsonRpc specific code out of WorkSource, ShareSubmitter and BlockChainTracker. The bitcoind side of the server is now completely abstracted from protocol and transport.
- add case sensitive worker name option (caseSensitiveWorkerNames=false)
- add AnyWorkerFetchEngine that bypasses DB lookup and returns a worker with the requested name. This is for pools that do not have miner accounts and use a payout address as username. Password is set to empty String.
- add blackhole db share flush engine which does nothing. A quick work around for disabling writing shares to db.
- hack to allow properties to return empty String instead of null - use value '~#EMPTY_STRING#~'
- rebuild block tracker to handle multiple chains with different block numbers.
- adjust longpolling to account for additional chains. LP now fires when any chain finds a new block but waits until a getblocknumber has come back from each chain first to try and prevent double LPs. If a daemon is down it will timeout and fire the longpoll anyway after 1 second.
- allow subtractive traceTargets. i.e. 'traceTargets=all,-merged' will show all traceTargets except 'merged'
- fixes for block sync cases where sync tracking loses the plot and never fires block change.
- support for partially stale shares. Where a single chain has advanced to the next block, old work may still be valid for the other chains. This patch checks that the work is valid for at least one chain, giving three possible outcomes: accepted, accepted partial-stale, stale
- longpoll passthru. This addresses the double longpoll issue. There is a fundamental design clash between longpolling and merged mining. Basically it works like this: when there is a double block change (i.e. btc and nmc solved by one solution) the daemons won't see the new block at exactly the same time. From what I've seen a delay of around 1-2 seconds is not unusual. What happens then is that one chain advances to the next block. PSJ checks the other chains to see if they've updated, sees they haven't, and so starts sending out LPs. Before the miner receives the LP and establishes a new LP connection the second chain advances, so PSJ starts sending out another batch of LPs. Those miners who haven't got their new LP connection registered in time miss out, so they continue working on the old block for as long as they would normally (probably about a minute). This patch addresses this by setting a longpoll passthru period, which applies where two block changes happen within a specified period (10 seconds). After the 2nd block change and until 10 seconds after the 1st block change has passed, any longpolls received will have a short expiry of 1 second, after which they'll return new work to the worker. This will give any slow miners a few seconds to get their longpoll in and learn that there's a 2nd new block to work on. The reason for the 1 second delay is to prevent longpoll spam. Most miners will immediately send another LP request as soon as they get a response. With 0 delay this sets up an LP spam loop. They will still spam 1/sec, which is why this is called a workaround rather than a fix. There's really no way to fix this issue except to ditch either longpolling or merged mining altogether.
- added trace message to indicate longpoll passthru has started/stopped
- add user-agent to request meta-data for future logging to database
- fix for restoreWorkMap. Now that workmaps are not cycled per block the blocknumber is no longer relevant when loading the map from disk. Only work age matters and old works will be cleaned out on the next Cleaner cycle, so we simply accept everything from the file into the workmap.
- when retrieving work from cache add validate check to ensure work is not expired and continue polling until one is found that is valid or cache is empty.
- add retry for worker db fetch. If connection fails close and reopen connection then retry query before throwing an exception.
- add litecoin support
- extend fireBlockChange to run in lite mode when the block hasn't changed but you want to trigger a cache flush and longpoll (e.g. if a high value transaction comes in).
- rebuild default share logger to use column mappings so all columns can now be optional.
- added components needed to trigger a longpoll when new transactions included in the block increase expected fees by more than a user defined threshold.
- enable workmaker for non-merged mining config
- added forced acknowledgement of default coinbase donations on first startup

[0.3.1]
- security hotfixes for two bugs that allowed duplicate shares to be submitted in specific circumstances.
