
Topic: [1500 TH] p2pool: Decentralized, DoS-resistant, Hop-Proof pool - page 83

newbie
Activity: 55
Merit: 0
As my profile tag indicates, I'm clearly a newbie. So I just wanted to start by saying thanks for the help; they weren't kidding when they said "the best part about Bitcoin is the community".

So, about the hard fork, here's the newbie question: how can I tell which fork is the right one?

A little explanation:
- P2Pool.in: Downloaded Windows binary version 16.0, installs and runs fine. P2Pool Pool Rate: approx. 140 TH/s
- Github Master Branch: Downloaded and installed the master branch code from the Github P2Pool repo. Includes the 16.0 update (I assume). P2Pool Pool Rate: approx. 900 TH/s.

- Why such a big variance between these two different versions of P2Pool? Which one is best?
newbie
Activity: 58
Merit: 0
I submitted a PR to fix the above issues: https://github.com/p2pool/p2pool/pull/313/files

The fix is working for me, and my node is no longer experiencing connection issues with peers.


This happens every time we hardfork.  Perhaps we should implement a better method to do the upgrade?
newbie
Activity: 55
Merit: 0
A quick question for my P2Pool Jedis:

Just upgraded to v16 and experienced many of the cutover problems already discussed...
But what is going on with the Pool Rate?  Check out the screenshot: only 42 TH/s?  I used the Windows version provided by forrestv.

Any ideas?

http://imgur.com/dPgieNm
newbie
Activity: 1
Merit: 0
Hello, just my BTC0.02.

Instead of diving straight into the code, I first leaned back and started thinking a bit.
Since forrestv said this is a hard fork, quite possibly the switchover mechanism is ... well ... failing.

Code:
2016-06-29 06:32:58.539260 Switchover imminent. Upgraded: 95.437% Threshold: 95.000%
2016-06-29 06:32:58.743562 Switchover imminent. Upgraded: 95.437% Threshold: 95.000%
2016-06-29 06:34:06.060484 Switchover imminent. Upgraded: 95.418% Threshold: 95.000%
2016-06-29 06:34:13.480511 Switchover imminent. Upgraded: 95.418% Threshold: 95.000%
2016-06-29 06:40:43.738976 Switchover imminent. Upgraded: 95.351% Threshold: 95.000%
2016-06-29 06:40:43.830259 Switchover imminent. Upgraded: 95.351% Threshold: 95.000%
2016-06-29 06:42:48.230374 Switchover imminent. Upgraded: 95.416% Threshold: 95.000%
2016-06-29 06:42:48.327881 Switchover imminent. Upgraded: 95.416% Threshold: 95.000%
2016-06-29 06:43:18.880773 Switchover imminent. Upgraded: 95.333% Threshold: 95.000%
2016-06-29 06:43:18.988181 Switchover imminent. Upgraded: 95.333% Threshold: 95.000%
(An upgrade that counts down?)
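
My guess about the countdown: as far as I can tell, the upgraded percentage is computed over a rolling window of recent shares (possibly weighted by work), not cumulatively, so it can drift back down as older upgraded shares fall out of the window. Roughly this kind of calculation; the names below are illustrative, not p2pool's actual API:
Code:
# Sketch only: work-weighted fraction of the last `window` shares that signal
# at least the new version. Old shares dropping out of the window explain why
# the reported percentage can fall as well as rise.
def upgraded_fraction(recent_shares, new_version, window=1000):
    window_shares = recent_shares[-window:]
    total = sum(s['work'] for s in window_shares)
    upgraded = sum(s['work'] for s in window_shares
                   if s['desired_version'] >= new_version)
    return 100.0 * upgraded / total if total else 0.0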

I could still see v15 clients connecting and sending me v15 shares; somehow these get dealt with in a rather inefficient way.
What seemed most strange was that suddenly there were 22000+ shares instead of the normal 17300 or so.



So I went and edited ./p2pool/networks/bitcoin.py and changed 1500 to 1600 on the line where it says:
Code:
MINIMUM_PROTOCOL_VERSION = 1500
(Don't want no nasty v15 shares to deal with.)

After removing bitcoin.pyc and restarting, p2pool is chugging along nicely again.

(I also removed my local sharechain and let it rebuild, but I don't know if this was necessary)
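
For reference, here is what that local hack ends up looking like; treat it as a stopgap for my own node, not an official recommendation (1600 is just "anything above 1500"):
Code:
# p2pool/networks/bitcoin.py -- local stopgap only, per the edit described above.
# Raising the floor makes this node reject v15 peers and their shares outright.
MINIMUM_PROTOCOL_VERSION = 1600
# Then delete the stale p2pool/networks/bitcoin.pyc and restart, so the old
# compiled bytecode definitely isn't reused. Rebuilding the local sharechain
# appears to be optional.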

Good luck!
aib
member
Activity: 135
Merit: 11
Advance Integrated Blockchains (AIB)
Quote:
Looks like most of the stalling was in p2pool/data.py:439(generate_transaction) [...] In the meantime, if your nodes are stalling, try running p2pool using pypy.
[full profiling post trimmed here; it appears in full in the post below]


Do you think implementing P2Pool in Golang would improve efficiency?

hero member
Activity: 818
Merit: 1006
Looks like most of the stalling was in p2pool/data.py:439(generate_transaction):

Code:
         1158218988 function calls (1142867929 primitive calls) in 1003.809 seconds

   Ordered by: internal time
   List reduced from 3782 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
4082057/4082056  679.103    0.000  679.103    0.000 {method 'poll' of 'select.epoll' objects}
     6046   23.784    0.004   53.010    0.009 p2pool/data.py:439(generate_transaction)
 15805300   22.434    0.000   27.909    0.000 p2pool/util/pack.py:221(write)
 13644452   21.269    0.000   21.269    0.000 {_hashlib.openssl_sha256}
 13644451   14.189    0.000   14.189    0.000 {method 'digest' of 'HASH' objects}
   844076   13.670    0.000   19.192    0.000 p2pool/util/math.py:64(add_dicts)
  9104192   12.518    0.000   15.279    0.000 p2pool/util/pack.py:215(read)
Of about 15 minutes total operation time, nearly 1 minute was inside generate_transaction or functions called inside generate_transaction. I believe that most of that time was in verifying one batch of 211 shares. This was running via pypy. When run via stock CPython, I think it takes much longer.

I also got this in my data/bitcoin/log file:

Code:
2016-06-28 10:25:46.131180 Processing 91 shares from 10.0.1.3:36464...
2016-06-28 10:29:38.855212 ... done processing 91 shares. New: 25 Have: 22532/~17280
2016-06-28 10:29:38.855714 Requesting parent share 01582c80 from 10.0.1.3:47276
2016-06-28 10:29:38.856757 > Watchdog timer went off at:

... boring stuff deleted ...

2016-06-28 10:29:38.858448 >   File "/home/p2pool/p2pool/p2pool/data.py", line 646, in check
2016-06-28 10:29:38.858476 >     share_info, gentx, other_tx_hashes2, get_share = self.generate_transaction(tracker, self.share_info['share_data'], self.header['bits'].target, self.share_info['timestamp'], self.share_info['bits'].target, self.contents['ref_merkle_link'], [(h, None) for h in other_tx_hashes], self.net, last_txout_nonce=self.contents['last_txout_nonce'])
2016-06-28 10:29:38.858513 >   File "/home/p2pool/p2pool/p2pool/data.py", line 491, in generate_transaction
2016-06-28 10:29:38.858541 >     65535*net.SPREAD*bitcoin_data.target_to_average_attempts(block_target),
2016-06-28 10:29:38.858568 >   File "/home/p2pool/p2pool/p2pool/util/memoize.py", line 28, in b
2016-06-28 10:29:38.858594 >     res = f(*args)
2016-06-28 10:29:38.858621 >   File "/home/p2pool/p2pool/p2pool/util/skiplist.py", line 44, in __call__
2016-06-28 10:29:38.858648 >     return self.finalize(sol_if, args)
2016-06-28 10:29:38.858674 >   File "/home/p2pool/p2pool/p2pool/data.py", line 739, in finalize
2016-06-28 10:29:38.858701 >     return math.add_dicts(*math.flatten_linked_list(weights_list)), total_weight, total_donation_weight
2016-06-28 10:29:38.858729 >   File "/home/p2pool/p2pool/p2pool/util/math.py", line 67, in add_dicts
2016-06-28 10:29:38.858760 >     for k, v in d.iteritems():
2016-06-28 10:29:38.858787 >   File "/home/p2pool/p2pool/p2pool/main.py", line 313, in
2016-06-28 10:29:38.858814 >     sys.stderr.write, 'Watchdog timer went off at:\n' + ''.join(traceback.format_stack())
2016-06-28 10:29:38.883268 > ########################################
2016-06-28 10:29:38.883356 > >>> Warning: LOST CONTACT WITH BITCOIND for 3.9 minutes! Check that it isn't frozen or dead!
2016-06-28 10:29:38.883392 > ########################################
2016-06-28 10:29:38.883427 P2Pool: 17323 shares in chain (22532 verified/22532 total) Peers: 3 (2 incoming)
2016-06-28 10:29:38.883452  Local: 20604GH/s in last 10.0 minutes Local dead on arrival: ~1.0% (0-3%) Expected time to share: 35.6 minutes

That indicates that p2pool was working on subtasks inside generate_transaction at the moment that the watchdog timer went off. The watchdog is there to notice when something is taking too long and to spit out information on where it was stalled. Helpful in this case.
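
For anyone wondering how that works: conceptually the watchdog is just a background timer that dumps the main thread's stack when the main loop hasn't checked in for a while. A rough sketch of the idea, with illustrative names (this is not p2pool's actual implementation):
Code:
import sys, threading, time, traceback

class Watchdog(object):
    """Dump the main thread's stack if it has not checked in within `timeout` seconds."""
    def __init__(self, timeout=60):
        self.timeout = timeout
        self.last_checkin = time.time()
        self.main_thread_id = threading.current_thread().ident
        t = threading.Thread(target=self._run)
        t.daemon = True
        t.start()

    def checkin(self):
        # Called periodically from the main event loop; a stall means this stops happening.
        self.last_checkin = time.time()

    def _run(self):
        while True:
            time.sleep(self.timeout / 2.0)
            if time.time() - self.last_checkin > self.timeout:
                frame = sys._current_frames()[self.main_thread_id]
                sys.stderr.write('Watchdog timer went off at:\n'
                                 + ''.join(traceback.format_stack(frame)))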

Specifically, it looks like the add_dicts function might be inefficient. The for k, v in d.iteritems() line sounds like it might be an O(n^2) issue. I'll take a look at the context and see what I can find.
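
For reference, the accumulation itself can be done in one pass over all the entries; whether p2pool's add_dicts already works this way, or whether the cost really comes from how flatten_linked_list feeds it, is exactly what I want to check. A sketch of the linear-time pattern (illustrative, not necessarily what p2pool does today):
Code:
def add_dicts(*dicts):
    # Sum values key by key across all input dicts in a single pass.
    # Accumulating into one result dict keeps this O(total entries);
    # merging dicts pairwise is where quadratic behaviour can creep in.
    result = {}
    for d in dicts:
        for k, v in d.items():
            result[k] = result.get(k, 0) + v
    return result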

In the meantime, if your nodes are stalling, try running p2pool using pypy. It seems to help nodes get caught up.

Instructions for setting up pypy to run p2pool can be found here. Note that pypy uses a lot more memory, and performance with pypy seems to degrade after a few days, so it's probably a good idea to only use pypy as a temporary measure.
hero member
Activity: 818
Merit: 1006
I think what we're dealing with is not a single >= 1,000,000 byte share, but instead a single sharereq that lists multiple shares which total over 1,000,000 bytes in size. When you request shares, you request them in batches. Here's some of the code for that:

Code:
                print 'Requesting parent share %s from %s' % (p2pool_data.format_hash(share_hash), '%s:%i' % peer.addr)
                try:
                    shares = yield peer.get_shares(
                        hashes=[share_hash],
                        parents=random.randrange(500), # randomize parents so that we eventually get past a too large block of shares
                        stops=list(set(self.node.tracker.heads) | set(
                            self.node.tracker.get_nth_parent_hash(head, min(max(0, self.node.tracker.get_height_and_last(head)[0] - 1), 10)) for head in self.node.tracker.heads
                        ))[:100],
                    )

Note the "# randomize parents so that we eventually get past a too large block of shares" comment there. That looks to me like one heck of a hack. It seems that p2pool does not do anything intelligent to make sure that a bundle of shares does not exceed the limit, or to fail cleanly when it does. If I understand it correctly, this is resulting in repeated requests for too large a bundle of shares (which fail), followed eventually by a request for a large bundle that does not exceed the limit. This bundle then takes a while to process, causing the node to hang for a while and eventually lose connections to its peers. Maybe.
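
If this were to be fixed at the source rather than worked around, the replying node could cap a share bundle by cumulative serialized size instead of (or as well as) share count. A rough sketch of that idea, with a hypothetical serialize helper; this is not existing p2pool code:
Code:
MAX_REPLY_BYTES = 1000000  # same order as the per-message limit being tripped

def take_shares_up_to_limit(shares, serialize, limit=MAX_REPLY_BYTES):
    # Greedily include parent shares until adding one more would push the
    # serialized reply over the limit. Always include at least one share so a
    # single oversized share still produces a clear failure on its own.
    batch, total = [], 0
    for share in shares:
        size = len(serialize(share))
        if batch and total + size > limit:
            break
        batch.append(share)
        total += size
    return batch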

Or maybe there's another reason why the shares are taking so long. I'm trying pypy right now to see if that reduces the share processing time. Doesn't seem to help enough. Next step is to run cProfile and see what's taking so long.
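
The cProfile run itself is nothing fancy; something along these lines gives output sorted by internal time and restricted to the top 100 rows (the enable/disable placement is just an example):
Code:
import cProfile, pstats

profiler = cProfile.Profile()
profiler.enable()
# ... let the node run, or process one batch of shares ...
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats('time')   # sort by internal time (tottime)
stats.print_stats(100)     # restrict the listing to the top 100 entries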
hero member
Activity: 818
Merit: 1006
That would make sense, except for one thing - p2pool shares only have a limited lifetime (less than two days?), and I've been seeing this message for much longer than that. Shouldn't the dud share eventually fall off the sharechain anyway? If that's correct, it means there's some scenario where oversized blocks can be created frequently.
I don't think it's oversize blocks. I think it's just blocks that are closer to the 1000000 byte limit than p2pool was designed to handle. IIRC, Bitcoin Classic did away with the "sanity limit" of 990,000 bytes (or something like that) and will create blocks up to 1,000,000 bytes exactly. So this might just be a bug from not allowing a few extra bytes for p2pool metadata with a 1,000,000 byte block.

I know that I recently changed my bitcoind settings to use a lower minrelaytxfee setting and a higher max

Or it could be something else.
hero member
Activity: 818
Merit: 1006
Two of my three nodes are having performance issues. As far as I can tell, the third one (:9336) is serving up the shares that are tripping up the other two. I'm going to shut off my working node to see if that helps the two non-working nodes work. If it helps, then I'll leave :9336 offline long enough for the other two nodes and the rest of the p2pool network to overtake it in the share chain and hopefully purge out the naughty shares.

Edit: nope, didn't help.
sr. member
Activity: 295
Merit: 250
My guess is that what's happening is that someone mined a share that is very close to the 1 MB limit, like 999980 bytes or something like that, and there's enough p2pool metadata overhead to push the share message over 1000000 bytes, which is the per-message limit defined in p2pool/bitcoin/p2p:17. This is triggering a TooLong exception which prevents that share (and any subsequent shares) from being processed and added to the share chain. Later, the node notices that it's missing the share (which it hasn't blacklisted or marked as invalid), and tries to download it again. Rinse, repeat.

That would make sense, except for one thing - p2pool shares only have a limited lifetime (less than two days?), and I've been seeing this message for much longer than that. Shouldn't the dud share eventually fall off the sharechain anyway? If that's correct, it means there's some scenario where oversized blocks can be created frequently.
sr. member
Activity: 295
Merit: 250
When a large volume of transactions goes through the BTC network, many nodes lose their connection with the daemon.

I understand and agree with the issue you've opened, but I'm curious about this statement. Did you mean "through the P2Pool network" rather than "BTC network"? The number of shares in the P2Pool sharechain has nothing to do with how many transactions are being processed by the bitcoind daemon at any given moment.
hero member
Activity: 818
Merit: 1006
Note: accepting messages over 1000000 bytes constitutes a network hard fork. Don't do this at home without approval from forrestv. I'm just doing this for testing, and afterwards I will purge my share history on that node.
hero member
Activity: 818
Merit: 1006
My guess is that what's happening is that someone mined a share that is very close to the 1 MB limit, like 999980 bytes or something like that, and there's enough p2pool metadata overhead to push the share message over 1000000 bytes, which is the per-message limit defined in p2pool/bitcoin/p2p:17. This is triggering a TooLong exception which prevents that share (and any subsequent shares) from being processed and added to the share chain. Later, the node notices that it's missing the share (which it hasn't blacklisted or marked as invalid), and tries to download it again. Rinse, repeat.

A quick hack to get past this might be to increase the 1000000 in p2pool/bitcoin/p2p:17 to something bigger. As far as I can tell, if we did that, people who mine excessively large blocks (e.g. with Bitcoin Unlimited) that would get rejected by the network might have their shares accepted by p2pool, which would end up being a sort of block withholding attack. I don't think this would be too big of a concern, so I'm going to try raising the limit on one of my nodes and see if it resolves the issue.
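
I have not re-read the exact code at that line, but the check being tripped should be roughly of this shape (a sketch with illustrative names, not p2pool's literal source):
Code:
MAX_PAYLOAD_LENGTH = 1000000  # the per-message limit under discussion

class TooLong(Exception):
    pass

def check_payload_length(length, limit=MAX_PAYLOAD_LENGTH):
    # Reject any single p2p message whose declared payload exceeds the limit;
    # presumably this is what surfaces upstream as "ShareReplyError: too long".
    if length > limit:
        raise TooLong('payload of %i bytes exceeds limit of %i' % (length, limit))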
hero member
Activity: 818
Merit: 1006
Is something odd going on with P2pool? Check the hashrate and number of miners for the last few hours on this chart:
I'm seeing this too. Furthermore, I'm seeing stuff like this:

Code:
2016-06-28 14:40:39.001988 > in download_shares:
2016-06-28 14:40:39.002117 > Traceback (most recent call last):
2016-06-28 14:40:39.002181 > Failure: p2pool.p2p.ShareReplyError: too long
2016-06-28 14:40:39.002370 Requesting parent share ea0f2aee from 10.0.1.3:38393
2016-06-28 14:40:39.311156 > in download_shares:
2016-06-28 14:40:39.311285 > Traceback (most recent call last):
2016-06-28 14:40:39.311351 > Failure: p2pool.p2p.ShareReplyError: too long
2016-06-28 14:40:39.311523 Requesting parent share ea0f2aee from 10.0.1.3:37971
2016-06-28 14:40:39.494657 > in download_shares:
2016-06-28 14:40:39.494775 > Traceback (most recent call last):
2016-06-28 14:40:39.494827 > Failure: p2pool.p2p.ShareReplyError: too long
legendary
Activity: 1308
Merit: 1011
When a large volume of transactions goes through the BTC network, many nodes lose their connection with the daemon.
When the connection is restored, they actively start to synchronize shares with other nodes.
I noticed that, while exchanging shares with remote peers, the node stops taking work from miners.
Code:
2016-06-29 01:38:16.266381 Processing 109 shares from 46.32.254.29:9333...
2016-06-29 01:39:56.764554 ... done processing 109 shares. New: 19 Have: 22792/~17280
That is 40 seconds of inactivity. It causes a significant drop in hashrate, and the local rate graphs become spiky.

I have reported this issue (please leave your comments there): https://github.com/p2pool/p2pool/issues/311
It seems that all nodes are having these problems.
sr. member
Activity: 295
Merit: 250
Is something odd going on with P2pool? Check the hashrate and number of miners for the last few hours on this chart:

http://minefast.coincadence.com/p2pool-stats.php

This mirrors what I'm seeing with my own P2pool nodes - one hosted locally with my miner, and one on a VPS. It looks like the miner isn't getting proper responses (?) back from the pool, so it keeps mining without any new work. But it seems it's getting *something* that makes it think the pool is alive, so it's not failing over to the backup pools. In turn, it seems the P2pool node is registering zero hash rate. If I restart the pool and miner, it goes back to normal for a while, then zeros out again.

A quick check of the servers shows that P2pool is hogging 100% CPU time (on one core, of course), and the messages in the text log show it receiving hundreds of new shares.

Currently mining on V16.

Any thoughts?

Edit: An image of the chart at the link above, in case it later turns out to be an unrelated error.

newbie
Activity: 58
Merit: 0
I have switched my nodes over to p2pool v16. The current hashrate is approximately 97% v16 over the last hour, and about 76% v16 over the last day. This means that anyone still on v15 will soon have their shares ignored by the rest of the network.

Awesome!

Seems there are a few new miners as well?  Welcome!  Smiley
aib
member
Activity: 135
Merit: 11
Advance Integrated Blockchains (AIB)
A question regarding P2Pool for Litecoin (scrypt):

The highest difficulty I get is 15258.556231856346 when I use address/999999999+999999999.

Does anyone know if that's enough to scale up to a higher hash rate?


I've also found that my hash rate on P2Pool always gets cut by 10-20% compared to F2Pool.
Does anyone know how to tweak it for a stable, higher hashrate?


full member
Activity: 201
Merit: 100
P2Pool release 16.0 - commit hash: d3bbd6df33ccedfc15cf941e7ebc6baf49567f97
So, please upgrade to 16.0 now and also tell everyone else to.

Updated our small pool @ http://p2pool.ukbmg.com:9332/static/efe/
and advised the 3 other pools on kit we host to do the same Smiley