Hello pool software/stratum protocol/miner software experts,
I've got several miners that seem to take random amounts of time to establish communications to a pool such that shares are accepted. My GekkoScience Compac miner can take anywhere from just a few minutes to multiple hours before I start seeing messages like:
[2016-07-22 19:36:16.626] Accepted 2bbe2621 Diff 1.5K/17 COMPAC
In the case of some pools it seems that the Compac never does establish communications with the pool such that shares are accepted. Clearly you want to point to a pool that is relatively close by so that there isn't a lot of lag in communications. By close by I mean round trip ping times are low and do not fluctuate too much.
In addition to the Compac I have an Antminer S7-LN that I cannot get to mine successfully on any pool I have pointed it at. What I see on the Miner Stats page looks like this:
As you can see, this miner has been working for over an hour, but neither the DiffA# nor the DiffR# share counter has moved. That suggests the pool is not receiving the shares it should be receiving. When miner and pool are communicating well (my Compac with cgminer running on my Mac Mini), it looks like this:
Mac-mini:~ boris$ netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 0 192.168.1.101.51812 a.b.c.d.3333 ESTABLISHED
Mac-mini:~ boris$ netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 0 192.168.1.101.51812 a.b.c.d.3333 ESTABLISHED
Mac-mini:~ boris$
And when the miner and pool are not communicating well (my S7-LN) it looks like this:
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 5217 192.168.1.196:53729 a.b.c.d:3333 FIN_WAIT1
tcp 0 815 192.168.1.196:53730 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 5217 192.168.1.196:53729 a.b.c.d:3333 FIN_WAIT1
tcp 0 2934 192.168.1.196:53730 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 5217 192.168.1.196:53729 a.b.c.d:3333 FIN_WAIT1
tcp 0 3097 192.168.1.196:53730 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 3097 192.168.1.196:53730 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~#
For those reading this thread who are unfamiliar with TCP/IP communications, the state FIN_WAIT1 means that the local side (the miner in this case) has closed its end of the connection and sent a FIN, and is still waiting for the remote side to acknowledge it. In this case the miner appears to be in FIN_WAIT1 on this particular connection because the most recent message was retransmitted without ever being acknowledged, the connection timed out, and the miner closed it.
If you look at the output from netstat there is a column called Send-Q. When things are working normally (see the netstat output from my Mac Mini) the value in that column is zero. That means that there is no data in the output queue waiting to be sent to the remote side of the connection. When things are not working normally (see the netstat output from the Antminer) there is a number greater than zero in that column.
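To make the Send-Q check above repeatable, here is a minimal Python sketch that parses netstat output and lists connections with unsent data queued. The function names (parse_netstat, stalled) are my own for illustration, the sample rows are copied from the Antminer output above, and it assumes the six-column row layout shown in this thread.

```python
# Sample rows copied from the Antminer netstat output in this post.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 5217 192.168.1.196:53729 a.b.c.d:3333 FIN_WAIT1
tcp 0 815 192.168.1.196:53730 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
"""


def parse_netstat(text):
    """Return one dict per TCP row found in netstat-style output.

    Assumes each data row has exactly six whitespace-separated fields:
    proto, Recv-Q, Send-Q, local address, foreign address, state.
    Header lines and error lines do not match that shape and are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        rows.append({
            "proto": fields[0],
            "recv_q": int(fields[1]),
            "send_q": int(fields[2]),
            "local": fields[3],
            "foreign": fields[4],
            "state": fields[5],
        })
    return rows


def stalled(rows):
    """Return the rows that still have bytes waiting in the send queue."""
    return [r for r in rows if r["send_q"] > 0]


if __name__ == "__main__":
    for row in stalled(parse_netstat(SAMPLE)):
        print(row["local"], row["state"], "Send-Q =", row["send_q"])
```

Running this every few seconds against fresh netstat output would show whether Send-Q keeps growing on a connection, which is the pattern visible in the Antminer samples.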
I put a Wireshark packet sniffer onto the network and I can "see" that the miner is communicating with the pool and the "conversation" seems to be going OK right up until the miner sends a stratum mining.submit message. The mining.submit message is never ACKnowledged by the pool. Eventually that causes the miner to retransmit the message. The miner will continue to retransmit (waiting an exponentially longer time before sending the message again) until it gives up, which in my captures happens after about 30 seconds.
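As a rough illustration of the exponential backoff described above, the following Python sketch lists the times at which retransmissions would go out if the wait doubles after each attempt. The function name and the 1 second starting timeout are assumptions for illustration. Real TCP stacks derive the timeout from measured round trip time and usually limit retries by count rather than by a fixed number of seconds, so treat this as a simplified model and not what the Antminer's kernel actually does. The 30 second cutoff is the figure from my captures.

```python
def retransmit_schedule(initial_rto=1.0, give_up_after=30.0):
    """Return elapsed times (seconds) at which retransmissions are sent.

    Simplified model: the original segment goes out at t=0, the sender
    waits initial_rto before the first retransmission, then doubles the
    wait after every attempt. Retransmissions that would fall after
    give_up_after are not sent.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    while elapsed + rto <= give_up_after:
        elapsed += rto          # wait out the current timeout
        times.append(elapsed)   # retransmit at this moment
        rto *= 2                # exponential backoff
    return times


if __name__ == "__main__":
    for attempt, t in enumerate(retransmit_schedule(), start=1):
        print(f"retransmission {attempt} at t={t:.1f}s")
```

With those assumed defaults the model gives only four retransmissions (at 1, 3, 7 and 15 seconds) before the cutoff, so an unacknowledged mining.submit gets very few chances to get through.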
Can any of the pool software/stratum protocol/miner software experts help me figure out what is happening here? Initially I thought it was a network bandwidth issue, but I have increased my bandwidth and I'm still seeing the same behavior. I know that the Compac miner eventually syncs with the mining pool, and the S7-LN has occasional "moments" when it is submitting DiffA# or DiffR# accepted shares, but seemingly only when I have a second pool destination URL specified, and only intermittently.
Edit: Added the below info...
Here is the image of the miner configured with two pool addresses. The addresses are the same, with the miner somehow successful in submitting shares to the "backup" pool, but not to the "primary" address.
The netstat from the miner looks like:
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 7173 192.168.1.196:58286 a.b.c.d:3333 FIN_WAIT1
tcp 0 3423 192.168.1.196:58295 a.b.c.d:3333 ESTABLISHED
tcp 0 0 192.168.1.196:58247 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 3586 192.168.1.196:58295 a.b.c.d:3333 ESTABLISHED
tcp 0 0 192.168.1.196:58247 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~# netstat -alnt | awk 'NR == 2 || /a.b.c.d/'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1630 192.168.1.196:58298 a.b.c.d:3333 ESTABLISHED
tcp 0 7336 192.168.1.196:58296 a.b.c.d:3333 FIN_WAIT1
tcp 0 0 192.168.1.196:58247 a.b.c.d:3333 ESTABLISHED
netstat: /proc/net/tcp6: No such file or directory
root@antMiner:~#
And of course when the miner "fails" back to the primary it isn't able to submit shares.
Thanks,
- zed