I was mining on a local stratum install testing out some stuff when the stratum server ran into this message from cgminer:
2013-01-21 16:30:15-0500 [Protocol,4,192.168.2.100] Unhandled Error
...................
stratum.custom_exceptions.ProtocolException: Cannot decode message '{"params": ["gigavps.000", "8� 0002", "H��0000", "Hh� b334", "4c39cdf0"], "id": 2320, "method": "mining.submit"}'
Not sure what that is, but cgminer submitted it. All of the mining equipment running at the time was BFL singles or mini rigs.
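For anyone wanting to check a submit like this by eye: going by the Stratum mining.submit layout (worker name, job id, extranonce2, ntime, nonce), the last three fields should be plain hex. Below is a rough Python sketch that flags the fields that don't look right. The names suspect_fields and is_hex are made up for illustration, the 8-hex-digit length for ntime and nonce is an assumption based on them being 32-bit values, and the sample uses U+FFFD where the log shows "�" since the original bytes are unknown.

```python
import string

HEX_DIGITS = set(string.hexdigits)
# Assumed order of the mining.submit params.
FIELD_NAMES = ["worker_name", "job_id", "extranonce2", "ntime", "nonce"]


def is_hex(value, length=None):
    """True if value is a non-empty hex string (optionally of a fixed length)."""
    if not value or any(ch not in HEX_DIGITS for ch in value):
        return False
    return length is None or len(value) == length


def suspect_fields(params):
    """Return the names of mining.submit fields that look corrupted."""
    if len(params) != len(FIELD_NAMES):
        return ["<unexpected number of params>"]
    fields = dict(zip(FIELD_NAMES, params))
    bad = []
    # Worker name and job id are free-form, so only check they are printable ASCII.
    for name in ("worker_name", "job_id"):
        if not (fields[name].isascii() and fields[name].isprintable()):
            bad.append(name)
    # extranonce2 length depends on the pool, so only check that it is hex.
    if not is_hex(fields["extranonce2"]):
        bad.append("extranonce2")
    # ntime and nonce are assumed to be 32-bit values, i.e. 8 hex digits.
    for name in ("ntime", "nonce"):
        if not is_hex(fields[name], length=8):
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = ["gigavps.000", "8\ufffd 0002", "H\ufffd\ufffd0000",
              "Hh\ufffd b334", "4c39cdf0"]
    print(suspect_fields(sample))
```

On the message above this flags job_id, extranonce2 and ntime, while the worker name and nonce look intact.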
Firstly, I presume in cgminer you also see a share reject for it?
What was the share reject message? Was it also corrupted?
Normal rejects (the very low % ones) in Stratum mining seem to only look like this,
i.e. they show up at the same time as the new block message:
[2013-01-04 07:28:17] Stratum from pool 0 detected new block
[2013-01-04 07:28:17] Rejected 13d41e5e Diff 12/8 BFL 0 pool 0 (Not current block.)
The message in (...) is up to the pool/proxy
Any other Stratum rejects will most likely represent a pool/proxy bug or cgminer problem
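To see quickly whether a log has anything other than the usual new-block rejects, the reasons can be tallied. This is a minimal Python sketch that assumes the reject lines look like the one quoted above (timestamp in brackets, "Rejected", reason in trailing parentheses); other cgminer versions may log differently, and reject_reasons is just a name picked for the example.

```python
import re
from collections import Counter

# Assumed line shape: "[timestamp] Rejected <hash> ... (<reason>)"
REJECT_RE = re.compile(r"^\[[^\]]+\] Rejected \S+ .*\((?P<reason>[^()]*)\)\s*$")


def reject_reasons(lines):
    """Count reject reasons found in cgminer log lines."""
    reasons = Counter()
    for line in lines:
        match = REJECT_RE.match(line.strip())
        if match:
            reasons[match.group("reason")] += 1
    return reasons


if __name__ == "__main__":
    log = [
        "[2013-01-04 07:28:17] Stratum from pool 0 detected new block",
        "[2013-01-04 07:28:17] Rejected 13d41e5e Diff 12/8 BFL 0 pool 0 (Not current block.)",
    ]
    for reason, count in reject_reasons(log).items():
        print(count, reason)
```

Since the text in the parentheses is chosen by the pool/proxy, anything in the tally that isn't a stale/new-block style message is what's worth a closer look.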
Disconnects, of course, should be like so (I get them at midnight coz I forcefully disconnect/reconnect my network)
[2013-01-22 00:00:21] Lost 1 shares due to stratum disconnect on pool 0
It would also be ideal to actually see the dump of the protocol on the wire.
On linux that can be got by:
tcpdump -n -l -i any -s 2000 -XX port 3333 | tee dump.log
or, if you don't want to watch it:
tcpdump -n -l -i any -s 2000 -XX port 3333 > dump.log
Obviously the port number (3333) should match the Stratum port number you are using.
The log file should not be overwhelmingly large even for a MiniRig since Stratum doesn't need to transfer ridiculous amounts of data
The start of each packet dump includes a timestamp
Basically this will show if indeed cgminer/Curl is corrupting it or if the proxy is.
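Finding the matching packet in dump.log by hand is tedious because the ASCII column wraps every 16 bytes, so here is a rough Python sketch that rebuilds each packet's bytes from the hex columns and lists the packets whose mining.submit text contains non-ASCII or control bytes. It assumes the usual tcpdump -XX layout (one header line per packet, then "0x0000:" style hex lines); packets and suspect_submits are names invented for the example, and the check is only a heuristic: it won't catch a message split across two TCP segments.

```python
import re
import sys

# One hex line: offset, then groups of 4 hex digits (the last may be 2), then ASCII.
HEX_LINE = re.compile(
    r"^\s*0x[0-9a-f]{4}:\s+((?:[0-9a-f]{4} )*[0-9a-f]{2}(?:[0-9a-f]{2})?)(?:\s{2,}|$)"
)


def packets(lines):
    """Group tcpdump -XX output into (header_line, packet_bytes) pairs."""
    result = []
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        match = HEX_LINE.match(line)
        if match:
            chunks.append(bytes.fromhex(match.group(1).replace(" ", "")))
        elif line.strip():
            # Any other non-empty line is treated as the start of a new packet.
            if header is not None:
                result.append((header, b"".join(chunks)))
            header, chunks = line, []
    if header is not None:
        result.append((header, b"".join(chunks)))
    return result


def suspect_submits(pkts):
    """Return header lines of mining.submit packets with odd bytes in the JSON."""
    suspects = []
    for header, data in pkts:
        if b"mining.submit" not in data:
            continue
        # Skip the binary link/IP/TCP headers by starting at the first '{"'.
        start = data.find(b'{"')
        body = data[start:] if start >= 0 else data
        if any(b >= 0x80 or (b < 0x20 and b not in (0x0A, 0x0D)) for b in body):
            suspects.append(header)
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as dump:
        for header in suspect_submits(packets(dump)):
            print(header)
```

Run it as "python3 script.py dump.log" (the script name is whatever you saved it as). If a flagged packet was captured on the machine running cgminer, the corruption happened before the data reached the proxy.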
I have a memory overlay for cgminer that works on linux and windows - but not with GBT coz my overlay uses about 30x the RAM and thus with GBT uses way too much RAM
It's a single *.h that you only need to include in miner.h with no other changes anywhere else,
that does a lot of memory protection and testing (and also reports unfreed memory via the API)
This will find corruptions that happen before or after any malloc'ed data and report the line of code that allocated the data. If the corruption falls under a common subset, it will also crash cgminer when the corruption actually happens, so the traceback shows the code that caused it
It won't find corruptions inside data - but hopefully, if there is a problem, it happens randomly enough to not just land in the middle of data
It's my lazy way for debugging bad pointers - coz reading random code usually doesn't find such errors without a lot more details or knowing the change that caused the bug (e.g. using a git bisect)
But a git bisect isn't always going to work either since code changes can move bad pointers around and thus hide and reveal them independently of the bug still existing.
Anyway, if you could first do the tcpdump and find the matching packet to be sure it is cgminer, then chase me up in #cgminer and we can work out what to do next
The reason for wanting the tcpdump is as mentioned further up - where he was only seeing it with the proxy, not with direct pool access - and that could be the same problem you are seeing (but of course both could still be cgminer and not the proxy's fault - I've no idea at this point)