
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 446. (Read 5805537 times)

-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
New version - 2.9.4, 19th November 2012

Bugfixes. This should push this version up to the status of current stable release.


Human readable changelog

Fixed the elusive stratum disconnect windows crash bug.
Fixed mining stratum on EMC.
Fixed mining GBT on bitminter.
Provided preliminary support for balance and loadbalance with stratum and GBT.
Provided support for numeric IPv6 addresses in stratum URLs.
Don't shoot GPU speed up to max when the temperature drops dramatically.
Quieten the pool not responding messages for backup pools.
Flush more work on longpoll that we may have been leaving behind.
Fixes to build on windows.
Support for fractional diff values with stratum.
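
For anyone wondering what a numeric IPv6 stratum URL looks like: the literal address goes in square brackets, so the colon before the port stays unambiguous. Below is a minimal Python sketch of splitting such a URL. It is not cgminer's own parser (cgminer is written in C); the `parse_stratum_url` name and the `2001:db8::1` documentation address are made up for illustration.

```python
from urllib.parse import urlsplit


def parse_stratum_url(url):
    """Split a stratum URL into (host, port).

    IPv6 literals must be bracketed, e.g. stratum+tcp://[2001:db8::1]:3333.
    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # hostname strips the square brackets from an IPv6 literal
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {url!r}")
    return parts.hostname, parts.port


if __name__ == "__main__":
    print(parse_stratum_url("stratum+tcp://[2001:db8::1]:3333"))
    print(parse_stratum_url("stratum+tcp://stratum.ozco.in:3333"))
```

The same function handles ordinary hostnames, so a pool list can mix both forms.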


Full changelog

- Provide rudimentary support for the balancing failover strategies with stratum
and GBT by switching pools silently on getwork requests.
- Convert remaining modminer and bfl uses of usleep to nmsleep.
- Convert libztex to nmsleep where possible.
- Convert unreliable usleep calls to nmsleep calls in ztex driver.
- Support workid for block submission on GBT pools that use it.
- Provide rudimentary support for literal ipv6 addresses when parsing stratum
URLs.
- Work around libcurl cflags not working on hacked up mingw installations on
windows.
- Only increase gpu engine speed by a larger step if the temperature is below
hysteresis instead of increasing it to max speed.
- Convert pool not responding and pool alive message on backup pools to verbose
level only since they mean a single failed getwork.
- Update work block on the longpoll work item before calling restart threads to
ensure all work but the longpoll work item gets discarded when we call
discard_stale from restart_threads.
- Do not attempt to remove the stratum share hash after unsuccessful submission
since it may already be removed by clear_stratum_shares.
- Check against a double for current pool diff.
- Support for fractional diffs and the classic just-below-1 share all FFs diff
target.
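
As a rough illustration of the last two entries, here is how a fractional share difficulty maps to a 256-bit target. This is a Python sketch under my own assumptions, not cgminer's code: the changelog says cgminer checks against a double, whereas this uses exact fractions, and the names `DIFF1_TRUNCATED`, `DIFF1_ALL_FFS` and `target_for_diff` are hypothetical. My reading of the "all FFs" entry is that it refers to the target with 32 leading zero bits followed by all ones, which works out to a difficulty of about 0.99998 relative to the usual truncated difficulty-1 target.

```python
from fractions import Fraction

# Difficulty-1 targets as 256-bit integers (names are illustrative).
DIFF1_TRUNCATED = 0xFFFF << 208   # 0x00000000FFFF0000...0000
DIFF1_ALL_FFS = (1 << 224) - 1    # 0x00000000FFFFFFFF...FFFF


def target_for_diff(diff, diff1=DIFF1_TRUNCATED):
    """Return the integer target for a (possibly fractional) difficulty."""
    d = Fraction(str(diff))  # exact decimal value, avoids float rounding
    if d <= 0:
        raise ValueError("difficulty must be positive")
    return int(diff1 / d)    # a lower difficulty gives a larger target


def share_meets_target(hash_value, target):
    """A share is valid when its hash, read as an integer, is <= target."""
    return hash_value <= target


if __name__ == "__main__":
    for diff in (2, 1, "0.5"):
        print(diff, hex(target_for_diff(diff)))
    # Ratio of the two difficulty-1 conventions, just below 1:
    print(float(Fraction(DIFF1_TRUNCATED, DIFF1_ALL_FFS)))
```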
legendary
Activity: 1540
Merit: 1001
Alrighty, I found some time to look at the code and think I've found something hopefully. I'll try to spin something up soon for you to try.
Try one of these builds again:
http://ck.kolivas.org/apps/cgminer/temp/

Interestingly if this is the actual bug, it should be possible to hit it on linux as well, but it seems to be much harder to hit because linux can easily tell when the socket has dropped out.

Confirmed, does not crash.  It also recovers properly once I plug the NIC back in.

I still see the same oddity with it saying the pool that just died is now active, even though the NIC is still disconnected:

Code:
[2012-11-18 06:38:18] Started cgminer 2.9.3
[2012-11-18 06:38:18] Loaded configuration file cgminer.conf
[2012-11-18 06:38:18] Probing for an alive pool
[2012-11-18 06:38:25] Accepted cc924ccb Diff 1/1 GPU 0 pool 0
[2012-11-18 06:38:28] Pool 2 http://us.ozco.in:8332 alive
[2012-11-18 06:38:29] Pool 3 http://eu2.ozco.in:8332 alive
[2012-11-18 06:38:30] Pool 4 http://eu.ozco.in:8332 alive
[2012-11-18 06:38:31] Pool 5 http://us1.eclipsemc.com:8337 alive
[2012-11-18 06:38:31] Pool 6 http://us2.eclipsemc.com:8337 alive
[2012-11-18 06:38:32] Pool 7 http://us3.eclipsemc.com:8337 alive
[2012-11-18 06:38:32] Pool 8 http://pool.50btc.com:8332 alive
[2012-11-18 06:38:33] Pool 9 http://localhost:8332 alive
[2012-11-18 06:38:33] Block change for http://localhost:8332 detection via http://stratum.ozco.in:3333 stratum
[2012-11-18 06:38:35] Accepted 7c19995b Diff 2/1 GPU 0 pool 0
[2012-11-18 06:38:42] Accepted 87a4d802 Diff 1/1 GPU 0 pool 0
[2012-11-18 06:39:08] Pool 0 stratum share submission failure    <------ NIC disconnected here
[2012-11-18 06:39:08] Lost 4 shares due to stratum disconnect on pool 0
[2012-11-18 06:39:29] Pool 0 http://stratum.ozco.in:3333 not responding!
[2012-11-18 06:39:29] Switching to http://us.ozco.in:8332
[2012-11-18 06:39:29] Long-polling activated for http://us.ozco.in:8332/LP
[2012-11-18 06:39:34] Pool 0 http://stratum.ozco.in:3333 alive
[2012-11-18 06:39:34] Switching to http://stratum.ozco.in:3333 <-------- still disconnected
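
The oddity in the log above (a pool reported "alive" seconds after "not responding!") can be picked out of a saved log with a small script. This is only a sketch: the regular expression is fitted to the log lines shown in this thread, and the 10-second window and the `suspicious_recoveries` name are arbitrary choices of mine, not anything cgminer provides.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "[2012-11-18 06:39:29] Pool 0 http://host:3333 not responding!"
LINE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"Pool (\d+) \S+ (not responding!|alive)"
)


def suspicious_recoveries(lines, window=timedelta(seconds=10)):
    """Return (pool, down_time, alive_time) for quick dead-to-alive flips."""
    down = {}   # pool number -> time it was last reported not responding
    hits = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        pool = int(m.group(2))
        if m.group(3) == "alive":
            since = down.pop(pool, None)
            if since is not None and when - since <= window:
                hits.append((pool, since, when))
        else:
            down[pool] = when
    return hits


if __name__ == "__main__":
    sample = [
        "[2012-11-18 06:38:28] Pool 2 http://us.ozco.in:8332 alive",
        "[2012-11-18 06:39:29] Pool 0 http://stratum.ozco.in:3333 not responding!",
        "[2012-11-18 06:39:34] Pool 0 http://stratum.ozco.in:3333 alive",
    ]
    for pool, since, when in suspicious_recoveries(sample):
        print(f"pool {pool}: down at {since}, alive again at {when}")
```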

I think you might have it nailed! :)

M
legendary
Activity: 952
Merit: 1000
This version doesn't crash. At least not within two minutes after disconnect.
Thanks :)
I unplugged my adapter, and it kept going. No crash. :D
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
This version doesn't crash. At least not within two minutes after disconnect.
Thanks :)
hero member
Activity: 675
Merit: 514
This version doesn't crash. At least not within two minutes after disconnect.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Alrighty, I found some time to look at the code and think I've found something hopefully. I'll try to spin something up soon for you to try.
Try one of these builds again:
http://ck.kolivas.org/apps/cgminer/temp/

Interestingly if this is the actual bug, it should be possible to hit it on linux as well, but it seems to be much harder to hit because linux can easily tell when the socket has dropped out.
legendary
Activity: 1540
Merit: 1001
Well it looks like in every case where there's enough detail, it seems that a null (or corrupt) pointer is being used.
(Well one clearly said it was a null pointer, the others could be that or a corrupt pointer)
The "--verbose fix" is most likely a memory rearrangement that hides the problem.

I tried my memory.h on windows yesterday and made the changes necessary to work on windows also. :)
Only problem of course is that a solitary null/corrupt pointer won't be picked up by my memory check code.

I guess I'll also have a go at trying to find it ... soon ... if I ever finish this MMQ USB code :P
... trying to get it to work on windows at the moment :(

Think it's related to it trying to use the pool after it's clearly dead? 

Code:
[2012-11-17 06:39:07] Pool 0 share submission failure  <----- disconnected NIC
[2012-11-17 06:39:28] Pool 0 http://stratum.ozco.in:3333 not responding!
[2012-11-17 06:39:28] Switching to http://us.ozco.in:8332 
[2012-11-17 06:39:28] Long-polling activated for http://us.ozco.in:8332/LP
[2012-11-17 06:39:28] Pool 0 http://stratum.ozco.in:3333 alive   <---------- ????
[2012-11-17 06:39:28] Switching to http://stratum.ozco.in:3333

M
legendary
Activity: 2576
Merit: 1186
I wonder if this crash affects BFGMiner...? (If not, then it's a simple matter of finding out which bugfix Con hasn't merged)
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Well it looks like in every case where there's enough detail, it seems that a null (or corrupt) pointer is being used.
(Well one clearly said it was a null pointer, the others could be that or a corrupt pointer)
The "--verbose fix" is most likely a memory rearrangement that hides the problem.

I tried my memory.h on windows yesterday and made the changes necessary to work on windows also. :)
Only problem of course is that a solitary null/corrupt pointer won't be picked up by my memory check code.

I guess I'll also have a go at trying to find it ... soon ... if I ever finish this MMQ USB code :P
... trying to get it to work on windows at the moment :(
legendary
Activity: 1540
Merit: 1001
Here's something interesting!  Using 2.9.3 with --verbose, it doesn't crash!!!  I still observe the same oddity as in 2.8.7 where it says pool 0 is not responding, then it says it's alive again.  Since the NIC isn't plugged in, it can't do anything, but it does not crash.  When I plug the NIC back in, it successfully recovers.

Using -T doesn't change anything, it still crashes.

Hopefully this helps!!

--verbose works! I unplugged my wireless adapter, and it disconnected. Waited about 60 seconds, plugged it back in, and it reconnected. No crashes. At least we have a temporary fix!
That is most interesting guys, and at least gives me a lead! I appreciate your testing and input. This is what free software and open development is about.
It's a long shot but try the latest builds here (without verbose):
http://ck.kolivas.org/apps/cgminer/temp/

Nope. :(

Code:
Dump Summary
------------
Dump File: cgminer_121117_183817.dmp : C:\downloads\Programs\cgminer_121117_183817.dmp
Last Write Time: 11/17/2012 6:38:17 PM
Process Name: cgminer.exe : C:\mining\cgminer\cgminer.exe
Process Architecture: x86
Exception Code: 0xC0000005
Exception Information: The thread tried to read from or write to a virtual address for which it does not have the appropriate access.

M
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Here's something interesting!  Using 2.9.3 with --verbose, it doesn't crash!!!  I still observe the same oddity as in 2.8.7 where it says pool 0 is not responding, then it says it's alive again.  Since the NIC isn't plugged in, it can't do anything, but it does not crash.  When I plug the NIC back in, it successfully recovers.

Using -T doesn't change anything, it still crashes.

Hopefully this helps!!

--verbose works! I unplugged my wireless adapter, and it disconnected. Waited about 60 seconds, plugged it back in, and it reconnected. No crashes. At least we have a temporary fix!
That is most interesting guys, and at least gives me a lead! I appreciate your testing and input. This is what free software and open development is about.
It's a long shot but try the latest builds here (without verbose):
http://ck.kolivas.org/apps/cgminer/temp/
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Here's something interesting!  Using 2.9.3 with --verbose, it doesn't crash!!!  I still observe the same oddity as in 2.8.7 where it says pool 0 is not responding, then it says it's alive again.  Since the NIC isn't plugged in, it can't do anything, but it does not crash.  When I plug the NIC back in, it successfully recovers.

Using -T doesn't change anything, it still crashes.

Hopefully this helps!!

--verbose works! I unplugged my wireless adapter, and it disconnected. Waited about 60 seconds, plugged it back in, and it reconnected. No crashes. At least we have a temporary fix!
That is most interesting guys, and at least gives me a lead! I appreciate your testing and input. This is what free software and open development is about.
legendary
Activity: 952
Merit: 1000
Here's something interesting!  Using 2.9.3 with --verbose, it doesn't crash!!!  I still observe the same oddity as in 2.8.7 where it says pool 0 is not responding, then it says it's alive again.  Since the NIC isn't plugged in, it can't do anything, but it does not crash.  When I plug the NIC back in, it successfully recovers.

Using -T doesn't change anything, it still crashes.

Hopefully this helps!!

M

--verbose works! I unplugged my wireless adapter, and it disconnected. Waited about 60 seconds, plugged it back in, and it reconnected. No crashes. At least we have a temporary fix!
legendary
Activity: 1973
Merit: 1007
Thanks for your dedication to this project. I'd like to request a new load balancing option. 1 device per pool. I'm concerned about pool stability once ASICs start rolling out.

From my experience with cgminer, load-balance and balance do not really distribute work evenly across pools. I'd like to define a list of 10 pools, and each device mines solely on one of the pools in that list. If one of the pools fails, the device attached will move on to the next pool in the list.

Thanks!

This can already be done just by running multiple instances of CGminer.  While it's not the most elegant solution, it is far easier than asking Con to add or rewrite code.
Yes that's already my backup plan, but I'd much rather monitor my miners from one instance vs n instances.
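
For reference, the multiple-instance workaround can be scripted. The Python sketch below only builds the command lines: one cgminer instance per device, each with the pool list rotated so a different pool comes first and the rest act as failover backups. It assumes the `-d`, `-o`, `-u` and `-p` options behave as documented in the cgminer README for your version (check `--help` first); the pool URLs, credentials and the `per_device_commands` name are placeholders.

```python
import shlex


def per_device_commands(pools, device_count, binary="cgminer"):
    """Build one command line per device.

    pools is a list of (url, user, password) tuples. Device N gets the
    list rotated by N, so its primary pool differs from its neighbours
    and the remaining pools follow as backups.
    """
    commands = []
    for dev in range(device_count):
        start = dev % len(pools)
        ordered = pools[start:] + pools[:start]
        args = [binary, "-d", str(dev)]
        for url, user, password in ordered:
            args += ["-o", url, "-u", user, "-p", password]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Placeholder pools, not real endpoints.
    pools = [
        ("http://pool-a.example:8332", "worker", "x"),
        ("http://pool-b.example:8332", "worker", "x"),
        ("http://pool-c.example:8332", "worker", "x"),
    ]
    for cmd in per_device_commands(pools, device_count=3):
        print(cmd)
```

This does not solve the monitoring complaint, since each instance still has to be watched separately.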
sr. member
Activity: 378
Merit: 250
Why is it so damn hot in here?
Thanks for your dedication to this project. I'd like to request a new load balancing option. 1 device per pool. I'm concerned about pool stability once ASICs start rolling out.

From my experience with cgminer, load-balance and balance do not really distribute work evenly across pools. I'd like to define a list of 10 pools, and each device mines solely on one of the pools in that list. If one of the pools fails, the device attached will move on to the next pool in the list.

Thanks!

This can already be done just by running multiple instances of CGminer.  While it's not the most elegant solution, it is far easier than asking Con to add or rewrite code.
legendary
Activity: 1973
Merit: 1007
Thanks for your dedication to this project. I'd like to request a new load balancing option. 1 device per pool. I'm concerned about pool stability once ASICs start rolling out.

From my experience with cgminer, load-balance and balance do not really distribute work evenly across pools. I'd like to define a list of 10 pools, and each device mines solely on one of the pools in that list. If one of the pools fails, the device attached will move on to the next pool in the list.

Thanks!
legendary
Activity: 1540
Merit: 1001
Well, anyone got any bright ideas about windows and stratum? Disabling threading didn't help, debug versions spit out no debug info, and it's not like I can just tell people to fuck windows off and start mining only on linux. Even after ASICs arrive, someone somewhere will still want to mine with windows.

Windows people, any ideas? Was there a version with stratum that was stable for you across multiple disconnections?

I haven't given up on this yet.  Using procdump, I was able to get this:

Code:
Dump Summary
------------
Dump File: cgminer_121117_062709.dmp : C:\downloads\Programs\cgminer_121117_062709.dmp
Last Write Time: 11/17/2012 6:27:09 AM
Process Name: cgminer.exe : C:\mining\cgminer\cgminer.exe
Process Architecture: x86
Exception Code: 0xC0000005
Exception Information: The thread tried to read from or write to a virtual address for which it does not have the appropriate access.
Heap Information: Not Present

I thought 2.8.7 behaved the same, but it doesn't.  With 2.9.3, I can fire up cgminer pointing to oz, unplug the NIC, and in about 30 seconds it crashes.  On 2.8.7, it recognizes the failure and tries to switch, but fails on the switch, as it suddenly says the same pool is alive (can't be, NIC is still unplugged), but it does recover after I plug the NIC back in:

Code:
cgminer version 2.8.7 - Started: [2012-11-17 06:38:33]
---------------------------------------------------------------------------
(5s):7.000 (avg):173.1Mh/s | Q:14  A:2  R:0  HW:0  E:14%  U:0.7/m
TQ: 2  ST: 1  SS: 0  DW: 10  NB: 2  LW: 12  GF: 1  RF: 1  WU: 2.5
Connected to stratum.ozco.in with stratum as user xxxxxxxx
Block: 04332ddfab79c06078e5c763...  Started: [06:41:13]  Best share: 50
---------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
GPU 0:  41.0C 1126RPM | 668.0M/173.1Mh/s | A:2 R:0 HW:0 U:  0.71/m I: 9
---------------------------------------------------------------------------

[2012-11-17 06:38:30] Started cgminer 2.8.7
[2012-11-17 06:38:30] Loaded configuration file cgminer.conf
[2012-11-17 06:38:31] Probing for an alive pool
[2012-11-17 06:38:34] Accepted 05179ea0 Diff 50/1 GPU 0 pool 0
[2012-11-17 06:38:34] Stratum from pool 0 requested work restart
[2012-11-17 06:38:34] Accepted 98e184f3 Diff 1/1 GPU 0 pool 0
[2012-11-17 06:38:36] Pool 2 http://us.ozco.in:8332 alive
[2012-11-17 06:38:38] API running in local read access mode on port 4028
[2012-11-17 06:38:41] Pool 5 http://us1.eclipsemc.com:8337 alive
[2012-11-17 06:38:42] Pool 6 http://us2.eclipsemc.com:8337 alive
[2012-11-17 06:38:42] Pool 7 http://us3.eclipsemc.com:8337 alive
[2012-11-17 06:38:42] Pool 8 http://pool.50btc.com:8332 alive
[2012-11-17 06:38:43] Pool 9 http://localhost:8332 alive
[2012-11-17 06:39:07] Pool 0 share submission failure  <----- disconnected NIC
[2012-11-17 06:39:28] Pool 0 http://stratum.ozco.in:3333 not responding!
[2012-11-17 06:39:28] Switching to http://us.ozco.in:8332  
[2012-11-17 06:39:28] Long-polling activated for http://us.ozco.in:8332/LP
[2012-11-17 06:39:28] Pool 0 http://stratum.ozco.in:3333 alive   <---------- ????
[2012-11-17 06:39:28] Switching to http://stratum.ozco.in:3333

Here's something interesting!  Using 2.9.3 with --verbose, it doesn't crash!!!  I still observe the same oddity as in 2.8.7 where it says pool 0 is not responding, then it says it's alive again.  Since the NIC isn't plugged in, it can't do anything, but it does not crash.  When I plug the NIC back in, it successfully recovers.

Using -T doesn't change anything, it still crashes.

Hopefully this helps!!

M

EDIT: It seems the problem isn't just with stratum.  I have two miners pointing to p2pool (which is LP), and both locked up (but didn't crash) when p2pool wasn't providing work fast enough.  All on 2.9.3.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Well, anyone got any bright ideas about windows and stratum? Disabling threading didn't help, debug versions spit out no debug info, and it's not like I can just tell people to fuck windows off and start mining only on linux. Even after ASICs arrive, someone somewhere will still want to mine with windows.

Windows people, any ideas? Was there a version with stratum that was stable for you across multiple disconnections?
legendary
Activity: 1795
Merit: 1208
This is not OK.
Incidentally, I'm now running a debug version (stock 2.9.3) on my router. Hopefully the core dump will tell us what's going on.

Of course, now I'm running the debug version, it's running perfectly.

::)
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4

Yasm is not needed anymore as CPU mining is no longer supported and the CPU code will be ripped out of Cgminer soon.

Referring to that windows-build.txt file you were talking about, did you modify the package config file for curl as directed?

Edit: It could be what C. Kolivas said, but I would check the package config file first.

Yes, about 5 successful builds ago. Then, upon trying to build 2.9.3 from git so that I could try stratum on Eclipse (still didn't work for me), I had that error. Altering the file mentioned in the link did in fact let me build it, even though I had built 2.7.5 with this same setup and no changes. It is running, but Eclipse US1, 2 and 3 are all marked dead on stratum.
Well - having written about 1200 lines of new code (still not complete) for direct USB support on the MMQ, latest git+my changes compiled fine first time on mingw for me (after adding libusb) on WinXP32 - no issues.
Odd.

(Edit: P.S. I used:

Code:
CFLAGS="-g -W -Wall" ./autogen.sh --enable-modminer --enable-ztex
make clean
make

like I do in linux)
