
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 189. (Read 5805531 times)

legendary
Activity: 1078
Merit: 1001
Are nanofurys supported by 3.6.6?
legendary
Activity: 1526
Merit: 1000
the grandpa of cryptos
the 3.6.6 version seems super buggy
member
Activity: 109
Merit: 10
I'm still getting reports from Mac users that 3.6.4 through 3.6.6 won't work with BFL Jalapeños, returning the following line:

Code:
BitForceSC detect (29:6) failed to initialise (incorrect device?)

This only occurs in the new operating system version, Mac OS X 10.9, aka Mavericks, which was a free upgrade for all Mac users.  OS 10.8 and lower work fine.

I wish I could do more to help test, but --usb-dump 0 and --verbose both do not return any additional information.  If there is anything else I can try, please let me know.  Perhaps I'll have them work back through cgminer releases and my Mac binaries (which had different libusb versions up until recently when you started including them).
newbie
Activity: 56
Merit: 0
ckolivas, I think you may be on to something here with the "to" build.  It seems to be the best so far, but not stable like the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed.  However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and it was followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0.  Once the errors from AMU 0 start, there appears to be a permanent condition such that they never stop.
Thanks. What was the initial timeout error? Did it say whether it was a write or a read?
All "write" errors in the whole log - including 10 from AMU 20 at 11:44 (probably a red-herring) then:

 [2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
 [2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)
 [2013-10-31 11:46:28] AMU 29 failure, disabling!
 [2013-10-31 11:46:28] Thread 29 being disabled

-4 really is the system telling us the device has effectively gone. That's quite different to timeouts, and in fact has been in every one of your logs (appreciate you having posted them by the way). I honestly don't know what this means but it's a very different failure to some kind of runaway process that never times out.
Yes it does seem strange.  There are timeouts from a device, then some *other* device goes AWOL without any notice whatsoever.  Anyway, I've now posted the entire logfile if you want to have a look at it.  See Edit* in previous post.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
ckolivas, I think you may be on to something here with the "to" build.  It seems to be the best so far, but not stable like the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed.  However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and it was followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0.  Once the errors from AMU 0 start, there appears to be a permanent condition such that they never stop.
Thanks. What was the initial timeout error? Did it say whether it was a write or a read?
All "write" errors in the whole log - including 10 from AMU 20 at 11:44 (probably a red-herring) then:

 [2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
 [2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)
 [2013-10-31 11:46:28] AMU 29 failure, disabling!
 [2013-10-31 11:46:28] Thread 29 being disabled

-4 really is the system telling us the device has effectively gone. That's quite different to timeouts, and in fact has been in every one of your logs (appreciate you having posted them by the way). I honestly don't know what this means but it's a very different failure to some kind of runaway process that never times out.
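
To make the distinction above easier to see in a long logfile, here is a minimal Python sketch that tallies USB errors per device, direction and error code. The numeric codes are the libusb values (-4 is LIBUSB_ERROR_NO_DEVICE, -7 is LIBUSB_ERROR_TIMEOUT). The function name `summarize` and the regular expression are assumptions for illustration only; the pattern is modelled on the single "SendWork usb write err" line quoted above, and other cgminer log lines may be formatted differently.

```python
import re
from collections import Counter

# libusb error codes (only the two discussed in this thread).
LIBUSB_ERRORS = {-4: "LIBUSB_ERROR_NO_DEVICE", -7: "LIBUSB_ERROR_TIMEOUT"}

# Assumed line shape, modelled on the quoted log line:
#   [timestamp] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<dev>[A-Z]+)\s*(?P<num>\d+)\s+\S+\s+"
    r"usb\s+(?P<dir>read|write)\s+err:\((?P<code>-?\d+)\)"
)


def summarize(lines):
    """Count USB errors keyed by (device, direction, error name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a USB read/write error line
        code = int(m.group("code"))
        name = LIBUSB_ERRORS.get(code, f"code {code}")
        device = f"{m.group('dev')} {m.group('num')}"
        counts[(device, m.group("dir"), name)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        " [2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE",
        " [2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)",
    ]
    for key, n in summarize(sample).items():
        print(key, n)
```

Run over a whole logfile, a summary like this would show at a glance whether a device only ever timed out or whether the system reported it as gone.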
newbie
Activity: 56
Merit: 0
ckolivas, I think you may be on to something here with the "to" build.  It seems to be the best so far, but not stable like the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed.  However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and it was followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0.  Once the errors from AMU 0 start, there appears to be a permanent condition such that they never stop.
Thanks. What was the initial timeout error? Did it say whether it was a write or a read?
All "write" errors in the whole log - including 10 from AMU 20 at 11:44 (probably a red-herring) then:

 [2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
 [2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)
 [2013-10-31 11:46:28] AMU 29 failure, disabling!
 [2013-10-31 11:46:28] Thread 29 being disabled

Edit: logfile here:
 https://dl.dropboxusercontent.com/u/44240170/logfile-to.txt
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
ckolivas, I think you may be on to something here with the "to" build.  It seems to be the best so far, but not stable like the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed.  However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and it was followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0.  Once the errors from AMU 0 start, there appears to be a permanent condition such that they never stop.
Thanks. What was the initial timeout error? Did it say whether it was a write or a read?
newbie
Activity: 56
Merit: 0
Looks like the "bug" - if we can call it that - started with the 3.6.x build:
    https://bitcointalk.org/index.php?topic=28402.msg3443712#msg3443712
Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.
In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe
After 10.75 hours the "to" version has one zombie, AMU5. I'm not sure how long it was there before I checked. I think the "to" version is the best of the recent candidates on my machine. I'll swap the zombie and keep the run going if I can.
ckolivas, I think you may be on to something here with the "to" build.  It seems to be the best so far, but not stable like the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed.  However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and it was followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0.  Once the errors from AMU 0 start, there appears to be a permanent condition such that they never stop.

Now here's the interesting bit: I unplugged the zombie AMU 29 and removed it for about 5-10 secs, then plugged it back in.  It hotplugged perfectly as AMU34 and started working and at the same time the errors for AMU 0 stopped.

Wait.... Now I have another zombie AMU 16, and repeated errors from AMU 0.   I have unplugged it and left it out and the errors from AMU 0 have stopped.  The zombie is still showing in the list (I suppose it would as no communication exists with it?)  Anyway, I've now plugged it back in and the zombie has disappeared, to be replaced by AMU 35.  All working normally again for a few minutes, but now more errors and another zombie AMU 16.  This is behaving almost like a memory leak (?) - once the errors start, they keep coming and re-occur more frequently.

I have unplugged AMU 16 - so only 33 erupters plugged in at the moment - and the errors on screen have stopped.

I suspect that if I plug it in again, the errors will restart, but I'll give it one more go. Hotplugged at 12:22 - Recognised as AMU 36 - ...and mining ok again.  I'll leave it while I have lunch and come back in an hour...

Edit: Stopped due to ongoing errors.

Logfile (without --debug) here:
  https://dl.dropboxusercontent.com/u/44240170/logfile-to.txt
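
The renumbering described above (the zombie AMU 29 coming back as AMU34, then later replugs appearing as AMU 35 and AMU 36) is consistent with a scheme where each hotplug gets the next unused number and old numbers are never handed out again. The Python sketch below is only a toy model of that behaviour. The class `DeviceTable` and its methods are hypothetical names, this is not cgminer's code, and the model does not explain a zombie AMU 16 showing up a second time.

```python
class DeviceTable:
    """Toy model: every (re)plug gets the next unused number."""

    def __init__(self, prefix="AMU"):
        self.prefix = prefix
        self.next_id = 0   # numbers only ever increase
        self.state = {}    # device number -> "alive" or "zombie"

    def hotplug(self):
        """Register a newly detected device under a fresh number."""
        dev_id = self.next_id
        self.next_id += 1
        self.state[dev_id] = "alive"
        return dev_id

    def mark_zombie(self, dev_id):
        """The device stopped responding but stays in the list."""
        self.state[dev_id] = "zombie"

    def replug(self, dev_id):
        """Physical unplug and replug: the old entry goes away and
        the same stick is registered again under a new number."""
        self.state.pop(dev_id, None)
        return self.hotplug()

    def label(self, dev_id):
        return f"{self.prefix}{dev_id}"


if __name__ == "__main__":
    table = DeviceTable()
    for _ in range(34):          # 34 erupters detected at startup: 0..33
        table.hotplug()
    table.mark_zombie(29)
    print(table.label(table.replug(29)))
    table.mark_zombie(16)
    print(table.label(table.replug(16)))
```

Under these assumptions the device numbers shown on screen stop matching physical hub positions after the first replug, which may be part of why the listings get hard to follow.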
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Looks like the "bug" - if we can call it that - started with the 3.6.x build:
    https://bitcointalksearch.org/topic/m.3443712


Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

I'd get errors left and right on my AMD box with anything after 3.3.4, I think... until 3.6.4.  So what fixed problems for me caused problems for others.
Since yours got better moving to 3.6.x, can you also test to see if the -to binary does not make things worse for you please?

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

Thanks!
legendary
Activity: 1450
Merit: 1013
Cryptanalyst castrated by his government, 1952
Looks like the "bug" - if we can call it that - started with the 3.6.x build:
    https://bitcointalk.org/index.php?topic=28402.msg3443712#msg3443712


Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

After 10.75 hours the "to" version has one zombie, AMU5. I'm not sure how long it was there before I checked. I think the "to" version is the best of the recent candidates on my machine. I'll swap the zombie and keep the run going if I can.

Edit: the swap of AMU5 worked fine, but I just did a double-take looking at the allocations. The AMUs usually start out as AMU 0-12, with BAL0 somewhere in the middle or at the end. Now I have AMU0-4, BAL0, AMU7-8, 11, 14-17 - yikes it just changed again. It's saying AMU5 has gone zombie but there was no AMU5 a moment ago because it had been reallocated when the first zombie showed up. There is an LED on solid, but it's not the erupter that I just swapped. Bottom line - apparent allocation anomalies, and I get the feeling that it might have recovered from errors on its own and done some reallocations without intervention. I'll swap out the new zombie AMU5 and keep going if possible. Total run time now is 11 hours.

Edit: Getting weird. The "second AMU5" was reallocated to AMU18 when it was restarted - as expected. A few seconds later AMU3 went zombie. I think this is the same physical erupter as the original AMU5, which I restarted without moving it to a different location on the hub. Maybe I haven't had enough coffee but I thought it would be known as AMU17. Bottom line - the run seems to be getting flaky now, but my observations may include human error. I'll try to keep the run going though.

Edit: the AMU3 zombie was restarted as AMU19, as expected. Now showing AMU 0 1 2 4 7 8 11 14-19, with BAL 0 after AMU 4.

Edit: about an hour later - the display now says that AMU3 is a zombie - but there was no AMU3 (see list above). The display also shows AMU15 has gone to hashrate zero. Two LEDs are on. I'll keep it all going if possible.

Edit: I unplugged an erupter with its LED on. The display said AMU5 had gone zombie. Five? There was no five. AMU15 had morphed to five, it seems, as it was no longer showing at zero hashrate. Plugged it back in and it got reallocated to AMU20. The list is now AMU 0 1 2 4 BAL0 AMU 7 8 11 14 17-20 then AMU3 showing as a zombie. I imagine 3 will get reallocated to 21 when I restart it.

Edit: Yes, 3 became 21 when plugged back in, and it is slowly climbing back to full hash rate.

Edit: Another hour or so later I find five zombies, listed as AMU 3,5,6,9 and 10 - none of those numbers were in use (see above). Still reported running are AMU 0, 1, 2, 4, 7, 8, 11, 17 and the BAL. I'll start the run from scratch, I think, sticking with "to" candidate.

Edit: Since the restart, the "to" candidate has run for seven hours with no errors or anomalies, still going strong.




newbie
Activity: 56
Merit: 0
3.5.1 was stable overnight with my 34 erupters, although in the morning I had "lost" AMUs 17-12 which had been replaced by 34-37, but no zombies and my average hash-rate on Slush over ten rounds was the max I would have expected.

Now trialling the "to" version. Fingers crossed.
hero member
Activity: 742
Merit: 500
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
ckolivas

Is this normal for 3.6.6-1?
Other versions don't do this.
This is not solo mining. It happens on any scrypt coin, and not only for me.
3.6.4 works fine
 

Yeah I probably screwed that up for $scryptcoins when I fixed it for bitcoin.
hero member
Activity: 546
Merit: 500
Whatever it is, I'd think it's right, just different. It looks like each device recognizing the block.
hero member
Activity: 742
Merit: 500
ckolivas

Is this normal for 3.6.6-1?
Other versions don't do this.
This is not solo mining. It happens on any scrypt coin, and not only for me.
3.6.4 works fine
 
legendary
Activity: 1450
Merit: 1013
Cryptanalyst castrated by his government, 1952
Looks like the "bug" - if we can call it that - started with the 3.6.x build:
    https://bitcointalk.org/index.php?topic=28402.msg3443712#msg3443712


Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

Running "to" now after shutting down "rs10 with debug/log". As before, when debug is on, cgminer runs fine with no zombies or errors of any kinds - no LED drama either, just normal pinpoint flashes.

Only 5 minutes into "to" but it is well-behaved so far.

Edit: one hour in and the "to" version is perfect so far. It will run more-or-less unattended now for about 10 hours if possible.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Looks like the "bug" - if we can call it that - started with the 3.6.x build:
    https://bitcointalksearch.org/topic/m.3443712


Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe
legendary
Activity: 1540
Merit: 1001
Looks like the "bug" - if we can call it that - started with the 3.6.x build:
    https://bitcointalksearch.org/topic/m.3443712


Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

I'd get errors left and right on my AMD box with anything after 3.3.4, I think... until 3.6.4.  So what fixed problems for me caused problems for others.

M
legendary
Activity: 1540
Merit: 1001
I'd like to think some other variable is at play here.  I've run every single version of cgminer from 3.3.1 to 3.5.1 and have never had a single problem.  I ran it on a Windows 7 machine 24/7 with two 7970 video cards and six block erupters.  The only down time I've had is from halting for software updates and maybe an average of one restart a month for system updates.  After 3.5.1 I moved the block erupters to a raspberry pi and it has almost literally 100% up time.  The only times it has stopped mining are when I halted the process to start the latest version of cgminer.  It's run every version between 3.5.1 all the way up to 3.6.6 without issue.

There almost has to be some key variable...os, wiring, electrical, hardware defect, environment...

Chad

I have two win7x64 machines. One is AMD based, somewhat older technology (was a Sempron single core, is now an Athlon II dual core @ 3.4GHz with 4 GB memory), doesn't have any USB 3.0 ports.  3.6.4 works fine with my 36 erupters.  No problems aside from the possible winsock error, but recently I've been restarting it for other reasons before it runs more than a few days, so I don't know if the winsock error is still around or not.

I move the same 36 erupters and same hubs to the other machine, Intel based, core i7, 12 gb of memory, with USB 3.0 ports.  I almost immediately get zombies and IO errors.  Neither of which I get on the older machine.  Oh, and the erupters aren't detected at all on the USB 3.0 ports.  I have to plug them into the 2.0 ports to see them.

I think it's hardware based.  Chipset maybe?  Something beyond your control?

M
legendary
Activity: 1450
Merit: 1013
Cryptanalyst castrated by his government, 1952
I'll just leave 3.5.1 running until morning and see what ckolivas makes of all this.

I just noticed I went straight from 3.5.0 to 3.6.2 "way back then" - never tried 3.5.1 along the way.

I've been running candidate rs10 all day. It gets occasional zombies but it seems as good as anything for the moment. Maybe I'll add debugging to it and see if that keeps the zombies from showing up the way it used to do.

The pressure mounts!