OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 189.

forzendiablo

legendary

Activity: 1526

Merit: 1000

the grandpa of cryptos

The 3.6.6 version seems super buggy.

Karin

member

Activity: 109

Merit: 10

I'm still getting reports from Mac users that 3.6.4 through 3.6.6 won't work with BFL Jalapeños, returning the following line:

Code:

BitForceSC detect (29:6) failed to initialise (incorrect device?)

This only occurs in the new operating system version, Mac OS X 10.9, aka Mavericks, which was a free upgrade for all Mac users. OS X 10.8 and lower work fine.

I wish I could do more to help test, but neither --usb-dump 0 nor --verbose returns any additional information. If there is anything else I can try, please let me know. Perhaps I'll have them work back through cgminer releases and my Mac binaries (which had a different libusb version up until recently, when you started including them).

jmc1517

newbie

Activity: 56

Merit: 0

Quote from: -ck on October 31, 2013, 07:49:22 AM

-4 really is the system telling us the device has effectively gone. That's quite different to timeouts, and in fact has been in every one of your logs (appreciate you having posted them by the way). I honestly don't know what this means but it's a very different failure to some kind of runaway process that never times out.

Yes, it does seem strange. There are timeouts from one device, then some *other* device goes AWOL without any notice whatsoever. Anyway, I've now posted the entire logfile if you want to have a look at it; see the edit in my previous post.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: jmc1517 on October 31, 2013, 07:46:47 AM

All "write" errors in the whole log - including 10 from AMU 20 at 11:44 (probably a red herring) - then:

[2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
[2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)
[2013-10-31 11:46:28] AMU 29 failure, disabling!
[2013-10-31 11:46:28] Thread 29 being disabled

-4 really is the system telling us the device has effectively gone. That's quite different to timeouts, and in fact has been in every one of your logs (appreciate you having posted them by the way). I honestly don't know what this means but it's a very different failure to some kind of runaway process that never times out.
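For anyone following along: the negative numbers in these logs are libusb 1.0 return codes (the `enum libusb_error` values in `libusb.h`), which is why -4 reads as "device gone" while a timeout would be -7. A minimal Python sketch of that lookup; the `classify` helper and its three bucket names are hypothetical, for illustration only, and not something cgminer itself does.

```python
# libusb 1.0 error codes, as defined by enum libusb_error in libusb.h.
LIBUSB_ERRORS = {
    -1: "LIBUSB_ERROR_IO",
    -2: "LIBUSB_ERROR_INVALID_PARAM",
    -3: "LIBUSB_ERROR_ACCESS",
    -4: "LIBUSB_ERROR_NO_DEVICE",
    -5: "LIBUSB_ERROR_NOT_FOUND",
    -6: "LIBUSB_ERROR_BUSY",
    -7: "LIBUSB_ERROR_TIMEOUT",
    -8: "LIBUSB_ERROR_OVERFLOW",
    -9: "LIBUSB_ERROR_PIPE",
    -10: "LIBUSB_ERROR_INTERRUPTED",
    -11: "LIBUSB_ERROR_NO_MEM",
    -12: "LIBUSB_ERROR_NOT_SUPPORTED",
    -99: "LIBUSB_ERROR_OTHER",
}


def classify(code):
    """Hypothetical triage of a libusb return code.

    'device_gone' -> the OS says the device is no longer there (-4)
    'timeout'     -> the transfer did not complete in time (-7)
    'other'       -> any other (or unknown) code
    """
    name = LIBUSB_ERRORS.get(code, "UNKNOWN")
    if name == "LIBUSB_ERROR_NO_DEVICE":
        return "device_gone"
    if name == "LIBUSB_ERROR_TIMEOUT":
        return "timeout"
    return "other"


for code in (-4, -7):
    print(code, LIBUSB_ERRORS[code], classify(code))
```

The point of separating the buckets is the one made above: a -4 is the system reporting the device has effectively disappeared, which is a different failure from a transfer that merely times out.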

jmc1517

newbie

Activity: 56

Merit: 0

Quote from: -ck on October 31, 2013, 07:39:19 AM

Thanks. What was the initial timeout error? Did it say whether it was a write or a read?

All "write" errors in the whole log - including 10 from AMU 20 at 11:44 (probably a red herring) - then:

[2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
[2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)
[2013-10-31 11:46:28] AMU 29 failure, disabling!
[2013-10-31 11:46:28] Thread 29 being disabled

Edit: logfile here:
https://dl.dropboxusercontent.com/u/44240170/logfile-to.txt
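-ck's question (which error came first, and was it a read or a write?) can be answered mechanically from a saved copy of the log. A rough Python sketch, assuming every usb error line has the same shape as the SendWork line quoted above; the regex, the `first_errors` name, and the assumption that read failures are logged as "usb read err" are all hypothetical, not taken from cgminer's source.

```python
import re

# Shaped after the excerpt above, e.g.
# [2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE
# ASSUMPTION: read failures use the same wording with "read" instead of "write".
USB_ERR = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:]+)\] (?P<dev>[A-Z]+ ?\d+) \S+ usb "
    r"(?P<dir>read|write) err:\((?P<code>-?\d+)\) (?P<name>\S+)"
)


def first_errors(lines):
    """Map each device (e.g. 'AMU29') to its first usb error:
    (timestamp, 'read' or 'write', numeric code, libusb error name).
    Devices appear in the order their first error was logged."""
    first = {}
    for line in lines:
        m = USB_ERR.match(line.strip())
        if not m:
            continue  # not a usb read/write error line
        dev = m["dev"].replace(" ", "")
        if dev not in first:
            first[dev] = (m["ts"], m["dir"], int(m["code"]), m["name"])
    return first


# The four lines quoted in this post.
SAMPLE = [
    "[2013-10-31 11:46:28] AMU 29 SendWork usb write err:(-4) LIBUSB_ERROR_NO_DEVICE",
    "[2013-10-31 11:46:28] AMU29: Comms error (werr=-4 amt=0)",
    "[2013-10-31 11:46:28] AMU 29 failure, disabling!",
    "[2013-10-31 11:46:28] Thread 29 being disabled",
]
print(first_errors(SAMPLE))
```

To scan a whole log, pass an open file instead, e.g. `first_errors(open("logfile-to.txt"))`, using whatever name the downloaded copy was saved under.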

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: jmc1517 on October 31, 2013, 07:31:17 AM

ckolivas, I think you may be on to something here with the "to" build. It seems to be the best so far, but not as stable as the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed. However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0. Once the errors from AMU 0 start, the condition appears to be permanent, such that they never stop.

Thanks. What was the initial timeout error? Did it say whether it was a write or a read?

jmc1517

newbie

Activity: 56

Merit: 0

Quote from: aigeezer on October 31, 2013, 06:50:18 AM

After 10.75 hours the "to" version has one zombie, AMU5. I'm not sure how long it was there before I checked. I think the "to" version is the best of the recent candidates on my machine. I'll swap the zombie and keep the run going if I can.

ckolivas, I think you may be on to something here with the "to" build. It seems to be the best so far, but not as stable as the 3.5.1 build.

I had it running for 1 hour 20 mins before a zombie AMU 29 appeared, which in itself is longer than the other tests have managed. However, I checked the logfile and found that the zombie was preceded by LIBUSB_ERROR_TIMEOUTs from AMU 20 and followed by repeated LIBUSB_ERROR_TIMEOUTs from AMU 0. Once the errors from AMU 0 start, the condition appears to be permanent, such that they never stop.

Now here's the interesting bit: I unplugged the zombie AMU 29 and removed it for about 5-10 secs, then plugged it back in. It hotplugged perfectly as AMU34 and started working and at the same time the errors for AMU 0 stopped.

Wait... Now I have another zombie AMU 16, and repeated errors from AMU 0. I have unplugged it and left it out, and the errors from AMU 0 have stopped. The zombie is still showing in the list (I suppose it would, as no communication exists with it?). Anyway, I've now plugged it back in and the zombie has disappeared, replaced by AMU 35. All working normally again for a few minutes, but now more errors and another zombie AMU 16. This is behaving almost like a memory leak (?): once the errors start, they keep coming and recur more frequently.

I have unplugged AMU 16, so only 33 erupters are plugged in at the moment, and the errors on screen have stopped.

I suspect that if I plug it in again, the errors will restart, but I'll give it one more go. Hotplugged at 12:22 - Recognised as AMU 36 - ...and mining ok again. I'll leave it while I have lunch and come back in an hour...

Edit: Stopped due to ongoing errors.

Logfile (without --debug) here:
https://dl.dropboxusercontent.com/u/44240170/logfile-to.txt
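The impression that errors "recur more frequently" once they start can be checked against the timestamps rather than by eye. A crude Python sketch, assuming the `[YYYY-MM-DD HH:MM:SS]` stamp format seen in the log lines in this thread; `error_gaps` and `is_accelerating` are hypothetical helpers, and "each gap no longer than the one before" is only a rough stand-in for a proper trend test.

```python
from datetime import datetime


def error_gaps(timestamps):
    """Seconds between consecutive error timestamps ('YYYY-MM-DD HH:MM:SS'),
    after sorting them into chronological order."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def is_accelerating(gaps):
    """Crude check: True if there are at least two gaps and each one is
    no longer than the gap before it (i.e. the errors are bunching up)."""
    return len(gaps) >= 2 and all(b <= a for a, b in zip(gaps, gaps[1:]))
```

Usage would be to collect the timestamps of a single device's error lines (AMU 0's timeouts, say) from the saved log and run `is_accelerating(error_gaps(stamps))` on them; a noisy series can fail this strict check even when the overall trend is downward.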

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: mdude77 on October 30, 2013, 05:13:33 PM

I'd get errors left and right on my AMD box with anything after 3.3.4, I think... until 3.6.4. So what fixed problems for me caused problems for others.

Since yours got better moving to 3.6.x, can you also test to see if the -to binary does not make things worse for you please?

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

Thanks!

aigeezer

legendary

Activity: 1450

Merit: 1013

Cryptanalyst castrated by his government, 1952

Quote from: -ck on October 30, 2013, 07:57:41 PM

In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

After 10.75 hours the "to" version has one zombie, AMU5. I'm not sure how long it was there before I checked. I think the "to" version is the best of the recent candidates on my machine. I'll swap the zombie and keep the run going if I can.

Edit: the swap of AMU5 worked fine, but I just did a double-take looking at the allocations. The AMUs usually start out as AMU 0-12, with BAL0 somewhere in the middle or at the end. Now I have AMU0-4, BAL0, AMU7-8, 11, 14-17 - yikes it just changed again. It's saying AMU5 has gone zombie but there was no AMU5 a moment ago because it had been reallocated when the first zombie showed up. There is an LED on solid, but it's not the erupter that I just swapped. Bottom line - apparent allocation anomalies, and I get the feeling that it might have recovered from errors on its own and done some reallocations without intervention. I'll swap out the new zombie AMU5 and keep going if possible. Total run time now is 11 hours.

Edit: Getting weird. The "second AMU5" was reallocated to AMU18 when it was restarted - as expected. A few seconds later AMU3 went zombie. I think this is the same physical erupter as the original AMU5, which I restarted without moving it to a different location on the hub. Maybe I haven't had enough coffee but I thought it would be known as AMU17. Bottom line - the run seems to be getting flaky now, but my observations may include human error. I'll try to keep the run going though.

Edit: the AMU3 zombie was restarted as AMU19, as expected. Now showing AMU 0 1 2 4 7 8 11 14-19, with BAL 0 after AMU 4.

Edit: about an hour later - the display now says that AMU3 is a zombie - but there was no AMU3 (see list above). The display also shows AMU15 has gone to hashrate zero. Two LEDs are on. I'll keep it all going if possible.

Edit: I unplugged an erupter with its LED on. The display said AMU5 had gone zombie. Five? There was no five. AMU15 had morphed to five, it seems, as it was no longer showing at zero hashrate. Plugged it back in and it got reallocated to AMU20. The list is now AMU 0 1 2 4 BAL0 AMU 7 8 11 14 17-20 then AMU3 showing as a zombie. I imagine 3 will get reallocated to 21 when I restart it.

Edit: Yes, 3 became 21 when plugged back in, and it is slowly climbing back to full hash rate.

Edit: Another hour or so later I find five zombies, listed as AMU 3, 5, 6, 9 and 10 - none of those numbers were in use (see above). Still reported running are AMU 0, 1, 2, 4, 7, 8, 11, 17 and the BAL. I'll start the run from scratch, I think, sticking with the "to" candidate.

Edit: Since the restart, the "to" candidate has run for seven hours with no errors or anomalies, still going strong.
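Keeping these hand-copied device lists straight is half the battle. A small Python sketch for diffing two snapshots written the way they appear in this post; `parse_ids` is a hypothetical helper, and non-numeric tokens such as `AMU` and `BAL0` are simply ignored.

```python
def parse_ids(spec):
    """Expand a device list like 'AMU 0 1 2 4 7 8 11 14-19' into a set of ints.
    Ranges 'a-b' are inclusive; tokens like 'AMU' or 'BAL0' are skipped."""
    ids = set()
    for tok in spec.replace(",", " ").split():
        if "-" in tok:
            lo, hi = tok.split("-", 1)
            if lo.isdigit() and hi.isdigit():
                ids.update(range(int(lo), int(hi) + 1))
        elif tok.isdigit():
            ids.add(int(tok))
    return ids


# Two snapshots copied from the edits above.
before = parse_ids("AMU 0 1 2 4 7 8 11 14-19")
after = parse_ids("AMU 0 1 2 4 BAL0 AMU 7 8 11 14 17-20")
print("dropped:", sorted(before - after))
print("new:", sorted(after - before))
```

With those two lists it reports 15 and 16 as dropped and 20 as new, which makes it easier to see which numbers silently vanished between checks.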

jmc1517

newbie

Activity: 56

Merit: 0

3.5.1 was stable overnight with my 34 erupters, although in the morning I had "lost" AMUs 17-12 which had been replaced by 34-37, but no zombies and my average hash-rate on Slush over ten rounds was the max I would have expected.

Now trialling the "to" version. Fingers crossed.

paladin281978

hero member

Activity: 742

Merit: 500

I'm staying on 3.6.4.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: paladin281978 on October 30, 2013, 08:12:08 PM

ckolivas

Is this normal for 3.6.6-1?
Other versions don't do this.
This isn't solo mining. It's the same on any scrypt coin, and not only for me.
3.6.4 works fine.

Yeah I probably screwed that up for $scryptcoins when I fixed it for bitcoin.

techman05

hero member

Activity: 546

Merit: 500

Whatever it is, I'd think it's right, just different. It looks like each device is recognizing the block.

paladin281978

hero member

Activity: 742

Merit: 500

ckolivas

Is this normal for 3.6.6-1?
Other versions don't do this.
This isn't solo mining. It's the same on any scrypt coin, and not only for me.
3.6.4 works fine.

aigeezer

legendary

Activity: 1450

Merit: 1013

Cryptanalyst castrated by his government, 1952

Quote from: -ck on October 30, 2013, 07:57:41 PM

In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

Running "to" now after shutting down "rs10 with debug/log". As before, when debug is on, cgminer runs fine with no zombies or errors of any kind - no LED drama either, just normal pinpoint flashes.

Only 5 minutes into "to" but it is well-behaved so far.

Edit: one hour in and the "to" version is perfect so far. It will run more-or-less unattended now for about 10 hours if possible.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: aigeezer on October 30, 2013, 02:26:25 PM

Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

In that case let me try

http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe

mdude77

legendary

Activity: 1540

Merit: 1001

Quote from: aigeezer on October 30, 2013, 02:26:25 PM

Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

I'd get errors left and right on my AMD box with anything after 3.3.4, I think... until 3.6.4. So what fixed problems for me caused problems for others.

M

mdude77

legendary

Activity: 1540

Merit: 1001

Quote from: chadtn on October 30, 2013, 12:30:58 PM

I'd like to think some other variable is at play here. I've run every single version of cgminer from 3.3.1 to 3.5.1 and have never had a single problem. I ran it on a Windows 7 machine 24/7 with two 7970 video cards and six block erupters. The only downtime I've had is from halting for software updates and maybe an average of one restart a month for system updates. After 3.5.1 I moved the block erupters to a Raspberry Pi, and it has had almost literally 100% uptime. The only times it has stopped mining are when I halted the process to start the latest version of cgminer. It's run every version from 3.5.1 all the way up to 3.6.6 without issue.

There almost has to be some key variable...os, wiring, electrical, hardware defect, environment...

Chad

I have two Win7 x64 machines. One is AMD based, somewhat older technology (it was a Sempron single core, is now an Athlon II dual core @ 3.4GHz with 4 GB memory), and doesn't have any USB 3.0 ports. 3.6.4 works fine with my 36 erupters. No problems aside from the possible winsock error, but recently I've been restarting it for other reasons before it runs more than a few days, so I don't know if the winsock error is still around or not.

I move the same 36 erupters and same hubs to the other machine, Intel based, Core i7, 12 GB of memory, with USB 3.0 ports. I almost immediately get zombies and IO errors, neither of which I get on the older machine. Oh, and the erupters aren't detected at all on the USB 3.0 ports. I have to plug them into the 2.0 ports to see them.

I think it's hardware based. Chipset maybe? Something beyond your control?

M

aigeezer

legendary

Activity: 1450

Merit: 1013

Cryptanalyst castrated by his government, 1952

Quote from: jmc1517 on October 30, 2013, 04:13:10 PM

I'll just leave 3.5.1 running until morning and see what ckolivas makes of all this.

I just noticed I went straight from 3.5.0 to 3.6.2 "way back then" - never tried 3.5.1 along the way.

I've been running candidate rs10 all day. It gets occasional zombies but it seems as good as anything for the moment. Maybe I'll add debugging to it and see if that keeps the zombies from showing up the way it used to do.

The pressure mounts!

jmc1517

newbie

Activity: 56

Merit: 0

Quote from: aigeezer on October 30, 2013, 02:26:25 PM

Quote from: jmc1517 on October 30, 2013, 02:21:14 PM

Looks like the "bug" - if we can call it that - started with the 3.6.x build:
https://bitcointalksearch.org/topic/m.3443712

Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.

Well - hopefully it's the same problem! That's how I stumbled upon this thread in the first place, remember? We seem to confirm each other's findings re the test-builds, i.e. we both consistently get zombies and/or decaying hash-rates on the builds we've tested, but perhaps I see them more quickly as I'm running 34 erupters. Anyway, at the moment I'm happy with 3.5.1. It's been running about 4 hours in total and no errors, so hopefully it will survive overnight and become my new "stable" build.

As an aside, I have two 7970s in my machine too, but they are happily mining LTC using another instance of 3.3.1 (I'll upgrade that to 3.5.1 at some point, even though it's not strictly necessary).

I haven't bothered trialling the latest test-builds. I'll just leave 3.5.1 running until morning and see what ckolivas makes of all this.
