OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 581.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Darn, I didn't try benchmark mode. Presumably it broke with the networking upgrade.

ancow

full member

Activity: 373

Merit: 100

When running the latest cgminer with "cgminer -k poclbm --benchmark", it segfaults. Gdb says (bt with current master):

Code:

Program terminated with signal 11, Segmentation fault.
#0  0x000000000040a099 in reap_curl (pool=0xd5fab0) at cgminer.c:4036
4036            list_for_each_entry_safe(ent, iter, &pool->curlring, node) {
(gdb) bt
#0  0x000000000040a099 in reap_curl (pool=0xd5fab0) at cgminer.c:4036
#1  watchpool_thread (userdata=) at cgminer.c:4061
#2  0x00007f116e9edb50 in start_thread (arg=) at pthread_create.c:304
#3  0x00007f116d9fc90d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

According to a quick git bisect, the following commit introduced the first segfault:

Code:

4cd973264f6426c9b3db73b420bb33768f1f3d90 is the first bad commit
commit 4cd973264f6426c9b3db73b420bb33768f1f3d90
Author: Con Kolivas 
Date:   Thu Apr 26 23:29:21 2012 +1000

    Create discrete persistent submit and get work threads per pool, thus allowing all submitworks belonging to the same pool to reuse the same curl handle, and all getworks to reuse their own handle.
    Use separate handles for submission to not make getwork potentially delay share submission which is time critical.
    This will allow much more reusing of persistent connections instead of opening new ones which can flood routers.
    This mandated a rework of the extra longpoll support (for when pools are switched) and this is managed by restarting longpoll cleanly and waiting for a thread join.

However, that segfault has a different backtrace. According to git bisect, that exact backtrace was introduced by:

Code:

85008a78539f79bbec1f25fcffafe6a232e2597b is the first bad commit
commit 85008a78539f79bbec1f25fcffafe6a232e2597b
Author: ckolivas 
Date:   Wed May 2 10:12:07 2012 +1000

    Reap curls that are unused for over a minute.
    This allows connections to be closed, thereby allowing the number of curl handles to always be the minimum necessary to not delay networking.

However, you should probably do some further testing... ;)

The backtrace produced by 4cd973264f6426c9b3db73b420bb33768f1f3d90 is:

Code:

Program terminated with signal 11, Segmentation fault.
#0  __pthread_mutex_lock (mutex=0x18) at pthread_mutex_lock.c:50
50      pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x18) at pthread_mutex_lock.c:50
#1  0x00000000004130e3 in mutex_lock (lock=0x18) at miner.h:410
#2  tq_push (tq=0x0, data=) at util.c:625
#3  0x0000000000409705 in workio_get_work (wc=0x3621ff0) at cgminer.c:2068
#4  workio_thread (userdata=0x17414c0) at cgminer.c:3105
#5  0x00007ff96d476b50 in start_thread (arg=) at pthread_create.c:304
#6  0x00007ff96c48590d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
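
A possible reading of the backtrace above: frame #2 shows `tq_push` called with `tq=0x0`, and the lock address `0x18` in frames #0 and #1 is what you would get from taking the address of a struct member through a NULL pointer. The sketch below models that arithmetic with Python's ctypes. The `ThreadQ` layout (two list pointers, a bool, then a 40-byte mutex) is an assumption for illustration on a 64-bit build, not copied from cgminer's source, and `member_address` is a hypothetical helper.

```python
import ctypes

class ListHead(ctypes.Structure):
    # Doubly linked list node: two pointers (assumed layout).
    _fields_ = [("next", ctypes.c_void_p), ("prev", ctypes.c_void_p)]

class ThreadQ(ctypes.Structure):
    # Hypothetical stand-in for the queue struct handed to tq_push().
    _fields_ = [
        ("q", ListHead),
        ("frozen", ctypes.c_bool),
        # 40 bytes with 8-byte alignment, standing in for a pthread mutex.
        ("mutex", ctypes.c_uint64 * 5),
    ]

def member_address(base, struct_type, member):
    """Address of struct_type.member for a struct located at address base."""
    return base + getattr(struct_type, member).offset

null_tq = 0x0  # tq=0x0 as shown in frame #2
lock_addr = member_address(null_tq, ThreadQ, "mutex")
print(hex(lock_addr))
```

Under these layout assumptions on a 64-bit Python, the printed address is 0x18, which would be consistent with a queue pointer that was never initialised in benchmark mode rather than a corrupted mutex.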

-ck

Quote from: af_newbie on May 03, 2012, 06:57:00 AM

Con, sorry, maybe I was not clear. Every time you start up, you look for bin files named "cgminer-version-bin-file-name.bin". If none is found, you create a bin and put the current cgminer version in the file name. When the user upgrades to the next version of cgminer, you won't find the bin because you'll be looking for "my-current-cgminer-version-bin-file-name.bin", so you recreate it. When they run cgminer again, you already have a bin that matches your current cgminer version, so you use it.

That way you don't recreate bins every time someone runs cgminer, only when they upgrade to a new cgminer version.

Makes sense?

Yes but then that would make it hard for someone to do what this user just did: copy over the bin files from 2.3.6 to get the benefit from before they fucked up their SDK installation.
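
The naming scheme proposed in the quote can be sketched roughly as follows. This is only an illustration: `bin_name`, `load_or_build`, the file name pattern and the `build` callback (standing in for the slow OpenCL kernel compile) are all hypothetical, not cgminer's actual code.

```python
import os
import tempfile

def bin_name(cgminer_version, kernel_name):
    # Hypothetical naming scheme: the cgminer version is part of the file name.
    return f"cgminer-{cgminer_version}-{kernel_name}.bin"

def load_or_build(cache_dir, cgminer_version, kernel_name, build):
    """Return (binary, rebuilt). build() stands in for the slow kernel compile."""
    path = os.path.join(cache_dir, bin_name(cgminer_version, kernel_name))
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read(), False      # cache hit: fast startup
    binary = build()
    with open(path, "wb") as f:
        f.write(binary)
    return binary, True                 # cache miss: compiled and cached

with tempfile.TemporaryDirectory() as d:
    build = lambda: b"kernel-binary"
    first = load_or_build(d, "2.3.6", "poclbm", build)     # no cache yet
    second = load_or_build(d, "2.3.6", "poclbm", build)    # same version
    upgraded = load_or_build(d, "2.4.0", "poclbm", build)  # after an upgrade
    print(first[1], second[1], upgraded[1])
```

The trade-off raised in the reply shows up here too: under such a scheme, a bin copied over from a 2.3.6 folder would not be picked up by 2.4.0 unless it was renamed to match.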

OpytZabilParol

jr. member

Activity: 63

Merit: 1

Thank you.
I copied the .bin file over from cgminer-2.3.6, replacing the new one, and now I get 462 Mh/s on the new 2.4.0 version.
I only started using cgminer a few days ago; before this I used phoenix.
Thanks again for your help.

-ck

Quote from: af_newbie on May 03, 2012, 06:28:33 AM

Quote from: -ck on May 03, 2012, 06:06:23 AM

Quote from: OpytZabilParol on May 03, 2012, 06:02:38 AM

Quote

I guess you haven't deleted the .bin files between testing...

cgminer-2.3.6 and cgminer-2.4.0 located in different folders. I copied config file from 2.3.6 to 2.4.0 and received result 418Mh/s

Did you delete the .bin files in the 2.3.6 directory after upgrading your driver+sdk?

Con, it might be worthwhile to put cgminer version in the bin file name so that you can detect upgrades and force recompiles on the fly.

I cannot seem to win with this never ending SDK upgrade fail problem. At least people have the option of keeping around a .bin file from an older SDK installation and use it on the current cgminer provided the kernel hasn't changed in the interim. No solution seems optimal, and most of the problem seems to surround the optimisation cgminer does by caching the .bin file which speeds startup of cgminer significantly. If I forced it to build the kernel every single startup, there would never be this confusion, but startup would be butt slow if you had lots of GPUs whereas now it starts hashing in the blink of an eye after the very first startup.

-ck

Quote from: OpytZabilParol on May 03, 2012, 06:23:32 AM

Quote from: -ck on May 03, 2012, 06:06:23 AM

Quote from: OpytZabilParol on May 03, 2012, 06:02:38 AM

Quote

I guess you haven't deleted the .bin files between testing...

cgminer-2.3.6 and cgminer-2.4.0 located in different folders. I copied config file from 2.3.6 to 2.4.0 and received result 418Mh/s

Did you delete the .bin files in the 2.3.6 directory after upgrading your driver+sdk?

No. Here are the contents of my folders:

This is what I'm trying to say... it has been said many times and is in the FAQ. When you run cgminer, it caches the binary built from the SDK that was installed the very first time you ran it, and even if you then upgrade driver+SDK, it still runs like the old SDK. However, if you install a new cgminer, there are no cached .bin files (see the .bin files in each folder), so it generates a new .bin file from the currently installed SDK. The only way to compare versions 2.3.6 and 2.4.0 with your current driver+SDK combination is to delete the .bin files from both of them and start them both again.
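
For anyone who wants to script the "delete the .bin files from both" step, here is a minimal sketch. The helper name `clear_cached_bins` and the file names in the demo are made up for illustration, and the demo runs against throwaway temporary folders rather than real cgminer directories.

```python
import glob
import os
import tempfile

def clear_cached_bins(directory):
    """Delete every .bin file in directory and return the sorted names removed."""
    removed = []
    for path in glob.glob(os.path.join(directory, "*.bin")):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)

# Demo on temporary folders standing in for the two install directories.
with tempfile.TemporaryDirectory() as root:
    results = {}
    for version in ("cgminer-2.3.6", "cgminer-2.4.0"):
        folder = os.path.join(root, version)
        os.makedirs(folder)
        for name in ("example-kernel.bin", "cgminer.conf"):
            open(os.path.join(folder, name), "w").close()
        # Record what was removed and what is left behind.
        results[version] = (clear_cached_bins(folder), sorted(os.listdir(folder)))
print(results)
```

Only the cached kernels are removed; the config file is left in place, so both versions rebuild their .bin files against whatever driver+SDK is installed on the next start.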

OpytZabilParol

Quote from: -ck on May 03, 2012, 06:06:23 AM

Quote from: OpytZabilParol on May 03, 2012, 06:02:38 AM

Quote

I guess you haven't deleted the .bin files between testing...

cgminer-2.3.6 and cgminer-2.4.0 located in different folders. I copied config file from 2.3.6 to 2.4.0 and received result 418Mh/s

Did you delete the .bin files in the 2.3.6 directory after upgrading your driver+sdk?

No. Here are the contents of my folders:

http://img259.imageshack.us/img259/1724/10019549.png

-ck

Quote from: OpytZabilParol on May 03, 2012, 06:02:38 AM

Quote

I guess you haven't deleted the .bin files between testing...

cgminer-2.3.6 and cgminer-2.4.0 located in different folders. I copied config file from 2.3.6 to 2.4.0 and received result 418Mh/s

Did you delete the .bin files in the 2.3.6 directory after upgrading your driver+sdk?

OpytZabilParol

Quote

I guess you haven't deleted the .bin files between testing...

cgminer-2.3.6 and cgminer-2.4.0 are located in different folders. I copied the config file from 2.3.6 to 2.4.0 and got 418 Mh/s.

-ck

Quote from: -ck on May 03, 2012, 05:46:04 AM

Quote from: OpytZabilParol on May 03, 2012, 05:43:48 AM

cgminer-2.3.6 + OpenCl from AMD Catalyst 12.5 Beta = 462Mh/s
cgminer-2.4.0 + OpenCl from AMD Catalyst 12.5 Beta = 418Mh/s

Considering none of the opencl, GPU or kernel code was changed between 2.3.6 and 2.4.0, I find that a little more than a little unlikely.

I guess you haven't deleted the .bin files between testing...

-ck

Quote from: OpytZabilParol on May 03, 2012, 05:43:48 AM

cgminer-2.3.6 + OpenCl from AMD Catalyst 12.5 Beta = 462Mh/s
cgminer-2.4.0 + OpenCl from AMD Catalyst 12.5 Beta = 418Mh/s

Considering none of the opencl, GPU or kernel code was changed between 2.3.6 and 2.4.0, I find that a little more than a little unlikely.

OpytZabilParol

Quote from: -ck on May 03, 2012, 05:29:13 AM

Quote from: OpytZabilParol on May 03, 2012, 05:27:20 AM

I use cgminer-2.3.6 on my HD 5870 and received 460-462 Mh/s.
In the new version cgminer-2.4.0 was just 418 Mh/s.
Sys. Win7x64 + OpenCl from AMD Catalyst 12.5 Beta http://www.ngohq.com/home.php?page=Files&go=giveme&dwn_id=1624

Maybe, you should pay attention to the new drivers and optimize a new version of CGMINER

Wrong, you should downgrade your driver and read the FAQ.

If you mean that question,

Quote

Q: The CPU usage is high.
A: The ATI drivers after 11.6 have a bug that makes them consume 100% of one
CPU core unnecessarily so downgrade to 11.6. Binding cgminer to one CPU core on
windows can minimise it to 100% (instead of more than one core). Driver version
11.11 on linux and 11.12 on windows appear to have fixed this issue. Note that
later drivers may have an apparent return of high CPU usage. Try
'export GPU_USE_SYNC_OBJECTS=1' on Linux before starting cgminer.

I know about it.

The problem is:
cgminer-2.3.6 + OpenCl from AMD Catalyst 12.5 Beta = 462Mh/s
cgminer-2.4.0 + OpenCl from AMD Catalyst 12.5 Beta = 418Mh/s

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: OpytZabilParol on May 03, 2012, 05:27:20 AM

I use cgminer-2.3.6 on my HD 5870 and received 460-462 Mh/s.
In the new version cgminer-2.4.0 was just 418 Mh/s.
Sys. Win7x64 + OpenCl from AMD Catalyst 12.5 Beta http://www.ngohq.com/home.php?page=Files&go=giveme&dwn_id=1624

Maybe, you should pay attention to the new drivers and optimize a new version of CGMINER

WTF?
A download from a web site that includes an ad for "Boost Your Internet Connection - Click Here"
Who would be fool enough to get stuff from there rather than ATI?

Edit: and the forum post says "leaked to the web" ...

-ck

Quote from: OpytZabilParol on May 03, 2012, 05:27:20 AM

I use cgminer-2.3.6 on my HD 5870 and received 460-462 Mh/s.
In the new version cgminer-2.4.0 was just 418 Mh/s.
Sys. Win7x64 + OpenCl from AMD Catalyst 12.5 Beta http://www.ngohq.com/home.php?page=Files&go=giveme&dwn_id=1624

Maybe, you should pay attention to the new drivers and optimize a new version of CGMINER

Wrong, you should downgrade your driver and read the FAQ.

OpytZabilParol

I use cgminer-2.3.6 on my HD 5870 and get 460-462 Mh/s.
With the new version, cgminer-2.4.0, I get just 418 Mh/s.
System: Win7 x64 + OpenCL from AMD Catalyst 12.5 Beta http://www.ngohq.com/home.php?page=Files&go=giveme&dwn_id=1624

Maybe you should pay attention to the new drivers and optimize the new version of cgminer for them.

-ck

NEW VERSION: 2.4.0 - May 3, 2012

This version has a fairly significant upgrade to the way networking is done, so there is a minor version number update instead of a micro version, but it has already been heavily tested.

Human readable changelog:
A whole networking scheduler of sorts was written for this version, designed to scale to any sized workload with the fastest networking possible, while minimising the number of connections in use, reusing them as much as possible.
The restart feature was added to the API to restart cgminer remotely.
If you're connected to a pool that starts rejecting every single share, cgminer will now automatically disable it unless you add the --no-pool-disable option.
Once a pool stops responding, cgminer won't keep trying to open a flood of extra connections.
A failing BFL device won't cause cgminer to stop; cgminer will just disable the device, and you can then attempt to re-enable it.
Hashrates on FPGAs may be more accurate (though still not ideal).
Longpoll messages won't keep going indefinitely while a pool is down.
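
As a rough illustration of the automatic pool disabling described above, a counter of consecutive rejects could look like this. The class name `PoolRejectWatch` and the `limit` of 10 are assumptions made for the sketch; the changelog does not say what threshold cgminer actually uses. Only the `--no-pool-disable` option name comes from the release notes.

```python
class PoolRejectWatch:
    """Tracks consecutive rejected shares for one pool (illustrative only)."""

    def __init__(self, limit=10, auto_disable=True):
        self.limit = limit                # assumed threshold, not cgminer's real value
        self.auto_disable = auto_disable  # False models running with --no-pool-disable
        self.sequential_rejects = 0
        self.enabled = True

    def share_result(self, accepted):
        """Record one share result and return whether the pool is still enabled."""
        if accepted:
            self.sequential_rejects = 0   # any accepted share resets the streak
        else:
            self.sequential_rejects += 1
            if self.auto_disable and self.sequential_rejects >= self.limit:
                self.enabled = False
        return self.enabled

healthy = PoolRejectWatch(limit=3)
for accepted in (False, False, True, False, False):
    healthy.share_result(accepted)        # streak never reaches 3

broken = PoolRejectWatch(limit=3)
for accepted in (False, False, False):
    broken.share_result(accepted)         # three rejects in a row

opted_out = PoolRejectWatch(limit=3, auto_disable=False)
for accepted in (False, False, False):
    opted_out.share_result(accepted)

print(healthy.enabled, broken.enabled, opted_out.enabled)
```

Resetting the streak on any accepted share is one way to keep a pool with the occasional stale or rejected share from being disabled; only a pool that rejects everything trips the limit.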

Full changelog:
- Only show longpoll warning once when it has failed.
- Convert hashes to an unsigned long long as well.
- Detect pools that have issues represented by endless rejected shares and
disable them, with a parameter to optionally disable this feature.
- Bugfix: Use a 64-bit type for hashes_done (miner_thread) since it can overflow
32-bit on some FPGAs
- Implement an older header fix for a label existing before the pthread_cleanup
macro.
- Limit the number of curls we recruit on communication failures and with
delaynet enabled to 5 by maintaining a per-pool curl count, and using a pthread
conditional that wakes up when one is returned to the ring buffer.
- Generalise add_pool() functions since they're repeated in add_pool_details.
- Bugfix: Return failure, rather than quit, if BFwrite fails
- Disable failing devices such that the user can attempt to re-enable them
- Bugfix: thread_shutdown shouldn't try to free the device, since it's needed
afterward
- API bool's and 1TBS fixes
- Icarus - minimise code delays and name timer variables
- api.c V1.9 add 'restart' + redesign 'quit' so thread exits cleanly
- api.c bug - remove extra ']'s in notify command
- Increase pool watch interval to 30 seconds.
- Reap curls that are unused for over a minute. This allows connections to be
closed, thereby allowing the number of curl handles to always be the minimum
necessary to not delay networking.
- Use the ringbuffer of curls from the same pool for submit as well as getwork
threads. Since the curl handles were already connected to the same pool and are
immediately available, share submission will not be delayed by getworks.
- Implement a scaleable networking framework designed to cope with any sized
network requirements, yet minimise the number of connections being reopened. Do
this by creating a ring buffer linked list of curl handles to be used by getwork,
recruiting extra handles when none is immediately available.
- There is no need for the submit and getwork curls to be tied to the pool
struct.
- Do not recruit extra connection threads if there have been connection errors
to the pool in question.
- We should not retry submitting shares indefinitely or we may end up with a
huge backlog during network outages, so discard stale shares if we failed to
submit them and they've become stale in the interim.
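
To illustrate the hashes_done bugfix in the changelog above: a 32-bit unsigned counter wraps modulo 2**32 (about 4.29 billion), so a large enough hash count silently becomes a much smaller number. The sketch below emulates fixed-width counters with masking; the hash count is a made-up value chosen only because it exceeds 2**32.

```python
UINT32_MASK = 2**32 - 1
UINT64_MASK = 2**64 - 1

def add_u32(counter, hashes):
    """Add like a 32-bit unsigned counter: the result wraps modulo 2**32."""
    return (counter + hashes) & UINT32_MASK

def add_u64(counter, hashes):
    """Add like a 64-bit unsigned counter."""
    return (counter + hashes) & UINT64_MASK

hashes = 5_000_000_000           # hypothetical count, larger than 2**32
wrapped = add_u32(0, hashes)     # what a 32-bit counter would report
exact = add_u64(0, hashes)       # what a 64-bit counter would report
print(wrapped, exact)
```

Here the 32-bit counter reports 705032704 instead of 5000000000, which is the kind of error that would skew a reported hashrate.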

-ck

Quote from: zefir on May 03, 2012, 12:43:11 AM

As reported in another thread, the latest GIT updates improved cgminer performance with Clipse's bonuspool noticeably. With 2.3.6 my router had hundreds of active connections to the pools, now there are max. 20.

If you're mining with bonuspool, you should try and build cgminer from latest GIT sources.

Thanks for the feedback. This update has been significant enough to release a new version so I'm planning on releasing 2.4.0 soon.

zefir

donator

Activity: 919

Merit: 1000

As reported in another thread, the latest GIT updates improved cgminer performance with Clipse's bonuspool noticeably. With 2.3.6 my router had hundreds of active connections to the pools, now there are max. 20.

If you're mining with bonuspool, you should try and build cgminer from latest GIT sources.

-ck

Massive update to git tree:

I've basically implemented a kind of "network scheduler" into cgminer. It is designed to increase the number of connections to cope with virtually any sized hashing (see the 91 device example above with 33GH on one machine), yet minimise the number of open connections used at any time, and reuse connections as much as possible.

This is the relevant part of the changelog (reverse order), though there are other commits to the master branch as well:
Limit the number of curls we recruit on communication failures and with delaynet enabled to 5 by maintaining a per-pool curl count, and using a pthread conditional that wakes up when one is returned to the ring buffer.
Reap curls that are unused for over a minute. This allows connections to be closed, thereby allowing the number of curl handles to always be the minimum necessary to not delay networking.
Use the ringbuffer of curls from the same pool for submit as well as getwork threads. Since the curl handles were already connected to the same pool and are immediately available, share submission will not be delayed by getworks.
Implement a scaleable networking framework designed to cope with any sized network requirements, yet minimise the number of connections being reopened. Do this by creating a ring buffer linked list of curl handles to be used by getwork, recruiting extra handles when none is immediately available.
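
The ideas in the changelog entries above (reuse handles from a per-pool ring, recruit extras when none is free, cap the count and wait on a condition, reap handles idle for over a minute) can be sketched in Python as below. This is a simplified model, not cgminer's C implementation: `HandlePool` and its methods are hypothetical names, plain objects stand in for curl handles, and the cap is applied unconditionally here, whereas the changelog only applies it on communication failures and with delaynet enabled.

```python
import threading
import time
from collections import deque

class HandlePool:
    """Per-pool ring of reusable connection handles (plain objects here)."""

    def __init__(self, max_handles=5, idle_limit=60.0):
        self.max_handles = max_handles  # cap on handles in existence
        self.idle_limit = idle_limit    # seconds before an unused handle is reaped
        self.count = 0                  # handles in existence, idle or in use
        self.idle = deque()             # (handle, time_returned) pairs
        self.cond = threading.Condition()

    def acquire(self):
        with self.cond:
            while True:
                if self.idle:
                    handle, _ = self.idle.pop()   # reuse the most recently returned
                    return handle
                if self.count < self.max_handles:
                    self.count += 1
                    return object()               # recruit a new handle
                self.cond.wait()                  # cap reached: wait for a release

    def release(self, handle, now=None):
        with self.cond:
            stamp = time.monotonic() if now is None else now
            self.idle.append((handle, stamp))
            self.cond.notify()                    # wake one waiter, if any

    def reap(self, now=None):
        """Drop handles idle for longer than idle_limit; return how many."""
        now = time.monotonic() if now is None else now
        with self.cond:
            keep = deque((h, t) for h, t in self.idle if now - t <= self.idle_limit)
            reaped = len(self.idle) - len(keep)
            self.idle = keep
            self.count -= reaped
            if reaped:
                self.cond.notify_all()            # room to recruit again
            return reaped

pool = HandlePool(max_handles=2, idle_limit=60.0)
a = pool.acquire()
b = pool.acquire()
pool.release(a, now=0.0)
pool.release(b, now=50.0)
reaped = pool.reap(now=70.0)   # a has been idle 70 s, b only 20 s
c = pool.acquire()             # reuses b instead of opening a new handle
print(reaped, pool.count, c is b)
```

Reusing the most recently returned handle first is a design choice in this sketch: it lets the older handles sit idle long enough to be reaped, so the number of open connections drifts down toward what the workload actually needs.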

generalfault

newbie

Activity: 26

Merit: 0

Quote from: -ck on May 02, 2012, 07:32:19 PM

That's very interesting and likely an issue with the difference between headers and implementation of pthread_cleanup_pop in that distribution/gcc and all the more modern ones we're compiling on. The reason is that the pthread_cleanup_* functions are implemented as macros so you can't see just why this is an issue unless you spit out the pre-processor output. Either way it should be easy to implement something like that as a fix, thanks.

Ah ha! You know, that would completely explain it... I was racking my brain as to why it would throw THAT error.
Thank you so much, my brain can rest now.
