Topic: Ubuntu Blacks Out (Read 3817 times)

hero member
Activity: 924
Merit: 506
June 24, 2011, 10:56:55 AM
#23
Thanks again, Drawoc, and for the educational info too. I'll look closer at it later when my brain isn't a soup sandwich; I've been up all night. The miner is still chugging along and hasn't crashed in the roughly 3 hours since I started it. The screensaver has come up about three times without triggering a crash, so I think it's resolved now.

BTW, I sent you 0.75 BTC. :D

See you around.
full member
Activity: 168
Merit: 100
Firstbits: 175wn
June 24, 2011, 10:16:53 AM
#22
If you look at line 4577, it says "Segmentation Fault". A seg fault happens when a program accesses memory that it shouldn't, so Linux has to kill the process. (Ever seen "This program has encountered an error and needs to close" in Windows? Same thing.)
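
As a rough illustration of "Linux has to kill the process" (a minimal sketch, unrelated to your log; segv_status is just a made-up helper name): a child shell sends itself the same signal the kernel delivers on a seg fault, and the parent prints the exit status it sees.

```shell
# Hypothetical helper: start a child shell, have it send SIGSEGV to itself,
# and print the exit status the parent shell sees.
segv_status() {
  status=0
  # $$ is single-quoted so it expands inside the child shell, not here.
  sh -c 'kill -s SEGV $$' 2>/dev/null || status=$?
  echo "$status"
}

segv_status
```

On Linux, SIGSEGV is signal 11, and a shell reports a process killed by signal N as 128 + N, so this should print 139.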

The backtrace starts at line 4559. A backtrace is basically a list of the functions that were on the stack when the program accessed memory it wasn't allowed to, most recent call first.
If function 1 calls function 2, and function 2 seg faults, the backtrace would look like:
Function 2
Function 1
So, if you look at your backtrace, you see that miPointerUpdateSprite (which updates the pointer; the name is self-explanatory) eventually calls into /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (AMD's driver), which is where the seg fault occurs. With SWCursor enabled, AMD's driver no longer handles anything pointer-specific, which avoids the bug.
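
If you'd rather not scroll to find those lines, something like this could pull them out. It's a minimal sketch: show_segfault is a made-up helper name, and it assumes the crash line contains the words "segmentation fault" in some capitalization.

```shell
# Hypothetical helper: print the "segmentation fault" line(s) of a log together
# with the lines leading up to them, each prefixed with its line number.
# $1 = log file, $2 = number of preceding lines to show (default 20).
show_segfault() {
  grep -n -i -B "${2:-20}" 'segmentation fault' "$1"
}

# Example (assumes the log was copied to the home directory):
# show_segfault ~/Xorg.0.log 20
```

Matching lines are printed as "number:text" and the context lines before them as "number-text", so the backtrace shows up directly above the fault.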

Wikipedia can probably explain these concepts better than I can. I program in my spare time, so these things aren't too foreign to me. Also, this isn't my first battle with the X Server.

If you want to learn more about configuring X, you can look for documentation online:
https://wiki.ubuntu.com/X/Config
http://www.x.org/archive/X11R6.8.0/doc/ati5.html
Also, the man pages are a great resource. You can find a manual for almost any command or config file by typing:
man xorg.conf
replacing xorg.conf with the name of the config file or command.

If you'd really like to donate, there's an address in my sig (or at least I think there is; I can see it, anyway). All donations are appreciated, thanks. :)

Well, you're not the first person with this bug - I found it here originally: http://phoronix.com/forums/showthread.php?51726-AMD-Catalyst-11.5-Linux-Driver-Released/page4
Anyway, as to why no miners have found this before, I guess it's because the gut reaction is to assume your card is overheating. In other words, miners may have been affected by this bug before, and just assumed they overclocked too much or had a defective fan, etc. (Just a theory)

Yes, using SSH should fix it, because the bug happens when the mouse cursor moves; if you never move the cursor, you never hit the bug. You could also just unplug your mouse and use the keyboard for control (and leave the monitor plugged in).
I was also going to suggest disabling your screensaver, because it seems to be connected with the problem (hiding/showing the pointer could be triggering the bug).
Another option would be to use the open source drivers, which don't have this bug. (No miner will go for this option because of the lack of OpenCL support.)
Anyway, I suggested SWCursor first because it seemed like the best option: if you're willing to edit the config file, you really don't give anything up by having software draw the cursor.

Anyway, I hope the fix works! Good luck mining!
hero member
Activity: 924
Merit: 506
June 24, 2011, 08:47:47 AM
#21
drawoc...

Assuming the software cursor fix works, doesn't it seem a bit of an odd workaround? I haven't heard of anyone else having this problem (noteworthy, I think).
Anyway, do you think a separate, independent option could have been to just unplug the mouse and monitor and log in remotely using SSH (i.e. without having to change the conf file)?

Regardless, I guess there's always more than one way. The fact that I haven't read of anyone else experiencing this issue (yet) while mining has caught my attention.

...system's still going... been about 40 minutes now. :D
hero member
Activity: 924
Merit: 506
June 24, 2011, 08:32:16 AM
#20
OK. I followed your instructions (including the initial backup) and jumped into mining with all four GPUs again. Now to see if it crashes again after some time. If it does, I can save another log file.

Something else that's different: I put a big fan on top of the computer (blowing into it, of course). The average across the four GPUs is about 70C. I will try to cool them more, but it's tolerable enough for this test run, I think. The longest I had all four running before was, I think, about 20 minutes.

Where did you learn to read that file you asked me to upload? Would you point me to the line in it where you saw an issue, and tell me how you knew it was a problem? I'd love to learn more about Linux. :)

P.S. Your patience and help are appreciated. Where can I donate to you? I only have about 0.75 BTC in my bitcoin wallet, but it's something. :P
full member
Activity: 168
Merit: 100
Firstbits: 175wn
June 23, 2011, 04:08:11 PM
#19
Yep, basically.

The problem seems to be a bug in ATI's driver involving hardware-accelerated mouse cursor drawing.
We can try turning that off and using software-drawn mouse cursors instead.

EDIT: I forgot to tell you to back up your xorg.conf file before editing it.
In a terminal, run:
Code:
sudo cp /etc/X11/xorg.conf /etc/X11/xorg.conf.bak
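
If you end up editing the file more than once, a timestamped backup keeps each earlier version around. This is only a sketch; backup_file is a made-up helper name, and the timestamp format is my own choice.

```shell
# Hypothetical helper: copy a file to "<name>.bak.<timestamp>" next to the
# original and print the name of the backup. -p preserves mode and mtime.
backup_file() {
  src=$1
  dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
  cp -p -- "$src" "$dest" && echo "$dest"
}

# Example (files under /etc need root, so run the cp itself with sudo):
# sudo cp -p /etc/X11/xorg.conf "/etc/X11/xorg.conf.bak.$(date +%Y%m%d-%H%M%S)"
```

To roll back, copy the backup over the edited file and restart X.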

To edit your xorg.conf, run:
Code:
sudo nano /etc/X11/xorg.conf

That file is separated into sections. At the end of each Device section, add a line that says Option "SWCursor" "true".
So, for example:
Code:
Section "Device"
  ...
  Option "SWCursor" "true"
EndSection
Where ... stands for everything that was in that section before.
Each GPU has its own Device section, so you should have four Device sections to edit (2 cards x 2 GPUs per card).
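
With four sections to touch, the edit could also be scripted. The following is a minimal sketch, not a tested tool: add_swcursor is a made-up name, it only recognizes the exact spellings Section "Device" and EndSection at the start of a line, and it prints the result instead of changing the file.

```shell
# Hypothetical helper: print an xorg.conf-style file to stdout with
#   Option "SWCursor" "true"
# added before the EndSection of every Device section that does not already
# set SWCursor on a non-comment line. The input file is not modified.
add_swcursor() {
  awk '
    /^[ \t]*Section[ \t]+"Device"/ { in_device = 1; seen = 0 }
    in_device && !/^[ \t]*#/ && tolower($0) ~ /swcursor/ { seen = 1 }
    in_device && /^[ \t]*EndSection/ {
      if (!seen) print "    Option \"SWCursor\" \"true\""
      in_device = 0
    }
    { print }
  ' "$1"
}

# Example: write the result to a scratch file and compare before installing it.
# add_swcursor /etc/X11/xorg.conf > /tmp/xorg.conf.new
# diff /etc/X11/xorg.conf /tmp/xorg.conf.new
```

Printing to stdout means nothing is overwritten until you have looked at the diff and copied the new file into place yourself.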

Hopefully that will solve the issue.
hero member
Activity: 924
Merit: 506
June 23, 2011, 01:27:46 PM
#18
Thanks! That's handy! Here's the link:

http://pastebin.com/1Buv9cE3

I'm not exactly sure how to read it, but I'm guessing the line numbers are logged event times in seconds since reboot(?). The last line number is about what 5 hours would be in seconds, which is about how long it was from bootup to the blackout.
full member
Activity: 168
Merit: 100
Firstbits: 175wn
June 23, 2011, 01:18:25 PM
#17
You could put it on pastebin, and post the link here.
http://pastebin.com/
hero member
Activity: 924
Merit: 506
June 23, 2011, 01:04:49 PM
#16
It's too long to post. Where should I upload it?
full member
Activity: 168
Merit: 100
Firstbits: 175wn
June 23, 2011, 01:02:49 PM
#15
Now there'll be a file named Xorg.0.log in your home directory.
If you could post it on the forum, or upload it somewhere with Firefox, that would be useful.
If you want to read it in the terminal, you can use:
less Xorg.0.log
or you can just log in and use the GUI. (The file will still be in your home directory.)
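
The log is long, so it can help to look at only the lines X itself flags. X marks errors with (EE) and warnings with (WW); the helper name xorg_problems below is made up for this sketch.

```shell
# Hypothetical helper: list error (EE) and warning (WW) lines of an X log,
# each prefixed with its line number. $1 = log file.
xorg_problems() {
  grep -n -E '\((EE|WW)\)' "$1"
}

# Example:
# xorg_problems ~/Xorg.0.log | less
```

The legend line near the top of the log mentions both markers too, so expect it to show up in the output as well.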
hero member
Activity: 924
Merit: 506
June 23, 2011, 12:19:10 PM
#14
Latest info. No mining was done during the steps below; it was all static testing.

I set the computer so that no password is required to log in and login is automatic at startup. Rebooting now brings me to the desktop. The screensaver was set, I thought, to one hour. However, it seemed to kick in after about 15 minutes (maybe I made a mistake setting it).
I moved the mouse after the screensaver came up, and the small login window was there. So, no black screen yet.
I entered the password and it came to the desktop. I let it go about the same amount of time until the screensaver came up again, repeated the login, and it came to the desktop again.

I let the screensaver reappear and stay on for 5 hours. This time I tried touching the keyboard. The login window worked again and the desktop was viewable again.
BUT I did nothing else until a few minutes later, when I wanted to double-check my power and screensaver settings.
Once I touched the mouse to check that, it immediately went to a black screen with the following:

Code:
* Stopping System V runlevel compatibility           [ok]
* Starting CUPS printing spooler/server              [ok]
                                                         _

The underscore represents the position of the flashing cursor at the right end of the third line.

Shortly after, while typing this comment on another computer, the screen went black and the text was gone. I touched the keyboard and the above text reappeared.
Maybe the screensaver timeout setting triggered this event, but it was not the animated screen saver.

Pressed CTRL+ALT+F7 ---->nothing.

I copied the Xorg log as you requested. How do I open it from the terminal?

I next typed: sudo service gdm restart
The desktop started, but once I touched the mouse it returned to the screen that I just left. Commands and stuff I had typed were still there.

I next typed: startx
The screen went completely black. Nothing I type shows up.

Hard reboot. Square one. :P

[Late entry: the screensaver was already set to 15 minutes, and power management was already set to 'never' for both the computer and the monitor going to sleep from inactivity.]
full member
Activity: 168
Merit: 100
Firstbits: 175wn
June 23, 2011, 09:17:14 AM
#13
After you get to the terminal and log in, copy the Xorg log file to your home directory like this:
cp /var/log/Xorg.0.log ~
(capitalization is important) and post the log file here.

Also, try:
sudo service gdm restart
This should restart the X server.
If that doesn't work, try:
sudo service x11-common restart
Never mind, instead try:
startx
sr. member
Activity: 313
Merit: 250
June 23, 2011, 06:40:53 AM
#12
Oops, double post.
sr. member
Activity: 313
Merit: 250
June 23, 2011, 06:35:02 AM
#11
If you are at the black screen, try ALT+F7; that should switch you back
to the X server (if it is still alive, that is).

If not, I would log in on the ALT+F1 terminal and check whether the X server is still
running. Maybe do a ps axu and look for a program called X :) or do a pgrep X or so.
You could also check some logs:

dmesg
cat /var/log/syslog
cat /var/log/messages
etc. Maybe there is info about what is happening.
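
The "is X still alive" check could look roughly like this. It is only a sketch: proc_status is a made-up helper name, and the server process may be called X or Xorg depending on the system, so it tries both.

```shell
# Hypothetical helper: report whether a process with exactly this name exists.
# $1 = process name (defaults to X).
proc_status() {
  name=${1:-X}
  if pgrep -x "$name" >/dev/null 2>&1; then
    echo "$name is running"
  else
    echo "$name is not running"
  fi
}

proc_status X
proc_status Xorg
```

pgrep -x matches the whole process name, so unlike a plain pgrep X it will not also report unrelated programs that merely have an X somewhere in their name.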

!!! Careful: on my Sapphire 5850 that turns the fan completely off !!! Not sure wtf that is...
but "theoretically" it should work like this to set it back to auto mode:
DISPLAY=:0.0 aticonfig --pplib-cmd "set fanspeed 0 auto"


After thinking about it, better not to try setting it back to auto this way. You can
do it with a tool called AMDOverDriveCtrl: go to the Fanspeed tab and click Default.



member
Activity: 69
Merit: 10
June 23, 2011, 06:17:04 AM
#10
Set that fan speed to 100 then see if it still locks up.  Then you can work on auto settings or lowering to more tolerable speeds.
hero member
Activity: 924
Merit: 506
June 22, 2011, 09:43:22 PM
#9
Thanks for the suggestion. I'll include that in the matrix of things I try. Do you happen to know how to set auto fan speed in Linux? I only know how to set a fan speed manually. For that I am using:

Code:
export DISPLAY=:0.0; aticonfig --pplib-cmd "set fanspeed 60"

or it might have been...
Code:
export DISPLAY=:0.0; aticonfig --pplib-cmd "set fanspeed 0 60"

One of those is how I set the fan speed to 60% on GPU device 0.0 [then I'd use 0.1, 0.2, 0.3 for the other GPU fan speeds].
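
To repeat that for all four GPUs, a loop along these lines might do. This is a sketch under assumptions: it uses the second form (with the extra 0 before the percentage), which I have not confirmed is the right one; fan_commands is a made-up helper name; and it only prints the commands rather than running them.

```shell
# Hypothetical helper: print (not run) one aticonfig command per GPU.
# $1 = fan speed in percent (0-100), $2 = number of GPUs (default 4).
# Assumes the GPUs are X screens :0.0, :0.1, ... as described above.
fan_commands() {
  pct=$1
  count=${2:-4}
  case $pct in
    ''|*[!0-9]*) echo "fan speed must be a whole number" >&2; return 1 ;;
  esac
  if [ "$pct" -gt 100 ]; then
    echo "fan speed must be 0-100" >&2
    return 1
  fi
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "DISPLAY=:0.$i aticonfig --pplib-cmd \"set fanspeed 0 $pct\""
    i=$((i + 1))
  done
}

fan_commands 60 4
```

Printing first makes it easy to check the four lines before pasting them into a terminal.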

hero member
Activity: 560
Merit: 500
Ad astra.
June 22, 2011, 07:35:35 PM
#8
That is too hot for sustained use. You need to leave fan speed on auto. I'm not sure what Linux will do, but it could be what you're experiencing. Sometimes, regardless of OS, the system will just freeze up.
hero member
Activity: 924
Merit: 506
June 22, 2011, 07:30:05 PM
#7
I'll do some more testing later tonight when I'm back home. I'll apply the various suggestions from this thread and make a nice little grid to help define and further isolate the issue.

Does anyone here have any experience with what Linux will do if the video cards overheat? In this case, it's two 6990s at times reaching maybe 80C to 90C, either for a few moments or for sustained periods of many minutes.
hero member
Activity: 924
Merit: 506
June 22, 2011, 07:20:03 PM
#6
When you get the black screen, try pressing Ctrl-Alt-F7
If that doesn't do anything, then try Ctrl-Alt-F1

When the screen is black, I found I can type, but it does nothing but display the characters typed. It's not the terminal mode with the prompt, it's just some strange mode. To add to that oddity, lately the screen goes completely black and no typing I do is displayed, BUT I discovered if I just type my login password (even though nothing is on the screen) it comes back to the desktop mode.

I haven't tried Ctrl-Alt-F7 yet, but I did try Alt-F1 (without Control), which brings me to a login prompt (not the small Linux login window). When I log in, the whole screen effectively becomes a terminal, with no desktop.
hero member
Activity: 924
Merit: 506
June 22, 2011, 07:14:14 PM
#5
First, I must warn you that while I have some experience with Linux and Ubuntu, I'm no expert, so don't take what I say as fact.

You could try disabling any power-saving settings, but it seems to me Ubuntu is crashing somehow. Try setting fan speed higher, the 6990s need a lot of air throughput. You could be seeing the Linux equivalent of a BSOD if your cards are overheating. Also try just running Linux for awhile without mining, and see if the problem occurs or not.

Running it for a while without mining results in a screensaver. However, I have seen it go to that black screen without mining. The screensaver (the Matrix one :P) is set for about 45 minutes. It comes up, but when it does and I touch the mouse or keyboard, it goes either to a login window or to a black-screen mode where, if I type, it shows the text being typed, but nothing comes of it. It just displays what I type like a notepad, but with no window, since the whole screen is black. If I press ALT+F1 it displays a login prompt (not a window), and if I log in, it just goes into what seems to be terminal mode.

I let my cards get close to 80C at times. I know that is too hot, but I don't think it is hot enough to make the system choke. Besides, it can crash soon after startup.

I'm not sure if I am dealing with two problems with similar symptoms or not.
member
Activity: 69
Merit: 10
June 22, 2011, 06:48:48 PM
#4
I second the suggestion to turn that fan speed up or leave it on auto.
Are you overclocking the cards? Leave the clock speeds at stock, too.
It sounds similar to a crash mode I get on Ubuntu after a while from cards that are pushed too far, thermally or speed-wise.