Topic: [ mining os ] nvoc - page 303. (Read 418549 times)

newbie
Activity: 14
Merit: 0
July 07, 2017, 01:17:31 AM
Question for the Linux gurus. My problems were coming from bad risers; once I changed them out, the PCI bus error messages stopped. The messages said that error correction had occurred. I found that there is a way to suppress these messages in GRUB by using the "pci=nomsi" option. Once I enabled this option, the operating system works (using Simplemining, about to try this on nvOC next) and the cards seem to work. So what are the dangers/consequences of enabling this option and leaving it on, at least until I get a new batch of risers from China?

thanks
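
A quick way to confirm that the option actually reached the running kernel is to look at the boot command line. The sketch below assumes a standard Linux `/proc/cmdline`; the helper name `has_kernel_param` is made up for illustration. On Ubuntu-based images the option would normally be added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub`, but that may differ on other distributions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel parameter appears as a whole
# word in a command-line string (defaults to the running kernel's).
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `has_kernel_param pci=nomsi && echo "MSI is disabled for this boot"`. Note that `pci=nomsi` turns off Message Signaled Interrupts for all PCI devices, so it changes more than just the logging.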
newbie
Activity: 25
Merit: 0
July 07, 2017, 12:50:40 AM
This line means there is a problem with the bios (rom) on one of the GPUs:
Code:
WARNING: infoROM is corrupted at gpu 0000:07:00.0

I would return this GPU or RMA it.

You could try reflashing its ROM with NVFlash; but if this doesn't work it will most likely void your warranty; so if the GPUs are new I would go the other route.

For fan speed, try setting:

Code:
SLOW_USB_KEY_MODE="YES" 

let me know if that works.

Also what kind of USB / SSD are you using?


Heya, thanks for the reply.

About to return the GPU; it's brand new, bought a couple of days ago. Not going to reflash it or anything, so as not to void the warranty. Thanks for the tip.

About the fan speed.

Code:
m1@rig1:~$ export DISPLAY=
m1@rig1:~$ echo $DISPLAY
m1@rig1:~$ nvidia-settings -a [fan:0]/GPUTargetFanSpeed=75
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.
m1@rig1:~$ cat Desktop/oneBash | grep 'SLOW_USB_KEY_MODE='
SLOW_USB_KEY_MODE="YES"         # YES NO
m1@rig1:~$ export DISPLAY=:0.0
m1@rig1:~$ xrandr
xrandr: Failed to get size of gamma for output default
Screen 0: minimum 1024 x 768, current 1024 x 768, maximum 1024 x 768
default connected 1024x768+0+0 0mm x 0mm
   1024x768       0.00*
m1@rig1:~$ echo $DISPLAY
:0.0
m1@rig1:~$ nvidia-settings -a [fan:0]/GPUTargetFanSpeed=75

** (nvidia-settings:5815): WARNING **: Couldn't register with accessibility bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

ERROR: Error querying enabled displays on GPU 0 (Missing Extension).


ERROR: Error querying connected displays on GPU 0 (Missing Extension).



ERROR: Error resolving target specification 'fan:0' (No targets match target specification), specified in assignment '[fan:0]/GPUTargetFanSpeed=75'.
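
For reference, manual fan control with `nvidia-settings` usually takes two assignments: `GPUFanControlState=1` on the GPU, then `GPUTargetFanSpeed` on the fan. The sketch below only prints the command so it can be checked before running. The helper name `fan_cmd` is hypothetical, and it assumes one fan per GPU with matching indices, which may not hold on every card.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an nvidia-settings call that enables
# manual fan control on one GPU and sets a target speed in percent.
# Assumption: fan index N belongs to GPU index N.
fan_cmd() {
    gpu=$1 speed=$2
    printf "nvidia-settings -a '[gpu:%s]/GPUFanControlState=1' -a '[fan:%s]/GPUTargetFanSpeed=%s'\n" \
        "$gpu" "$gpu" "$speed"
}
```

Usage: `export DISPLAY=:0; eval "$(fan_cmd 0 75)"`. The single quotes keep the shell from treating the bracketed target as a glob pattern.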

xorg.conf
Code:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 378.13  (buildmeister@swio-display-x86-rhel47-05)  Tue Feb  7 19:37:00 PST 2017


Section "ServerLayout"
    Identifier     "layout"
    Screen      0  "nvidia" 0 0
    Inactive       "intel"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "keyboard"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "intel"
    Driver         "modesetting"
    Option         "AccelMethod" "None"
    BusID          "PCI:0@0:2:0"
EndSection

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    BusID          "PCI:1@0:0:0"
EndSection

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    Option         "ConstrainCursor" "off"
    BusID          "PCI:4@0:0:0"
EndSection

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    Option         "ConstrainCursor" "off"
    BusID          "PCI:7@0:0:0"
EndSection

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    Option         "ConstrainCursor" "off"
    BusID          "PCI:8@0:0:0"
EndSection

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    Option         "ConstrainCursor" "off"
    BusID          "PCI:10@0:0:0"
EndSection

Section "Screen"
    Identifier     "intel"
    Device         "intel"
    Monitor        "Monitor0"
EndSection

Section "Screen"
    Identifier     "nvidia"
    Device         "nvidia"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "on"
    Option         "IgnoreDisplayDevices" "CRT"
    Option         "ConstrainCursor" "off"
    Option         "Coolbits" "24"
    SubSection     "Display"
        Depth       24
        Modes      "nvidia-auto-select"
    EndSubSection
EndSection

Section "Screen"
    Identifier     "nvidia"
    Device         "nvidia"
    Monitor        "Monitor0"
    Option         "AllowEmptyInitialConfiguration" "on"
    Option         "IgnoreDisplayDevices" "CRT"
EndSection

Section "Screen"
    Identifier     "nvidia"
    Device         "nvidia"
    Monitor        "Monitor0"
    Option         "AllowEmptyInitialConfiguration" "on"
    Option         "IgnoreDisplayDevices" "CRT"
EndSection

Section "Screen"
    Identifier     "nvidia"
    Device         "nvidia"
    Monitor        "Monitor0"
    Option         "AllowEmptyInitialConfiguration" "on"
    Option         "IgnoreDisplayDevices" "CRT"
EndSection

Section "Screen"
    Identifier     "nvidia"
    Device         "nvidia"
    Monitor        "Monitor0"
    Option         "AllowEmptyInitialConfiguration" "on"
    Option         "IgnoreDisplayDevices" "CRT"
EndSection
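
One thing that may be worth checking in a config like the one above is the Coolbits value. According to the NVIDIA driver README, Coolbits is a bitmask in which 4 enables manual fan control, 8 enables clock offsets and 16 enables overvoltage. The file above sets 24 (8 + 16) on a single Screen section, which does not include the fan bit. Whether that explains the `fan:0` error is only a guess. The helper names below are made up for illustration.

```shell
#!/bin/sh
# Print every Coolbits value found in an xorg.conf read from stdin.
coolbits_values() {
    awk '$1 == "Option" && $2 == "\"Coolbits\"" { gsub(/"/, "", $3); print $3 }'
}

# Succeed if the manual-fan-control bit (value 4) is set in a Coolbits number.
has_fan_bit() {
    [ $(( $1 & 4 )) -ne 0 ]
}
```

Usage: `coolbits_values < /etc/X11/xorg.conf`, then `has_fan_bit 24 || echo "fan control bit not set"`. It may also be worth noting that every Device and Screen section above shares the identifier "nvidia".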

SanDisk SSD 120GB; I used dd to write the img to disk. Access to the rigs is only possible via SSH: no TV, no RDP (maybe VGA/HDMI if required).

When you have a few rigs, it's easier to identify them like this than by IP (at least in my case).
Code:
# hostname rig1
# echo "rig1" > /etc/hostname
# sed -i 's/m1-desktop/rig1/g' /etc/hosts
Then in oneBash:
XXX_WORKER="$HOSTNAME"

Thanks for the help, I'll keep trying to fix the fan speed thing.
newbie
Activity: 25
Merit: 0
July 06, 2017, 10:28:07 PM
What PCIe slots do you have them plugged into?

Tried changing all the slots because there are 3 cards, tried using 123, 456, 124, 235 etc.

Looking at your MB manual, I would recommend slotting the GPUs into PCIEX16, PCIEX4_1 and PCIEX4_2 (from top to bottom, slots 2, 4, 6). Also make sure the monitor is plugged into slot 2 (PCIEX16).

If it still isn't working, you may need to change some settings, such as changing it all to Gen 2 or Gen 1 in the BIOS.



Thanks, changing slots to 1-3-6 did help for 1 GPU. I'm going to return one GPU as it seems broken, or at least its BIOS seems to be corrupted.
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 06, 2017, 10:16:07 PM
On a 4 x 1060 rig I use 1.1GB of RAM.  On a 6 x 1060 rig I use 1.3GB of RAM.  Unless there are spikes of memory usage or a memory leak somewhere, I don't see why 4GB wouldn't be more than enough.  It certainly shouldn't have any effect on Genoil stability.  Genoil seems to give comparable/better hash rates even with lower clocks and power limits than Claymore requires.  I've found that dropping the clocks has definitely increased stability without reducing hash rate.

I have two 1070s, and since switching from Claymore to Genoil a few days ago, I've had two or three instances where the rig has crashed on me.  My overclock settings had been -200 for GPU and +1200 for memory, which had been stable with Claymore.  I've dropped the memory overclock back to +1000; hopefully the crashes will stop. 

Even with the reduced setting, though, I'm still seeing an extra 2-3 MH/s with Genoil that Claymore wasn't delivering...and that's before you factor in the lack of a fee for the miner.
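
For anyone who wants to test offsets like these by hand rather than through oneBash, the sketch below prints the corresponding `nvidia-settings` call. The helper name `oc_cmd` is hypothetical, and the `[3]` performance-level index is an assumption that fits GTX 10-series cards; other generations may use a different index.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the nvidia-settings call for a pair of
# clock offsets on one GPU.
# Assumption: performance level 3 is the one used while mining.
oc_cmd() {
    gpu=$1 core=$2 mem=$3
    printf "nvidia-settings -a '[gpu:%s]/GPUGraphicsClockOffset[3]=%s' -a '[gpu:%s]/GPUMemoryTransferRateOffset[3]=%s'\n" \
        "$gpu" "$core" "$gpu" "$mem"
}
```

Usage: `oc_cmd 0 -200 1000` prints the command for a -200 core and +1000 memory offset on GPU 0; run it with `eval` once X is up and Coolbits allows clock offsets.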
newbie
Activity: 27
Merit: 0
July 06, 2017, 09:15:53 PM
Is anybody else experiencing nvOC hang / lockup to the point of needing a hard powerdown when Genoil crashes?  I can log in but when I try to close the miner and shutdown the OS becomes locked up. I am wondering if it is hardware related?  I am only using 4GB of ddr4 is that enough???  I believe I will be going back to Claymore.  I can't seem to get Genoil stable even dialed 300mc back from Claymore.  I will reimage a USB stick and go back to Claymore to see if stability comes back. 

Having more RAM would probably help; I use 8GB on most of my rigs and I have achieved multi-day stability with the ones that are using Genoil by lowering the clocks / adjusting the power limits whenever a soft crash occurred.

On a 4 x 1060 rig I use 1.1GB of RAM.  On a 6 x 1060 rig I use 1.3GB of RAM.  Unless there are spikes of memory usage or a memory leak somewhere, I don't see why 4GB wouldn't be more than enough.  It certainly shouldn't have any effect on Genoil stability.  Genoil seems to give comparable/better hash rates even with lower clocks and power limits than Claymore requires.  I've found that dropping the clocks has definitely increased stability without reducing hash rate.
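
Figures like these can be read from `/proc/meminfo`. A minimal sketch, assuming a kernel recent enough to report `MemAvailable`; the helper name `used_mb` is made up for illustration:

```shell
#!/bin/sh
# Print used memory in MB (MemTotal minus MemAvailable) from
# meminfo-style input on stdin. Values in /proc/meminfo are in kB.
used_mb() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "%d\n", (total - avail) / 1024 }'
}
```

Usage: `used_mb < /proc/meminfo`. Logging this once a minute while the miner runs would show whether usage spikes before a hang.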
newbie
Activity: 1
Merit: 0
July 06, 2017, 05:26:09 PM
Good day!  Does anyone know of a good way to identify which PCI Lane is assigned to each physical PCI slot.  Scanning system logs, System Info, etc. it is easy to correlate which PCI lane, GPU #, GPU serial number, etc. fell off the bus.  But is there a way to correlate PCI Lane to actual PCI slot?  Short of me logging the GPU serial number as I add each card to the board, It would be nice if there was a one stop shop to display GPU PCI slot information.  

I don't believe that they are assigned sequentially during system boot up because I have seen lane assignments go from 1, 2, 5, 6, 7, 8.   It would really help in troubleshooting riser problems if physical slot assignment was easily found.  

I have learned over the years that riser board version type really does not matter (even though I have seen several posts that claim v6 is better, v7 is better, C vs. S, etc.).   It all depends on where each lot is manufactured and if they had any kind of quality assurance check prior to shipment.  Heck I have an allotment of v1s from the BTC mining days that still work great.  Some riser boards work fine if the card is not overclocked too high.  But it is never the same across the board.    

Whether the risers are purchased from eBay, Newegg, Amazon, Craigslist, etc., it all seems to be a crap shoot.  And returning products across the ocean is a real pain in the a$$.  My recommendation is to find one vendor that has not failed you and stick with them (regardless of the wait time)...and order EXTRAS!  Does anyone know of a reliable US, UK or Asia based vendor?

Just some food for thought from an extremely happy nvOC user.


I identified them by setting all fans to 0 and popping them up to 100 one at a time.
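
The bookkeeping side of this can be partly automated. `nvidia-smi --query-gpu=index,pci.bus_id,serial --format=csv,noheader` lists each GPU index with its PCI bus address and serial number. The sketch below just tidies that output; the helper name `format_gpu_map` is hypothetical. It maps GPU numbers to bus addresses, not to physical slots, so the fan trick or the board manual is still needed for the last step, and some consumer cards may not report a serial at all.

```shell
#!/bin/sh
# Turn "index, bus_id, serial" CSV lines (as printed by nvidia-smi with
# --format=csv,noheader) into "GPU <n>  <bus>  <serial>" rows.
format_gpu_map() {
    awk -F', *' '{ printf "GPU %s  %s  %s\n", $1, $2, $3 }'
}
```

Usage: `nvidia-smi --query-gpu=index,pci.bus_id,serial --format=csv,noheader | format_gpu_map`. Saving the result after each card is added gives a record to compare against when a GPU falls off the bus.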
newbie
Activity: 13
Merit: 0
July 06, 2017, 05:02:45 PM
Good day!  Does anyone know of a good way to identify which PCI Lane is assigned to each physical PCI slot.  Scanning system logs, System Info, etc. it is easy to correlate which PCI lane, GPU #, GPU serial number, etc. fell off the bus.  But is there a way to correlate PCI Lane to actual PCI slot?  Short of me logging the GPU serial number as I add each card to the board, It would be nice if there was a one stop shop to display GPU PCI slot information.  

I don't believe that they are assigned sequentially during system boot up because I have seen lane assignments go from 1, 2, 5, 6, 7, 8.   It would really help in troubleshooting riser problems if physical slot assignment was easily found.  

I have learned over the years that riser board version type really does not matter (even though I have seen several posts that claim v6 is better, v7 is better, C vs. S, etc.).   It all depends on where each lot is manufactured and if they had any kind of quality assurance check prior to shipment.  Heck I have an allotment of v1s from the BTC mining days that still work great.  Some riser boards work fine if the card is not overclocked too high.  But it is never the same across the board.    

Whether the risers are purchased from eBay, Newegg, Amazon, Craigslist, etc., it all seems to be a crap shoot.  And returning products across the ocean is a real pain in the a$$.  My recommendation is to find one vendor that has not failed you and stick with them (regardless of the wait time)...and order EXTRAS!  Does anyone know of a reliable US, UK or Asia based vendor?

Just some food for thought from an extremely happy nvOC user.
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 06, 2017, 04:06:11 PM
Any idea why when i try to SSH into nvOC as root with miner1 as my password i get "Permission denied" but as m1 it works?

EDIT: nevermind figured it out, my linux knowledge is very limited so i wasn't aware i can jump into root with "sudo su", anyhow after that "sudo echo b > /proc/sysrq-trigger" did manage to restart the rig.

Password-based root login is disabled for security reasons.  You can either log in as a normal user and use sudo to get root (as you did) or use public-key authentication to log into root directly.
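
A side note on the command itself: in `sudo echo b > /proc/sysrq-trigger` the redirection is done by the calling shell, not by sudo, so it only works from a shell that is already root. Piping into `sudo tee` avoids that. A minimal sketch; the helper name `write_as_root` and the `SUDO` override are made up for illustration:

```shell
#!/bin/sh
# Write a value to a file that only root may write.
# tee runs under sudo, so the privileged process is the one opening the file.
# SUDO may be set to an empty string to skip sudo (handy for dry runs).
write_as_root() {
    printf '%s\n' "$1" | ${SUDO-sudo} tee "$2" > /dev/null
}
```

Usage: `write_as_root b /proc/sysrq-trigger` from the m1 account. Be aware that sysrq "b" reboots immediately, without syncing or unmounting disks.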
newbie
Activity: 39
Merit: 0
July 06, 2017, 03:01:41 PM
So new build, similar problem to the first rig I built myself. Getting the never ending "bootloop" when I fire up my mobo with everything plugged in. I know everyone says to unplug everything and try it one part at a time, which i will indeed do, but I was wondering if anyone else had this issue and found a more uniform way to fix it? Last time it was because my RAM was loose. This time my connections are all secure.

Background: building a trio rig with 2 1080ti and a 1080 mini using a 270 mobo with 850w psu. First mobo I used for this build didn't work at all. This one fires up but then goes into the endless loop. I have tried a different psu and a different RAM as well as different GPU. Leaning towards it being a faulty cpu but curious to see if anyone else has any other suggestions before I dismantle everything.

CPUs are almost never bad; sometimes you can have a bent mobo pin that causes CPU related problems.  However, I don't think that is the case here.

Maybe this will work:

Ensure the monitor is connected to the primary GPU ( the one in the 16x slot closest to the CPU )

Disconnect the USB or SSD/HDD from the rig.

Fully power off everything: including the PSU.

Press the power button several times to clear any remaining power in the mobo.

Turn the PSU powerswitch back to | "on".

Power on (without the USB attached).

See if the BIOS posts; if you get nothing in 20 seconds, press ctrl + alt + del repeatedly until the system reboots.

Wait and see if the bios posts.

If the bios posts attach the USB key and press ctrl + alt + delete.

Let me know if this works.


Thanks a ton, will give it a shot. And if this doesn't work? Just break everything?
newbie
Activity: 35
Merit: 0
July 06, 2017, 11:47:43 AM
Working on a video to get this set up with 6 1070 Asus Dual OC and the Z270 prime motherboard, hope it will help new users.

I did have one question, would it be possible to add DBIX in the next update? Also, I am running 12 1070's, and have been mining ZCash since, it seems like that was optimized with this miner and gtx cards in general. (maybe I am wrong?) What are people mining with their 1070's? is Zcash still the best or does mining NICE with the profit switcher make more sense?

DBIX runs fine on Windows with Claymore... I just edited the address, pool and some parameters.
For nvOC I've done the same thing but I get :
Quote
ETH: Authorization failed
: {"id":2,"jsonrpc":"2.0","result":null,"error":{"code":-1,"message":"Invalidlogin"}}
Stratum - reading socket failed, disconnect
ETH: Job timeout, disconnect, retry in 20 sec...

Pool link  : http://dbix.pool.sexy/#/help
Every rig is so stable... too bad I suck at Linux! Time to learn.
Great job, man!
newbie
Activity: 17
Merit: 0
July 06, 2017, 11:33:42 AM
Is anybody else experiencing nvOC hang / lockup to the point of needing a hard powerdown when Genoil crashes?  I can log in but when I try to close the miner and shutdown the OS becomes locked up. I am wondering if it is hardware related?  I am only using 4GB of ddr4 is that enough???  I believe I will be going back to Claymore.  I can't seem to get Genoil stable even dialed 300mc back from Claymore.  I will reimage a USB stick and go back to Claymore to see if stability comes back. 
legendary
Activity: 1834
Merit: 1080
---- winter*juvia -----
July 06, 2017, 10:34:34 AM
Working on a video to get this set up with 6 1070 Asus Dual OC and the Z270 prime motherboard, hope it will help new users.

I did have one question, would it be possible to add DBIX in the next update? Also, I am running 12 1070's, and have been mining ZCash since, it seems like that was optimized with this miner and gtx cards in general. (maybe I am wrong?) What are people mining with their 1070's? is Zcash still the best or does mining NICE with the profit switcher make more sense?

imho

1060, 1070 = ETH

1080ti = ZEC
newbie
Activity: 18
Merit: 0
July 06, 2017, 10:30:13 AM
Working on a video to get this set up with 6 1070 Asus Dual OC and the Z270 prime motherboard, hope it will help new users.

I did have one question, would it be possible to add DBIX in the next update? Also, I am running 12 1070's, and have been mining ZCash since, it seems like that was optimized with this miner and gtx cards in general. (maybe I am wrong?) What are people mining with their 1070's? is Zcash still the best or does mining NICE with the profit switcher make more sense?
newbie
Activity: 29
Merit: 0
July 06, 2017, 08:31:51 AM
Any idea why, when I try to SSH into nvOC as root with miner1 as my password, I get "Permission denied", but as m1 it works? It turns out one of my cards is problematic and can only handle much lower clocks than the rest of the cards, and the whole system gets stuck to the point where I can't even reboot it remotely over SSH, at least not with "sudo reboot", "sudo shutdown -r now" or "sudo init 6". Even "sudo systemctl --force --force reboot" doesn't work; it just hangs there without even giving me an error message. I want to try using "sudo echo b > /proc/sysrq-trigger" but I can't do that unless I'm root.

just fuck my ubuntu up..
https://i.imgur.com/0pSDR1p.png

help.

EDIT: nevermind figured it out, my linux knowledge is very limited so i wasn't aware i can jump into root with "sudo su", anyhow after that "sudo echo b > /proc/sysrq-trigger" did manage to restart the rig.
legendary
Activity: 4354
Merit: 9201
'The right to privacy matters'
July 06, 2017, 07:26:13 AM
I have been GPU mining for over a year now (mostly ETH on AMD rigs) using ethOS. In my experience, 70-80% of the hardware problems I have had have been related to poor-quality risers. It's hard to find a source of decent-quality ones: they are all made in China, it seems with little or no quality control. They are a very cheap, high-volume item, so unfortunately for us it's just luck if you get good ones.
I have a friend who ordered a bag of 10 and 7 of them were faulty right away, then an eighth failed after 24 hours.

So if you find a reliable source  - buy twice as many as you think you will need!

I have a stash of extra risers; I agree with newmz: it's a good idea to keep extras on hand.

This is a better way: go riser-free.
2x 1080 Ti plus 1x 1070 ITX
1750 sols for ZEC, using 500-550 watts.

Mobo slots are better than risers.
A CPU + RAM + mobo + PSU + USB stick combo for 3 cards is about:

120 + 50 + 114 + 130 + 16 = 430 high end
50 + 25 + 114 + 100 + 11 = 300 low end

So low end is 600, all under long warranty, for six cards as two 3-card rigs.

Six-card rig:

50 + 25 + 114 + 400 + 11 + 60 = 660 if you go with an EVGA 1600
50 + 25 + 114 + 260 + 11 + 60 = 520 if you use 2 Rosewill Quark 850-watt PSUs (and the risers are not under warranty)
50 + 25 + 114 + 150 + 11 + 60 = 410 if you use a server pico setup

So 410 vs 600 on smaller farms is not much of a difference for two complete 3-card rigs vs one complete six-card rig.
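
The totals above can be re-run with other prices using plain shell arithmetic. This is only a sketch of the comparison: the helper name `sum_prices` is made up, and reading the sixth figure (60) in the six-card builds as the cost of risers is an assumption, since the post does not label it.

```shell
#!/bin/sh
# Sum a list of whole-dollar part prices.
sum_prices() {
    total=0
    for price in "$@"; do
        total=$(( total + price ))
    done
    echo "$total"
}

# Order of parts: CPU RAM mobo PSU USB [risers (assumed)]
three_high=$(sum_prices 120 50 114 130 16)
three_low=$(sum_prices 50 25 114 100 11)
six_evga=$(sum_prices 50 25 114 400 11 60)
six_quark=$(sum_prices 50 25 114 260 11 60)
six_server=$(sum_prices 50 25 114 150 11 60)

echo "3-card rig, high end:      $three_high"
echo "3-card rig, low end:       $three_low"
echo "two low-end 3-card rigs:   $(( 2 * three_low ))"
echo "6-card rig, EVGA 1600:     $six_evga"
echo "6-card rig, 2x Quark 850:  $six_quark"
echo "6-card rig, server PSU:    $six_server"
```

Swapping in local prices shows quickly how much the riser-free layout costs over the cheapest six-card option.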

All my three-card rigs can go to 4 cards with 1 riser.
Many of my three-card rigs can go to 5 cards with 2 risers.

the one below does 5 with zero problems



newbie
Activity: 39
Merit: 0
July 06, 2017, 07:12:20 AM
So new build, similar problem to the first rig I built myself. Getting the never ending "bootloop" when I fire up my mobo with everything plugged in. I know everyone says to unplug everything and try it one part at a time, which i will indeed do, but I was wondering if anyone else had this issue and found a more uniform way to fix it? Last time it was because my RAM was loose. This time my connections are all secure.

Background: building a trio rig with 2 1080ti and a 1080 mini using a 270 mobo with 850w psu. First mobo I used for this build didn't work at all. This one fires up but then goes into the endless loop. I have tried a different psu and a different RAM as well as different GPU. Leaning towards it being a faulty cpu but curious to see if anyone else has any other suggestions before I dismantle everything.
newbie
Activity: 17
Merit: 0
July 05, 2017, 10:39:50 PM
Hi everyone. I have an Asus Prime Z270-P running great with 2 EVGA 1070's.  I want to run it with 8 cards which I have. Should I reimage the USB stick or will it be fine with installing the cards with risers and just running it again? I'm also assuming I have to run it without a monitor hooked up.
I have that exact board and have been running 8 cards for several days now.  You should be fine as long as you have the onebash file configured correctly for the new cards already.  Just shut down, add cards, and reboot.  You will of course need 2 m.2 adapters if you haven't got them already.  I am running 6 1070s, 1 1060, and 1 970.  Right around 224 MH/s.  Could be closer to 230 if I could figure out how to overclock the Maxwell card.

That's great news. Yes I do have all adapters.  Did you have a monitor hooked up or did you ssh in to check on it? Thanks I really appreciate this. 

Although the posts in this thread using SSH and Screen are helpful I have not gone through that.  I have the rig in my basement with a monitor hooked to it as well as Teamviewer installed on all my PCs and cell phone.  With Teamviewer you can remote in from anywhere although it uses more system resources. 
newbie
Activity: 9
Merit: 0
July 05, 2017, 10:24:01 PM
Hi everyone. I have an Asus Prime Z270-P running great with 2 EVGA 1070's.  I want to run it with 8 cards which I have. Should I reimage the USB stick or will it be fine with installing the cards with risers and just running it again? I'm also assuming I have to run it without a monitor hooked up.
I have that exact board and have been running 8 cards for several days now.  You should be fine as long as you have the onebash file configured correctly for the new cards already.  Just shut down, add cards, and reboot.  You will of course need 2 m.2 adapters if you haven't got them already.  I am running 6 1070s, 1 1060, and 1 970.  Right around 224 MH/s.  Could be closer to 230 if I could figure out how to overclock the Maxwell card.

That's great news. Yes I do have all adapters.  Did you have a monitor hooked up or did you ssh in to check on it? Thanks I really appreciate this. 
newbie
Activity: 17
Merit: 0
July 05, 2017, 08:48:28 PM
Hi everyone. I have an Asus Prime Z270-P running great with 2 EVGA 1070's.  I want to run it with 8 cards which I have. Should I reimage the USB stick or will it be fine with installing the cards with risers and just running it again? I'm also assuming I have to run it without a monitor hooked up.
I have that exact board and have been running 8 cards for several days now.  You should be fine as long as you have the onebash file configured correctly for the new cards already.  Just shut down, add cards, and reboot.  You will of course need 2 m.2 adapters if you haven't got them already.  I am running 6 1070s, 1 1060, and 1 970.  Right around 224 MH/s.  Could be closer to 230 if I could figure out how to overclock the Maxwell card.
newbie
Activity: 13
Merit: 0
July 05, 2017, 08:48:07 PM
Hi, newbie miner here!

Just wanted to register and say THANK YOU so much for making this, works brilliant on my PC (dual 980 ti's), and I love that I can run this from a USB drive.
Currently mining ETH with my 980ti's, and XMR with my CPU.
Not really skilled in Linux, but my performance was so low in Win10, and this guide was pretty easy to follow, and performance is great IMO.
Any chance for Litecoin support?

Keep up the good work!