
Topic: [ mining os ] nvoc - page 305. (Read 418549 times)

newbie
Activity: 14
Merit: 0
July 04, 2017, 11:48:29 PM
So my rig crashed again; it was up for about 19 hours with the current settings. The previous time it crashed I didn't see it happen; I only knew because the screen was blank and the GPU fans had gone up to 100 percent. This time I was sitting in front of it doing something else when the screen went blank and the fans kicked up to 100 percent. Is there some kind of log I could look at to see what caused the crash, or can one be enabled that keeps only the last hour of activity?

Thanks in advance.

Look at the syslog:

Click the Ubuntu button at the top left and type:

sy

then click on System Log.

When I do that, it gives me a stream of the messages from my previous post.
newbie
Activity: 14
Merit: 0
July 04, 2017, 11:41:03 PM
So my rig crashed again; it was up for about 19 hours with the current settings. The previous time it crashed I didn't see it happen; I only knew because the screen was blank and the GPU fans had gone up to 100 percent. This time I was sitting in front of it doing something else when the screen went blank and the fans kicked up to 100 percent. Is there some kind of log I could look at to see what caused the crash, or can one be enabled that keeps only the last hour of activity?

ssh in and look at the tail end of /var/log/dmesg.  I have some crappy PCIe extenders here that would interrupt the connection between the GPU and the computer as soon as mining software fired up.  The errors show up toward the end of /var/log/dmesg.

There's also /var/log/messages, but that tends to be less useful for hardware errors.

I have a keyboard and monitor connected to the rig for now. I found a file named kern.log that is 1.7 GB in size, and kern.log.1, which is about 650 MB. These are the messages:

m1-desktop kernel: [105577.938217] pcieport 0000:00:1b.0:    [ 0] Receiver Error         (First)
m1-desktop kernel: [105577.949736] pcieport 0000:00:1b.0: AER: Corrected error received: id=00d8
m1-desktop kernel: [105577.949750] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00d8(Receiver ID)
m1-desktop kernel: [105577.949757] pcieport 0000:00:1b.0:   device [8086:a2eb] error status/mask=00000001/00002000

and

m1-desktop kernel: [105577.995353] pcieport 0000:00:1b.0: AER: Corrected error received: id=00d8
m1-desktop kernel: [105577.995360] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00d8(Receiver ID)
m1-desktop kernel: [105577.995363] pcieport 0000:00:1b.0:   device [8086:a2eb] error status/mask=00000001/00002000

once in a while I get this

m1-desktop kernel: [105576.736779] pcieport 0000:00:1b.0: can't find device of ID00d8


no idea what those mean
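The repeated lines above are PCIe Advanced Error Reporting (AER) messages. A "Corrected" error of type Physical Layer / Receiver Error means the link saw a bad signal that the hardware recovered from on its own, which is consistent with a flaky riser or extender, but logged often enough it will still bloat kern.log. A minimal sketch, assuming the syslog line format shown above, that counts those messages per reporting port so you can see which slot is noisiest; the function name `aer_counts_by_port` is made up for this example.

```shell
#!/usr/bin/env bash
# Count "AER: Corrected error received" messages per reporting PCIe port.
# Reads syslog-style kernel lines on stdin and prints "<count> <port>",
# busiest port first. The function name is hypothetical.
aer_counts_by_port() {
  grep -F 'AER: Corrected error received' |
    awk '{
      # The port address is the field right after the word pcieport.
      for (i = 1; i < NF; i++)
        if ($i == "pcieport") {
          port = $(i + 1)
          sub(/:$/, "", port)   # drop the trailing colon
          print port
        }
    }' |
    sort | uniq -c | sort -rn
}

# Usage on the rig (kern.log can be huge, so only scan the tail):
#   tail -n 200000 /var/log/kern.log | aer_counts_by_port
#   lspci -tv    # shows which device sits behind the port that shows up
```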
newbie
Activity: 51
Merit: 0
July 04, 2017, 10:11:33 PM
You could configure your router to forward a port other than 22 to port 22 on your mining rig.  I haven't bothered with that with mine, though; I can ssh into my FreeNAS media server (or my desktop, if it's booted into Linux...can RDP into it if it's running Windows and set it to reboot into Linux) from outside and then ssh into the mining rig from there.  

Yes, I posted about this earlier, but the concern is that this would leave nvOC's SSH daemon open to the WAN, running with a default password, for those of us who don't have another SSH daemon on our LAN to use as an intermediary. Someone could wreak all sorts of havoc if they had access to a Linux box on your local network to use as a launching point, so I personally would want to set my own unique password before I forward any ports to nvOC. I have a feeling there would be some extra steps involved in changing the password for the m1 user on nvOC, since oneBash runs commands that require escalation, but I'm not sure where oneBash gets the m1 user's password from when it's executing commands. I'm sure OP can clarify this when he gets caught up on posts.

PS, Fullzero, I'm really liking v0017 so far.  Excellent work!!
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 04, 2017, 10:04:22 PM
So my rig crashed again; it was up for about 19 hours with the current settings. The previous time it crashed I didn't see it happen; I only knew because the screen was blank and the GPU fans had gone up to 100 percent. This time I was sitting in front of it doing something else when the screen went blank and the fans kicked up to 100 percent. Is there some kind of log I could look at to see what caused the crash, or can one be enabled that keeps only the last hour of activity?

ssh in and look at the tail end of /var/log/dmesg.  I have some crappy PCIe extenders here that would interrupt the connection between the GPU and the computer as soon as mining software fired up.  The errors show up toward the end of /var/log/dmesg.

There's also /var/log/messages, but that tends to be less useful for hardware errors.
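The `dmesg` command shows the in-memory kernel ring buffer, which is lost on reboot; on Ubuntu, rsyslog also writes kernel messages to /var/log/kern.log, which survives a reboot, so that is the file to check after a crash. A small filter, sketched under the assumption that AER, "PCIe Bus Error", NVRM and Xid lines are the interesting ones on a GPU rig (NVRM: Xid lines are how the NVIDIA driver reports GPU faults); the function name and the pattern list are illustrative, not exhaustive.

```shell
#!/usr/bin/env bash
# Print the last N kernel-log lines that look GPU/PCIe related.
# The pattern list is an assumption for a mining rig, not a complete list.
recent_hw_errors() {
  n="${1:-50}"
  grep -E 'AER|PCIe Bus Error|NVRM|Xid' | tail -n "$n"
}

# Usage over SSH (reading kern.log may need sudo on some setups):
#   recent_hw_errors 50 < /var/log/kern.log     # survives reboots
#   dmesg | recent_hw_errors 20                 # current boot only
```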
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 04, 2017, 10:01:01 PM
I went a step lower than Pentium on my 2 rigs and bought $50 G3930 Celeron processors since I am only GPU mining.  They run nvOC quite stably (I just returned from a 5 day vacation and both of my rigs that were running v0016 stayed up the entire time I was gone). 

Mine's a Celeron G3920...Skylake vs. Kaby Lake.  The motherboard I was using the first few weeks (a Biostar Racing Z170GT7) might not have shipped with a BIOS that supported Kaby Lake CPUs out of the box.  That board conked out (was an open-box purchase), so I sent it back and am now running an Asus Prime Z270-AR (only difference between it and the Z270-A referenced in the OP is a lack of DisplayPort and DVI ports, AFAIK).

Quote
Granted, I am not using Teamviewer like some folks here.  That will consume more system resources.  I can do everything I need with SSH and the screen command if I'm at home.  I did leave one of my windows workstations online while I was gone so I could teamviewer into that and from there SSH into my rigs if necessary, but luckily I had no need to. 

You could configure your router to forward a port other than 22 to port 22 on your mining rig.  I haven't bothered with that with mine, though; I can ssh into my FreeNAS media server (or my desktop, if it's booted into Linux...can RDP into it if it's running Windows and set it to reboot into Linux) from outside and then ssh into the mining rig from there.  Never used Teamviewer; tried accessing the mining rig with both RDP and VNC, and neither worked.  SSH works better for this purpose anyway, once you're familiar with it.
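The "ssh into an always-on box first, then hop to the rig" approach can be collapsed into one command with OpenSSH's ProxyJump option (OpenSSH 7.3 or newer; `ssh -J` is the command-line form). A sketch that prints a ~/.ssh/config stanza for it; the alias `rig1`, the jump host `me@home.example.net:2222` and the helper name are placeholders, not anything nvOC provides.

```shell
#!/usr/bin/env bash
# Print an ssh_config stanza that reaches the rig through a jump host.
# Arguments: alias, rig LAN IP, jump host as [user@]host[:port].
# The helper name is hypothetical; m1 is nvOC's default user.
rig_ssh_stanza() {
  printf 'Host %s\n    HostName %s\n    User m1\n    ProxyJump %s\n' \
    "$1" "$2" "$3"
}

# Usage (placeholders, adjust to your network):
#   rig_ssh_stanza rig1 10.20.30.40 me@home.example.net:2222 >> ~/.ssh/config
#   ssh rig1                                        # two hops, one command
#   ssh -J me@home.example.net:2222 m1@10.20.30.40  # same thing, no config
```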
newbie
Activity: 14
Merit: 0
July 04, 2017, 09:47:20 PM
So my rig crashed again; it was up for about 19 hours with the current settings. The previous time it crashed I didn't see it happen; I only knew because the screen was blank and the GPU fans had gone up to 100 percent. This time I was sitting in front of it doing something else when the screen went blank and the fans kicked up to 100 percent. Is there some kind of log I could look at to see what caused the crash, or can one be enabled that keeps only the last hour of activity?

Thanks in advance.
hero member
Activity: 672
Merit: 500
July 04, 2017, 09:38:08 PM
Sort of a broad question here, but does anyone have any suggestions as to why my mobo won't turn on? Everything seems to be connected properly; I probably screwed up the switch or the pins for the switch. Just wondering if anyone has run into semi-generic problems or common issues? It's a 270 board.

If your mobo is sitting out on a table/etc and isn't inside a case take a photo, upload it to imgur, and post it here.  That might help.

Off the top of my head: did you connect the 24-pin ATX power AND the 8-pin CPU power? Are you sure your power switch is connected to the proper header pins? If you look closely at the motherboard's front panel pins, there's a legend showing which pins correspond to power, reset, HDD LED, etc. Remove your front panel connectors/power switch and try touching a flathead screwdriver to the two power pins simultaneously for about half a second (shorting them, which is what your power button does) and see if it powers up.

Last but not least... is the toggle switch on your power supply turned on? Stranger things have happened ;)

I once forgot to plug in the CPU power.
I would try starting it up with nothing attached and go from there.
Clear the CMOS. Little things add up.
newbie
Activity: 51
Merit: 0
July 04, 2017, 09:34:47 PM
Sort of a broad question here, but does anyone have any suggestions as to why my mobo won't turn on? Everything seems to be connected properly; I probably screwed up the switch or the pins for the switch. Just wondering if anyone has run into semi-generic problems or common issues? It's a 270 board.

If your mobo is sitting out on a table/etc and isn't inside a case take a photo, upload it to imgur, and post it here.  That might help.

Off the top of my head: did you connect the 24-pin ATX power AND the 8-pin CPU power? Are you sure your power switch is connected to the proper header pins? If you look closely at the motherboard's front panel pins, there's a legend showing which pins correspond to power, reset, HDD LED, etc. Remove your front panel connectors/power switch and try touching a flathead screwdriver to the two power pins simultaneously for about half a second (shorting them, which is what your power button does) and see if it powers up.

Last but not least... is the toggle switch on your power supply turned on? Stranger things have happened ;)
newbie
Activity: 39
Merit: 0
July 04, 2017, 09:01:33 PM
Sort of a broad question here, but does anyone have any suggestions as to why my mobo won't turn on? Everything seems to be connected properly; I probably screwed up the switch or the pins for the switch. Just wondering if anyone has run into semi-generic problems or common issues? It's a 270 board.
newbie
Activity: 44
Merit: 0
July 04, 2017, 09:01:14 PM
First of all big thank you to fullzero and everyone contributing to this distro!

I've been struggling with the Genoil crash issue and the lack of a watchdog implementation for the past few days, and I have a band-aid solution that actually seems to work quite well; perhaps it can help others in the community:

Essentially, you split the Genoil output to a file, grep it (we only care about 'error' lines), and then use that output as the input for a monitoring script that kills and restarts the misbehaving process.

So we have two scripts launched as detached screen sessions, the "ltail" script and the "ett" script:

$ screen -dmS ltail bash ~/eth/Genoil-U/ltail
and
$ screen -dmS ett bash ~/ett

ltail:
--------------------------
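#!/bin/bash
# Watch err.log (written by the ett script) and restart ethminer on each new error line.
echo listening...
cd ~/eth/Genoil-U/ || exit 1
tail -fn0 err.log | \
while IFS= read -r line ; do
        DATE=$(date +%d-%m-%Y" "%H:%M:%S)
        # Test grep's own exit status: $? after "grep | tee" is tee's status, which is always 0.
        if echo "$line" | grep -q "error"
        then
                echo "$DATE $line" | tee -a ~/eth/Genoil-U/timestamp.log
                kill $(ps aux | grep '[e]thminer' | awk '{print $2}')
                sleep 1
                screen -dmS ett bash ~/ett
        fi
done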
-------------------------
ett:
-------------------------
#!/bin/bash
cd ~/eth/Genoil-U
./ethminer -U -F eth-us.dwarfpool.com:80/0xBEbd092a03827C37B75cd4ea314b207AA65c348f/208 2>&1 | tee >(grep error --color=never --line-buffered | tee -a err.log)

-------------------------

Finally, I also send the output of ltail to timestamp.log to track how many times Genoil fails per hour. Aiming for roughly one crash per hour, this gives me about 130 MH/s out of 5x GTX 1060, which is a good 20+ MH/s higher than Claymore... Most importantly, it gives stable hashing despite the OC-induced errors. Recovery takes literally seconds.
Oh yeah, I also run
$ tail -f ~/eth/Genoil-U/timestamp.log in one screen session and watch -n 5 'sensors | grep Core' in another, to fine-tune OC vs. crashes per hour vs. temperature.
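Because ltail prefixes every error with `date +%d-%m-%Y" "%H:%M:%S`, timestamp.log can be turned into the per-hour figure directly. A sketch, assuming exactly that timestamp format and that the file is only ever appended to (so lines from the same hour are adjacent, which is what `uniq -c` needs); `crashes_per_hour` is a hypothetical name. It counts logged error lines, which equals crashes only if each crash produces a single error line.

```shell
#!/usr/bin/env bash
# Count timestamp.log entries per hour.
# Expects lines like "04-07-2017 21:03:11 <error text>" (ltail's format).
# Relies on the file being append-only, so uniq -c sees each hour as one run.
crashes_per_hour() {
  awk 'NF >= 2 { print $1 " " substr($2, 1, 2) ":00" }' | uniq -c
}

# Usage: last 24 hours with at least one logged error
#   crashes_per_hour < ~/eth/Genoil-U/timestamp.log | tail -n 24
```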
Hope this helps, and I hope the message is not too chaotic.
Cheers!

BTC: 13PnEKpfVzNseWkrm6LoueKcCMPj74zPv7
ETH: 0xBEbd092a03827C37B75cd4ea314b207AA65c348f
newbie
Activity: 51
Merit: 0
July 04, 2017, 08:42:23 PM
Seems like 6-pin powered risers solved my issue with the 1050 Tis crashing. Thanks a lot @fullzero and others

Now I'm wondering: is there a way to see all rigs through the API, and to see that from outside the network? If so, how do I configure the router for it? I've got a MikroTik behind the 24-port switch.

The best way to do this is to set up an OpenVPN server on the network and put VPN clients on the same subnet. Once you're connected over the VPN, the connection acts just as if you were on the home network. It will also be secure if you use a strong cipher such as AES-256-CBC.

You could just use SSH for this if you don't want to set up a VPN server, as SSH can also use AES-256 encryption and is every bit as secure as a VPN, plus it's already running! The only config required would be a static DHCP lease in your router so each miner always gets the same LAN IP, and forwarding the appropriate port(s) in your router. For instance, you could set an unused incoming WAN port like 2222 to forward all inbound traffic to LAN port 22 (the default SSH port) on LAN IP 10.20.30.40, if that were the LAN IP of your nvOC rig. With multiple rigs, WAN port 2222 forwards to port 22 on 10.20.30.40, WAN port 2223 forwards to port 22 on 10.20.30.41, etc. My only concern is that I would want to change the default password (miner1) before opening an outside port to nvOC's SSH daemon, as a clever hacker might scan your WAN IP (which is a thing; bored or malicious people do this), find the open port, and get lucky by trying "miner1" as the password. Changing the system password is as simple as running passwd from Guake/SSH, but I wouldn't recommend doing that until OP can say whether it will cause problems within oneBash. Most of the commands executed in oneBash require privilege escalation, and I don't know where it gets the "miner1" password.
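The numbering scheme for multiple rigs (one external port per rig, each forwarding to port 22 on consecutive LAN IPs) is just a fixed offset. A sketch that prints the table to copy into the router's port-forward page; the base port and base IP are the example values from the post, the helper name `port_map` is hypothetical, and it assumes the rigs' static leases are consecutive and don't run past .255.

```shell
#!/usr/bin/env bash
# Print "WAN <port> -> <ip>:22" for COUNT rigs starting at BASE_PORT / BASE_IP.
port_map() {
  base_port="$1"; base_ip="$2"; count="$3"
  prefix="${base_ip%.*}"   # e.g. 10.20.30
  last="${base_ip##*.}"    # e.g. 40
  i=0
  while [ "$i" -lt "$count" ]; do
    printf 'WAN %d -> %s.%d:22\n' "$((base_port + i))" "$prefix" "$((last + i))"
    i=$((i + 1))
  done
}

# Usage:
#   port_map 2222 10.20.30.40 3
#   # then from outside: ssh -p 2223 m1@<your WAN IP>   (reaches the second rig)
```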

OP, can you shed any light on that?  Is it okay to change the password for the m1 user without editing anything else?  I don't see it inside oneBash itself.

newbie
Activity: 51
Merit: 0
July 04, 2017, 08:23:05 PM
Also, I couldn't find how I can see the current mining process. I did see the screen -r commands, but that implies killing the current process and restarting it. I'd like to be able to see, from SSH, the current mining process without killing it. Is this possible?

If you want to monitor the mining process via screen you're going to have to kill the initial gnome-terminal.  There's no way around that, as screen can only reconnect to an existing screen session.

This shouldn't be a big deal if you have a stable rig.  You only need to do it once per reboot.  My process is:

1. From my desktop where I monitor my rigs I initiate a constant ping:
Code:
ping -t 10.20.30.40  # -t is Windows syntax for "ping until stopped" (on Linux, plain ping already does that). Substitute your rig's IP: find it in your router, by running nmap on your LAN subnet, or by running ifconfig from a Guake terminal on the rig if you have a monitor connected
2. Boot the rig
3. Wait until I begin to get ping responses from the rig, thus indicating Ubuntu has booted and rig has network connectivity
4. SSH into the rig (user: m1  password: miner1)
5. Initiate a screen session:
Code:
screen -S rig  # capital -S names the session (lowercase -s sets the shell); pick any name you like
6. Start nvidia-smi dmon to watch for the mining process to begin (once that happens, you know the OC settings, fan speed settings, etc. have been applied. Running those commands from within screen isn't 100% consistent IME, as I always saw error messages when I tried it that way. It's best to let those settings commands run from gnome-terminal as Ubuntu first boots, IMO).
Code:
nvidia-smi dmon
7. Wait until you see wattage go up and GPU utilization go up to 100% (which indicates that the oneBash script concluded and opened the mining process).  Exit nvidia-smi with CTRL + c
8. Find the PID for gnome-terminal.  
Code:
ps aux | grep gnome-terminal
9. Kill it:
Code:
kill [PID from step 8]
10. Restart mining:
Code:
bash '/media/m1/1263-A96E/oneBash'

It might seem like a lot of steps, but it takes all of 120 seconds, and you shouldn't need to do it very often once your rig is dialed in. You're losing maybe 1 minute's worth of hashes on average every week? Pretty negligible considering the convenience of monitoring from another workstation, and you're not using up system resources on Teamviewer. This also lets you go completely headless if you buy a dummy HDMI plug. I just updated from 16 to 17 and didn't need to haul my extra monitor upstairs to do it. Easy peasy.
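Steps 6 and 7 (watching `nvidia-smi dmon` until GPU utilization reaches 100%) can be scripted so you don't have to eyeball it. A sketch using nvidia-smi's CSV query mode (`--query-gpu=utilization.gpu --format=csv,noheader,nounits`); the 90% threshold, the 5-second poll and the `all_gpus_busy` name are assumptions, not part of nvOC.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) only if every utilization value on stdin is >= THRESHOLD.
# Input: one integer per line, e.g. from
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
# Empty input counts as "not busy".
all_gpus_busy() {
  awk -v t="${1:-90}" '
    { n++; if ($1 + 0 < t + 0) low = 1 }
    END { if (n == 0 || low) exit 1; exit 0 }'
}

# Usage inside the screen session, in place of steps 6-7:
#   until nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits |
#         all_gpus_busy 90; do sleep 5; done
#   echo "miner is up; continue with step 8"
```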
newbie
Activity: 51
Merit: 0
July 04, 2017, 07:51:43 PM
@fullzero

Here is a really detailed build of nvOC 0017 with 2 NVIDIA 1070s on a

GIGABYTE GA-Z270P-D3 LGA1151 Intel Z270 2-Way Crossfire ATX DDR4 Motherboard.

To all: this is a solid board, really good.

I tested it stable with up to 5 AMD RX 480s on Win 10 and SMOS.
I tested it stable with up to 4 1080 Tis on Win 10 and Win 7, and with up to 3 on nvOC.

I am sure it will do 5 on all of the above (well, maybe not Win 7); I just did not test that high on every OS.

https://bitcointalksearch.org/topic/here-is-a-thread-for-newbies-to-setup-a-nvoc-0017-rig-to-mine-zec-1998198

Can I ask: why do you go for the higher end CPU?

I've been running 2 eth rigs on Asrock H81 Pro BTC boards for over a year, mostly using Ethos (which is an AMD linux mining distro). I recently started to convert to Nvidia so I'm using the same setup but one of them now using nvOC and 2 1070s + 3 1060s. I always used the cheapest low end pentium (I forget exactly which - 2 cores 3.3GHz) and it was always fine. Seems fine in nvOC so far too. Unless you want to run that XMR CPU miner I guess.

I went a step lower than Pentium on my 2 rigs and bought $50 G3930 Celeron processors since I am only GPU mining.  They run nvOC quite stably (I just returned from a 5 day vacation and both of my rigs that were running v0016 stayed up the entire time I was gone).  Granted, I am not using Teamviewer like some folks here.  That will consume more system resources.  I can do everything I need with SSH and the screen command if I'm at home.  I did leave one of my windows workstations online while I was gone so I could teamviewer into that and from there SSH into my rigs if necessary, but luckily I had no need to. 
newbie
Activity: 14
Merit: 0
July 04, 2017, 04:22:40 PM
My rig crashed from having the settings too high; it went down while I was asleep. I rebooted it and it's up and running, but I'm getting a low disk space warning. What file/directory do I delete?
Run this command and you're golden on space:
Code:
sudo apt-get purge $(dpkg -l linux-{image,headers}-"[0-9]*" | awk '/ii/{print $2}' | grep -ve "$(uname -r | sed -r 's/-[a-z]+//')")
That worked. Thank you so much!

Thanks for helping xleejohnx

gig410 what version are you using?

I'm using 0017, sorry for the late response.
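The purge one-liner quoted above works by listing installed `linux-image-*`/`linux-headers-*` packages and dropping any whose name contains the running kernel's version (`uname -r` with a flavour suffix such as `-generic` stripped). A sketch that splits out the filtering step so the list can be reviewed first and dry-run with `apt-get -s` (simulate); the helper name is hypothetical. Like the original, it matches the version as a plain pattern, so double-check the list before purging.

```shell
#!/usr/bin/env bash
# Filter kernel package names (one per line on stdin), dropping any that
# belong to the running kernel. Mirrors the grep -v step of the one-liner.
old_kernel_pkgs() {
  running="${1:-$(uname -r)}"    # e.g. 4.10.0-26-generic
  ver="${running%%-[a-z]*}"      # strip the flavour suffix -> 4.10.0-26
  grep -v -e "$ver"
}

# Usage: review, dry-run, then purge for real.
#   pkgs=$(dpkg -l 'linux-image-[0-9]*' 'linux-headers-[0-9]*' |
#          awk '/^ii/ {print $2}' | old_kernel_pkgs)
#   echo "$pkgs"
#   sudo apt-get -s purge $pkgs    # -s only simulates
#   sudo apt-get purge $pkgs
#   df -h /
```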
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 04, 2017, 02:49:50 PM
I was able to resolve the issue by removing all the old mine_ALGO.sh files from the /media/m1/1263-A96E directory.  I assumed these would not interfere with the new switch.py script.

I wouldn't have thought those would interfere as the new script shouldn't call them, but I "git rm"'d them before the latest commit.  Odd.
full member
Activity: 169
Merit: 100
July 04, 2017, 02:41:43 PM
Putting up another rig here soon. Going to test out the mining algo switching that was posted a few pages back.

Does anyone have suggested settings for 1060s? I saw some on the first page but wanted to see if anyone else has some.
newbie
Activity: 26
Merit: 0
July 04, 2017, 01:40:51 PM
For some odd reason, since you consolidated all the code into switch.py, the only miner that runs now is equihash.

Mine's been running daggerhashimoto almost exclusively since the latest release.  Looking at current-profit, we have:

I am noticing that if something other than equihash is more profitable in "current-profit", it does kill the equihash mining processes but fails to start anything else.

I manually switched to equihash, let it get up and running, and then fired up the script.  It killed the equihash miner and restarted the daggerhashimoto miner.

You are aware that the miner runs in a screen session, right?  When the script switches from one algo to another, the screen session associated with the first miner ends and a new one is started with the second miner.  screen -dr miner will bring up the currently-running miner.

I was able to resolve the issue by removing all the old mine_ALGO.sh files from the /media/m1/1263-A96E directory.  I assumed these would not interfere with the new switch.py script.
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 04, 2017, 01:29:41 PM
For some odd reason, since you consolidated all the code into switch.py, the only miner that runs now is equihash.

Mine's been running daggerhashimoto almost exclusively since the latest release.  Looking at current-profit, we have:

I am noticing that if something other than equihash is more profitable in "current-profit", it does kill the equihash mining processes but fails to start anything else.

I manually switched to equihash, let it get up and running, and then fired up the script.  It killed the equihash miner and restarted the daggerhashimoto miner.

You are aware that the miner runs in a screen session, right?  When the script switches from one algo to another, the screen session associated with the first miner ends and a new one is started with the second miner.  screen -dr miner will bring up the currently-running miner.
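`screen -dr miner` only works while a session named `miner` exists, and during a switch the old session ends before the new one starts, so there can be a short gap. A small check, assuming the usual `screen -ls` layout (a `<pid>.<name>` entry followed by whitespace and the session state); the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if `screen -ls` output on stdin lists a session called NAME.
has_screen_session() {
  grep -qE "^[[:space:]]+[0-9]+\.$1[[:space:]]"
}

# Usage:
#   if screen -ls | has_screen_session miner; then
#     screen -dr miner
#   else
#     echo "no miner session right now (switch in progress?)"
#   fi
```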
newbie
Activity: 26
Merit: 0
July 04, 2017, 12:23:52 PM
I will hold off on integrating this for now then (and wait for your changes); in the meantime I will make a link to your repo on the OP.

I've committed an update that, if it pans out, rolls everything into one Python script...no auxiliary shell scripts.  I'm testing it right now to verify that it behaves the same as the previous version.  I suspect I'll know in the morning.

Edit: Just did some accelerated testing by manually switching to a less-profitable coin first...the script killed the miner and fired up the appropriate miner.  I think the most recent update is ready for wider testing:

https://gitlab.com/salfter/nvoc-nicehash-switcher


For some odd reason, since you consolidated all the code into switch.py, the only miner that runs now is equihash.

Mine's been running daggerhashimoto almost exclusively since the latest release.  Looking at current-profit, we have:

Code:
neoscrypt: 0.00122266 BTC/day (3.14 USD/day)
lyra2rev2: 0.00044478 BTC/day (1.14 USD/day)
daggerhashimoto: 0.00221102 BTC/day (5.67 USD/day)
lbry: 0.00039243 BTC/day (1.01 USD/day)
equihash: 0.00163831 BTC/day (4.20 USD/day)
pascal: -0.00003248 BTC/day (-0.08 USD/day)

I now have it logging the data (instead of overwriting it), and I have the current unpaid balances at NiceHash.  I'll let it keep running and see what happens.

I am noticing that if something other than equihash is more profitable in "current-profit", it does kill the equihash mining processes but fails to start anything else.
hero member
Activity: 651
Merit: 501
My PGP Key: 92C7689C
July 04, 2017, 12:16:03 PM
I will hold off on integrating this for now then (and wait for your changes); in the meantime I will make a link to your repo on the OP.

I've committed an update that, if it pans out, rolls everything into one Python script...no auxiliary shell scripts.  I'm testing it right now to verify that it behaves the same as the previous version.  I suspect I'll know in the morning.

Edit: Just did some accelerated testing by manually switching to a less-profitable coin first...the script killed the miner and fired up the appropriate miner.  I think the most recent update is ready for wider testing:

https://gitlab.com/salfter/nvoc-nicehash-switcher


For some odd reason, since you consolidated all the code into switch.py, the only miner that runs now is equihash.

Mine's been running daggerhashimoto almost exclusively since the latest release.  Looking at current-profit, we have:

Code:
neoscrypt: 0.00122266 BTC/day (3.14 USD/day)
lyra2rev2: 0.00044478 BTC/day (1.14 USD/day)
daggerhashimoto: 0.00221102 BTC/day (5.67 USD/day)
lbry: 0.00039243 BTC/day (1.01 USD/day)
equihash: 0.00163831 BTC/day (4.20 USD/day)
pascal: -0.00003248 BTC/day (-0.08 USD/day)

I now have it logging the data (instead of overwriting it), and I have the current unpaid balances at NiceHash.  I'll let it keep running and see what happens.
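The current-profit figures above are easy to rank mechanically, which gives a quick way to check which miner the switcher ought to be running. A sketch, assuming the `algo: <btc> BTC/day (<usd> USD/day)` layout shown; this is not the switcher's own logic, and `best_algo` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the algorithm with the highest BTC/day from current-profit lines
# such as "daggerhashimoto: 0.00221102 BTC/day (5.67 USD/day)".
# Lines without ": " (headers, blanks) are ignored.
best_algo() {
  awk -F': ' 'NF >= 2 {
      split($2, f, " ")   # f[1] is the BTC/day figure
      v = f[1] + 0
      if (!seen || v > best) { best = v; algo = $1; seen = 1 }
    }
    END { if (seen) print algo }'
}

# Usage (location of current-profit depends on your switcher setup):
#   best_algo < current-profit
```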