
Topic: [ mining os ] nvoc - page 84. (Read 418542 times)

newbie
Activity: 4
Merit: 0
January 08, 2018, 01:57:00 PM
Is there a simple command to update xorg? I think this is the solution to my driver/temp control problems.

this will generate one that enables most features


Code:
sudo nvidia-xconfig --enable-all-gpus --cool-bits=31
for normal gpus


Code:
sudo nvidia-xconfig --enable-all-gpus --cool-bits=12
for P106_100s


My guess is your problem is related to the P106_100s and normal GPUs working best with different coolbits settings.  Hard to know without one of those to test with.
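Since nvidia-xconfig rewrites /etc/X11/xorg.conf, it can help to keep a dated copy of the file before running either command, so the old config can be put back if the new one misbehaves. A minimal sketch; the helper name backup_conf is made up for illustration and is not part of nvOC or nvidia-xconfig:

```shell
# backup_conf FILE: copy FILE to FILE.<timestamp>.bak and print the backup name.
# Hypothetical helper, shown only as an illustration.
backup_conf() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    dst="$src.$(date +%Y%m%d-%H%M%S).bak"
    # -p keeps the original mode and timestamps on the copy.
    cp -p "$src" "$dst" && echo "$dst"
}

# Typical use, run as root because /etc/X11 is not user-writable:
#   backup_conf /etc/X11/xorg.conf
#   nvidia-xconfig --enable-all-gpus --cool-bits=31
```

To roll back, copy the printed .bak file over /etc/X11/xorg.conf and reboot.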




hey just wondering if anyone can help? A puzzle for someone....  
I have 12 x 1080s and have always had an issue with the auto temp/fan speed for the 12th card (GPU 11). Since I updated nvOC in December to nvOC-19-2-update (by_fullzero_unofficial), it has mined everything with no issues apart from the one I mentioned.

Then, at the end of December, it suddenly started crashing constantly on Equihash with both ZM and EWBF. The error was always that GPU 11 was lost. After some digging I found that in the Nvidia X Server Settings for GPU 11 the box to tick for the fan speed was not there.

After some more digging I found that in the /etc/X11/xorg.conf file GPU 11 was not set correctly. It is in PCI slot 18, and the Section "Device" entries only went up to PCI-17.

I did as in the quote (sudo nvidia-xconfig --enable-all-gpus --cool-bits=31) and that didn't work either (so I had to restore the original), but it did give me the correct PCI locations of all the GPU devices, so I have updated xorg.conf with the relevant PCI settings.

Now the auto temp/fan speed errors have gone and the tick box appears in Nvidia X Server Settings for GPU 11.
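One thing that trips people up when editing the PCI entries by hand: lspci prints the bus number in hexadecimal, while the BusID line in xorg.conf expects decimal values in the form PCI:bus:device:function. So a card that lspci lists at 12:00.0 belongs under PCI:18:0:0. A small sketch of the conversion; the function name busid_from_lspci is an assumption for illustration:

```shell
# busid_from_lspci ADDR: convert an lspci-style address (hex "bb:dd.f",
# optionally prefixed with a PCI domain like "0000:") into the decimal
# "PCI:b:d:f" form used by BusID in xorg.conf.
busid_from_lspci() {
    addr=$1
    # Drop an optional leading domain such as "0000:".
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}     # e.g. "12"
    rest=${addr#*:}     # e.g. "00.0"
    dev=${rest%%.*}     # e.g. "00"
    fn=${rest#*.}       # e.g. "0"
    # printf reads the "0x" prefix as hexadecimal and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# List a BusID for every NVIDIA device lspci can see:
#   lspci | grep -i nvidia | while read -r addr _; do busid_from_lspci "$addr"; done
```

Comparing that list against the Device sections in xorg.conf should show quickly whether a card is missing or sits under the wrong BusID.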

But it still crashes on Equihash. It says:
"Unable to determine the device handle for GPU 0000:1200.0 GPU is lost."
I updated nvOC on the 4th of January to nvOC v0019-2.0 - community release. Any suggestions, anyone?

Also, why does my /etc/X11/xorg.conf file say I'm using 1050s when I am using 1080s? Does it even matter?

I do not know if you have updated from version 0019-1.4 to version 19-2.0 with papampi scripts.

If so, could you try the latest papampi image from the links we just posted and apply the latest updates from papampi?

Of course, don't forget to save your current system to another medium first, for example with HDDRawCopy or dd.

I had problems with version 19-1.4 too. Not of this type, but it has been a month since I installed 19-2.0 and repaired the disk by adding swap. I know it has nothing to do with the problem you are describing.

I am using https://github.com/papampi/nvOC_by_fullzero_Community_Release

I would do the reimaging, but my Linux command knowledge is limited and I am a bit unsure how to go about it. If anyone could take me through it, I would greatly appreciate it :-)



so.....

I gave it another crack with the
Code:
sudo nvidia-xconfig --enable-all-gpus --cool-bits=31
for normal gpus and rebooted. The terminal ran as normal and came up with the same error as before. I closed the terminal before it had a chance to reboot, reopened it, and hey presto, it's all working as it should and not crashing on Equihash. Now the fan tick box comes up in the Nvidia X Server Settings for GPU 11, and there are no more GPU 11 errors. Phew.
full member
Activity: 200
Merit: 101
January 08, 2018, 01:40:41 PM
Hi all, I am mining ZEC with zm and have run into a problem.
My rig:
GPU: 1063 x10, power limit set to 70 W; 4 GB RAM; CPU is a G3930; ASUS Mining Expert; PSU: 750 W x2
nvOC 19-2

The watchdog log shows that a GPU is lost, and the watchdog tries to reboot the system to recover the GPU.
The system then hangs.

I also tried turning off the watchdog. zm then reports a gpu launched timed out error, after which the system becomes unstable and hangs shortly afterwards.

Has anyone seen this problem before? I think if the watchdog could reboot the system correctly, the zm error would not be a problem.

Thanks

It's obviously not a watchdog problem, as your system hangs without the watchdog too. Something is wrong with one of your GPUs. It might be a bad cable, a bad riser or a bad GPU. Try to troubleshoot by swapping cables, risers, even GPUs.

Note which GPU the watchdog or your miner reports as the problem, then run
Code:
./nvOC gpumap

to determine the physical location of the problematic GPU.
full member
Activity: 200
Merit: 101
January 08, 2018, 01:31:28 PM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.





Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?

You're right, no internet. I have an IP from the router but cannot do anything. I even attempted to assign a manual IP and no go. What else should I check?

Upgrade to v0019-2.0, it has better internet check to prevent miner restart and rig start
And soon pool watch will be added too

Thanks but I need internet to run the update. I mounted a second usb but without internet I cannot run the update.

Restart your router and rig. That usually restores internet connection.

I have done that many times. Tried dhcp and static but it's just not working. It doesn't make sense. Changed cat 5 cables too. I can ping the gateway but if I try to ping yahoo.com or pull up Firefox and go to google.com it fails.

There were quite a few issues with 19-1.4 version of nvoc. Not worth trying to figure out and fix the problem when there is newer and much improved version 19-2.0. Why don't you give 19-2.0 a try?
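For the "can ping the gateway but not yahoo.com" symptom, it may help to separate three things before reimaging: the link to the router, routing past the router, and name resolution. The sketch below is only an illustration; the function names diagnose and probe are made up, and 8.8.8.8 (Google's public DNS server) is used simply as a well-known address to ping by IP:

```shell
# diagnose GW_OK IP_OK DNS_OK: turn three 1/0 probe results into a short verdict.
diagnose() {
    gw_ok=$1
    ip_ok=$2
    dns_ok=$3
    if [ "$gw_ok" != 1 ]; then
        echo "gateway unreachable: check cable, NIC and IP settings"
    elif [ "$ip_ok" != 1 ]; then
        echo "no route past the gateway: check router or modem"
    elif [ "$dns_ok" != 1 ]; then
        echo "DNS failure: names do not resolve, check nameservers"
    else
        echo "network looks fine"
    fi
}

# probe CMD...: run CMD quietly and print 1 on success, 0 on failure.
probe() {
    if "$@" >/dev/null 2>&1; then echo 1; else echo 0; fi
}

# Example run on the rig (left as comments so nothing touches the network here):
#   gw=$(ip route | awk '/^default/ {print $3; exit}')
#   diagnose "$(probe ping -c1 -W2 "$gw")" \
#            "$(probe ping -c1 -W2 8.8.8.8)" \
#            "$(probe getent hosts yahoo.com)"
```

If the verdict is a DNS failure, the nameserver entries handed out by the router (see /etc/resolv.conf) are the first thing to look at.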
newbie
Activity: 2
Merit: 0
January 08, 2018, 12:32:35 PM
Hi all, I am mining ZEC with zm and have run into a problem.
My rig:
GPU: 1063 x10, power limit set to 70 W; 4 GB RAM; CPU is a G3930; ASUS Mining Expert; PSU: 750 W x2
nvOC 19-2

The watchdog log shows that a GPU is lost, and the watchdog tries to reboot the system to recover the GPU.
The system then hangs.

I also tried turning off the watchdog. zm then reports a gpu launched timed out error, after which the system becomes unstable and hangs shortly afterwards.

Has anyone seen this problem before? I think if the watchdog could reboot the system correctly, the zm error would not be a problem.

Thanks
newbie
Activity: 46
Merit: 0
January 08, 2018, 11:48:16 AM
Guys, can you make a version of nvOC with Nvidia driver 384.90, because this is the last version of driver to work correctly with power limit for 1050ti?

You can just apply the latest updates yourself. This command should patch your Ubuntu installation:

Code:
sudo -- sh -c 'apt-get update; apt-get upgrade -y; apt-get dist-upgrade -y; apt-get autoremove -y; apt-get autoclean -y'

Edit: If you do this, be sure to shut down the nvOC processes (watchdog in particular) properly, or else you run the risk of having the machine reboot in the middle of applying patches. For nvOC 19-2.0, the command to stop everything is:

Code:
./nvOC stop

Thanks.

This is fine for upgrading the version, but if I understand it right, it looks like he wants to downgrade.

Warning: make a backup first. If something goes wrong you could be unable to boot.

Code:
./nvOC stop
sudo apt-get install nvidia-384

Then reboot





Oh. My bad. I had assumed upgrade because no driver that I had heard of works properly with the 1050:

https://devtalk.nvidia.com/default/topic/1024744/linux/nvidia-387-12-breaks-power-reading-in-nvidia-smi-/post/5213384/#5213384

So, for power draw on the 1050, you either get errant numbers with old versions or no numbers with the newer versions. The downgrade to a previous version is done via locally attached display using these instructions from leenoox:

The easiest way is LOCAL:
1. I suggest closing all apps, ctrl-c on wdog, temp, miner
2. click on the 4th icon on the left-side panel "Additional Drivers"
3. in the new window click on "Additional Drivers" tab
4. click on the "Using NVIDIA binary driver - version 384.90 from..."
5. click on "Apply Changes" button
6. enter root password: miner1
7. reboot

1) I know how to downgrade. Both your suggestions won't work. I am asking for an nvOC with Nvidia driver 384.90 for those who can't downgrade and have 1050 Tis.
2) Any comparatively new driver up to 384.90 reports the right numbers; I checked with a wattmeter.
3) See the images: the first screenshot is v0019-1.4 updated to v0019-2.0 with the driver downgraded to 384.90; the second screenshot, as you can see, is the PoisonXA edition of nvOC.
https://imgur.com/a/UhGtx
newbie
Activity: 7
Merit: 0
January 08, 2018, 11:47:05 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.

http://prntscr.com/hxlf0w

http://prntscr.com/hxlfjm

Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?

You're right, no internet. I have an IP from the router but cannot do anything. I even attempted to assign a manual IP and no go. What else should I check?

Upgrade to v0019-2.0, it has better internet check to prevent miner restart and rig start
And soon pool watch will be added too

Thanks but I need internet to run the update. I mounted a second usb but without internet I cannot run the update.

Restart your router and rig. That usually restores internet connection.

I have done that many times. Tried dhcp and static but it's just not working. It doesn't make sense. Changed cat 5 cables too. I can ping the gateway but if I try to ping yahoo.com or pull up Firefox and go to google.com it fails.
full member
Activity: 200
Merit: 101
January 08, 2018, 11:30:48 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.





Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?

You're right, no internet. I have an IP from the router but cannot do anything. I even attempted to assign a manual IP and no go. What else should I check?

Upgrade to v0019-2.0, it has better internet check to prevent miner restart and rig start
And soon pool watch will be added too

Thanks but I need internet to run the update. I mounted a second usb but without internet I cannot run the update.

Restart your router and rig. That usually restores internet connection.
newbie
Activity: 7
Merit: 0
January 08, 2018, 11:13:11 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.

http://prntscr.com/hxlf0w

http://prntscr.com/hxlfjm

Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?

You're right, no internet. I have an IP from the router but cannot do anything. I even attempted to assign a manual IP and no go. What else should I check?

Upgrade to v0019-2.0, it has better internet check to prevent miner restart and rig start
And soon pool watch will be added too

Thanks but I need internet to run the update. I mounted a second usb but without internet I cannot run the update.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
January 08, 2018, 10:58:07 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.





Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?

You're right, no internet. I have an IP from the router but cannot do anything. I even attempted to assign a manual IP and no go. What else should I check?

Upgrade to v0019-2.0, it has better internet check to prevent miner restart and rig start
And soon pool watch will be added too
newbie
Activity: 16
Merit: 0
January 08, 2018, 10:51:02 AM
What version of ethminer does nvOC have in v0019? I am not running nvOC yet just evaluating my options for now.
newbie
Activity: 7
Merit: 0
January 08, 2018, 10:46:28 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.

http://prntscr.com/hxlf0w

http://prntscr.com/hxlfjm

Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?

You're right, no internet. I have an IP from the router but cannot do anything. I even attempted to assign a manual IP and no go. What else should I check?
member
Activity: 126
Merit: 10
January 08, 2018, 09:42:15 AM
Hello,
I built a very crude SRR circuit for a Raspberry Pi to remotely reset a rig or switch it on/off (the Raspberry Pi basically controls an optocoupler connected to the reset and power-on pins of the mobo).
For the moment the Raspberry Pi just detects whether port 22 is open on the rig. If not, it assumes that the rig is not reachable by SSH and resets it.

I see that there are SRR scripts in nvOC that look much more sophisticated than what I am doing. Unfortunately there is no mention of the watchdog software to run on the Raspberry Pi.
Can anyone give me a pointer to the software to run on the Raspberry Pi?

Cheers


I have some code to share with you, but the site is blocking my input.
I am trying to contact an admin about it.
Thanks :-) Maybe you can post a link outside this forum

Ok, have a look at this and see if can help:
https://bitcointalksearch.org/topic/m.22615430

I don't use Nvoc's temp control.

Here is the latest version of the script (I removed the reset bit and added a var to enable/disable Telegram alerts. I have not tested the Telegram bit, so be aware):
https://pastebin.com/mcqmt9CF

It is possible to add code to nvOC's temp control (or wdog) and reset/power off/on the rig from a Raspberry Pi.
I would just rather have a well configured rig that runs stably than rely on wdog.
I do not trust wdog, not because of the coders/scripts but because there are so many GPUs and drivers (from time to time they change the output of error codes, etc).
Hi kk003,

This is awesome!! I will modify the code according to my setup.
As far as I can see from a quick look at the code, the script assumes a rig is frozen if it doesn't respond to pings. Is that correct?
In the past I had rigs that were clearly frozen but were still responding to pings. That is why, instead of pinging, I check whether the SSH port is open on the rig using netcat:

Code:
nc -z -v -w5 $rigIP 22

(w5 is a timeout of 5 seconds)
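A single failed port check can be a false alarm (a dropped packet, or the rig busy for a moment), so one option is to reset only after several checks in a row fail. The sketch below wraps the nc test in that logic. The function names, the retry count and the delay are assumptions for illustration and are not taken from the nvOC SRR scripts:

```shell
# rig_ssh_up HOST: succeed if something accepts connections on HOST port 22.
rig_ssh_up() {
    nc -z -w5 "$1" 22 >/dev/null 2>&1
}

# should_reset TRIES DELAY CMD...: return 0 (meaning "reset the rig") only if
# CMD fails TRIES times in a row, waiting DELAY seconds between attempts.
should_reset() {
    tries=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@"; then
            return 1            # one success is enough: the rig is alive
        fi
        i=$((i + 1))
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 0
}

# Example: three failed checks one minute apart before acting.
#   if should_reset 3 60 rig_ssh_up "$rigIP"; then
#       echo "rig $rigIP looks frozen"   # trigger the reset pin here
#   fi
```

Keeping the check command as an argument means the same loop can be reused with ping or any other probe.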

Thanks again :-)

Since we are talking about the RPi ...
How many rigs can be monitored and rebooted/reset/hard rebooted with one RPi?
What hardware is needed for multi-rig control?

I'm not much of a hardware guy, but I'd love to make a rig controller with an RPi if it's not too hard.
Hi Papampi,
A Raspberry Pi 2B or 3B has up to 26 programmable digital pins (the original Raspberry Pi B has only 17).
You could connect each digital output to an optocoupler or a relay and control the power switch and/or the reset button of the motherboard.
Suppose you want to control both the power switch and the reset of each mobo: that means you could monitor up to 13 mobos (26 if you only want to control the reset switch).

You can also connect digital temperature sensors to the pins (programming them in input mode).
You can find pretty inexpensive ready made relay boards with up to 16 relays.

In practice the hardware needed for controlling 12 mobos would be:
1) 16 relay module. There are many on Amazon, for example https://www.amazon.com/SainSmart-101-70-103-16-Channel-Relay-Module/dp/B0057OC66U
2) 8 relay module.
3) Usb power supply
4) Optional external HDMI screen, unless you are running the RPi headless through SSH
5) Of course, a raspberry Pi
6) Plenty of cables with female Dupont connectors

The standard OS for the Raspberry Pi is Debian (Raspbian), but if you prefer you can also run Ubuntu or Windows 10 IoT.
The advantage of Raspbian is that it comes with all the utilities for controlling the GPIO pins directly, either from Bash or from a high level language like C++ or Python.
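As a rough idea of what driving a pin from the shell looks like, here is a sketch using the kernel's legacy sysfs GPIO interface under /sys/class/gpio. The pin number (17), the half-second press and the assumption that driving the pin high closes the relay or optocoupler are all placeholders; check your board, since some relay modules are active-low. The function name pulse_pin is made up for illustration:

```shell
# Root of the sysfs GPIO tree; overridable so the function can be dry-run
# against an ordinary directory.
GPIO_ROOT=${GPIO_ROOT:-/sys/class/gpio}

# pulse_pin PIN SECONDS: drive PIN high for SECONDS, then low again,
# imitating a short press of the reset or power button.
pulse_pin() {
    pin=$1
    secs=$2
    dir="$GPIO_ROOT/gpio$pin"
    # Ask the kernel to expose the pin if it is not exported yet.
    if [ ! -d "$dir" ]; then
        echo "$pin" > "$GPIO_ROOT/export"
    fi
    echo out > "$dir/direction"
    echo 1 > "$dir/value"
    sleep "$secs"
    echo 0 > "$dir/value"
}

# Example, run as root on the Pi:
#   pulse_pin 17 0.5
```

With one pin per button, a reset is a short pulse on the reset pin, and a forced power-off is a longer pulse (several seconds) on the power pin.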


Awesome, Thanks a lot for the info
I have 3 RPi 2Bs that I don't use any more, because when the 3B came out I replaced them for faster Kodi.
So I'm going to get a relay module and start playing with it. For sure I'm going to ask for more help with programming and connecting the relay to the board and ...

Thanks a lot mate
Keep calm and carry on mining ;-)
No problems ;-)
Just one thing: do not buy the board that I linked in my previous message. I just realised that its inputs are for 5V signals, while what you need is a board compatible with 3.3V signals. Look for a board that is specifically designed for the RPi.

I have 2 16-Channel Relay Modules with 5v, they work OK. One of them is 5v powered and another 12v powered, and I had to add 12v power supply for the second relay, so now the Rpi3 is powered by the second relay and doesn't need 3) Usb power supply and the first relay is powered by Rpi3.
Hi wi$em@n,
If we are going to use the RPi3 just to control the reset and the power switch, a relay is probably overkill. You can achieve the same result with a simple optocoupler, with the advantage that the current draw is minimal. In other words, with an optocoupler board you won't need an external power supply.
I am building the board right now. As soon as it is done and tested I will publish schematics and software :-)
member
Activity: 224
Merit: 13
January 08, 2018, 09:17:04 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.





Did something happen to my video cards or PC?

Your thoughts?

Thanks

This appears to be a DNS or internet access issue given the errors. Can you ping your pool?
newbie
Activity: 7
Merit: 0
January 08, 2018, 08:59:10 AM
Hello All, I am running the latest version of nvoc and it was working perfectly with NICE_EQUIHASH. I have 2 Nvidia 980 ti cards and was averaging about $12 per day. Suddenly the machine rebooted and now the miner will not start. I am getting an error about GPU Utilization and unable to resolve host.

http://prntscr.com/hxlf0w

http://prntscr.com/hxlfjm

Did something happen to my video cards or PC?

Your thoughts?

Thanks
member
Activity: 224
Merit: 13
January 08, 2018, 08:11:30 AM
Guys, can you make a version of nvOC with Nvidia driver 384.90, because this is the last version of driver to work correctly with power limit for 1050ti?

You can just apply the latest updates yourself. This command should patch your Ubuntu installation:

Code:
sudo -- sh -c 'apt-get update; apt-get upgrade -y; apt-get dist-upgrade -y; apt-get autoremove -y; apt-get autoclean -y'

Edit: If you do this, be sure to shut down the nvOC processes (watchdog in particular) properly, or else you run the risk of having the machine reboot in the middle of applying patches. For nvOC 19-2.0, the command to stop everything is:

Code:
./nvOC stop

Thanks.

This is fine for upgrading the version, but if I understand it right, it looks like he wants to downgrade.

Warning: make a backup first. If something goes wrong you could be unable to boot.

Code:
./nvOC stop
sudo apt-get install nvidia-384

Then reboot





Oh. My bad. I had assumed upgrade because no driver that I had heard of works properly with the 1050:

https://devtalk.nvidia.com/default/topic/1024744/linux/nvidia-387-12-breaks-power-reading-in-nvidia-smi-/post/5213384/#5213384

So, for power draw on the 1050, you either get errant numbers with old versions or no numbers with the newer versions. The downgrade to a previous version is done via locally attached display using these instructions from leenoox:

The easiest way is LOCAL:
1. I suggest closing all apps, ctrl-c on wdog, temp, miner
2. click on the 4th icon on the left-side panel "Additional Drivers"
3. in the new window click on "Additional Drivers" tab
4. click on the "Using NVIDIA binary driver - version 384.90 from..."
5. click on "Apply Changes" button
6. enter root password: miner1
7. reboot
newbie
Activity: 14
Merit: 0
January 08, 2018, 07:18:21 AM
Guys, can you make a version of nvOC with Nvidia driver 384.90, because this is the last version of driver to work correctly with power limit for 1050ti?

You can just apply the latest updates yourself. This command should patch your Ubuntu installation:

Code:
sudo -- sh -c 'apt-get update; apt-get upgrade -y; apt-get dist-upgrade -y; apt-get autoremove -y; apt-get autoclean -y'

Edit: If you do this, be sure to shut down the nvOC processes (watchdog in particular) properly, or else you run the risk of having the machine reboot in the middle of applying patches. For nvOC 19-2.0, the command to stop everything is:

Code:
./nvOC stop

Thanks.

This is fine for upgrading the version, but if I understand it right, it looks like he wants to downgrade.

Warning: make a backup first. If something goes wrong you could be unable to boot.

Code:
./nvOC stop
sudo apt-get install nvidia-384

Then reboot



member
Activity: 224
Merit: 13
January 08, 2018, 05:48:05 AM
Guys, can you make a version of nvOC with Nvidia driver 384.90, because this is the last version of driver to work correctly with power limit for 1050ti?

You can just apply the latest updates yourself. This command should patch your Ubuntu installation:

Code:
sudo -- sh -c 'apt-get update; apt-get upgrade -y; apt-get dist-upgrade -y; apt-get autoremove -y; apt-get autoclean -y'

Edit: If you do this, be sure to shut down the nvOC processes (watchdog in particular) properly, or else you run the risk of having the machine reboot in the middle of applying patches. For nvOC 19-2.0, the command to stop everything is:

Code:
./nvOC stop

Thanks.
newbie
Activity: 4
Merit: 0
January 08, 2018, 05:36:22 AM
Is there a simple command to update xorg? I think this is the solution to my driver/temp control problems.

this will generate one that enables most features


Code:
sudo nvidia-xconfig --enable-all-gpus --cool-bits=31
for normal gpus


Code:
sudo nvidia-xconfig --enable-all-gpus --cool-bits=12
for P106_100s


My guess is your problem is related to the P106_100s and normal GPUs working best with different coolbits settings.  Hard to know without one of those to test with.



hey just wondering if anyone can help? A puzzle for someone....  
I have 12 x 1080s and have always had an issue with the auto temp/fan speed for the 12th card (GPU 11). Since I updated nvOC in December to nvOC-19-2-update (by_fullzero_unofficial), it has mined everything with no issues apart from the one I mentioned.

Then, at the end of December, it suddenly started crashing constantly on Equihash with both ZM and EWBF. The error was always that GPU 11 was lost. After some digging I found that in the Nvidia X Server Settings for GPU 11 the box to tick for the fan speed was not there.

After some more digging I found that in the /etc/X11/xorg.conf file GPU 11 was not set correctly. It is in PCI slot 18, and the Section "Device" entries only went up to PCI-17.

I did as in the quote (sudo nvidia-xconfig --enable-all-gpus --cool-bits=31) and that didn't work either (so I had to restore the original), but it did give me the correct PCI locations of all the GPU devices, so I have updated xorg.conf with the relevant PCI settings.

Now the auto temp/fan speed errors have gone and the tick box appears in Nvidia X Server Settings for GPU 11.

But it still crashes on Equihash. It says:
"Unable to determine the device handle for GPU 0000:1200.0 GPU is lost."
I updated nvOC on the 4th of January to nvOC v0019-2.0 - community release. Any suggestions, anyone?

Also, why does my /etc/X11/xorg.conf file say I'm using 1050s when I am using 1080s? Does it even matter?

I do not know if you have updated from version 0019-1.4 to version 19-2.0 with papampi scripts.

If so, could you try the latest papampi image from the links we just posted and apply the latest updates from papampi?

Of course, don't forget to save your current system to another medium first, for example with HDDRawCopy or dd.

I had problems with version 19-1.4 too. Not of this type, but it has been a month since I installed 19-2.0 and repaired the disk by adding swap. I know it has nothing to do with the problem you are describing.

I am using https://github.com/papampi/nvOC_by_fullzero_Community_Release

I would do the reimaging, but my Linux command knowledge is limited and I am a bit unsure how to go about it. If anyone could take me through it, I would greatly appreciate it :-)

newbie
Activity: 46
Merit: 0
January 08, 2018, 03:37:03 AM
Guys, can you make a version of nvOC with Nvidia driver 384.90, because this is the last version of driver to work correctly with power limit for 1050ti?
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
January 08, 2018, 03:04:15 AM
please add --submit-stale param for zcoin, without it my 19mh/s rig has only 12mh/s in miningpoolhub.

You can add any optional arguments in v0019-2.0 to ewbf and dstm
Code:
EWBF_OPTS=""
ZM_OPTS=""