Topic: [ANN] dstm's ZCash / Equihash Nvidia Miner v0.6.2 (Linux / Windows) - page 156. (Read 224961 times)

full member
Activity: 350
Merit: 126
Yes, there are network fixes/improvements.
If you still have problems, just post them here. I'll fix them pretty quickly as soon as I'm able to reproduce them.

Btw. I don't use colors in my output :)

I would like colors for temp ranges, and I don't know what the "+" to the right means... or the ">" signs. Oh well.

I have noticed that when it shows a disconnection, it starts getting lots of TCP errors. The weird part is that only the computers running the ZM miner show this behaviour, so I am sure this is not a network issue.

Quote
I would like colors for temp ranges, and I don't know what the "+" to the right means... or the ">" signs. Oh well.
Colored output has some drawbacks. People might have different terminal backgrounds, which makes colored output hard to read. Also, if you redirect the output to a file, the colors would make the file unreadable, and you'll get issues if you want to parse the output.
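
To illustrate the trade-off, here is a minimal Python sketch (not zm's code, since zm prints no colors). It assumes a hypothetical `colorize_temp` helper that adds ANSI colors to a temperature reading only when asked, a `should_use_color` check that enables colors only on a real terminal, and a `strip_ansi` helper for cleaning output that was redirected to a file. The 70/80 C thresholds are made up for illustration.

```python
import re
import sys

# Standard ANSI SGR color codes understood by most terminals.
GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"

# Matches SGR sequences such as "\033[31m" so they can be removed before parsing.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def should_use_color(stream=sys.stdout):
    """Use colors only when the stream is a terminal, not a file or pipe."""
    return hasattr(stream, "isatty") and stream.isatty()


def colorize_temp(temp_c, use_color):
    """Format a temperature; the thresholds are hypothetical, not from zm."""
    text = f"{temp_c}C"
    if not use_color:
        return text
    if temp_c >= 80:
        color = RED
    elif temp_c >= 70:
        color = YELLOW
    else:
        color = GREEN
    return f"{color}{text}{RESET}"


def strip_ansi(line):
    """Remove color codes so a logged line can be parsed as plain text."""
    return ANSI_RE.sub("", line)
```

A caller would print `colorize_temp(temp, should_use_color())`, so redirected output stays plain. A log that already contains color codes can be cleaned with `strip_ansi` before parsing.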

The "+"/">" etc. are explained in the OP; however, I still have to write documentation ofc, when there is time for it.

Have you tried the 0.5_nat version I PM'd you? Does it have the same issues?
newbie
Activity: 12
Merit: 0
Yes, there are network fixes/improvements.
If you still have problems, just post them here. I'll fix them pretty quickly as soon as I'm able to reproduce them.

Btw. I don't use colors in my output :)

I would like colors for temp ranges, and I don't know what the "+" to the right means... or the ">" signs. Oh well.

I have noticed that when it shows a disconnection, it starts getting lots of TCP errors. The weird part is that only the computers running the ZM miner show this behaviour, so I am sure this is not a network issue.

https://image.prntscr.com/image/seXWVVuuRP6t36sCm3INWw.png
https://prnt.sc/gsy4x3
newbie
Activity: 54
Merit: 0
I used to get an error sometimes.

Something like "Connection timed out" or "unable to connect to server" in red.

This would stop my miner.

This was version 0.3 or something. Is this potentially fixed in 0.5?

Yes, there are network fixes/improvements.
If you still have problems, just post them here. I'll fix them pretty quickly as soon as I'm able to reproduce them.

Btw. I don't use colors in my output :)

Hehe, it was a few days ago that I sampled your miner.
Loved the result, except ofc the network error.

Will try running them again tomorrow.

I'll keep you posted!
full member
Activity: 350
Merit: 126
I used to get an error sometimes.

Something like "Connection timed out" or "unable to connect to server" in red.

This would stop my miner.

This was version 0.3 or something. Is this potentially fixed in 0.5?

Yes, there are network fixes/improvements.
If you still have problems, just post them here. I'll fix them pretty quickly as soon as I'm able to reproduce them.

Btw. I don't use colors in my output :)
newbie
Activity: 54
Merit: 0
I used to get an error sometimes.

Something like "Connection timed out" or "unable to connect to server" in red.

This would stop my miner.

This was version 0.3 or something. Is this potentially fixed in 0.5?
full member
Activity: 350
Merit: 126
I think an intensity option would be useful for people who want their desktop to remain responsive while mining for example.

Something like an option to throttle the amount of work that is sent to the GPU? If I understand you correctly.
Does it even make sense to use your desktop while mining? Not sure if I understand the use case for this option.

Some people mine on their main PC while doing other things, so that would be useful. If it's not too troublesome to implement, I'd highly appreciate it if you could add intensity options.

I see, will do.
So you must be able to set intensity per GPU, since you might have multiple GPUs on your main PC and you want to throttle only the GPU that's used for rendering, right?
newbie
Activity: 54
Merit: 0
I think an intensity option would be useful for people who want their desktop to remain responsive while mining for example.

Something like an option to throttle the amount of work that is sent to the GPU? If I understand you correctly.
Does it even make sense to use your desktop while mining? Not sure if I understand the use case for this option.

Some people mine on their main PC while doing other things, so that would be useful. If it's not too troublesome to implement, I'd highly appreciate it if you could add intensity options.
hero member
Activity: 630
Merit: 502
I think an intensity option would be useful for people who want their desktop to remain responsive while mining for example.

Something like an option to throttle the amount of work that is sent to the GPU? If I understand you correctly.
Does it even make sense to use your desktop while mining? Not sure if I understand the use case for this option.
I'm sure it will be helpful for some people.
full member
Activity: 350
Merit: 126
I think an intensity option would be useful for people who want their desktop to remain responsive while mining for example.

Something like an option to throttle the amount of work that is sent to the GPU? If I understand you correctly.
Does it even make sense to use your desktop while mining? Not sure if I understand the use case for this option.
hero member
Activity: 630
Merit: 502
I think an intensity option would be useful for people who want their desktop to remain responsive while mining for example.
newbie
Activity: 12
Merit: 0
Quote
Are your systems on wifi? If yes, you could try using a cable if possible.
You might test running zm on fewer GPUs using the '--dev' option so you'll have fewer connections per system.

Systems are wired.

Quote
Not sure what you mean by 'build up of connections'. It opens one socket per GPU, so there should be no 'build up of connections'.
Imagine a memory leak, but made with open sockets. That is what this feels like. It could be or not; I am just trying to explain it somehow. Like, if you have a disconnect, do you close the socket or does garbage collection close it? Do you reuse the sockets? What if the sockets are left alive and for some reason my installation doesn't do garbage collection until the network crashes? Again, this is pure guesswork. This is a shot in the dark and may not even be possible. Just saying what it "feels" like.
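
One way to check this guess from the outside, on Linux, is to count how many socket file descriptors the miner process holds over time. A minimal Python sketch, assuming a Linux `/proc` filesystem and that you know the miner's PID. The helper names are mine and not part of zm, and the script has to run as the same user as the miner (or as root) to read its fd directory.

```python
import os
import time


def count_socket_links(link_targets):
    """Count fd link targets that refer to sockets (they look like 'socket:[12345]')."""
    return sum(1 for target in link_targets if target.startswith("socket:["))


def open_socket_count(pid):
    """Number of open sockets of a process, read from /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
    return count_socket_links(targets)


def watch(pid, interval_s=60, samples=90):
    """Print a timestamped socket count every `interval_s` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), open_socket_count(pid))
        time.sleep(interval_s)
```

With one socket per GPU, the number printed by `watch(pid)` should stay roughly flat. If it climbs steadily until the disconnect, that would support the leak theory. If it stays flat, the cause is probably somewhere else.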

Quote
I have a 10GPU system running stable, no disconnects, on flypool. I'm pretty sure they have no issues handling many more connections. If we run out of ideas I'll write a test for this :)
Seems weird. I imaged the system as Ubuntu 17.04. I can put the image online if you want to try it, or I can get you access to one of the rigs if this proves too troublesome.

Quote
Complete output till it disconnects (pls include also the 'connection closed by server' message - if you get it).
I'll run it with '--dev'. Give me a couple of hours to let it run and I'll send you those.

Quote
Edit:
I'm pretty sure there should be no issues with the number of connections on the pool side. So let's say you have 1000 rigs mining the same ZEC address, the pool must be able to handle all 1000 connections mining the same ZEC address.
Yeah, I am pretty sure this is something that should not happen, this is weird.
full member
Activity: 350
Merit: 126
Thank you DSTM for your software.

Last week I used it with 2500 GPUs, but I had a lot of problems. 10% of the GPUs crash every 30-60 minutes. I really want to work with you to solve that problem. We plan to switch more GPUs over, but we need more stability, and after that we will have special requests for the JSON API.

Questions:

1- What is the ideal Nvidia driver version? We use the latest: 384.90
2- What is the best Ubuntu version? We use Ubuntu 14.04
3- Which log do I need to follow to try to find the problem?


When a GPU fails, could you recover it? I see with nvidia-smi that the GPU is still working. I think EWBF recovers failed GPUs.


Quote
Last week I used it with 2500 GPUs, but I had a lot of problems. 10% of the GPUs crash every 30-60 minutes. I really want to work with you to solve that problem. We plan to switch more GPUs over, but we need more stability, and after that we will have special requests for the JSON API.
I had no reports about crashes. Development is pretty fast paced currently, so there could be bugs ofc. I think the fastest/easiest way to resolve the issues is to have ssh access to one of your systems that crashes.

JSON API: this is pretty easy to extend. Currently I have only a basic set; it was meant for testing.

Quote
1- What is the ideal Nvidia driver version? We use the latest: 384.90
2- What is the best Ubuntu version? We use Ubuntu 14.04
3- Which log do I need to follow to try to find the problem?

1. I've tested zm on 375.66 and 384.90; both perform equally, without issues.
2. I've tested zm on 16.04, so I can't make a robust statement about 14.04.
3. It would be much faster/easier if you could provide ssh access to one of your systems.

Quote
When a GPU fails, could you recover it? I see with nvidia-smi that the GPU is still working. I think EWBF recovers failed GPUs.
ZM is designed such that every GPU is separated and independent, so yes, in some cases a crashed GPU is recoverable (not always!). This is currently not implemented, but it's planned.
newbie
Activity: 1
Merit: 0
Thank you DSTM for your software.

Last week I used it with 2500 GPUs, but I had a lot of problems. 10% of the GPUs crash every 30-60 minutes. I really want to work with you to solve that problem. We plan to switch more GPUs over, but we need more stability, and after that we will have special requests for the JSON API.

Questions:

1- What is the ideal Nvidia driver version? We use the latest: 384.90
2- What is the best Ubuntu version? We use Ubuntu 14.04
3- Which log do I need to follow to try to find the problem?


When a GPU fails, could you recover it? I see with nvidia-smi that the GPU is still working. I think EWBF recovers failed GPUs.
full member
Activity: 350
Merit: 126
Quote
I tested this by running zm on only one GPU miner while the others were running other miners, and only the box with zm disconnected.

We'll get it :) The basics first, I hope it doesn't cost too much time :)
Are your systems on wifi? If yes, you could try using a cable if possible.
You might test running zm on fewer GPUs using the '--dev' option so you'll have fewer connections per system.

Quote
It seems a lot like a build up of connections not being cleaned up until the network crashes.
Not sure what you mean by 'build up of connections'. It opens one socket per GPU, so there should be no 'build up of connections'.

Quote
I agree, but all the connections are closed. And most connections are to servers that are meant for multiple connections (such as httpd).
I have a 10GPU system running stable, no disconnects, on flypool. I'm pretty sure they have no issues handling many more connections. If we run out of ideas I'll write a test for this :)

Quote
Sure, what logs do you need?
Complete output till it disconnects (pls include also the 'connection closed by server' message - if you get it).


Edit:
I'm pretty sure there should be no issues with the number of connections on the pool side. So let's say you have 1000 rigs mining the same ZEC address, the pool must be able to handle all 1000 connections mining the same ZEC address.
newbie
Activity: 12
Merit: 0
Quote
What zm version is this?
0.4.5 and 0.5

Quote
Disconnects are very regular.
Yes, they appear regular; however, disconnects only happen when zm is running. Otherwise the computers do not disconnect. I tested this by running zm on only one GPU miner while the others were running other miners, and only the box with zm disconnected.

Quote
Is there anything that happens every hour on your host/network?
Not that I am aware of.

Quote
It's very strange that it happens exactly after one hour but works fine for one hour.
It's approximately one hour, and it starts counting at the time zm starts running. I don't see any memory leaks. It seems a lot like a build up of connections not being cleaned up until the network crashes.

Quote
Other users don't seem to have this issue.
This is puzzling, sure. I do have 3 boxes with the same behavior. I'll go ahead and see if I can reproduce this behavior on a freshly installed box.

Quote
24 simultaneous connections is nothing special; browsers open up to 8/16 simultaneous connections per website/domain.
I agree, but all the connections are closed. And most connections are to servers that are meant for multiple connections (such as httpd).

Quote
I'll check ofc if there is something wrong in my code.
Hard to check if we can't pinpoint where to look.

Quote
Could you pls. pm me some logfiles?
Sure, what logs do you need?

Quote
Btw. zm looks faster on pool side Smiley
It most definitely is.
full member
Activity: 350
Merit: 126
Thx. If there are any regressions in new versions, pls report them as soon as possible; it's welcome and speeds up development.

Regular disconnects are bad ofc; I'm not sure why you're getting them. It's most likely not an issue on the pool side, since flypool works fine, without disconnects, for other users.

All GPUs are separated/independent; this design has a lot of benefits. So for example, if you have different GPUs with different speeds in one system, each GPU will receive its own job and its own difficulty according to its speed. Currently I'm not taking full advantage of this design, but it's planned to restart each GPU separately on connection, hardware, etc. failures.

If you have the same issues on 0.5, I'll think about a way to debug this issue; just report it.

Edit: Disconnects happen very regularly, after the same amount of time; that's very strange.

I switched miners to ewbf to test, and the graph I got showed no more disconnections. This has to be some sort of throttling. I have 24 cards, split into equal groups, so to the pool it must seem like I am making 24 separate connections at the exact same time.

You can see the difference between zm disconnecting and ewbf staying normalized. Something is amiss, but I don't know exactly what it is; nothing changed, only the miner program.

What zm version is this?

Disconnects are very regular. Is there anything that happens every hour on your host/network? It's very strange that it happens exactly after one hour but works fine for one hour. Other users don't seem to have this issue.
24 simultaneous connections is nothing special; browsers open up to 8/16 simultaneous connections per website/domain.
I'll check ofc if there is something wrong in my code.
Could you pls. pm me some logfiles?

Btw. zm looks faster on pool side :)
newbie
Activity: 12
Merit: 0
Thx. If there are any regressions in new versions, pls report them as soon as possible; it's welcome and speeds up development.

Regular disconnects are bad ofc; I'm not sure why you're getting them. It's most likely not an issue on the pool side, since flypool works fine, without disconnects, for other users.

All GPUs are separated/independent; this design has a lot of benefits. So for example, if you have different GPUs with different speeds in one system, each GPU will receive its own job and its own difficulty according to its speed. Currently I'm not taking full advantage of this design, but it's planned to restart each GPU separately on connection, hardware, etc. failures.

If you have the same issues on 0.5, I'll think about a way to debug this issue; just report it.

Edit: Disconnects happen very regularly, after the same amount of time; that's very strange.

I switched miners to ewbf to test, and the graph I got showed no more disconnections. This has to be some sort of throttling. I have 24 cards, split into equal groups, so to the pool it must seem like I am making 24 separate connections at the exact same time.

You can see the difference between zm disconnecting and ewbf staying normalized. Something is amiss, but I don't know exactly what it is; nothing changed, only the miner program.
https://image.prntscr.com/image/SkRIFTAmQOWRi5WTz_2ang.png
full member
Activity: 350
Merit: 126
New Version 0.5

con: support set_extranonce rpc
con: improve handling of temporary slow network conditions
con: add monitoring support using web browser
con: add monitoring support using json-rpc
mp: rebalance queue sizes - this improves the solution rate as
seen by the pool, especially on pools that submit new jobs often


This is a testing version; it has a lot of internal changes and is less tested. Feedback on stability and performance is welcome. Rebalanced queue sizes improve the solution rate on the pool side; however, this might reduce performance in situations of heavy CPU load - pls check if there is any improvement on the pool side for you. Telemetry is pretty simple currently; if there is anything more you need, suggestions are welcome.

New version works fine and is stable. API works fine as well.
- Could you please add current values in JSON as well, and not only average values? (I put all values into grafana for monitoring and statistics.)
- Lower sols/s on poolside is fixed. Locally I have ~2644 and currently (after 24 hours) up to 2600 on flypool (the gap is even less than the devfee of 2%). So average Sols/s on poolside is now perfect. Very well done.
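
To spell out that comparison, here is a rough Python check of how far the pool-side rate falls short of the local rate, using only the figures from this post (~2644 Sol/s locally, 2600 Sol/s on flypool, 2% devfee). The function and variable names are just for illustration.

```python
def shortfall_percent(local_rate, pool_rate):
    """Percentage by which the pool-side rate is below the locally reported rate."""
    return (local_rate - pool_rate) / local_rate * 100.0


local_sols = 2644.0   # locally reported Sol/s (from the post)
pool_sols = 2600.0    # flypool average after 24 hours (from the post)
devfee_percent = 2.0  # devfee mentioned in the post

gap = shortfall_percent(local_sols, pool_sols)
print(f"pool-side shortfall: {gap:.2f}%")
print("within devfee" if gap <= devfee_percent else "larger than devfee")
```

The gap works out to 44 / 2644, roughly 1.66%, so the pool-side rate is indeed within the 2% devfee.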




Quote
                                                                                                                       
Could you please add current values in JSON as well, and not only average values? (I put all values into grafana for monitoring and statistics.)
Ofc, will add them.
I was also thinking about adding plots of current values to the web output; however, this seemed useless to me since they are noisy / jumping randomly. Average values are the things that really matter. So I kept it simple, which has advantages if you display things on mobile devices.
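
A small Python sketch of why averages are easier to read than raw samples: a fixed-window moving average over noisy per-interval Sol/s readings. The window size and the sample values are invented for illustration, and this is not necessarily how zm computes its averages.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window` samples."""

    def __init__(self, window):
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record a new sample and return the current average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


# Hypothetical noisy "current" readings in Sol/s.
current_values = [430, 470, 445, 455, 420, 480]
avg = MovingAverage(window=3)
smoothed = [avg.add(v) for v in current_values]
print(smoothed)
```

A monitoring tool such as grafana can apply the same kind of smoothing on its side, which is one reason exporting the raw current values can still be useful.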

Quote
                                                                                                                       
Lower sols/s on poolside is fixed. Locally I have ~2644 and currently (after 24 hours) up to 2600 on flypool (the gap is even less than the devfee of 2%). So average Sols/s on poolside is now perfect. Very well done.
Nice. As I said, my numbers are exact, even slightly rounded down :)
full member
Activity: 350
Merit: 126
I keep getting disconnects every hour or so (as you can see in the graphs), and that started when I upgraded to 0.4.5.

I am now trying 0.5




This is really bad for me since the average with this is a lot lower than it should be.
Is there a way to make just one connection instead of one connection per card?


Thx. If there are any regressions in new versions, pls report them as soon as possible; it's welcome and speeds up development.

Regular disconnects are bad ofc; I'm not sure why you're getting them. It's most likely not an issue on the pool side, since flypool works fine, without disconnects, for other users.

All GPUs are separated/independent; this design has a lot of benefits. So for example, if you have different GPUs with different speeds in one system, each GPU will receive its own job and its own difficulty according to its speed. Currently I'm not taking full advantage of this design, but it's planned to restart each GPU separately on connection, hardware, etc. failures.

If you have the same issues on 0.5, I'll think about a way to debug this issue; just report it.

Edit: Disconnects happen very regularly, after the same amount of time; that's very strange.
newbie
Activity: 12
Merit: 0
I keep getting disconnects every hour or so (as you can see in the graphs), and that started when I upgraded to 0.4.5.

I am now trying 0.5

https://image.prntscr.com/image/4YGPv_YCTyiD0A7l2SfRHg.png
https://image.prntscr.com/image/limqbGRARyyb2riP2Z68RA.png

This is really bad for me since the average with this is a lot lower than it should be.
Is there a way to make just one connection instead of one connection per card?

Edit: changed image links