
Topic: [ANN] ccminer 2.3 - opensource - GPL (tpruvot) - page 12. (Read 500257 times)

legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx

Maybe the price of adding the extra mb/cpu/ram/ssd needs to be spelled out, that's like an extra $400-500. Fit as many GPU's as you can for the farms.

Perhaps in terms of initial costs but maybe not in terms of availability management (maintenance costs).

If you split your single 19 gpu rig into multiple smaller rigs (let's say 3 rigs), you will probably remove many annoying factors. For example, you will probably remove PSU coupling, ease thermal dissipation, etc.
You will probably have less stress with complete downtime; 2 rigs still working while one is on fault/repair.

As stated by crysx, the only valuable reason to build huge rigs is "limited" space.


Space and power availability.

Two of the main reasons people go this route.

We are 'prototyping' a number of these motherboards filled with these cards and we are looking at the maintenance side very closely. As you mentioned, the bigger you grow in the singular unit, the more headaches you open yourself up to. However, the flipside of that also, is that once you have a tried and tested system in place (which we now do), the maintenance time is reduced to such a small factor, that it is more of an issue to go the smaller GPU/More MotherBoard way. We are finding that out literally as we chat about it.

Hence why we have also purchased another 7 x 19GPU motherboards, and settled on our 'standard' way of building them, which makes them a lot easier to maintain. One only needs a system in place and a method of maintenance, and it becomes easy. When all these 7 MotherBoards are populated, we will let you know what 'other' headaches may arise. So far - the first two are not only doing well, but running like wildfire. BTW - only 16 GPU's are in each due to the nVidia driver limitation. Duplicate the Linux install and you are running in an hour, including the physical build.

The first build/prototype took a full weekend. So a little over an hour from unboxing to full build and mining is a massive achievement in comparison to a full weekend.

#crysx
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx

Maybe the price of adding the extra mb/cpu/ram/ssd needs to be spelled out, that's like an extra $400-500. Fit as many GPU's as you can for the farms.

Not worth it...I've killed two motherboards trying to get past six graphics cards on normal motherboards...I think if you want to get 12 gpu's you should just buy a special motherboard with 12 pci express slots. If you want to save power buy more powerful graphics cards if you can get them. I can buy surplus hard drives for $10 and I can buy a ton of amd cpus and motherboards for like $150 or less...the more crap you add to a system the more likely it's going to fail by crashing consistently. also, the more power you draw on any circuit will reduce the performance at some point. you definitely don't want to put 8 1080 tis on one circuit unless you have a special electrical output setup by an electrician that can handle that situation. I've even been hooking up power supplies on different circuits when I'm daisy chaining them for added amperage and power stability. just my two cents...

Well,

We have built HUNDREDS of systems, and the experience tells us that this smaller GPU to MotherBoard ratio is more of an 'issue' than not.

So for us who have more than 7 or 25 or 70 GPU's to place, density plays a big part - unfortunately. Agreed there is slightly more headache than normal, but that is with ANYTHING you put your hand to when it comes to building something out of the norm. TWO GPU's were a massive thing when I first started mining, and 4 GPU's was a monster machine. Now look at the 'minimum' spec you will need to even start mining well.

So it is worth it for us to attack the density factor with gusto, and (in our case) develop a MUCH better cooling solution that magnifies the mining effort by almost 3 times.

So your two cents are worth much more than that and I would agree in that respect also.

I agree also with the power - which is also why we have designed (and have our first prototype of) the CWI-Power system, fully functional, which we hope to make available for those that WANT to mine in a much more extensive fashion. Circuits in most houses are limited to a small amperage, so our newest design does away with the singular circuits and raises the bar for spreading the circuits within ONE system. We are still improving the MK I model, and MK II will be much more dense again and able to cater for small farms easily.

BTW - ALL our cards are a mix of Aorus 1080TI Xtreme WindForce and recently Aorus 1080TI Xtreme WaterForce systems, so they do draw a lot of power depending on what Algo is mined, hence the need for density, power availability, and soon we will be working on efficiency, though that is not the priority currently.

#crysx
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx

Maybe the price of adding the extra mb/cpu/ram/ssd needs to be spelled out, that's like an extra $400-500. Fit as many GPU's as you can for the farms.

This is very true.

There is also the density loss involved with the smaller GPU/MotherBoard method as well.

#crysx
legendary
Activity: 1764
Merit: 1024
6+ gpus are nothing but problems, new miners haven't figured that out yet so they aim for some efficiency by strapping as many GPUs to a system as possible. Every miner had the same exact thought when they first started mining. I've revisited the idea a few different times over the years, but the headaches just aren't worth it, just like running nix. I'll spend a little bit more money for a system that will maintain 99.9% uptime, and when it goes down it won't be the same as three miners being taken offline at the same time. Nor will I waste a bunch of time troubleshooting frivolous issues that don't plague anyone other than 6+ GPU machines and almost always require miner devs to fix (who usually don't and are lazy to begin with when it comes to real problems).
newbie
Activity: 3
Merit: 0
I can't get any recent version of ccminer to compile on OSX. I have previously successfully compiled 1.7.6-r10-nanashi, but multiple attempts at compiling several versions of v2.2.4 (windows source, linux source, windows release source) and someone else's fork of v2.2 have all failed with the following errors:

Code:
equi/cuda_equi.cu(2040): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "void eq_cuda_context::solve(const char *, unsigned int, const char *, unsigned int, fn_cancel, fn_solution, fn_hashdone) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2042): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "void eq_cuda_context::solve(const char *, unsigned int, const char *, unsigned int, fn_cancel, fn_solution, fn_hashdone) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2061): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "void eq_cuda_context::solve(const char *, unsigned int, const char *, unsigned int, fn_cancel, fn_solution, fn_hashdone) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(1966): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "eq_cuda_context::eq_cuda_context(int, int) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(1967): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "eq_cuda_context::eq_cuda_context(int, int) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(1968): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "eq_cuda_context::eq_cuda_context(int, int) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2001): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "eq_cuda_context::eq_cuda_context(int, int) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2002): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "eq_cuda_context::eq_cuda_context(int, int) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2003): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "eq_cuda_context::eq_cuda_context(int, int) [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2108): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "void eq_cuda_context::freemem() [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here

equi/cuda_equi.cu(2111): error: no instance of constructor "rt_error::rt_error" matches the argument list
            argument types are: (char [512])
          detected during instantiation of "void eq_cuda_context::freemem() [with RB=9U, SM=1248U, SSM=12U, THREADS=640U, PACKER=packer_cantor]"
(2124): here


I've been unable to find any explanation online for these errors. A look through the source code was unproductive outside of revealing the origin of rt_error (equi/eqcuda.hpp appears to remap std::runtime_error to rt_error, not that this tells me much of anything). I've tried using different CUDA versions on the off chance that this was related to that, but 8.0, 9.0, and 9.1 all give the same error...

I'm honestly not sure how the heck to resolve the issue at this point. Does anyone have any ideas? Some help would be greatly appreciated. I've gone through the past 30 pages of the thread looking for answers and have found nothing helpful.

I found a solution. Here are the instructions for anyone else with the same issue in the future...

Open equi/eqcuda.hpp in a text editor and find the following lines:

Code:
#ifdef WIN32
#define rt_error std::runtime_error
#else
class rt_error : public std::runtime_error
{
public:
    explicit rt_error(const std::string& str) : std::runtime_error(str) {}
};
#endif

Replace them with this:

Code:
#ifdef WIN32
#define rt_error std::runtime_error
#elif defined(__APPLE__)
// OSX: use the same plain alias as Windows instead of the derived class
#define rt_error std::runtime_error
#else
class rt_error : public std::runtime_error
{
public:
    explicit rt_error(const std::string& str) : std::runtime_error(str) {}
};
#endif

Explanation: I had a hunch that OSX wasn't handling the std::runtime_error redefinition correctly, so I tried commenting out the original alternative redefinition and replacing it with the windows-type redefinition, which worked on OSX. In order to keep the code compatible with Linux and other similar platforms, I then modified it again to only use that redefinition type for OSX/Windows.
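
For anyone who wants to check the preprocessor logic on its own, here is a minimal sketch that can be built with a plain host C++ compiler. It does not go through nvcc, so it does not reproduce the original error; it only shows that either branch of the conditional yields an rt_error that can be thrown from a char buffer and caught as std::runtime_error. The helper name catch_message and the message text are made up for illustration and are not part of ccminer; the 512-byte buffer just mirrors the `char [512]` argument type in the error log.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Same conditional as the patched equi/eqcuda.hpp, written as a single test.
#if defined(WIN32) || defined(__APPLE__)
#define rt_error std::runtime_error
#else
class rt_error : public std::runtime_error
{
public:
    explicit rt_error(const std::string& str) : std::runtime_error(str) {}
};
#endif

// Hypothetical helper (not from ccminer): formats a message into a
// 512-byte buffer, throws it as rt_error, and returns what() after
// catching it through the std::runtime_error base type.
std::string catch_message(int code)
{
    char buf[512];
    std::snprintf(buf, sizeof(buf), "CUDA error %d", code);
    try {
        throw rt_error(buf);
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}
```

Calling catch_message(2) should hand back the formatted text on every platform, whichever branch of the conditional was taken.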

In limited testing with versions compiled using CUDA 8.0 only I noticed no obvious issues caused by my change, but it's possible that some could pop up in the future.

I'm not sure if I want to submit this as a PR to the repo yet. I'll probably want to conduct further testing before I do that.
copper member
Activity: 10
Merit: 0
Getting problems with Tribus with the x86 build.
I'm using default intensity, I am not overclocked, and I am using latest drivers (390.77)
I'm having this issue on 2 rigs
It doesn't happen immediately always, sometimes it takes 5-10 minutes to start
Only started noticing it when I upgraded my drivers to 390.77

There is also a memory error hidden amongst the CPU errors

Could this be due to CUDA 9.1 drivers? Using latest, 390.77... Notice in the github page it says: CUDA 9.1 required drivers are still in a early stage and unstable in my opinion.

Can someone else please try Tribus with Tpruvot X86 with Nvidia driver 390.77? Let it run for 10-20 min.
newbie
Activity: 49
Merit: 0

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx

Maybe the price of adding the extra mb/cpu/ram/ssd needs to be spelled out, that's like an extra $400-500. Fit as many GPU's as you can for the farms.

Perhaps in terms of initial costs but maybe not in terms of availability management (maintenance costs).

If you split your single 19 gpu rig into multiple smaller rigs (let's say 3 rigs), you will probably remove many annoying factors. For example, you will probably remove PSU coupling, ease thermal dissipation, etc.
You will probably have less stress with complete downtime; 2 rigs still working while one is on fault/repair.

As stated by crysx, the only valuable reason to build huge rigs is "limited" space.
jr. member
Activity: 74
Merit: 1

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx

Maybe the price of adding the extra mb/cpu/ram/ssd needs to be spelled out, that's like an extra $400-500. Fit as many GPU's as you can for the farms.

Not worth it...I've killed two motherboards trying to get past six graphics cards on normal motherboards...I think if you want to get 12 gpu's you should just buy a special motherboard with 12 pci express slots. If you want to save power buy more powerful graphics cards if you can get them. I can buy surplus hard drives for $10 and I can buy a ton of amd cpus and motherboards for like $150 or less...the more crap you add to a system the more likely it's going to fail by crashing consistently. also, the more power you draw on any circuit will reduce the performance at some point. you definitely don't want to put 8 1080 tis on one circuit unless you have a special electrical output setup by an electrician that can handle that situation. I've even been hooking up power supplies on different circuits when I'm daisy chaining them for added amperage and power stability. just my two cents...
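
The circuit warning can be put into rough numbers. This is only a sketch, and the figures are assumptions that do not come from the thread: about 250 W per 1080 Ti (its rated board power, before any overclock or power limit), 150 W for the rest of the system, a 120 V / 15 A household circuit, and the common rule of thumb of loading a circuit to no more than 80% for continuous use. All names are hypothetical.

```cpp
// Rough circuit-load estimate. Every number passed in is an assumption
// supplied by the caller, not a measurement.
struct Circuit {
    double volts;
    double amps;
};

// Usable continuous power on a circuit, derated (80% by default).
double continuous_capacity_watts(const Circuit& c, double derate = 0.8)
{
    return c.volts * c.amps * derate;
}

// Estimated rig draw: GPUs plus a flat allowance for mb/cpu/ram/ssd.
double rig_draw_watts(int gpus, double watts_per_gpu, double overhead_watts)
{
    return gpus * watts_per_gpu + overhead_watts;
}

// True if the estimated rig draw stays within the derated capacity.
bool fits_on_circuit(int gpus, double watts_per_gpu, double overhead_watts,
                     const Circuit& c)
{
    return rig_draw_watts(gpus, watts_per_gpu, overhead_watts)
           <= continuous_capacity_watts(c);
}
```

With those assumed figures, 8 cards come to 8 x 250 + 150 = 2150 W, which is above even the full 1800 W rating of a 120 V / 15 A circuit, so fits_on_circuit returns false. Your own wall voltage, breaker rating and per-card draw will differ, so plug in measured values.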
legendary
Activity: 1470
Merit: 1114
If it gets worse with more cards it's not likely a software issue, probably a resource issue like power.
Man, I have enough power, I am using server PSUs. Don't waste your time.

Power was just an example of a resource issue. But I guess I am wasting my time with someone with a closed mind.
sr. member
Activity: 1021
Merit: 324

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx

Maybe the price of adding the extra mb/cpu/ram/ssd needs to be spelled out, that's like an extra $400-500. Fit as many GPU's as you can for the farms.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.

Well ...

Following the strict rules that are required to run such a machine, there really aren't many issues, let alone problems.

Once a design is set, it is very easy to duplicate. That is the reason you would go that route. Density plays a massive part if you are doing this on a larger scale.

That is why, at least for CWI.

#crysx
full member
Activity: 420
Merit: 108

Ccminer is very unstable with 19 gpu's ..


I just wonder why anyone would create a system like that and get constant problems. Compared to the total system cost, an extra motherboard+cpu+memory would not be that big an investment, but it would reduce complexity many times over.
newbie
Activity: 82
Merit: 0
The same config with 16 cards + without OC works well, so I don't think the trouble is with intensity.

What does "die very fast" mean? How far it gets and how it dies is very important.
Is there an error message? Does it start hashing?

And how many is "16 cards +"? What is the maximum that work?

I have tested 16, 18 and 19 cards; I haven't tested 17. 16 works just fine, so I stopped experimenting.
With 19 GPUs it always started hashing after the intensity was set, and sometimes it even showed hashes from 4 or 5 cards for 45 seconds.
With 18 GPUs it died faster than the 19-GPU config, at the stage where the intensity is set.
I can't show the errors now because I have removed 3 cards. Only ccminer crashed; the system was stable.

If it gets worse with more cards it's not likely a software issue, probably a resource issue like power.

Man, I have enough power, I am using server PSUs. Don't waste your time.
legendary
Activity: 1470
Merit: 1114
The same config with 16 cards + without OC works well, so I don't think the trouble is with intensity.

What does "die very fast" mean? How far it gets and how it dies is very important.
Is there an error message? Does it start hashing?

And how many is "16 cards +"? What is the maximum that work?

I have tested 16, 18 and 19 cards; I haven't tested 17. 16 works just fine, so I stopped experimenting.
With 19 GPUs it always started hashing after the intensity was set, and sometimes it even showed hashes from 4 or 5 cards for 45 seconds.
With 18 GPUs it died faster than the 19-GPU config, at the stage where the intensity is set.
I can't show the errors now because I have removed 3 cards. Only ccminer crashed; the system was stable.

If it gets worse with more cards it's not likely a software issue, probably a resource issue like power.
newbie
Activity: 82
Merit: 0
The same config with 16 cards + without OC works well, so I don't think the trouble is with intensity.


Remember also.

nVidia themselves have put on a hardcap of max GPU in their drivers for 'gaming' GPU's in one machine.

Seeing you have under the 16 'gaming' GPU and mining GPU also, this shouldn't be an issue.

We are about to build a few of these this coming week, with ALL 'gaming' GPU's, so I think we will cap our systems at 16 GPU for every miner.

We will be mining under the CWI name, so look out for that sort of performance when it comes about, as we will be testing both ccminer-tpruvot and CWIgm-0.9.9

#crysx
I have used the latest drivers, 23.21.13.9077.
I have used 13 gaming and 6 mining cards because the B250 Mining Expert mobo only boots in such a configuration; if you try 14 gaming cards, your mobo will not even boot.
Do you mean that I have to use one driver for gaming GPUs and another for mining GPUs? I just don't know how to do that.
newbie
Activity: 82
Merit: 0
The same config with 16 cards + without OC works well, so I don't think the trouble is with intensity.

What does "die very fast" mean? How far it gets and how it dies is very important.
Is there an error message? Does it start hashing?

And how many is "16 cards +"? What is the maximum that work?

I have tested 16, 18 and 19 cards; I haven't tested 17. 16 works just fine, so I stopped experimenting.
With 19 GPUs it always started hashing after the intensity was set, and sometimes it even showed hashes from 4 or 5 cards for 45 seconds.
With 18 GPUs it died faster than the 19-GPU config, at the stage where the intensity is set.
I can't show the errors now because I have removed 3 cards. Only ccminer crashed; the system was stable.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
The same config with 16 cards + without OC works well, so I don't think the trouble is with intensity.


Remember also.

nVidia themselves have put on a hardcap of max GPU in their drivers for 'gaming' GPU's in one machine.

Seeing you have under the 16 'gaming' GPU and mining GPU also, this shouldn't be an issue.

We are about to build a few of these this coming week, with ALL 'gaming' GPU's, so I think we will cap our systems at 16 GPU for every miner.

We will be mining under the CWI name, so look out for that sort of performance when it comes about, as we will be testing both ccminer-tpruvot and CWIgm-0.9.9

#crysx
copper member
Activity: 10
Merit: 0
Getting "does not validate on CPU" and "memory access" problems with Tribus on Tpruvot x86. Doesn't happen immediately, but if left on for a while, it starts to do it, maybe 5-10 min.
No overclock and latest Nvidia drivers, default intensity. 2 1080 TI.
legendary
Activity: 1470
Merit: 1114
The same config with 16 cards + without OC works well, so I don't think the trouble is with intensity.

What does "die very fast" mean? How far it gets and how it dies is very important.
Is there an error message? Does it start hashing?

And how many is "16 cards +"? What is the maximum that work?
newbie
Activity: 3
Merit: 0

Welcome to the discussion thread for my ccminer fork.



https://imgur.com/BaLuHQy

What´s the boo and yes about?

"Boo" means that the share it's referring to was rejected (in this case the reason for the reject can be found on the next line). "Yes" means that the share it's referring to was accepted.