Is this really needed?
I mean, on Windows MSI AB already gives this info (and some other, more important stats), and on Linux there is nvidia-smi (or something like that) which can give you the card's temperature, fan speed, and clock speeds.
As long as it isn't possible to overclock the card from cudaminer (obviously that would be the goal), you will still need to have MSI AB open to overclock the card, so you already see the temps/fan speed etc. anyway... so...
edit: I must admit that MSI AB is quite messy when there are several cards in the system.
The basic idea is to enable an emergency shutdown and/or throttling based on temperature, so you don't risk your GPU's life because a fan has failed (the NVIDIA drivers throttle hard at 95 deg C, but I've seen my MARS cards reach 100 deg C during testing). These values are also important data points to send through a remote monitoring API; without them the monitoring would be next to useless.
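For illustration only, a watchdog along these lines could poll the temperature through NVML and back off or bail out when the card gets too hot. The 90/95 deg C thresholds and the 5-second poll interval below are made-up example values, not anything from the actual patch:

    /* Illustrative temperature watchdog (not the actual patch).
     * Thresholds and poll interval are example values. Link with -lnvidia-ml. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <nvml.h>

    int main(void) {
        nvmlDevice_t dev;
        unsigned int temp = 0;

        if (nvmlInit() != NVML_SUCCESS) return 1;
        if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

        for (;;) {
            if (nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS) {
                if (temp >= 95) {
                    /* emergency: a fan has probably failed, stop hashing */
                    fprintf(stderr, "GPU at %u C, shutting down\n", temp);
                    nvmlShutdown();
                    exit(1);
                } else if (temp >= 90) {
                    /* soft limit: back off before the driver throttles hard */
                    fprintf(stderr, "GPU at %u C, throttling\n", temp);
                }
            }
            sleep(5);
        }
    }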
I have to praise this code submission as mission critical (the mission being the ability to run massive NVIDIA mining farms).
Christian
Hi,
I do think that the reporting of the extra GPU monitoring data should be an optional feature, so that people who don't want it don't have to read it :-)
Regarding internal monitoring vs. external tools like nvidia-smi, I have a few thoughts:
For more humble or small-scale purposes, the patch is also useful in that it allows in-band monitoring of the GPUs used by cudaminer, without having to collate separate logs from cudaminer and nvidia-smi.
At the moment, the code I sent Christian only monitors a few GPU attributes. On some types of GPUs, the NVIDIA drivers also allow application clock rates and other settings to be modified, and can report what is limiting a GPU's performance, whether it is power, heat, or other factors. My existing patch doesn't do any of these other things yet, but as the capabilities of NVIDIA's NVML library grow, it should be possible to add more of them if people want them.
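To give a rough idea of the kind of attributes involved, here is a minimal standalone sketch of the NVML queries for temperature, fan speed, clocks, and power draw. It is not code from the patch; return codes are not checked here, and some boards will not report every sensor:

    /* Standalone sketch of per-GPU NVML queries. Link with -lnvidia-ml. */
    #include <stdio.h>
    #include <nvml.h>

    int main(void) {
        nvmlDevice_t dev;
        unsigned int temp = 0, fan = 0, sm = 0, mem = 0, mw = 0;

        if (nvmlInit() != NVML_SUCCESS) return 1;
        if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp); /* deg C   */
        nvmlDeviceGetFanSpeed(dev, &fan);                           /* percent */
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm);            /* MHz     */
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_MEM, &mem);          /* MHz     */
        nvmlDeviceGetPowerUsage(dev, &mw);                          /* mW      */

        printf("GPU 0: %u C, fan %u%%, SM %u MHz, mem %u MHz, %.1f W\n",
               temp, fan, sm, mem, mw / 1000.0);

        nvmlShutdown();
        return 0;
    }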
My cudaminer patch uses the same interface that the nvidia-smi utility itself uses (NVML), so the potential feature set is similar, except that my cudaminer patch also does one important extra thing: the data it reports will exactly match the GPU ID that the hash rates etc. are reported for. If you use nvidia-smi, one problem is that it uses a different GPU indexing scheme than CUDA does, so you would have to sort that out yourself if you did external monitoring. The mismatch between indexing schemes is most confusing when you have several identical GPUs that you can't tell apart by name. This is one of the things I dealt with in the cudaminer patch that you wouldn't get by monitoring with nvidia-smi, and likely other tools. An illustrative sketch of one way to reconcile the two indexing schemes follows below.
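As an illustration of the indexing issue (not the code from my patch), one way to make the two enumeration orders line up is to go through the PCI bus ID, which both CUDA and NVML can report:

    /* Sketch: map a CUDA device ordinal to the matching NVML handle via the
     * PCI bus ID, so monitoring data lines up with the GPU that the hash rate
     * is reported for. Compile with nvcc and link with -lnvidia-ml. */
    #include <stdio.h>
    #include <cuda_runtime.h>
    #include <nvml.h>

    int main(void) {
        int cuda_dev = 0;        /* the ordinal cudaminer prints, e.g. GPU #0 */
        char busid[64];
        nvmlDevice_t dev;
        unsigned int temp = 0;

        if (nvmlInit() != NVML_SUCCESS) return 1;

        /* CUDA and NVML enumerate GPUs in different orders; the PCI bus ID
         * is the same in both APIs, so use it as the bridge. */
        if (cudaDeviceGetPCIBusId(busid, sizeof(busid), cuda_dev) == cudaSuccess &&
            nvmlDeviceGetHandleByPciBusId(busid, &dev) == NVML_SUCCESS &&
            nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS)
            printf("CUDA device %d (%s): %u C\n", cuda_dev, busid, temp);

        nvmlShutdown();
        return 0;
    }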
Cheers,
John Stone