Topic: Efudd Z-Series Fuddware 2.3 -Z11/Z11e/Z11j/Z9/Mini - page 27. (Read 45501 times)

member
Activity: 504
Merit: 51
...snip...

I reflashed the firmware and now I can push it higher than 656 MHz. Temps are now 49-50°C, fans at 2000 rpm.

Okay. Well, if it starts to drop when it is too cold outside, I will have to split the intake air a bit :) Thanks for that info.

Another question...

I also tried the Biggie firmware earlier. At the same MHz it could spike to 17+ ksol/s; the average was around 14.5, I think.
I got higher spikes with the "Biggie" firmware than with the Mini.

The average is better on the Mini FW though :)
Good job! I will buy the license when you start bringing in new ones :)


The spikes are gonna be completely random for what it is worth. Your miner could get really lucky on calculations for a few seconds and jump to 2x what you would otherwise expect, but the average is where the truth really sits.

I honestly am not sure I am going to sell new licenses; I may instead stick with the dev supported model. It actually is cheaper for users that way, to be honest... it'll take 3-6 months of runtime or more for me to make up what the license fee was at 3%. It's just a lot easier on me to not manage individual licenses.

Jason
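
To illustrate why the average is the number to watch, here is a minimal sketch that averages a handful of hashrate readings. The sample values and the helper name `average` are made up for illustration; they are not measurements from any miner in this thread.

```shell
#!/bin/sh
# average: read one hashrate sample per line on stdin and print the mean
# with two decimals.
average() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# Hypothetical samples in ksol/s, including one lucky spike to 17.0.
avg="$(printf '%s\n' 14.2 14.6 17.0 14.1 14.4 14.5 | average)"
echo "average: $avg ksol/s"
```

A single spike moves the mean only slightly, which is why two firmwares are better compared on long-run averages than on peaks.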
newbie
Activity: 9
Merit: 0

This is a very good question. First on the CRC error -- that is going to happen some and is only a problem if it is constant. It happens on even the stock firmwares depending on machines, temps, frequencies, and phase of the moon.

Temperatures will play into how far you can push these, but there is not a clear formula for that. What's really interesting is I have a customer with a large install (1000+ machines) who has observed that there is a point where the machines get too cold and slow down! I'm unsure of the exact details on the temperatures, just the observation that was shared with me.

So yes, temperature plays a part both when going up and when going down.

The summers here are very hot -- I had to constantly tune my miners, even through the day, to get the maximum out of them; they always ran best at night.

I hope this helps some.

Jason

I reflashed the firmware and now I can push it higher than 656 MHz. Temps are now 49-50°C, fans at 2000 rpm.

Okay. Well, if it starts to drop when it is too cold outside, I will have to split the intake air a bit :) Thanks for that info.

Another question...

I also tried the Biggie firmware earlier. At the same MHz it could spike to 17+ ksol/s; the average was around 14.5, I think.
I got higher spikes with the "Biggie" firmware than with the Mini.

The average is better on the Mini FW though :)
Good job! I will buy the license when you start bringing in new ones :)
member
Activity: 504
Merit: 51
Definitely once a day. Just a quick question, is there a way I can set the frequency for the different hashboards via PuTTY / the JSON?

Yessir, bitmain-freq1, bitmain-freq2, bitmain-freq3 are the 3 variables for that.

The only caveat is if you set the frequencies via that method the web interface will not get updated to reflect it until you go into the web interface and "save frequencies".

Jason
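
A minimal sketch of changing the per-board values from an SSH (PuTTY) session. Only the three key names come from the reply above. The `"key" : "value"` layout, the sample file contents and the helper name `set_freq` are assumptions for illustration, so check the real /config/cgminer.conf on your miner before adapting it.

```shell
#!/bin/sh
# set_freq FILE BOARD MHZ: rewrite the value of "bitmain-freqBOARD" in FILE.
# Assumes (unverified) that the value is stored as a quoted number on a
# single line, e.g.  "bitmain-freq1" : "656",
set_freq() {
    file="$1"; board="$2"; mhz="$3"
    sed "s/\(\"bitmain-freq${board}\"[[:space:]]*:[[:space:]]*\"\)[0-9]*\"/\1${mhz}\"/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demonstrate on a throwaway file, not on the live config.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
{
"bitmain-freq1" : "656",
"bitmain-freq2" : "656",
"bitmain-freq3" : "656"
}
EOF

set_freq "$conf" 1 681
set_freq "$conf" 3 643
cat "$conf"
```

As noted in the reply, values changed this way will not show up in the web interface until you "save frequencies" there. cgminer would presumably also need a restart to pick up the edited file.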

member
Activity: 504
Merit: 51
Thoughts.

Z9 Mini
I have my miners hosted outside, or rather on direct outside air. We have been having around -15°C and the miner worked great; I have had it stable at 681 MHz since release.
2 fans: front 1800 rpm, rear 1640 rpm. Chip temps around 28-30°C, hash avg 14.9 ksol/s.

Today the weather changed drastically to +2°C and all 3 boards showed xxxx.

It seems it was the bm1740_verify_nonce_integrality CRC error. I rebooted, but it only took about 7 minutes before I got the same error again.
After that I have only been able to maintain 656 MHz.

As soon as I go above that I lose one board.

Is it possible that the colder the chips are kept, the higher the MHz we can maintain? I never tried above 681 MHz.

This is a very good question. First on the CRC error -- that is going to happen some and is only a problem if it is constant. It happens on even the stock firmwares depending on machines, temps, frequencies, and phase of the moon.

Temperatures will play into how far you can push these, but there is not a clear formula for that. What's really interesting is I have a customer with a large install (1000+ machines) who has observed that there is a point where the machines get too cold and slow down! I'm unsure of the exact details on the temperatures, just the observation that was shared with me.

So yes, temperature plays a part both when going up and when going down.

The summers here are very hot -- I had to constantly tune my miners, even through the day, to get the maximum out of them; they always ran best at night.

I hope this helps some.

Jason
newbie
Activity: 5
Merit: 0
Definitely once a day. Just a quick question, is there a way I can set the frequency for the different hashboards via PuTTY / the JSON?
newbie
Activity: 9
Merit: 0
Thoughts.

Z9 Mini
I have my miners hosted outside, or rather on direct outside air. We have been having around -15°C and the miner worked great; I have had it stable at 681 MHz since release.
2 fans: front 1800 rpm, rear 1640 rpm. Chip temps around 28-30°C, hash avg 14.9 ksol/s.

Today the weather changed drastically to +2°C and all 3 boards showed xxxx.

It seems it was the bm1740_verify_nonce_integrality CRC error. I rebooted, but it only took about 7 minutes before I got the same error again.
After that I have only been able to maintain 656 MHz.

As soon as I go above that I lose one board.

Is it possible that the colder the chips are kept, the higher the MHz we can maintain? I never tried above 681 MHz.
member
Activity: 504
Merit: 51
Folk,

I wanted to get some feedback on dev-fees: Once per day, or split up throughout the day? I've had feedback from both, but am leaning towards once-per-day.

I personally think that once-per-day has the least impact since it greatly reduces the swapping/moving things around.

Can you please share your viewpoint on this and your reasoning why?

Thank you,

Jason
member
Activity: 504
Merit: 51
@efudd Please read PM. Thanks.

@Marchcat2008 - Responded. At this point in time, I am not selling new licenses. The developer supported version will remain available. If this changes in the future, I will update the original post in this thread.

Thank you,

Jason
newbie
Activity: 17
Merit: 0
@efudd Please read PM. Thanks.
member
Activity: 504
Merit: 51
Excellent, good job dude! I want that PS4 :D

Install now! Supplies are Limited!

I was gonna return it and then thought... wait a second!

Best of luck to you. It'll go to someone!

(I can't help but type this while reading it in a "Saul Goodman" voice).

Jason
full member
Activity: 192
Merit: 119
Excellent, good job dude! I want that PS4 :D
member
Activity: 504
Merit: 51
Folk,

For the month of December, I will be running a contest for users of the Z9 and Z9 Mini version 2.1 or later firmware. There will be one automatic entry per day per miner. On 12/24, a random machine will be selected. The Summary page on the version 2.1 firmware will be automatically updated to let you know who the winner is. Details will be posted in the thread here and on the Equihash discord.

I've created a thread to discuss this at https://bitcointalksearch.org/topic/december-z9z9mini-firmware-users-ps4spiderman-bundle-giveaway-thread-5077347

Users will find the Summary page on their miner updating with this information automatically over the next 24 hours.

I have put details in the original post, but will copy here also.



Thank you,

Jason
member
Activity: 504
Merit: 51
I installed 2.1 and I don't have an option to upload a license file.

The system page says:
Efudd's Z9 Series Firmware v2.1
No dev-fee until 12/01/2018!

But under Upgrade there is no option to upload a license file.

P.S. My Z9 is running faster now than with your old firmware, with no clock changes.

Refresh your browser cache on the upgrade page. Shift-f5 or cmd-f5 if you are on a Mac. Once uploaded, the license will tell you if applied. The summary page will update on the next poll or restart with your license status. That page may be cached as well.. same thing.

In the next release (currently being tested), I have fixed the page cache issues.

-Jason
member
Activity: 449
Merit: 24
I installed 2.1 and I don't have an option to upload a license file.

The system page says:
Efudd's Z9 Series Firmware v2.1
No dev-fee until 12/01/2018!

But under Upgrade there is no option to upload a license file.

P.S. My Z9 is running faster now than with your old firmware, with no clock changes.
member
Activity: 504
Merit: 51
... snip ...
Kernel log,monitor log, screenshot System,Miner Status and img fan emu. Password in PM
Link https://dropmefiles.com/3iSv8

Thank you for those details. I think I might know what is going on -- can you check your PM and email me at the email address I provided?

Once we can get confirmation, I should be able to get this fixed reasonably quickly.
....snip...
Ok, this is going to take a little longer. This is Roskomnadzor. I am looking for a workaround.

xkosx - by the time you wake up the issue should be resolved. I have migrated primary services to something not blocked by Roskomnadzor. In fact, I can already see Russian installations coming online.

Thank you,

Jason
member
Activity: 504
Merit: 51
...snip...
This should fix the problem the majority of the time, if not completely. Checking twice gives cgminer 120 seconds, and a second confirmation that it is really gone, before a new instance of the miner is started.

Place this in the monitorcg file:

#!/bin/sh
#set -x
check_inter="60s"

# Second check: wait one more interval, then restart cgminer only if it
# is still not running.
chk_again() {
   sleep $check_inter
   #date
   a="$(ps | grep cgminer | grep -v 'grep cgminer')"
   if [ -z "$a" ] ; then
      /etc/init.d/cgminer.sh restart
   fi
}

# First check: every interval, look for a running cgminer.
while true; do
   sleep $check_inter
   #date
   a="$(ps | grep cgminer | grep -v 'grep cgminer')"
   if [ -z "$a" ] ; then
      chk_again
   fi
done


Some checking of the ASIC status also needs to be added: if some number of ASICs report an x for status, cgminer restarts or the system reboots. This can help when overclocking and a board fails out of the blue. The miner will at least restart rather than stay dropped out and losing speed.

Please clean up your quotes instead of re-quoting everyone else before you each time.

That will not fix the race. You can add "chk_again" as many times as you want, it does not remove it.

Jason
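
The reason a second check cannot remove the race is that "check, then act" is still two steps, and the other party can act in between. A minimal sketch of one common alternative, an atomic `mkdir` lock that both the watchdog and the restart path would have to honor. This is not what the firmware does; the lock path and the names `with_lock` and `fake_restart` are hypothetical.

```shell
#!/bin/sh
# Hypothetical lock path. A real setup would use one fixed path shared by
# every script that restarts cgminer; a temp dir keeps this demo harmless.
LOCK="$(mktemp -d)/cgminer-restart.lock"

# with_lock CMD...: run CMD only if the lock can be taken; return 1 if busy.
# mkdir either creates the directory or fails, atomically, so two callers
# cannot both succeed.
with_lock() {
    if mkdir "$LOCK" 2>/dev/null; then
        "$@"
        rc=$?
        rmdir "$LOCK"
        return $rc
    fi
    return 1
}

# Stand-in for "/etc/init.d/cgminer.sh restart" so the sketch is safe to run.
fake_restart() { echo "restarting cgminer"; }

with_lock fake_restart      # lock is free: the restart runs
mkdir "$LOCK"               # simulate the web interface holding the lock
with_lock fake_restart || echo "restart already in progress, skipping"
rmdir "$LOCK"
```

This only helps if every path that stops or starts cgminer takes the same lock, which is the "talking to each other" requirement described in the original post.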
jr. member
Activity: 559
Merit: 4
Folk,

I'd like to share a lesson learned that may be useful to others. I have a Z9mini here that I just checked and it has a hash rate of 0. Immediately I went to the logs to see what was going on and found the following:

Code:
Nov 28 21:33:52 (none) local0.err cgminer[23558]: bm1740_verify_nonce_integrality CRC error. cal-crc=374c, chip-crc=60bf
Nov 28 21:33:52 (none) local0.warn cgminer[23558]: receive a error nonce. total = 8908
Nov 28 21:33:52 (none) local0.err cgminer[23546]: bm1740_verify_nonce_integrality CRC error. cal-crc=ac2d, chip-crc=3f77
Nov 28 21:33:52 (none) local0.warn cgminer[23546]: receive a error nonce. total = 8705

The key here is that these are happening constantly, every second; the count is up to 8000+. (If these are infrequent, every few minutes to hours, they can be ignored.) What's going on? To figure that out, let's take a look at the process list ("System" -> "Monitor")... and what do I find?

Code:
23546 23545 root     S <   225m  98%  50% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
23558 23557 root     S <   257m 111%  40% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog

Two copies of cgminer running! How could that happen? The answer is in this little program right here:

Code:
1012     1 root     S     2152   1%   0% {monitorcg} /bin/sh /sbin/monitorcg

This is a factory process that tries to be a "watchdog" for cgminer and restart it if it is not running. From the factory it ran every 20 seconds, but I modified it to sleep for 60 seconds to try to limit the possibility of this race condition.

What happens is if you change frequency or pool configuration, cgminer is stopped and restarted. While that stop/start is occurring, monitorcg has a chance to see cgminer is not running and start one itself. End result: two cgminers stepping on each other.

I may end up removing /sbin/monitorcg from the firmware as I've attempted to fix this particular race a myriad of ways... but when two separate processes (web interface actions and monitorcg) are both touching the same resource ("cgminer"), there is not any good way to prevent them from stepping on each other unless they are talking to each other constantly to achieve what is called "quorum".

What's the lesson here? Many times the errors that you may see are a function of this particular race condition... and if you have two cgminer processes running, the fix is to kill/restart them. The simplest way to do that is just to go to the frequency page and click submit. That will terminate both cgminers and hopefully restart one before monitorcg tries to help. A guaranteed way to fix it is to reboot, but I am not a fan of unnecessary reboots.

Hopefully this bit of information will be useful to someone. I've been meaning to write posts like this explaining various scenarios for a while.

Thank you,

Jason

This should fix the problem the majority of the time, if not completely. Checking twice gives cgminer 120 seconds, and a second confirmation that it is really gone, before a new instance of the miner is started.

Place this in the monitorcg file:

#!/bin/sh
#set -x
check_inter="60s"

# Second check: wait one more interval, then restart cgminer only if it
# is still not running.
chk_again() {
   sleep $check_inter
   #date
   a="$(ps | grep cgminer | grep -v 'grep cgminer')"
   if [ -z "$a" ] ; then
      /etc/init.d/cgminer.sh restart
   fi
}

# First check: every interval, look for a running cgminer.
while true; do
   sleep $check_inter
   #date
   a="$(ps | grep cgminer | grep -v 'grep cgminer')"
   if [ -z "$a" ] ; then
      chk_again
   fi
done


Some checking of the ASIC status also needs to be added: if some number of ASICs report an x for status, cgminer restarts or the system reboots. This can help when overclocking and a board fails out of the blue. The miner will at least restart rather than stay dropped out and losing speed.
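
A minimal sketch of the ASIC status check suggested above. It assumes each board's status is available as a string with one character per chip, "o" for healthy and "x" for failed; that format, the threshold and the names `count_failed` and `needs_restart` are assumptions for illustration, not something the firmware is confirmed to expose.

```shell
#!/bin/sh
# count_failed STATUS: print how many chips in STATUS are marked "x".
count_failed() {
    printf '%s' "$1" | tr -cd 'x' | wc -c | tr -d ' '
}

# needs_restart THRESHOLD STATUS...: succeed if any board has at least
# THRESHOLD failed chips.
needs_restart() {
    threshold="$1"; shift
    for board in "$@"; do
        if [ "$(count_failed "$board")" -ge "$threshold" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical readings for three boards; the middle board has dropped out.
if needs_restart 3 "oooo" "xxxx" "ooxo"; then
    echo "board failure detected: restart cgminer"
fi
```

In a watchdog the `echo` would be replaced by the restart or reboot command, ideally with a limit on how often it may fire so an unstable overclock does not cause a restart loop.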
member
Activity: 504
Merit: 51
Folk,

I'd like to share a lesson learned that may be useful to others. I have a Z9mini here that I just checked and it has a hash rate of 0. Immediately I went to the logs to see what was going on and found the following:

Code:
Nov 28 21:33:52 (none) local0.err cgminer[23558]: bm1740_verify_nonce_integrality CRC error. cal-crc=374c, chip-crc=60bf
Nov 28 21:33:52 (none) local0.warn cgminer[23558]: receive a error nonce. total = 8908
Nov 28 21:33:52 (none) local0.err cgminer[23546]: bm1740_verify_nonce_integrality CRC error. cal-crc=ac2d, chip-crc=3f77
Nov 28 21:33:52 (none) local0.warn cgminer[23546]: receive a error nonce. total = 8705

The key here is that these are happening constantly, every second; the count is up to 8000+. (If these are infrequent, every few minutes to hours, they can be ignored.) What's going on? To figure that out, let's take a look at the process list ("System" -> "Monitor")... and what do I find?

Code:
23546 23545 root     S <   225m  98%  50% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
23558 23557 root     S <   257m 111%  40% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog

Two copies of cgminer running! How could that happen? The answer is in this little program right here:

Code:
1012     1 root     S     2152   1%   0% {monitorcg} /bin/sh /sbin/monitorcg

This is a factory process that tries to be a "watchdog" for cgminer and restart it if it is not running. From the factory it ran every 20 seconds, but I modified it to sleep for 60 seconds to try to limit the possibility of this race condition.

What happens is if you change frequency or pool configuration, cgminer is stopped and restarted. While that stop/start is occurring, monitorcg has a chance to see cgminer is not running and start one itself. End result: two cgminers stepping on each other.

I may end up removing /sbin/monitorcg from the firmware as I've attempted to fix this particular race a myriad of ways... but when two separate processes (web interface actions and monitorcg) are both touching the same resource ("cgminer"), there is not any good way to prevent them from stepping on each other unless they are talking to each other constantly to achieve what is called "quorum".

What's the lesson here? Many times the errors that you may see are a function of this particular race condition... and if you have two cgminer processes running, the fix is to kill/restart them. The simplest way to do that is just to go to the frequency page and click submit. That will terminate both cgminers and hopefully restart one before monitorcg tries to help. A guaranteed way to fix it is to reboot, but I am not a fan of unnecessary reboots.

Hopefully this bit of information will be useful to someone. I've been meaning to write posts like this explaining various scenarios for a while.

Thank you,

Jason
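
A minimal sketch of spotting the duplicate-process condition described above from a shell session. The sample text mirrors the process list in the post with the cgminer arguments shortened; the helper name `count_cgminer` is hypothetical.

```shell
#!/bin/sh
# count_cgminer: read `ps` output on stdin and print how many lines are
# /usr/bin/cgminer processes.
count_cgminer() {
    grep '/usr/bin/cgminer' | grep -v grep | wc -l | tr -d ' '
}

# Sample process list modeled on the one in the post.
sample='23546 23545 root     S <   225m  98%  50% /usr/bin/cgminer --config=/config/cgminer.conf -T --syslog
23558 23557 root     S <   257m 111%  40% /usr/bin/cgminer --config=/config/cgminer.conf -T --syslog
 1012     1 root     S     2152   1%   0% {monitorcg} /bin/sh /sbin/monitorcg'

n="$(printf '%s\n' "$sample" | count_cgminer)"
if [ "$n" -gt 1 ]; then
    echo "WARNING: $n cgminer processes running"
fi
```

On a miner the equivalent check would be `ps | count_cgminer`. A result above 1 points to the race described in the post rather than a hardware fault.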
member
Activity: 504
Merit: 51
...snip...

Netstat on the machine doesn't lie about how many times you are trying to connect. You're in and out of dev mode, and the callbacks to your server are telling the miner what mode to mine in. More or less, the firmware depends on being able to reach your servers; if it can't, it stops mining and the user loses some shares. I ran it for hours and it would only mine if it connected to your server at bootup; it never went into a bypass mode. Not a smear, just the facts about your release.

Thank you for your feedback and I'm sorry the firmware did not meet your needs. Please let me know if I can assist in some manner in the future.

Jason
jr. member
Activity: 559
Merit: 4

Never said I didn't know what I was doing, and I never used your work or others' besides the factory firmware. As far as the lockout problem goes, your firmware will have the same issue at some point unless you found the cause. By your statements you have had the issue or have helped with it already. Your fix was just to do a recovery, but you don't know the cause of the issue; as far as I know, your explanation on my post was unsure. I forget, but in the end you still had to tell them to do a recovery.


I pointed out that I had already included the recovery conditions as a worst case, not that I had seen the issue. I have had to use the recovery exactly once when I bricked my own machine on the very first image I ever made. I was attempting to be polite. Users won't have that issue with my releases unless it is flat out concocted.


You also need to inform your customers that their miner on the current 2.1 version may stop mining if it can't reach your API callback server us-api1.fudd.net or your dev-fee pool zec-bj.ss.poolin.com, or they may lose shares on a callback or dev-fee connection error. I did some testing from bootup for 6 hours, and any time the callback couldn't reach your server or the dev server it gave the error I posted to you in an earlier post, in which the kernel log stated that about 100 shares were lost. It stopped mining at different times and eventually would start again. Callbacks also seem to happen more than once every 280 minutes.

To sum it up: if your server is down or unreachable, the miner may stop mining on all servers for a short time and the owner loses shares because of it.

If the authorization server is down, the inability to start is a purposeful design decision. What you are missing there is how that is cached and handled in various failure scenarios and the fact there is regionalization on the API server you have not yet found. You've also not found the downtime mode that exists such that maintenance server-side can occur without downtime to miners. Nor have you found the retry mechanisms to ensure continuity in mining. Instead, you've created fake scenarios without understanding what you are doing.

As far as the pool being down, no, you have done something wrong. cgminer will simply find another pool if a pool is unreachable.

As far as the callback goes, you are also simply wrong there too, unless you were repeatedly restarting cgminer and purposefully breaking the network.

As far as 280 minutes go, you are also simply wrong there as well. In fact, 280 minutes is not even configured at the moment. Paid user callbacks are once a day. Dev user callbacks are once every 2 hours simply for testing purposes (more traffic) and will drop back to once a day by 12/01.

As far as your analysis about lost shares, there is a piece you are missing there as well; shares there are not defined as "successful shares waiting to be submitted to the server", but rather the current getwork queue, many of which will be discarded locally since they do not meet the targets.


Enjoy the input


I did! Nice smear attempt though.

Jason


Netstat on the machine doesn't lie about how many times you are trying to connect. You're in and out of dev mode, and the callbacks to your server are telling the miner what mode to mine in. More or less, the firmware depends on being able to reach your servers; if it can't, it stops mining and the user loses some shares. I ran it for hours and it would only mine if it connected to your server at bootup; it never went into a bypass mode. Not a smear, just the facts about your release.