Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 5.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: eldentyrell on December 15, 2012, 12:21:20 PM

...
1. If the work is more than a few minutes old, the pool reports "unknown-work" or similar. Most pools only retain the outstanding jobs in RAM, and since this is limited they forget jobs after 90 seconds or so. If you try to submit a nonce from a job older than this the pool will reject it even if it's valid (and even if it would have resulted in finding a block!).
...

Of course, since the work IS invalid.

The problem is your software not adhering to the rules given it by the pool - don't try and shift the blame elsewhere.
The pool states the time that work is valid - and your software should adhere to that.
Yes it is a bug in your miner, as you have implied, but that is all it is - the '90 seconds or so' is not some uncertain number as you are implying, it is specified to you by the pool.

There is also a very important reason why that work SHOULD be invalid - it directly represents increasing BTC transaction confirm times.
If you work on a piece of work for half an hour (on a long block) there will be half an hour of BTC transactions that you have ignored if you find a block.
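To illustrate the point about respecting the validity window the pool states, here is a minimal sketch of an expiry check a miner could run before submitting a nonce. The `Work` class and its field names are hypothetical, and how a given pool communicates the window is not covered here; the sketch simply assumes the miner already has that value in seconds.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical record of a job handed out by the pool."""
    job_id: str
    received_at: float  # time.monotonic() reading when the job arrived
    valid_for: float    # validity window in seconds, as stated by the pool

    def is_expired(self, now: Optional[float] = None) -> bool:
        """Return True once the pool-stated validity window has elapsed."""
        if now is None:
            now = time.monotonic()
        return now - self.received_at >= self.valid_for


if __name__ == "__main__":
    work = Work(job_id="example", received_at=time.monotonic(), valid_for=90.0)
    # A freshly received job is still inside its window.
    print(work.is_expired())
```

A miner would drop any result whose job reports `is_expired()` instead of submitting it, and would fetch fresh work at that point.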

kakobrekla

hero member

Activity: 714

Merit: 500

The dogs bark, the caravans pass.

Quote from: lukasbradley on December 15, 2012, 07:12:03 PM

Quote from: kakobrekla on December 15, 2012, 01:04:52 PM

Probably a good idea, as long as the check is ignored until a reasonable sample has been gathered (~10 min after reset?).

Actually, I'd leave it up to the user. Frankly, I need both: a time before the first check, and the frequency of checks thereafter.

Let's see what ET will say about it.

lukasbradley

donator

Activity: 90

Merit: 10

Quote from: kakobrekla on December 15, 2012, 01:04:52 PM

Probably a good idea, as long as the check is ignored until a reasonable sample has been gathered (~10 min after reset?).

Actually, I'd leave it up to the user. Frankly, I need both: a time before the first check, and the frequency of checks thereafter.
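The two settings discussed here (a delay before the first check, then a check frequency) could be sketched roughly as follows. This is not how TML implements -Dminimum_accept_rate; the class, its parameter names, and the choice to restart the grace period after a reset are all assumptions made for illustration.

```python
class AcceptRateWatchdog:
    """Hypothetical watchdog: skip checks during a grace period, then check periodically."""

    def __init__(self, minimum_accept_rate, grace_period, check_interval, start_time=0.0):
        self.minimum_accept_rate = minimum_accept_rate
        self.grace_period = grace_period      # seconds to wait before the first check
        self.check_interval = check_interval  # seconds between later checks
        self.next_check = start_time + grace_period

    def should_reset(self, now, accept_rate):
        """Return True if a check is due and the accept rate is below the minimum."""
        if now < self.next_check:
            return False  # still gathering a sample, or not yet time for the next check
        if accept_rate < self.minimum_accept_rate:
            # Assumption: a reset starts a new grace period before checks resume.
            self.next_check = now + self.grace_period
            return True
        self.next_check = now + self.check_interval
        return False


if __name__ == "__main__":
    # 10 minute grace period, then a check every minute, minimum rate 700.
    watchdog = AcceptRateWatchdog(700, grace_period=600, check_interval=60)
    print(watchdog.should_reset(now=30, accept_rate=0))     # inside the grace period
    print(watchdog.should_reset(now=600, accept_rate=650))  # first real check
```

Keeping the two intervals separate lets the user pick a long first window for a meaningful sample and a short one afterwards to limit idle time.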

kakobrekla

hero member

Activity: 714

Merit: 500

The dogs bark, the caravans pass.

Quote from: lukasbradley on December 15, 2012, 12:37:40 PM

Quote from: eldentyrell on December 15, 2012, 12:21:20 PM

Short summary: pool rejections may be caused by a memory leak. Please be sure to use -Dminimum_accept_rate=X until I come up with a proper fix. If you're using this option the bug will cost less than 0.01% of your hashpower.

I'd like to make a suggestion: please add something to the effect of minimum_accept_timeframe, so that we can change the frequency that the accept rate is checked. If I could force TML to check every minute (instead of every 10) that my rate is above 700, I think TML would be superior to BFG.

However, from my tests, the 10 minutes max that a miner may sit idle forces the average hashrate below that of other software.

In short, I would love to be able to have the miner decide to reset itself every minute if hashrate drops below 700.

Probably a good idea, as long as the check is ignored until a reasonable sample has been gathered (~10 min after reset?).

lukasbradley

donator

Activity: 90

Merit: 10

Quote from: eldentyrell on December 15, 2012, 12:21:20 PM

Short summary: pool rejections may be caused by a memory leak. Please be sure to use -Dminimum_accept_rate=X until I come up with a proper fix. If you're using this option the bug will cost less than 0.01% of your hashpower.

I'd like to make a suggestion: please add something to the effect of minimum_accept_timeframe, so that we can change the frequency that the accept rate is checked. If I could force TML to check every minute (instead of every 10) that my rate is above 700, I think TML would be superior to BFG.

However, from my tests, the 10 minutes max that a miner may sit idle forces the average hashrate below that of other software.

In short, I would love to be able to have the miner decide to reset itself every minute if hashrate drops below 700.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Short summary: pool rejections may be caused by a memory leak. Please be sure to use -Dminimum_accept_rate=X until I come up with a proper fix. If you're using this option the bug will cost less than 0.01% of your hashpower.

I've found a small, slow memory leak in the TML host-side software. It takes at least a week to fill up the JVM heap. If you use the -Xmx1G command line option to set the JVM heap to something huge like 1GB it will take almost a month.

What's peculiar about this is the failure mode: the TML gets stuck in a mode where it stops loading new work onto the chips, or at least waits way too long between loading jobs. The bitstream is designed in such a way that it doesn't bother checking that it's run through the whole nonce-space -- it just loops around and starts again from the beginning. So if you don't load new work before the nonce-space is exhausted, you get duplicate results. Unfortunately I haven't been checking for these in software, so the duplicates get submitted to the pool. This results in one of two things happening:

1. If the work is more than a few minutes old, the pool reports "unknown-work" or similar. Most pools only retain the outstanding jobs in RAM, and since this is limited they forget jobs after 90 seconds or so. If you try to submit a nonce from a job older than this the pool will reject it even if it's valid (and even if it would have resulted in finding a block!).

2. The job is left running on the ring for longer than (2³²)/(clock_rate/2) seconds, at which point it simply loops through the nonce-space again and starts reporting duplicates. These are reported to the pool, which rejects them.
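As a worked example of the formula in item 2, the sketch below computes how long one pass through the 32-bit nonce-space takes. The 510 MHz clock is only an illustrative figure, chosen because clock_rate/2 then equals the 255 MH/s in the thread title; it is not a measured value for any particular board.

```python
NONCE_SPACE = 2 ** 32  # the nonce field in a Bitcoin block header is 32 bits


def seconds_until_wrap(clock_rate_hz: float) -> float:
    """Time for one full pass over the nonce-space: 2^32 / (clock_rate / 2)."""
    hashes_per_second = clock_rate_hz / 2
    return NONCE_SPACE / hashes_per_second


if __name__ == "__main__":
    clock = 510e6  # illustrative assumption: 510 MHz, i.e. 255 MH/s
    print(f"{seconds_until_wrap(clock):.1f} s")  # prints "16.8 s"
```

With these numbers a job that stays loaded for more than about 17 seconds starts producing duplicates.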

Since X-Reject-Reason headers aren't standardized, (1), (2), and stales all count as "rejects". You can see the pool-specific description string in the logfile, but in the statistics I don't separate them because, well, I can't (each pool uses a different text string for each case).

So the end result is that running out of memory manifests itself as what looks like a lot of stales.
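The missing software-side duplicate check could look something like the sketch below, which remembers recently submitted (job, nonce) pairs and drops repeats before they reach the pool. This is not the TML fix; the class name, the `max_entries` bound, and the eviction policy are assumptions. The bound is there so that the filter itself cannot grow without limit.

```python
from collections import OrderedDict


class DuplicateFilter:
    """Hypothetical bounded record of (job_id, nonce) pairs already submitted."""

    def __init__(self, max_entries=100_000):
        self._seen = OrderedDict()
        self._max_entries = max_entries

    def is_new(self, job_id, nonce):
        """Return True the first time a (job_id, nonce) pair is seen, False on repeats."""
        key = (job_id, nonce)
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True


if __name__ == "__main__":
    dedup = DuplicateFilter()
    print(dedup.is_new("job-1", 0x1234))  # True: first sighting, submit it
    print(dedup.is_new("job-1", 0x1234))  # False: the ring looped, skip it
```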

I am working on a "proper" fix but I just want to emphasize that if you're using -Dminimum_accept_rate=X this bug has virtually no impact on you. Potential hard-to-debug intermittent performance bugs like this are why I added the -Dminimum_accept_rate option. You really ought to be using it. I'm considering making it mandatory in the next release.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: mining4fun11 on December 03, 2012, 08:25:13 AM

Yes. They start mining and shut off one at a time, some within 5 minutes of starting.

Please post a log, or we can't help you.

If it's too big, put it on pastebin and post a link.

Recently there have been at least three bug reports that turned out to be people running an outdated version of the TML, and the latest version had the fix for the problem. If you post a log, at the very least we can look at the first line where it says what version it is. You'd be surprised how many problems this solves.

Please don't post a question here and email us the log. You can email a question and email the log, or you can post a question and post the log. Please don't do one of each -- not only is it hard to correlate emails and forum posts (people's email addresses seem to bear no relation to their usernames), but if you post a problem and email the log, other users don't get to see the fix… even if it was "please use the latest version".

Thanks.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

TML 1.55 is released.

12.Dec.2012 Version 1.55
Telnet monitor improvements
Remove limits on HTTP submit thread pool size

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Heads up, I am going to be mostly offline until Monday. Kakobrekla (and a few other people on the forum) have my phone number if there is any sort of dire crisis.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: kakobrekla on December 05, 2012, 08:33:13 AM

Quote from: mining4fun11 on December 05, 2012, 08:27:12 AM

Same issue: they cut out after running for a little bit. Is anyone else still experiencing this problem on the modminer?

Please submit a debugging log.

Aren't you using -Dminimum_accept_rate=X?

kakobrekla

hero member

Activity: 714

Merit: 500

The dogs bark, the caravans pass.

Quote from: mining4fun11 on December 05, 2012, 08:27:12 AM

Quote from: kakobrekla on December 03, 2012, 08:51:13 AM

Any success with http://www.tricone-mining.com/troubleshooting.html ?

Same issue: they cut out after running for a little bit. Is anyone else still experiencing this problem on the modminer?

Please submit a debugging log.

mining4fun11

member

Activity: 110

Merit: 10

Quote from: kakobrekla on December 03, 2012, 08:51:13 AM

Any success with http://www.tricone-mining.com/troubleshooting.html ?

Same issue: they cut out after running for a little bit. Is anyone else still experiencing this problem on the modminer?

kakobrekla

hero member

Activity: 714

Merit: 500

The dogs bark, the caravans pass.

Any success with http://www.tricone-mining.com/troubleshooting.html ?

mining4fun11

member

Activity: 110

Merit: 10

Yes. They start mining and shut off one at a time, some within 5 minutes of starting.

kakobrekla

hero member

Activity: 714

Merit: 500

The dogs bark, the caravans pass.

Do you have the same issues as lukasbradley?

mining4fun11

member

Activity: 110

Merit: 10

Quote from: lukasbradley on December 02, 2012, 06:55:43 PM

Quote from: eldentyrell on December 01, 2012, 07:15:03 PM

Quote from: lukasbradley on December 01, 2012, 11:48:37 AM

Quote from: mining4fun11 on December 01, 2012, 09:46:30 AM

Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

The log file you sent me is from tml-1.52. I fixed the minimum_accept_rate bug in 1.53.

That works. When the accept rate drops below the minimum, the process resets.

I was just noting that it is not stable long term.

I'll set it back up on that machine tomorrow, and give you shell access, along with a debugging port.

Does this work with just the new command line entry, or does it need the script?

lukasbradley

donator

Activity: 90

Merit: 10

Quote from: eldentyrell on December 01, 2012, 07:15:03 PM

Quote from: lukasbradley on December 01, 2012, 11:48:37 AM

Quote from: mining4fun11 on December 01, 2012, 09:46:30 AM

Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

The log file you sent me is from tml-1.52. I fixed the minimum_accept_rate bug in 1.53.

That works. When the accept rate drops below the minimum, the process resets.

I was just noting that it is not stable long term.

I'll set it back up on that machine tomorrow, and give you shell access, along with a debugging port.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: lukasbradley on December 01, 2012, 11:48:37 AM

Quote from: mining4fun11 on December 01, 2012, 09:46:30 AM

Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

The log file you sent me is from tml-1.52. I fixed the minimum_accept_rate bug in 1.53.

kakobrekla

hero member

Activity: 714

Merit: 500

The dogs bark, the caravans pass.

Quote from: mining4fun11 on December 01, 2012, 09:46:30 AM

Have the issues with the modminer been corrected yet?

Get the latest version (1.54) from here and give it a go. Follow the procedure if it fails.

lukasbradley

donator

Activity: 90

Merit: 10

Quote from: mining4fun11 on December 01, 2012, 09:46:30 AM

Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

Tyrell, I'd be happy to open a port today and let you take a look remotely.

Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 5. (Read 119468 times)