
Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 5. (Read 119460 times)

legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
...
 1. If the work is more than a few minutes old, the pool reports "unknown-work" or similar.  Most pools only retain the outstanding jobs in RAM, and since this is limited they forget jobs after 90 seconds or so.  If you try to submit a nonce from a job older than this the pool will reject it even if it's valid (and even if it would have resulted in finding a block!).
...
Of course, since the work IS invalid.

The problem is your software not adhering to the rules given it by the pool - don't try and shift the blame elsewhere.
The pool states the time that work is valid - and your software should adhere to that.
Yes it is a bug in your miner, as you have implied, but that is all it is - the '90 seconds or so' is not some uncertain number as you are implying, it is specified to you by the pool.

There is also a very important reason why that work SHOULD be invalid - it directly represents increasing BTC transaction confirm times.
If you work on a piece of work for half an hour (on a long block) there will be half an hour of BTC transactions that you have ignored if you find a block.
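To make the point concrete, here is a minimal sketch of a miner-side expiry check, assuming the pool's validity window has already been read into a per-job field. The `Work` class, its field names, and `is_still_valid` are hypothetical and are not taken from any particular miner or pool API.

```python
import time
from dataclasses import dataclass


@dataclass
class Work:
    job_id: str
    received_at: float     # time.monotonic() value when the job arrived
    expiry_seconds: float  # validity window stated by the pool (assumed field)


def is_still_valid(work, now=None):
    """Return True only while the job is inside the pool's validity window."""
    if now is None:
        now = time.monotonic()
    return (now - work.received_at) < work.expiry_seconds
```

A miner following this rule would call `is_still_valid(work)` before submitting a nonce and would fetch fresh work once it returns False, rather than leave the old job on the chips.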
hero member
Activity: 714
Merit: 500
Psi laju, karavani prolaze.
Probably a good idea, as long as an ignore period is added until a reasonable sample is gathered (~10min after reset?).

Actually, I'd leave it up to the user.  Frankly, I need both.  A time to first check samples, and the frequency of checking rates thereafter.

Let's see what ET will say about it.
donator
Activity: 90
Merit: 10
Probably a good idea, as long as an ignore period is added until a reasonable sample is gathered (~10min after reset?).

Actually, I'd leave it up to the user.  Frankly, I need both.  A time to first check samples, and the frequency of checking rates thereafter.
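A rough sketch of what those two settings could look like together, purely as an illustration. This is not how TML implements `minimum_accept_rate`; the class name and the `first_check_after` and `check_interval` parameters are hypothetical stand-ins for the grace period and the suggested `minimum_accept_timeframe`.

```python
from dataclasses import dataclass


@dataclass
class AcceptRateWatchdog:
    minimum_accept_rate: float  # threshold, in whatever unit the miner reports
    first_check_after: float    # seconds to ignore the rate after a reset
    check_interval: float       # seconds between checks once the grace period is over
    last_reset: float = 0.0
    last_check: float = 0.0

    def should_reset(self, now, accept_rate):
        """Return True when the miner should be reset at time `now`."""
        # Grace period: the sample gathered so far is too small to judge.
        if now - self.last_reset < self.first_check_after:
            return False
        # Not yet time for the next periodic check.
        if now - self.last_check < self.check_interval:
            return False
        self.last_check = now
        if accept_rate < self.minimum_accept_rate:
            self.last_reset = now  # a reset starts a new grace period
            return True
        return False
```

With `first_check_after=600` and `check_interval=60` this would ignore the rate for ten minutes after each reset and then check once a minute, which covers both requests in this exchange.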
hero member
Activity: 714
Merit: 500
Psi laju, karavani prolaze.
Short summary: pool rejections may be caused by a memory leak.  Please be sure to use -Dminimum_accept_rate=X until I come up with a proper fix.  If you're using this option the bug will cost less than 0.01% of your hashpower.

I'd like to make a suggestion: please add something to the effect of minimum_accept_timeframe, so that we can change the frequency that the accept rate is checked.  If I could force TML to check every minute (instead of every 10) that my rate is above 700, I think TML would be superior to BFG.

However, from my tests, the 10 minutes max that a miner may sit idle forces the average hashrate below that of other software.

In short, I would love to be able to have the miner decide to reset itself every minute if hashrate drops below 700.

Probably a good idea, as long as an ignore period is added until a reasonable sample is gathered (~10min after reset?).
donator
Activity: 90
Merit: 10
Short summary: pool rejections may be caused by a memory leak.  Please be sure to use -Dminimum_accept_rate=X until I come up with a proper fix.  If you're using this option the bug will cost less than 0.01% of your hashpower.

I'd like to make a suggestion: please add something to the effect of minimum_accept_timeframe, so that we can change the frequency that the accept rate is checked.  If I could force TML to check every minute (instead of every 10) that my rate is above 700, I think TML would be superior to BFG.

However, from my tests, the 10 minutes max that a miner may sit idle forces the average hashrate below that of other software.

In short, I would love to be able to have the miner decide to reset itself every minute if hashrate drops below 700.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Short summary: pool rejections may be caused by a memory leak.  Please be sure to use -Dminimum_accept_rate=X until I come up with a proper fix.  If you're using this option the bug will cost less than 0.01% of your hashpower.

I've found a small, slow memory leak in the TML host-side software.  It takes at least a week to fill up the JVM heap.  If you use the -Xmx1G command line option to set the JVM heap to something huge like 1GB it will take almost a month.

What's peculiar about this is the failure mode: the TML gets stuck in a mode where it stops loading new work onto the chips, or at least waits way too long between loading jobs.  The bitstream is designed in such a way that it doesn't bother checking that it's run through the whole nonce-space -- it just loops around and starts again from the beginning.  So if you don't load new work before the nonce-space is exhausted, you get duplicate results.  Unfortunately I haven't been checking for these in software, so the duplicates get submitted to the pool.  This results in one of two things happening:

 1. If the work is more than a few minutes old, the pool reports "unknown-work" or similar.  Most pools only retain the outstanding jobs in RAM, and since this is limited they forget jobs after 90 seconds or so.  If you try to submit a nonce from a job older than this the pool will reject it even if it's valid (and even if it would have resulted in finding a block!).

 2. The job is left running on the ring for longer than (2^32)/(clock_rate/2) seconds, at which point it simply loops through the nonce-space again and starts reporting duplicates.  These are reported to the pool, which rejects them.
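As a worked example of the wrap-around time in (2), here is a small sketch. It reads the formula as one nonce tested every two clock cycles, and the 200 MHz clock is an illustrative assumption, not a figure from any specific board.

```python
NONCE_SPACE = 2 ** 32  # a nonce is a 32-bit field


def seconds_until_wrap(clock_rate_hz):
    """Seconds a job can run before the nonce-space repeats.

    Follows the formula in the post: 2^32 / (clock_rate / 2).
    """
    return NONCE_SPACE / (clock_rate_hz / 2)


# Assumed example clock of 200 MHz: about 42.9 seconds per pass.
print(round(seconds_until_wrap(200e6), 1))
```

At that assumed clock a job left on the ring for more than about 43 seconds would start producing duplicates, so new work has to be loaded more often than that.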

Since X-Reject-Reason headers aren't standardized, (1), (2), and stales all count as "rejects".  You can see the pool-specific description string in the logfile, but in the statistics I don't separate them because, well, I can't (each pool uses a different text string for each case).

So the end result is that running out of memory manifests itself as what looks like a lot of stales.
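For illustration only, here is a minimal sketch of the kind of host-side duplicate check described above as missing. It is not the TML fix; the class and method names are hypothetical, and jobs are assumed to carry some unique `job_id`.

```python
class DuplicateFilter:
    """Remembers which nonces have already been seen for each job."""

    def __init__(self):
        self._seen = {}  # job_id -> set of nonces already reported

    def is_new(self, job_id, nonce):
        """Return True the first time a (job, nonce) pair is seen."""
        nonces = self._seen.setdefault(job_id, set())
        if nonce in nonces:
            return False  # the chip wrapped around; do not resubmit
        nonces.add(nonce)
        return True

    def forget(self, job_id):
        """Drop a retired job so the filter does not grow without bound."""
        self._seen.pop(job_id, None)
```

Only results for which `is_new` returns True would be submitted, and calling `forget` when a job is retired keeps the filter from becoming a memory leak of its own.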

I am working on a "proper" fix but I just want to emphasize that if you're using -Dminimum_accept_rate=X this bug has virtually no impact on you.  Potential hard-to-debug intermittent performance bugs like this are why I added the -Dminimum_accept_rate option.  You really ought to be using it.  I'm considering making it mandatory in the next release.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Yes.  They start mining and shut off one at a time, some within 5 minutes of starting.  

Please post a log, or we can't help you.

If it's too big, put it on pastebin and post a link.

Recently there have been at least three bug reports that turned out to be people running an outdated version of the TML, and the latest version had the fix for the problem.  If you post a log, at the very least we can look at the first line where it says what version it is.  You'd be surprised how many problems this solves.

Please don't post a question here and email us the log.  You can email a question and email the log, or you can post a question and post the log.  Please don't do one of each -- not only is it hard to correlate emails and forum posts (people's email addresses seem to bear no relation to their usernames), but if you post a problem and email the log, other users don't get to see the fix… even if it was "please use the latest version".

Thanks.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
TML 1.55 is released.


12.Dec.2012  Version 1.55
             Telnet monitor improvements
             Remove limits on HTTP submit thread pool size
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Heads up, I am going to be mostly offline until Monday.  Kakobrekla (and a few other people on the forum) have my phone number if there is any sort of dire crisis.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Same issues cutting out after running for a little bit.  Is anyone else still experiencing this problem on the modminer?

Please submit debugging log.

Aren't you using -Dminimum_accept_rate=X?
hero member
Activity: 714
Merit: 500
Psi laju, karavani prolaze.

Same issues cutting out after running for a little bit.  Is anyone else still experiencing this problem on the modminer?


Please submit debugging log.
member
Activity: 110
Merit: 10

Same issues cutting out after running for a little bit.  Is anyone else still experiencing this problem on the modminer?
hero member
Activity: 714
Merit: 500
Psi laju, karavani prolaze.
member
Activity: 110
Merit: 10
Yes.  They start mining and shut off one at a time, some within 5 minutes of starting. 
hero member
Activity: 714
Merit: 500
Psi laju, karavani prolaze.
Do you have the same issues as lukasbradley?
member
Activity: 110
Merit: 10
Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

The log file you sent me is from tml-1.52.  I fixed the minimum_accept_rate bug in 1.53.

That works.  When the accept rate drops below the minimum, the process resets.

I was just noting that it is not stable long term.

I'll set it back up on that machine tomorrow, and give you shell access, along with a debugging port.

Does this work with just entering the new command line entry, or does it need the script?
donator
Activity: 90
Merit: 10
Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

The log file you sent me is from tml-1.52.  I fixed the minimum_accept_rate bug in 1.53.

That works.  When the accept rate drops below the minimum, the process resets.

I was just noting that it is not stable long term.

I'll set it back up on that machine tomorrow, and give you shell access, along with a debugging port.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

The log file you sent me is from tml-1.52.  I fixed the minimum_accept_rate bug in 1.53.
hero member
Activity: 714
Merit: 500
Psi laju, karavani prolaze.
Have the issues with the modminer been corrected yet?

Get latest version (1.54) from here and give it a go. Follow the procedure if it fails.
donator
Activity: 90
Merit: 10
Have the issues with the modminer been corrected yet?

Mine still dies after 20-60 minutes.

Tyrell, I'd be happy to open a port today and let you take a look remotely.