Topic: Cointerra Hardware Support **Unofficial - page 15. (Read 56761 times)

member
Activity: 76
Merit: 10
OK so I have got myself a Silverstone Tundra TD3 water pump and replaced my faulty pump.

Unfortunately it is not much better than the cheap Cooler Master Seidon 120 that I tried a few days ago...


Both could solve my problem, partially.  With the replacement pump I can now mine with all 16 chips, but only if set to Power Step 6.  Many chips shut down as soon as I step up to Power Step 7.  Probably the pump is not strong enough to remove the heat?

The pump speed only reads 1200 RPM for TD3, while it's ~1000 RPM for Seidon 120.  Noticed that the original cointerra pumps are 3000 RPM ...


Any idea how I could boost the replacement pumps so that I could fully utilize my miner at Step 9 again?  Maybe Liquid Pro could help?  I am using Arctic Silver 5 for thermal paste now.
member
Activity: 117
Merit: 10

OK, maybe I will try the cgminer first.  Other than getting a version to run on the beaglebone (not sure if it will run with the Angstrom linux version or if I will need to load a different linux version) would I need to do anything else for it to recognize the terraminer boards?  I am assuming cgminer would find them and then start sending the data to the pool provided on the command line.  Does that make sense?


There's a USB trick you need to perform to have cgminer recognize the CT devices, it's described in the cgminer README. Other than that, no fancy stuff, it just works.

https://github.com/ckolivas/cgminer/blob/master/README#L355

The build instructions are also there:

https://github.com/ckolivas/cgminer/blob/master/README#L100

Keep in mind that Angstrom linux (the *normal* version, not the wonky scheme Cointerra boots into) uses a package manager called opkg. So far as I know you can get all the necessary packages to build cgminer from it, but I haven't actually tried it myself, I'm just recalling an almost identical process with KNC.
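
For the "pool provided on the command line" part of the question, here is a minimal sketch of how the launch command could be assembled once cgminer is built. `-o`, `-u` and `-p` are cgminer's standard pool URL, username and password options; the pool address, worker name, password and binary path below are placeholders, not values from this thread, and `build_cgminer_command` is just an illustrative helper name.

```python
import shlex


def build_cgminer_command(pool_url, user, password, binary="./cgminer"):
    """Return the argv list for a basic cgminer launch against one pool.

    The binary path is an assumption: adjust it to wherever your build lives.
    """
    return [binary, "-o", pool_url, "-u", user, "-p", password]


if __name__ == "__main__":
    # Placeholder pool and credentials, for illustration only.
    cmd = build_cgminer_command(
        "stratum+tcp://pool.example.com:3333", "worker1", "x"
    )
    # Print the command so it can be checked before running it by hand
    # (or pass the list to subprocess.run).
    print(shlex.join(cmd))
```

This only covers the pool arguments. The USB step described in the README still has to be done separately.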
member
Activity: 76
Merit: 10
Mine is ver. 0.7.6, should it be ok?
hero member
Activity: 686
Merit: 500
Has anyone here replaced the Beaglebone in their Terraminer?
Thanks.

If you want to completely duplicate the setup Cointerra put on there, you'd need to image the beaglebone exactly as they did. If you really, really want to have it back as-is, you might be able to get an image dump from someone. Otherwise I'd just load on some default embedded linux version for the beaglebone, compile a version of cgminer, and just run that.


Quote
What firmware version are you running?  I've tried them all, from 0.6.x through 0.8.8.  Makes no difference.

Where are you getting version 0.8.8?

Seriously.. wtf is this 0.8.8 shit? sounds like you got infected. :)
newbie
Activity: 60
Merit: 0
Has anyone here replaced the Beaglebone in their Terraminer?
Thanks.

If you want to completely duplicate the setup Cointerra put on there, you'd need to image the beaglebone exactly as they did. If you really, really want to have it back as-is, you might be able to get an image dump from someone. Otherwise I'd just load on some default embedded linux version for the beaglebone, compile a version of cgminer, and just run that.


Quote

OK, maybe I will try the cgminer first.  Other than getting a version to run on the beaglebone (not sure if it will run with the Angstrom linux version or if I will need to load a different linux version) would I need to do anything else for it to recognize the terraminer boards?  I am assuming cgminer would find them and then start sending the data to the pool provided on the command line.  Does that make sense?
full member
Activity: 224
Merit: 100
Quote
What firmware version are you running?  I've tried them all, from 0.6.x through 0.8.8.  Makes no difference.

Where are you getting version 0.8.8?

I got it from their tech. support a while back.  I think it's similar to 0.7.6 with added support for extensive debug printout put nicely into a tarball so you can (could) email it to CT.  I don't think CT cares so much anymore.  

Someone had a set of links to all the firmware versions on the CT forums, before they took them down.

They were all up on GitHub too but I'm not sure if they still are.  The devs may have wised up and put them into a private repo.

I suspect if you email [email protected] they might still send it to you.  Who knows.
member
Activity: 117
Merit: 10
Has anyone here replaced the Beaglebone in their Terraminer?
Thanks.

If you want to completely duplicate the setup Cointerra put on there, you'd need to image the beaglebone exactly as they did. If you really, really want to have it back as-is, you might be able to get an image dump from someone. Otherwise I'd just load on some default embedded linux version for the beaglebone, compile a version of cgminer, and just run that.


Quote
What firmware version are you running?  I've tried them all, from 0.6.x through 0.8.8.  Makes no difference.

Where are you getting version 0.8.8?
newbie
Activity: 60
Merit: 0
Has anyone here replaced the Beaglebone in their Terraminer? Mine is dead and I just purchased a new one, but I'm not completely sure how to load the Cointerra firmware onto it.  I was told I need to point the boot loader at the firmware files and have been doing some research, but thought any experience or suggestions would be helpful.

Thanks.
full member
Activity: 224
Merit: 100
Ok, here's some bizarre observational data.

When I run with PL7 or below the bad temp sensor tends to flip to the plus side more often (e.g., +300C) and the other core on the board runs in the 70s - 80s.

When I run with PL8 or PL9 the sensor seems to stay in the negative more often ( -299C currently) and the other core runs in the 60s.

Go figure.  More power = lower temp.

Things go kooky when chips and sensors get wonky.

EDIT:  not so sure about this anymore.  It seems to flip back to + without rhyme or reason now.  I may have to just disable the board entirely.  First I'll have to check how much power it's drawing.  I suspect it's now a very inefficient miner electricity wise.

EDIT:  yup, it's been showing +300C or so all day, hence no mining from CTA0 as it's throttling itself.  I wish there was a way to force it NOT to throttle.
full member
Activity: 224
Merit: 100
Have you tried my idea of pulling the 3-pin power supply off the bad block?  It should shut that core down, thus revitalizing the good core remaining on that board.

FYI, I have now one machine with only 3 water blocks connected (one core with no water block, waiting for the new block to arrive tonight).  And it is hashing nicely with 12 chips, 4 at rest because of the lack of any water cooling system connected to it, not even the power connected.

Try this

I did try it.  No luck.  The core does not shut down.  I'm pretty sure that cable only reports pump speed.

What firmware version are you running?  I've tried them all, from 0.6.x through 0.8.8.  Makes no difference.
member
Activity: 76
Merit: 10
Have you tried my idea of pulling the 3-pin power supply off the bad block?  It should shut that core down, thus revitalizing the good core remaining on that board.

FYI, I have now one machine with only 3 water blocks connected (one core with no water block, waiting for the new block to arrive tonight).  And it is hashing nicely with 12 chips, 4 at rest because of the lack of any water cooling system connected to it, not even the power connected.

Try this
full member
Activity: 224
Merit: 100
Shortly after writing the above, I see my core temps have flopped back to the + side of nonsense.

Core Temp 1 (°C)   34.7   31.24   32.71
Core Temp 2 (°C)   326.59   28.09   105.18

It will occasionally flip back to a very large negative temp, but now seems to be stuck on the positive side, so the CTA is not hashing at all since the system thinks it is running too hot and is throttling it off.

What a pain.
full member
Activity: 224
Merit: 100
you scared the hell out of me ... just received my liquid pro yesterday, and some thermal pads too from coollab

I hesitate to apply now ...

btw, how many chips can one syringe of liquid pro cover?? I only got 1 syringe and doubt it would be enough even for one machine ...

I don't think you need to be scared.  But I will say that I've basically had this happen with the last two boards I've applied TIM to.  The first was over a month ago using Noctua NT-H1.  It seemed fine, then a die went bad (remember there are 8 dies per board).  There was no rhyme or reason to it either.  I inspected afterward and my application was fine.  No overflow, short circuit, or anything.

Most recently, using LP, the same thing happened but worse.  The temp sensor went along with the goldstrike die, causing more problems.  A dead die isn't so bad.  You just lose 1/8 of your board's hashing power.  I can't say I blame LP for this either.

Particularly if you have no choice and your machine is running way too hot, then I'd say to go ahead with the LP.  I honestly believe it's better than other TIM.

Some notes:

1.  I modified my jig since I can't find long M3 screws near where I live (Lowe's doesn't have them).  Simply put, I cut the chopsticks down to about 2" long and just used them as temporary screws.  Turns out I really didn't need the foam and the top part of the jig.  The sticks / wooden pins worked just as well as my more complicated jig.  Regardless, you really do need something to keep the water block in place when applying any TIM to these machines.

2.  Use a good solvent to get rid of the old TIM.  I used Arcticlean.  But who knows, maybe that's the culprit here for all I know.  It is a common element.

3.  Make sure you apply LP to both the cores AND the bottom of the water block.  I feel like one syringe of LP can do an entire machine, or pretty close to it.  You really DO NOT NEED MUCH material when applying.  When you "paint" it on to the bottom of the block you'll see.  It's like you're spreading it only a few molecules thick.

4.  After you restart your machine and especially if it seems ok, try to just leave it that way for at least 24 hrs.  I feel like in both instances where I've had a board drop a die, it happened after a restart that occurred shortly after application.  I'm speculating here.

Here is something else to consider.  In both my cases, it's been the last core on the board that's gone bad:  core7.  So I'm not convinced this is necessarily the fault of any TIM or application method.  That last core seems to be the weak link in the chain and was also likely the problem core running hot in each case.  So there might be more going on with these failures than we'll ever know until some ex-employee of CT writes a book or something.

LP goes on more like silver paint than thermal paste.  It also won't stick to a surface until you spread it a bit.  Try to resist the urge to put on too much.

Be careful and you stand a good chance.  But then again, that's what I thought when I started applying Noctua last month (before I went to LP) and I still had a chip go bad.  Any time you work on one of these very fragile boxes you're taking a bit of a chance.
member
Activity: 76
Merit: 10
you scared the hell out of me ... just received my liquid pro yesterday, and some thermal pads too from coollab

I hesitate to apply now ...

btw, how many chips can one syringe of liquid pro cover?? I only got 1 syringe and doubt it would be enough even for one machine ...
full member
Activity: 224
Merit: 100
An update ...

Spent all day trying to get the board back to normal.  No dice.  One of the ASIC chips is fried (only reports 7 dies on the advanced stats page) and the temp sensor is mangled along with it.  I still get swings from +300C to -300C.  But after many reboots and other attempts, it's mostly reading negative low temps right now.  So it gets some hashing in.  Currently:

Core Temp 1 (°C)   65.22   48.64   54.80
Core Temp 2 (°C)   46.94   -216.28   -22.62

I have a hunch that -216.28 actually means 21C (the temp on the dead chip) and that 310C means 31C.  At someplace below 28C it reports a negative value (times -10) and somewhere above 30C it reports as a positive value (times 10) ... that's my guess based on what I've been seeing.  Those scalar multiples may not be exact of course.
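
To make that hunch concrete, here is a minimal sketch of the guessed mapping. It encodes only the guess above (a bogus reading is roughly the real temperature times 10, with the sign flipped in some cases), not anything known about the firmware. The function name and the 150C cutoff for deciding that a reading is bogus are assumptions added for illustration.

```python
def guess_actual_temp(reported_c, implausible_above=150.0, scale=10.0):
    """Guess the real temperature behind a nonsensical sensor reading.

    Assumption (a hunch, not confirmed): readings whose magnitude is far
    outside the normal range are the real value multiplied by +10 or -10.
    Readings within the normal range are returned unchanged.
    """
    if abs(reported_c) > implausible_above:
        return abs(reported_c) / scale
    return reported_c


if __name__ == "__main__":
    for reading in (-216.28, 310.0, 46.94):
        print(reading, "->", round(guess_actual_temp(reading), 2))
```

As noted, the scalar multiples may not be exact, so treat the output as a rough estimate at best.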

So I'm trying to keep the board as cool as possible for now in the hopes I can use SOME of CTA0.
full member
Activity: 224
Merit: 100
For you guys that are thinking of trying Liquid Pro, stop and think about this for a minute. When you go to spread Liquid Pro on the chips you will see it takes a little work to get it to cover the chip. Once you have a thin layer on the chips and you apply a little to the center, you will see it flow across the chip. So what's going to happen if you don't apply it to the water block also? It's going to push the Liquid Pro off the sides of the chip instead of flowing between them like it should. You need to break the surface tension on both the chips and the water block by spreading just enough to cover each, then putting a drop in the center of the chip before assembling it. Measure out where it needs to be if you can't see the outline of the chips on the bottom of the water block. The last thing you want is for the water block to push it somewhere it's not supposed to be. Not sure where this idea of applying it just to the chip came from. Liquid Pro is NOT like regular thermal paste; it HAS to be spread!

That's a good point.  But ...

1.  The first machine I did this with is working flawlessly.  Maybe I just got lucky.

2.  The second machine was running fine, until whatever episode happened that led to my current situation.  But when I re-applied LP to the 2nd machine I did have it on the water block too.  

It could be that the damage was done once the LP oozed out past the chip die edge and shorted or caused whatever havoc.

I'd generally agree with you though.  If anyone goes down the risky LP road they should apply it to the cooling block too.  It doesn't need much.  Just "paint" enough on there to break the surface tension so that, as you observe, it will bond with the LP on your goldstrike chips.

However, a little bit of LP oozing over the edge of your chip should not wreak such havoc with the motherboard.
full member
Activity: 224
Merit: 100
Yeah, these machines were barely out of beta testing IMHO and they shipped them to us anyway.  I can only imagine how many little hacks and tricks have gone on both in hardware AND software just to get them running in the first place (from the factory).

Now we all have these very delicate machines that can go haywire for the most nonsensical reason.

I doubt it's the pump.  This machine has run perfectly for more than three months.  It was "elective" surgery I performed on it yesterday as one of the cores would sometimes get up to 90C.  Inspired by my initial success with Liquid Pro I thought I'd give it a go.  As I said in my post about it, I had all the tools and everything laid out on my bench so why not, right?

All I did was apply Liquid Pro.  It actually ran perfectly for about 30 min.  Only after I noticed one of the green LEDs wasn't on did this most recent nonsense happen.  Yes, I too have the red light.  I'm pretty sure that's just the board telling me I have a bad chip.  Since the temp sensors are somehow integrated into those chips ... the problem is compounded.

If you have an ASIC chip go out and the temp sensor remains intact, you're ok.  You'll just hash with one less ASIC.

If you have an ASIC chip go out and the temp sensor gets messed up too, then you're in for a world of pain.

It is truly annoying.  I hope the class action suit drives CT into bankruptcy.  I bought mine just after November 1 so I'm SOL for that myself.

If they had just made it so you can remove individual ASIC chips all our lives would be better.  Shit, there'd be a market for buying and selling just the damn chips.

I'm sure I'll waste more time on it today and the rest of this week.  It's so aggravating words can't express.
full member
Activity: 169
Merit: 100
For you guys that are thinking of trying Liquid Pro, stop and think about this for a minute. When you go to spread Liquid Pro on the chips you will see it takes a little work to get it to cover the chip. Once you have a thin layer on the chips and you apply a little to the center, you will see it flow across the chip. So what's going to happen if you don't apply it to the water block also? It's going to push the Liquid Pro off the sides of the chip instead of flowing between them like it should. You need to break the surface tension on both the chips and the water block by spreading just enough to cover each, then putting a drop in the center of the chip before assembling it. Measure out where it needs to be if you can't see the outline of the chips on the bottom of the water block. The last thing you want is for the water block to push it somewhere it's not supposed to be. Not sure where this idea of applying it just to the chip came from. Liquid Pro is NOT like regular thermal paste; it HAS to be spread!
member
Activity: 76
Merit: 10
Let me share my experience with you, in case it's helpful.

I've got quite a few machines with 1 core (i.e. 4 chips) dead: temperature at 50C, pump speed at 0, not hashing.  So the machines are hashing at 1200 GH/s max at step 9.
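
The 1200 figure is simply the full 16-chip rate of 1600 scaled by the share of chips still hashing (12 of 16). A minimal sketch of that arithmetic, assuming every chip contributes equally, which is a simplification; `expected_hashrate` is an illustrative name.

```python
def expected_hashrate(working_chips, total_chips=16, full_rate=1600.0):
    """Scale the full-machine hash rate by the fraction of chips hashing.

    Assumes each chip contributes an equal share of the total.
    The result is in whatever unit full_rate is given in.
    """
    if not 0 <= working_chips <= total_chips:
        raise ValueError("working_chips must be between 0 and total_chips")
    return full_rate * working_chips / total_chips


if __name__ == "__main__":
    # One dead core means 4 chips out, so 12 of 16 remain.
    print(expected_hashrate(12))
```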

I am so upset about this. I have tested many times and identified which core was the faulty one on one of my machines.  I found one very special hint: when I swap the top water block's 3-pin cable with the bottom water block's, the machine fires back up to 1600 GH/s very briefly, for about 3 seconds ... but the temperature reads +200C or something like that.  Also, the row of orange chaser lights suddenly showed a "special" red light.  A red light is usually nothing good, haha, so I shut off the machine and swapped the 3-pin cables back to their original connections ... everything went back to normal.

Now, that hinted to me that the chips were not dead and the board was not dead, and that if I tricked the board into believing the water block was running, it would run.  So I suspect there must be something wrong with the water block!

So I went to the computer and bought myself the cheapest water block available, and spent the whole weekend learning how to install one.  I replaced the faulty water block with the new block.  Guess what ... it did start up for a while with 2 chips in the faulty core!  Big improvement already.  Then I saw that the pump speed was only 1000 RPM, so I guessed that only 2 chips ran because it was too hot.  I stepped down to power step 6 ... and guess what, the machine hashed with all 16 chips, which it had not done for over 2 weeks!  The pump IS the problem.  So I went and bought myself some higher-grade water blocks.  They were out of stock, so now I'm waiting for the new block to arrive and will test further, but it is looking good: I have successfully revitalized the dead chips, albeit only at step 6 (the whole board, even the good core, shuts down as soon as I step up to step 7).


Now I see your problem, I hesitate to apply the Liquid Pro, which I have just received today.

But anyway, if your problem is only with 1 chip, you can remove the 3-pin power from that specific faulty water block. That powers it off; trust me, I have done this.  Once it's stopped, that core will not start (there will be a reading, but it does not hash without the cooling system online), so the good core can still run happily without being affected by the bad core, and at least you get 4 chips back.

full member
Activity: 224
Merit: 100
I have faced many problems with my machines... half the board not working, 1 core not running, water pump showing 0 flow rate.

I did some experiments over the past week, and your +- 200C problem happened to me once and only once.  Did you switch the 3-pin cable connections of your 2 water blocks on the board?  (i.e. the bottom water block takes power from the top, and the top block takes power from the bottom)

The water blocks HAVE to take power from the exact 3-pin supply that they were originally shipped with; that's how the board communicates with the block and checks the temperature.  If you mess with the 3-pin cables, the board will get confused.

I'm 99% sure the cables coming from the water blocks are for reporting pump speed and nothing more.  You can verify this by unplugging them and starting up your machine.  It will run pretty much as normal but there will be no pump speed reported on the status page.

Just for fun, I tried your switch.  No difference.  Same problems.

Regardless, I did have all the cables connected exactly as they were. 

It seems like some kind of sensor mapping / byte issue in the firmware.  I.e., the high temp will shoot up to +300C (or some other unrealistic number), then the next second the lowest temp is -300C.  It flip-flops, and if you're unlucky and land on the + side you stay throttled until it flips to a minus reading.  If the chip really did get to 300C it would likely burn out.  So ...
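
For anyone logging the status page values, here is a minimal sketch of separating believable readings from the flip-flopping ones. This is not how the firmware decides to throttle (nobody here knows that); it is just a sanity filter. The -40C to 150C window and the function names are assumptions picked for illustration.

```python
def is_plausible(temp_c, low=-40.0, high=150.0):
    """Return True if a reading falls inside an assumed sane range."""
    return low <= temp_c <= high


def split_readings(readings, low=-40.0, high=150.0):
    """Split readings into (plausible, implausible) lists, keeping order."""
    good = [t for t in readings if is_plausible(t, low, high)]
    bad = [t for t in readings if not is_plausible(t, low, high)]
    return good, bad


if __name__ == "__main__":
    # Values of the kind reported in this thread.
    good, bad = split_readings([65.22, -216.28, 326.59, 46.94])
    print("plausible:", good)
    print("implausible:", bad)
```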

The negative temp reading was discussed on the CT forums.  I'm not sure if anyone discovered a workaround. 

It might be, in my case, that I had the extreme misfortune of a tiny bit of LP shorting out or otherwise messing up one of the temp sensors on one of the goldstrike chips.  Now it's borked and can't be reset to behave normally.  And as of right now, that spells GAME OVER for the entire board ... along with the other seven chips.

I'm back to hating CT again.  The machines are way too fragile for the $$$ we paid for them.