
Topic: Cairnsmore1 - Quad XC6SLX150 Board - page 70. (Read 286370 times)

sr. member
Activity: 462
Merit: 251
July 09, 2012, 05:05:09 AM
USB Problem

We have found a bug in the FTDI drivers/CGminer used with the Twin bitstream. At the moment we know this is an issue on Windows 7 64-bit, but it may extend to other OS versions or even Linux. Basically we are seeing some COM ports "lost" when the host plugs and plays a new device. That can be any device, e.g. a memory stick. We are looking at this to determine the cause and potential fixes.
sr. member
Activity: 462
Merit: 251
July 09, 2012, 03:05:32 AM

Yohan,

could you maybe provide some information about the changes this controller update contains? When you say it is for the underperforming devices, does this include those that do not generate valid shares at all and/or those failing the golden nonce test?

I'd need to set up a Windows machine (VM is too unstable for that) for programming the controller and I'd at least need to know that it has some potential improvement for the failures I see most.


Thanks, Zefir

OK, we don't know if this will fully fix all problems because as yet we don't have setups here that show the same problems some of you have, and that is why we need to work on each problem individually. We do need each board that you have a problem with to be reported to the bitcoin support email with the full circumstances. Not everyone on the team has the time to wade through the forum, and they don't unless they stop work on new features or support work, so problems can be missed. I will try to patch the gaps but I also have a limit on what I can find time to do. It's much better if the information arrives at the correct place and several people get to see it. It also acts as a log we can go back through later.

What we do know is that this build appears to improve the clocking at 100MHz, i.e. that used for the Twin build. We do have some more tests to check this out and that is why it is beta. What we don't know is whether the clocking is the cause of the golden nonce test failures. I wasn't aware of that issue and that is much more likely to be a setup, software or firmware issue. We think that the FPGA DCMs lose lock sometimes and we already have fixes for this at the working FPGA end. Those will be available in our own bitstream design, but we can't apply them to the Twin, which basically isn't our design.

We also think a few rigs may be suffering from power surge issues, particularly at start up. This is a problem with several contributing parts, including the host power supply quality, wiring quality, and even the Cairnsmore1 itself. This might explain some of the USBs not enumerating, but there are also other possibilities for that, including faulty USB cables, of which we did have a few. Rev 1.2 has power startup sequencing so that we power each of the 4 power sections in a sequence over 4 seconds, and that switch-on sequence will be very obvious when you power the boards. This softens the surge on the board and we will extend this rig wide when the up/down becomes functional. One of the things that is different with Cairnsmore1 is that there are several large rigs using Cairnsmore1, the size of which has not been seen before in Bitcoin mining. These bring new design challenges in things like power and cooling and we have designed Cairnsmore1 to cope with these extra challenges. Cairnsmore2, when we do that, will take the concept to a much bigger level again.

We have also phased the on-board clocks in the new controller build to reduce "beat" cycles on the power supply. Every FPGA doing exactly the same thing at the same time is a very good way to cause beat surges on the PSU, so if we can move those slightly apart then that is a good thing to do and makes life easier for the power supply.




Can you clarify the directions in the loader.pdf for the controller switch positions when programming the controller? I can clearly see the SW1 and SW6 directions; however, I just assumed that SW2, 3, 4, and 5 were all off. Is that correct?

I tried the update on the broken board but same results with it not being able to mine.

Wrong. For the controller I don't think you need to change the DIP switches from their normal settings, but I will check that.

Look in document xc3s50an_loader_v1.1.bit that is in the zip for the 1.1 update for the dip switch settings.

sr. member
Activity: 327
Merit: 250
July 08, 2012, 08:04:47 PM
An update (Rev 1.2) to the controller is available on http://www.enterpoint.co.uk/cairnsmore/cairnsmore1_support_materials.html. I would not update your board to this unless you have a unit that is underperforming with the twin bitstream. As with all controller updates, please be careful that you understand the instructions (Rev 1.1 update) fully before starting and ensure that your power is unlikely to be interrupted. If a controller update goes wrong it is likely that a programming cable will be necessary to perform a unit recovery, but do take note of the first-line recovery method: if the first attempt goes wrong, another go is possible if the unit remains powered.

We will do some more testing and work on the controller this week and there may be further updates.

Whatever revision 1.2+ we reach in the next few days will remain available to support 3rd party bitstream providers long term but is then likely to be frozen when our own original bitstream becomes available. At that point the controller will get a major update to Rev 2.0, signifying the major changes in functionality, and all development will be on this branch.

Can you clarify the directions in the loader.pdf for the controller switch positions when programming the controller? I can clearly see the SW1 and SW6 directions; however, I just assumed that SW2, 3, 4, and 5 were all off. Is that correct?

I tried the update on the broken board but same results with it not being able to mine.
newbie
Activity: 49
Merit: 0
July 08, 2012, 03:19:43 PM
I've flashed the 1.2 controller update and things seem a bit better for me so far. Previously I was seeing the second pga producing only 50% of the accepted shares of the first pga, tested over multiple 12/24/48 hour runs, i.e.:
Code:
 ICA 0:                | 351.2/360.4Mh/s | A:1311 R:1 HW:0 U:1.49/m
 ICA 1:                | 352.0/359.3Mh/s | A: 635 R:0 HW:0 U:0.72/m
so far (after only 10 mins or so) both pga seem to give the same number of accepted shares;
Code:
 ICA 0:                | 321.0/355.5Mh/s | A:20 R:0 HW:0 U: 1.62/m
 ICA 1:                | 316.0/359.7Mh/s | A:22 R:0 HW:0 U: 1.80/m
I'll report more when it's been running longer.

after 13 hours, not a massive difference from before;
Code:
 ICA 0:                | 379.7/372.4Mh/s | A:1476 R:3 HW:0 U: 1.89/m
 ICA 1:                | 379.6/372.8Mh/s | A: 967 R:2 HW:0 U: 1.24/m
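
As a rough cross-check of the figures above, cgminer's U value (accepted shares per minute) can be converted into an effective hash rate. This is a minimal sketch, assuming every accepted share is a difficulty-1 share, which on average takes 2^32 hashes to find; the function name is hypothetical and not part of cgminer.

```python
# Assumption: each accepted share is difficulty 1, i.e. about 2**32 hashes on average.
HASHES_PER_DIFF1_SHARE = 2 ** 32


def effective_mhs(shares_per_minute: float) -> float:
    """Estimate the effective hash rate in MH/s from accepted shares per minute."""
    hashes_per_second = shares_per_minute * HASHES_PER_DIFF1_SHARE / 60
    return hashes_per_second / 1e6


if __name__ == "__main__":
    # U values taken from the 13-hour run above.
    for name, utility in (("ICA 0", 1.89), ("ICA 1", 1.24)):
        print(f"{name}: U={utility}/m -> about {effective_mhs(utility):.0f} MH/s effective")
```

Share finding is random, so estimates from short runs are noisy; comparing the two devices over the same long run is more meaningful than either absolute number.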
donator
Activity: 919
Merit: 1000
July 08, 2012, 03:11:28 PM
An update (Rev 1.2) to the controller is available on http://www.enterpoint.co.uk/cairnsmore/cairnsmore1_support_materials.html. I would not update your board to this unless you have a unit that is underperforming with the twin bitstream. As with all controller updates, please be careful that you understand the instructions (Rev 1.1 update) fully before starting and ensure that your power is unlikely to be interrupted. If a controller update goes wrong it is likely that a programming cable will be necessary to perform a unit recovery, but do take note of the first-line recovery method: if the first attempt goes wrong, another go is possible if the unit remains powered.

We will do some more testing and work on the controller this week and there may be further updates.

Whatever revision 1.2+ we reach in the next few days will remain available to support 3rd party bitstream providers long term but is then likely to be frozen when our own original bitstream becomes available. At that point the controller will get a major update to Rev 2.0, signifying the major changes in functionality, and all development will be on this branch.

Yohan,

could you maybe provide some information about the changes this controller update contains? When you say it is for the underperforming devices, does this include those that do not generate valid shares at all and/or those failing the golden nonce test?

I'd need to set up a Windows machine (VM is too unstable for that) for programming the controller and I'd at least need to know that it has some potential improvement for the failures I see most.


Thanks, Zefir
hero member
Activity: 686
Merit: 500
July 08, 2012, 03:08:55 PM
Your VM shows me only 3 units, not 4, when I run: xc3sprog -c cm1 -j
sr. member
Activity: 462
Merit: 251
July 08, 2012, 02:57:16 PM
An update (Rev 1.2) to the controller is available on http://www.enterpoint.co.uk/cairnsmore/cairnsmore1_support_materials.html. I would not update your board to this unless you have a unit that is underperforming with the twin bitstream. As with all controller updates, please be careful that you understand the instructions (Rev 1.1 update) fully before starting and ensure that your power is unlikely to be interrupted. If a controller update goes wrong it is likely that a programming cable will be necessary to perform a unit recovery, but do take note of the first-line recovery method: if the first attempt goes wrong, another go is possible if the unit remains powered.

We will do some more testing and work on the controller this week and there may be further updates.

Whatever revision 1.2+ we reach in the next few days will remain available to support 3rd party bitstream providers long term but is then likely to be frozen when our own original bitstream becomes available. At that point the controller will get a major update to Rev 2.0, signifying the major changes in functionality, and all development will be on this branch.
legendary
Activity: 1378
Merit: 1003
nec sine labore
July 08, 2012, 08:56:49 AM
Heya,

Just got my board up and running (using the twin_test.bit), on MPBM...

All seems to be working well... on com22 and com23 I have it running at 378/377 MH/s respectively. The total avg is running at 755 MH/s (running for two and a half hours now).

My pool is only registering about 425 MH/s... no errors are showing up in the log however... no rejects.. and only 1.16% canceled.

Is this in line with best current performance?

Thanks in advance.



Yes, each FPGA mines at 190MH/s when using twin_test.bit and you're using just two of the four FPGAs on your board.

Mpbm and cgminer report twice the real speed when used with default parameters.

You can use the --icarus-timing parameter on cgminer and/or ebereon's cairnsmore worker for Mpbm to get a more precisely reported speed; see messages in this thread starting from a couple of weeks ago.
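
The correction described above is simple arithmetic and can be sketched as follows. This assumes, as stated in this post, that the miner over-reports by a factor of two with default parameters; the function and constant names are hypothetical.

```python
# Assumption taken from this post: with default parameters the miner
# reports twice the real speed when running the twin_test bitstream.
OVERREPORT_FACTOR = 2.0


def real_mhs(reported_mhs, factor=OVERREPORT_FACTOR):
    """Sum the per-port reported speeds (MH/s) and undo the over-reporting."""
    return sum(reported_mhs) / factor


# The two COM ports from the quoted post: 378 and 377 MH/s reported.
reported = [378.0, 377.0]
print(real_mhs(reported))  # 377.5
```

The result is close to the 2 x 190 = 380 MH/s expected from two FPGAs, so the board in the quoted post is behaving normally.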

spiccioli.
newbie
Activity: 18
Merit: 0
July 08, 2012, 07:05:54 AM
Heya,

Just got my board up and running (using the twin_test.bit), on MPBM...

All seems to be working well... on com22 and com23 I have it running at 378/377 MH/s respectively. The total avg is running at 755 MH/s (running for two and a half hours now).

My pool is only registering about 425 MH/s... no errors are showing up in the log however... no rejects.. and only 1.16% canceled.

Is this in line with best current performance?

Thanks in advance.

hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 07, 2012, 10:56:41 AM
While trying everything I can think of to resolve the slow core issue I've been having, it has become almost obvious that I'm dealing with a defective unit. On board 62-0158 the second core works over 10x slower than the other one, not in reported speed, but in shares and rejects.

Things I have done to reach this conclusion are:
Reflashing the bitstream multiple times
Powering off the pc and/or unit
Running the board in a usb2 port, usb3 port and in both ports with an unpowered usb hub.
Setting up a different system for testing (win 7 x64 and windows 95 were used)
Running the boards in separate cgminer instances.

It would appear the problematic core gets more shares when more boards are plugged in, so this might suggest that the cgminer build is reporting shares to incorrect cores at times. If that is the case, then I suppose a single core can function alone, like someone asked a few pages back.
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 07, 2012, 06:18:42 AM
One thing we have found is that a small percentage of the USB cables we supply have been faulty, which may explain a few of the comms failures. We are changing our testing to include the cable that ships with the unit rather than assuming the externally supplied cable is OK. Some of you may have faulty USB cables and if possible you should try problem units with a different cable.

On USB hubs we just got a few USB hubs for our test setup and also so that we can start to put together a recommended list. Some of the cheaper ones do have an inadequate power supply e.g. 1A for 10 ports. However one that we do have http://www.ebuyer.com/123772-d-link-dub-h7-usb2-0-7-port-hub-dub-h7-b looks very good on paper and I will update you on real use of this particular hub when we do some testing.
Thank you for this bit, I will go through any available cables I have and see if this could be the case for me and my slow-core issue.
[edit]
First results are inconsistent, I dubbed the cables 1,2&3.
1,2&3 plugged has a slow core.
1,2 plugged has a slow core.
1,3 plugged works fine.
2,3 plugged has a slow core.
Adding a random similar USB cable to the mix (4)
1,3,4 has a slow core.
sr. member
Activity: 462
Merit: 251
July 07, 2012, 05:30:06 AM
One thing we have found is that a small percentage of the USB cables we supply have been faulty, which may explain a few of the comms failures. We are changing our testing to include the cable that ships with the unit rather than assuming the externally supplied cable is OK. Some of you may have faulty USB cables and if possible you should try problem units with a different cable.

On USB hubs we just got a few USB hubs for our test setup and also so that we can start to put together a recommended list. Some of the cheaper ones do have an inadequate power supply e.g. 1A for 10 ports. However one that we do have http://www.ebuyer.com/123772-d-link-dub-h7-usb2-0-7-port-hub-dub-h7-b looks very good on paper and I will update you on real use of this particular hub when we do some testing.
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 06, 2012, 12:44:20 PM
After a permanent flash of the twin_test bitstream and plugging all 3 of my boards into separate USB 3 ports, my "last core on cgminer is slow" problem still persists.
sr. member
Activity: 327
Merit: 250
July 06, 2012, 12:14:23 PM
I can not believe that the differences are so great with 50 boards. Enterpoint stated that the problem is the controller firmware but why these differences?
Most probably it is a controller issue, or it could even be on SW side and solvable with cgminer updates.

Since boards were tested in shipping-test mode at Enterpoint before delivery, I should re-program my batch and repeat the test. But since this will eat up one additional weekend, I prefer to wait for a better bitstream (hoping for the next week).


Crap I hope my 23 boards work when they arrive. On the other hand I'm sure enterpoint will take back the non working boards and send you new ones.
I guess only those 3 boards that fail to get detected might need to be RMAd, while all the other should be fixable with new FW. No doubt defunct units will get replaced, like already happened in this thread.


Can you send as much information as possible to the bitcoin support email so that everyone relevant gets to see the problem? Outside what's already been mentioned on the forum, and thanks to all that have already forwarded into our support, we have not had any reports like this come in. So as yet we don't have a big statistical base to work on. We also haven't had a faulty unit arrive back to us yet to analyse. One I think is on its way. So bear with us for a few days whilst we get enough information to give us a clue where to look. Probably the first thing to do is to try and create a model setup as similar to Zefir's as we can. We do think there is more than one aspect here, and elements of software and firmware are probably the main place to look. However we can't rule out a hardware failure, so that also needs to be part of the analysis. From what we have seen here on the line, USB failures are very low and all down to real, identifiable things like solder shorts.





I'm sending mine back as soon as I can, I just got the email today with the information to get that going. I'd love to know what was wrong with it as well, since it will find 1-2 shares then stop hashing altogether. I've tried everything and reprogrammed it a good 50 times with no luck.

The new one you sent me has been working at 370 MH/s for the last 3 nights without any problems. That's running on the latest cgminer in Linux.

Doff
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 06, 2012, 06:49:56 AM
The problem with me having one core working much slower than the other ones is back.
I am running 3 boards with an unpowered USB hub on a USB3 port (the boards do not even detect correctly in regular USB ports). I am running the twin_test bitstream, with the correct DIP switch positions used while flashing and operating, as indicated in your pdf. The miner I am using is your cgminer_twintest.exe, which I am starting with the command cgminer_twintest -o mint.bitminter.com:8332 -u USR -p PASS --disable-gpu -S noauto -S \\.\COM22 -S \\.\COM23 -S \\.\COM26 -S \\.\COM27 -S \\.\COM30 -S \\.\COM31
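
With several boards the list of -S options gets long and easy to mistype (for example a missing space after -S). A minimal sketch of building the same command line from a list of COM port numbers follows; it uses only the options that appear in the command above, and the helper names are hypothetical.

```python
def com_device(port_number: int) -> str:
    r"""Return the Windows device path for a COM port, e.g. \\.\COM22."""
    return rf"\\.\COM{port_number}"


def build_cgminer_args(exe, pool, user, password, com_ports):
    """Build the argument list: pool settings, GPU disabled, one -S per port."""
    args = [exe, "-o", pool, "-u", user, "-p", password,
            "--disable-gpu", "-S", "noauto"]
    for number in com_ports:
        args += ["-S", com_device(number)]
    return args


# Values copied from the command in this post; USR and PASS are placeholders.
args = build_cgminer_args("cgminer_twintest", "mint.bitminter.com:8332",
                          "USR", "PASS", [22, 23, 26, 27, 30, 31])
print(" ".join(args))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with the backslashes in the device paths.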

Any other relevant information that would help identifying this, just ask.
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 06, 2012, 06:22:35 AM
Code:
xc3sprog -c cm1 -p 0 -Ixc6lx150.bit twin_test.bit
does not work, but this time it's complaining about a filename issue to me. My initial hunch is that since the board I'm punishing is #108, the firmware filename on it has changed along the way. I'll read up on this again, oh how I wish the search function on this forum worked better :) ... but for now I'm going with the temporary flash.

Strange...
When you type a "ls -l" do you see the files in the directory? And make sure you use the correct filename, the filename in UPPERCASE is XC6LX150.bit, I did a typo the first times, so I used xc61 (number one here! but it is a small "L")...

Here is my directory (ls -l) after copy the files from usb stick:


If you have the same, the commandline should work.

good luck!
[Edit]
Received clarification on this, thank you Slipbye ... flashing in progress now

I decided to get back to permanently flashing my boards. My problem appears to be that the file xc6lx150.bit is not there. Am I correct in assuming that copying it in there with the USB stick will result in success?
...and does this mean that the virtual machine requires the bitstream that will be removed as a "reference point" so it can permanently flash in a new one?
donator
Activity: 543
Merit: 500
July 06, 2012, 06:00:50 AM
On a totally different point how are you guys finding the cooling system performance? We are interested in feedback on that as well.
That's hard to tell unless the board is operating at full (or at least high) speed, isn't it?
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 06, 2012, 05:31:07 AM
I am now testing the bitminter beta java client with all 3 of my boards, with the twin_test bitstream. I made a separate worker for it. Hopefully I'll have something nice to report later on.


Tons of errors, and the test ended with one board disappearing and requiring a reflash. I dumped the log to drharibo, but I doubt it will be useful to him before Enterpoint rolls out a reliable bitstream.
hero member
Activity: 910
Merit: 1000
Items flashing here available at btctrinkets.com
July 06, 2012, 05:15:06 AM
I am now testing the bitminter beta java client with all 3 of my boards, with the twin_test bitstream. I made a separate worker for it. Hopefully I'll have something nice to report later on.

sr. member
Activity: 462
Merit: 251
July 06, 2012, 05:15:00 AM
On a totally different point how are you guys finding the cooling system performance? We are interested in feedback on that as well.