Ah ha!
So, after running for about 36 hours, I was going to tweak the settings on one of the miners, but Ctrl-C didn't do anything, and I noticed that all the Mhash/sec figures were frozen. I rebooted and tried to start the miner again. The machine froze up for a bit, and then syslog had this to say:
Oct 26 15:34:03 kernel: [ 690.520008] [fglrx] ASIC hang happened
Oct 26 15:34:03 kernel: [ 690.520020] Pid: 9047, comm: clinfo Tainted: P 2.6.38-12-generic #51-Ubuntu
Oct 26 15:34:03 kernel: [ 690.520026] Call Trace:
Oct 26 15:34:03 kernel: [ 690.520117] [] ? KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520196] [] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520325] [] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520448] [] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x6c/0xb0 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520572] [] ? _ZN19mmEngineR600_DRMDMA4idleEv+0x72/0xc0 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520693] [] ? _ZN14CMMHeapManager22freeAllExpiredTSMemoryEj+0x64/0xe0 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520816] [] ? _ZN18mmEnginesContainer4idleEv+0x46/0x60 [fglrx]
Oct 26 15:34:03 kernel: [ 690.520935] [] ? _ZN15QS_PRIVATE_CORE7idleAllE15idle_WaitMethod+0x2d/0x40 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521050] [] ? _ZN3MSF19doGarbageCollectionEv+0x35/0x260 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521061] [] ? down+0x2e/0x50
Oct 26 15:34:03 kernel: [ 690.521127] [] ? KCL_SPINLOCK_STATIC_Release+0x16/0x20 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521213] [] ? firegl_cmmqs_ProcessTerminate+0x32/0xc0 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521287] [] ? firegl_release_helper+0x3a8/0x6c0 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521362] [] ? firegl_release+0x60/0x1c0 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521426] [] ? ip_firegl_release+0x11/0x20 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521436] [] ? __fput+0xbe/0x200
Oct 26 15:34:03 kernel: [ 690.521444] [] ? fput+0x25/0x30
Oct 26 15:34:03 kernel: [ 690.521451] [] ? filp_close+0x60/0x90
Oct 26 15:34:03 kernel: [ 690.521461] [] ? put_files_struct+0x88/0xf0
Oct 26 15:34:03 kernel: [ 690.521469] [] ? exit_files+0x54/0x70
Oct 26 15:34:03 kernel: [ 690.521477] [] ? do_exit+0x175/0x410
Oct 26 15:34:03 kernel: [ 690.521549] [] ? drm_free+0xf3/0x180 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521558] [] ? do_group_exit+0x58/0xd0
Oct 26 15:34:03 kernel: [ 690.521566] [] ? get_signal_to_deliver+0x247/0x410
Oct 26 15:34:03 kernel: [ 690.521650] [] ? firegl_cmmqs_CWDDE32+0x0/0x100 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521658] [] ? do_signal+0x56/0x180
Oct 26 15:34:03 kernel: [ 690.521723] [] ? ip_firegl_unlocked_ioctl+0xe/0x20 [fglrx]
Oct 26 15:34:03 kernel: [ 690.521733] [] ? do_vfs_ioctl+0x8f/0x360
Oct 26 15:34:03 kernel: [ 690.521740] [] ? do_notify_resume+0x65/0x80
Oct 26 15:34:03 kernel: [ 690.521748] [] ? sys_ioctl+0x91/0xa0
Oct 26 15:34:03 kernel: [ 690.521754] [] ? int_signal+0x12/0x17
Oct 26 15:34:03 kernel: [ 690.521765] pubdev:0xffffffffa09b03c0, num of device:3 , name:fglrx, major 8, minor 86.
Oct 26 15:34:03 kernel: [ 690.521772] device 0 : 0xffff880144430000 .
Oct 26 15:34:03 kernel: [ 690.521778] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011140000.
Oct 26 15:34:03 kernel: [ 690.521784] FB phys addr: 0xc0000000, MC :0xf00000000, Total FB size :0x40000000.
Oct 26 15:34:03 kernel: [ 690.521791] gart table MC:0xf0f91f000, Physical:0xcf91f000, size:0x3e0000.
Oct 26 15:34:03 kernel: [ 690.521798] mc_node :FB, total 1 zones
Oct 26 15:34:03 kernel: [ 690.521803] MC start:0xf00000000, Physical:0xc0000000, size:0xfd00000.
Oct 26 15:34:03 kernel: [ 690.521811] Mapped heap -- Offset:0x0, size:0xf91f000, reference count:16, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521818] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521825] Mapped heap -- Offset:0xf91f000, size:0x3e1000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521832] mc_node :INV_FB, total 1 zones
Oct 26 15:34:03 kernel: [ 690.521837] MC start:0xf0fd00000, Physical:0xcfd00000, size:0x30300000.
Oct 26 15:34:03 kernel: [ 690.521844] Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521851] mc_node :GART_USWC, total 2 zones
Oct 26 15:34:03 kernel: [ 690.521856] MC start:0x3e750000, Physical:0x0, size:0x4d800000.
Oct 26 15:34:03 kernel: [ 690.521863] Mapped heap -- Offset:0x30000, size:0x2000000, reference count:14, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521869] mc_node :GART_CACHEABLE, total 3 zones
Oct 26 15:34:03 kernel: [ 690.521875] MC start:0x10400000, Physical:0x0, size:0x2e350000.
Oct 26 15:34:03 kernel: [ 690.521881] Mapped heap -- Offset:0x2600000, size:0x100000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521889] Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521897] Mapped heap -- Offset:0xb00000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521904] Mapped heap -- Offset:0x200000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521911] Mapped heap -- Offset:0x0, size:0x200000, reference count:7, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521919] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.521928] GRBM : 0x3828, SRBM : 0x200000c0 .
Oct 26 15:34:03 kernel: [ 690.521937] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x19dc0 , CP_RB_WPTR :0x19dc0.
Oct 26 15:34:03 kernel: [ 690.521946] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x3eaa8000.
Oct 26 15:34:03 kernel: [ 690.521953] last submit IB buffer -- MC :0x3eaa8000,phys:0x131ebc000.
Oct 26 15:34:03 kernel: [ 690.521961] device 1 : 0xffff880145b14000 .
Oct 26 15:34:03 kernel: [ 690.521967] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011180000.
Oct 26 15:34:03 kernel: [ 690.521973] FB phys addr: 0xb0000000, MC :0xf00000000, Total FB size :0x40000000.
Oct 26 15:34:03 kernel: [ 690.521979] gart table MC:0xf0f91f000, Physical:0xbf91f000, size:0x3e0000.
Oct 26 15:34:03 kernel: [ 690.521985] mc_node :FB, total 1 zones
Oct 26 15:34:03 kernel: [ 690.521990] MC start:0xf00000000, Physical:0xb0000000, size:0xfd00000.
Oct 26 15:34:03 kernel: [ 690.521997] Mapped heap -- Offset:0x0, size:0xf91f000, reference count:10, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522004] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522010] Mapped heap -- Offset:0xf91f000, size:0x3e1000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522017] mc_node :INV_FB, total 1 zones
Oct 26 15:34:03 kernel: [ 690.522022] MC start:0xf0fd00000, Physical:0xbfd00000, size:0x30300000.
Oct 26 15:34:03 kernel: [ 690.522028] Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522034] mc_node :GART_USWC, total 2 zones
Oct 26 15:34:03 kernel: [ 690.522039] MC start:0x3e750000, Physical:0x0, size:0x4d800000.
Oct 26 15:34:03 kernel: [ 690.522045] Mapped heap -- Offset:0x30000, size:0x2000000, reference count:10, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522052] mc_node :GART_CACHEABLE, total 3 zones
Oct 26 15:34:03 kernel: [ 690.522057] MC start:0x10400000, Physical:0x0, size:0x2e350000.
Oct 26 15:34:03 kernel: [ 690.522063] Mapped heap -- Offset:0x1d00000, size:0x900000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522070] Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522077] Mapped heap -- Offset:0xb00000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522084] Mapped heap -- Offset:0x200000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522091] Mapped heap -- Offset:0x0, size:0x200000, reference count:4, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522098] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522106] GRBM : 0x3828, SRBM : 0x20000ac0 .
Oct 26 15:34:03 kernel: [ 690.522113] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x5b0 , CP_RB_WPTR :0x5b0.
Oct 26 15:34:03 kernel: [ 690.522121] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x3e8df000
Oct 26 15:34:03 kernel: [ 690.522127] last submit IB buffer -- MC :0x3e8df000,phys:0x12ffd9000.
Oct 26 15:34:03 kernel: [ 690.522135] device 2 : 0xffff880145b08000 .
Oct 26 15:34:03 kernel: [ 690.522140] Asic ID:0x9440, revision:0x2, MMIOReg:0xffffc900111c0000.
Oct 26 15:34:03 kernel: [ 690.522146] FB phys addr: 0xd0000000, MC :0xf00000000, Total FB size :0x40000000.
Oct 26 15:34:03 kernel: [ 690.522152] gart table MC:0xf0fc1f000, Physical:0xdfc1f000, size:0x3e0000.
Oct 26 15:34:03 kernel: [ 690.522158] mc_node :FB, total 1 zones
Oct 26 15:34:03 kernel: [ 690.522162] MC start:0xf00000000, Physical:0xd0000000, size:0x10000000.
Oct 26 15:34:03 kernel: [ 690.522169] Mapped heap -- Offset:0x0, size:0xfc1f000, reference count:11, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522176] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522183] Mapped heap -- Offset:0xfc1f000, size:0x3e1000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522189] mc_node :INV_FB, total 1 zones
Oct 26 15:34:03 kernel: [ 690.522194] MC start:0xf10000000, Physical:0xe0000000, size:0x30000000.
Oct 26 15:34:03 kernel: [ 690.522201] Mapped heap -- Offset:0x2fffd000, size:0x3000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522207] mc_node :GART_USWC, total 2 zones
Oct 26 15:34:03 kernel: [ 690.522211] MC start:0x3e750000, Physical:0x0, size:0x4d800000.
Oct 26 15:34:03 kernel: [ 690.522218] Mapped heap -- Offset:0x30000, size:0x2000000, reference count:6, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522224] mc_node :GART_CACHEABLE, total 3 zones
Oct 26 15:34:03 kernel: [ 690.522229] MC start:0x10400000, Physical:0x0, size:0x2e350000.
Oct 26 15:34:03 kernel: [ 690.522235] Mapped heap -- Offset:0x1d00000, size:0x900000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522242] Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522249] Mapped heap -- Offset:0xb00000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522256] Mapped heap -- Offset:0x200000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522263] Mapped heap -- Offset:0x0, size:0x200000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522270] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [ 690.522278] GRBM : 0x3028, SRBM : 0x200000c0 .
Oct 26 15:34:03 kernel: [ 690.522284] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x330 , CP_RB_WPTR :0x330.
Oct 26 15:34:03 kernel: [ 690.522291] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x3e8bc000.
Oct 26 15:34:03 kernel: [ 690.522297] last submit IB buffer -- MC :0x3e8bc000,phys:0x12daa5000.
Oct 26 15:34:03 kernel: [ 690.522303] Dump the trace queue.
Oct 26 15:34:03 kernel: [ 690.522307] End of dump
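In case it helps anyone compare the three devices in a dump like this without reading every line, here is a rough Python sketch. It only assumes the line layout shown in the log above; the function name and the regexes are my own, not anything from fglrx, and the sample lines are copied from the dump. It counts the "ASIC hang happened" messages and collects each device's Asic ID and CP ring pointers.

```python
import re

# Patterns are written against the dump format pasted above (an assumption:
# other fglrx versions may format these lines differently).
HANG_RE = re.compile(r"\[fglrx\] ASIC hang happened")
DEVICE_RE = re.compile(r"device (\d+) : 0x[0-9a-f]+")
ASIC_RE = re.compile(r"Asic ID:(0x[0-9a-f]+)")
RING_RE = re.compile(r"CP_RB_RPTR : (0x[0-9a-f]+)\s*, CP_RB_WPTR :(0x[0-9a-f]+)")


def summarize_fglrx_dump(lines):
    """Return (hang_count, devices) for an iterable of syslog lines."""
    hangs = 0
    devices = []
    current = None  # the device section currently being read
    for line in lines:
        if HANG_RE.search(line):
            hangs += 1
        match = DEVICE_RE.search(line)
        if match:
            # A "device N : 0x..." line starts a new per-device section.
            current = {"index": int(match.group(1)), "asic_id": None,
                       "rptr": None, "wptr": None}
            devices.append(current)
            continue
        if current is None:
            continue
        match = ASIC_RE.search(line)
        if match:
            current["asic_id"] = match.group(1)
        match = RING_RE.search(line)
        if match:
            current["rptr"], current["wptr"] = match.group(1), match.group(2)
    return hangs, devices


# A few lines copied from the dump above, used as sample input.
SAMPLE = [
    "Oct 26 15:34:03 kernel: [ 690.520008] [fglrx] ASIC hang happened",
    "Oct 26 15:34:03 kernel: [ 690.521765] pubdev:0xffffffffa09b03c0, num of device:3 , name:fglrx, major 8, minor 86.",
    "Oct 26 15:34:03 kernel: [ 690.521772] device 0 : 0xffff880144430000 .",
    "Oct 26 15:34:03 kernel: [ 690.521778] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011140000.",
    "Oct 26 15:34:03 kernel: [ 690.521937] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x19dc0 , CP_RB_WPTR :0x19dc0.",
    "Oct 26 15:34:03 kernel: [ 690.522135] device 2 : 0xffff880145b08000 .",
    "Oct 26 15:34:03 kernel: [ 690.522140] Asic ID:0x9440, revision:0x2, MMIOReg:0xffffc900111c0000.",
    "Oct 26 15:34:03 kernel: [ 690.522284] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x330 , CP_RB_WPTR :0x330.",
]

hangs, devices = summarize_fglrx_dump(SAMPLE)
print("hang messages:", hangs)
for dev in devices:
    print(dev["index"], dev["asic_id"], dev["rptr"], dev["wptr"])
```

To run it on a real log, pass an open file instead of SAMPLE, for example the lines of /var/log/syslog on Ubuntu.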
I've got a Gigabyte board with 4 GB of RAM, one 5970 and one 4830. The motherboard, RAM, and CPU are all new. I am also using a new SSD for this box.
I think I'll pull the 4830 and see if the problem persists.
I "fixed" it again by rebooting, running aticonfig -f --initial --adapter=all, and rebooting once more. It plugged away for a bit longer before freezing again.
One other thought: I'm running this on Ubuntu 11.04 with the latest updates, including the most recent kernel.
Thoughts?
*Edit*
I should have listed my clock settings, which I pulled from the mining hardware page. I have both the 5970 and the 4830 set to 850/300 for core/memory. In addition to pulling the 4830, I'll try mining at stock clocks to see if I can reproduce the hang.