Antminer T17/S17 Temp Sensor problem discussion.

btcshop4u

brand new

Activity: 0

Merit: 0

We are offering a special discount on purchases paid in bitcoin.
See the link below:

BTC https://btcshop4u.com/product-category/mining-machine/

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Quote from: Pendrak on May 04, 2020, 10:23:16 AM

EDIT: I just found info online that a lot of people have problems with heatsinks falling off in these units, so I guess it was not the underclock autotuning.

I can confirm that. In fact, you didn't have to look that far: reading this section alone, you will notice that most of the reported problems are caused by a loose heatsink/chip. Setting aside the obvious PSU/network issues, that is almost always the case.

Quote

The PSU is good. I need to go to my office (where my soldering equipment is) to put the heatsinks back on the bad board and see if it works. I will salvage the sensors from the fried board to fix the other one...

Before you do that, I suggest you apply some pressure to all the other heatsinks (from one side), as if you were trying to remove them. That way you will take off every heatsink that is about to fall anyway, and it's better to fix them all at once than to come back to the same issue a week or two later.

Pendrak

member

Activity: 208

Merit: 46

I tried the recovery firmware and nothing changed.

I removed one board that was not working and left 2 boards. One was working fine, but the other had problems; I had to unplug the miner 3 times before it worked.

I installed the Vnish firmware to see what would happen, and it worked. Sometimes it reported 1 sensor, sometimes 3 sensors, and other times all 4; it was random, but it still worked until I tried to underclock (autotuning). During that process, 2 heatsinks on the board with the temp sensor problems moved, and one fell onto the good board and caused a short. The good board was destroyed, burned, kaboom... shit happens.

The PSU is good. I need to go to my office (where my soldering equipment is) to put the heatsinks back on the bad board and see if it works. I will salvage the sensors from the fried board to fix the other one...

Right now I have the miner running with the board that has only 2 chips. Vnish has no problem detecting the board and it is working fine. Sometimes 1 sensor disappears; I just reset and voilà, the sensor is back.

There is always one sensor that reports a higher temperature than the others, and the software always takes its reading from whichever sensor is working. Let's say this is the temperature report:

60-79-55-62

If the board boots up with only the 55 °C sensor working, the software takes that temperature as its reference. Say you configure the max temperature to 77 °C: by the time the sensor that showed 55 °C reaches 77 °C, the sensor that showed 79 °C will be at 90 °C or higher. I guess this is what happened with my unit.
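To put numbers on this scenario, here is a minimal sketch. It assumes the firmware regulates on a single reference sensor (as described in the post, not confirmed from any firmware source) and that the offsets between sensors stay constant as the board heats up, which is a simplification. With the 60-79-55-62 report and a 77 °C limit, the hottest sensor would end up around 101 °C, in line with the "90 °C or higher" estimate.

```python
def projected_temps(readings, reference_index, limit):
    """Shift all readings so the reference sensor sits exactly at `limit`.

    Simplifying assumption: the offset between any two sensors stays constant
    as the board heats up. Real boards will drift from this.
    """
    shift = limit - readings[reference_index]
    return [t + shift for t in readings]


readings = [60, 79, 55, 62]  # °C, the example report from the post
# Only the 55 °C sensor (index 2) is working, and the limit is 77 °C.
print(projected_temps(readings, reference_index=2, limit=77))
# -> [82, 101, 77, 84]
```

Regulating on the hottest working sensor instead would hold that sensor at the limit, but that only helps if the hottest spot on the board actually has a working sensor.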

By the way, the fans on Bitmain's latest units are shit: they're Nidec brand, they're loud, and they move a lot less air (1.6 A). I had to replace them with Delta ones.

Sorry for my English.

EDIT: I just found info online that a lot of people have problems with heatsinks falling off in these units, so I guess it was not the underclock autotuning.

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Quote from: jacktman on May 03, 2020, 01:10:06 PM

[...]

I am happier to know you fixed your miner than to know I was right.

All credit goes to zeusbtc for taking the time to write the article. The owner of that company is very helpful and friendly and never hesitates to help, so I really shouldn't take any credit for his work.

Quote from: PopoJeff on May 03, 2020, 01:35:14 PM

I reflashed the firmware on the new problem unit, and it has been running all 3 boards properly for 2 hours 28 minutes and counting. Fingers crossed 🤞

Good to know you got yours fixed too; I hope it stays that way.

As far as the temp sensor issue is concerned, here are my recommendations to fix it:

1- Flash the recovery firmware using an SD card, then flash the latest firmware.

2- If 1 fails, your next suspect is the PSU: check the screws and tighten them well, clean out any dust, and try again.

3- If 2 fails, try a different PSU, because the PSU may be bad and cleaning it won't do any good.

4- If all of the above fails, it's safe to assume that one of the chips on the hashboard is bad; it's usually the first chip (refer to this image to identify it). Sometimes the heatsink is just a bit loose, and pressing it down (applying some pressure with your fingers) can fix the problem. If not, you will need to remove it and glue it back.
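The asic_status_monitor_thread errors in the logs posted in this thread (for example "chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx ...") may help narrow down where on the chain the trouble starts. The sketch below assumes, without confirmation from Bitmain documentation, that each o/x character is one ASIC in chain order, with o meaning the chip answered and x meaning it did not. The function name and returned fields are made up for illustration.

```python
import re

# Written against the exact ERROR line format in the S17+ logs; other builds may differ.
ERROR_LINE = re.compile(
    r"chain (\d+) get hashrate_reg_counter (\d+), require (\d+), "
    r"failed times \d+: ([ox ]+)"
)


def analyze_chain_error(line):
    """Parse one asic_status_monitor_thread ERROR line, or return None if it doesn't match."""
    m = ERROR_LINE.search(line)
    if m is None:
        return None
    chain, reported, required, status = m.groups()
    chips = status.replace(" ", "")  # assumed: one character per ASIC, in chain order
    first_x = chips.find("x")        # -1 means every chip answered
    return {
        "chain": int(chain),
        "reported": int(reported),
        "required": int(required),
        "chips_in_status": len(chips),
        "responding": chips.count("o"),
        "first_silent_chip": first_x if first_x >= 0 else None,  # 0-based index
    }


line = ("ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: "
        "ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx")
print(analyze_chain_error(line))
```

For that line this reports 7 responding chips out of 65 and a first silent chip at 0-based index 7 (the 8th ASIC). Under the assumption above, that would be the first area to check for a loose heatsink.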

PopoJeff

full member

Activity: 414

Merit: 182

S17+ 70

3:29 elapsed now

3:40 elapsed... dropped the board

Code:

2020-05-03 15:52:17 driver-btm-api.c:2577:bitmain_soc_init: Init done!
2020-05-03 15:52:17 driver-btm-api.c:216:set_miner_status: STATUS_INIT
2020-05-03 15:52:23 driver-btm-api.c:216:set_miner_status: STATUS_OKAY
2020-05-03 15:52:24 frequency.c:205:get_ideal_hash_rate_GH: ideal_hash_rate = 70780
2020-05-03 15:52:24 frequency.c:223:get_sale_hash_rate_GH: sale_hash_rate = 70000
2020-05-03 15:52:27 driver-btm-api.c:1458:dhash_chip_send_job: Version num 4.
2020-05-03 15:52:27 driver-btm-api.c:1606:dhash_chip_send_job: stime.tv_sec 1588521147, block_ntime 1588521135
2020-05-03 16:22:29 thread.c:257:calc_hashrate_avg: avg rate is 72271.06 in 30 mins
2020-05-03 16:22:29 temperature.c:516:temp_statistics_show:   pcb temp 42~66  chip temp 61~79
2020-05-03 16:52:31 thread.c:257:calc_hashrate_avg: avg rate is 72094.89 in 30 mins
2020-05-03 16:52:31 temperature.c:516:temp_statistics_show:   pcb temp 43~67  chip temp 62~80
2020-05-03 17:22:33 thread.c:257:calc_hashrate_avg: avg rate is 72515.29 in 30 mins
2020-05-03 17:22:33 temperature.c:516:temp_statistics_show:   pcb temp 42~66  chip temp 61~80
2020-05-03 17:52:35 thread.c:257:calc_hashrate_avg: avg rate is 72144.32 in 30 mins
2020-05-03 17:52:35 temperature.c:516:temp_statistics_show:   pcb temp 42~66  chip temp 60~80
2020-05-03 18:22:37 thread.c:257:calc_hashrate_avg: avg rate is 71985.06 in 30 mins
2020-05-03 18:22:37 temperature.c:516:temp_statistics_show:   pcb temp 44~68  chip temp 61~80
2020-05-03 18:52:39 thread.c:257:calc_hashrate_avg: avg rate is 72061.94 in 30 mins
2020-05-03 18:52:39 temperature.c:516:temp_statistics_show:   pcb temp 43~66  chip temp 61~78
2020-05-03 19:22:41 thread.c:257:calc_hashrate_avg: avg rate is 72222.42 in 30 mins
2020-05-03 19:22:41 temperature.c:516:temp_statistics_show:   pcb temp 44~67  chip temp 61~79
2020-05-03 19:29:10 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 19:29:12 thread.c:996:asic_status_monitor_thread: chain 0 can't get enough hashrate reg val for 0 times.
2020-05-03 19:29:12 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
2020-05-03 19:29:12 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 19:29:12 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 1
2020-05-03 19:29:13 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
2020-05-03 19:29:13 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 1
2020-05-03 19:29:13 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
2020-05-03 19:29:14 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 1
2020-05-03 19:29:14 thread.c:996:asic_status_monitor_thread: chain 0 can't get enough hashrate reg val for 1 times.
2020-05-03 19:29:14 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0
2020-05-03 19:29:14 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 19:29:14 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 1
2020-05-03 19:29:15 temperature.c:865:get_temp_info: ERROR: chain 0 can get NONE temp info or temp value abnormal, power it off 
2020-05-03 19:29:16 thread.c:996:asic_status_monitor_thread: chain 0 can't get enough hashrate reg val for 2 times.
2020-05-03 19:29:16 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 19:29:17 frequency.c:205:get_ideal_hash_rate_GH: ideal_hash_rate = 46420
2020-05-03 19:29:17 frequency.c:223:get_sale_hash_rate_GH: sale_hash_rate = 46000
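
Before the chain is powered off, the firmware logs which sensor and chip it failed to read (the temperature.c:838 lines above). A small sketch that collects those lines into a (chain, sensor) to chip map; the regex is written against the exact line format in this log and is an assumption for any other firmware build.

```python
import re

SENSOR_FAIL = re.compile(
    r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+), reg = \d+"
)


def failed_sensor_chips(log_text):
    """Map (chain, sensor) -> chip number for every failed temp-sensor read in the log."""
    failures = {}
    for m in SENSOR_FAIL.finditer(log_text):
        chain, sensor, chip = (int(g) for g in m.groups())
        failures[(chain, sensor)] = chip  # reg 0 and reg 1 lines point at the same chip
    return failures


excerpt = """\
2020-05-03 19:29:12 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
2020-05-03 19:29:13 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
2020-05-03 19:29:13 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
2020-05-03 19:29:14 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0
"""
print(failed_sensor_chips(excerpt))
# -> {(0, 0): 14, (0, 1): 10, (0, 2): 54, (0, 3): 50}
```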

Edit: Rebooted once after the error, and we're up and running.
So, to summarize: after a dozen or so reboots and a factory reset, it still randomly threw the error and dropped the board. I reflashed the firmware and it ran fine for almost 4 hours until it threw the error. I did one reboot after that error, and it has been running really nicely for 15+ hours and counting.

(And maybe I should start looking at miners other than Bitmain for fewer headaches.)

Edit: this time it ran 20 hours before dropping the board.
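To see how long a board actually stays up, you can subtract timestamps in the log instead of watching the clock. A minimal sketch using Python's standard datetime module; the timestamps are copied from the logs in this thread (STATUS_OKAY at 15:52:23 and the first ERROR at 19:29:10 in the log above, and STATUS_OKAY at 11:56:36 and the first ERROR at 12:13:23 in the S17+ boot log further down).

```python
from datetime import datetime

LOG_TIME = "%Y-%m-%d %H:%M:%S"  # timestamp format used in these log lines


def elapsed(start, end):
    """Return the time between two log timestamps as a timedelta."""
    return datetime.strptime(end, LOG_TIME) - datetime.strptime(start, LOG_TIME)


# STATUS_OKAY -> first ERROR, first log above
print(elapsed("2020-05-03 15:52:23", "2020-05-03 19:29:10"))  # -> 3:36:47
# STATUS_OKAY -> first ERROR, S17+ boot log further down
print(elapsed("2020-05-03 11:56:36", "2020-05-03 12:13:23"))  # -> 0:16:47
```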

philipma1957

legendary

Activity: 4382

Merit: 9330

'The right to privacy matters'

Is yours a T17+?

PopoJeff

full member

Activity: 414

Merit: 182

I reflashed the firmware on the new problem unit, and it has been running all 3 boards properly for 2 hours 28 minutes and counting. Fingers crossed 🤞

jacktman

newbie

Activity: 6

Merit: 9

Hello mikeywith.
I have to say, you were right.

I've already solved this problem.

The problem was the PSU. The air intake where the fans are was so dirty, completely covered with dirt, that it couldn't get any fresh air in. I guess the PSU components overheated and sent the wrong voltage to the hashboards.

After cleaning the PSU my T17 works correctly again.
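If you suspect the PSU, the boot log already records what the hashboards receive: power_api.c prints a per-chain voltage, an average_voltage, and a check_voltage result (see the S17+ boot log further down). The sketch below reproduces that average from the per-chain values and compares it to the target. The 0.3 V tolerance is a hypothetical value for illustration, since the firmware's real pass/fail threshold does not appear in the log.

```python
def check_voltage(target, chain_voltages, tolerance):
    """Average the per-chain voltages and compare against the target.

    `tolerance` is a hypothetical threshold; the firmware's real rule
    is not visible in the log.
    """
    average = sum(chain_voltages) / len(chain_voltages)
    return average, abs(average - target) <= tolerance


# Per-chain readings from the S17+ boot log (target_vol = 21.00)
avg, ok = check_voltage(21.00, [21.078507, 21.124679, 21.124679], tolerance=0.3)
print(f"average {avg:.6f} V, passed: {ok}")  # -> average 21.109288 V, passed: True
```

Comparing later voltage lines the same way is one way to see whether the average drifts from the target; a clean pass at boot does not by itself rule out a PSU that misbehaves once it heats up.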

Thanks!!!

PopoJeff

full member

Activity: 414

Merit: 182

I'm having the same issue. My S17+ 70TH arrived Friday. I unboxed it, hooked it up, and it ran fine for almost a day. Saturday afternoon I noticed it was only running at 40TH. It's dropping a board after a temp sensor read failure.

I've rebooted it a bunch of times and did a reset on it, and it's still doing the same thing. I'll probably try reinstalling the firmware today. If that doesn't work, I'll just run it at 40TH on 2 boards until the halving, then send it back for repair/replacement. It was an "extra" miner that I got, paid for almost entirely with account credit.

At the bottom of the log you can see that it restarts fine and runs at 70TH; then, after some time (sometimes 10 seconds, sometimes 20 minutes), it drops a board after failing to detect a temp sensor.

Code:



Booting Linux on physical CPU 0x0
Linux version 4.6.0-xilinx-gff8137b-dirty (lzq@armdev2) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-23) ) #25 SMP PREEMPT Fri Nov 23 15:30:52 CST 2018
CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: Xilinx Zynq
cma: Reserved 16 MiB at 0x0e000000
Memory policy: Data cache writealloc
On node 0 totalpages: 61440
free_area_init_node: node 0, pgdat c0b39280, node_mem_map cde10000
  Normal zone: 480 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 61440 pages, LIFO batch:15
percpu: Embedded 12 pages/cpu @cddf1000 s19776 r8192 d21184 u49152
pcpu-alloc: s19776 r8192 d21184 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1 
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 60960
Kernel command line: mem=240M console=ttyPS0,115200 ramdisk_size=33554432 root=/dev/ram rw earlyprintk
PID hash table entries: 1024 (order: 0, 4096 bytes)
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 203444K/245760K available (6345K kernel code, 231K rwdata, 1896K rodata, 1024K init, 223K bss, 25932K reserved, 16384K cma-reserved, 0K highmem)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xcf800000 - 0xff800000   ( 768 MB)
    lowmem  : 0xc0000000 - 0xcf000000   ( 240 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc090c424   (9234 kB)
      .init : 0xc0a00000 - 0xc0b00000   (1024 kB)
      .data : 0xc0b00000 - 0xc0b39fe0   ( 232 kB)
       .bss : 0xc0b39fe0 - 0xc0b71c28   ( 224 kB)
Preemptible hierarchical RCU implementation.
	Build-time adjustment of leaf fanout to 32.
	RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
NR_IRQS:16 nr_irqs:16 16
efuse mapped to cf800000
ps7-slcr mapped to cf802000
L2C: platform modifies aux control register: 0x72360000 -> 0x72760000
L2C: DT/platform modifies aux control register: 0x72360000 -> 0x72760000
L2C-310 erratum 769419 enabled
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 ID prefetch enabled, offset 1 lines
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 8 ways, 512 kB
L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
zynq_clock_init: clkc starts at cf802100
Zynq clock init
sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x4ce07af025, max_idle_ns: 440795209040 ns
Switching to timer-based delay loop, resolution 3ns
clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff, max_idle_ns: 537538477 ns
ps7-ttc #0 at cf80a000, irq=18
Console: colour dummy device 80x30
Calibrating delay loop (skipped), value calculated using timer frequency.. 666.66 BogoMIPS (lpj=3333333)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x100000 - 0x100058
CPU1: failed to boot: -1
Brought up 1 CPUs
SMP: Total of 1 processors activated (666.66 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
cpuidle: using governor menu
hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 4 bytes.
zynq-ocm f800c000.ps7-ocmc: ZYNQ OCM pool: 256 KiB @ 0xcf880000
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
media: Linux media interface: v0.10
Linux video capture interface: v2.00
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti 
PTP clock support registered
EDAC MC: Ver: 3.0.0
Advanced Linux Sound Architecture Driver Initialized.
clocksource: Switched to clocksource arm_global_timer
NET: Registered protocol family 2
TCP established hash table entries: 2048 (order: 1, 8192 bytes)
TCP bind hash table entries: 2048 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
UDP hash table entries: 256 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 64
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (no cpio magic); looks like an initrd
Freeing initrd memory: 12892K (cce6a000 - cdb01000)
hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
futex hash table entries: 512 (order: 3, 32768 bytes)
workingset: timestamp_bits=28 max_order=16 bucket_order=0
jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
dma-pl330 f8003000.ps7-dma: Loaded driver for PL330 DMAC-241330
dma-pl330 f8003000.ps7-dma: 	DBUFF-128x8bytes Num_Chans-8 Num_Peri-4 Num_Events-16
e0000000.serial: ttyPS0 at MMIO 0xe0000000 (irq = 158, base_baud = 6249999) is a xuartps
console [ttyPS0] enabled
xdevcfg f8007000.ps7-dev-cfg: ioremap 0xf8007000 to cf86e000
[drm] Initialized drm 1.1.0 20060810
brd: module loaded
loop: module loaded
CAN device driver interface
gpiod_set_value: invalid GPIO
libphy: MACB_mii_bus: probed
macb e000b000.ethernet eth0: Cadence GEM rev 0x00020118 at 0xe000b000 irq 31 (00:0a:35:00:00:00)
Generic PHY e000b000.etherne:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e000b000.etherne:00, irq=-1)
e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
usbcore: registered new interface driver usb-storage
mousedev: PS/2 mouse device common for all mice
i2c /dev entries driver
Xilinx Zynq CpuIdle Driver started
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
sdhci-pltfm: SDHCI platform and OF driver helper
mmc0: SDHCI controller on e0100000.ps7-sdio [e0100000.ps7-sdio] using ADMA
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
nand: Micron MT29F2G08ABAGAWP
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
nand: WARNING: pl35x-nand: the ECC used on your system is too weak compared to the one required by the NAND chip
Bad block table found at page 131008, version 0x01
Bad block table found at page 130944, version 0x01
6 ofpart partitions found on MTD device pl35x-nand
Creating 6 MTD partitions on "pl35x-nand":
0x000000000000-0x000002800000 : "BOOT.bin-env-dts-kernel"
0x000002800000-0x000004800000 : "ramfs"
0x000004800000-0x000005000000 : "configs"
0x000005000000-0x000006000000 : "reserve"
0x000006000000-0x000008000000 : "ramfs-bak"
0x000008000000-0x000010000000 : "reserve1"
NET: Registered protocol family 10
sit: IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
can: controller area network core (rev 20120528 abi 9)
NET: Registered protocol family 29
can: raw protocol (rev 20120528)
can: broadcast manager protocol (rev 20120528 t)
can: netlink gateway (rev 20130117) max_hops=1
zynq_pm_ioremap: no compatible node found for 'xlnx,zynq-ddrc-a05'
zynq_pm_late_init: Unable to map DDRC IO memory.
Registering SWP/SWPB emulation handler
hctosys: unable to open rtc device (rtc0)
ALSA device list:
  No soundcards found.
RAMDISK: gzip image found at block 0
EXT4-fs (ram0): couldn't mount as ext3 due to feature incompatibilities
EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
VFS: Mounted root (ext4 filesystem) on device 1:0.
devtmpfs: mounted
Freeing unused kernel memory: 1024K (c0a00000 - c0b00000)
EXT4-fs (ram0): re-mounted. Opts: block_validity,delalloc,barrier,user_xattr
random: dd urandom read with 0 bits of entropy available
ubi0: attaching mtd2
ubi0: scanning is finished
ubi0: attached mtd2 (name "configs", size 8 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 64, bad PEBs: 0, corrupted PEBs: 0
ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 8/2, WL threshold: 4096, image sequence number: 237714726
ubi0: available PEBs: 0, total reserved PEBs: 64, PEBs reserved for bad PEB handling: 40
ubi0: background thread "ubi_bgt0d" started, PID 708
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 711
UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "configs"
UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
UBIFS (ubi0:0): FS size: 1396736 bytes (1 MiB, 11 LEBs), journal size 888833 bytes (0 MiB, 5 LEBs)
UBIFS (ubi0:0): reserved for root: 65970 bytes (64 KiB)
UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0), UUID FF09433B-E002-4C29-97EC-6220DDC4BB36, small LPT model
ubi1: attaching mtd5
ubi1: scanning is finished
ubi1: attached mtd5 (name "reserve1", size 128 MiB)
ubi1: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi1: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi1: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi1: good PEBs: 1020, bad PEBs: 4, corrupted PEBs: 0
ubi1: user volume: 1, internal volumes: 1, max. volumes count: 128
ubi1: max/mean erase counter: 7/3, WL threshold: 4096, image sequence number: 3265111179
ubi1: available PEBs: 0, total reserved PEBs: 1020, PEBs reserved for bad PEB handling: 36
ubi1: background thread "ubi_bgt1d" started, PID 720
UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 723
UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "reserve1"
UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
UBIFS (ubi1:0): FS size: 123039744 bytes (117 MiB, 969 LEBs), journal size 6221824 bytes (5 MiB, 49 LEBs)
UBIFS (ubi1:0): reserved for root: 4952683 bytes (4836 KiB)
UBIFS (ubi1:0): media format: w4/r0 (latest is w4/r0), UUID 1AF6F1E1-61F0-462C-AE44-C7D9596CF7E2, small LPT model
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
macb e000b000.ethernet eth0: unable to generate target frequency: 25000000 Hz
macb e000b000.ethernet eth0: link up (100/Full)
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
In axi fpga driver!
request_mem_region OK!
AXI fpga dev virtual address is 0xcfb38000
*base_vir_addr = 0xb023
In fpga mem driver!
request_mem_region OK!
fpga mem virtual address is 0xd2000000
random: nonblocking pool is initialized
2020-05-03 11:52:51 driver-btm-api.c:741:init_freq_mode: This is scan-user version
2020-05-03 11:52:51 driver-btm-api.c:2413:bitmain_soc_init: opt_multi_version     = 1
2020-05-03 11:52:51 driver-btm-api.c:2414:bitmain_soc_init: opt_bitmain_ab        = 1
2020-05-03 11:52:51 driver-btm-api.c:2415:bitmain_soc_init: opt_bitmain_work_mode = 0
2020-05-03 11:52:51 driver-btm-api.c:2416:bitmain_soc_init: Miner compile time: Tue Apr  7 14:11:08 CST 2020 type: Antminer S17+
2020-05-03 11:52:51 driver-btm-api.c:2417:bitmain_soc_init: commit version: 0fa7066 2020-04-06 22:14:32, build by: lol 2020-04-07 14:17:53
2020-05-03 11:52:51 driver-btm-api.c:2045:show_sn: len:16, 8108ed0c2b10481c
2020-05-03 11:52:51 driver-btm-api.c:2423:bitmain_soc_init: show sn return 1
2020-05-03 11:52:51 driver-btm-api.c:2065:handle_sn_for_factory_mode: show sn return 1
2020-05-03 11:52:51 driver-btm-api.c:2103:handle_sn_for_factory_mode: read sn success, 8108ed0c2b10481c
2020-05-03 11:52:51 fan.c:279:front_fan_power_on: Note: front fan is power on!
2020-05-03 11:52:51 fan.c:291:rear_fan_power_on: Note: rear fan is power on!
2020-05-03 11:52:51 driver-btm-api.c:1276:miner_device_init: Detect 256MB control board of XILINX
2020-05-03 11:52:51 driver-btm-api.c:1217:init_fan_parameter: fan_eft : 0  fan_pwm : 0
2020-05-03 11:52:57 driver-btm-api.c:1201:init_miner_version: miner ID : 8108ed0c2b10481c
2020-05-03 11:52:57 driver-btm-api.c:1207:init_miner_version: FPGA Version = 0xB023
2020-05-03 11:53:03 driver-btm-api.c:799:get_product_id: product_id[0] = 0
2020-05-03 11:53:03 driver-btm-api.c:799:get_product_id: product_id[1] = 0
2020-05-03 11:53:03 driver-btm-api.c:799:get_product_id: product_id[2] = 0
2020-05-03 11:53:03 driver-btm-api.c:2196:update_conf_by_power_feedback: Power feedback is disabled
2020-05-03 11:53:03 driver-btm-api.c:2200:update_conf_by_power_feedback: get_calibration_voltage, vol:1970.
2020-05-03 11:53:03 frequency.c:1366:adjust_higer_max_vol_table: adjust_higer_max_vol_table, adjust_vol = 0
2020-05-03 11:53:03 thread.c:1066:create_read_nonce_reg_thread: create thread
2020-05-03 11:53:09 driver-btm-api.c:1201:init_miner_version: miner ID : 8108ed0c2b10481c
2020-05-03 11:53:09 driver-btm-api.c:1207:init_miner_version: FPGA Version = 0xB023
2020-05-03 11:53:14 driver-btm-api.c:799:get_product_id: product_id[0] = 0
2020-05-03 11:53:14 driver-btm-api.c:799:get_product_id: product_id[1] = 0
2020-05-03 11:53:14 driver-btm-api.c:799:get_product_id: product_id[2] = 0
2020-05-03 11:53:14 driver-btm-api.c:754:_set_project_type: project:0
2020-05-03 11:53:14 driver-btm-api.c:775:_set_project_type: Project type: Antminer S17+
2020-05-03 11:53:14 driver-btm-api.c:786:dump_pcb_bom_version: Chain [0] PCB Version: 0x0100
2020-05-03 11:53:14 driver-btm-api.c:787:dump_pcb_bom_version: Chain [0] BOM Version: 0x0100
2020-05-03 11:53:14 driver-btm-api.c:786:dump_pcb_bom_version: Chain [1] PCB Version: 0x0100
2020-05-03 11:53:14 driver-btm-api.c:787:dump_pcb_bom_version: Chain [1] BOM Version: 0x0100
2020-05-03 11:53:14 driver-btm-api.c:786:dump_pcb_bom_version: Chain [2] PCB Version: 0x0100
2020-05-03 11:53:14 driver-btm-api.c:787:dump_pcb_bom_version: Chain [2] BOM Version: 0x0100
2020-05-03 11:53:16 driver-btm-api.c:2334:bitmain_board_init: Fan check passed.
2020-05-03 11:53:17 board.c:36:jump_and_app_check_restore_pic: chain[0] PIC jump to app
2020-05-03 11:53:19 board.c:40:jump_and_app_check_restore_pic: Check chain[0] PIC fw version=0x88
2020-05-03 11:53:20 board.c:36:jump_and_app_check_restore_pic: chain[1] PIC jump to app
2020-05-03 11:53:22 board.c:40:jump_and_app_check_restore_pic: Check chain[1] PIC fw version=0x88
2020-05-03 11:53:23 board.c:36:jump_and_app_check_restore_pic: chain[2] PIC jump to app
2020-05-03 11:53:25 board.c:40:jump_and_app_check_restore_pic: Check chain[2] PIC fw version=0x88
2020-05-03 11:53:25 thread.c:1061:create_pic_heart_beat_thread: create thread
2020-05-03 11:53:25 power_api.c:213:power_init: Power init:
2020-05-03 11:53:25 power_api.c:214:power_init: current_voltage_raw = 0
2020-05-03 11:53:25 power_api.c:215:power_init: highest_voltage_raw = 2100
2020-05-03 11:53:25 power_api.c:216:power_init: working_voltage_raw = 1950
2020-05-03 11:53:25 power_api.c:217:power_init: higher_voltage_raw  = 2040
2020-05-03 11:53:25 power_api.c:218:power_init: check_asic_voltage_raw  = 2100
2020-05-03 11:53:25 driver-btm-api.c:2344:bitmain_board_init: Enter 30s sleep to make sure power release finish.
2020-05-03 11:53:25 power_api.c:186:power_off: init gpio907
2020-05-03 11:53:56 register.c:306:get_register: !!! reg crc error
2020-05-03 11:53:56 register.c:306:get_register: !!! reg crc error
2020-05-03 11:53:56 register.c:306:get_register: !!! reg crc error
2020-05-03 11:53:56 register.c:306:get_register: !!! reg crc error
2020-05-03 11:53:56 register.c:306:get_register: !!! reg crc error
2020-05-03 11:53:57 power_api.c:324:set_to_highest_voltage_by_steps: Set to voltage raw 2100, step by step.
2020-05-03 11:54:23 power_api.c:85:check_voltage_multi: retry time: 0
2020-05-03 11:54:24 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 21.078507
2020-05-03 11:54:25 power_api.c:40:_get_avg_voltage: chain = 1, voltage = 21.124679
2020-05-03 11:54:26 power_api.c:40:_get_avg_voltage: chain = 2, voltage = 21.124679
2020-05-03 11:54:26 power_api.c:53:_get_avg_voltage: average_voltage = 21.109288
2020-05-03 11:54:26 power_api.c:71:check_voltage: target_vol = 21.00, actural_vol = 21.11, check voltage passed.
2020-05-03 11:54:26 uart.c:72:set_baud: set fpga_baud to 115200
2020-05-03 11:54:38 driver-btm-api.c:1096:check_asic_number_with_power_on: Chain[0]: find 65 asic, times 0
2020-05-03 11:54:49 driver-btm-api.c:1096:check_asic_number_with_power_on: Chain[1]: find 65 asic, times 0
2020-05-03 11:55:00 driver-btm-api.c:1096:check_asic_number_with_power_on: Chain[2]: find 65 asic, times 0
2020-05-03 11:55:08 driver-hash-chip.c:266:set_uart_relay: set uart relay to 0x330003
2020-05-03 11:55:08 driver-btm-api.c:397:set_order_clock: chain[0]: set order clock, stragegy 3
2020-05-03 11:55:08 driver-btm-api.c:397:set_order_clock: chain[1]: set order clock, stragegy 3
2020-05-03 11:55:08 driver-btm-api.c:397:set_order_clock: chain[2]: set order clock, stragegy 3
2020-05-03 11:55:09 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
2020-05-03 11:55:09 driver-btm-api.c:1854:check_clock_counter: freq 50 clock_counter_limit 6
2020-05-03 11:55:09 voltage[0] = 1960
2020-05-03 11:55:09 voltage[1] = 1960
2020-05-03 11:55:09 voltage[2] = 1960
2020-05-03 11:55:09 power_api.c:226:set_working_voltage_raw: working_voltage_raw = 1960
2020-05-03 11:55:10 temperature.c:340:calibrate_temp_sensor_one_chain: chain 0 temp sensor NCT218
2020-05-03 11:55:12 temperature.c:340:calibrate_temp_sensor_one_chain: chain 1 temp sensor NCT218
2020-05-03 11:55:13 temperature.c:340:calibrate_temp_sensor_one_chain: chain 2 temp sensor NCT218
2020-05-03 11:55:13 uart.c:72:set_baud: set fpga_baud to 12000000
2020-05-03 11:55:14 driver-btm-api.c:264:check_bringup_temp: Bring up temperature is 25
2020-05-03 11:55:14 thread.c:1081:create_check_miner_status_thread: create thread
2020-05-03 11:55:14 thread.c:1071:create_show_miner_status_thread: create thread
2020-05-03 11:55:14 thread.c:1051:create_temperature_monitor_thread: create thread
2020-05-03 11:55:14 freq_tuning.c:183:freq_tuning_get_max_freq: Max freq of tuning is 650
2020-05-03 11:55:14 driver-btm-api.c:1727:send_null_work: [DEBUG] Send null work.
2020-05-03 11:55:14 thread.c:1041:create_asic_status_monitor_thread: create thread
2020-05-03 11:55:14 frequency.c:1019:inc_freq_with_fixed_vco: chain = 255, freq = 510, is_higher_voltage = true
2020-05-03 11:55:27 power_api.c:352:set_to_voltage_by_steps: Set to voltage raw 2090, step by step.
2020-05-03 11:55:29 power_api.c:85:check_voltage_multi: retry time: 0
2020-05-03 11:55:30 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 20.931788
2020-05-03 11:55:31 power_api.c:40:_get_avg_voltage: chain = 1, voltage = 20.978150
2020-05-03 11:55:32 power_api.c:40:_get_avg_voltage: chain = 2, voltage = 20.978150
2020-05-03 11:55:32 power_api.c:53:_get_avg_voltage: average_voltage = 20.962696
2020-05-03 11:55:32 power_api.c:71:check_voltage: target_vol = 20.90, actural_vol = 20.96, check voltage passed.
2020-05-03 11:55:39 frequency.c:1061:inc_freq_with_fixed_step: chain = 0, freq_start = 510, freq_end = 530, freq_step = 5, is_higher_voltage = true
2020-05-03 11:55:47 frequency.c:1061:inc_freq_with_fixed_step: chain = 2, freq_start = 510, freq_end = 530, freq_step = 5, is_higher_voltage = true
2020-05-03 11:55:55 frequency.c:1061:inc_freq_with_fixed_step: chain = 0, freq_start = 530, freq_end = 540, freq_step = 5, is_higher_voltage = true
2020-05-03 11:55:59 power_api.c:352:set_to_voltage_by_steps: Set to voltage raw 2080, step by step.
2020-05-03 11:56:01 power_api.c:85:check_voltage_multi: retry time: 0
2020-05-03 11:56:03 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 20.785070
2020-05-03 11:56:04 power_api.c:40:_get_avg_voltage: chain = 1, voltage = 20.807198
2020-05-03 11:56:05 power_api.c:40:_get_avg_voltage: chain = 2, voltage = 20.780487
2020-05-03 11:56:05 power_api.c:53:_get_avg_voltage: average_voltage = 20.790919
2020-05-03 11:56:05 power_api.c:71:check_voltage: target_vol = 20.80, actural_vol = 20.79, check voltage passed.
2020-05-03 11:56:05 frequency.c:1090:inc_asic_diff_freq_by_steps: chain = 0, start = 540, freq_step = 5
2020-05-03 11:56:12 frequency.c:1090:inc_asic_diff_freq_by_steps: chain = 1, start = 510, freq_step = 5
2020-05-03 11:56:17 frequency.c:1090:inc_asic_diff_freq_by_steps: chain = 2, start = 530, freq_step = 5
2020-05-03 11:56:23 driver-btm-api.c:727:set_timeout: freq = 590, percent = 90, hcn = 44236, timeout = 74
2020-05-03 11:56:23 power_api.c:310:set_to_working_voltage_by_steps: Set to voltage raw 1960, step by step.
2020-05-03 11:56:28 power_api.c:85:check_voltage_multi: retry time: 0
2020-05-03 11:56:29 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 19.612114
2020-05-03 11:56:30 power_api.c:40:_get_avg_voltage: chain = 1, voltage = 19.634962
2020-05-03 11:56:31 power_api.c:40:_get_avg_voltage: chain = 2, voltage = 19.634962
2020-05-03 11:56:31 power_api.c:53:_get_avg_voltage: average_voltage = 19.627346
2020-05-03 11:56:31 power_api.c:71:check_voltage: target_vol = 19.60, actural_vol = 19.63, check voltage passed.
2020-05-03 11:56:31 thread.c:1076:create_check_system_status_thread: create thread
2020-05-03 11:56:32 driver-btm-api.c:2577:bitmain_soc_init: Init done!
2020-05-03 11:56:32 driver-btm-api.c:216:set_miner_status: STATUS_INIT
2020-05-03 11:56:36 driver-btm-api.c:216:set_miner_status: STATUS_OKAY
2020-05-03 11:56:37 frequency.c:205:get_ideal_hash_rate_GH: ideal_hash_rate = 70780
2020-05-03 11:56:37 frequency.c:223:get_sale_hash_rate_GH: sale_hash_rate = 70000
2020-05-03 11:56:41 driver-btm-api.c:1458:dhash_chip_send_job: Version num 4.
2020-05-03 11:56:41 driver-btm-api.c:1606:dhash_chip_send_job: stime.tv_sec 1588507001, block_ntime 1588506981
2020-05-03 12:13:23 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 12:13:24 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
2020-05-03 12:13:24 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 1
2020-05-03 12:13:24 thread.c:996:asic_status_monitor_thread: chain 0 can't get enough hashrate reg val for 0 times.
2020-05-03 12:13:25 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
2020-05-03 12:13:25 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 12:13:25 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 1
2020-05-03 12:13:25 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
2020-05-03 12:13:26 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 1
2020-05-03 12:13:26 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0
2020-05-03 12:13:26 thread.c:996:asic_status_monitor_thread: chain 0 can't get enough hashrate reg val for 1 times.
2020-05-03 12:13:26 temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 1
2020-05-03 12:13:27 thread.c:976:asic_status_monitor_thread: ERROR: chain 0 get hashrate_reg_counter 7, require 65, failed times 1: ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
2020-05-03 12:13:27 temperature.c:865:get_temp_info: ERROR: chain 0 can get NONE temp info or temp value abnormal, power it off 
2020-05-03 12:13:29 frequency.c:205:get_ideal_hash_rate_GH: ideal_hash_rate = 46420
2020-05-03 12:13:29 frequency.c:223:get_sale_hash_rate_GH: sale_hash_rate = 46000
2020-05-03 12:26:43 thread.c:257:calc_hashrate_avg: avg rate is 61141.64 in 30 mins
2020-05-03 12:26:43 temperature.c:516:temp_statistics_show:   pcb temp 42~67  chip temp 61~80
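For reading kernel-log dumps like the one above, here is a minimal Python sketch that groups the `read temp sensor failed` lines by chain and records which chip each sensor was read through. It is not part of any Bitmain tooling. It assumes the log line format shown above, and the function name `failed_sensors_by_chain` is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "... get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0"
# (assumed format, copied from the log above)
FAIL_RE = re.compile(
    r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+)"
)

def failed_sensors_by_chain(log_text):
    """Return {chain: {sensor: chip}} for every failed temp-sensor read."""
    result = defaultdict(dict)
    for line in log_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            chain, sensor, chip = (int(g) for g in m.groups())
            result[chain][sensor] = chip
    return dict(result)

sample = """\
temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
temperature.c:838:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 1
"""
print(failed_sensors_by_chain(sample))
# prints: {0: {0: 14, 1: 10, 2: 54, 3: 50}}
```

Keying by sensor means repeated `reg = 0` / `reg = 1` failures for the same sensor collapse into a single entry.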

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Quote from: jacktman on May 01, 2020, 09:58:55 AM

... I don't have much knowledge of electronics and unfortunately I must trust what they tell me

Since they are authorized by zeusbtc, I would still trust them. What the main branch says does not really contradict what the branch in your country told you; it only makes the latter less likely. I should also make clear that I am not basing my points solely on trusting the main branch more. It's more a matter of common sense: what the Chinese branch described is far more likely to be what is happening.

Quote from: jacktman on May 01, 2020, 09:58:55 AM

... but is it also possible that if one fails it will send incorrect information to the other 3?

Indeed, you are right: it's possible, but unlikely. The board's design should not let a single sensor stop the other 3. The proof is that we know of some hash boards that work with fewer than 4 sensors, which means the probability of a single sensor taking down all the others is fairly low.

Quote from: jacktman on May 01, 2020, 09:58:55 AM

In my case, all 4 sensors on each of the 3 hashboards are failing, so from what I read, this failure is related to the PSU?

Yup, your problem is power-related; I am 99% sure of it. Let me explain why:

Reason 1:

If it were the sensors, you would now have a total of 12 (3*4) broken sensors. Does that make sense? NO.

Reason 2:

If it were the chips, you would need at least 1 bad chip/heatsink on each of the 3 boards. How likely is that scenario? I would give it 1% at best, given that they all failed at the same time.

Reason 3:

Only two things affect every part of the miner: the control board and the PSU. From the various reports and my personal experience, I doubt the control board would cause such an issue, so what is left? The power supply.

I would still suggest flashing the recovery file from an SD card and then flashing another firmware to see if that helps.

Quote from: jacktman on May 01, 2020, 09:58:55 AM

Something else that puzzles me is that my T17 works normally during the cooler hours of the day.
Which is why I thought it made sense that it was the temperature sensor.

This actually strengthens the case for the power supply. The sensors are designed to work between -40°C and 125°C, so it doesn't make sense that a 10 or 15 degree change in ambient temperature would decide whether they work or not. However, a 10 degree change makes a world of difference to the PSU and the chips. I suspect your PSU is overheating around noon and failing to deliver the DC voltage the boards need. Also, during peak hours the mains voltage at your house/farm might drop below 200 V, which would stop the miner from hashing, so make sure you measure that too.

If you have a spare PSU, try it, or at least measure the DC voltage the PSU delivers to the boards.
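As a rough checklist for those two measurements, here is a minimal sketch that flags a possible power problem from per-chain DC readings and an optional AC reading. The 200 V floor comes from the post above; the 0.1 V DC tolerance is an assumption, not a Bitmain specification, and `psu_suspect` is a hypothetical name. The readings can come from a multimeter or from the `_get_avg_voltage` lines in the kernel log posted above (19.61 to 19.63 V against a 19.60 V target).

```python
from statistics import mean

def psu_suspect(chain_voltages, target_vol, tolerance=0.1,
                ac_voltage=None, min_ac=200.0):
    """Return a list of reasons to suspect the power supply (empty if none).

    chain_voltages: measured DC volts per chain.
    target_vol:     the voltage the firmware asked for (target_vol in the log).
    tolerance:      allowed DC deviation in volts (ASSUMED value).
    ac_voltage:     measured mains voltage, if available.
    min_ac:         mains floor mentioned in this thread (~200 V).
    """
    reasons = []
    avg = mean(chain_voltages)
    if abs(avg - target_vol) > tolerance:
        reasons.append(f"DC average {avg:.2f} V is off target {target_vol:.2f} V")
    if ac_voltage is not None and ac_voltage < min_ac:
        reasons.append(f"AC input {ac_voltage:.0f} V is below {min_ac:.0f} V")
    return reasons

# Working-voltage readings from the kernel log above, plus a hypothetical AC reading.
print(psu_suspect([19.612114, 19.634962, 19.634962], 19.60, ac_voltage=195))
```

Measuring at noon and again in the cooler hours, then comparing the two results, is the point of the exercise.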

Quote from: jacktman on May 01, 2020, 09:58:55 AM

You say you solved the problem by pressing on chip 1.
Did your T17 fail on just one sensor or on all of them like mine?

All 4 sensors on 1 board; the other two boards were fine and reading all their sensors, so it's a somewhat different issue from yours.

Quote from: Scorpyy on May 01, 2020, 12:13:06 PM

This could actually be true. My 2nd T17+ broke a minute after I tried to overclock it. It worked fine for 2 months before that. So please do not go over 800 MHz on default cooling.

IMO, overclocking affects the chips more than the temp sensors, so it's more likely that some of the chips got toasted. You can test them with a multimeter, by the way.

Quote from: Scorpyy on May 01, 2020, 12:13:06 PM

try lowering the frequency.

That's an excellent idea: even if the PSU is having some issues, running at a lower frequency might fix it.

Scorpyy

jr. member

Activity: 43

Merit: 59

Quote from: jacktman on April 30, 2020, 02:39:22 PM

Mostly it is because at some point the machine was exposed to high temperatures, damaging the temperature sensor...

This could actually be true. My 2nd T17+ broke a minute after I tried to overclock it. It worked fine for 2 months before that. So please do not go over 800 MHz on default cooling.

Quote from: jacktman on May 01, 2020, 09:58:55 AM

but is it also possible that if one fails it will send incorrect information to the other 3? I don't know, as I mentioned before I have no technical knowledge of electronics and I don't know if this is possible.

It is possible: if 1 chip fails, the whole chain fails. Also, if your machine works normally in the cooler hours of the day, try lowering the frequency.
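The kernel log posted earlier in the thread shows what this can look like from the firmware side. The `asic_status_monitor_thread` line for chain 0 reports `hashrate_reg_counter 7, require 65`, followed by a pattern of `o` and `x` characters. Assuming `o` marks a chip that responded and `x` one that did not (my reading of the log, not documented Bitmain behavior), a short sketch can locate where the chain goes silent. `first_silent_chip` is a hypothetical helper.

```python
def first_silent_chip(pattern):
    """Return the 1-based index of the first 'x' in an o/x chip pattern, or None.

    ASSUMES 'o' = chip responded and 'x' = chip did not; spaces are ignored.
    """
    chips = pattern.replace(" ", "")
    pos = chips.find("x")
    return None if pos == -1 else pos + 1

# Pattern copied from the chain 0 error line in the log above.
pattern = ("ooooo ooxxx xxxxx xxxxx xxxxx xxxxx xxxxx "
           "xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx")
print(len(pattern.replace(" ", "")), pattern.count("o"), first_silent_chip(pattern))
# prints: 65 7 8
```

Under that reading, every chip from number 8 onward is silent. That fits the idea that one bad chip or loose heatsink can take down the rest of the chain, although the silent region alone does not prove which chip is at fault.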

jacktman

newbie

Activity: 6

Merit: 9

Thanks for sharing the information you received. I don't have much knowledge of electronics and unfortunately I must trust what they tell me.

What you say about the 4 temperature sensors makes a lot of sense: how probable is it that all 4 get damaged at once?
But is it also possible that if one fails, it sends incorrect information to the other 3? I don't know; as I mentioned before, I have no technical knowledge of electronics and I don't know if this is possible.
In my case, all 4 sensors on each of the 3 hashboards are failing, so from what I read, this failure is related to the PSU?
Something else that puzzles me is that my T17 works normally during the cooler hours of the day.
Which is why I thought it made sense that it was the temperature sensor.

You say you solved the problem by pressing on chip 1.
Did your T17 fail on just one sensor or on all of them like mine?

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Thanks for your input. I contacted zeusbtc's main branch in China, and they told me quite a different story. I'm not saying yours is wrong, but I think this one makes a lot more sense.

To summarize that article: if all temp sensors on all hash boards fail to report their temperature, it's a PSU problem (not a very common issue). If, however, fewer than 3 hash boards (normally just 1) show the temp sensor error, the problem is one of the chips/heatsinks and not actually the temp sensors.
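Here is the rule of thumb from that article written out as a small function. This is a sketch only: the function name, the default of 3 chains with 4 sensors each, and the "inconclusive" branch are my own additions. That branch covers the case the article doesn't address, where all boards report some, but not all, sensor failures.

```python
def triage(failed, num_chains=3, sensors_per_chain=4):
    """Classify temp-sensor errors using the zeusbtc rule of thumb.

    failed: {chain: set_of_failed_sensor_ids}, e.g. built from the kernel log.
    """
    bad_chains = sorted(c for c, sensors in failed.items() if sensors)
    if not bad_chains:
        return "no temp sensor errors"
    # Every sensor on every board failing points at something shared: the PSU.
    if len(bad_chains) == num_chains and all(
        len(failed[c]) == sensors_per_chain for c in bad_chains
    ):
        return "suspect PSU"
    # Fewer boards affected (usually just 1) points at a chip/heatsink on that board.
    if len(bad_chains) < num_chains:
        return "suspect chip/heatsink on chain(s) " + ", ".join(map(str, bad_chains))
    return "inconclusive"

print(triage({0: {0, 1, 2, 3}}))
```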

I am 99% positive about the chip theory because FOUR sensors going down at once is very, very unlikely. Those sensors produce little heat, if any, and they are spread across different areas of the board. I am not saying they never fail, but it is highly unlikely for even one of them to fail, let alone FOUR.
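To make the "very, very unlikely" argument concrete: if each sensor failed independently with some small probability p, then all n sensors failing together would have probability p^n. The sketch below is just that arithmetic. The value p = 0.01 is an arbitrary illustrative number, not measured data.

```python
def p_all_fail(p_single, n):
    """Probability that n sensors all fail, ASSUMING independent failures
    with the same per-sensor probability p_single (an illustrative input)."""
    return p_single ** n

for n in (1, 4, 12):  # one sensor, one board, three boards
    print(n, p_all_fail(0.01, n))
```

Independence is exactly what a shared cause breaks. A bad PSU or a loose heatsink affects every sensor on the same path at once, which is why a single common cause is the more plausible explanation.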

To strengthen this point further: based on the article zeusbtc sent me, they suspect the bad chip is usually the first chip. So we tested a T17 board that was giving us the 4-sensor error by pushing down really hard on the heatsinks of the 1st chip. I learned this method from a YouTube video on fixing S9 boards, where the technician presses his finger against the heatsink to see if it reports voltage, basically making sure the chip is in contact with the board and the heatsink is in contact with the chip. As naive as it may sound, the board DID work, and all temps are being read perfectly.

Of course, I don't expect this to last forever; it's only a matter of time before the heatsink comes loose again and we are back to square one. But at least we are now pretty close to determining the exact cause of this common problem. Once I have physical access to that board, I will remove that heatsink and put it back on; for now, all of this is being done remotely with the help of a friend.

If you have the same problem, please try this remedy. Keep in mind that it may not always be the 1st chip, but if you have a multimeter with a diode mode, you can easily identify the bad chip by measuring the heatsink against RST as shown here.
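Once you have a diode-mode reading for each chip's heatsink, a bad chip should stand out from its neighbours. Here is a minimal sketch for spotting it. It assumes that healthy chips on the same board give similar readings. The 20% threshold and the name `suspicious_chips` are hypothetical and not taken from any repair guide.

```python
from statistics import median

def suspicious_chips(readings, rel_tol=0.2):
    """Return chip numbers whose reading deviates from the board's median
    by more than rel_tol (a fraction). rel_tol is an ASSUMED threshold.

    readings: {chip_number: value shown by the meter in diode mode};
              record an open-circuit "OL" reading as float("inf").
    """
    mid = median(readings.values())
    return sorted(c for c, v in readings.items() if abs(v - mid) > rel_tol * mid)
```

Comparing against the median rather than the mean keeps one very bad reading (or an "OL") from dragging the reference value toward it.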

jacktman

newbie

Activity: 6

Merit: 9

Hello, I have the same problem with my T17. I contacted the Zeusmining authorized service in my country, and the technician confirmed that this problem is very common in the T17 and T17+ models. Mostly, it is because at some point the machine was exposed to high temperatures, damaging the temperature sensor. The good news is that they have already repaired this fault by replacing the sensor; the bad news is that the repair costs $80 per board. If I get any news, I'll share it.

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

An update on the situation.

With the help of a friend, I was able to identify the part number of the sensors used on the S17: T45187JP9KY. Apparently there is nothing about it on Google, so we checked another miner to confirm and found that the sensor on the S9k is marked T45188J. This suggests that the sensors used on Bitmain boards are T451 parts, which may be listed as TMP451AIDQFR or simply "T451 Temperature Sensor".

Current Issues:

The temp sensors on the T17 are nowhere to be found (my friend is doing all the work so far, as I don't have access to the hash boards at the moment). This leads me to think they might be next to the chips but UNDER the heatsinks; they are small enough to fit in there. If anyone has a T17 hash board and a good magnifier and is willing to contribute, kindly help us find those sensors. Here is what you need to know:

- They must be 4
- They can't be located next to each other
- They likely have a label that starts with (T451....)
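Here are those three hints as a checklist you could run against what you find on the board. This is a sketch only. Treating "next to each other" as "adjacent chip numbers" is a simplification, because real adjacency depends on the board layout. The function name is hypothetical. The example uses the chip numbers Luke gave for the T17 (7, 9, 22, 24) and the S17 marking found above.

```python
def check_candidates(candidates):
    """Check (chip_number, marking) sensor candidates against the three hints.

    Returns a list of problems; an empty list means all hints are satisfied.
    Adjacency is approximated by consecutive chip numbers (a simplification).
    """
    problems = []
    if len(candidates) != 4:
        problems.append(f"expected 4 sensors, found {len(candidates)}")
    chips = sorted(chip for chip, _ in candidates)
    for a, b in zip(chips, chips[1:]):
        if b - a <= 1:
            problems.append(f"chips {a} and {b} are adjacent")
    for chip, marking in candidates:
        if not marking.upper().startswith("T451"):
            problems.append(f"chip {chip}: marking {marking!r} does not start with T451")
    return problems

print(check_candidates([(9, "T45187JP9KY"), (7, "T45187JP9KY"),
                        (22, "T45187JP9KY"), (24, "T45187JP9KY")]))
```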

mikeywith

legendary

Activity: 2478

Merit: 6693

be constructive or S.T.F.U

Based on my experience and the numerous complaints online, I have concluded that the most common issue with all the new 17-series gear is the temp sensor.

Getting this error in the kernel log:

Code:

temperature.c:697:get_temp_info: read temp sensor failed: chain = 1, sensor = 0, chip = 64,

Indicates a problem with the sensor. In some cases, the following methods will help fix it:

1- Reboot the miner.
2- Flash the recovery firmware from an SD card.
3- Flash the latest firmware.
4- Do 2 and 3 together (SD card recovery, then the latest firmware).
5- Install a non-Bitmain firmware.
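The list above is an escalation: try the cheapest remedy first, re-check the sensors, and only move on if they still fail. Here is that loop as a sketch. The step callables and `sensors_ok` are hypothetical hooks. In practice each step is done by hand and checked in the kernel log.

```python
from typing import Callable, Optional, Sequence, Tuple

def run_remedies(remedies: Sequence[Tuple[str, Callable[[], None]]],
                 sensors_ok: Callable[[], bool]) -> Optional[str]:
    """Apply remedies in order, re-checking the sensors after each one.

    Returns the name of the first remedy after which sensors_ok() is True,
    or None if none of them helped (the sensor likely needs replacing).
    """
    for name, do_step in remedies:
        do_step()
        if sensors_ok():
            return name
    return None
```

Returning `None` corresponds to the case described below, where replacing the sensor is the remaining option.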

However, the success rate of all the above remedies is not great, and it looks like replacing the sensor is necessary in most cases.

My plan is to use this thread to combine our efforts in gathering all the information we need to know about this issue and how to fix it.

The process contains 3 steps:

A - Identify the sensor positions

Based on the image posted by Luke for the S17 and S17 Pro, the new gear has 4 sensors.

source support.bitmain

He also sent me this image of a T17 hash board.

source support.bitmain

He told me that the 4 sensors are located near chip 9, chip 7, chip 22, chip 24.

Based on his explanation, I have concluded that the 4 sensors on 17-series gear are always located around the third column (2 sensors above and 2 below), regardless of the miner's exact model.

B- Where to buy them

So far, the only information I have on this comes from fellow member taserz, who indicates that the sensor's part number is TMP451AIDQFR.

I currently don't have access to any of the hash boards to confirm it (thanks to the quarantine). If someone has one, kindly look up the part number printed on the sensor (it won't be easy to read, as these sensors are very tiny).

I should be able to visit the farm in the coming days, so if nobody provides the information by then, I will do it myself.

To Do:

1- Confirm the part number.
2- Locate the best source to buy the sensors.

C- How to replace them

No information is available yet.
