I need your help diagnosing why these devices "hang" from time to time in a weird state, in which:
- the scheduler stops running scheduled scripts
- GSM/ppp-client connection dies
- ethernet stops responding entirely; winbox cannot discover the device via either IP or MAC
- CAP LED turns on
- system watchdog does not react - the device is not rebooted
- there is nothing specific in the last log entries
Ad.1 the scheduler stops running scheduled scripts
I know this because one of my scheduled scripts reboots the device after 60 failed connection attempts. I can see on the backend/server side that those connections are never made, so this script should reboot the device from time to time. But it does not.
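For illustration, a script of that kind might look roughly like the sketch below. This is not the exact script running on the devices: the URL and the `failedAttempts` variable name are placeholders, and the only assumption is that a failed `/tool fetch` counts as a failed connection attempt.

```
# Hypothetical sketch: count consecutive failed connection attempts
# and reboot once the counter reaches 60.
:global failedAttempts
:if ([:typeof $failedAttempts] = "nothing") do={ :set failedAttempts 0 }

:do {
    # Placeholder URL; any failure of the fetch lands in on-error
    /tool fetch url="https://example.com/health" output=none
    :set failedAttempts 0
} on-error={
    :set failedAttempts ($failedAttempts + 1)
    :log warning ("connection attempt failed, count=" . $failedAttempts)
}

:if ($failedAttempts >= 60) do={
    :log error "60 failed connection attempts, rebooting"
    /system reboot
}
```

Since the counter lives in a global variable, it only advances while the scheduler keeps firing, which is why a missing reboot points at the scheduler itself having stopped.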
Ad.2 GSM/ppp-client connection dies
The device is configured to keep the openvpn connection to my vpn server, scheduled to fetch some URLs from time to time and also configured to send logs to remote syslog server. All of those die at the same time.
Ad.3 ethernet stops responding at all
This is the most annoying part. From time to time I travel to the location of those devices, and to diagnose an unresponsive device I try to connect a laptop directly via ethernet. Under normal circumstances I would be able to ssh to the device at 192.168.88.1, discover it via winbox, or connect via the MAC winbox option. None of this works when the device "hangs". I even tried manually typing the MAC address (from a sticker on the device) into winbox to connect to it, but that doesn't work either.
Ad.4 CAP LED turns on
This LED is never on under normal circumstances (except perhaps while rebooting) but suddenly turns on when the device gets into this weird unresponsive state.
Ad.5 system watchdog does not react - the device is not rebooted
The watchdog is enabled, but the device never gets restarted after it enters this state.
Code:
[admin@dsg-EB820FE18DE1-O-GLO] /tool/netwatch> /system/watchdog/print
watch-address: none
watchdog-timer: yes
ping-start-after-boot: 5m
ping-timeout: 1m
automatic-supout: yes
auto-send-supout: no
The devices are configured to send logs to remote syslog, as follows:
Code:
[admin@dsg-EB820FE18DE1-O-GLO] /tool/netwatch> /system/logging/print
Flags: * - DEFAULT
Columns: TOPICS, ACTION, PREFIX
# TOPICS ACTION PREFIX
0 * info remote dsg-EB820FE18DE1-O-GLO
1 * error remote dsg-EB820FE18DE1-O-GLO
2 * warning remote dsg-EB820FE18DE1-O-GLO
3 * critical remote dsg-EB820FE18DE1-O-GLO
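Because the remote syslog stream stops together with the connection, the last entries before a hang may never reach the server. One thing that might help is also writing the more severe topics to the router's own storage, so they survive a power cycle. A minimal sketch, assuming the default `disk` logging action is still present and that the extra flash writes are acceptable on a device with about 3.5 MiB of free space:

```
# Check that the default "disk" action exists and see its file settings
/system logging action print

# Additionally write the more severe topics to flash (assumption: action "disk")
/system logging add topics=critical action=disk
/system logging add topics=error action=disk
/system logging add topics=warning action=disk
```

The `info` topic is left out here on purpose to limit flash wear; it could be added the same way if the extra writes are acceptable.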
I would love to provide more details, but when this issue happens, I don't know what else I could do to diagnose what exactly is happening on the device. Any hints on what else to do in such a situation?
One perhaps important thing I noticed: when I had a "sanity reboot" script scheduled every few hours (to reboot the device "just in case"), this "hanging" issue occurred much more often.
Below is some information about the device configuration. Please say if you need more.
Code:
[admin@dsg-EB820FE18DE1-O-GLO] /system/resource> print
uptime: 1h49m24s
version: 7.6 (stable)
build-time: Oct/17/2022 10:55:40
factory-software: 6.44.6
free-memory: 22.4MiB
total-memory: 64.0MiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 650MHz
cpu-load: 6%
free-hdd-space: 3568.0KiB
total-hdd-space: 16.0MiB
write-sect-since-reboot: 631
write-sect-total: 42748
bad-blocks: 0%
architecture-name: mipsbe
board-name: LtAP mini
platform: MikroTik
[admin@dsg-EB820FE18DE1-O-GLO] /system/resource/usb> print
Columns: DEVICE, VENDOR, NAME, SPEED
# DEVICE VENDOR NAME SPEED
0 1-0 Linux 5.6.3 ehci_hcd RB400 EHCI 480
1 1-1 HP HP hs2340 HSPA+ MobileBroadband 480
# jan/02/2023 13:21:16 by RouterOS 7.6
# software id = 4GWQ-5ML7
#
# model = RB912R-2nD
# serial number = EB820FE18DE1
/interface ppp-client
add apn=internet disabled=no modem-init="AT+CFUN=1" name=ppp-out1 port=usb2
[admin@dsg-EB820FE18DE1-O-GLO] /system/gps> export hide-sensitive
# jan/02/2023 13:22:34 by RouterOS 7.6
# software id = 4GWQ-5ML7
#
# model = RB912R-2nD
# serial number = EB820FE18DE1
/system gps
set coordinate-format=dd enabled=yes gps-antenna-select=external port=serial0 set-system-time=yes
[admin@dsg-EB820FE18DE1-O-GLO] /system/scheduler> export hide-sensitive
# jan/02/2023 13:25:01 by RouterOS 7.6
# software id = 4GWQ-5ML7
#
# model = RB912R-2nD
# serial number = EB820FE18DE1
/system scheduler
add interval=3s name=fetchscript on-event=smspollingv2 policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=dec/27/2022 start-time=21:46:01
add interval=2m name=mqttsanity on-event=mqttsanity policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=dec/27/2022 start-time=21:46:09
add interval=10s name=gps2mqtt on-event=gps2mqtt policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=dec/27/2022 start-time=21:46:11
add interval=2m name=pppenabler on-event=pppenabler policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=jan/02/2023 start-time=12:52:18
[admin@dsg-EB820FE18DE1-O-GLO] /system/package> print
Columns: NAME, VERSION
# NAME VERSION
0 gps 7.6
1 iot 7.6
2 routeros 7.6