OK, let's simplify the scripts to get a better understanding of the situation.
I disabled both scripts in the scheduler:
[admin@Branch-Node-Test] /system scheduler> print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 24
I added a simplified copy of the "CHECK-GATEWAY" script, called "CHECK-GATEWAY-TEST", and increased the interval to 20s to have more time for diagnosing the problem.
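The scheduler state above could be reproduced with commands along these lines. This is a sketch reconstructed from the printout, not a log of what was actually typed, so the exact form is an assumption:

```
# Disable the two original scheduler entries (names taken from the printout).
/system scheduler disable [find name="CHECK-GATEWAY"] ;
/system scheduler disable [find name="CONTROL-3G-CONNECTION"] ;
# Add a scheduler entry that runs the simplified test script every 20 seconds.
/system scheduler add name="CHECK-GATEWAY-TEST" interval=20s on-event="CHECK-GATEWAY-TEST" ;
```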
Here is the whole text of the "CHECK-GATEWAY-TEST" script:
[admin@Branch-Node-Test] /system script> print detail where name=CHECK-GATEWAY-TEST
Flags: I - invalid
0 name="CHECK-GATEWAY-TEST" owner="admin" policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive
last-started=mar/16/2015 13:34:50 run-count=65 source=
#describe variables
:global PingFailCount ;
:global PingFailThreshold 5 ;
:global doublePingFailThreshold ($PingFailThreshold * 2) ;
:local GatewayEth1 10.1.1.1 ;
:local PingResult ;
:local status3G ;
:local greyIPlist { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16 } ;
#describe local functions
:local check3Gstatus {
[/interface ppp-client monitor 0 once do={
:if ($status = "established") do={
:delay 2 ;
}
:if ($status = "connected") do={
:if ($"local-address" in $greyIPlist) do={
:set status3G greyIPmode} else={:set status3G $status}
} else={:set status3G $status}
}
]
}
#set PingFailCount default value
:if ([:typeof $PingFailCount] = "nothing") do={:set PingFailCount 0; /ip route enable [find gateway=$GatewayEth1] ;} ;
#doing ping-check
:set PingResult [ping $GatewayEth1 count=10 ttl=1 interval 0.1 size 100] ;
:put message="Packet Received = $PingResult" ;
:if ($PingResult <= 5) do={
:set PingFailCount ($PingFailCount + 1) ;
}
#END
#show diagnostic results
:put message="PingFailCount = $PingFailCount" ;
:put message="PingFailThreshold = $PingFailThreshold" ;
:put message="status3G = $status3G" ;
Here is the whole output of the 1st test:
[admin@Branch-Node-Test] > system reboot
Reboot, yes? [y/N]:
y
system will reboot shortly
Connection closed by foreign host.
serg@ncc:~$ telnet 192.168.88.6
Trying 192.168.88.6...
Connected to 192.168.88.6.
Escape character is '^]'.
MikroTik v6.25
Login: admin
Password:
MMM MMM KKK TTTTTTTTTTT KKK
MMMM MMMM KKK TTTTTTTTTTT KKK
MMM MMMM MMM III KKK KKK RRRRRR OOOOOO TTT III KKK KKK
MMM MM MMM III KKKKK RRR RRR OOO OOO TTT III KKKKK
MMM MMM III KKK KKK RRRRRR OOO OOO TTT III KKK KKK
MMM MMM III KKK KKK RRR RRR OOOOOO TTT III KKK KKK
MikroTik RouterOS 6.25 (c) 1999-2014 http://www.mikrotik.com/
[?] Gives the list of available commands
command [?] Gives help on the command and list of arguments
[Tab] Completes the command/word. If the input is ambiguous,
a second [Tab] gives possible options
/ Move up to base level
.. Move up one level
/command Use command at the base level
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > system scheduler print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 0
[admin@Branch-Node-Test] > system clock print
time: 14:02:30
date: mar/16/2015
time-zone-name: Europe/Kiev
gmt-offset: +02:00
dst-active: no
[admin@Branch-Node-Test] > system scheduler print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 0
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > system scheduler print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 0
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > system scheduler print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 0
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > system clock print
time: 14:03:51
date: mar/16/2015
time-zone-name: Europe/Kiev
gmt-offset: +02:00
dst-active: no
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > system scheduler print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 0
[admin@Branch-Node-Test] >
I don't know exactly what went wrong, but more than 1m20s passed and the script was not executed even once.
The 1st command was executed immediately after the reboot.
Here is the 2nd test:
[admin@Branch-Node-Test] > system reboot
Reboot, yes? [y/N]:
y
system will reboot shortly
Connection closed by foreign host.
serg@ncc:~$ telnet 192.168.88.6
Trying 192.168.88.6...
Connected to 192.168.88.6.
Escape character is '^]'.
MikroTik v6.25
Login: admin
Password:
[admin@Branch-Node-Test] > environment print
PingFailCount=[:nothing]
PingFailThreshold=5
doublePingFailThreshold=10
[admin@Branch-Node-Test] > system clock print
time: 14:19:09
date: mar/16/2015
time-zone-name: Europe/Kiev
gmt-offset: +02:00
dst-active: no
[admin@Branch-Node-Test] > environment print
PingFailCount=[:nothing]
PingFailThreshold=5
doublePingFailThreshold=10
[admin@Branch-Node-Test] > system clock print
time: 14:20:57
date: mar/16/2015
time-zone-name: Europe/Kiev
gmt-offset: +02:00
dst-active: no
[admin@Branch-Node-Test] > system scheduler print
Flags: X - disabled
# NAME START-DATE START-TIME INTERVAL ON-EVENT RUN-COUNT
0 X CHECK-GATEWAY jan/02/1970 00:00:00 5s CHECK-GATEWAY 0
1 X CONTROL-3G-CONNECTION jan/02/1970 00:00:00 50s CONTROL-3G-CONNECTION 0
2 CHECK-GATEWAY-TEST jan/02/1970 00:00:00 20s CHECK-GATEWAY-TEST 8
[admin@Branch-Node-Test] > environment print
PingFailCount=[:nothing]
PingFailThreshold=5
doublePingFailThreshold=10
So you can see that the initial values of the variables are set only when the script executes. In the 2nd test, the "CHECK-GATEWAY-TEST" script was executed by the scheduler 8 times, but the "PingFailCount" variable was still not set, although it should have been.
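One way to check whether the script stops partway through is to run it by hand and then look at the log. This is only a sketch of the diagnostic: I am assuming that a script error is printed to the console when the script is run manually and that script messages appear in the log under a "script" topic.

```
# Run the script manually so that any error is shown in the console.
/system script run CHECK-GATEWAY-TEST ;
# Then look for script-related entries in the log.
/log print where topics~"script" ;
```

If the script aborts before the line that sets PingFailCount, that would match the picture above: the globals are declared, the run count grows, but PingFailCount stays unset.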
I also did a 3rd test: I deleted "check3Gstatus" from the "CHECK-GATEWAY-TEST" script, and it works fine. Here is the output:
[admin@Branch-Node-Test] /system script> print where name=CHECK-GATEWAY-TEST
Flags: I - invalid
0 name="CHECK-GATEWAY-TEST" owner="admin" policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive
last-started=mar/16/2015 14:58:25 run-count=86 source=
#describe variables
:global PingFailCount ;
:global PingFailThreshold 5 ;
:global doublePingFailThreshold ($PingFailThreshold * 2) ;
:local GatewayEth1 10.1.1.1 ;
:local PingResult ;
#setting PingFailCount default value
:if ([:typeof $PingFailCount] = "nothing") do={:set PingFailCount 0; /ip route enable [find gateway=$GatewayEth1] ;} ;
#doing ping-check
:set PingResult [ping $GatewayEth1 count=10 ttl=1 interval 0.1 size 100] ;
:put message="Packet Received = $PingResult" ;
:if ($PingResult <= 5) do={
:set PingFailCount ($PingFailCount + 1) ;
}
#END
#show diagnostic results
:put message="PingFailCount = $PingFailCount" ;
:put message="PingFailThreshold = $PingFailThreshold" ;
[admin@Branch-Node-Test] /system scheduler> /system reboot
Reboot, yes? [y/N]:
y
system will reboot shortly
Connection closed by foreign host.
serg@ncc:~$ telnet 192.168.88.6
Trying 192.168.88.6...
Connected to 192.168.88.6.
Escape character is '^]'.
MikroTik v6.25
Login: admin
Password:
[admin@Branch-Node-Test] > environment print
[admin@Branch-Node-Test] > environment print
PingFailCount=0
PingFailThreshold=5
doublePingFailThreshold=10
[admin@Branch-Node-Test] > environment print
PingFailCount=1
PingFailThreshold=5
doublePingFailThreshold=10
So I caught the moment when the script initially sets the PingFailCount value.
Maybe the problem is in the "check3Gstatus" local function. But I need it to check the 3G state directly when the main channel fails, in order to decide whether or not to disable the main default route.
There is another problem when I use functions: a function is executed twice if I declare it at the beginning of the script and then call it. I mean that the function is executed once at the moment the scheduler reads the script from top to bottom, and then once more when I call it like $<function-name> in the middle of the script.
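To make the double execution concrete, here is a minimal sketch of the two forms as I understand them. The names runsAtDeclaration and runsOnCall are made up for illustration, and the do={...} form assumes the function syntax of RouterOS v6, so treat it as a guess rather than a confirmed fix:

```
# Form similar to my script: the [ ... ] inside the braces is a command
# substitution, so (as far as I understand) it is evaluated while the
# variable is being declared.
:local runsAtDeclaration { [:put "executed at declaration"] } ;

# Assumed alternative: do={ ... } stores the body and runs it only on call.
:local runsOnCall do={
    :put "executed on call" ;
    :return "done" ;
}

# The body should run here, and only here.
:local result [$runsOnCall] ;
:put message="result = $result" ;
```

If this assumption is right, declaring "check3Gstatus" with do={...} would keep "/interface ppp-client monitor" from running during declaration, which matters here because no 3G modem is inserted.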
Note that all tests were done with the main channel deactivated and without a 3G modem inserted into the MikroTik.