generalized failover/loadbalancer update

Posted: Tue Nov 09, 2010 5:01 am
by juaco
Hey this is starting to work fine :D If you can, please test it!

Later I'll detail the logic and list the things I don't understand and need help solving.

Features
  • address-list based, one list per config entry
  • N incoming interfaces, N primary/backup outgoing routes per config entry
  • auto reconfig on up/down route events
  • adaptive scheduling for the monitor
  • nth or random based loadbalancing (still no PCC/ECMP, sorry!)
  • L7 persistence when loadbalancing: dst/src based, configurable duration, per entry or global
  • centralized config of all scripts
Prereqs & limitations
  • all manipulations are done in the mangle table. I haven't tested how it behaves alongside mangle rules that don't come from these scripts (it should cause few problems, though)
  • requires manual setup of routing, nat and filter
  • doesn't have PCC for now
  • made and tested on ROS 3.20; I'll adapt it to v4 in the next few days
Preconfig
To keep the example simple i'll have two WANs and one LAN:

[WAN1]----\
-------[ROUTERBOARD]-------[LAN]
[WAN2]----/


I'll use three address lists of LAN addresses: one for hosts that go out exclusively over WAN1, one for hosts that go out exclusively over WAN2, and one that is loadbalanced randomly over WAN1 and WAN2. For the loadbalanced list, the route used for each connection to a site is remembered, so later connections to that site keep going out over the same route.

1) Routing
/ip route
add comment="WAN 1 Internet" disabled=no distance=1 dst-address=0.0.0.0/0 gateway=[WAN1 gateway address] pref-src=[local WAN1 interface address] routing-mark=\
    WAN1 scope=30 target-scope=10
add comment="WAN 2 Internet" disabled=no distance=1 dst-address=0.0.0.0/0 gateway=[WAN2 gateway address] pref-src=[local WAN2 interface address] routing-mark=\
    WAN2 scope=30 target-scope=10

/ip route rule
add action=lookup comment="" disabled=no routing-mark=WAN1 table=WAN1
add action=lookup comment="" disabled=no routing-mark=WAN2 table=WAN2
The rules are not strictly needed, but they don't hurt and they keep the references to the tables. You can use gateway-interface or other options in the routes, just give each one a routing mark.

2) NAT
/ip firewall nat
add action=src-nat chain=srcnat comment="SNAT WAN1" disabled=no out-interface=WAN1 to-addresses=[local WAN1 interface address]
add action=src-nat chain=srcnat comment="SNAT WAN2" disabled=no out-interface=WAN2 to-addresses=[local WAN2 interface address]
Add some DNAT rules too if you need them.

3) Filter
To filter Internet access from the LAN, use these two rules:
/ip firewall filter
add action=accept chain=forward comment="Internet access allowed" disabled=no in-interface=LAN packet-mark=!noroute
add action=reject chain=forward comment="Internet access rejected" disabled=no in-interface=LAN packet-mark=noroute reject-with=\
    icmp-network-unreachable
4) Address lists
/ip firewall address-list
add address=[some address] comment="Internet through WAN1" disabled=no list=WAN1-list
add address=[some other address] comment="Internet through WAN2" disabled=no list=WAN2-list
add address=[some other address] comment="Internet through WAN1 and WAN2" disabled=no list=WAN-lb-list
And so on. That's all the config.
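One thing the steps above don't cover: the route-monitor script further down picks its ping targets from an address list named public-hosts (it runs a find on list=public-hosts), so that list has to exist as well. A minimal sketch, assuming these public resolvers answer pings through both WANs; the addresses and comments are only illustrative, so use any hosts you trust:

```
/ip firewall address-list
add address=8.8.8.8 comment="route-monitor ping target" disabled=no list=public-hosts
add address=8.8.4.4 comment="route-monitor ping target" disabled=no list=public-hosts
add address=208.67.222.222 comment="route-monitor ping target" disabled=no list=public-hosts
add address=208.67.220.220 comment="route-monitor ping target" disabled=no list=public-hosts
```

As far as I can tell from the script, the monitor chooses a host with a single random digit, so keeping this list at ten entries or fewer seems sensible. RtmonNumTests in the config script limits how many hosts may fail before a route is considered down.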

Scripts
Copy/paste the scripts below; the first 7 are required and the last 2 are optional. Name them exactly as noted, since they invoke each other by name.
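Once the scripts are pasted in, a first run could look like the sketch below. It assumes the scripts were saved under exactly the names given here. The init script runs config, generates the mangle rules (when InitRulegen is true) and registers itself and route-monitor in the scheduler (when InitBootUp is true), so the two print commands are just a way to check the result:

```
/system script run init
/system scheduler print
/ip firewall mangle print
```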

script name: config
policy: none
# shared config and global table initialization

# USER CONFIGURATION ===========================================================
#
# Main -------------------------------------------------------------------------
#
# internet access list config array, entries delimited with semicolon. each entry
# is a string of the following form:
#
#	"list-name,interfaces,pri-routes[,sec-routes][,selector][,persistent][,ptime]"
#
# no spaces are allowed in the entries in any place
#
# obligatory params:
#
#	list-name=list
#		name of the list to configure
#	interfaces=interface1[;...;interfaceN]
#		inbound interfaces delimited with semicolon
#	pri-routes=route1[;...;routeN]
#		outbound primary routes delimited with semicolon
#		if there are 2 or more primary routes, connections are loadbalanced
#
# optional params:
#
#	sec-routes=route1[;...;routeN]
#		backup routes to use when a primary is not available
#
#		when no primary or secondary routes are available
#		connections are rejected with icmp net-unreachable
#
#	selector
#		selection algorithm for loadbalancing
#		options
#			nth: round-robin selection
#			random: random selection
#			default: value of DefSelector (set below)
#
#	persistent
#		L7 persistence algorithm for loadbalancing
#		options
#			no: no persistence
#			src: source address based
#			dst: destination address based
#			default: value of DefPersistent (set below)
#
#	ptime
#		for "src" or "dst" persistence, duration of connection persistence
#		format "--h--m--s"
#		default: value of DefPtime (set below)
#
#		for each ptime different to the default an extra chain is generated
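#
# illustrative entries (a sketch only; the list, interface and route names are
# assumptions taken from the example setup, adapt them to your own):
#
#	"WAN1-list,LAN,WAN1"
#		single primary route, no backup
#	"WAN1-list,LAN,WAN1,WAN2"
#		primary route WAN1, backup route WAN2
#	"WAN-lb-list,LAN,WAN1;WAN2,random,dst,10m"
#		loadbalance randomly over WAN1 and WAN2 and keep each destination
#		on the same route for 10 minutes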

# Config as for the example
:local MainConfig {\
"WAN1-list,LAN,WAN1,WAN2";\
"WAN2-list,LAN,WAN2,WAN1";\
"WAN-lb-list,LAN,WAN1;WAN2";\
};

# defaults mainconfig
:global DefSelector "nth";
:global DefPersistent "no";
:global DefPtime "30m";


# Initializer ------------------------------------------------------------------

# whether to add the scripts to the scheduler so they run automatically at boot
:global InitBootUp true;

# whether to generate the mangle rules for our config
:global InitRulegen true;


# Route monitor ----------------------------------------------------------------

# how many hosts to check when testing a route
# BEWARE setting it too low can give false results
:global RtmonNumTests 7;

# adaptive execution based on lock detection
# frees cpu and optimizes response time
:global RtmonDynSched true;


# Debugging --------------------------------------------------------------------

# start/stop script messages
:global DbgEntryExit true;

# execution locks state info
:global DbgLocking true;

# route state and other info
:global DbgRouteInfo true;

# miscellaneous messages
:global DbgMisc false;

# monitor: dynamic scheduling info
:global DbgRtmonDynSched true;

# monitor: extra route testing info
:global DbgRtmonTests false;

# END USER CONFIGURATION =======================================================

# GLOBALS ----------------------------------------------------------------------

# flag to prevent regenerating the config
:global ConfReset;

# per config entry tables
:global ConfLists;
:global ConfIfaces;
:global ConfPriRoutes;
:global ConfSecRoutes;
:global ConfOptions;
:global ActiveRoutes;

# per route tables
:global RouteState;
:global RouteNames;

# SCRIPT START  ----------------------------------------------------------------
# initialize global tables
:if ([:typeof $RouteState]="nothing" || $ConfReset=true) do={
	:set ConfLists {};
	:set ConfIfaces {};
	:set ConfPriRoutes {};
	:set ConfSecRoutes {};
	:set ConfOptions {};
	:set ActiveRoutes {};
	:set RouteState {};
	:set RouteNames {};

	:for EntryId from=0 to=([:len $MainConfig]-1) do={

		:local ConfEntry [:toarray [:tostr [:pick $MainConfig $EntryId]]];

# entry address lists
		{
			:local EntryList [:pick $ConfEntry 0];
			:set ConfLists ($ConfLists, $EntryList);
		}

# interfaces
# get 3rd level array and add as 2nd level in ConfIfaces
		{
			:local EntryIfaces [:tostr [:pick $ConfEntry 1]];
			:local temp "";
			:local char;
			:for i from=0 to=[:len $EntryIfaces] do={
				:set char [:pick $EntryIfaces $i ($i+1)];
				:if ($char=";") do={ :set char "," };
				:set temp "$temp$char";
			}
			:set ConfIfaces ($ConfIfaces, "$temp");
		};

# primary routes
# get 3rd level array and add as 2nd level in ConfPriRoutes
# add to RouteNames and update RouteState/ActiveRoutes if necessary
		:local EntryPriRoutes;
		{
			:set EntryPriRoutes [:tostr [:pick $ConfEntry 2]];
			:local temp "";
			:local char;
			:for i from=0 to=[:len $EntryPriRoutes] do={
				:set char [:pick $EntryPriRoutes $i ($i+1)];
				:if ($char=";") do={ :set char "," };
				:set temp "$temp$char";
			}
			:set temp [:toarray $temp];
			:set EntryPriRoutes {};

			:foreach RouteName in=$temp do={
				:local RouteId [:find $RouteNames $RouteName];
				:if ([:typeof $RouteId]!="num") do={
					:set RouteNames ($RouteNames, $RouteName);
					:set RouteId ([:len $RouteNames]-1);
					:set RouteState ($RouteState, true);
				};
				:set EntryPriRoutes ($EntryPriRoutes, $RouteId);
			};
			:set ConfPriRoutes ($ConfPriRoutes, [:tostr $EntryPriRoutes]);
			:set ActiveRoutes ($ActiveRoutes, [:tostr $EntryPriRoutes]);
		};

# get optional parameters
		{
			:local EntrySelector "";
			:local EntryPersistent "";
			:local EntryPtime "";
			:local EntrySecRoutes {};

# if multiroute, set multiroute defaults
			:if ([:len [:toarray $EntryPriRoutes]]>1) do={
				:set EntrySelector $DefSelector;
				:set EntryPersistent $DefPersistent;
				:set EntryPtime $DefPtime;
			};

			:if ([:len $ConfEntry]>3) do={
				:for i from=3 to=([:len $ConfEntry]-1) do={
					:local EntryOption [:tostr [:pick $ConfEntry $i]];
					:if ([:typeof [:find "nth,random" $EntryOption]]!="nil") do={
						:set EntrySelector $EntryOption;
					} else={
						:if ([:typeof [:find "no,src,dst" $EntryOption]]!="nil") do={
							:set EntryPersistent $EntryOption;
						} else={
							:if ([:typeof [:totime $EntryOption]]!="nil") do={
								:set EntryPtime $EntryOption;
							} else={
# secondary routes
# get 3rd level array and add as 2nd level in ConfSecRoutes
# add to RouteNames and update RouteState if necessary
								:local temp "";
								:local char;
								:for i from=0 to=[:len $EntryOption] do={
									:set char [:pick $EntryOption $i ($i+1)];
									:if ($char=";") do={ :set char "," };
									:set temp "$temp$char";
								}
								:set temp [:toarray $temp];

								:foreach SecRoute in=$temp do={
									:local RouteId [:find $RouteNames $SecRoute];
									:if ([:typeof $RouteId]!="num") do={
										:set RouteNames ($RouteNames, $SecRoute);
										:set RouteState ($RouteState, true);
										:set RouteId ([:len $RouteNames]-1);
									};
									:set EntrySecRoutes ($EntrySecRoutes, $RouteId);
								};
							};
						};
					};
				};
			};
			:set ConfSecRoutes ($ConfSecRoutes, "$[:tostr $EntrySecRoutes]");
			:set ConfOptions ($ConfOptions, "$EntrySelector,$EntryPersistent,$EntryPtime");
		};
	};
} else={
	:if ($ConfReset != true) do={
		:put "INFO: config: global tables not reset. ConfReset=\"$ConfReset\"";
		:log info "config: global tables not reset. ConfReset=\"$ConfReset\"";
	};
};
script name: init
policy: read,write
# initializer

# GLOBALS ----------------------------------------------------------------------
# global tables
:global ActiveRoutes;

# init config
:global InitBootUp;
:global InitRulegen;

# debug config
:global DbgEntryExit;
:global DbgMisc;

# rulegen param
:global RuleGenParams;

# global table initialization flag
:global ConfReset;

# SCRIPT START -----------------------------------------------------------------
# get basic config
/system script run config

:if ($InitRulegen) do={
# generate mangles
	:local AllEntries "";
	:for EntryId from=0 to=([:len $ActiveRoutes]-1) do={
		:set AllEntries "$AllEntries,$EntryId";
	};
	:set RuleGenParams {"$AllEntries"; "all"; "print,build"}
	/system script run rulegen;
};

:if ($InitBootUp) do={
# add init to scheduler
	/system scheduler remove [find name=init];
	/system scheduler add \
	name=init \
	on-event="/system script run init;" \
	interval=00:00:00 \
	start-date=[/system clock get date] \
	start-time=[/system clock get time] \
	comment="automatically added by init script"

# add route monitor to scheduler
	/system scheduler remove [find name="route-monitor"];
	/system scheduler add \
	name=route-monitor \
	on-event="/system script run route-monitor;" \
	interval=00:00:10 \
	start-date=[/system clock get date] \
	start-time=[/system clock get time] \
	comment="automatically added by init script"
};
script name: route-monitor
policy: read,test
# route state monitor

# GLOBALS ----------------------------------------------------------------------

# global tables
:global RouteState;
:global RouteNames;

# monitor config
:global RtmonNumTests;
:global RtmonDynSched;

# debug config
:global DbgEntryExit;
:global DbgLocking;
:global DbgRouteInfo;
:global DbgMisc;
:global DbgRtmonDynSched;
:global DbgRtmonTests;

# pseudorandom number
:global RANDOM;

# execution lock
:global RtmonLock;

# route-event params
:global RtEventParams;

# SCRIPT START  ----------------------------------------------------------------
# get basic config
/system script run config

:if ($DbgEntryExit) do={
	:put "DEBUG: route-monitor: starting...";
	:log debug "route-monitor: starting...";
};

# undefined or disabled lock
:if ([:typeof $RtmonLock]!="bool" || !$RtmonLock) do={

# turn on lock
	:set RtmonLock true;
	:if ($DbgLocking) do={
		:put "route-monitor: the monitor lock state is now $RtmonLock";
		:log debug "route-monitor: the monitor lock state is now $RtmonLock";
	};

# if dynsched is on, shorten the scheduler interval by one second so that the
# next runs happen more frequently
	:if ($RtmonDynSched && ([:len [/system scheduler find name="route-monitor"]]>0)) do={
		:local Interval ([/system scheduler get route-monitor interval]-[:totime 1]);
		:if ($Interval > 00:00:00) do={
			:if ($DbgRtmonDynSched) do={
				:put "DEBUG: route-monitor: setting execution interval to $Interval";
				:log debug "DEBUG: route-monitor: setting execution interval to $Interval";
			};
			/system scheduler set route-monitor interval=$Interval;
		};
	};

# main loop
	:local RouteName;
	:local RouteCurState;
	:local RemainingTests;
	:local TestResult;

	:for RouteId from=0 to=([:len $RouteState]-1) do={

		:set TestResult false;
		:set RouteCurState (!![:pick $RouteState $RouteId]);
		:set RouteName [:pick $RouteNames $RouteId];

		:if ($DbgRouteInfo) do={
			:put "DEBUG: route-monitor: checking route: $RouteName...";
			:log debug "route-monitor: checking route: $RouteName...";
		};

# hosts for pinging
		:local PublicHosts {};
		:foreach line in=[/ip firewall address-list find list=public-hosts && disabled=no] do={
			:set PublicHosts ($PublicHosts, [/ip firewall address-list get $line address]);
		};
		:local PHLStartLen [:len $PublicHosts];

# route test loop
		:do {

# grab a random host from PublicHosts
# this will get simpler when prng is parameterized
			:local TestTarget 0;
			:local PHLId 0;
			:do {
				/system script run prng;
				:set PHLId [:tonum [:pick [:tostr $RANDOM] 1 2]];
				:if ($PHLId < [:len $PublicHosts]) do={
					:set TestTarget [:pick $PublicHosts $PHLId ($PHLId+1)];
				} else={
					:if ([:tonum [:pick [:tostr $PHLId] 1 1]] < [:len $PublicHosts]) do={
						:set PHLId [:tonum [:pick [:tostr $PHLId] 1 1]];
						:set TestTarget [:pick $PublicHosts $PHLId ($PHLId+1)];
					};
				};
			} while ($TestTarget=0);

			:if ($DbgRtmonTests) do={
				:put "DEBUG: route-monitor: echo request to $TestTarget...";
				:log debug "route-monitor: echo request to $TestTarget...";
			};

# clean icmp traffic to the host that might still be in conntrack
# and ping the host through the tested route
			/ip firewall connection remove [find protocol=icmp && dst-address=$TestTarget];
			:set TestResult ([/ping $TestTarget count=1 interval=1 routing-table=$RouteName]=1);

			:if ($TestResult) do={
				:if ($DbgRtmonTests) do={
					:put "DEBUG: route-monitor: got answer from $TestTarget";
					:log debug "route-monitor: got answer from $TestTarget";
				};
			} else={
				:if ($DbgRtmonTests) do={
					:put "DEBUG: route-monitor: couldn't get answer from $TestTarget";
					:log debug "route-monitor: couldn't get answer from $TestTarget";
				};
				:set PublicHosts ([:pick $PublicHosts 0 $PHLId],[:pick $PublicHosts ($PHLId+1) [:len $PublicHosts]]);

				:set RemainingTests ($RtmonNumTests - ($PHLStartLen-[:len $PublicHosts]));
				:if ($DbgRtmonTests && ($RemainingTests > 0)) do={
					:put "DEBUG: route-monitor: $RemainingTests tests to determine $RouteName availability";
					:log debug "route-monitor: $RemainingTests tests to determine $RouteName availability";
				};
			}
		} while (!$TestResult && ($RemainingTests > 0) && ([:len $PublicHosts] > 0));

# if state change is detected dispatch an event
		:if ($TestResult != $RouteCurState) do={
			:if ($DbgRouteInfo) do={
				:put "DEBUG: route-monitor: state change in $RouteName - dispatching event...";
				:log debug "route-monitor: state change in $RouteName - dispatching event...";
			};
			:set RtEventParams {"$RouteId"; "$TestResult"};
			/system script run route-event;
		} else={
			:if ($DbgRouteInfo) do={
				:local TestResultMessage "works normally"
				:if (!$TestResult) do={
					:set TestResultMessage "doesn't seem to work"
				};
				:put "DEBUG: route-monitor: route $RouteName $TestResultMessage";
				:log debug "route-monitor: route $RouteName $TestResultMessage";
			};
		};
	};

# turn off the lock
	:set RtmonLock false;
	:if ($DbgLocking) do={
		:put "route-monitor: the monitor lock state is now $RtmonLock";
		:log debug "route-monitor: the monitor lock state is now $RtmonLock";
	};
} else={

# found active lock
	:if ($DbgLocking) do={
		:put "DEBUG: route-monitor: found active execution lock, aborting...";
		:log debug "route-monitor: found active execution lock, aborting...";
	};

# if dynsched is on, lengthen the scheduler interval by one second so that the
# next runs happen less frequently
	:if ($RtmonDynSched && ([:len [/system scheduler find name="route-monitor"]]>0)) do={
		:local Interval ([/system scheduler get route-monitor interval]+[:totime 1]);

		:if ($DbgRtmonDynSched) do={
			:put "DEBUG: route-monitor: setting execution interval to $Interval";
			:log debug "DEBUG: route-monitor: setting execution interval to $Interval";
		};

		/system scheduler set route-monitor interval=$Interval;
	};
};

:if ($DbgEntryExit) do={
	:put "route-monitor: exiting";
	:log debug "route-monitor: exiting";
};
script name: route-event
policy: write
# route event handler

# GLOBALS ----------------------------------------------------------------------

# global tables
:global ConfPriRoutes;
:global ConfSecRoutes;
:global ActiveRoutes;
:global RouteState;
:global RouteNames;

# debug config
:global DbgEntryExit;
:global DbgRouteInfo;
:global DbgMisc;

# params
:global RtEventParams;

# state-update params
:global StUpdateParams;

# SCRIPT START  ----------------------------------------------------------------
# get basic config
/system script run config

:if ($DbgEntryExit) do={
	:put "DEBUG: route-event: starting...";
	:log debug "route-event: starting...";
};

# copy params to local scope
:local RouteId [:tonum [:pick $RtEventParams 0]];
:local RouteNewState (!![:pick $RtEventParams 1]);

# validate params...

# update global route state table
:set RouteState ([:pick $RouteState 0 $RouteId],$RouteNewState,[:pick $RouteState ($RouteId+1) [:len $RouteState]])

# get route name
:local RouteName [:pick $RouteNames $RouteId];

# initialize state-update param array
:set StUpdateParams {};

# propagate changes in config entries
:if (!$RouteNewState) do={
# A) route down:

	:put "*** WARNING: route-event: route $RouteName IS NOT RESPONDING *** propagating failover...";
	:log warning "*** route-event: route $RouteName IS NOT RESPONDING *** propagating failover...";

# kill connections over this route so apps timeout early
	/ip firewall connection remove [find connection-mark=$RouteName];

# delete route persistence lists
	/ip firewall address-list remove [find list=persistent-$RouteName];

# disable mangle references to the routes
	/ip firewall mangle set [find new-packet-mark="$RouteName"] disabled=yes;
	/ip firewall mangle set [find new-connection-mark="$RouteName"] disabled=yes;
	/ip firewall mangle set [find new-routing-mark="$RouteName"] disabled=yes;
	/ip firewall mangle set [find connection-mark="$RouteName"] disabled=yes;

# 1. propagate failover to all entries using the route now
	:for EntryId from=0 to=([:len $ActiveRoutes]-1) do={
		:local EntryActiveRoutes [:toarray [:tostr [:pick $ActiveRoutes $EntryId]]];
		:if ([:typeof [:find $EntryActiveRoutes $RouteId]]!="nil") do={
			:set StUpdateParams ($StUpdateParams, "$EntryId,$RouteId,failover");
		};
	};
	/system script run state-update;
	
} else={
# B) route up:

	:put "*** WARNING: route-event: route $RouteName is up again *** propagating fallback...";
	:log warning "*** route-event: route $RouteName is up again *** propagating fallback...";

# enable mangle references to the route
	/ip firewall mangle set [find new-packet-mark="$RouteName"] disabled=no;
	/ip firewall mangle set [find new-connection-mark="$RouteName"] disabled=no;
	/ip firewall mangle set [find new-routing-mark="$RouteName"] disabled=no;
	/ip firewall mangle set [find connection-mark="$RouteName"] disabled=no;

# 1. propagate fallback to all entries that use this as a primary route
	:for EntryId from=0 to=([:len $ConfPriRoutes]-1) do={
		:local EntryPriRoutes [:toarray [:tostr [:pick $ConfPriRoutes $EntryId]]];
		:if ([:typeof [:find $EntryPriRoutes $RouteId]]!="nil") do={
			:set StUpdateParams ($StUpdateParams, "$EntryId,$RouteId,primary-fallback");
		};
	};
# 2. propagate fallback to all entries that use this as a secondary route and
# have route deficit
	:for EntryId from=0 to=([:len $ConfSecRoutes]-1) do={
		:local EntryPriRoutes [:toarray [:tostr [:pick $ConfPriRoutes $EntryId]]];
		:local EntrySecRoutes [:toarray [:tostr [:pick $ConfSecRoutes $EntryId]]];
		:local EntryActiveRoutes [:toarray [:tostr [:pick $ActiveRoutes $EntryId]]];
		:if ([:typeof [:find $EntrySecRoutes $RouteId]]!="nil" && ([:len $EntryActiveRoutes] < [:len $EntryPriRoutes])) do={
			:set StUpdateParams ($StUpdateParams, "$EntryId,$RouteId,secondary-fallback");
		};
	};
	/system script run state-update;
};

# call extra actions defined by the user
# for later: provide better info about what happened in route-event/state-update
/system script run route-uevent

:if ($DbgEntryExit) do={
	:put "DEBUG: route-event: exiting";
	:log debug "route-event: exiting";
};
script name: state-update
policy: none
# entries state update on route events

# GLOBALS ----------------------------------------------------------------------

# global tables
:global ConfLists;
:global ConfPriRoutes;
:global ConfSecRoutes;
:global ActiveRoutes;
:global RouteState;
:global RouteNames;

# debug config
:global DbgEntryExit;
:global DbgLocking;
:global DbgRouteInfo;
:global DbgMisc;

# update lock
:global StUpdateLock;

# params
:global StUpdateParams;

# rulegen params
:global RuleGenParams;

# SCRIPT START -----------------------------------------------------------------
# get basic config
/system script run config

:if ($DbgEntryExit) do={
	:put "DEBUG: state-update: starting...";
	:log debug "state-update: starting...";
};

# copy params to local scope
:local StUpdateLParams $StUpdateParams;

# validate params...

# initialize lock at first run
:if ([:typeof $StUpdateLock]!="bool") do={
	:set StUpdateLock false;
};

# wait if any updates are in progress
:while ($StUpdateLock) do={
	:if ($DbgLocking) do={
		:put "state-update: other updates are in progress, waiting one second before continuing...";
		:log debug "state-update: other updates are in progress, waiting one second before continuing...";
	};
	:delay 1;
};

# turn on lock
:set StUpdateLock true;
:if ($DbgLocking) do={
	:put "state-update: the update lock state is now $StUpdateLock";
	:log debug "state-update: the update lock state is now $StUpdateLock";
};

:set RuleGenParams {};

# update route use state per config entry
:for StUpdateParamId from=0 to=([:len $StUpdateLParams]-1) do={

	:local StUpdateParam [:toarray [:tostr [:pick $StUpdateLParams $StUpdateParamId]]];
	:local EntryId [:tonum [:pick $StUpdateParam 0]];
	:local RouteId [:tonum [:pick $StUpdateParam 1]];
	:local UpdateType [:tostr [:pick $StUpdateParam 2]];

	:set RuleGenParams ($RuleGenParams,$EntryId);
	
	:if ($DbgMisc) do={
		:put "state-update: entry=$EntryId, route=$[:pick $RouteNames $RouteId], type=$UpdateType";
		:log debug "state-update: entry=$EntryId, route=$[:pick $RouteNames $RouteId], type=$UpdateType";
	};

# get primary, secondary and active routes of the entry
	:local EntryPriRoutes [:toarray [:tostr [:pick $ConfPriRoutes $EntryId]]];
	:local EntrySecRoutes [:toarray [:tostr [:pick $ConfSecRoutes $EntryId]]];
	:local EntryActiveRoutes [:toarray [:tostr [:pick $ActiveRoutes $EntryId]]];

# FAILOVER: disable route
	:if ($UpdateType="failover") do={

		:local EntryActiveRoute [:find $EntryActiveRoutes $RouteId];
		:set EntryActiveRoutes ([:pick $EntryActiveRoutes 0 $EntryActiveRoute],[:pick $EntryActiveRoutes ($EntryActiveRoute+1) [:len $EntryActiveRoutes]]);

# look for a secondary and if there is one, enable it
		:local FoundSecondary false;
		:local EntrySecRouteId 0;
		:while (!$FoundSecondary && ($EntrySecRouteId<[:len $EntrySecRoutes])) do={
			:local EntrySecRoute [:pick $EntrySecRoutes $EntrySecRouteId];
			:if ([:typeof [:find $EntryActiveRoutes $EntrySecRoute]]!="num" && (!![:pick $RouteState $EntrySecRoute])) do={
				:set EntryActiveRoutes ($EntryActiveRoutes, $EntrySecRoute);
				:set FoundSecondary true;
			};
			:set EntrySecRouteId ($EntrySecRouteId+1);
		};
	};

# PRIMARY-FALLBACK: enable primary, disable extra secondaries
	:if ($UpdateType="primary-fallback") do={
		:set EntryActiveRoutes ($EntryActiveRoutes, $RouteId);

		:local EntryActiveRouteId 0;
		:while ([:len $EntryActiveRoutes] > [:len $EntryPriRoutes]) do={
			:local EntryActiveRoute [:pick $EntryActiveRoutes $EntryActiveRouteId];
			:if ([:typeof [:find $EntrySecRoutes $EntryActiveRoute]]!="nil") do={
				:set EntryActiveRoutes ([:pick $EntryActiveRoutes 0 $EntryActiveRouteId],[:pick $EntryActiveRoutes ($EntryActiveRouteId+1) [:len $EntryActiveRoutes]]);
			};
			:set EntryActiveRouteId ($EntryActiveRouteId+1);
		};
	};

# SECONDARY-FALLBACK: enable secondary
	:if ($UpdateType="secondary-fallback") do={
		:set EntryActiveRoutes ($EntryActiveRoutes, $RouteId);
	};

# update global route usage table
	:set ActiveRoutes ([:pick $ActiveRoutes 0 $EntryId],[:tostr $EntryActiveRoutes],[:pick $ActiveRoutes ($EntryId+1) [:len $ActiveRoutes]]);

# delete mark-packet mangles for the entry
    :local EntryLists [:toarray [:tostr [:pick $ConfLists $EntryId]]];
    :local EntryChain [:tostr [:pick $EntryLists 0]];
	/ip firewall mangle remove [find chain=$EntryChain];
};

# regenerate mangles for processed entries
:set RuleGenParams {[:tostr $RuleGenParams]; "packetmarks"; "build,print"};
/system script run rulegen;

# turn off lock
:set StUpdateLock false;
:if ($DbgLocking) do={
	:put "state-update: the update lock state is now $StUpdateLock";
	:log debug "state-update: the update lock state is now $StUpdateLock";
};

:if ($DbgEntryExit) do={
	:put "DEBUG: state-update: exiting";
	:log debug "state-update: exiting";
};
script name: rulegen
policy: read,write
# rule generator

# GLOBALS ----------------------------------------------------------------------

# global tables
:global ConfLists;
:global ConfIfaces;
:global ConfOptions;
:global ActiveRoutes;
:global RouteNames;

# defaults
:global DefSelector;
:global DefPersistent;
:global DefPtime;

# debug config
:global DbgEntryExit;
:global DbgMisc;

# params
:global RuleGenParams;

# SCRIPT START -----------------------------------------------------------------
# get basic config
/system script run config

:if ($DbgEntryExit) do={
    :put "DEBUG: rulegen: starting...";
    :log debug "rulegen: starting...";
};

# copy params to local scope
:local GenEntries [:toarray [:tostr [:pick $RuleGenParams 0]]];
:local GenChains [:tostr [:pick $RuleGenParams 1]];
:local GenActions [:toarray [:tostr [:pick $RuleGenParams 2]]];

# validate params...

# rulesets
:local Ruleset0 {};
:local Ruleset1 {};
:local Ruleset2 {};
:local Ruleset3 {};
:local Ruleset4 {};
:local Ruleset5 {};
:local Output {};

:local Rule "";
:local RuleComment "";

:foreach EntryId in=$GenEntries do={

# get entry info
    :local EntryActiveRoutes [:toarray [:tostr [:pick $ActiveRoutes $EntryId]]];
    :local EntryLists [:toarray [:tostr [:pick $ConfLists $EntryId]]];
    :local EntryChain [:tostr [:pick $EntryLists 0]];
    :local EntryIfaces [:toarray [:tostr [:pick $ConfIfaces $EntryId]]];
    :local EntryOptions [:toarray [:tostr [:pick $ConfOptions $EntryId]]];
    :local EntrySelector [:tostr [:pick $EntryOptions 0]];
    :local EntryPersistent [:tostr [:pick $EntryOptions 1]];
    :local EntryPtime [:tostr [:pick $EntryOptions 2]];

# create array with all active route names
    :local EntryRouteNames "";
    :foreach RouteId in=$EntryActiveRoutes do={
        :local temp [:pick $RouteNames $RouteId];
        :set EntryRouteNames "$EntryRouteNames,$temp";
    };
    :set EntryRouteNames [:toarray $EntryRouteNames];

    :if ($DbgMisc) do={
        :put "DEBUG: rulegen: Entry: $EntryId, lists: $EntryChain, active routes: $[:tostr $EntryRouteNames]";
        :log debug "rulegen: Entry: $EntryId, lists: $EntryChain, active routes: $[:tostr $EntryRouteNames]";
    };

# prerouting section 1: route marks for already marked connections
# prerouting section 2: chain jumps for new connections
    :if ($GenChains="all") do={
        :foreach EntryIface in=$EntryIfaces do={
            :foreach EntryRouteName in=$EntryRouteNames do={
                :set Rule "chain=prerouting in-interface=\"$EntryIface\" connection-mark=\"$EntryRouteName\" action=\"mark-routing\" new-routing-mark=\"$EntryRouteName\" passthrough=no disabled=no";
                :if ([:typeof [:find $Ruleset0 $Rule]]="nil") do={
                    :set Ruleset0 ($Ruleset0, "$Rule");
                };
            };

            :foreach EntryList in=$EntryLists do={
                :set Rule "chain=prerouting connection-state=\"new\" in-interface=\"$EntryIface\" src-address-list=\"$EntryList\" action=\"jump\" jump-target=\"$EntryChain\" disabled=no";
                :if ([:typeof [:find $Ruleset1 $Rule]]="nil") do={
                    :set Ruleset1 ($Ruleset1, "$Rule");
                };
            };
        };

# mark-new section 1: connection mark for new marked packets
# mark-new section 2: route mark for new marked packets
        :foreach EntryRouteName in=$EntryRouteNames do={
            :set Rule "chain=\"mark-new\" packet-mark=\"$EntryRouteName\" action=\"mark-connection\" new-connection-mark=\"$EntryRouteName\" passthrough=yes disabled=no";
            :if ([:typeof [:find $Ruleset2 $Rule]]="nil") do={
                :set Ruleset2 ($Ruleset2, "$Rule");
            };

            :set Rule "chain=\"mark-new\" packet-mark=\"$EntryRouteName\" action=\"mark-routing\" new-routing-mark=\"$EntryRouteName\" passthrough=no disabled=no";
            :if ([:typeof [:find $Ruleset3 $Rule]]="nil") do={
                :set Ruleset3 ($Ruleset3, "$Rule");
            };
        };
    };

# packet marks chains
    :if ([:typeof [:find {"all"; "packetmarks"} $GenChains]]!="nil") do={

# opening
        :set RuleComment "[$EntryChain]";
        :set Rule "chain=\"$EntryChain\" action=\"mark-packet\" new-packet-mark=\"noroute\" passthrough=yes disabled=no comment=\"$RuleComment\"";
        :set RuleComment "";
        :set Ruleset4 ($Ruleset4, $Rule);

# persistent: opening continuation + add/find rules
# if Ptime differs from default generate an extra add chain to save with that Ptime
        :local PACSuffix "";
        :if ([:typeof [:find {"src"; "dst"} $EntryPersistent]]!="nil") do={
        
            :if ($EntryPtime!=$DefPtime) do={
                :set PACSuffix $EntryPtime;
            };

			:if ([:len $EntryRouteNames]>0) do={
				:for EntryRouteNameId from=0 to=([:len $EntryRouteNames]-1) do={
					:local EntryRouteName [:pick $EntryRouteNames $EntryRouteNameId];
					:set Rule "chain=\"$EntryChain\" $EntryPersistent-address-list=\"persistent-$EntryRouteName\" action=\"mark-packet\" new-packet-mark=\"$EntryRouteName\" passthrough=yes disabled=no comment=\"$RuleComment\"";
					:set Ruleset4 ($Ruleset4, $Rule);
					:if ($EntryRouteNameId=0) do={
						:set RuleComment "[Add $EntryPersistent to list]";                
					};
					:set Rule "chain=\"$EntryPersistent-add$PACSuffix\" packet-mark=\"$EntryRouteName\" action=\"add-$EntryPersistent-to-address-list\" address-list=\"persistent-$EntryRouteName\" address-list-timeout=\"$EntryPtime\" disabled=no comment=\"$RuleComment\"";
					:if ([:typeof [:find $Ruleset5 $Rule]]="nil") do={
						:set Ruleset5 ($Ruleset5, "$Rule");
					};
					:set RuleComment "";
				};
			};
			
            :set Rule "chain=\"$EntryChain\" packet-mark=\"!noroute\" action=\"jump\" jump-target=\"$EntryPersistent-add$PACSuffix\" disabled=no comment=\"$RuleComment\"";
            :set Ruleset4 ($Ruleset4, $Rule);
            :set Rule "chain=\"$EntryPersistent-add$PACSuffix\" action=\"jump\" jump-target=\"mark-new\" disabled=no comment=\"$RuleComment\"";
			:if ([:typeof [:find $Ruleset5 $Rule]]="nil") do={
				:set Ruleset5 ($Ruleset5, "$Rule");
			};
        };

# selection for multiroute chains
		:if ([:len $EntryRouteNames]>0) do={
			:for EntryRouteNameId from=0 to=([:len $EntryRouteNames]-1) do={
				:local EntryRouteName [:pick $EntryRouteNames $EntryRouteNameId];
				:local Selector "";
				:if ($EntrySelector="nth") do={
					:set Selector "$EntrySelector=$[:len $EntryRouteNames],$($EntryRouteNameId+1)";
				};
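# random: every rule below only sees packets still marked "noroute", so rule i
# takes 100/(N-i) percent of what is left and the last rule takes the remainder
				:if ($EntrySelector="random") do={
					:if ($EntryRouteNameId<([:len $EntryRouteNames]-1)) do={
						:set Selector "$EntrySelector=$(100 / ([:len $EntryRouteNames]-$EntryRouteNameId)) packet-mark=\"noroute\"";
					} else={
						:set Selector "packet-mark=\"noroute\"";
					};
				};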
				:set Rule "chain=\"$EntryChain\" $Selector action=\"mark-packet\" new-packet-mark=\"$EntryRouteName\" passthrough=yes disabled=no comment=\"$RuleComment\"";
				:set Ruleset4 ($Ruleset4, $Rule);
			};
		};
		
# closure
        :if ([:typeof [:find {"src"; "dst"} $EntryPersistent]]!="nil") do={
            :set Rule "chain=\"$EntryChain\" action=\"jump\" jump-target=\"$EntryPersistent-add$PACSuffix\" disabled=no comment=\"$RuleComment\"";
			:if ([:typeof [:find $Ruleset4 $Rule]]="nil") do={
				:set Ruleset4 ($Ruleset4, "$Rule");
			};
        } else={
            :set Rule "chain=\"$EntryChain\" action=\"jump\" jump-target=\"mark-new\" disabled=no comment=\"$RuleComment\"";
            :set Ruleset4 ($Ruleset4, $Rule);
        };
    };
};

:set RuleComment "[prerouting1] marked connections: mark route and exit";
:foreach Rule in=$Ruleset0 do={
    :set Output ($Output, "$Rule comment=\"$RuleComment\"");
	:if ($RuleComment!="") do={
		:set RuleComment "";
	};
}

:set RuleComment "[prerouting 2] new connections: go to packet mark chain based on interface+list";
:foreach Rule in=$Ruleset1 do={
    :set Output ($Output, "$Rule comment=\"$RuleComment\"");
	:if ($RuleComment!="") do={
		:set RuleComment "";
	};
}

:set RuleComment "[mark-new 1] mark new connection based on packet marks";
:foreach Rule in=$Ruleset2 do={
    :set Output ($Output, "$Rule comment=\"$RuleComment\"");
	:if ($RuleComment!="") do={
		:set RuleComment "";
	};
}

:set RuleComment "[mark-new 2] mark new route based on packet marks";
:foreach Rule in=$Ruleset3 do={
    :set Output ($Output, "$Rule comment=\"$RuleComment\"");
	:if ($RuleComment!="") do={
		:set RuleComment "";
	};
}

:if ($GenChains="all") do={
	:foreach Rule in=$Ruleset5 do={
		:set Output ($Output, "$Rule")
	}
};

:foreach Rule in=$Ruleset4 do={
    :set Output ($Output, "$Rule")
}

:foreach Rule in=$Output do={
    :if ([:typeof [:find $GenActions "print"]]!="nil") do={
        :put $Rule;
    };
    :if ([:typeof [:find $GenActions "build"]]!="nil") do={
        :local Action [:parse "/ip firewall mangle add $Rule;"];
        $Action;
    };
};

:if ($DbgEntryExit) do={
    :put "DEBUG: rulegen: exiting";
    :log debug "rulegen: exiting";
};
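As a rough illustration of the persistence branch in rulegen: for an entry with EntryPersistent set to "dst", a single route named WAN1 and the default Ptime, the strings the script hands to "/ip firewall mangle add" would look along these lines. The chain name "lb-entry" and the 1h timeout are made-up placeholders, the real values come from "config", and the order in which the rules end up in the table may differ.

```
/ip firewall mangle
# opening: everything starts out as "noroute"
add chain="lb-entry" action="mark-packet" new-packet-mark="noroute" passthrough=yes disabled=no comment="[lb-entry]"
# find: destination already remembered for WAN1, reuse that route
add chain="lb-entry" dst-address-list="persistent-WAN1" action="mark-packet" new-packet-mark="WAN1" passthrough=yes disabled=no
add chain="lb-entry" packet-mark="!noroute" action="jump" jump-target="dst-add" disabled=no
# add: remember the destination for the chosen route, then go mark the connection
add chain="dst-add" packet-mark="WAN1" action="add-dst-to-address-list" address-list="persistent-WAN1" address-list-timeout="1h" disabled=no
add chain="dst-add" action="jump" jump-target="mark-new" disabled=no
```

With a non-default Ptime the add chain gets the Ptime as a suffix, so entries with different durations don't share one chain.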
script name: prng
policy: read
# pseudo-random number generator

# GLOBALS ----------------------------------------------------------------------

:global RANDOM;

# SCRIPT START -----------------------------------------------------------------

# time based seed
:local semilla1 "";
{
	:local time [:tostr [/system clock get time]];
# strip the ":" separators so the clock string can be converted to a number
	:for i from=0 to=([:len $time] -1) do={
		:local char [:pick $time $i ($i+1)];
		:if ($char!=":") do={:set semilla1 ($semilla1 . $char)};
	}
	:set semilla1 [:tonum $semilla1]
}

# firewall traffic counters based seed
:local semilla2 0
:foreach item in=[/ip firewall filter find bytes>0] do={
	:set semilla2 ($semilla2+[/ip firewall filter get $item bytes]);
}
:foreach item in=[/ip firewall nat find bytes>0] do={
	:set semilla2 ($semilla2+[/ip firewall nat get $item bytes]);
}
:foreach item in=[/ip firewall mangle find bytes>0] do={
	:set semilla2 ($semilla2+[/ip firewall mangle get $item bytes]);
}

# uptime based seed
:local semilla3 "";
{
	:local uptime [:tostr [/system resource get uptime]];
# strip the ":" separators so the uptime string can be converted to a number
	:for i from=0 to=([:len $uptime] -1) do={
		:local char [:pick $uptime $i ($i+1)];
		:if ($char!=":") do={:set semilla3 ($semilla3 . $char)};
	}
	:set semilla3 [:tonum $semilla3]
}

# cpu load seed
:local semilla4 [/system resource get cpu-load];

# mwc implementation (http://en.wikipedia.org/wiki/Multiply-with-carry)
:local mz ($semilla1+$semilla2);
:local mw ($semilla3+$semilla4);

:set mz (36969 * ($mz & 65535) + ($mz >> 16));
:set mw (18000 * ($mw & 65535) + ($mw >> 16));

:set RANDOM (($mz << 16) + ($mw & 65535) + 1 )
script name: route-uevent
policy: read,write (and extra as needed)
# user-defined actions for route events

# GLOBALS ----------------------------------------------------------------------

# params
:global RtEventParams;

# SCRIPT START -----------------------------------------------------------------
# get basic config
/system script run config
script name: env
policy: read
:while (true) do={/system script environment print brief without-paging ; :delay 1 ;} ;
Debugging:
You can run the scripts from the terminal and tweak the debug messages in "config"; they output to both log and terminal. The "env" script is handy to leave running in an ssh session or terminal while you test.
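A debugging session might look roughly like this. The logging rule at the end is an assumption on my side: it is only needed if the debug entries written by the scripts never show up in the log.

```
# watch the global variables; "env" loops once per second until you interrupt it
/system script run env

# in another terminal: see what the scripts logged
/log print

# assumption: only if script debug messages are not being logged at all
/system logging add topics=script,debug action=memory
```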

Installation
  1. Set up lists, routes, nat and filter as needed
  2. Copy/paste the scripts and adjust "config" to your needs
  3. For the initial mangle generation, set InitRulegen=true in config, run "init", and set InitRulegen=false again (this is cumbersome on purpose, so it's not easy to screw up a working mangle config)
  4. nothing more
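
Step 3 could go roughly like this from the terminal. The export line is an extra precaution that is not part of the scripts, and the file name is just a placeholder.

```
# optional: keep a copy of the current mangle rules before generating new ones
/ip firewall mangle export file=mangle-before-init

# with InitRulegen=true already set in "config"
/system script run init

# check what was generated, then set InitRulegen=false in "config" again
/ip firewall mangle print
```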

To do
  • ROS v4 compatibility
  • add PCC and ECMP
  • let config entries have a chain name, or autogenerate one
  • multilist entries
  • L7 "homemade" persistence: try to further mimic PCC-without-PCC
  • more detailed information on down interface/link/upstream hops
  • response to input interface state changes
  • optional per-entry dedicated monitoring
  • decouple scheduler/test strategies, aiming for dynamic behaviours in response to events/requests
  • better/parameterized prng
  • make the rulegen smarter than now and able to handle nat/filters/etc
  • better config/init mech
  • state snapshots
  • on-the-fly actions/modification of stuff at user request
  • improve route-uevent to make available a lib of mail/sms/other actions
Hope it works for you! If not, let me know so i can work it out.

Updates
Nov/26/09:
  • put all config in one place
  • added variable primary and backup routes per entry
  • added basic L7 persistence for loadbalanced entries
  • better data structs to keep track of state
  • mangle generator is now integrated with the monitor
  • decoupled response to route events and fine-grained state updates
  • the failover/fallback procedures now focus on updating route mark chains instead of routing marks, as entries are all variable multiroute now
  • code cleanup and varnames normalization
Nov/14/09:
  • corrected failover/fallback/loadbalancer-update to focus on the routing-mark rules, which turned out to be a simpler and more effective solution. To save cpu cycles, i'll still add switching any redundant rules off and on later.
  • if all routes are down the action is changed to mark packets so filter rejects them with icmp-network-unreachable. Whenever a route is back to life everything will go through that route, and back to normal as the rest of the routes come back (so yes, failover now works as it should).
  • discarded loadbalancer-on and loadbalancer-off as they aren't of much use. There are 6 scripts now.
  • minor change to the startup script
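
For the all-routes-down case in the Nov/14 update, the filter side could be a rule along these lines. This is only a sketch: it assumes the packets are left with the "noroute" packet mark that rulegen uses, which may not be the mark your version of the scripts sets.

```
/ip firewall filter
# assumption: packets that could not be given a live route still carry "noroute"
add chain=forward packet-mark=noroute action=reject reject-with=icmp-network-unreachable comment="all routes down"
```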

description

Posted: Fri Nov 12, 2010 9:17 am
by juaco
here i'll describe the logics used in the mangle chains and the scripts

but not now, as i'm going to sleep :lol:

bye

Re: first attempt at a generalized loadbalancer with failove

Posted: Sat Nov 13, 2010 12:24 pm
by rmichael
I have a few hopefully constructive comments (I'm a beginner so I may not be 100% correct here)

I would drop connection-state=invalid packets at the top of your filter. Otherwise you might be accepting connections that are not in conntrack through routing-mark accept rule.
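
A minimal sketch of such a rule, assuming the forwarded traffic is filtered in the forward chain. It has to sit above the accept rules to have any effect, so move it to the top after adding it.

```
/ip firewall filter
add chain=forward connection-state=invalid action=drop comment="drop invalid"
```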

Since connection/routing marks are first applied to packets originating from LAN it seems to me there's 50% (for two WANs) chance that traffic initiated from internet (port forwarding?) will be dropped since reply might not come out of the same interface it came from.

I don't know if you noticed, but nth counters get reset when some changes are made to the configuration (by script or manually). On top of that, nth seems to repeat the first count. Both flaws combined favor the first few nth rules (it probably matters more if there are more than two nth rules) and cause uneven distribution (at least in ROS 4.13).

Keep up the good work.

Re: first attempt at a generalized loadbalancer with failove

Posted: Sun Nov 14, 2010 4:42 am
by juaco
Hi

I think the issue with nth is that all this was made and tested in 3.x, where nth has two params: "every" (how much to count) and "packet" (on which count to match). In 4.x nth has 3 params; iirc the second is a "counter number", of which you have 8 (or something) available.

I'll expand on multi-version/algorithm semantics later. It's not difficult to change, though: look at the preparsing, in the "body" section where it generates the multi-route entries. That's where the code that generates the "nth=...." is.
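
To make it concrete, for an entry with EntrySelector="nth" and two routes named WAN1 and WAN2, the generator builds strings like the ones below (3.x semantics, nth=every,packet). The chain name "lb-entry" is a placeholder; on a version with different nth semantics it is the nth=... part that needs adapting.

```
/ip firewall mangle
# first packet of every 2 goes to WAN1, second to WAN2
add chain="lb-entry" nth=2,1 action="mark-packet" new-packet-mark="WAN1" passthrough=yes disabled=no
add chain="lb-entry" nth=2,2 action="mark-packet" new-packet-mark="WAN2" passthrough=yes disabled=no
```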

Also take into account that the chain logics described in the first post are not exactly the same as those produced by the config generator, so errors may be expected if route-check/failover/fallback are run against an autogenerated mangle. Right now everything is in the process of being integrated, and then i'll repost the new working set. Meanwhile i'll mention it in the op to avoid confusion.

rmichael wrote:
I would drop connection-state=invalid packets at the top of your filter. Otherwise you might be accepting connections that are not in conntrack through routing-mark accept rule.

Since connection/routing marks are first applied to packets originating from LAN it seems to me there's 50% (for two WANs) chance that traffic initiated from internet (port forwarding?) will be dropped since reply might not come out of the same interface it came from.

The filter rules just let through what comes from the LAN and is route marked. I can't visualize how invalid packets would get routing marks; can you please describe more or less what their path would be?

Traffic initiated from the Internet won't be matched by the first prerouting section, as it's new and can't have a connection mark. It won't be matched by the other prerouting rules either, as they match new connections from local interfaces (&& src-address in a list). So the connection won't get marked, and is just (d)nated / routed as if the mangles didn't exist. When the response comes back, it is connection tracking that "automatically" routes it correctly, so unless i specifically touch the responses, they will get to the initiator. I've been testing with dnat constantly, as i first have to ssh from the outside to an internal machine through dnat to test the balancing from that machine. For connections from the outside, there's no mangling at all.

The same holds when you don't loadbalance but just have multiple WANs: apart from having conntrack on, you don't have to do anything special for dnat to work and for responses not to get lost.
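
The kind of port forward i test with looks more or less like this. The interface name and the internal address are placeholders, not values from my setup; the point is that none of the generated mangle rules match a connection that arrives this way.

```
/ip firewall nat
# hypothetical: ssh arriving on WAN1 is forwarded to an internal machine
add chain=dstnat in-interface=WAN1 protocol=tcp dst-port=22 action=dst-nat to-addresses=192.168.0.10 to-ports=22
```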

Hope my answers make some sense.

Thank you for the comments, testing and kind words.