Basically it all boils down to packet processing. Setting HTB and the other router processes aside for the moment and focusing only on the firewall processing side of the MT OS, it comes down to how many packets the MT can process per second. The more, the better. The part some people can't quite get their heads around is that the MT OS actually evaluates the same packet against several rules. This is exactly why most efficient firewall rule sets start with a common set of connection-state rules: established, related and invalid come first, then the allow rules (for example port 80 for HTTP). When the first packet of a new connection to port 80 passes through the router, it is checked against the rules in order. The first rule (established) does not match because the connection is still new. The second (related) does not match because the packet is not related to an existing connection. The third (invalid) does not match either, so the packet moves on to the port 80 rule, where it fits the parameters and is accepted. That first packet has been evaluated against four rules before it leaves the router, so in effect it was processed four times. Because the established and related rules are at the top, however, the next packet in the connection (now established) matches the very first rule and skips all further processing, so it is processed only once. Fewer rule evaluations per packet means less overhead in the router and therefore better overall performance.
In your case, when you place the suggested mark-connection rule before the mark-packet rule, the packets are processed twice: once for the connection and again for the packet. Placing the packet rule first reduces the work, since each packet only passes through one rule. In a simple rule set this is more efficient; with a complex set of rules, however, you would see greater efficiency by placing the connection rules first.
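To make the rule counting above concrete, here is a rough Python sketch of a first-match filter chain with the established/related/invalid rules on top and a port 80 accept rule below. It is a toy model, not how RouterOS is implemented: the Packet, Rule, track and run_chain names and the addresses are made up for illustration, and the tracker is simplified (real connection tracking only calls a flow established once it has seen traffic in both directions; here any repeat packet counts as established).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port); hypothetical, simplified flow key
FlowKey = Tuple[str, int, str, int]


@dataclass
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    state: str = "new"  # set by track() below


@dataclass
class Rule:
    name: str
    matches: Callable[[Packet], bool]
    action: str  # "accept" or "drop"


def track(pkt: Packet, table: Dict[FlowKey, bool]) -> None:
    """Simplified tracker: first packet of a flow is 'new', any repeat is 'established'."""
    key = (pkt.src, pkt.sport, pkt.dst, pkt.dport)
    pkt.state = "established" if key in table else "new"
    table[key] = True


def run_chain(pkt: Packet, rules: List[Rule]) -> Tuple[str, int]:
    """Walk the rules top to bottom and stop at the first match.
    Returns (action, number of rules the packet was evaluated against)."""
    for i, rule in enumerate(rules, start=1):
        if rule.matches(pkt):
            return rule.action, i
    return "no match", len(rules)


rules = [
    Rule("established", lambda p: p.state == "established", "accept"),
    Rule("related", lambda p: p.state == "related", "accept"),
    Rule("invalid", lambda p: p.state == "invalid", "drop"),
    Rule("http", lambda p: p.dport == 80, "accept"),
]

table: Dict[FlowKey, bool] = {}
results = []
for _ in range(3):  # three packets of the same connection
    pkt = Packet("192.0.2.10", 50000, "198.51.100.5", 80)
    track(pkt, table)
    results.append(run_chain(pkt, rules))

print(results)  # [('accept', 4), ('accept', 1), ('accept', 1)]
```

The first packet is checked against four rules, and every packet after it stops at the established rule, which is the whole point of putting the connection-state rules first.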
For example, let's say you wanted to identify all web traffic. This would include HTTP (port 80), HTTPS (port 443) and, let's say, proxy traffic on port 8080 as well. All of these are web traffic. You would place a connection-mark rule first, then three rules below it, one for each port. Once a packet is identified by one of the lower three rules, its connection is flagged as a web traffic connection, and all subsequent packets in that connection are tagged by the first rule. Those packets stop right at the first processing point, which reduces the number of rule evaluations the router has to perform. Hope this makes sense; it's much easier to grasp with a visual aid. If it doesn't, let me know and I will be happy to elaborate a bit more with some examples.
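Here is a rough Python sketch that counts rule evaluations for the web traffic example, comparing three plain per-port packet rules with a "connection mark first" layout. It is a simplified model, not RouterOS itself: in RouterOS the mark-connection and mark-packet steps are separate mangle actions, while here the port rule marks both in one step. The function names, addresses and the 80/443/8080 rule order are assumptions for illustration, and only one direction of traffic is counted.

```python
from typing import Dict, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)
WEB_PORTS = [80, 443, 8080]  # assumed rule order: 80, then 443, then 8080


def packet_rules_only(dport: int) -> Tuple[bool, int]:
    """One mark-packet rule per port; first match stops.
    Returns (marked, rules evaluated)."""
    for i, port in enumerate(WEB_PORTS, start=1):
        if dport == port:
            return True, i
    return False, len(WEB_PORTS)


def connection_mark_first(key: FlowKey, marks: Dict[FlowKey, str]) -> Tuple[bool, int]:
    """Rule 1 marks the packet if its connection already carries the 'web' mark.
    Rules 2-4 mark the connection (and, in this simplified model, the packet) by port."""
    if marks.get(key) == "web":
        return True, 1
    for i, port in enumerate(WEB_PORTS, start=2):
        if key[3] == port:
            marks[key] = "web"
            return True, i
    return False, 1 + len(WEB_PORTS)


def total_evaluations(dport: int, packets: int) -> Tuple[int, int]:
    """Total rule evaluations for one connection of `packets` packets:
    (packet rules only, connection mark first)."""
    key: FlowKey = ("192.0.2.10", 50000, "198.51.100.5", dport)
    marks: Dict[FlowKey, str] = {}
    a = sum(packet_rules_only(dport)[1] for _ in range(packets))
    b = sum(connection_mark_first(key, marks)[1] for _ in range(packets))
    return a, b


for port in WEB_PORTS:
    print(port, total_evaluations(port, 10))
# 80 (10, 11)
# 443 (20, 12)
# 8080 (30, 13)
```

Traffic that matches the very first port rule is slightly cheaper without the connection rule, while traffic further down the list is much cheaper with it, which is the "it depends on the rule set" point in numbers.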
So to answer your question: you are not wrong, and neither are the examples. Whether one way or the other is more efficient depends on the overall rule set.
On the second question, about UDP being a connectionless protocol: as far as the router is concerned, that is just not the case. The MT OS uses (or can use) connection tracking, which is where the established and related connection states come from. Connection tracking allows the kernel to keep track of all logical network connections or "sessions", and thereby relate all of the packets that make up each connection. This significantly improves the router's ability to run only the establishing, or "new", packet through the full rule set, instead of processing each and every packet through every rule. Once the router has identified a port 5405 connection for VoIP, for example, the traffic between source and destination is tracked, and any related ports that connection may open are classed as "related" and therefore allowed. Not only does this keep packets from being processed a hundred times through your rules; NAT also relies on this information to translate all related packets in the same way, and the MT can use it to act in many other creative and effective ways. So UDP is not a connectionless protocol per se as the MT sees it. It is simply a lower-overhead protocol: it carries no connection-establishment handshake or connection-state information in the packet itself, which makes it an efficient protocol that does about as little as a transport protocol can (as defined in RFC 768). The router still tracks the flow as a "connection", identified by its source and destination addresses and ports, and treats it as such. The connectionless property of UDP refers to the packet itself, not to how the router or other transport device handles the flow.
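To show how a router can treat UDP as a "connection" even though the packets carry no connection state, here is a rough Python sketch of a toy connection tracker. It groups packets by protocol, addresses and ports, calls a flow "new" until a reply is seen and "established" after that, and forgets flows that sit idle past a timeout (UDP has no FIN/RST, so idling out is how a flow ends). The ConnTracker class, the 30 second timeout and the addresses are hypothetical, and "related" flows (which in practice come from protocol helpers) are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (protocol, src_ip, src_port, dst_ip, dst_port)
FlowKey = Tuple[str, str, int, str, int]


@dataclass
class Flow:
    last_seen: float
    seen_reply: bool = False


class ConnTracker:
    """Toy connection tracker: groups packets into flows by 5-tuple and
    forgets flows that have been idle longer than `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.flows: Dict[FlowKey, Flow] = {}

    @staticmethod
    def reverse(key: FlowKey) -> FlowKey:
        proto, src, sport, dst, dport = key
        return (proto, dst, dport, src, sport)

    def classify(self, key: FlowKey, now: float) -> str:
        # Expire idle flows; with no FIN/RST in UDP, a timeout is the only way a flow ends.
        self.flows = {k: f for k, f in self.flows.items() if now - f.last_seen <= self.timeout}

        if key in self.flows:  # more traffic in the original direction
            flow = self.flows[key]
            flow.last_seen = now
            return "established" if flow.seen_reply else "new"

        reverse_key = self.reverse(key)
        if reverse_key in self.flows:  # traffic coming back from the far end
            flow = self.flows[reverse_key]
            flow.last_seen = now
            flow.seen_reply = True
            return "established"

        self.flows[key] = Flow(last_seen=now)  # first packet of a new flow
        return "new"


ct = ConnTracker(timeout=30.0)
out = ("udp", "192.0.2.10", 40000, "198.51.100.5", 5405)
back = ("udp", "198.51.100.5", 5405, "192.0.2.10", 40000)
states = [
    ct.classify(out, 0.0),    # first packet
    ct.classify(back, 0.5),   # reply
    ct.classify(out, 1.0),    # next packet, flow now established
    ct.classify(out, 100.0),  # idle for 99 s, flow expired, starts over
]
print(states)  # ['new', 'established', 'established', 'new']
```

Only the "new" packets would need to go through the full rule set; everything classified as established can be accepted by the first rule, just like with TCP.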
Hope all that makes sense.