Aren't routing decisions possible with L3 HW offloading?
Some routing decisions are, yes, but not all. Note the frequent lack of "HW" in the second column.
Even in simple cases like IPv4 unicast, someone has to either configure the router with a static route or provide the route by other means: implicitly through VPN configuration, dynamically through OSPF, etc. Until someone tells the router what to route, it can't hardware-offload it. Even then, the "gateway=" case further down that list shows how routing can require CPU intervention.
L3HW routing is better suited to cases like inter-VLAN routing, where some outside agency applies the VLAN tags. That turns it into what is basically a switching problem for a switch chip that knows how to pair 802.1Q tags with routing rules.
the 10 Gb/s advertised throughput
That's conditional. Are you doing full-size packets with no firewall rules and no queues?
The test results show 2.6 Gbit/s for a typical IP firewall setup with an average packet size of 512 bytes, for instance.
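One way to see why the packet size matters: forwarding work is paid per packet, so a bit-rate figure only means something alongside a packet size. A quick sketch converting the figures above into packet rates, assuming the 512 bytes is the whole unit on the wire (Ethernet preamble, inter-frame gap, and FCS are ignored, so this is a rough estimate):

```python
def packets_per_second(throughput_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to carry throughput_bps at a fixed packet size.

    Ignores Ethernet preamble, inter-frame gap, and FCS, so this is only
    a rough estimate of the real on-the-wire packet rate.
    """
    return throughput_bps / (packet_bytes * 8)


# Figures from the discussion: 2.6 Gbit/s measured with firewall rules,
# 10 Gbit/s advertised, 512-byte average packets.
measured = packets_per_second(2.6e9, 512)
advertised = packets_per_second(10e9, 512)
print(f"measured:   {measured:,.0f} pps")
print(f"advertised: {advertised:,.0f} pps")
print(f"ratio:      {advertised / measured:.1f}x")
```

Hitting the advertised figure with 512-byte packets would take roughly 2.4 million packets per second, about 3.8 times the packet rate the firewall test achieved.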
I'm only getting around 1.5 Gb/s to around 5 Gb/s between two VLANs
Inter-VLAN routing is a pretty weak form of "routing." It's basically fixed, which is why it can be hardware-offloaded.
When I say "proper router," I think of IP firewall rules, VPNs, queueing, and such. These all require CPU intervention. IP firewall rules and queueing do because they're handled by the underlying Linux kernel. VPNs do because they're implemented either in the kernel or in userspace, so all they can hardware-offload is the crypto, and not always even then.
depending on how many parallel threads I start (all tested with iperf3 without any special parameters).
Between which two ports? There's only one 10G port on the box, and one 2.5G. Everything else is gigabit.
If you were relying on full duplex on the lone 10G port, with one direction feeding the other, how is that realistic? You don't buy a router to route traffic from one port back out the same port. In real-world cases it'll bottleneck at 2.5 or 1 Gbit/s regardless, even with the ideal L3HW feature set.
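To make the port-speed point concrete, here is a small sketch of the ceiling on routed throughput between two different ports. The port names are hypothetical; the speeds follow the description above (one 10G, one 2.5G, the rest gigabit). However fast the forwarding path is, a flow can't leave faster than the slower of its two links.

```python
# Hypothetical port names; speeds in Gbit/s follow the description above:
# one 10G port, one 2.5G port, everything else gigabit.
PORT_GBPS = {
    "port-10g": 10.0,
    "port-2.5g": 2.5,
    "port-1g": 1.0,
}


def routed_ceiling_gbps(ingress: str, egress: str) -> float:
    """Upper bound on one-way routed throughput from ingress to egress.

    The slower link caps the flow, regardless of how fast the router
    (CPU or switch chip) can forward packets internally.
    """
    return min(PORT_GBPS[ingress], PORT_GBPS[egress])


for src, dst in [("port-10g", "port-2.5g"), ("port-10g", "port-1g"), ("port-2.5g", "port-1g")]:
    print(f"{src} -> {dst}: at most {routed_ceiling_gbps(src, dst)} Gbit/s")
```

Adding more parallel iperf3 streams doesn't change this bound; it only helps fill the slower link.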
I thought L3 HW means that the device can do all the routing (and routing decisions) in hardware
Nope, not even in the best CRS cases. See the first link.
switches like the CRS326 are "dumbed down" when doing L3 HW?
If by that you mean they don't have every RouterOS feature hardware-offloaded, then yes.