Any selector will still use connection tracking to match, so I can't see the reason for that. Queue trees are nice and simple; just do everything in mangle.
If you need a solution that does not involve mangle, use simple queues; there are plenty of selectors there.
P.S. 600 mangle rules and 300 queues for 20 Mbps of traffic is not normal. What's your latency?
----
Oh no, look at this (http://lartc.org/howto/lartc.adv-filter.html#AEN1289):
"The U32 selector contains definition of the pattern, that will be matched to the currently processed packet. Precisely, it defines which bits are to be matched in the packet header and nothing more, but this simple method is very powerful. Let's take a look at the following examples, taken directly from a pretty complex, real-world filter:"
This sounds like this selector does not use connection tracking.
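A minimal Python sketch of what such a u32 match does, assuming an IPv4 header without options. The key is just a 32-bit word at a fixed byte offset plus a mask and a value, and the packet matches when (word & mask) == (value & mask); nothing about the connection is looked up, only header bits. The offsets (0 for the word holding TOS, 12 for src, 16 for dst) follow the standard IPv4 header layout. The helper names `u32_match` and `build_ipv4_header` are hypothetical, and this illustrates the idea rather than the kernel's implementation.

```python
import socket
import struct


def u32_match(packet: bytes, offset: int, value: int, mask: int) -> bool:
    """Return True if the big-endian 32-bit word at `offset` matches `value` under `mask`.

    Only header bits are compared; no connection-tracking state is consulted.
    """
    if offset + 4 > len(packet):
        return False  # word would run past the end of the packet
    (word,) = struct.unpack_from("!I", packet, offset)
    return (word & mask) == (value & mask)


def build_ipv4_header(src: str, dst: str, tos: int = 0, proto: int = 6) -> bytes:
    """Build a minimal 20-byte IPv4 header for illustration (hypothetical helper)."""
    version_ihl = (4 << 4) | 5  # IPv4, IHL = 5 words = 20 bytes, no options
    total_length = 20           # header only, no payload
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, tos, total_length,
        0,       # identification
        0,       # flags + fragment offset
        64,      # TTL
        proto,   # 6 = TCP
        0,       # header checksum, not computed in this sketch
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


if __name__ == "__main__":
    hdr = build_ipv4_header("10.1.2.3", "192.168.1.10", tos=0x10)
    print(u32_match(hdr, 12, 0x0A000000, 0xFF000000))  # src in 10.0.0.0/8 -> True
    print(u32_match(hdr, 16, 0xC0A80100, 0xFFFFFF00))  # dst in 192.168.1.0/24 -> True
    print(u32_match(hdr, 0, 0x00100000, 0x00FF0000))   # TOS == 0x10 -> True
```

Each check is one load, one AND and one compare on the packet itself, which is the "pattern matched directly against header bits" the LARTC text describes.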
There is a difference between mangling packets (label + check + classify)
and just check + classify.
Firewall rules are processed for every packet from top to bottom until a rule's pattern matches (a match ends rule processing).
This process is what kills the CPU.
If you can use a selector that compares the pattern directly with a packet field (src, dst, TOS), this process can be much more efficient.
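To make the cost argument concrete, here is a rough Python toy (an assumption-laden sketch, not a model of RouterOS or Linux internals): a first-match chain walked top to bottom that counts how many rules a packet is checked against, next to a single dictionary lookup keyed on the destination address. The 300 per-host rules and the class names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical rule: (network, class name); first match wins, like a firewall chain.
Rule = Tuple[ipaddress.IPv4Network, str]


def classify_linear(dst: str, rules: List[Rule]) -> Tuple[Optional[str], int]:
    """Walk rules top to bottom, stop at the first match; return (class, rules checked)."""
    addr = ipaddress.IPv4Address(dst)
    checked = 0
    for network, cls in rules:
        checked += 1
        if addr in network:
            return cls, checked
    return None, checked  # no rule matched: every rule was checked


def classify_exact(dst: str, table: Dict[str, str]) -> Optional[str]:
    """One hash lookup keyed on the packet field itself (exact-address keys only)."""
    return table.get(dst)


# 300 hypothetical per-host rules, 10.0.0.1 .. 10.0.1.44
hosts = [str(ipaddress.IPv4Address("10.0.0.1") + i) for i in range(300)]
rules: List[Rule] = [(ipaddress.IPv4Network(h + "/32"), f"class-{i}") for i, h in enumerate(hosts)]
table: Dict[str, str] = {h: f"class-{i}" for i, h in enumerate(hosts)}

if __name__ == "__main__":
    print(classify_linear("10.0.0.1", rules))   # first rule: 1 check
    print(classify_linear("10.0.1.44", rules))  # last rule: 300 checks
    print(classify_exact("10.0.1.44", table))   # one lookup
```

A packet that matches the last rule, or no rule at all, pays for every rule above it, and that cost is paid per packet. The dictionary only handles exact addresses; matching prefixes directly would need a different structure, which this sketch does not attempt.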
About latency: it can't come from rule processing if I still have 40-50% of CPU power free. Latency from QoS comes from packet dropping (dropped packets must be retransmitted).
Best Regards Tomasz