The core design decision: routing should be invisible
The alternative to automatic routing is asking users to pick a model. We rejected that early. Most users do not know which model tier is right for their query, do not want to think about it, and will default to whichever tier they know — usually the first one. Asking creates friction, creates wrong choices, and creates a product experience that feels technical rather than useful.
The classifier exists to make that choice on behalf of the user, every time, transparently. The goal is that users simply ask questions and get appropriately capable responses — whether that is a fast one-line answer or a detailed multi-paragraph breakdown — without any awareness that a routing decision was made at all.
How the scoring works
Each message is scored against two pattern sets. The complex set covers words and phrases that tend to predict queries needing more depth: 'explain,' 'analyze,' 'compare,' 'algorithm,' 'write a,' 'create a,' 'step-by-step,' 'essay,' 'report,' 'documentation,' 'pros and cons,' 'implications,' and academic subject names including philosophy, ethics, economics, physics, chemistry, biology, mathematics, psychology, and sociology. Each pattern that matches adds to the complexity score.
The simple set covers patterns that almost never need a deep response: greetings, 'how are you,' 'what is your name,' 'are you an AI,' and similar identity or social openers. A simple pattern match returns immediately with a 'simple' classification, skipping the rest of the scoring. This short-circuit is intentional — it keeps fast exchanges fast without running the full scoring logic on 'hey.'
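The two-pass scoring and the short-circuit can be sketched as follows. This is a minimal illustration, not the production classifier: the pattern lists are abbreviated from the ones above, and the score threshold of 2 is an assumed placeholder.

```python
import re

# Abbreviated pattern sets; the real lists are longer (see above).
SIMPLE_PATTERNS = [
    r"^(hi|hey|hello)\b", r"how are you", r"what is your name", r"are you an ai",
]
COMPLEX_PATTERNS = [
    r"\bexplain\b", r"\banalyze\b", r"\bcompare\b", r"\balgorithm\b",
    r"\bwrite a\b", r"\bcreate a\b", r"step-by-step", r"\bessay\b",
    r"\breport\b", r"\bdocumentation\b", r"pros and cons", r"\bimplications\b",
    r"\b(philosophy|ethics|economics|physics|chemistry|biology"
    r"|mathematics|psychology|sociology)\b",
]

def classify(message: str) -> str:
    text = message.lower().strip()
    # Short-circuit: social/identity openers return immediately,
    # skipping the full scoring pass entirely.
    if any(re.search(p, text) for p in SIMPLE_PATTERNS):
        return "simple"
    # Each matching complex pattern adds one point to the score.
    score = sum(1 for p in COMPLEX_PATTERNS if re.search(p, text))
    return "complex" if score >= 2 else "simple"  # threshold of 2 is assumed
```

The short-circuit means 'hey' never touches the complex pattern list, which is exactly the latency property described above.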
Length as a signal
Keyword patterns miss a lot of complex queries that do not use the specific words we listed. A detailed question about an obscure topic might be 200 characters with no keywords from either set. Length is a strong secondary signal: any message over 180 characters is treated as complex automatically, regardless of keyword score. Messages between 80 and 180 characters with at least one complex keyword or two question marks also push the score toward the complex threshold.
The 180-character cutoff was chosen based on the observation that very short queries — even technical ones — are usually answerable with a brief response, while longer queries almost always benefit from a more capable model. The 80-character secondary threshold catches medium-length messages where a single keyword provides enough signal. These thresholds are not arbitrary, but neither are they scientifically derived — they are calibrated starting points that we expect to adjust as real usage data comes in from the beta.
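One way to model the length heuristics is as an adjustment layered on top of the keyword score. This sketch uses the 180- and 80-character cutoffs from the text; the `COMPLEX_THRESHOLD` value and the exact "push toward the threshold" mechanics (here, adding one point) are assumptions for illustration.

```python
COMPLEX_THRESHOLD = 2  # assumed keyword-score threshold, not a confirmed value

def length_adjusted_score(message: str, keyword_score: int) -> int:
    """Apply the length signals on top of the raw keyword score."""
    text = message.strip()
    if len(text) > 180:
        # Long queries are treated as complex outright, regardless of keywords.
        return COMPLEX_THRESHOLD
    if 80 <= len(text) <= 180 and (keyword_score >= 1 or text.count("?") >= 2):
        # Medium-length messages with one keyword or two question marks
        # get nudged toward the complex threshold.
        return keyword_score + 1
    return keyword_score
```

A 200-character question about an obscure topic with zero keyword matches still scores at the threshold, which is the gap in keyword-only scoring this signal exists to close.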
Fallback and the ack filter
If the target model tier is not live, the classifier falls back gracefully to the base model. Users always get a response. The fallback is transparent — no error, no explanation, just the best available response. This design means the classifier can be built and deployed before all model tiers exist, and the routing logic evolves in place as tiers come online.
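The fallback itself is small enough to show in full. A minimal sketch, assuming tiers are identified by name and liveness is known at routing time; the function and tier names are hypothetical:

```python
def route_tier(target: str, live_tiers: set[str], base: str = "base") -> str:
    """Return the target tier if it is live, otherwise fall back to base.

    The fallback is silent by design: no error surfaces to the user,
    they just get the best available model.
    """
    return target if target in live_tiers else base
```

Because the routing decision and the liveness check are decoupled, the classifier can start emitting 'complex' verdicts before the complex tier exists — those requests simply resolve to the base model until the tier comes online.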
Before the classifier runs at all, an acknowledgment filter checks for filler responses — messages like 'ok,' 'thanks,' 'lol,' 'got it,' or any message under 8 characters with no question mark. These get a lightweight casual reply from a small static pool without touching the classifier or the AI at all. This matters for latency and for keeping AI call counts down on messages that simply do not need an AI response.