Engineering · 8 min read

From Discord message to AI reply: inside the Ophraxx pipeline

A detailed walkthrough of everything that happens between a user sending a message in Discord and receiving an Ophraxx AI reply — spanning spam detection, rate limiting, model routing, live tool calls, conversation memory, and chunked output delivery.


Step 1: Pre-flight checks

Every incoming Discord message hits a series of pre-flight checks before any AI work begins. The bot ignores messages from other bots, messages with no text content, messages that are only emojis or GIFs, and messages outside the configured AI channel or mention scope. This filtering keeps the pipeline from wasting resources on noise.
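A minimal sketch of that filter might look like the following. The interface, function names, and the emoji-detection regex are illustrative, not Ophraxx's actual code; in particular, attachment-only GIF detection is simplified away here.

```typescript
interface IncomingMessage {
  authorIsBot: boolean;
  content: string;
  channelId: string;
  mentionsBot: boolean;
}

// Matches messages made up entirely of emoji (including custom Discord
// emoji like <a:name:id>) and whitespace.
const EMOJI_ONLY = /^(\p{Extended_Pictographic}|\s|<a?:\w+:\d+>)+$/u;

// Returns true if the message should be dropped before any AI work.
function shouldIgnore(msg: IncomingMessage, aiChannelId: string): boolean {
  if (msg.authorIsBot) return true;                      // other bots
  const text = msg.content.trim();
  if (text.length === 0) return true;                    // no text content
  if (EMOJI_ONLY.test(text)) return true;                // emoji-only noise
  if (msg.channelId !== aiChannelId && !msg.mentionsBot) // outside AI scope
    return true;
  return false;
}
```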

Next, a deduplication guard checks whether the message has already been picked up for processing. Discord can occasionally deliver the same message more than once — an internal short-lived cache prevents duplicate AI calls from racing each other.
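The guard can be as simple as an in-memory map of message IDs with an expiry. This is a sketch; the 30-second TTL is an assumption, since the post only says the cache is short-lived.

```typescript
class DedupCache {
  private seen = new Map<string, number>(); // message id -> expiry (ms)

  constructor(private ttlMs: number = 30_000) {} // TTL is assumed

  // Returns true only the first time an id is seen within the TTL window,
  // so concurrent deliveries of the same message can't both proceed.
  claim(id: string, now: number = Date.now()): boolean {
    for (const [k, exp] of this.seen) if (exp <= now) this.seen.delete(k);
    if (this.seen.has(id)) return false;
    this.seen.set(id, now + this.ttlMs);
    return true;
  }
}
```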

Then the system checks the guild configuration stored on our backend. If the guild has not run the setup command, or if AI is disabled in its config, the message is dropped silently. This means the bot only activates in servers that have explicitly configured it.

Step 2: Spam and rate limit enforcement

Before any content moderation runs, the spam pre-fire check evaluates the user's recent message rate. If a user sends too many messages within an 8-second window, hard spam is declared. The bot immediately records a violation, attempts to apply a 60-second Discord timeout to the member, soft-blocks the user from AI for 5 minutes, and sends a detailed embed to the server's mod log channel with the message count, channel, and a content preview.
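The rate check itself is a sliding window. Here is a sketch: the 8-second window is from the post, but the threshold of 5 messages is an assumption, since the post does not give the exact count.

```typescript
const WINDOW_MS = 8_000;          // window from the post
const HARD_SPAM_THRESHOLD = 5;    // assumed threshold, not from the post

const recentByUser = new Map<string, number[]>(); // userId -> timestamps

// Records the message and reports whether the user crossed the spam line.
function isHardSpam(userId: string, now: number = Date.now()): boolean {
  const times = (recentByUser.get(userId) ?? []).filter(t => now - t < WINDOW_MS);
  times.push(now);
  recentByUser.set(userId, times);
  return times.length >= HARD_SPAM_THRESHOLD;
}
```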

On top of spam detection, the pipeline checks daily and monthly usage limits tracked on our backend. Each user is capped at 25 AI requests per day. Each guild is capped at 3,000 requests per month. When either limit is hit, the user receives a clear message explaining exactly why they are rate-limited and when the limit resets.
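A sketch of those quota checks, using the caps from the post (25/user/day, 3,000/guild/month). Storage here is in-memory for illustration; the real backend persists the counters. Note that in this simplified version the user counter is consumed even when the guild cap then blocks the request.

```typescript
const DAILY_USER_CAP = 25;
const MONTHLY_GUILD_CAP = 3_000;

type Quota = { count: number; periodKey: string };
const userDaily = new Map<string, Quota>();
const guildMonthly = new Map<string, Quota>();

// Increments a counter for the current period, resetting when the period
// rolls over; returns false if the cap is already reached.
function consume(map: Map<string, Quota>, id: string, periodKey: string, cap: number): boolean {
  const q = map.get(id);
  if (!q || q.periodKey !== periodKey) {
    map.set(id, { count: 1, periodKey });
    return true;
  }
  if (q.count >= cap) return false;
  q.count++;
  return true;
}

function allowRequest(userId: string, guildId: string, date: Date = new Date()): boolean {
  const day = date.toISOString().slice(0, 10); // e.g. "2024-05-01"
  const month = day.slice(0, 7);               // e.g. "2024-05"
  return consume(userDaily, userId, day, DAILY_USER_CAP)
      && consume(guildMonthly, guildId, month, MONTHLY_GUILD_CAP);
}
```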

Step 3: Content moderation and blacklist

After rate checks pass, the message content goes through the content moderation layer and the blocklist check. The blocklist contains user and guild IDs that are permanently banned from AI access — checked at both the message handler and the slash command handler. Blocked users receive no response and no indication that the block exists.

Content moderation runs the pattern-based filter against the nine threat categories. If a match fires, the violation is logged and the user receives a specific refusal message tied to the exact category. The system also checks whether the user is currently soft-blocked from a prior spam or violation event.
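In spirit, the filter maps each category to its patterns and returns the first category that fires, so the refusal message can be tied to it. The two categories and patterns below are placeholders; the post only says there are nine categories.

```typescript
// Illustrative patterns only -- the real filter covers nine categories.
const THREAT_PATTERNS: Record<string, RegExp> = {
  malware: /\b(keylogger|ransomware)\b/i,
  doxxing: /\b(home address of|dox)\b/i,
};

// Returns the matched category name, or null if the message is clean.
function matchThreatCategory(text: string): string | null {
  for (const [category, pattern] of Object.entries(THREAT_PATTERNS)) {
    if (pattern.test(text)) return category;
  }
  return null;
}
```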

Step 4: Ack filtering and prompt classification

Before spending a model call, the pipeline checks whether the message is a filler acknowledgment — words like 'ok', 'thanks', 'lol', 'got it', or messages under 8 characters with no question mark. These receive a lightweight casual reply chosen from a small static list without touching the AI at all.
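That check reduces to a set lookup plus a length heuristic. The word list below mirrors the examples in the post; the canned replies are illustrative.

```typescript
const ACKS = new Set(["ok", "okay", "thanks", "thank you", "lol", "got it"]);
const CASUAL_REPLIES = ["👍", "Anytime!", "Glad to help!"]; // illustrative

// Returns a canned reply for filler acknowledgments, or null for real queries.
function ackReply(text: string): string | null {
  const norm = text.trim().toLowerCase();
  const isAck = ACKS.has(norm) || (norm.length < 8 && !norm.includes("?"));
  if (!isAck) return null;
  return CASUAL_REPLIES[Math.floor(Math.random() * CASUAL_REPLIES.length)];
}
```

Short questions like "why?" still pass through, because the length shortcut only applies when there is no question mark.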

For real queries, the prompt classifier runs next. It scores the message against complex keyword indicators like 'analyze', 'algorithm', 'write a', 'explain', 'step-by-step', and academic subject names, as well as simple indicators like greetings and identity questions. It also weighs message length and question density. A message over 180 characters is automatically complex. A high enough complexity score routes to the reasoning model; everything else goes to the fast base model.
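A rough sketch of that scoring. The keyword lists echo the examples above, but the weights and the decision threshold are assumptions; only the 180-character auto-complex rule is stated in the post.

```typescript
const COMPLEX_KEYWORDS = ["analyze", "algorithm", "write a", "explain", "step-by-step"];
const SIMPLE_KEYWORDS = ["hi", "hello", "who are you", "what are you"];

// Routes to the reasoning model when true, the fast base model when false.
function isComplex(text: string): boolean {
  if (text.length > 180) return true; // auto-complex rule from the post
  const lower = text.toLowerCase();
  let score = 0;
  for (const kw of COMPLEX_KEYWORDS) if (lower.includes(kw)) score += 2;
  for (const kw of SIMPLE_KEYWORDS) if (lower.includes(kw)) score -= 2;
  score += (lower.match(/\?/g) ?? []).length; // question density
  return score >= 2; // assumed threshold
}
```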

Step 5: Conversation memory and live tool context

Ophraxx AI maintains per-user conversation sessions. Each session stores up to 20 messages and expires after 15 minutes of inactivity. When a new message arrives, the session is loaded (or created), the full history is passed to the model as context, and the new message is appended. This gives the bot genuine multi-turn memory within a session — users can refer back to earlier parts of the conversation naturally.
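The session store can be sketched like this, using the two figures from the post (20 messages, 15-minute idle expiry); everything else is illustrative.

```typescript
const MAX_MESSAGES = 20;                // cap from the post
const SESSION_TTL_MS = 15 * 60 * 1000;  // idle expiry from the post

interface Session { messages: string[]; lastActive: number; }
const sessions = new Map<string, Session>();

// Loads (or creates) the user's session, appends the new message, and
// returns the history that gets passed to the model as context.
function appendToSession(userId: string, message: string, now: number = Date.now()): string[] {
  let s = sessions.get(userId);
  if (!s || now - s.lastActive > SESSION_TTL_MS) {
    s = { messages: [], lastActive: now }; // expired or new -> fresh session
    sessions.set(userId, s);
  }
  s.messages.push(message);
  if (s.messages.length > MAX_MESSAGES) s.messages.shift(); // drop oldest
  s.lastActive = now;
  return s.messages;
}
```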

Before the main model call is made, the tool dispatcher checks whether the query needs live external data. Weather queries are the primary case: if the message appears to be asking about current conditions, the dispatcher fetches real-time weather data from a live data provider and injects the full result directly into the system context. This means the model always has real, timestamped weather information rather than disclaiming that it cannot access live data.
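The dispatch decision and the context injection might look like this. The detection heuristic and the provider client are placeholders; the post only says that weather-like queries trigger a live fetch whose result lands in the system context.

```typescript
// Crude heuristic for "is this a weather question?" -- illustrative only.
function looksLikeWeatherQuery(text: string): boolean {
  return /\b(weather|temperature|forecast|raining|snowing)\b/i.test(text);
}

// Builds the extra system context; `fetchWeather` stands in for whatever
// live data provider client the real pipeline uses.
async function buildToolContext(
  text: string,
  fetchWeather: (query: string) => Promise<string>,
): Promise<string> {
  if (!looksLikeWeatherQuery(text)) return ""; // no tool needed
  const report = await fetchWeather(text);
  return `Live weather data (timestamped):\n${report}`;
}
```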

Step 6: Generation, validation, and delivery

The main model call runs through our AI infrastructure with the assembled system prompt, personality addendum, conversation history, and tool context. The safeguard layer then screens the output against the same nine threat categories before anything is sent. The fact-checker reviews the response for factual accuracy. Output sanitization strips any PII. If any check fails, the pipeline returns a safe error message rather than the problematic output.
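Conceptually, the post-generation stage is a chain of validators followed by sanitization: if any validator fails, the whole response is replaced with a safe error. The validators and the PII pattern below are stand-ins for the real nine-category screen, fact-checker, and sanitizer.

```typescript
const SAFE_ERROR = "Sorry, I can't share that response. Please try rephrasing.";

type Check = (text: string) => boolean;

// Placeholder validators -- the real ones are the nine-category safeguard
// screen and the fact-checker.
const passesSafeguard: Check = text => !/forbidden/i.test(text);
const passesFactCheck: Check = _text => true;

// Placeholder PII scrub: redact email-like strings.
function sanitize(text: string): string {
  return text.replace(/\b[\w.+-]+@[\w-]+\.\w+\b/g, "[redacted]");
}

function finalizeResponse(text: string): string {
  if (![passesSafeguard, passesFactCheck].every(check => check(text))) return SAFE_ERROR;
  return sanitize(text);
}
```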

Finally, the response goes through chunked delivery. Discord's message limit is 2,000 characters. Responses that exceed this are split using a priority hierarchy: paragraph breaks first, then line breaks, then sentence endings, then word boundaries, then hard cuts as a last resort. Each chunk is sent as a reply to the previous one to maintain a readable thread. The final chunk always includes the accuracy disclaimer footer and a row of feedback buttons — thumbs up, thumbs down, and a link to AI Support.
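The splitter described above can be sketched as a loop that tries each break type in priority order before falling back to a hard cut. This is a simplified version: the real splitter also threads replies and appends the disclaimer footer and feedback buttons, which are omitted here.

```typescript
const DISCORD_LIMIT = 2000;
// Priority order from the post: paragraphs, lines, sentences, words.
const BREAKS = ["\n\n", "\n", ". ", " "];

function chunkMessage(text: string, limit: number = DISCORD_LIMIT): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    let cut = -1;
    for (const sep of BREAKS) {
      // Find the last break that keeps the chunk within the limit.
      const idx = rest.lastIndexOf(sep, limit - sep.length);
      if (idx > 0) { cut = idx + sep.length; break; }
    }
    if (cut <= 0) cut = limit; // hard cut as a last resort
    chunks.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```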