AI Engineering·March 2026·5 min read

Session memory design: why we chose 15 minutes and 20 messages

A research note on the design of Ophraxx AI's conversation session system — the tradeoffs behind the 15-minute session timeout and 20-message history cap, and why sessions are stored in memory rather than persisted to a database.


Why conversation memory matters

A stateless AI — one that treats every message as a fresh conversation — cannot hold a coherent exchange. Users would need to re-explain context on every message, which makes the bot feel broken rather than helpful. Multi-turn memory allows users to say 'what about in Python?' after asking a question about JavaScript, or 'can you make that shorter?' without repeating what 'that' is. These are basic conversational capabilities that users expect.

At the same time, unbounded memory creates problems. Longer context windows increase response latency and cost. Very old messages in a session often become irrelevant and can confuse the model with stale context. The design challenge is choosing a window that is long enough to support natural conversation and short enough to keep the system fast and predictable.

The 15-minute timeout

The 15-minute inactivity timeout is calibrated to the pace of Discord conversations. In active conversations, 15 minutes is more than enough — users rarely pause for longer than that in the middle of an exchange. In slow or asynchronous conversations, a pause of 15 minutes usually signals that the topic has shifted and fresh context is more appropriate than carrying over a stale session.

When a session expires, the next message from that user simply starts a new session. There is no error and no notice; the expiry is invisible to the user. If a user comes back after 20 minutes wanting to continue a previous conversation, they include enough context in their message and the conversation continues naturally. The timeout is a background housekeeping mechanism, not a hard boundary that users are expected to be aware of.

The 20-message cap

The 20-message history cap bounds the maximum amount of prior conversation that gets passed to the model on each call. Without a cap, a very long session would eventually pass so much context that response latency increases noticeably and costs become hard to predict. The cap enforces a sliding window — the 20 most recent messages are always included, and older ones are dropped.

Twenty messages was chosen as a balance point. It is enough to carry a multi-topic conversation across several exchanges. It is few enough that context window sizes remain manageable across all three model tiers. And it roughly covers the amount of prior conversation that is actually useful for generating a relevant response — messages from early in a session are rarely needed to answer a question asked 18 messages later.
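One minimal way to get the sliding-window behavior described above is a bounded deque, which drops the oldest entry automatically once the cap is reached. This is a sketch of the technique, not the actual Ophraxx AI code; the message strings are placeholders.

```python
from collections import deque

MAX_HISTORY = 20  # sliding window: only the 20 most recent messages survive

history: deque[str] = deque(maxlen=MAX_HISTORY)

# Simulate a long session: 25 messages arrive over time.
for i in range(25):
    history.append(f"message {i}")

# The five oldest messages were evicted automatically by the deque.
assert len(history) == 20
assert list(history)[0] == "message 5"
```

The appeal of `deque(maxlen=...)` is that eviction is enforced at append time, so no call site can accidentally pass an unbounded history to the model.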

In-memory storage and the privacy tradeoff

Sessions are stored entirely in memory. They are never written to any persistent store — no database, no disk, no log file. This is a deliberate privacy decision. When a session expires or the bot restarts, the conversation history is gone. There is no way to retrieve it, audit it, or accidentally expose it.

The tradeoff is that sessions do not survive bot restarts. If the bot goes down mid-conversation, users lose their session context and start fresh. We considered this acceptable given the alternative: persisting conversation history introduces data storage obligations, access controls, retention policies, and potential exposure vectors. For a product in beta with a strong privacy posture, the in-memory approach is the right starting point. Long-term persistence is something we will revisit with appropriate controls if it becomes a clear user need.