The most product-specific part of EvalBoard is BYOK: Bring Your Own Key. Evaluation tools can burn through paid provider credits quickly, and I did not want the backend to become a vault for user secrets. The first version lets users paste provider keys in the browser, sends the key with the run request, and never writes it to the database.
The backend reads the X-API-Key header from each run request. If the selected provider requires a user key and none is present, the API rejects the run before making any external call. If a key is supplied, the execution thread passes it into call_llm. If no user key is supplied and the provider has a server-side environment key, the client falls back to that environment key.
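The key-resolution rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not EvalBoard's actual code: the names resolve_api_key, MissingKeyError, USER_KEY_REQUIRED, and PROVIDER_ENV_VARS are hypothetical, and so is the choice of which providers require a user key.

```python
import os
from typing import Mapping

# Hypothetical values: the text does not say which providers have a
# server-side key or which ones require a user-supplied key.
PROVIDER_ENV_VARS = {"groq": "GROQ_API_KEY"}
USER_KEY_REQUIRED = {"openai", "anthropic"}


class MissingKeyError(Exception):
    """Raised before any request is sent to a provider."""


def resolve_api_key(
    provider: str,
    headers: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Pick the key for one run: user key first, then the server fallback."""
    # A plain dict lookup is case-sensitive; a real framework's header
    # mapping would normally handle casing for you.
    user_key = (headers.get("X-API-Key") or "").strip()
    if user_key:
        return user_key

    if provider in USER_KEY_REQUIRED:
        raise MissingKeyError(f"{provider} requires a user-supplied key")

    env_name = PROVIDER_ENV_VARS.get(provider)
    server_key = environ.get(env_name, "") if env_name else ""
    if not server_key:
        raise MissingKeyError(f"no key available for {provider}")
    return server_key
```

The resolved key is returned to the caller and never assigned to anything durable, which is what keeps the backend stateless with respect to credentials.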
That makes the backend stateless with respect to user provider credentials. It stores runs and results, not secrets.
The provider abstraction
The provider client is intentionally small. Two dictionaries map provider names to environment variable names and OpenAI-compatible base URLs:
provider -> env var
provider -> base URL
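In Python, those two mappings might look roughly like the sketch below. The environment variable names are assumptions, and the base URLs are the OpenAI-compatible endpoints as I understand the providers to publish them; check both against current provider documentation. The helper chat_completions_url is hypothetical.

```python
# Assumed environment variable names, one per provider.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

# OpenAI-compatible base URLs (illustrative; verify before relying on them).
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai",
    "deepseek": "https://api.deepseek.com",
    "anthropic": "https://api.anthropic.com/v1",
}


def chat_completions_url(provider: str) -> str:
    """Build the chat endpoint URL for a known provider."""
    return PROVIDER_BASE_URLS[provider].rstrip("/") + "/chat/completions"
```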
The request body uses /chat/completions-style payloads: model, messages, and temperature. Groq, Gemini, OpenAI, and DeepSeek fit that shape directly. Anthropic needs a provider-specific header, so the code handles that case with a single conditional next to the HTTP call.
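A minimal sketch of that request construction follows. The function name build_chat_request is hypothetical, and the text does not say which header the real code sets for Anthropic; this sketch assumes the anthropic-version header used by Anthropic's API, purely for illustration.

```python
from typing import Dict, List, Tuple


def build_chat_request(
    provider: str,
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    temperature: float = 0.0,
) -> Tuple[Dict[str, str], Dict[str, object]]:
    """Return (headers, JSON payload) for a /chat/completions-style call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # The one provider-specific branch. Header name and value are assumptions.
    if provider == "anthropic":
        headers["anthropic-version"] = "2023-06-01"

    payload: Dict[str, object] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }
    return headers, payload
```

Keeping the special case next to the HTTP call means every other provider goes through exactly the same code path.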
This is a pragmatic kind of abstraction. It does not invent a full provider SDK. It notices that the providers expose similar enough HTTP surfaces and keeps the integration at that level. Adding another provider mostly means adding a base URL, an env var name, and frontend model options.
The tradeoff is that “OpenAI-compatible” does not mean identical forever. Providers differ on auth headers, model names, rate-limit behavior, response formats, streaming support, and error bodies. EvalBoard’s wrapper maps common errors like 401, 402, 429, timeout, and 5xx into LLMError, but a production-grade version would need provider-specific test coverage and probably a stricter response adapter.
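The error mapping described above could be sketched like this. Only the name LLMError comes from the project; its fields (kind, status, retryable), the category labels, and the helper to_llm_error are assumptions made for illustration.

```python
from typing import Optional


class LLMError(Exception):
    """One error type for every provider failure (fields are assumed)."""

    def __init__(self, kind: str, message: str,
                 status: Optional[int] = None, retryable: bool = False):
        super().__init__(message)
        self.kind = kind
        self.status = status
        self.retryable = retryable


def to_llm_error(status: Optional[int] = None,
                 timed_out: bool = False) -> LLMError:
    """Translate a failed provider call into an LLMError."""
    if timed_out:
        return LLMError("timeout", "provider did not respond in time",
                        retryable=True)
    if status == 401:
        return LLMError("auth", "key rejected by provider", status)
    if status == 402:
        return LLMError("billing", "provider account has no credit", status)
    if status == 429:
        return LLMError("rate_limit", "provider rate limit hit", status,
                        retryable=True)
    if status is not None and 500 <= status < 600:
        return LLMError("provider_error", "provider-side failure", status,
                        retryable=True)
    return LLMError("unexpected", f"unhandled provider status: {status}",
                    status)
```

A stricter response adapter would sit beside this function and validate successful response bodies per provider, rather than only classifying failures.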
Secrets and trust boundaries
BYOK has a real security tradeoff. Keeping keys in browser storage avoids server-side secret persistence, but browser storage is not a high-security key vault. If the frontend has an XSS vulnerability, stored keys are exposed. If the user shares a machine, keys may persist longer than expected.
For this project, the tradeoff was acceptable because EvalBoard is a developer tool and the goal was to avoid storing third-party credentials on the server. The UI makes the key path explicit, the backend does not log keys, and the database schema has no place for them.
The more robust future version would offer choices: ephemeral per-run keys, encrypted server-side key storage, provider OAuth where available, or workspace-level managed keys. But those are different products. For the portfolio version, BYOK demonstrated the privacy-aware pattern without building a secrets platform.
What Django contributed
Django’s role here is not flashy, but it is important. The run creation view is the policy gate. It can reject missing keys, validate the dataset, create the durable run record, and pass the secret only into the execution boundary. The model layer remains clean because secrets never become model fields.
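The shape of that policy gate can be shown without Django. The sketch below is a framework-free stand-in, not the actual view: Run, RUNS, create_run, and execute_run are hypothetical names, the in-memory dict stands in for the database, and the call_llm signature is assumed.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Run:
    # Deliberately no api_key field: the record cannot hold a secret.
    id: str
    provider: str
    dataset: List[str]
    status: str = "pending"
    results: List[str] = field(default_factory=list)


RUNS: Dict[str, Run] = {}  # stand-in for the database


def execute_run(run: Run, api_key: str,
                call_llm: Callable[[str, str, str], str]) -> None:
    """Execution boundary: the only place that sees the key."""
    run.status = "running"
    for prompt in run.dataset:
        run.results.append(call_llm(run.provider, prompt, api_key))
    run.status = "done"


def create_run(provider: str, dataset: List[str], api_key: str,
               call_llm: Callable[[str, str, str], str],
               ) -> Tuple[Run, threading.Thread]:
    """Policy gate: validate, persist the run, hand the key to the worker."""
    if not api_key:
        raise ValueError("missing provider key")
    if not dataset:
        raise ValueError("dataset is empty")

    run = Run(id=uuid.uuid4().hex, provider=provider, dataset=list(dataset))
    RUNS[run.id] = run  # durable record, stored without the key

    worker = threading.Thread(
        target=execute_run, args=(run, api_key, call_llm), daemon=True
    )
    worker.start()
    return run, worker
```

The key exists only as a function argument, so it lives for the duration of the worker thread and no longer.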
The lesson I took from this part is that privacy decisions should show up in the data model. If the database has no api_key column, many classes of accidental persistence disappear. You still have to handle request logs, error logs, frontend storage, and transport security, but the core storage boundary is clear.
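For the log-handling part of that list, one common precaution is to redact sensitive headers before anything is written to a request or error log. The helper below is a hypothetical sketch of that idea, not something taken from EvalBoard's code.

```python
from typing import Dict, Mapping

# Header names compared in lowercase; the set itself is an assumption.
SENSITIVE_HEADERS = {"x-api-key", "authorization"}


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the headers that is safe to log."""
    return {
        name: "[redacted]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```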
Good BYOK architecture is not just “send a header.” It is a product decision, a backend policy, a frontend storage decision, and a data model omission. EvalBoard kept those pieces small enough that I could reason about them directly.