Building Birr AI: a Django + React navigator for NBE directives

Birr AI started with a boring but painful problem: National Bank of Ethiopia documents matter, but they are not easy to search, compare, or cite. If you are building around Ethiopian fintech regulation, the hard part is not only finding an answer. The hard part is knowing which directive the answer came from.

That pushed the project away from “chat with PDFs” and toward a document navigator. The app has two first-class surfaces: a React chat interface for questions, and a document index for browsing the actual NBE files. Django owns the source metadata, user sessions, saved documents, and retrieval orchestration. PostgreSQL stores both normal relational data and pgvector embeddings.

The most important architectural choice was this: the vector store is not the source of truth. The Document model is. It stores the title, filename, category, source URL, source page, upload date hints, extracted Markdown, embedding status, and summary. The vector table exists to retrieve chunks for content questions, but document counts, filters, availability checks, and source links come from structured rows.

That distinction matters because regulatory questions are often mixed. “How many recent foreign exchange directives do you have, and what do they say?” is not one pure semantic-search problem. It has a count, a time window, a category hint, a list of matching documents, and a content question. Letting the LLM guess all of that would make the product feel clever in demos and unreliable in real use.

The Django implementation leans into that separation. chat/services/question_signals.py extracts deterministic signals from the user question. structured_retrieval.py answers metadata questions with ORM queries against Document. rag_chain.py gathers both structured facts and pgvector chunks when a question needs both. The final model call writes the prose, but it is told to trust exact database facts for counts and document availability.

That is the lesson I took from this project: RAG systems become easier to trust when boring code owns the facts. The LLM should explain, summarize, and rephrase. It should not count rows, decide whether a document exists, or invent a category filter.

Why Django fit this problem

Django was useful because the product was not only AI. It needed normal application features: authentication, saved documents, chat sessions, pagination, filters, admin inspectability, migrations, and custom API errors. DRF made those pieces straightforward, so most of the effort could go into the regulatory workflow.

The document index is a good example. A single endpoint supports search across title, filename, category, summary, and source fields; category filtering; embedded and summary status filters; saved-document filters; file type filters; recent windows; upload year ranges; sorting; pagination; and facet metadata. That sounds like a lot, but it is mostly composable queryset work.

The tradeoff is that the first version uses pragmatic search rather than a dedicated full-text search stack. For an MVP, icontains filters and structured metadata were enough. If the document library grows or query quality becomes the bottleneck, PostgreSQL full-text search or trigram indexes would be the natural next step. The important thing is that the API shape already treats document discovery as a structured backend responsibility.

Product lessons

The main product lesson was that citations are not decoration. In a regulatory tool, the source link is part of the answer. Every retrieved chunk carries source metadata, and saved AI messages persist the source_urls used for an answer. That lets the frontend show where the answer came from after the stream finishes.

The second lesson was that chat history needs boundaries. Follow-up questions are rewritten into standalone questions before retrieval, but the system still answers only from retrieved context and structured document facts. That keeps conversational convenience without turning previous model output into a trusted source.

The third lesson was operational: a free-tier AI stack can work if you design for its limits. Birr AI uses Groq for generation, Gemini for embeddings, GitHub Actions for scheduled ingestion, and Supabase for identity. That pushed the backend toward simple, observable mechanisms: management commands, durable database state, and no unnecessary worker infrastructure.

The result is a project that taught me to treat Django as more than an API wrapper around an LLM. It is the control plane: it owns identity, source lineage, durable state, permissions, and the deterministic parts of the reasoning pipeline. The AI layer is more useful precisely because it is not responsible for everything.