BME Internal AI

Enterprise AI Solutions for the Alabama Board of Medical Examiners | May 2026

What We Built

BME Internal AI is a fully on-premises artificial intelligence system deployed by the Board of Medical Examiners IT Department. It provides a secure, agency-controlled AI assistant for BME and MLC staff — operating entirely on hardware owned and managed by the agency, with no data transmitted to external cloud services.

Staff access the system through a web browser at https://ai.local.albme.org using their existing Windows network credentials. The system is restricted to staff members assigned to the Ollama security group in Active Directory.

Hardware Architecture

Machine

Dell Pro Max Tower T2 Workstation

CPU

Intel Core Ultra 9 285 — 24 cores, 5.6 GHz max, 65W TDP

GPU

NVIDIA RTX PRO 6000 Blackwell — 96GB GDDR7, 300W TDP

RAM

128GB DDR5 4400 MT/s (56GB allocated to AI stack, 64GB to Windows)

Storage

2x 2TB NVMe PCIe Gen4 in RAID 1

Power Supply

1500W 80 Plus Platinum (operating at ~30% load)

Host OS

Windows 11 Pro (work) + Ubuntu 24.04 in WSL2 (AI stack)

Network

10.0.10.89 on agency LAN via ai.local.albme.org DNS A record

AI Stack Components

Ollama — LLM Inference Engine

Open-source inference server that runs large language models locally. It manages GPU memory, handles model loading and unloading, and exposes an OpenAI-compatible API. Runs as a system service with automatic startup.

Why: Simplest path to running local models with full GPU support, OpenAI-compatible API, and the largest model library of any local inference tool.

Gemma 4 31B — Primary AI Model

Google DeepMind's flagship model released April 2026 under Apache 2.0 license. 31 billion parameters consuming ~20GB of 96GB available GPU memory. Features native vision capability, 256,000-token context window, strong reasoning and knowledge performance.

Why: Current-generation model with vision capabilities, permissive Apache 2.0 license, and Western development meeting agency requirements.

Open WebUI — User Interface

Open-source web application providing the chat interface at ai.local.albme.org. Handles user authentication, conversation history, model selection, document uploads, and knowledge base management via Docker container.

Why: Multi-user with role-based permissions, built-in LDAP/AD integration, custom workspaces, active development, MIT licensed.

Qdrant — Vector Database

High-performance vector database storing indexed document representations. Currently: 14,784 files indexed, 152,000+ chunks stored with metadata (filename, PII flags, entity types, processing date). Supports both semantic and keyword-based filtering.

Why: Designed for vector search at scale, efficient local hardware operation, Open WebUI's recommended external vector store.

Nginx — Reverse Proxy & HTTPS

Docker-based reverse proxy handling all incoming HTTPS traffic. Self-signed SSL certificate for ai.local.albme.org deployed via Group Policy to domain-joined machines. Automatic HTTP → HTTPS redirect.

Document Ingestion Pipeline

Custom Python pipeline processes documents from file server shares into Qdrant. Runs as systemd service with automatic scheduling: 4 workers 6pm–6am, 1 worker during business hours. Fully resumable via SQLite checkpoint database.

Docling (IBM)

Extracts text from PDFs (including scanned via GPU-accelerated OCR), Word, Excel.

Presidio (Microsoft)

Scans extracted text for PII. Detected entity types (SSN, names, addresses, phone, license numbers, etc.) stored as searchable metadata.

nomic-embed-text

Converts text chunks into 768-dimensional vectors via Ollama API for semantic search.

Qdrant Storage

Stores vectors and metadata. Queryable via semantic similarity or exact keyword matching.

File Server: \\10.0.0.29 (Windows Server 2016) mounted in WSL via CIFS. Service account: svc-fileingest (read-only). Current share: bme (160GB, 14,784 files). Planned: legal, investigations, credentialing, mlc.

Semantic & Keyword Search Tool

Custom Python tool integrated into Open WebUI gives the AI model the ability to search Qdrant during conversations. Two search modes:

Semantic Search (search_documents)

Converts user question to vector, finds conceptually similar chunks. Best for topics, regulations, general research.

Keyword Search (keyword_search)

Finds exact text matches via full-text index. Essential for names, case numbers, dates, specific terms.

The AI model selects the appropriate search mode based on the nature of the question, guided by system prompt instructions.

Current Capabilities

Regulatory Q&A

Ask questions about Alabama Admin Code Title 540/545 and Code of Alabama Title 34 Chapter 24 with cited answers.

Document Search

Search BME document archive (14,784 documents indexed) by topic, names, or specific terms.

Writing Assistance

Draft emails, letters, memos, reports, policy documents. Formal outputs are stamped to indicate AI assistance.

General Research

Research topics, summarize documents, answer general questions, assist with data analysis and spreadsheets.

Code Development

IT staff can use the system for software development assistance through VS Code integration.

Security & Compliance

Data Residency

All data remains on agency hardware. No external LLM API calls. Web search is the only outbound activity.

Authentication

Active Directory LDAP. Access controlled by Ollama security group. First-login approval required.

Encryption in Transit

TLS 1.2/1.3 via Nginx. Certificate deployed via GPO to domain machines.

Encryption at Rest

Windows BitLocker on host. WSL2 virtual disk on RAID 1 NVMe.

Endpoint Protection

Windows Defender on host. No additional AV exclusions required.

Model Provenance

Western-developed models only (Google, OpenAI, Meta). No Chinese-developed models.

PII Detection

Microsoft Presidio scans all document chunks. Entity types stored as searchable metadata.

Policy Enforcement

Hard limits on probable cause determinations, charging decisions, and adjudications. BME AI Use Policy enforced automatically in system prompt.

Deployment Status

Hardware and OS✅ Complete

Ollama + Gemma 4 inference✅ Complete

Open WebUI (HTTPS, LDAP, AD auth)✅ Complete

Qdrant vector database✅ Complete

BME document ingestion (14,784 files)✅ Complete

Semantic + keyword search tool✅ Complete

System prompt (regulatory + policy)✅ Complete

Reboot automation✅ Complete

Claude Code (IT dev tooling)✅ Complete

Legal share ingestion⬜ Planned

n8n scheduled agents⬜ Planned

Azure Claude API integration⬜ Planned

Future Possibilities

The current deployment is a proof-of-concept on a single workstation. The architecture is designed to scale. Future capabilities under consideration:

•Expanded document access. Index additional collections (legal, investigations, credentialing, MLC) with role-based access control.
•Automated overnight processing. Scheduled agents handling OCR, document tagging, and data extraction without staff intervention.
•LMS automation. AI agent capable of navigating the license management system for data entry and retrieval.
•Domain-trained model. Fine-tuning on BME-specific regulatory content and historical decisions.
•Multi-user production deployment. Multi-GPU server supporting 50+ concurrent users across both agencies.
•Entra SSO integration. Single sign-on via Microsoft Entra for seamless access.

Key Achievements

✓Zero external data exposure. All computation and storage remains on agency hardware.
✓Enterprise-grade hardware. 24-core CPU, 96GB GPU, 128GB RAM delivers high-performance AI at scale.
✓14,784 documents indexed. Complete BME document archive instantly searchable with semantic + keyword search.
✓Active Directory integration. Single sign-on with existing Windows credentials.
✓PII-aware indexing. Microsoft Presidio automatically detects and flags sensitive data in all documents.
✓Policy-compliant. Hard-coded limits align with BME AI Use Policy and OIT governance requirements.