MILA — Neonatal LLM Assistant
An LLM assistant that helps NICU staff communicate parent-facing updates clearly and quickly, with retrieval over hospital policies and protocols.

LLM, RAG, Healthcare, LangChain/OpenAI
Key metrics
- p50 latency: 410 ms (end-to-end: retrieval + generation)
- p95 latency: 820 ms (load test @ 3 rps, 15k docs indexed)
- draft time per update: ↓42% (6.8 → 3.9 min median, n=147 updates)
- first-response time: ↓38% (triage-to-draft start, 4-week window)
- retrieval accuracy@1: 88% (human-graded top-1 policy match, n=200)
- retrieval accuracy@3: 95% (any of the top 3 contained the correct policy)
- policy citation coverage: 97% (messages with ≥1 inline cite)
- hallucination rate (sent): 0.0% (0/312 parent messages; approval gate)
- review flags: 0.6% (2/312 drafts flagged pre-send; both corrected)
- readability: Grade 10.2 → 7.8 (Flesch-Kincaid, n=100 messages)
- adoption (wk-4): 82% weekly / 65% daily (clinician active rates)
- error rate: 0.9% (auto-retried; no user-visible failures)
- uptime (30-day): 99.93% (monitored via health checks)
- avg cost / msg: $0.018 (LLM + vector + infra @ 3.2k msgs/mo)
Methodology: 4-week pre/post cohort; mixed human grading + automated logs; details available on request.
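For reference, the retrieval accuracy and latency figures above could be computed from graded results and request logs roughly as follows. This is a minimal sketch, not the project's evaluation code: the function names, the input shapes, and the nearest-rank percentile definition are all assumptions.

```python
from math import ceil
from typing import Sequence


def accuracy_at_k(
    ranked_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose human-graded correct policy is in the top k."""
    if not gold_ids or len(ranked_ids) != len(gold_ids):
        raise ValueError("need one non-empty ranking per graded query")
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (assumed definition), e.g. pct=95 for p95."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

With n=200 graded queries, `accuracy_at_k(rankings, gold, 1)` and `accuracy_at_k(rankings, gold, 3)` would correspond to the accuracy@1 and accuracy@3 rows, and `percentile(latencies_ms, 50)` / `percentile(latencies_ms, 95)` to the latency rows.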
Problem
Clinicians needed faster, clearer parent-facing updates aligned with internal protocols—without copy/paste or policy-hunting.
Approach
- RAG over internal policies using a vector index (Pinecone or FAISS as the backend).
- Structured outputs via tool/function calling (JSON) for consistent message layout.
- Role-based access (clinicians vs. parents) with audit-friendly event log.
- Simple web UI for composing/previewing updates before sending.
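The retrieval and structured-output steps listed above can be sketched as follows. This is an illustrative stand-in using plain cosine similarity over in-memory vectors rather than Pinecone or FAISS, and the draft fields (`summary`, `next_steps`, `citations`) are a hypothetical layout, not the project's actual schema.

```python
import json
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass(frozen=True)
class Chunk:
    policy_id: str              # hypothetical identifier for a policy document
    text: str
    vector: Sequence[float]     # embedding, assumed to be precomputed


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: Sequence[Chunk], k: int = 3) -> list:
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


# Assumed message layout; the real JSON schema is not given in the write-up.
REQUIRED_FIELDS = {"summary": str, "next_steps": list, "citations": list}


def parse_draft(raw_json: str, retrieved_ids: set) -> dict:
    """Validate a model-produced JSON draft before it reaches the preview UI."""
    draft = json.loads(raw_json)
    if not isinstance(draft, dict):
        raise ValueError("draft must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(draft.get(name), expected):
            raise ValueError(f"missing or malformed field: {name}")
    cites = draft["citations"]
    if not cites or any(c not in retrieved_ids for c in cites):
        raise ValueError("draft must cite at least one retrieved policy")
    return draft
```

Rejecting drafts that cite nothing, or that cite a policy the retriever never returned, is one way to keep citation coverage high; whether the project enforces it this way is an assumption.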
Results
- Cut median clinician drafting time from 6.8 to 3.9 minutes (−42%), saving ~48 minutes per 12-bed shift.
- Parent messages include at least one policy citation 97% of the time, and Flesch-Kincaid readability improved from Grade 10.2 to 7.8.
- Zero hallucinations in sent messages over 312 reviewed communications due to mandatory human approval.
- Adoption reached 82% weekly active clinicians by week 4 with no reported workflow regressions.
- End-to-end p95 latency of about 0.82 s at steady load (3 rps); no user-visible downtime across the last 30 days.
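The zero-hallucination result depends on the approval gate, so a sketch of how such a gate can combine the role check with the event log may be useful. Everything here is hypothetical (the `Draft` and `AuditLog` types, the role strings, the event names); it shows the pattern, not the deployed implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Draft:
    draft_id: str
    body: str
    approved_by: Optional[str] = None   # set only by a successful approve()


@dataclass
class AuditLog:
    events: list = field(default_factory=list)

    def record(self, actor: str, action: str, draft_id: str) -> None:
        """Append an event with a UTC timestamp."""
        self.events.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "draft_id": draft_id,
        })


def approve(draft: Draft, user_id: str, role: str, log: AuditLog) -> None:
    """Only clinicians may approve; denied attempts are logged too."""
    if role != "clinician":
        log.record(user_id, "approve_denied", draft.draft_id)
        raise PermissionError("only clinicians may approve drafts")
    draft.approved_by = user_id
    log.record(user_id, "approved", draft.draft_id)


def send(draft: Draft, log: AuditLog) -> str:
    """Refuse to send anything that has not passed human approval."""
    if draft.approved_by is None:
        raise RuntimeError("draft has not been approved")
    log.record(draft.approved_by, "sent", draft.draft_id)
    return draft.body
```

Because `send` refuses unapproved drafts, every outgoing parent message has a named clinician attached to it in the log.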
Stack
Next.js, TypeScript, Tailwind, Node, Python, LangChain/OpenAI, Pinecone/FAISS, Postgres, Auth (RBAC)
Responsibilities
- Designed retrieval schema & chunking strategy
- Implemented server routes + guardrails
- Wrote evaluation prompts & spot checks
- Set up observability & basic error budgets
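As an illustration of what a chunking strategy involves, here is a minimal overlapping word-window chunker. The actual strategy used on the policy documents is not described above, so the approach and the `size` / `overlap` defaults are assumptions chosen only for the example.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of indexing some text twice.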