AI Research Tool: From Seed to Scale in 6 Weeks
We built Nextera's AI-powered research assistant from zero to a production launch with 50 beta users in 6 weeks; organic growth took it to 213 beta users within the first month. The MVP was compelling enough to help them raise a $3.2M seed round 4 months after launch, with 93.7% weekly retention among active users.
6 wk
Time to Launch
From zero codebase to 50 beta users in production
213
Beta Users (Month 1)
Organic growth from initial 50 — no paid marketing
$3.2M
Seed Round Raised
Closed 4 months post-launch. MVP demo was the centerpiece of the pitch.
93.7%
Weekly Retention
Among active users (3+ sessions/week). Industry benchmark for research tools is ~60%.
The Challenge
What We Were Up Against
Nextera Labs had a compelling thesis: researchers spend 40% of their time on literature review, and LLMs could cut that in half. Their founding team had the domain expertise (two PhD researchers and a product manager from Elsevier) but no engineering capacity. They had $500K in pre-seed funding and needed a working MVP to demonstrate to Series A investors that the concept resonated with real users. The catch: three competing products were in development, and first-mover advantage in the academic tools space meant everything. Six weeks was the window.
Zero Engineering Team
Founding team was 100% domain experts with no software engineering experience. They'd tried no-code tools but couldn't get the LLM integration quality they needed.
LLM Quality for Academic Use
Generic ChatGPT-style interfaces hallucinated citations and couldn't maintain accuracy across specialized domains. The tool needed to ground every response in actual papers.
Budget Constraints
$500K pre-seed had to cover 18 months of runway. The MVP budget was $80K — no room for over-engineering or expensive infrastructure.
Competitive Pressure
Three known competitors (two YC-backed) were building similar tools. Nextera needed to be first to market with a differentiated product to secure their Series A narrative.
Constraints & Requirements
6-week hard deadline to demo at an investor showcase event
$80K total engineering budget (including infrastructure)
Must handle 500+ concurrent users for the demo day without crashing
Citation accuracy above 95% — investors would check
Our Approach
How We Built It
Speed was everything, so we optimized every decision for time-to-market. Python FastAPI for the backend because the ML engineer could contribute to both the API and the embedding pipeline without switching languages. React (not Next.js) for the frontend because we didn't need SSR and Create React App gave us the fastest scaffold-to-deploy time. PostgreSQL with pgvector for embeddings instead of Pinecone — cheaper, simpler, and one fewer service to manage. The key technical bet was a RAG (Retrieval-Augmented Generation) pipeline that grounded every LLM response in actual paper abstracts, eliminating the hallucination problem.
User Research & API Prototyping
Week 1: Interviewed 12 researchers to validate the core workflow. Built a CLI prototype of the RAG pipeline to test retrieval quality against 50,000 arXiv abstracts.
Core Search & Analysis Engine
Weeks 2–3: Built the semantic search engine and the LLM-powered analysis features. Every response includes source citations with links to the original papers. Implemented a dual-model strategy: OpenAI GPT-4o as primary with Anthropic Claude as fallback for availability.
Frontend & Authentication
Weeks 4–5: Built the researcher-facing UI with a focus on the core loop: ask a question → get an answer with citations → drill into specific papers → save to a collection. Auth via Clerk for speed.
Beta Launch & Demo Prep
Week 6: Deployed to production, onboarded 50 beta users from the interview cohort, and prepared the investor demo. Set up monitoring and cost alerts for LLM API spending.
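The week-6 cost alerts can be illustrated with a minimal spend tracker; this is a sketch, and the per-token prices and budget below are placeholders, not Nextera's actual figures:

```python
from dataclasses import dataclass, field

@dataclass
class LLMCostTracker:
    """Accumulates LLM API spend and flags when a daily budget is exceeded.

    Prices are per 1M tokens and are placeholders, not real vendor pricing.
    """
    daily_budget_usd: float
    price_per_1m_input: float = 2.50    # assumed input-token price
    price_per_1m_output: float = 10.00  # assumed output-token price
    spent_usd: float = 0.0
    alerts: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> float:
        cost = (input_tokens * self.price_per_1m_input
                + output_tokens * self.price_per_1m_output) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.daily_budget_usd:
            self.alerts.append(f"LLM spend ${self.spent_usd:.2f} over budget")
        return cost

tracker = LLMCostTracker(daily_budget_usd=0.05)
tracker.record(input_tokens=4_000, output_tokens=500)     # ~$0.015
tracker.record(input_tokens=12_000, output_tokens=2_000)  # pushes past budget
```

In production this kind of counter would live behind the API middleware and reset daily, with alerts routed to Slack or email rather than a list.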
Key Features
What We Built
RAG-Powered Research Assistant
Every LLM response is grounded in actual paper abstracts retrieved via semantic search, achieving 96.1% citation accuracy in production.
Technical Detail
Queries are embedded using OpenAI text-embedding-3-small, then matched against pgvector (HNSW index) for top-20 candidate papers. Retrieved abstracts are injected into the prompt context. Post-processing validates that every cited paper exists in the database and that the cited claim appears in the abstract. Hallucinated citations are filtered before reaching the user.
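A minimal sketch of that citation-validation step, assuming citations arrive as paper-ID/claim pairs; the substring match here is a simplification of whatever matching logic runs in production:

```python
def filter_hallucinated_citations(response_citations, retrieved_abstracts):
    """Keep only citations whose paper exists in the retrieved set and
    whose quoted claim actually appears in that paper's abstract.

    response_citations: list of dicts like {"paper_id": ..., "claim": ...}
    retrieved_abstracts: dict mapping paper_id -> abstract text
    """
    valid = []
    for cite in response_citations:
        abstract = retrieved_abstracts.get(cite["paper_id"])
        if abstract is None:
            continue  # paper was never retrieved: hallucinated reference
        if cite["claim"].lower() not in abstract.lower():
            continue  # claim is not supported by the abstract text
        valid.append(cite)
    return valid

abstracts = {"2401.0001": "Transformers improve retrieval accuracy on long documents."}
citations = [
    {"paper_id": "2401.0001", "claim": "improve retrieval accuracy"},
    {"paper_id": "2401.9999", "claim": "anything"},  # not in the database
]
kept = filter_hallucinated_citations(citations, abstracts)  # only the first survives
```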
Dual-Model LLM Strategy
Primary model is GPT-4o with automatic failover to Claude 3.5 Sonnet, ensuring 99.8% availability even during OpenAI outages.
Technical Detail
FastAPI middleware tracks response latency and error rates per model. If GPT-4o latency exceeds 8 seconds or returns a 5xx, the request is automatically retried against Claude via the Anthropic API. During the beta period, OpenAI had two partial outages — users experienced zero downtime because failover kicked in within 400ms.
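The failover decision can be sketched with plain callables standing in for the OpenAI and Anthropic clients; the 8-second timeout mirrors the threshold above, while the error handling is deliberately simplified:

```python
import concurrent.futures

def call_with_failover(primary, fallback, prompt, timeout_s=8.0):
    """Try the primary model; fall back on timeout or provider error.

    `primary` and `fallback` are callables standing in for the real
    OpenAI / Anthropic client calls.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(primary, prompt)
        try:
            return future.result(timeout=timeout_s), "primary"
        except (concurrent.futures.TimeoutError, RuntimeError):
            # 5xx-style provider errors surface as RuntimeError in this sketch
            return fallback(prompt), "fallback"

def flaky_gpt4o(prompt):
    raise RuntimeError("503 Service Unavailable")  # simulated outage

def claude(prompt):
    return f"answer to: {prompt}"

answer, model_used = call_with_failover(flaky_gpt4o, claude, "summarize X")
```

In the async FastAPI stack the same decision would use `asyncio.wait_for` instead of a thread pool, but the shape of the logic is the same.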
Cost-Optimized Embedding Pipeline
50,000 papers indexed for $12 using text-embedding-3-small with batched processing, staying well under the $80K budget constraint.
Technical Detail
Papers are batch-embedded in groups of 100 using the OpenAI batch API (50% cost reduction). Embeddings are stored in PostgreSQL with pgvector's HNSW index (ef_construction=128, m=16). Query-time search over 50K vectors takes 18ms average. We estimated Pinecone would cost $70/month — pgvector costs $0 beyond the existing PostgreSQL instance.
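A sketch of the batching helper and the one-time index setup; the HNSW parameters match those above, while the table name, column name, and cosine opclass are assumptions:

```python
def batches(items, size=100):
    """Yield fixed-size chunks for batch embedding requests."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# pgvector index creation, run once at setup time. The parameters match the
# ones described in the text; the table/column names are illustrative.
CREATE_INDEX_SQL = """
CREATE INDEX ON papers
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
"""

paper_ids = [f"paper-{i}" for i in range(50_000)]
n_batches = sum(1 for _ in batches(paper_ids))  # 500 requests of 100 papers each
```

Each batch of 100 would then be submitted to the embedding endpoint via the batch API and the resulting vectors upserted into the `embedding` column.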
Research Collections
Researchers can save papers to themed collections, add annotations, and export citations in BibTeX format — the features that drove 93.7% weekly retention.
Technical Detail
Collections are stored in PostgreSQL with a simple many-to-many join table between papers and collections. Annotations use a rich text editor (TipTap) stored as JSON. BibTeX export queries the Semantic Scholar API for full citation metadata. This feature wasn't in the original spec — we added it in week 5 after 8 out of 12 beta users asked for it.
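The schema shape can be illustrated with an in-memory SQLite stand-in for the production PostgreSQL tables; all table and column names here are illustrative:

```python
import sqlite3

# In-memory SQLite stand-in for the production PostgreSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE collection_papers (
    collection_id INTEGER REFERENCES collections(id),
    paper_id      INTEGER REFERENCES papers(id),
    annotation    TEXT,  -- TipTap rich-text JSON in production
    PRIMARY KEY (collection_id, paper_id)
);
INSERT INTO papers VALUES (1, 'Attention Is All You Need');
INSERT INTO collections VALUES (1, 'Transformers');
INSERT INTO collection_papers VALUES (1, 1, NULL);
""")

# Fetch every paper saved to a given collection via the join table.
titles = [row[0] for row in conn.execute(
    "SELECT p.title FROM papers p "
    "JOIN collection_papers cp ON cp.paper_id = p.id "
    "WHERE cp.collection_id = 1")]
```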
Tech Stack
Why We Chose What We Chose
Backend
Python FastAPI
ML engineer could work on both the API and embedding pipeline in one language. Async support for concurrent LLM API calls.
PostgreSQL + pgvector
Vector search without a separate service. $0 additional cost vs $70/month for Pinecone. Performance was sufficient for 50K vectors.
Celery + Redis
Background task queue for paper ingestion and embedding generation. Redis also handles rate limiting for LLM API calls.
AI/ML
OpenAI GPT-4o
Best quality for research synthesis tasks at the time. Structured output mode ensured consistent citation formatting.
Claude 3.5 Sonnet
Failover model. Comparable quality to GPT-4o for our use case, different provider for availability redundancy.
text-embedding-3-small
Best cost/performance ratio for academic text. 1536 dimensions with 62% lower cost than text-embedding-3-large.
Frontend
React 18 + TypeScript
Fastest scaffold-to-deploy for an SPA. Didn't need Next.js SSR — the app is behind auth, so SEO was irrelevant.
TanStack Query
Server state management with built-in caching. Eliminated the need for a global state library for API data.
Clerk
Auth in 2 hours instead of 2 days. $0 for the first 10K MAU — perfect for a beta launch.
Infrastructure
Railway
$20/month for the entire stack (API + PostgreSQL + Redis). Auto-scaling available when needed. Cheapest path to production.
Cloudflare
CDN for the React SPA. Free tier was sufficient for beta traffic.
Sentry
Error tracking with user context. Free tier covered the beta with room to spare.
Impact
Before & After
Metric
Before
After
Literature Review Time
~8 hours/paper
~3.5 hours/paper (user-reported)
Citation Accuracy
N/A (no product)
96.1% (validated against source papers)
Monthly Infrastructure Cost
N/A
$47/month at 213 users
LLM Cost per Query
N/A
$0.031 average
Engineering Quality
How We Ship
Test Coverage
71% overall, 96% on the RAG pipeline and citation validation
CI/CD Pipeline
GitHub Actions — 6-minute pipeline with pytest and citation accuracy regression tests
Monitoring
Sentry for errors, custom dashboard for LLM cost tracking and citation accuracy metrics
Deploy Frequency
Multiple times daily during the 6-week sprint, 2x/week post-launch
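A citation-accuracy regression check of the kind run in that CI pipeline might look like this; the 95% threshold comes from the project requirements, while the eval-set format is an assumption:

```python
def citation_accuracy(results):
    """results: list of (paper_id, is_valid) pairs from an eval run."""
    if not results:
        raise ValueError("empty eval set")
    return sum(ok for _, ok in results) / len(results)

# In CI this would load a frozen eval set and run the full RAG pipeline;
# the hard-coded results below just show the shape of the check.
eval_results = [(f"paper-{i}", True) for i in range(19)] + [("paper-19", False)]
accuracy = citation_accuracy(eval_results)  # 19/20 = 0.95
assert accuracy >= 0.95, f"citation accuracy regressed: {accuracy:.1%}"
```

Pinning the eval set in the repository means any prompt or retrieval change that degrades grounding fails the build before it ships.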
“We talked to four agencies and two of them wanted to build us a 6-month, $400K platform. TechWithCare said 'let's prove the concept in 6 weeks for under $80K' and then actually delivered. The RAG pipeline quality was what sealed our seed round — every investor tested it with their own research questions and couldn't find a hallucinated citation. That's what made the difference.”
Dr. James Liu
Co-founder & CEO, Nextera Labs
Ongoing
What's Next
Scaling the paper index from 50K to 2M papers (full arXiv + PubMed)
Adding collaborative features (shared collections, team workspaces)
Building a VS Code extension for in-editor paper search