AI Research Tool: From Seed to Scale in 6 Weeks
We built Nextera's AI-powered research assistant from zero to a production launch with 50 beta users in 6 weeks; organic growth took it to 213 beta users within the first month. The MVP was compelling enough to help them raise a $3.2M seed round 4 months after launch, with 93.7% weekly retention among active users.
6 wk
Time to Launch
From zero codebase to 50 beta users in production
213
Beta Users (Month 1)
Organic growth from initial 50 — no paid marketing
$3.2M
Seed Round Raised
Closed 4 months post-launch. MVP demo was the centerpiece of the pitch.
93.7%
Weekly Retention
Among active users (3+ sessions/week). Industry benchmark for research tools is ~60%.
The Challenge
What We Were Up Against
Nextera Labs had a compelling thesis: researchers spend 40% of their time on literature review, and LLMs could cut that in half. Their founding team had the domain expertise (two PhD researchers and a product manager from Elsevier) but no engineering capacity. They had $500K in pre-seed funding and needed a working MVP to demonstrate to Series A investors that the concept resonated with real users. The catch: three competing products were in development, and first-mover advantage in the academic tools space meant everything. Six weeks was the window.
Zero Engineering Team
Founding team was 100% domain experts with no software engineering experience. They'd tried no-code tools but couldn't get the LLM integration quality they needed.
LLM Quality for Academic Use
Generic ChatGPT-style interfaces hallucinated citations and couldn't maintain accuracy across specialized domains. The tool needed to ground every response in actual papers.
Budget Constraints
$500K pre-seed had to cover 18 months of runway. The MVP budget was $80K — no room for over-engineering or expensive infrastructure.
Competitive Pressure
Three known competitors (two YC-backed) were building similar tools. Nextera needed to be first to market with a differentiated product to secure their Series A narrative.
Constraints & Requirements
6-week hard deadline to demo at an investor showcase event
$80K total engineering budget (including infrastructure)
Must handle 500+ concurrent users for the demo day without crashing
Citation accuracy above 95% — investors would check
Our Approach
How We Built It
Speed was everything, so we optimized every decision for time-to-market. Python FastAPI for the backend because the ML engineer could contribute to both the API and the embedding pipeline without switching languages. React (not Next.js) for the frontend because we didn't need SSR and Create React App gave us the fastest scaffold-to-deploy time. PostgreSQL with pgvector for embeddings instead of Pinecone — cheaper, simpler, and one fewer service to manage. The key technical bet was a RAG (Retrieval-Augmented Generation) pipeline that grounded every LLM response in actual paper abstracts, eliminating the hallucination problem.
User Research & API Prototyping
Week 1: Interviewed 12 researchers to validate the core workflow. Built a CLI prototype of the RAG pipeline to test retrieval quality against 50,000 arXiv abstracts.
Core Search & Analysis Engine
Weeks 2–3: Built the semantic search engine and the LLM-powered analysis features. Every response includes source citations with links to the original papers. Implemented a dual-model strategy: OpenAI GPT-4o as primary with Anthropic Claude as fallback for availability.
Frontend & Authentication
Weeks 4–5: Built the researcher-facing UI with a focus on the core loop: ask a question → get an answer with citations → drill into specific papers → save to a collection. Auth via Clerk for speed.
Beta Launch & Demo Prep
Week 6: Deployed to production, onboarded 50 beta users from the interview cohort, and prepared the investor demo. Set up monitoring and cost alerts for LLM API spending.
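The week-6 cost alerts can be illustrated with a minimal spend tracker; this is a sketch, and the per-token prices and budget below are placeholders, not Nextera's actual figures:

```python
from dataclasses import dataclass, field

@dataclass
class LLMCostTracker:
    """Accumulates LLM API spend and flags when a daily budget is exceeded.

    Prices are per 1M tokens and are placeholders, not real vendor pricing.
    """
    daily_budget_usd: float
    price_per_1m_input: float = 2.50    # assumed input-token price
    price_per_1m_output: float = 10.00  # assumed output-token price
    spent_usd: float = 0.0
    alerts: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> float:
        cost = (input_tokens * self.price_per_1m_input
                + output_tokens * self.price_per_1m_output) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.daily_budget_usd:
            self.alerts.append(f"LLM spend ${self.spent_usd:.2f} over budget")
        return cost

tracker = LLMCostTracker(daily_budget_usd=0.05)
tracker.record(input_tokens=4_000, output_tokens=500)     # ~$0.015
tracker.record(input_tokens=12_000, output_tokens=2_000)  # pushes past budget
```

In production this kind of counter would live behind the API middleware and reset daily, with alerts routed to Slack or email rather than a list.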
Key Features
What We Built
RAG-Powered Research Assistant
Every LLM response is grounded in actual paper abstracts retrieved via semantic search, achieving 96.1% citation accuracy in production.
Technical Detail
Queries are embedded using OpenAI text-embedding-3-small, then matched against pgvector (HNSW index) for top-20 candidate papers. Retrieved abstracts are injected into the prompt context. Post-processing validates that every cited paper exists in the database and that the cited claim appears in the abstract. Hallucinated citations are filtered before reaching the user.
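A minimal sketch of that citation-validation step, assuming citations arrive as paper-ID/claim pairs; the substring match here is a simplification of whatever matching logic runs in production:

```python
def filter_hallucinated_citations(response_citations, retrieved_abstracts):
    """Keep only citations whose paper exists in the retrieved set and
    whose quoted claim actually appears in that paper's abstract.

    response_citations: list of dicts like {"paper_id": ..., "claim": ...}
    retrieved_abstracts: dict mapping paper_id -> abstract text
    """
    valid = []
    for cite in response_citations:
        abstract = retrieved_abstracts.get(cite["paper_id"])
        if abstract is None:
            continue  # paper was never retrieved: hallucinated reference
        if cite["claim"].lower() not in abstract.lower():
            continue  # claim is not supported by the abstract text
        valid.append(cite)
    return valid

abstracts = {"2401.0001": "Transformers improve retrieval accuracy on long documents."}
citations = [
    {"paper_id": "2401.0001", "claim": "improve retrieval accuracy"},
    {"paper_id": "2401.9999", "claim": "anything"},  # not in the database
]
kept = filter_hallucinated_citations(citations, abstracts)  # only the first survives
```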
Dual-Model LLM Strategy
Primary model is GPT-4o with automatic failover to Claude 3.5 Sonnet, ensuring 99.8% availability even during OpenAI outages.
Technical Detail
FastAPI middleware tracks response latency and error rates per model. If GPT-4o latency exceeds 8 seconds or returns a 5xx, the request is automatically retried against Claude via the Anthropic API. During the beta period, OpenAI had two partial outages — users experienced zero downtime because failover kicked in within 400ms.
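The failover decision can be sketched with plain callables standing in for the OpenAI and Anthropic clients; the 8-second timeout mirrors the threshold above, while the error handling is deliberately simplified:

```python
import concurrent.futures

def call_with_failover(primary, fallback, prompt, timeout_s=8.0):
    """Try the primary model; fall back on timeout or provider error.

    `primary` and `fallback` are callables standing in for the real
    OpenAI / Anthropic client calls.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(primary, prompt)
        try:
            return future.result(timeout=timeout_s), "primary"
        except (concurrent.futures.TimeoutError, RuntimeError):
            # 5xx-style provider errors surface as RuntimeError in this sketch
            return fallback(prompt), "fallback"

def flaky_gpt4o(prompt):
    raise RuntimeError("503 Service Unavailable")  # simulated outage

def claude(prompt):
    return f"answer to: {prompt}"

answer, model_used = call_with_failover(flaky_gpt4o, claude, "summarize X")
```

In the async FastAPI stack the same decision would use `asyncio.wait_for` instead of a thread pool, but the shape of the logic is the same.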
Cost-Optimized Embedding Pipeline
50,000 papers indexed for $12 using text-embedding-3-small with batched processing, staying well under the $80K budget constraint.
Technical Detail
Papers are batch-embedded in groups of 100 using the OpenAI batch API (50% cost reduction). Embeddings are stored in PostgreSQL with pgvector's HNSW index (ef_construction=128, m=16). Query-time search over 50K vectors takes 18ms average. We estimated Pinecone would cost $70/month — pgvector costs $0 beyond the existing PostgreSQL instance.
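A sketch of the batching helper and the one-time index setup; the HNSW parameters match those above, while the table name, column name, and cosine opclass are assumptions:

```python
def batches(items, size=100):
    """Yield fixed-size chunks for batch embedding requests."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# pgvector index creation, run once at setup time. The parameters match the
# ones described in the text; the table/column names are illustrative.
CREATE_INDEX_SQL = """
CREATE INDEX ON papers
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
"""

paper_ids = [f"paper-{i}" for i in range(50_000)]
n_batches = sum(1 for _ in batches(paper_ids))  # 500 requests of 100 papers each
```

Each batch of 100 would then be submitted to the embedding endpoint via the batch API and the resulting vectors upserted into the `embedding` column.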
Research Collections
Researchers can save papers to themed collections, add annotations, and export citations in BibTeX format — the features that drove 93.7% weekly retention.
Technical Detail
Collections are stored in PostgreSQL with a simple many-to-many join table between papers and collections. Annotations use a rich text editor (TipTap) stored as JSON. BibTeX export queries the Semantic Scholar API for full citation metadata. This feature wasn't in the original spec — we added it in week 5 after 8 out of 12 beta users asked for it.
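The schema shape can be illustrated with an in-memory SQLite stand-in for the production PostgreSQL tables; all table and column names here are illustrative:

```python
import sqlite3

# In-memory SQLite stand-in for the production PostgreSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE collection_papers (
    collection_id INTEGER REFERENCES collections(id),
    paper_id      INTEGER REFERENCES papers(id),
    annotation    TEXT,  -- TipTap rich-text JSON in production
    PRIMARY KEY (collection_id, paper_id)
);
INSERT INTO papers VALUES (1, 'Attention Is All You Need');
INSERT INTO collections VALUES (1, 'Transformers');
INSERT INTO collection_papers VALUES (1, 1, NULL);
""")

# Fetch every paper saved to a given collection via the join table.
titles = [row[0] for row in conn.execute(
    "SELECT p.title FROM papers p "
    "JOIN collection_papers cp ON cp.paper_id = p.id "
    "WHERE cp.collection_id = 1")]
```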
Tech Stack
Why We Chose What We Chose
Backend
Python FastAPI
ML engineer could work on both the API and embedding pipeline in one language. Async support for concurrent LLM API calls.
PostgreSQL + pgvector
Vector search without a separate service. $0 additional cost vs $70/month for Pinecone. Performance was sufficient for 50K vectors.
Celery + Redis
Background task queue for paper ingestion and embedding generation. Redis also handles rate limiting for LLM API calls.
AI/ML
OpenAI GPT-4o
Best quality for research synthesis tasks at the time. Structured output mode ensured consistent citation formatting.
Claude 3.5 Sonnet
Failover model. Comparable quality to GPT-4o for our use case, different provider for availability redundancy.
text-embedding-3-small
Best cost/performance ratio for academic text. 1536 dimensions with 62% lower cost than text-embedding-3-large.
Frontend
React 18 + TypeScript
Fastest scaffold-to-deploy for an SPA. Didn't need Next.js SSR — the app is behind auth, so SEO was irrelevant.
TanStack Query
Server state management with built-in caching. Eliminated the need for a global state library for API data.
Clerk
Auth in 2 hours instead of 2 days. $0 for the first 10K MAU — perfect for a beta launch.
Infrastructure
Railway
$20/month for the entire stack (API + PostgreSQL + Redis). Auto-scaling available when needed. Cheapest path to production.
Cloudflare
CDN for the React SPA. Free tier was sufficient for beta traffic.
Sentry
Error tracking with user context. Free tier covered the beta with room to spare.
Impact
Before & After
Metric
Before
After
Literature Review Time
~8 hours/paper
~3.5 hours/paper (user-reported)
Citation Accuracy
N/A (no product)
96.1% (validated against source papers)
Monthly Infrastructure Cost
N/A
$47/month at 213 users
LLM Cost per Query
N/A
$0.031 average
Engineering Quality
How We Ship
Test Coverage
71% overall, 96% on the RAG pipeline and citation validation
CI/CD Pipeline
GitHub Actions — 6-minute pipeline with pytest and citation accuracy regression tests
Monitoring
Sentry for errors, custom dashboard for LLM cost tracking and citation accuracy metrics
Deploy Frequency
Multiple times daily during the 6-week sprint, 2x/week post-launch
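A citation-accuracy regression check of the kind run in that CI pipeline might look like this; the 95% threshold comes from the project requirements, while the eval-set format is an assumption:

```python
def citation_accuracy(results):
    """results: list of (paper_id, is_valid) pairs from an eval run."""
    if not results:
        raise ValueError("empty eval set")
    return sum(ok for _, ok in results) / len(results)

# In CI this would load a frozen eval set and run the full RAG pipeline;
# the hard-coded results below just show the shape of the check.
eval_results = [(f"paper-{i}", True) for i in range(19)] + [("paper-19", False)]
accuracy = citation_accuracy(eval_results)  # 19/20 = 0.95
assert accuracy >= 0.95, f"citation accuracy regressed: {accuracy:.1%}"
```

Pinning the eval set in the repository means any prompt or retrieval change that degrades grounding fails the build before it ships.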
“We talked to four agencies and two of them wanted to build us a 6-month, $400K platform. TechWithCare said 'let's prove the concept in 6 weeks for under $80K' and then actually delivered. The RAG pipeline quality was what sealed our seed round — every investor tested it with their own research questions and couldn't find a hallucinated citation. That's what made the difference.”
Dr. James Liu
Co-founder & CEO, Nextera Labs
Ongoing
What's Next
Scaling the paper index from 50K to 2M papers (full arXiv + PubMed)
Adding collaborative features (shared collections, team workspaces)
Building a VS Code extension for in-editor paper search