You are a senior full-stack developer and AI architect tasked with building a fully functional MVP of 'NanoAI Trainer'. This application is a platform designed to empower developers and researchers to train their own custom AI language models, inspired by Anthropic's Claude, using JAX and Google TPUs. The core value proposition is to democratize advanced AI model training by providing an end-to-end, cost-effective (around $200) solution following principles like Constitutional AI and Preference Optimization.
PROJECT OVERVIEW:
NanoAI Trainer aims to solve the problem of high barriers to entry in training advanced AI language models. It provides a complete library and infrastructure that guides users through training their own models from scratch, inspired by the success of projects like Karpathy's nanochat and Anthropic's Claude. The platform will allow users to define their model's 'SOUL' (principles and guidelines), generate synthetic data, define agentic interfaces, and use preference optimization to align the model's behavior. The primary goal is to make state-of-the-art AI training accessible to a wider audience, particularly those leveraging JAX and TPUs.
TECH STACK:
- Frontend Framework: React (Next.js App Router)
- Styling: Tailwind CSS
- ORM: Drizzle ORM (PostgreSQL compatible)
- Database: PostgreSQL
- UI Components: shadcn/ui
- Authentication: NextAuth.js (or Clerk for simplified integration)
- State Management: React Context API / Zustand (for global state)
- Data Fetching: React Server Components (RSC), Server Actions, Fetch API
- Deployment: Vercel / Google Cloud Run
- Libraries: zod (validation), react-hook-form, chart.js (or similar for dashboard)
DATABASE SCHEMA (PostgreSQL with Drizzle ORM):
1. `users` table:
- `id` (uuid, primary key)
- `name` (text)
- `email` (text, unique)
- `emailVerified` (timestamp)
- `image` (text)
- `createdAt` (timestamp, default now())
- `updatedAt` (timestamp, default now())
2. `projects` table:
- `id` (uuid, primary key)
- `userId` (uuid, foreign key to `users.id`)
- `name` (text)
- `description` (text)
- `createdAt` (timestamp, default now())
- `updatedAt` (timestamp, default now())
3. `models` table:
- `id` (uuid, primary key)
- `projectId` (uuid, foreign key to `projects.id`)
- `name` (text)
- `parameterSize` (integer)
- `architecture` (text, e.g., 'Transformer')
- `status` (enum: 'draft', 'training', 'completed', 'failed')
- `trainingConfig` (jsonb)
- `createdAt` (timestamp, default now())
- `updatedAt` (timestamp, default now())
4. `training_runs` table:
- `id` (uuid, primary key)
- `modelId` (uuid, foreign key to `models.id`)
- `projectId` (uuid, foreign key to `projects.id`)
- `runName` (text)
- `tpuDetails` (jsonb, e.g., {'nodes': 8, 'accelerator_type': 'v3-8'})
- `startTime` (timestamp)
- `endTime` (timestamp)
- `status` (enum: 'queued', 'running', 'completed', 'cancelled', 'error')
- `logUrl` (text)
- `metrics` (jsonb, e.g., {'loss': ..., 'accuracy': ...})
- `createdAt` (timestamp, default now())
5. `soul_definitions` table:
- `id` (uuid, primary key)
- `projectId` (uuid, foreign key to `projects.id`)
- `content` (text, markdown/structured text for SOUL.md)
- `version` (integer)
- `createdAt` (timestamp, default now())
CORE FEATURES & USER FLOW:
1. **User Authentication & Project Management**
* Flow: User signs up/logs in via email/Google. User lands on a dashboard. User can create a new 'Project' to define a new model training initiative. Each project can host multiple model training configurations.
* Details: Use NextAuth.js with email and Google provider. Dashboard displays existing projects. 'Create Project' form requires name and description.
2. **Model Configuration**
* Flow: Within a project, user defines a new 'Model'. User specifies model name, estimated parameter size, architecture type (defaulting to Transformer), and initial training configuration (linking to `trainingConfig` JSONB).
* Details: A form for model creation. `trainingConfig` will initially contain basic parameters like learning rate, batch size, epochs, etc. (JSONB for flexibility).
3. **SOUL Definition & Agentic Interface**
* Flow: User navigates to the 'SOUL Definition' section for a project. User inputs the core principles and guidelines (SOUL.md content) using a rich text editor. User can also define the agentic interface structure (e.g., available tools, function signatures).
* Details: Use a markdown editor (e.g., TipTap, Editor.js) for `soul_definitions.content`. Define a schema or structured input for the agentic interface.
4. **Synthetic Data Generation (Conceptual MVP Feature - Backend Focus)**
* Flow: Based on the SOUL definition and agentic interface, the system conceptually generates synthetic data or provides tools/scripts for users to generate it using a base model or LLM.
* Details: This might be a placeholder in MVP, pointing users to external tools or providing command-line scripts. The core JAX/TPU training infra is the focus.
5. **Preference Optimization Setup**
* Flow: User defines preference data (e.g., pairs of responses where one is preferred over the other) and configures the preference optimization step within the model training process.
* Details: A UI for uploading preference data or defining generation prompts for collecting this data. Configuration options for the optimization algorithm.
6. **TPU Training Job Submission & Monitoring**
* Flow: User initiates a training run for a configured model. The system prepares the JAX training script, bundles the configuration (including SOUL, agent interface, optimization params), and submits a job to Google Cloud TPUs (simulated via job queue or actual API call if available).
* Details: A 'Start Training' button. Backend handles job queuing (e.g., using Redis + BullMQ). The `training_runs` table tracks job status. A dashboard page displays active and past training runs, with links to logs and metrics.
7. **Training Metrics Dashboard**
* Flow: As training progresses, metrics (loss, accuracy, etc.) are collected and displayed on a real-time or near real-time dashboard for the selected training run.
* Details: Use a charting library to visualize key metrics over time. Data fetched from `training_runs.metrics` (JSONB).
API & DATA FETCHING:
- API Routes (Next.js App Router - Route Handlers / Server Actions):
- `POST /api/projects`: Create a new project.
- `GET /api/projects`: Get user's projects.
- `POST /api/projects/[projectId]/models`: Create a new model configuration.
- `GET /api/projects/[projectId]/models`: Get models for a project.
- `POST /api/models/[modelId]/train`: Submit a training job.
- `GET /api/training-runs/[runId]`: Get status and metrics for a training run.
- `GET /api/training-runs/[runId]/logs`: Stream/fetch logs (consider WebSocket or SSE for real-time).
- Data Fetching:
- Primarily use Server Components for initial page loads.
- Server Actions for mutations (create, update, delete).
- Client Components will fetch dynamic data (e.g., training run metrics) using `fetch` or libraries like SWR/React Query if needed, triggered by user interaction or intervals.
- `GET /api/projects`, `GET /api/projects/[projectId]/models` will be called from Server Components or `layout.tsx` to populate navigation/sidebars.
- `GET /api/training-runs/[runId]` will be polled or use WebSockets from a Client Component on the training dashboard.
UI/UX DESIGN & VISUAL IDENTITY:
- Style: "Minimalist Clean with Futuristic Accents"
- Color Palette:
- Primary: `#FFFFFF` (White)
- Secondary: `#F0F2F5` (Light Gray)
- Accent: `#007AFF` (Vibrant Blue)
- Text: `#1F2937` (Dark Gray)
- Background: `#FFFFFF` (Main), `#F9FAFB` (Slightly off-white for sections)
- Code/Terminal: `#1E1E1E` (Dark Gray background), `#D4D4D4` (Light text)
- Typography:
- Headings: Inter (Bold, Semibold)
- Body Text: Inter (Regular)
- Code: Fira Code (Monospaced)
- Layout:
- Sidebar Navigation (Projects, Models, Settings)
- Main Content Area displaying forms, dashboards, or configurations.
- Clean cards and containers with subtle shadows.
- Responsive design: Mobile-first approach, adapting seamlessly to tablet and desktop.
- Animations:
- Subtle page transitions (fade/slide).
- Loading spinners/skeletons using Tailwind CSS animations.
- Button hover effects (slight scale or color change).
- Smooth data updates on charts.
COMPONENT BREAKDOWN (Next.js App Router Structure):
- `app/`
- `layout.tsx`: Root layout with global styles, AuthProvider, possibly header.
- `page.tsx`: Dashboard landing page after login (list projects, "Create Project" CTA).
- `auth/`
- `page.tsx`: Sign in / Sign up page.
- `projects/[projectId]/`
- `layout.tsx`: Project-specific layout (e.g., project header, sidebar for models, soul).
- `page.tsx`: Project overview/settings.
- `models/`
- `page.tsx`: List models within the project.
- `[modelId]/`
- `page.tsx`: Model details, configuration form, training setup.
- `training/`
- `page.tsx`: Training run submission form.
- `[runId]/`
- `page.tsx`: Training run monitoring dashboard (metrics, logs).
- `soul/`
- `page.tsx`: SOUL.md editor and agentic interface definition.
- `settings/`
- `page.tsx`: User profile and account settings.
- `components/`
- `ui/`: Re-usable shadcn/ui components (Button, Input, Card, Table, Tabs, etc.).
- `auth/`: Auth-related components (e.g., SignInButton, UserAvatar).
- `common/`: General components (e.g., Sidebar, Header, LoadingSpinner, ErrorMessage).
- `dashboard/`: Dashboard-specific components (e.g., ProjectCard).
- `models/`: Model configuration forms and display components.
- `training/`: Training run monitoring components (e.g., MetricsChart, LogViewer).
- `lib/`: Utility functions, database connection (Drizzle), auth config.
- `styles/`: Global CSS (if any, mostly Tailwind utility classes).
ANIMATIONS:
- Page Transitions: Use `motion` from `framer-motion` for subtle slide-in/fade-in effects on route changes within the App Router structure (apply to `main` content area).
- Loading States: Implement skeleton loaders for data-heavy sections (tables, charts) using Tailwind CSS animations (`animate-pulse`). Use `Suspense` boundaries effectively.
- Interactive Elements: Subtle hover effects on buttons and links (`hover:scale-105`, `hover:bg-opacity-80`). Input field focus states.
- Chart Animations: Default animations provided by the charting library, ensure they are smooth and not distracting.
EDGE CASES:
- **Authentication Flow**: Handle unauthenticated users redirecting to login. Protect all API routes and pages except auth routes.
- **Authorization**: Ensure users can only access and modify their own projects and models.
- **Empty States**: Design informative empty states for project lists, model lists, training runs, and data tables (e.g., "No projects yet. Create your first project!").
- **Error Handling**: Implement robust error handling for API requests (display user-friendly messages using `toast` notifications). Handle backend errors during training job submission/execution gracefully. Catch exceptions in JAX/TPU scripts and report status as 'failed' with error details.
- **Validation**: Use `zod` for form validation on client and server (via Server Actions/Route Handlers) for all user inputs (project names, model configs, SOUL content).
- **Long-running Training**: Ensure the UI correctly reflects 'queued', 'running', 'completed', 'failed' statuses. Handle potential disconnections and implement auto-refresh or WebSocket listeners for status updates.
- **TPU Job Failures**: Provide clear feedback on why a TPU job failed, linking to logs if possible.
SAMPLE DATA (Mock Data for Frontend Development / Initial State):
1. **Project List (User Dashboard):**
```json
[
{"id": "uuid1", "name": "Claude-Lite Training", "description": "Experimenting with a smaller Claude variant.", "createdAt": "2024-05-10T10:00:00Z"},
{"id": "uuid2", "name": "Code Assistant V1", "description": "Training a model specialized for code completion.", "createdAt": "2024-05-09T15:30:00Z"}
]
```
2. **Model List (Within Project 'uuid1'):**
```json
[
{"id": "uuid101", "name": "CL-Lite-Base", "parameterSize": 1.3, "architecture": "Transformer", "status": "completed", "createdAt": "2024-05-11T09:00:00Z"},
{"id": "uuid102", "name": "CL-Lite-Tuned", "parameterSize": 1.3, "architecture": "Transformer", "status": "training", "createdAt": "2024-05-12T11:00:00Z"}
]
```
3. **Training Run (Active):**
```json
{
"id": "uuid201",
"modelId": "uuid102",
"runName": "Initial Training Epochs",
"startTime": "2024-05-12T11:05:00Z",
"status": "running",
"metrics": {"loss": [0.85, 0.72, 0.65], "accuracy": [0.5, 0.6, 0.68]},
"tpuDetails": {"nodes": 4, "accelerator_type": "v3-4"}
}
```
4. **Training Run (Completed):**
```json
{
"id": "uuid202",
"modelId": "uuid101",
"runName": "Base Model Pre-training",
"startTime": "2024-05-11T09:15:00Z",
"endTime": "2024-05-11T18:45:00Z",
"status": "completed",
"metrics": {"loss": [1.5, 1.2, 0.9, 0.75], "accuracy": [0.2, 0.3, 0.45, 0.6]},
"tpuDetails": {"nodes": 8, "accelerator_type": "v3-8"}
}
```
5. **SOUL Definition Snippet:**
```text
# SOUL for Code Assistant
You are an AI assistant that helps developers write better code.
**Principles:**
1. Be helpful and informative.
2. Prioritize code correctness and security.
3. Explain complex concepts clearly.
4. Never generate malicious or harmful code.
**Agent Interface:**
- `execute_code(language: str, code: str) -> str`: Executes code and returns output or errors.
- `search_documentation(query: str) -> str`: Searches for relevant documentation.
```
6. **Agentic Interface Definition (Structured):**
```json
[
{"name": "execute_code", "description": "Executes provided code snippet in a specified language.", "parameters": {"language": "string", "code": "string"}, "returns": "string"},
{"name": "search_documentation", "description": "Searches the web or internal docs for information.", "parameters": {"query": "string"}, "returns": "string"}
]
```
7. **Model Configuration (Basic):**
```json
{
"name": "MyCustomModel",
"parameterSize": 7, // e.g., 7 Billion parameters
"architecture": "Transformer",
"trainingConfig": {
"learningRate": 0.0001,
"optimizer": "adamW",
"batchSize": 32,
"epochs": 3,
"warmupSteps": 100
}
}
```
8. **Preference Data Example:**
```json
[
{"prompt": "Explain the difference between JAX and TensorFlow.", "chosen_response": "JAX is... (detailed explanation)", "rejected_response": "JAX is faster."}
]
```
9. **TPU Job Submission Payload Example:**
```json
{
"modelId": "uuid102",
"projectId": "uuid1",
"runName": "Preference Tuning Run 1",
"tpuConfig": {"nodes": 4, "accelerator_type": "v3-4"}
}
```
10. **Error State Example (Training Run):**
```json
{
"id": "uuid203",
"modelId": "uuid103",
"runName": "Failed Attempt",
"startTime": "2024-05-12T14:00:00Z",
"endTime": "2024-05-12T14:15:00Z",
"status": "error",
"errorMessage": "CUDA out of memory. Try reducing batch size or model size.",
"logUrl": "https://storage.googleapis.com/.../log.txt",
"metrics": {}
}
```
This comprehensive prompt provides a detailed blueprint for building the NanoAI Trainer MVP, covering all essential aspects from project conception to specific implementation details for a Next.js application.