Create a fully functional, multi-page Next.js MVP application for an AI Agent Performance Platform called 'AI Agent Arena'. This platform allows users to compare, test, and analyze the performance and cost-effectiveness of various AI models (agents) on specific tasks.
PROJECT OVERVIEW:
The AI Agent Arena provides a transparent and data-driven environment for evaluating AI models. Users can select predefined or create custom tasks, run tests using different AI agents, and visualize the results through comprehensive analytics. The core value proposition is to help users identify the most cost-effective and performant AI model for their specific needs, moving beyond marketing claims to real-world performance data. It addresses the problem of understanding and comparing the practical utility and economic viability of diverse AI models in a standardized manner.
TECH STACK:
- Frontend Framework: Next.js (App Router)
- Styling: Tailwind CSS
- ORM: Drizzle ORM (PostgreSQL compatible)
- Database: PostgreSQL (or a compatible SQL database)
- UI Components: shadcn/ui (for accessible, reusable components)
- Authentication: NextAuth.js (or Clerk for a simpler setup)
- State Management: React Context API / Zustand for global state, component-local state where appropriate
- Data Fetching: React Server Components (RSC), Server Actions, client-side fetching with `fetch` or libraries like SWR/React Query if needed.
- Other Libraries: Zod (for schema validation), React Hook Form (for forms), charting library (e.g., Recharts or Chart.js)
DATABASE SCHEMA (PostgreSQL with Drizzle ORM):
```sql
-- users table (managed by auth provider)
-- Example: email, id, name, image

-- agents table: Stores information about the AI models available for testing
CREATE TABLE agents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name VARCHAR(255) NOT NULL UNIQUE, -- e.g., 'GPT-4 Turbo', 'Claude 3 Opus', 'Gemini Pro'
  description TEXT,
  provider VARCHAR(100), -- e.g., 'OpenAI', 'Anthropic', 'Google'
  model_identifier VARCHAR(255) UNIQUE, -- e.g., 'gpt-4-turbo-preview', 'claude-3-opus-20240229'
  cost_per_token_input DECIMAL(10, 8), -- Cost in USD
  cost_per_token_output DECIMAL(10, 8), -- Cost in USD
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- tasks table: Stores predefined and user-created tasks
CREATE TABLE tasks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name VARCHAR(255) NOT NULL,
  description TEXT,
  category VARCHAR(100), -- e.g., 'Code Generation', 'Text Summarization', 'Data Analysis'
  prompt_template TEXT NOT NULL, -- The base prompt structure for the task
  is_public BOOLEAN DEFAULT TRUE, -- Whether the task is visible to all users
  created_by UUID REFERENCES users(id) ON DELETE SET NULL, -- User who created it, if not system-generated
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- task_runs table: Records each execution of a task with a specific agent
CREATE TABLE task_runs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  task_id UUID REFERENCES tasks(id) ON DELETE CASCADE,
  agent_id UUID REFERENCES agents(id) ON DELETE CASCADE,
  user_id UUID REFERENCES users(id) ON DELETE SET NULL, -- Who initiated the run
  custom_input TEXT, -- User-provided input if different from template
  status VARCHAR(50) DEFAULT 'pending', -- 'pending', 'running', 'completed', 'failed'
  result TEXT, -- The output from the AI agent
  input_tokens INTEGER,
  output_tokens INTEGER,
  execution_time_ms BIGINT, -- Time taken for the agent to respond
  total_cost DECIMAL(12, 8), -- Calculated cost for this run
  error_message TEXT, -- If status is 'failed'
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMP WITH TIME ZONE
);

-- user_ratings table: Allows users to rate agents on specific tasks
CREATE TABLE user_ratings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  task_run_id UUID REFERENCES task_runs(id) ON DELETE CASCADE,
  user_id UUID REFERENCES users(id) ON DELETE CASCADE,
  rating SMALLINT CHECK (rating >= 1 AND rating <= 5), -- 1 to 5 stars
  comment TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
  UNIQUE (task_run_id, user_id) -- Prevent duplicate ratings for the same run by the same user
);

-- Example Drizzle Schema Setup (TypeScript):
/*
import { pgTable, uuid, varchar, text, timestamp, boolean, integer, bigint, decimal, smallint, unique, check } from 'drizzle-orm/pg-core';
import { relations, sql } from 'drizzle-orm';

// Minimal placeholder so the foreign keys below resolve. The auth provider's adapter
// normally owns this table; match its actual id column type (some adapters use text ids).
export const users = pgTable('users', {
  id: uuid('id').primaryKey()
});

export const agents = pgTable('agents', {
  id: uuid('id').defaultRandom().primaryKey(),
  name: varchar('name', { length: 255 }).notNull().unique(),
  description: text('description'),
  provider: varchar('provider', { length: 100 }),
  modelIdentifier: varchar('model_identifier', { length: 255 }).unique(),
  costPerTokenInput: decimal('cost_per_token_input', { precision: 10, scale: 8 }),
  costPerTokenOutput: decimal('cost_per_token_output', { precision: 10, scale: 8 }),
  createdAt: timestamp('created_at', { withTimezone: true }).default(sql`CURRENT_TIMESTAMP`)
});

export const tasks = pgTable('tasks', {
  id: uuid('id').defaultRandom().primaryKey(),
  name: varchar('name', { length: 255 }).notNull(),
  description: text('description'),
  category: varchar('category', { length: 100 }),
  promptTemplate: text('prompt_template').notNull(),
  isPublic: boolean('is_public').default(true),
  // Mirrors ON DELETE SET NULL in the SQL above; adjust if the users table schema differs
  createdBy: uuid('created_by').references(() => users.id, { onDelete: 'set null' }),
  createdAt: timestamp('created_at', { withTimezone: true }).default(sql`CURRENT_TIMESTAMP`)
});

export const taskRuns = pgTable('task_runs', {
  id: uuid('id').defaultRandom().primaryKey(),
  taskId: uuid('task_id').references(() => tasks.id, { onDelete: 'cascade' }),
  agentId: uuid('agent_id').references(() => agents.id, { onDelete: 'cascade' }),
  userId: uuid('user_id').references(() => users.id, { onDelete: 'set null' }),
  customInput: text('custom_input'),
  status: varchar('status', { length: 50 }).default('pending'),
  result: text('result'),
  inputTokens: integer('input_tokens'),
  outputTokens: integer('output_tokens'),
  // bigint columns need an explicit mode; 'number' is safe for millisecond durations
  executionTimeMs: bigint('execution_time_ms', { mode: 'number' }),
  totalCost: decimal('total_cost', { precision: 12, scale: 8 }),
  errorMessage: text('error_message'),
  createdAt: timestamp('created_at', { withTimezone: true }).default(sql`CURRENT_TIMESTAMP`),
  completedAt: timestamp('completed_at', { withTimezone: true })
});

export const userRatings = pgTable('user_ratings', {
  id: uuid('id').defaultRandom().primaryKey(),
  taskRunId: uuid('task_run_id').references(() => taskRuns.id, { onDelete: 'cascade' }),
  userId: uuid('user_id').references(() => users.id, { onDelete: 'cascade' }),
  rating: smallint('rating'),
  comment: text('comment'),
  createdAt: timestamp('created_at', { withTimezone: true }).default(sql`CURRENT_TIMESTAMP`)
}, (t) => ({
  // One rating per user per run, mirroring UNIQUE (task_run_id, user_id) in the SQL above
  uniqueRatingPerUser: unique().on(t.taskRunId, t.userId),
  // 1 to 5 stars; check() in the table config assumes a recent drizzle-orm release
  ratingRange: check('rating_range', sql`${t.rating} >= 1 AND ${t.rating} <= 5`)
}));

// Relations (Example). The side that holds the foreign key names its fields and references.
export const agentRelations = relations(agents, ({ many }) => ({
  taskRuns: many(taskRuns)
}));

export const taskRelations = relations(tasks, ({ many, one }) => ({
  taskRuns: many(taskRuns),
  // Named `creator` so it does not collide with the `createdBy` column
  creator: one(users, { fields: [tasks.createdBy], references: [users.id] })
}));

export const taskRunRelations = relations(taskRuns, ({ one, many }) => ({
  task: one(tasks, { fields: [taskRuns.taskId], references: [tasks.id] }),
  agent: one(agents, { fields: [taskRuns.agentId], references: [agents.id] }),
  user: one(users, { fields: [taskRuns.userId], references: [users.id] }),
  ratings: many(userRatings)
}));

export const userRatingRelations = relations(userRatings, ({ one }) => ({
  taskRun: one(taskRuns, { fields: [userRatings.taskRunId], references: [taskRuns.id] }),
  user: one(users, { fields: [userRatings.userId], references: [users.id] })
}));
*/
```
CORE FEATURES & USER FLOW:
1. **User Authentication:**
* **Flow:** User lands on the homepage. Clicks 'Sign Up' or 'Log In'. Redirected to auth provider's page (e.g., Google OAuth). After successful authentication, user is redirected back to the app dashboard. Unauthenticated users can view public tasks and results but cannot run tests or rate.
* **Implementation:** Use NextAuth.js or Clerk. Implement callbacks to manage user session and potentially link to existing `users` table if needed.
2. **Task Management:**
* **Viewing Tasks:** Both authenticated and unauthenticated users can navigate to the 'Tasks' page. A list of public tasks is displayed with name, category, description, and creator (if applicable). Filters (category, creator) and search functionality should be available.
* **Creating Tasks (Authenticated Users):** Users can click 'Create New Task'. A form appears (using React Hook Form and Zod validation) with fields for Name, Description, Category (dropdown/autocomplete), and Prompt Template, plus an `is_public` toggle. Upon submission, a new record is created in the `tasks` table via a Server Action.
* **User Flow (Task Creation):** Dashboard -> Tasks -> 'Create Task' Button -> Task Creation Form -> Submit -> Redirect to Tasks List / Task Detail.
3. **Agent Testing (Core Feature):**
* **Initiating a Test:** From the 'Tasks' page or a specific Task Detail page, users can select an AI agent from a dropdown list of available agents. If the task requires custom input beyond the template, an input field will appear. User clicks 'Run Test'.
* **Backend Process:** A Server Action is triggered. It creates a `task_runs` record with `status='pending'`. A background job (e.g., a queue system like BullMQ, or a simple polling worker for the MVP) picks up the run and sets `status='running'`. The job constructs the full prompt (template + custom input) and calls the respective AI agent's API using its `model_identifier`. It records `input_tokens`, `output_tokens`, and `execution_time_ms`, calculates `total_cost` from the agent's `cost_per_token_input` and `cost_per_token_output`, and saves the `result` or `error_message`. The `status` is then updated to `completed` or `failed`, and `completed_at` is set.
* **User Feedback:** The UI should show a loading state for the test. Upon completion, the user is notified (e.g., via a toast notification) and is either redirected to the Task Run Results page or shown the updated status on the Task Detail page.
* **User Flow:** Tasks List -> Select Task -> Select Agent -> (Optional) Add Custom Input -> 'Run Test' -> Loading Indicator -> Task Run Results Page.
4. **Results Analysis & Visualization:**
* **Results Page:** A dedicated 'Results' page (or section within Task Detail) displays all `task_runs` for a given task. For each run, it shows the Agent used, Status, Input/Output Tokens, Execution Time, Total Cost, and the Result snippet. A 'View Full Result' button can expand to show the complete output.
* **Dashboard Analytics:** The main Dashboard page aggregates data. It shows:
* Overall Agent Performance Ranking (e.g., average cost, average time, average rating).
* Task Success Rate by Agent.
* Cost Distribution across agents.
* A chart comparing top agents on a selected task (e.g., bar chart for cost, speed, token usage).
* **Implementation:** Use Server Components to fetch aggregated data. Use a charting library to render visualizations.
5. **User Ratings:**
* **Rating Interface:** On the Task Run Results page, authenticated users can rate a completed run (1-5 stars) and add an optional comment. This creates a record in the `user_ratings` table, linked to the `task_run_id` and `user_id`.
* **Impact:** Average ratings are displayed alongside agent performance metrics, influencing overall rankings.
* **User Flow:** Results Page -> Find Completed Run -> Click 'Rate' -> Rating Modal -> Submit -> Average rating updates.
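The run lifecycle described under Agent Testing (`pending`, `running`, `completed`, `failed`) can be sketched as a small transition guard that a worker or Server Action consults before updating `status`. This is a minimal sketch under assumptions: the status names come from the `task_runs` schema, but the transition table and the `canTransition` / `isTerminal` helpers are hypothetical names introduced here for illustration.

```typescript
// Status values taken from the task_runs.status column comment.
type RunStatus = 'pending' | 'running' | 'completed' | 'failed';

// Assumed transition table (not specified in the schema itself):
// a run may fail before it starts, e.g. when the agent is misconfigured.
const ALLOWED_TRANSITIONS: Record<RunStatus, readonly RunStatus[]> = {
  pending: ['running', 'failed'],
  running: ['completed', 'failed'],
  completed: [], // terminal
  failed: [], // terminal
};

// Returns true when a run may move from `from` to `to`.
function canTransition(from: RunStatus, to: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Terminal runs never change again, so the UI can stop polling them.
function isTerminal(status: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[status].length === 0;
}
```

A client-side polling hook could call `isTerminal` on each response to decide when to stop refreshing, and the worker could refuse any update for which `canTransition` returns `false`.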
API & DATA FETCHING:
- **API Routes (App Router):** Primarily use Server Actions for mutations (creating tasks, running tests, submitting ratings) and Server Components for data fetching and rendering.
- **Data Flow:**
* **Reading Data:** Server Components fetch data directly from the database using Drizzle ORM (e.g., `getTasks()`, `getAgentPerformance()`). Data is passed as props to client components or rendered directly.
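* **Mutations:** Server Actions are invoked from client components, either through a form's `action` prop (e.g., `<form action={serverAction}>`) or by calling the action from an event handler. They perform database writes and can revalidate cached data (e.g., with `revalidatePath`).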
* **Real-time Updates (for Task Status):** For the 'running' state, consider using polling on the client or Server-Sent Events (SSE) for a smoother experience, though basic polling is sufficient for MVP.
- **Example API Logic (Conceptual Server Action):**
```typescript
// actions/taskActions.ts
'use server';

import { db } from '@/lib/drizzle'; // Your Drizzle setup
import { taskRuns, agents, tasks } from '@/lib/schema';
import { eq } from 'drizzle-orm';
import { getServerSession } from 'next-auth';
import { authOptions } from '@/lib/auth'; // Your NextAuth config
import { callAgentAPI } from '@/lib/agentService'; // Your service to interact with agent APIs

export async function runTask(taskId: string, agentId: string, customInput?: string) {
  const session = await getServerSession(authOptions);
  // Assumes the session callback in authOptions copies the user id onto session.user
  if (!session?.user?.id) {
    throw new Error('Authentication required');
  }

  const task = await db.query.tasks.findFirst({ where: eq(tasks.id, taskId) });
  const agent = await db.query.agents.findFirst({ where: eq(agents.id, agentId) });
  if (!task || !agent) {
    throw new Error('Task or Agent not found');
  }
  if (!agent.modelIdentifier) {
    throw new Error('Agent has no model identifier configured');
  }

  // split/join replaces every {{input}} placeholder and keeps `$` sequences in user input literal
  const prompt = customInput
    ? task.promptTemplate.split('{{input}}').join(customInput)
    : task.promptTemplate;
  // TODO: Implement more sophisticated prompt variable replacement if needed

  // Conceptual: the agent is called inline here. With a background queue,
  // the worker would pick up this 'pending' row and set it to 'running'.
  const initialRun = await db.insert(taskRuns).values({
    taskId: task.id,
    agentId: agent.id,
    userId: session.user.id,
    customInput: customInput || null,
    status: 'pending'
  }).returning({ id: taskRuns.id });

  try {
    // Simulate or call actual agent API
    const { result, inputTokens, outputTokens, executionTimeMs } = await callAgentAPI(agent.modelIdentifier, prompt);

    // Drizzle returns DECIMAL columns as strings, so convert before multiplying
    const inputCost = (inputTokens ?? 0) * Number(agent.costPerTokenInput ?? 0);
    const outputCost = (outputTokens ?? 0) * Number(agent.costPerTokenOutput ?? 0);
    // DECIMAL columns are also written as strings; 8 places matches DECIMAL(12, 8)
    const totalCost = (inputCost + outputCost).toFixed(8);

    await db.update(taskRuns).set({
      result, // Store the full output; truncate only when rendering a preview
      inputTokens,
      outputTokens,
      executionTimeMs,
      totalCost,
      status: 'completed',
      completedAt: new Date()
    }).where(eq(taskRuns.id, initialRun[0].id));

    return { success: true, runId: initialRun[0].id };
  } catch (error) {
    console.error('Task run failed:', error);
    await db.update(taskRuns).set({
      status: 'failed',
      // `error` is typed as unknown, so narrow it before reading .message
      errorMessage: error instanceof Error ? error.message : String(error),
      completedAt: new Date()
    }).where(eq(taskRuns.id, initialRun[0].id));
    throw new Error('Task execution failed.');
  }
}
```
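The `{{input}}` substitution used when building the prompt can be factored into a small pure helper, which makes it easy to unit-test. The following is a minimal sketch, not a prescribed implementation: `renderPrompt` is a hypothetical name, and appending the input when the template has no placeholder is an assumption about the desired behavior.

```typescript
// Hypothetical helper: fills a task's prompt template with optional user input.
function renderPrompt(template: string, customInput?: string): string {
  const input = (customInput ?? '').trim();

  // No user input: use the template unchanged.
  if (input === '') {
    return template;
  }

  // Assumption: when the template has no placeholder, append the input
  // after a blank line instead of silently dropping it.
  if (template.indexOf('{{input}}') === -1) {
    return `${template}\n\n${input}`;
  }

  // split/join replaces every occurrence and treats `$` in the input literally,
  // unlike String.prototype.replace with a string replacement.
  return template.split('{{input}}').join(input);
}
```

Keeping this logic outside the Server Action means the same function can back a live prompt preview in `RunTestForm`.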
COMPONENT BREAKDOWN (Next.js App Router Structure):
- **`app/`**
- **`(auth)/`** (Route Group for Auth Pages)
- `sign-in/page.tsx` (Handles sign-in logic, potentially redirects to provider)
- `sign-up/page.tsx` (Handles sign-up logic)
- **`(marketing)/`** (Route Group for Marketing Pages)
- `page.tsx` (Homepage/Landing Page)
- `about/page.tsx`
- `pricing/page.tsx`
- **`(app)/`** (Route Group for Authenticated App Features)
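- `layout.tsx` (Main app layout, includes sidebar/navbar; enforces auth on protected routes such as `/dashboard` while leaving public task and result views readable without a session)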
- **`dashboard/page.tsx`**:
- **Components:** `AnalyticsOverview` (RSC fetching aggregated data), `AgentRankingChart`, `RecentTaskRunsList` (Client Component with polling or SSE hook).
- **State:** Primarily managed via RSC data fetching.
- **`tasks/page.tsx`**:
- **Components:** `TaskList` (RSC fetching tasks, with client-side filtering/search), `CreateTaskButton` (Client component opening modal), `TaskFilter`, `TaskSearchInput`.
- **State:** Filters managed client-side. Task list refreshed via Server Action revalidation.
- **`tasks/[taskId]/page.tsx`**: (Task Detail Page)
- **Components:** `TaskDetailView` (RSC showing task info), `RunTestForm` (Client Component with agent selection dropdown, custom input field, calls `runTask` Server Action), `TaskRunList` (Client Component fetching runs for this task, includes pagination, rating buttons), `LoadingIndicator`.
- **State:** Form state managed by `react-hook-form`. Task run list state managed client-side.
- **`tasks/[taskId]/runs/[runId]/page.tsx`**: (Specific Task Run Result)
- **Components:** `TaskRunDetails` (RSC fetching single run data), `FullResultViewer` (Modal or expandable section), `RatingForm` (Client Component, visible if not rated by user, calls `submitRating` Server Action).
- **State:** Rating form state.
- **`agents/page.tsx`**:
- **Components:** `AgentList` (RSC fetching agents, showing key stats like cost/token, avg rating), `AgentDetailView` (optional, on click).
- **State:** Data fetched via RSC.
- **`settings/page.tsx`**:
- **Components:** User profile settings, API key management (if applicable later).
- **State:** Form state.
- **`layout.tsx`**: Root layout (html, body, providers like NextAuth, ThemeProvider).
- **`page.tsx`**: Root landing page (only if `(marketing)/page.tsx` is not used; both resolve to `/`, so keep just one).
- **`components/`** (Shared UI Components, mostly shadcn/ui based)
- `ui/button.tsx`, `ui/card.tsx`, `ui/input.tsx`, `ui/dialog.tsx`, `ui/dropdown-menu.tsx`, `ui/table.tsx`, `ui/alert.tsx`, `ui/progress.tsx`, `ui/toast.tsx` etc.
- `common/Navbar.tsx`, `common/Sidebar.tsx`, `common/Footer.tsx`
- `tasks/TaskForm.tsx`, `tasks/TaskList.tsx`, `tasks/TaskRunCard.tsx`
- `agents/AgentCard.tsx`
- `analytics/ChartComponent.tsx`
- `auth/SignInButton.tsx`, `auth/SignOutButton.tsx`
- **`lib/`**
- `drizzle.ts` (Drizzle setup)
- `schema.ts` (Database schema definition)
- `auth.ts` (NextAuth.js configuration)
- `agentService.ts` (Abstraction for calling external AI APIs)
- `utils.ts` (Helper functions)
- `constants.ts` (App-wide constants)
- **`actions/`** (Server Actions)
- `taskActions.ts`, `agentActions.ts`, `ratingActions.ts`
- **`app/api/`** (Potentially for external API integrations, not core MVP functionality)
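The client-side filtering and search mentioned for `TaskList`, `TaskFilter`, and `TaskSearchInput` can be kept in one pure function that the components share. This is a sketch under assumptions: the `TaskSummary` and `TaskFilterState` shapes and the `filterTasks` name are hypothetical, and matching is a simple case-insensitive substring test on name and description.

```typescript
// Hypothetical view model: the subset of task columns the list needs.
interface TaskSummary {
  id: string;
  name: string;
  description: string | null;
  category: string | null;
}

// Hypothetical filter state held by the client components.
interface TaskFilterState {
  category?: string; // exact category match when set
  search?: string; // free-text search over name and description
}

function filterTasks(tasks: readonly TaskSummary[], filter: TaskFilterState): TaskSummary[] {
  const needle = (filter.search ?? '').trim().toLowerCase();

  return tasks.filter((task) => {
    // Category filter: skip tasks in other categories.
    if (filter.category && task.category !== filter.category) {
      return false;
    }
    // Empty search matches everything.
    if (needle === '') {
      return true;
    }
    const haystack = `${task.name} ${task.description ?? ''}`.toLowerCase();
    return haystack.indexOf(needle) !== -1;
  });
}

// Example data modeled on the sample tasks in this document.
const exampleTasks: TaskSummary[] = [
  { id: 'uuid-summarize', name: 'Summarize Article', description: 'Generate a concise summary of a given text.', category: 'Text Generation' },
  { id: 'uuid-codegen', name: 'Generate Python Function', description: 'Create a Python function based on a description.', category: 'Code Generation' },
  { id: 'uuid-translate', name: 'Translate to French', description: 'Translate English text to French.', category: 'Translation' },
];
```

For larger task lists the same filter would likely move into the database query, with the filter state carried in URL search params.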
UI/UX DESIGN & VISUAL IDENTITY:
- **Style:** Modern, clean, professional, with subtle futuristic elements. Focus on clarity and data visualization.
- **Color Palette:**
- Primary: Dark Blue (`#1E3A8A` - blue-900) or Deep Purple (`#581C87` - purple-900)
- Secondary/Accent: Teal (`#0D9488` - teal-600) or Electric Blue (`#2563EB` - blue-600)
- Background: Very Dark Gray (`#111827` - gray-900)
- Card/Surface: Slightly Lighter Gray (`#1F2937` - gray-800)
- Text (Primary): Off-white (`#F3F4F6` - gray-100)
- Text (Secondary): Light Gray (`#9CA3AF` - gray-400)
- Success: Green (`#10B981` - emerald-500)
- Warning/Error: Red (`#EF4444` - red-500)
- **Typography:** Sans-serif font. Use a clear, readable font like Inter or Poppins. Scale appropriately for headings, body text, and labels.
- **Layout:** Utilize a sidebar navigation for authenticated sections (`/dashboard`, `/tasks`, `/agents`). Main content area uses a clean grid system. Cards for displaying data points and lists. Responsive design is crucial, adapting layout for desktop, tablet, and mobile.
- **Key Elements:** Clear headings, intuitive navigation, well-spaced elements, visually distinct interactive elements (buttons, inputs), effective use of whitespace.
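The palette above can be captured as named design tokens so that components refer to roles (surface, accent) rather than raw hex values. This is only a sketch: the `arenaColors` and `themeExtension` names and the `arena` namespace are assumptions, and the object is shaped so it could be spread into `theme.extend` of a Tailwind v3-style JavaScript config.

```typescript
// Hex values copied from the Color Palette list; the role names are assumptions.
const arenaColors = {
  primary: '#1E3A8A',
  'primary-alt': '#581C87',
  accent: '#0D9488',
  'accent-alt': '#2563EB',
  background: '#111827',
  surface: '#1F2937',
  'text-primary': '#F3F4F6',
  'text-secondary': '#9CA3AF',
  success: '#10B981',
  error: '#EF4444',
} as const;

// Shape intended for `theme.extend` in a Tailwind config (assumed v3-style).
const themeExtension = {
  colors: { arena: arenaColors },
};

// Small guard that catches typos in the token values at startup or in tests.
function invalidColorTokens(colors: Record<string, string>): string[] {
  return Object.keys(colors).filter((key) => !/^#[0-9A-Fa-f]{6}$/.test(colors[key]));
}
```

With the extension in place, classes along the lines of `bg-arena-surface` or `text-arena-text-primary` would be expected to resolve to these tokens.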
ANIMATIONS:
- **Page Transitions:** Subtle fade-in/fade-out using CSS animations (for example in an `app/template.tsx`, which re-mounts on navigation) or a library like `Framer Motion` if more complex animations are desired.
- **Loading States:** Use spinners (`lucide-react` icons) within buttons during form submissions or data fetching. Skeleton loaders for content areas before data loads.
- **Hover Effects:** Slight scale-up or color change on interactive elements (buttons, cards).
- **Data Updates:** Animate chart updates or list refreshes subtly to indicate new data.
- **Transitions:** Smooth transitions for modal pop-ups and collapsible sections.
EDGE CASES:
- **Authentication:** Redirect unauthenticated users trying to access protected routes (`/dashboard`, etc.) to the sign-in page. Display appropriate messages.
- **Empty States:** When lists are empty (no tasks, no runs, no agents), display informative messages and clear calls to action (e.g., 'Create your first task!').
- **API Errors:** Gracefully handle errors from external AI APIs (rate limits, invalid requests, server errors). Display user-friendly error messages via toast notifications. Log detailed errors server-side.
- **Data Validation:** Use Zod for comprehensive validation on all user inputs (task creation, ratings, custom inputs) both client-side and server-side (via Server Actions).
- **Database Constraints:** Ensure unique constraints (e.g., agent model identifiers) and foreign key constraints are correctly defined and handled.
- **Cost Calculation:** Handle cases where token counts or cost per token might be null or zero. Default to 0 for calculations to prevent errors.
- **Long Results:** Implement truncation for displayed results with an option to view the full content to avoid overwhelming the UI.
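The Cost Calculation and Long Results cases above can be handled by two small helpers. This is a minimal sketch with hypothetical names (`computeRunCost`, `previewOf`); it assumes prices may arrive either as numbers or as decimal strings (as an ORM may return for `DECIMAL` columns) and that a 1000-character preview is acceptable.

```typescript
type Numeric = string | number | null | undefined;

// Treats null, undefined, and unparsable values as 0 so cost math never yields NaN.
function toNumber(value: Numeric): number {
  if (value === null || value === undefined) {
    return 0;
  }
  const parsed = typeof value === 'number' ? value : Number(value);
  return Number.isFinite(parsed) ? parsed : 0;
}

interface AgentPricing {
  costPerTokenInput: Numeric; // USD per input token
  costPerTokenOutput: Numeric; // USD per output token
}

// Returns the run cost in USD as a string with 8 decimal places,
// matching the DECIMAL(12, 8) total_cost column.
function computeRunCost(pricing: AgentPricing, inputTokens: Numeric, outputTokens: Numeric): string {
  const cost =
    toNumber(inputTokens) * toNumber(pricing.costPerTokenInput) +
    toNumber(outputTokens) * toNumber(pricing.costPerTokenOutput);
  return cost.toFixed(8);
}

// Shortens long results for list views; the full text stays available elsewhere.
function previewOf(result: string, maxChars: number = 1000): string {
  return result.length <= maxChars ? result : `${result.slice(0, maxChars)}…`;
}
```

For example, 150 input tokens and 45 output tokens at 0.00001 and 0.00003 USD per token give 150 × 0.00001 + 45 × 0.00003 = 0.00285 USD.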
SAMPLE DATA (For Initial State / Frontend Mocking):
1. **Agents:**
* `{ id: 'uuid-gpt4', name: 'GPT-4 Turbo', provider: 'OpenAI', costPerTokenInput: 0.00001, costPerTokenOutput: 0.00003 }`
* `{ id: 'uuid-claude3', name: 'Claude 3 Opus', provider: 'Anthropic', costPerTokenInput: 0.000015, costPerTokenOutput: 0.000075 }`
* `{ id: 'uuid-gemini', name: 'Gemini Pro', provider: 'Google', costPerTokenInput: 0.0000005, costPerTokenOutput: 0.0000005 }`
2. **Tasks:**
* `{ id: 'uuid-summarize', name: 'Summarize Article', description: 'Generate a concise summary of a given text.', category: 'Text Generation', promptTemplate: 'Summarize the following text in 3 sentences: \n\n{{input}}' }`
* `{ id: 'uuid-codegen', name: 'Generate Python Function', description: 'Create a Python function based on a description.', category: 'Code Generation', promptTemplate: 'Write a Python function that does the following: \n\n{{input}}' }`
* `{ id: 'uuid-translate', name: 'Translate to French', description: 'Translate English text to French.', category: 'Translation', promptTemplate: 'Translate the following English text to French: \n\nEnglish: {{input}}\nFrench:' }`
3. **Task Runs (Example Completed Run):**
* `{ id: 'uuid-run1', taskId: 'uuid-summarize', agentId: 'uuid-gpt4', userId: 'user-abc', status: 'completed', result: 'This is a generated summary of the provided article...', inputTokens: 150, outputTokens: 45, executionTimeMs: 2100, totalCost: 0.00285, createdAt: '2024-03-15T10:00:00Z', completedAt: '2024-03-15T10:00:02Z' }`
4. **Task Runs (Example Failed Run):**
* `{ id: 'uuid-run2', taskId: 'uuid-codegen', agentId: 'uuid-claude3', userId: 'user-abc', status: 'failed', errorMessage: 'API Error: Model not found', inputTokens: null, outputTokens: null, executionTimeMs: 500, totalCost: null, createdAt: '2024-03-15T10:05:00Z', completedAt: '2024-03-15T10:05:01Z' }`
5. **User Ratings:**
* `{ id: 'uuid-rating1', taskRunId: 'uuid-run1', userId: 'user-xyz', rating: 5, comment: 'Excellent summary, very accurate!' }`
6. **User Ratings (Another User):**
* `{ id: 'uuid-rating2', taskRunId: 'uuid-run1', userId: 'user-abc', rating: 4, comment: 'Good, but a bit too concise.' }`
7. **User (For reference in schema):**
* `{ id: 'user-abc', name: 'Alice', email: 'alice@example.com', image: '...' }`
* `{ id: 'user-xyz', name: 'Bob', email: 'bob@example.com', image: '...' }`
8. **Task Run (Running State for UI Testing):**
* `{ id: 'uuid-run3', taskId: 'uuid-translate', agentId: 'uuid-gemini', userId: 'user-abc', status: 'running', createdAt: '2024-03-15T10:10:00Z' }`
9. **Task (Public Task Example):**
* `{ id: 'uuid-public-joke', name: 'Tell a Joke', description: 'Generate a funny joke.', category: 'Entertainment', promptTemplate: 'Tell me a joke about AI.', isPublic: true, createdBy: null }`
10. **Agent (High Cost Example):**
* `{ id: 'uuid-hypothetical-agent', name: 'MegaModel X', provider: 'FutureAI', costPerTokenInput: 0.01, costPerTokenOutput: 0.02 }`
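Using the shape of the sample runs and ratings above, the dashboard metrics (success rate, average cost, average time, average rating per agent) can be sketched as an in-memory aggregation for frontend mocking. The `summarizeAgents` function and the `RunRow`, `RatingRow`, and `AgentStats` types are hypothetical names; the `totalCost` of the completed run is recomputed here from its token counts and the GPT-4 Turbo sample prices (150 × 0.00001 + 45 × 0.00003 = 0.00285).

```typescript
interface RunRow {
  id: string;
  agentId: string;
  status: 'pending' | 'running' | 'completed' | 'failed';
  executionTimeMs: number | null;
  totalCost: number | null;
}

interface RatingRow {
  taskRunId: string;
  rating: number; // 1 to 5 stars
}

interface AgentStats {
  agentId: string;
  finishedRuns: number; // completed + failed
  successRate: number | null; // null until at least one run has finished
  avgCost: number | null; // over completed runs with a recorded cost
  avgTimeMs: number | null; // over completed runs with a recorded time
  avgRating: number | null;
}

function average(values: number[]): number | null {
  return values.length === 0 ? null : values.reduce((sum, v) => sum + v, 0) / values.length;
}

const notNull = (v: number | null): v is number => v !== null;

function summarizeAgents(runs: readonly RunRow[], ratings: readonly RatingRow[]): AgentStats[] {
  const agentIds: string[] = [];
  for (const run of runs) {
    if (agentIds.indexOf(run.agentId) === -1) agentIds.push(run.agentId);
  }

  return agentIds.map((agentId) => {
    const own = runs.filter((r) => r.agentId === agentId);
    const finished = own.filter((r) => r.status === 'completed' || r.status === 'failed');
    const completed = finished.filter((r) => r.status === 'completed');
    const ownRunIds = own.map((r) => r.id);
    const stars = ratings.filter((x) => ownRunIds.indexOf(x.taskRunId) !== -1).map((x) => x.rating);

    return {
      agentId,
      finishedRuns: finished.length,
      successRate: finished.length === 0 ? null : completed.length / finished.length,
      avgCost: average(completed.map((r) => r.totalCost).filter(notNull)),
      avgTimeMs: average(completed.map((r) => r.executionTimeMs).filter(notNull)),
      avgRating: average(stars),
    };
  });
}

// Mock rows modeled on the sample data above.
const sampleRuns: RunRow[] = [
  { id: 'uuid-run1', agentId: 'uuid-gpt4', status: 'completed', executionTimeMs: 2100, totalCost: 0.00285 },
  { id: 'uuid-run2', agentId: 'uuid-claude3', status: 'failed', executionTimeMs: 500, totalCost: null },
  { id: 'uuid-run3', agentId: 'uuid-gemini', status: 'running', executionTimeMs: null, totalCost: null },
];
const sampleRatings: RatingRow[] = [
  { taskRunId: 'uuid-run1', rating: 5 },
  { taskRunId: 'uuid-run1', rating: 4 },
];
```

In the real application these aggregates would more likely be computed in SQL by the Server Components; the in-memory version is mainly useful for mocked dashboards and tests.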
This prompt is designed to guide an AI coding assistant to generate a robust, multi-page MVP application with authentication, CRUD operations, data visualization, and a clear user flow, adhering to modern Next.js best practices.