## AI Master Prompt: Agent AI Forge MVP
**1. PROJECT OVERVIEW:**
**App Name:** Agent AI Forge
**Core Problem:** The rapid growth of 'agentic AI' necessitates specialized, efficient, and scalable infrastructure. Managing and optimizing AI workloads on new silicon like Arm AGI CPUs presents challenges in deployment, monitoring, and resource allocation. Existing solutions may not be tailored for the unique demands of continuous, globally scaled AI agent operations.
**Value Proposition:** Agent AI Forge is a comprehensive SaaS platform designed to empower organizations to build, deploy, monitor, and optimize their AI infrastructure specifically leveraging Arm AGI CPUs and the Neoverse platform. It provides deep visibility into AI agent performance, intelligent resource management, and automated optimization suggestions, enabling customers to maximize efficiency, reduce operational costs, and accelerate their AI initiatives.
**Target User:** Technical teams (DevOps, ML Ops, Infrastructure Engineers) at companies building or managing large-scale AI systems, particularly those adopting or considering Arm-based AI hardware.
**2. TECH STACK:**
* **Framework:** Next.js (App Router)
* **Language:** TypeScript
* **Styling:** Tailwind CSS
* **UI Library:** shadcn/ui (for accessible, reusable components)
* **Database:** PostgreSQL (via Drizzle ORM)
* **ORM:** Drizzle ORM (for type-safe database interactions)
* **Authentication:** NextAuth.js (or Clerk for a more managed solution)
* **State Management:** React Context API / Zustand (for global state)
* **Data Fetching:** React Server Components (RSC) for initial loads, plus client-side fetching where needed (e.g., SWR/React Query to cache server data in client components)
* **Deployment:** Vercel (recommended for Next.js)
* **Charting:** Recharts or Chart.js for data visualization
* **Form Handling:** React Hook Form + Zod for validation
**3. DATABASE SCHEMA (PostgreSQL with Drizzle ORM syntax):**
```typescript
// schema.ts
import {
  pgTable,
  pgEnum,
  primaryKey,
  uuid,
  text,
  timestamp,
  integer,
  real,
} from 'drizzle-orm/pg-core';
import { relations } from 'drizzle-orm';

// Enums are declared at module level (and exported) so migrations can create the Postgres type.
export const unitStatus = pgEnum('unit_status', ['online', 'offline', 'maintenance', 'unknown']);

// User & Auth
export const users = pgTable('users', {
  id: uuid('id').primaryKey().defaultRandom(),
  name: text('name'),
  email: text('email').unique().notNull(),
  emailVerified: timestamp('emailVerified', { mode: 'date' }),
  image: text('image'),
  createdAt: timestamp('createdAt', { mode: 'date' }).defaultNow(),
  updatedAt: timestamp('updatedAt', { mode: 'date' }).defaultNow(),
});

export const accounts = pgTable('accounts', {
  id: uuid('id').primaryKey().defaultRandom(),
  userId: uuid('userId').notNull().references(() => users.id, { onDelete: 'cascade' }),
  type: text('type').notNull(), // 'oauth' or 'email'
  provider: text('provider').notNull(),
  providerAccountId: text('providerAccountId').notNull(),
  refresh_token: text('refresh_token'),
  access_token: text('access_token'),
  expires_at: integer('expires_at'),
  token_type: text('token_type'),
  scope: text('scope'),
  id_token: text('id_token'),
  session_state: text('session_state'),
});

export const sessions = pgTable('sessions', {
  sessionToken: text('sessionToken').primaryKey(),
  userId: uuid('userId').notNull().references(() => users.id, { onDelete: 'cascade' }),
  expires: timestamp('expires', { mode: 'date' }).notNull(),
});

export const verificationTokens = pgTable('verificationTokens', {
  identifier: text('identifier').notNull(),
  token: text('token').notNull(),
  expires: timestamp('expires', { mode: 'date' }).notNull(),
}, (vt) => ({
  // Composite primary key on (identifier, token)
  compoundKey: primaryKey({ columns: [vt.identifier, vt.token] }),
}));

// Core Entities
export const infrastructureTypes = pgTable('infrastructure_types', {
  id: uuid('id').primaryKey().defaultRandom(),
  name: text('name').unique().notNull(), // e.g., 'Arm AGI CPU', 'Neoverse V2'
  description: text('description'),
});

export const hardwareUnits = pgTable('hardware_units', {
  id: uuid('id').primaryKey().defaultRandom(),
  unitId: text('unitId').unique().notNull(), // Unique identifier for the physical/virtual unit
  name: text('name'),
  typeId: uuid('typeId').notNull().references(() => infrastructureTypes.id),
  location: text('location'), // e.g., Datacenter, Region
  status: unitStatus('status').default('unknown'),
  createdAt: timestamp('createdAt', { mode: 'date' }).defaultNow(),
  updatedAt: timestamp('updatedAt', { mode: 'date' }).defaultNow(),
  userId: uuid('userId').notNull().references(() => users.id, { onDelete: 'cascade' }),
});

export const workloads = pgTable('workloads', {
  id: uuid('id').primaryKey().defaultRandom(),
  workloadId: text('workloadId').unique().notNull(), // Unique identifier for the AI workload/agent
  name: text('name'),
  description: text('description'),
  assignedHardwareUnitId: uuid('assignedHardwareUnitId').references(() => hardwareUnits.id, { onDelete: 'set null' }),
  userId: uuid('userId').notNull().references(() => users.id, { onDelete: 'cascade' }),
  createdAt: timestamp('createdAt', { mode: 'date' }).defaultNow(),
  updatedAt: timestamp('updatedAt', { mode: 'date' }).defaultNow(),
});

export const metrics = pgTable('metrics', {
  id: uuid('id').primaryKey().defaultRandom(),
  workloadId: uuid('workloadId').notNull().references(() => workloads.id, { onDelete: 'cascade' }),
  timestamp: timestamp('timestamp', { mode: 'date' }).notNull(),
  cpuUsage: integer('cpuUsage'), // Percentage
  memoryUsage: integer('memoryUsage'), // MB
  gpuUsage: integer('gpuUsage'), // Percentage, if applicable
  networkIn: integer('networkIn'), // KB/s
  networkOut: integer('networkOut'), // KB/s
  latency: integer('latency'), // ms
  errorRate: real('errorRate'), // Percentage; fractional values such as 0.5 are expected
  // Add other relevant metrics
  createdAt: timestamp('createdAt', { mode: 'date' }).defaultNow(),
});

// Relations (Example)
// Each `one` names the foreign-key column explicitly; each `many` relies on the matching `one`.
export const usersRelations = relations(users, ({ many }) => ({
  hardwareUnits: many(hardwareUnits),
  workloads: many(workloads),
}));

export const hardwareUnitsRelations = relations(hardwareUnits, ({ one, many }) => ({
  type: one(infrastructureTypes, { fields: [hardwareUnits.typeId], references: [infrastructureTypes.id] }),
  user: one(users, { fields: [hardwareUnits.userId], references: [users.id] }),
  workloads: many(workloads),
}));

export const workloadsRelations = relations(workloads, ({ one, many }) => ({
  assignedHardwareUnit: one(hardwareUnits, { fields: [workloads.assignedHardwareUnitId], references: [hardwareUnits.id] }),
  user: one(users, { fields: [workloads.userId], references: [users.id] }),
  metrics: many(metrics),
}));

export const metricsRelations = relations(metrics, ({ one }) => ({
  workload: one(workloads, { fields: [metrics.workloadId], references: [workloads.id] }),
}));
// Add more relations as needed (e.g., for alerts, optimization suggestions)
```
**4. CORE FEATURES & USER FLOW:**
* **Feature 1: Infrastructure Monitoring Dashboard**
    * **User Flow:**
        1. User logs in.
        2. User navigates to the 'Dashboard' page.
        3. The dashboard displays a summary of all registered hardware units (e.g., Arm AGI CPUs, Neoverse servers).
        4. Key metrics like overall CPU usage, memory availability, and unit status are visualized using charts and status indicators.
        5. User can filter units by type, location, or status.
        6. Clicking on a specific unit navigates to a detailed view for that unit.
    * **Functionality:** Fetch data for all `hardwareUnits` associated with the logged-in user. Fetch recent `metrics` for relevant `workloads` if applicable. Aggregate and display key performance indicators (KPIs).
* **Feature 2: Workload Management & Performance Tracking**
    * **User Flow:**
        1. User navigates to the 'Workloads' page.
        2. A table lists all AI workloads, showing their ID, name, assigned hardware unit, and status.
        3. User can add a new workload, providing a name, description, and optionally assigning it to a hardware unit.
        4. User can view the performance metrics (CPU, memory, latency, error rate) for a specific workload over time via a dedicated 'Workload Detail' page with charts.
        5. User can update workload details or reassign it to a different hardware unit.
        6. User can delete a workload.
    * **Functionality:** CRUD operations for `workloads`. Fetch `metrics` data for a selected workload and time range. Use charting libraries to visualize metric trends.
* **Feature 3: Resource Optimization Suggestions**
    * **User Flow:**
        1. The system analyzes `metrics` data and `hardwareUnit` utilization.
        2. On the 'Dashboard' or a dedicated 'Optimization' page, the system presents potential optimizations (e.g., 'Workload X is underutilized on Unit Y, consider moving it to Unit Z', 'High latency detected for Workload A, consider increasing CPU allocation').
        3. User can review suggestions and manually apply them by reassigning workloads or adjusting configurations.
    * **Functionality:** Backend logic (potentially a separate microservice or scheduled job) to analyze metric trends and hardware status. Generate actionable insights based on predefined rules (e.g., utilization < 20% for > 1 hour, latency > threshold).
* **Feature 4: Alerting & Notifications**
    * **User Flow:**
        1. User navigates to the 'Alerts' or 'Settings' page.
        2. User can define rules for alerts (e.g., 'CPU usage > 90% for 15 minutes', 'Unit offline', 'Error rate > 5%').
        3. User specifies notification channels (e.g., email, Slack integration - MVP might focus on in-app notifications and email).
        4. When an alert condition is met, the system triggers a notification to the user.
    * **Functionality:** Background process monitors metrics and unit status against defined alert rules. Integrate with email service (e.g., Nodemailer, Resend) or a notification service.
* **Authentication Flow:**
    1. User visits the landing page.
    2. Clicks 'Sign In' or 'Sign Up'.
    3. Redirected to NextAuth.js/Clerk login page (e.g., Google, GitHub, Email/Password).
    4. Upon successful authentication, user is redirected to the main dashboard.
    5. User's session is managed via tokens/cookies.
    6. Protected routes prevent access for unauthenticated users.
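The rule-based analysis described for Feature 3 could begin as a pure function over metric samples. The sketch below is a minimal illustration, not a prescribed design. The `MetricSample` and `Suggestion` shapes, the `findUnderutilized` name, and the assumption that each workload's samples arrive sorted by time are all hypothetical. Only the "below 20% for about an hour" rule comes from the feature description.

```typescript
// Hypothetical shapes for illustration; field names loosely follow the `metrics` table.
interface MetricSample {
  workloadId: string;
  timestamp: Date;
  cpuUsage: number; // percentage, 0-100
}

interface Suggestion {
  workloadId: string;
  message: string;
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Flags workloads whose most recent run of samples stayed below `threshold`
// CPU for at least `minDurationMs`. Assumes samples for each workload are
// already sorted by ascending timestamp.
function findUnderutilized(
  samples: MetricSample[],
  threshold = 20,
  minDurationMs = ONE_HOUR_MS,
): Suggestion[] {
  // Group samples per workload, preserving input order.
  const byWorkload = new Map<string, MetricSample[]>();
  for (const s of samples) {
    const list = byWorkload.get(s.workloadId) ?? [];
    list.push(s);
    byWorkload.set(s.workloadId, list);
  }

  const suggestions: Suggestion[] = [];
  byWorkload.forEach((list, workloadId) => {
    // Walk backwards from the newest sample while usage stays below the threshold.
    let start = list.length;
    while (start > 0 && list[start - 1].cpuUsage < threshold) start--;
    if (start === list.length) return; // newest sample is not below the threshold

    const lowSpanMs =
      list[list.length - 1].timestamp.getTime() - list[start].timestamp.getTime();
    if (lowSpanMs >= minDurationMs) {
      suggestions.push({
        workloadId,
        message: `Workload ${workloadId} has stayed below ${threshold}% CPU for the configured window; consider consolidating it.`,
      });
    }
  });
  return suggestions;
}
```

Keeping the rule free of database and framework calls means a scheduled job, a route handler, or a unit test could all call it with rows fetched elsewhere.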
**5. API & DATA FETCHING:**
* **Data Fetching Strategy:** Leverage Next.js App Router's Server Components for initial data loading and performance. Use client-side fetching (SWR/React Query) for dynamic data, mutations, and real-time updates where necessary.
* **API Routes (Next.js App Router `app/api/...` or Server Actions):**
    * `POST /api/hardware-units`: Register a new hardware unit.
    * `GET /api/hardware-units`: Get a list of all hardware units for the user.
    * `GET /api/hardware-units/[id]`: Get details for a specific hardware unit.
    * `PUT /api/hardware-units/[id]`: Update a hardware unit (e.g., status, name).
    * `DELETE /api/hardware-units/[id]`: Delete a hardware unit.
    * `POST /api/workloads`: Create a new workload.
    * `GET /api/workloads`: Get a list of all workloads for the user.
    * `GET /api/workloads/[id]`: Get details for a specific workload (including recent metrics).
    * `PUT /api/workloads/[id]`: Update a workload (reassignment, name, etc.).
    * `DELETE /api/workloads/[id]`: Delete a workload.
    * `GET /api/metrics?workloadId=[id]&timeRange=[...]`: Fetch metrics for a workload.
    * `POST /api/alerts/rules`: Create an alert rule.
    * `GET /api/alerts/rules`: Get user's alert rules.
    * `GET`/`POST /api/auth/...`: Handled by NextAuth.js/Clerk.
* **Request/Response Examples (Illustrative):**
    * `POST /api/workloads` Request Body: `{ name: 'Agent-Alpha', description: '...', assignedHardwareUnitId: '...' }`
    * `GET /api/workloads/[id]` Response Body: `{ id: '...', name: 'Agent-Alpha', ..., metrics: [{ timestamp: '...', cpuUsage: 75, ... }], hardwareUnit: { id: '...', name: 'Arm-AGI-01', ... } }`
* **Server Actions:** Consider using Server Actions for mutations (Create, Update, Delete) to keep mutation logic close to the components that trigger it.
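Whichever transport is chosen, the body of `POST /api/workloads` needs server-side validation before it reaches the database. The stack above names Zod for this. The sketch below hand-rolls the same checks only so that it stays dependency-free. `parseCreateWorkload`, `CreateWorkloadInput`, and `ParseResult` are hypothetical names, and the exact rules (non-empty name, UUID-shaped unit ID) are assumptions rather than requirements stated in this spec.

```typescript
// Hypothetical request shape, mirroring the illustrative POST /api/workloads body.
interface CreateWorkloadInput {
  name: string;
  description?: string;
  assignedHardwareUnitId: string | null;
}

type ParseResult =
  | { ok: true; data: CreateWorkloadInput }
  | { ok: false; errors: string[] };

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Validates an untrusted JSON body and returns either clean data or a list of problems.
function parseCreateWorkload(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, errors: ['body must be a JSON object'] };
  }
  const b = body as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof b.name !== 'string' || b.name.trim() === '') {
    errors.push('name is required');
  }
  if (b.description !== undefined && typeof b.description !== 'string') {
    errors.push('description must be a string');
  }
  const unit = b.assignedHardwareUnitId;
  if (unit !== undefined && unit !== null && (typeof unit !== 'string' || !UUID_RE.test(unit))) {
    errors.push('assignedHardwareUnitId must be a UUID or null');
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    data: {
      name: (b.name as string).trim(),
      description: b.description as string | undefined,
      // An omitted unit is normalized to null, matching the nullable column.
      assignedHardwareUnitId: (unit as string | null | undefined) ?? null,
    },
  };
}
```

A route handler or Server Action could return the `errors` array with a 400 status and pass `data` on to the insert only when `ok` is true.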
**6. COMPONENT BREAKDOWN (Next.js App Router Structure):**
* `app/`

      ├── layout.tsx (Root layout, includes Head, Auth Provider)
      ├── page.tsx (Landing Page)
      ├── dashboard
      │   ├── page.tsx (Dashboard Overview - RSC)
      │   └── _components
      │       ├── UnitStatusOverview.tsx
      │       ├── KeyMetricCharts.tsx
      │       └── OptimizationSuggestions.tsx
      ├── infrastructure
      │   ├── page.tsx (List of Hardware Units - RSC/Client)
      │   ├── [unitId]
      │   │   ├── page.tsx (Detailed Unit View - RSC/Client)
      │   │   └── _components
      │   │       └── UnitPerformanceChart.tsx
      │   └── _components
      │       └── AddHardwareUnitForm.tsx
      ├── workloads
      │   ├── page.tsx (List of Workloads - RSC/Client)
      │   ├── [workloadId]
      │   │   ├── page.tsx (Detailed Workload View - RSC/Client)
      │   │   └── _components
      │   │       └── WorkloadMetricCharts.tsx
      │   └── _components
      │       └── AddWorkloadForm.tsx
      ├── alerts
      │   ├── page.tsx (Alert Rules List/Management - Client)
      │   └── _components
      │       └── AlertRuleForm.tsx
      ├── settings
      │   └── page.tsx (User Settings, Profile)
      ├── api
      │   ├── auth
      │   │   └── [...nextauth]
      │   │       └── route.ts (NextAuth.js handler)
      │   ├── hardware-units
      │   │   └── route.ts (CRUD API handlers)
      │   ├── workloads
      │   │   └── route.ts (CRUD API handlers)
      │   └── metrics
      │       └── route.ts (Metrics fetching)
      └── (auth)
          ├── sign-in
          │   └── page.tsx
          └── sign-up
              └── page.tsx

* **State Management:**
    * Global Auth state: Context API or Zustand.
    * Component-specific state: `useState`, `useReducer`.
    * Server/Client data fetching & caching: RSC, SWR/React Query for client components.
    * Form state: React Hook Form.
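A component such as `KeyMetricCharts.tsx` would likely need raw metric rows reduced to a few summary values before rendering. The following is a rough sketch of one way to do that on the server. `MetricRow`, `MetricSummary`, and `summarizeMetrics` are hypothetical names, and the choice of aggregates (mean CPU, peak memory, mean latency, worst error rate) is an assumption, not something this spec fixes.

```typescript
// Hypothetical row shape: a subset of the nullable numeric columns on `metrics`.
interface MetricRow {
  cpuUsage: number | null;
  memoryUsage: number | null;
  latency: number | null;
  errorRate: number | null;
}

interface MetricSummary {
  sampleCount: number;
  avgCpuUsage: number | null;
  peakMemoryUsage: number | null;
  avgLatency: number | null;
  maxErrorRate: number | null;
}

// Collects the non-null values of one column.
function columnValues(rows: MetricRow[], key: keyof MetricRow): number[] {
  return rows.map((r) => r[key]).filter((v): v is number => v !== null);
}

// Arithmetic mean, or null when there is nothing to average.
function mean(xs: number[]): number | null {
  return xs.length === 0 ? null : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Largest value, or null for an empty list.
function largest(xs: number[]): number | null {
  return xs.length === 0 ? null : Math.max(...xs);
}

function summarizeMetrics(rows: MetricRow[]): MetricSummary {
  return {
    sampleCount: rows.length,
    avgCpuUsage: mean(columnValues(rows, 'cpuUsage')),
    peakMemoryUsage: largest(columnValues(rows, 'memoryUsage')),
    avgLatency: mean(columnValues(rows, 'latency')),
    maxErrorRate: largest(columnValues(rows, 'errorRate')),
  };
}
```

Returning `null` instead of `0` for missing data lets the UI show an empty state rather than a misleading zero.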
**7. UI/UX DESIGN & VISUAL IDENTITY:**
* **Design Style:** Modern, Clean, Professional with a subtle tech/AI aesthetic.
* **Color Palette:**
    * Primary: Deep Blue (#0A192F)
    * Secondary: Dark Grey (#1E293B)
    * Accent: Electric Cyan (#61DAFB) or a bright Teal (#14B8A6)
    * Text (Light): White (#FFFFFF)
    * Text (Dark/Muted): Light Grey (#A0AEC0)
    * Success: Green (#34D399)
    * Warning: Yellow/Orange (#FBBF24)
    * Error: Red (#F87171)
* **Typography:** Inter or Source Sans Pro (for readability and modern feel). Use a clear hierarchy (H1, H2, body, captions).
* **Layout:**
    * Sidebar Navigation (collapsible) for main sections (Dashboard, Infrastructure, Workloads, Alerts, Settings).
    * Main content area uses a clean, card-based or table-based layout.
    * Generous whitespace.
    * Responsive design: Mobile-first approach, adapting to tablet and desktop layouts. Sidebar might collapse or move to a top menu on smaller screens.
* **Key Components:** Data tables with sorting/filtering, interactive charts, status badges, clear form layouts, modals for confirmation/details, notification toasts.
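The status badges mentioned above can draw directly on the palette. The mapping below is one plausible assignment, offered as a sketch. Pairing `online` with Success, `maintenance` with Warning, `offline` with Error, and `unknown` with the muted grey is an assumption, and `STATUS_BADGES` and `badgeFor` are hypothetical names.

```typescript
// The four values of the `unit_status` enum from the schema.
type UnitStatus = 'online' | 'offline' | 'maintenance' | 'unknown';

interface BadgeStyle {
  label: string;
  color: string; // hex value taken from the color palette
}

const STATUS_BADGES: Record<UnitStatus, BadgeStyle> = {
  online: { label: 'Online', color: '#34D399' }, // Success
  maintenance: { label: 'Maintenance', color: '#FBBF24' }, // Warning
  offline: { label: 'Offline', color: '#F87171' }, // Error
  unknown: { label: 'Unknown', color: '#A0AEC0' }, // Muted text grey
};

// Looks up the badge for a status string, falling back to 'unknown'
// for any value that is not one of the enum members.
function badgeFor(status: string): BadgeStyle {
  return Object.prototype.hasOwnProperty.call(STATUS_BADGES, status)
    ? STATUS_BADGES[status as UnitStatus]
    : STATUS_BADGES.unknown;
}
```

Typing the table as `Record<UnitStatus, BadgeStyle>` makes the compiler complain if a new status is added to the enum without a matching badge.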
**8. SAMPLE/MOCK DATA:**
* **Hardware Unit:**
    * `{ id: 'uuid-1', unitId: 'AGI-Server-001', name: 'Arm AGI Rack 1', type: { name: 'Arm AGI CPU' }, location: 'us-east-1', status: 'online', userId: 'user-abc' }`
    * `{ id: 'uuid-2', unitId: 'Neoverse-Node-05B', name: 'Neoverse Compute Node 5B', type: { name: 'Neoverse V2' }, location: 'eu-west-2', status: 'maintenance', userId: 'user-abc' }`
* **Workload:**
    * `{ id: 'uuid-w1', workloadId: 'Agent-TaskRunner-X', name: 'Image Generation Agent', description: 'Processes AI image generation requests', assignedHardwareUnitId: 'uuid-1', userId: 'user-abc' }`
    * `{ id: 'uuid-w2', workloadId: 'Agent-Chatbot-Alpha', name: 'Customer Support Bot', description: 'Handles real-time customer queries', assignedHardwareUnitId: 'uuid-1', userId: 'user-abc' }`
    * `{ id: 'uuid-w3', workloadId: 'DataAnalysis-Worker-Z', name: 'Real-time Analytics Processor', description: 'Processes incoming data streams', assignedHardwareUnitId: null, userId: 'user-abc' }`
* **Metric (for Workload 'uuid-w1'):**
    * `{ id: 'uuid-m1', workloadId: 'uuid-w1', timestamp: '2023-10-27T10:00:00Z', cpuUsage: 75, memoryUsage: 4096, latency: 150, errorRate: 0.5 }`
    * `{ id: 'uuid-m2', workloadId: 'uuid-w1', timestamp: '2023-10-27T10:05:00Z', cpuUsage: 82, memoryUsage: 4150, latency: 165, errorRate: 0.8 }`
    * `{ id: 'uuid-m3', workloadId: 'uuid-w1', timestamp: '2023-10-27T10:10:00Z', cpuUsage: 80, memoryUsage: 4100, latency: 160, errorRate: 0.7 }`
* **Alert Rule:**
    * `{ id: 'uuid-a1', userId: 'user-abc', name: 'High CPU Alert', type: 'CPU_USAGE', condition: '>', threshold: 90, duration: 15, enabled: true, notificationChannels: ['email'] }`
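To make the alert-rule shape concrete, here is a small sketch of how a background job might evaluate such a rule against metric samples like the ones above. It rests on several assumptions: `duration` is read as minutes, only `'>'` and `'<'` conditions are handled, `ERROR_RATE` is an invented second rule type, and a rule fires only when every sample inside the window breaches the threshold. `isAlertFiring`, `AlertRule`, and `Sample` are hypothetical names.

```typescript
// Hypothetical subset of the alert rule mock above.
interface AlertRule {
  type: 'CPU_USAGE' | 'ERROR_RATE';
  condition: '>' | '<';
  threshold: number;
  duration: number; // minutes (assumed unit)
  enabled: boolean;
}

// Hypothetical subset of a metric row, with an ISO 8601 timestamp.
interface Sample {
  timestamp: string;
  cpuUsage: number;
  errorRate: number;
}

// Returns true when every sample in the trailing `duration`-minute window
// breaches the rule. Simplification: it does not verify that the samples
// actually cover the whole window, and an empty window never fires.
function isAlertFiring(rule: AlertRule, samples: Sample[], now: Date): boolean {
  if (!rule.enabled) return false;

  const windowEnd = now.getTime();
  const windowStart = windowEnd - rule.duration * 60000;
  const inWindow = samples.filter((s) => {
    const t = Date.parse(s.timestamp);
    return t >= windowStart && t <= windowEnd;
  });
  if (inWindow.length === 0) return false;

  const pick = (s: Sample): number =>
    rule.type === 'CPU_USAGE' ? s.cpuUsage : s.errorRate;
  const breaches = (v: number): boolean =>
    rule.condition === '>' ? v > rule.threshold : v < rule.threshold;

  return inWindow.every((s) => breaches(pick(s)));
}
```

With the three sample metrics for `uuid-w1` and the "High CPU Alert" rule (threshold 90), no sample exceeds the threshold, so the rule would not fire.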
**9. TURKISH TRANSLATIONS:**
* **App Title:** Agent AI Forge
* **Navigation:**
    * Dashboard: Gösterge Paneli
    * Infrastructure: Altyapı
    * Workloads: İş Yükleri
    * Alerts: Uyarılar
    * Settings: Ayarlar
* **Buttons:**
    * Sign In: Giriş Yap
    * Sign Up: Kayıt Ol
    * Add New: Yeni Ekle
    * Save Changes: Değişiklikleri Kaydet
    * View Details: Detayları Gör
* **Labels/Placeholders:**
    * Unit ID: Birim Kimliği
    * Workload Name: İş Yükü Adı
    * Status: Durum
    * CPU Usage: CPU Kullanımı
    * Memory Usage: Bellek Kullanımı
    * Latency: Gecikme
    * Error Rate: Hata Oranı
* **Page Titles:**
    * Dashboard: Gösterge Paneli
    * Hardware Unit Details: Donanım Birimi Detayları
    * Workload Performance: İş Yükü Performansı
* **Notifications:**
    * Alert Triggered: Uyarı Tetiklendi!
    * System Notification: Sistem Bildirimi
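These strings could live in a typed dictionary so that missing translations are caught early and fall back to English. The sketch below shows one possible shape. The key names (`nav.dashboard` and so on), the `messages` object, and the `t` helper are hypothetical, and `button.delete` is an invented entry included only to demonstrate the fallback.

```typescript
// Hypothetical message catalog; only a few of the strings above are included.
const messages = {
  en: {
    'nav.dashboard': 'Dashboard',
    'nav.infrastructure': 'Infrastructure',
    'nav.workloads': 'Workloads',
    'button.signIn': 'Sign In',
    'button.delete': 'Delete', // invented key with no Turkish entry, to show the fallback
  },
  tr: {
    'nav.dashboard': 'Gösterge Paneli',
    'nav.infrastructure': 'Altyapı',
    'nav.workloads': 'İş Yükleri',
    'button.signIn': 'Giriş Yap',
  },
} as const;

type Locale = keyof typeof messages;
type MessageKey = keyof typeof messages.en;

// Returns the string for `key` in `locale`, falling back to English
// when the locale has no entry for that key.
function t(key: MessageKey, locale: Locale = 'en'): string {
  const table: Partial<Record<MessageKey, string>> = messages[locale];
  return table[key] ?? messages.en[key];
}
```

Because `MessageKey` is derived from the English table, a typo such as `t('nav.dashbord')` fails at compile time rather than rendering a blank label.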
**10. ANIMATIONS:**
* **Page Transitions:** Subtle fade-in/out or slide animations between pages (e.g., using `Framer Motion` if needed, or CSS transitions).
* **Component Mounts:** Animate elements (like charts or cards) as they load into view.
* **Hover Effects:** Subtle background color changes or scaling effects on interactive elements (buttons, table rows, cards).
* **Loading States:** Use spinners or skeleton loaders (`shadcn/ui` provides a `Skeleton` component) while data is being fetched.
* **Chart Transitions:** Smooth transitions when data updates on charts.
**11. EDGE CASES:**
* **Empty States:** Design visually appealing and informative empty states for dashboards, unit lists, workload lists, etc., guiding users on how to add their first item.
* **Authentication:** Handle invalid credentials, session expiry, unauthorized access attempts gracefully. Ensure proper redirection after login/logout.
* **Data Fetching Errors:** Implement error boundaries and display user-friendly error messages. Provide options to retry fetching.
* **Validation:** Implement robust client-side (using Zod with React Hook Form) and server-side validation for all form inputs and API requests.
* **API Rate Limiting:** If applicable, inform users about usage limits.
* **Resource Not Found:** Handle 404 errors for specific unit or workload IDs.
* **Offline/Maintenance Units:** Clearly indicate the status of hardware units and workloads affected by these states.