## AI Master Prompt: Intellectual Property Shield
**1. PROJECT OVERVIEW:**
Intellectual Property Shield is a SaaS platform that helps AI developers and organizations manage the legal risk of training models on large datasets. The core problem is copyright infringement: training data may include books, articles, and other creative works used without a license or outside the bounds of fair use. The platform lets developers identify, analyze, and mitigate copyright risk in their training datasets before it leads to costly lawsuits, reputational damage, or operational disruption. We are building it as a single-page application (SPA) to keep the user experience focused on AI data compliance.
**2. TECH STACK:**
* **Frontend Framework:** React.js (v18+ with Hooks and Concurrent Features)
* **Styling:** Tailwind CSS (v3+) for rapid UI development and utility-first styling.
* **State Management:** Zustand for efficient and scalable global state management, chosen for its simplicity and minimal boilerplate.
* **Routing:** React Router DOM (v6+) for handling client-side navigation within the SPA.
* **UI Components:** Radix UI for accessible and unstyled primitive components, which will be styled with Tailwind CSS.
* **Icons:** lucide-react for a comprehensive set of open-source icons.
* **Data Fetching:** TanStack Query (React Query) for managing server state, caching, and asynchronous operations.
* **Form Handling:** React Hook Form for efficient and performant form management.
* **Build Tool:** Vite for its speed and optimized development experience.
* **Deployment:** Vercel (recommended; it serves the Vite production build as a static site. Its Next.js integration is only relevant if the project later moves to Next.js).
**3. CORE FEATURES:**
* **Dataset Upload:**
    * **User Flow:** Users navigate to the 'Upload Dataset' section. They can either drag-and-drop files/folders or use a file input to select local datasets (e.g., .txt, .csv, .json files, or zipped archives). The system provides clear feedback on upload progress and supported file types. A loading indicator is displayed during the upload process. Upon successful upload, the dataset is registered and ready for analysis.
    * **Details:** Supports various file formats. Provides validation for file size and type. Handles potential upload errors gracefully.
* **Copyright Analysis Engine:**
    * **User Flow:** After upload, users initiate the analysis by clicking an 'Analyze' button for their dataset. The system processes the dataset, scanning for patterns, text similarities, and metadata indicative of copyrighted material. The user sees a real-time progress update (e.g., 'Scanning...', 'Analyzing chunks...', 'Generating report...').
    * **Details:** This is a simulated engine for the MVP. The core logic identifies text segments that might be derived from known copyrighted works (e.g., book excerpts) and generates a 'Risk Score' (0-100) based on the potential infringements found.
* **Risk Report Generation:**
    * **User Flow:** Once analysis is complete, a detailed report is generated. Users can view this report within the platform or download it as a PDF. The report includes the overall risk score, a breakdown of potential infringement types, highlighted problematic text segments, and links to potential sources if available (mock data for MVP).
    * **Details:** Visualizes the risk score with a prominent chart. Lists specific findings with context. Explains the methodology used.
* **Mitigation Recommendations:**
    * **User Flow:** Based on the risk report, the platform suggests mitigation strategies. This includes recommending alternative, openly licensed datasets, suggesting specific works that are in the public domain, or providing links to legitimate sources for acquiring licenses.
    * **Details:** Provides actionable advice. Offers direct links to resources where applicable (mock links for MVP).
* **Dashboard:**
    * **User Flow:** The main dashboard provides an overview of all uploaded datasets, their analysis status, risk scores, and recent reports. Users can quickly filter and sort datasets.
    * **Details:** Central hub for managing datasets and understanding the overall compliance status. Displays key metrics at a glance.
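
Because the MVP engine is simulated, the scoring rule is left open. One possible sketch, assuming a severity-weighted sum of each finding's `percentageMatch`, clamped to 0-100. The `computeRiskScore` name and the weights are hypothetical, and the mock scores elsewhere in this document are not derived from this formula.

```javascript
// Hypothetical severity weights; the spec does not define a scoring formula.
const SEVERITY_WEIGHTS = { High: 3, Medium: 2, Low: 1 };

/**
 * Derive a 0-100 risk score from a list of findings.
 * Each finding contributes its percentageMatch multiplied by a severity weight.
 */
function computeRiskScore(findings) {
  const raw = findings.reduce((sum, finding) => {
    const weight = SEVERITY_WEIGHTS[finding.severity] ?? 1;
    return sum + finding.percentageMatch * weight;
  }, 0);
  // Clamp so that many strong findings cannot push the score past 100.
  return Math.min(100, Math.round(raw));
}

const exampleFindings = [
  { id: "finding-1", percentageMatch: 15, severity: "High" },
  { id: "finding-2", percentageMatch: 8, severity: "Medium" },
];
console.log(computeRiskScore(exampleFindings)); // 61 (15 * 3 + 8 * 2)
```

A dataset with no findings scores 0 under this rule; a real engine would likely also account for dataset size and match confidence.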
**4. UI/UX DESIGN:**
* **Layout:** Single-page application structure. A persistent sidebar navigation (collapsible on smaller screens) for accessing Dashboard, Upload, Analysis History, and Settings. The main content area displays the relevant section based on sidebar selection. Clean, uncluttered design.
* **Color Palette:** Primary: `#1A202C` (Dark background). Secondary: `#4A5568` (Medium Gray for subtle elements). Accent: `#3B82F6` (Primary Blue for interactive elements, buttons, links). Alert/Warning: `#F56565` (Red for errors and high-risk indicators). Success: `#48BB78` (Green for successful operations).
* **Typography:** Inter font family. Headings: Inter Bold (e.g., 36px, 24px, 18px). Body Text: Inter Regular (e.g., 16px, 14px). Use clear hierarchy and sufficient line spacing (1.5x).
* **Responsive Design:** Mobile-first approach. Sidebar collapses into a hamburger menu on screens < 768px. Main content adjusts fluidly. Use Tailwind CSS's responsive prefixes (sm:, md:, lg:, xl:) extensively. Ensure all components are usable and visually appealing on all standard device sizes (mobile, tablet, desktop).
* **Interactivity:** Clear hover states for buttons and links. Subtle animations for transitions between pages/sections. Visual feedback for loading states (spinners, skeleton screens).
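
The palette and typography above could be exposed as Tailwind theme tokens. A minimal sketch: the hex values come from this section, while the token names (`surface`, `muted`, `accent`, `danger`, `success`) and the fallback fonts are assumptions chosen for illustration.

```javascript
// Hypothetical token names; the hex values come from the palette above.
const themeExtension = {
  colors: {
    surface: "#1A202C", // Primary: dark background
    muted: "#4A5568",   // Secondary: subtle elements
    accent: "#3B82F6",  // Interactive elements, buttons, links
    danger: "#F56565",  // Errors and high-risk indicators
    success: "#48BB78", // Successful operations
  },
  fontFamily: {
    // Inter first, with assumed generic fallbacks.
    sans: ["Inter", "system-ui", "sans-serif"],
  },
};

// In tailwind.config.js this object would sit under theme.extend, roughly:
//   module.exports = {
//     content: ["./index.html", "./src/**/*.{js,jsx}"],
//     theme: { extend: themeExtension },
//   };
console.log(Object.keys(themeExtension.colors).join(", "));
```

With tokens like these in place, utilities such as `bg-surface` or `text-accent` become available alongside Tailwind's default colors.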
**5. DATA MODEL (State Structure & Mock Data):**
* **State Management (Zustand Store):**
```javascript
// store/datasetStore.js
import { create } from 'zustand';

export const useDatasetStore = create((set) => ({
  datasets: [], // Array of dataset objects
  // Append a newly uploaded dataset.
  addDataset: (dataset) =>
    set((state) => ({ datasets: [...state.datasets, dataset] })),
  // Merge partial updates (e.g., status, riskScore) into the matching dataset.
  updateDataset: (id, updatedData) =>
    set((state) => ({
      datasets: state.datasets.map((ds) =>
        ds.id === id ? { ...ds, ...updatedData } : ds
      ),
    })),
  // Replace the whole list at once.
  setDatasets: (datasets) => set({ datasets }),
  // Initial mock data loading can happen here or be fetched
}));
```
* **Mock Data Format (`Dataset` object):**
```jsonc
{
  "id": "uuid-string-1",
  "name": "Book Excerpts Training Set v1",
  "uploadDate": "2023-10-27T10:00:00Z",
  "status": "Completed", // "Uploading", "Processing", "Completed", "Error"
  "riskScore": 78, // 0-100
  "analysisDate": "2023-10-27T11:30:00Z",
  "fileSize": "50MB",
  "findings": [
    {
      "id": "finding-1",
      "description": "High similarity found with 'The Great Gatsby' by F. Scott Fitzgerald.",
      "percentageMatch": 15,
      "source": "Project Gutenberg (Mock)",
      "url": "https://mock.projectgutenberg.org/gatsby",
      "severity": "High"
    },
    {
      "id": "finding-2",
      "description": "Potential match with content from 'Dune' by Frank Herbert.",
      "percentageMatch": 8,
      "source": "Internal Archive Search (Mock)",
      "url": null,
      "severity": "Medium"
    }
  ],
  "recommendations": [
    "Consider replacing the identified segments with public domain texts or newly generated content.",
    "Review licensing agreements for training data sources."
  ]
}
```
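
A new record has to start somewhere before analysis fills in the score and findings. The sketch below shows one way to build that initial state; the `createDataset` and `isValidStatus` helpers are hypothetical names, and the choice to start with `riskScore` and `analysisDate` set to `null` is an assumption.

```javascript
// Status values listed in the mock data format above.
const DATASET_STATUSES = ["Uploading", "Processing", "Completed", "Error"];

/**
 * Build a new dataset record in its initial state.
 * The caller supplies id, name, and fileSize; analysis fields start empty.
 */
function createDataset({ id, name, fileSize, uploadDate = new Date().toISOString() }) {
  return {
    id,
    name,
    uploadDate,
    status: "Uploading",
    riskScore: null,    // Unknown until analysis completes
    analysisDate: null, // Set when analysis completes
    fileSize,
    findings: [],
    recommendations: [],
  };
}

/** Guard against typos when updating a dataset's status. */
function isValidStatus(status) {
  return DATASET_STATUSES.includes(status);
}

const draft = createDataset({ id: "ds-005", name: "Example Upload", fileSize: "2MB" });
console.log(draft.status, isValidStatus(draft.status)); // Uploading true
```

A record built this way can be passed straight to `addDataset`, and later patched through `updateDataset` as the status advances.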
**6. COMPONENT BREAKDOWN:**
* **`App.jsx`:** Main application component. Sets up routing using React Router. Loads initial state.
    * Props: None
    * Responsibility: Root component, routing setup.
* **`Layout.jsx`:** Main layout wrapper. Includes sidebar and main content area.
    * Props: `children` (ReactNode)
    * Responsibility: Page structure, sidebar, header.
* **`Sidebar.jsx`:** Navigation menu component.
    * Props: `isOpen` (boolean - for mobile collapse)
    * Responsibility: Primary navigation links.
* **`Dashboard.jsx`:** Displays the overview of datasets.
    * Props: None
    * Responsibility: Fetches and displays dataset list, summary statistics.
* **`DatasetTable.jsx`:** Renders the list of datasets in a table format.
    * Props: `datasets` (Array of Dataset objects)
    * Responsibility: Table display, sorting, filtering logic.
* **`DatasetRow.jsx`:** Represents a single row in the DatasetTable.
    * Props: `dataset` (Dataset object)
    * Responsibility: Displays individual dataset info, status, risk score.
* **`UploadForm.jsx`:** Handles file uploads.
    * Props: `onUploadSuccess` (function)
    * Responsibility: File input, drag-and-drop zone, upload logic.
* **`AnalysisReport.jsx`:** Displays the detailed analysis report for a dataset.
    * Props: `reportData` (Dataset object, specifically its findings/recommendations)
    * Responsibility: Visualizes findings, risk score, recommendations.
* **`LoadingSpinner.jsx`:** Reusable loading indicator.
    * Props: None
    * Responsibility: Visual feedback during async operations.
* **`RiskScoreIndicator.jsx`:** Displays the risk score, potentially with color coding.
    * Props: `score` (number)
    * Responsibility: Visual representation of risk level.
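
The color coding in `RiskScoreIndicator.jsx` could be driven by a small pure function that maps a score to a label and a Tailwind class. The sketch below assumes thresholds of 70 and 40 and a "Pending" state for unscored datasets; none of these are specified in this document, and `getRiskLevel` is a hypothetical helper name.

```javascript
/**
 * Map a risk score (0-100, or null while analysis is pending) to a
 * display label and a Tailwind text-color class.
 * The 70 / 40 thresholds are assumed for illustration.
 */
function getRiskLevel(score) {
  if (score === null || score === undefined) {
    return { label: "Pending", className: "text-gray-400" };
  }
  if (score >= 70) return { label: "High", className: "text-red-500" };
  if (score >= 40) return { label: "Medium", className: "text-yellow-500" };
  return { label: "Low", className: "text-green-500" };
}

console.log(getRiskLevel(85).label);   // High
console.log(getRiskLevel(5).label);    // Low
console.log(getRiskLevel(null).label); // Pending
```

Keeping the mapping outside the component makes the thresholds easy to adjust and to unit test without rendering anything.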
**7. ANIMATIONS & INTERACTIONS:**
* **Hover Effects:** Subtle background color change and slight lift effect (`shadow-md` to `shadow-lg`) on table rows and buttons when hovered.
* **Transitions:** Smooth transitions between routes/page changes using `react-transition-group` or similar, applied to the main content area. Fade-in/slide-in effects for new components.
* **Loading States:** Use `LoadingSpinner.jsx` with Tailwind CSS's animation utilities (`animate-spin`). For data tables or lists, implement skeleton screens that mimic the table structure while data is being fetched.
* **Micro-interactions:** Button click feedback (slight scale down/opacity change). Success/error toast notifications (e.g., using `react-hot-toast`) with subtle fade-in/out animations.
* **Sidebar Collapse:** Smooth slide animation for the sidebar collapsing and expanding.
**8. EDGE CASES:**
* **Empty State:** Dashboard and Analysis History should display informative messages and prompts (e.g., "No datasets uploaded yet. Click here to upload your first dataset.") when empty. Pair each message with an illustration or icon.
* **Error Handling:**
    * **Upload Errors:** Network errors, invalid file types/sizes should display user-friendly error messages via toasts or inline alerts.
    * **Analysis Errors:** If the analysis process fails, the dataset status should update to 'Error', and an error message should be visible in the report view or dataset list.
    * **API Errors:** Gracefully handle errors from any backend API calls (if implemented in future versions).
* **Validation:** Input fields (e.g., for dataset naming if applicable) should have client-side validation using React Hook Form. File uploads have type and size checks before initiating the upload.
* **Accessibility (a11y):** Use semantic HTML elements. Ensure proper ARIA attributes where necessary. All interactive elements must have keyboard navigation support and clear focus indicators. Use Radix UI primitives which are built with accessibility in mind. Ensure sufficient color contrast ratios.
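
The pre-upload type and size checks described above might look like the following. This is a sketch under stated assumptions: the extension list mirrors the examples given for dataset upload (with `.zip` standing in for "zipped archives"), the 2 GB cap is hypothetical, and `validateFile` is an illustrative helper name.

```javascript
// Assumed limits; the spec names example file types but no size cap.
const ALLOWED_EXTENSIONS = [".txt", ".csv", ".json", ".zip"];
const MAX_FILE_SIZE_BYTES = 2 * 1024 * 1024 * 1024; // 2 GB, hypothetical

/**
 * Check a File-like object ({ name, size }) before starting an upload.
 * Returns { valid: true } or { valid: false, error: string }.
 */
function validateFile(file) {
  const dotIndex = file.name.lastIndexOf(".");
  const extension = dotIndex === -1 ? "" : file.name.slice(dotIndex).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(extension)) {
    return { valid: false, error: `Unsupported file type: ${extension || "(none)"}` };
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return { valid: false, error: "File exceeds the maximum allowed size." };
  }
  return { valid: true };
}

console.log(validateFile({ name: "corpus.csv", size: 1024 }).valid); // true
console.log(validateFile({ name: "model.bin", size: 1024 }).error);  // Unsupported file type: .bin
```

The returned `error` string can be shown directly in a toast or inline alert, which keeps the upload component free of validation details.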
**9. SAMPLE DATA (Mock Data):**
* **`datasets` array (for `useDatasetStore`):**
```json
[
  {
    "id": "ds-001",
    "name": "LLama Model Training Data - Books",
    "uploadDate": "2023-10-26T09:15:00Z",
    "status": "Completed",
    "riskScore": 85,
    "analysisDate": "2023-10-26T10:45:00Z",
    "fileSize": "1.2GB",
    "findings": [
      {
        "id": "f-001",
        "description": "Significant overlap detected with 'The Lord of the Rings' by J.R.R. Tolkien. High degree of text similarity.",
        "percentageMatch": 22,
        "source": "Pirate Archive DB (Mock)",
        "url": "https://mock.piratearchive.com/lotr",
        "severity": "High"
      },
      {
        "id": "f-002",
        "description": "Potential infringement found in scientific papers collection, specific articles by Dr. Evelyn Reed (Mock Author).",
        "percentageMatch": 10,
        "source": "ResearchGate Clone (Mock)",
        "url": null,
        "severity": "Medium"
      }
    ],
    "recommendations": [
      "Immediately remove or replace identified segments from 'The Lord of the Rings'.",
      "Verify the licensing status of Dr. Reed's papers.",
      "Consult legal counsel regarding fair use arguments for the remaining data."
    ]
  },
  {
    "id": "ds-002",
    "name": "Public Domain Literature Corpus",
    "uploadDate": "2023-10-25T14:00:00Z",
    "status": "Completed",
    "riskScore": 5,
    "analysisDate": "2023-10-25T15:10:00Z",
    "fileSize": "800MB",
    "findings": [],
    "recommendations": [
      "Dataset appears to be clean. Continuous monitoring is advised."
    ]
  },
  {
    "id": "ds-003",
    "name": "User Generated Content - Beta",
    "uploadDate": "2023-10-27T11:00:00Z",
    "status": "Processing",
    "riskScore": null,
    "analysisDate": null,
    "fileSize": "250MB",
    "findings": [],
    "recommendations": []
  },
  {
    "id": "ds-004",
    "name": "Marketing Copy Examples",
    "uploadDate": "2023-10-27T11:05:00Z",
    "status": "Error",
    "riskScore": null,
    "analysisDate": null,
    "fileSize": "15MB",
    "findings": [],
    "recommendations": [],
    "errorMessage": "Failed to process file: Invalid character encoding."
  }
]
```
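
The dashboard's filter and sort behavior can be prototyped against records shaped like the sample above. The following is a rough sketch: `filterAndSortDatasets` and its option names are hypothetical, and placing unscored (`null`) datasets last is an assumed design choice.

```javascript
/**
 * Return a filtered, sorted copy of the dataset list.
 * `status` filters by exact match when provided; `sortBy` names the field
 * to order by; null values sort last in either direction.
 */
function filterAndSortDatasets(datasets, { status, sortBy = "uploadDate", descending = true } = {}) {
  const filtered = status ? datasets.filter((ds) => ds.status === status) : [...datasets];
  return filtered.sort((a, b) => {
    const left = a[sortBy];
    const right = b[sortBy];
    if (left === null && right === null) return 0;
    if (left === null) return 1;
    if (right === null) return -1;
    const order = left < right ? -1 : left > right ? 1 : 0;
    return descending ? -order : order;
  });
}

// Trimmed copies of three of the sample records above.
const sample = [
  { id: "ds-001", status: "Completed", riskScore: 85, uploadDate: "2023-10-26T09:15:00Z" },
  { id: "ds-002", status: "Completed", riskScore: 5, uploadDate: "2023-10-25T14:00:00Z" },
  { id: "ds-003", status: "Processing", riskScore: null, uploadDate: "2023-10-27T11:00:00Z" },
];

const byRisk = filterAndSortDatasets(sample, { sortBy: "riskScore" });
console.log(byRisk.map((ds) => ds.id).join(", ")); // ds-001, ds-002, ds-003

const completedAscending = filterAndSortDatasets(sample, {
  status: "Completed",
  sortBy: "riskScore",
  descending: false,
});
console.log(completedAscending.map((ds) => ds.id).join(", ")); // ds-002, ds-001
```

The function copies the input before sorting, so it is safe to call on an array held in the store. ISO 8601 timestamps in the same format and time zone compare correctly as plain strings, which is why `uploadDate` needs no date parsing here.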
**10. DEPLOYMENT NOTES:**
* **Build Settings:** Use Vite's build command (`npm run build` or `yarn build`). Ensure the `base` path is correctly configured if deploying to a subdirectory.
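* **Environment Variables:** Keep base URLs and other per-environment configuration in `.env` files (e.g., `.env.development`, `.env.production`), which Vite loads automatically. Only variables prefixed with `VITE_` are exposed to client code (via `import.meta.env`), and their values are embedded in the built bundle, so never store secret API keys in them; secrets belong behind a server-side component.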
* **Performance Optimizations:**
    * Code Splitting: Vite emits a separate chunk for each dynamic `import()`. For route-based splitting, load route components with `React.lazy` and `Suspense`.
    * Image Optimization: If images are used, optimize them for web use (e.g., using modern formats like WebP).
    * Memoization: Use `React.memo` to skip re-rendering expensive components when their props haven't changed.
    * Bundle Analysis: Use tools like `rollup-plugin-visualizer` (compatible with Vite) to analyze bundle size and identify large dependencies.
    * Lazy Loading: Implement lazy loading for content that is not immediately visible (e.g., off-screen images, complex charts).
* **CI/CD:** Configure a CI/CD pipeline (e.g., using GitHub Actions, Vercel) for automated testing, building, and deployment upon code commits.
* **HTTPS:** Ensure the application is always served over HTTPS in production.
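
Two of the notes above (the `base` path and bundle analysis) live in the Vite config. A minimal sketch of what that file might contain, assuming the `@vitejs/plugin-react` and `rollup-plugin-visualizer` packages are installed; the `/ip-shield/` subdirectory and the report filename are hypothetical.

```javascript
// vite.config.js (sketch)
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { visualizer } from "rollup-plugin-visualizer";

export default defineConfig({
  // Hypothetical subdirectory; leave as "/" when serving from the domain root.
  base: "/ip-shield/",
  plugins: [
    react(),
    // Writes a report of bundle contents when `vite build` runs.
    visualizer({ filename: "dist/stats.html" }),
  ],
});
```

Opening the generated report after a production build shows which dependencies dominate the bundle, which is a reasonable starting point before adding manual lazy loading.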