PROJECT OVERVIEW:
The application, tentatively named 'Yerel Zeka: Gizli LLM Asistanı' (Turkish for 'Local Intelligence: Private LLM Assistant'), aims to democratize access to powerful Large Language Models (LLMs) by enabling users to run them locally on their own devices. It addresses the critical issues of privacy, data control, and dependency on the Big Tech platforms that currently dominate LLM services. The core value proposition is a secure, private, and fully controllable AI assistant that functions offline, narrowing the capability gap between cutting-edge cloud-based LLMs and current on-device models. Users will be able to download, manage, and interact with a variety of open-source LLMs without sending their data to external servers.
TECH STACK:
- Frontend Framework: React.js (using Vite for rapid development)
- Styling: Tailwind CSS for utility-first styling and rapid UI development
- State Management: Zustand for global state management due to its simplicity and performance
- Local LLM Interaction: WebAssembly (WASM) bindings to C/C++ LLM inference engines such as llama.cpp. WASM itself is portable, but performance-critical paths may still need native binaries per platform (Windows, macOS, Linux). This will be the most challenging part, likely requiring platform-specific builds or a robust WASM runtime.
- Local Storage: Browser's Local Storage API for storing user preferences, settings, and potentially downloaded model metadata. For actual model files, the application will manage them within the user's file system (requiring appropriate permissions).
- UI Components: Radix UI for accessible and unstyled UI primitives, styled with Tailwind CSS.
- Icons: Heroicons
- Routing (if needed for future features): React Router DOM
CORE FEATURES:
1. **Local LLM Execution Engine Integration:**
* **User Flow:** Upon first launch, the app detects the user's OS and CPU/GPU capabilities. It prompts the user to grant necessary file system permissions. The engine then loads the selected LLM from the local file system and prepares it for inference.
* **Details:** This core feature involves integrating a compiled LLM inference engine (e.g., llama.cpp compiled to WASM or a native binary accessed via Node.js APIs if building a desktop app with Electron/Tauri). The engine must handle model loading, tokenization, prompt processing, inference, and de-tokenization efficiently. It should support both CPU and GPU (via WebGPU API in browsers or native GPU APIs in desktop apps) acceleration where available.
2. **Model Management & Download Hub:**
* **User Flow:** Users navigate to a 'Model Hub' section. They see a list of compatible, open-source LLMs (e.g., Llama 2, Mistral, Mixtral variants) with details like size, performance benchmarks (estimated), and required VRAM/RAM. Users can click 'Download' on a model. The app shows download progress and status. Once downloaded, the model appears in the 'My Models' list and can be selected for use.
* **Details:** This requires a curated list of model repositories (e.g., Hugging Face GGUF format links). The app will fetch metadata for these models and allow downloading the model files (which can be several GBs) to a designated directory on the user's machine. Download management needs to be robust, handling interruptions and retries.
3. **Chat Interface:**
* **User Flow:** A clean chat interface displays the conversation history. Users type their prompts into an input field at the bottom. Upon sending, the prompt is sent to the local LLM engine. The app displays a '...' or 'Generating...' indicator while the model processes the request. The response is then displayed in the chat window. Users can clear the chat, or start a new conversation.
* **Details:** This is a standard chat UI. It needs to handle streaming responses from the LLM if the backend engine supports it, providing a more interactive feel. It should also display conversation history clearly, differentiating between user prompts and AI responses.
4. **Settings & Configuration:**
* **User Flow:** Users can access a settings menu to adjust parameters like 'temperature', 'max tokens', 'top-p', etc., for the active LLM. They can also configure the download directory for models, choose between CPU/GPU acceleration, and manage application preferences (e.g., theme, startup behavior).
* **Details:** This provides users with fine-grained control over the LLM's behavior. The settings UI should be clear and provide brief explanations for each parameter. Input validation is crucial here.
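The "robust download management" called out in feature 2 can be sketched as a resumable fetch with retries. This is a sketch under assumptions: the injectable `fetchFn`, the HTTP Range-based resume (which requires server support), and the mock below are all illustrative, not the final implementation:

```javascript
// Resumable download sketch for the Model Hub. `fetchFn` is injected so the
// retry path can be exercised without a network; a real build would pass the
// platform's fetch. Resume uses an HTTP Range header (assumes server support).
async function downloadWithRetry(url, fetchFn, { maxRetries = 3 } = {}) {
  const chunks = [];
  let received = 0;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await fetchFn(url, { headers: { Range: `bytes=${received}-` } });
      for await (const chunk of res.body) {
        chunks.push(chunk);
        received += chunk.length; // progress: received / totalSize -> percentage
      }
      return Buffer.concat(chunks); // complete file; caller writes it to disk
    } catch (err) {
      if (attempt === maxRetries) throw err; // exhausted: caller sets status 'error'
    }
  }
}

// Illustrative mock: drops the connection mid-stream once, then serves the rest.
function makeMockFetch() {
  const data = Buffer.from('MODELDATA');
  let calls = 0;
  return async (_url, opts) => {
    const start = Number(opts.headers.Range.match(/\d+/)[0]);
    calls += 1;
    const body = calls === 1
      ? (async function* () { yield data.subarray(0, 4); throw new Error('connection reset'); })()
      : (async function* () { yield data.subarray(start); })();
    return { body };
  };
}
```

Because only the received byte count is tracked, an interrupted multi-GB download resumes where it left off instead of restarting.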
UI/UX DESIGN:
- **Layout:** A single-page application (SPA) layout. A persistent sidebar (collapsible on smaller screens) for navigation (Chat, Model Hub, Settings). The main content area displays the active feature (chat window, model list, settings form). The chat itself is a single-column stack: conversation history fills the main area, with the input bar pinned to the bottom.
- **Color Palette:** Dark theme primarily, focusing on deep grays, blues, and subtle accent colors for interactive elements. This is easy on the eyes for long coding/chat sessions and aligns with a 'tech' aesthetic. Example: Primary background: `#1a1a1d`, Secondary background: `#25252c`, Text: `#e0e0e0`, Accent: `#4a90e2` (for buttons, links).
- **Typography:** A clean, modern sans-serif font like Inter or Roboto for body text and UI elements. A slightly more distinct but still readable font for headings. Font sizes: Body 16px, Headings 24px-32px.
- **Responsive Design:** Mobile-first approach. Sidebar collapses into a hamburger menu on small screens. Chat input and history adapt to screen width. Model cards and settings forms should stack vertically.
- **Key Elements:** Clear visual hierarchy. Interactive elements should have distinct hover and active states. Loading states should be visually indicated (spinners, progress bars, skeleton screens).
COMPONENT BREAKDOWN:
1. **`App.jsx`:** Main application component. Manages overall layout, routing (if any), and global context/state initialization.
* Props: None
* Responsibility: Root component, layout structure.
2. **`Sidebar.jsx`:** Navigation menu.
* Props: `activeItem` (string), `onItemClick` (function)
* Responsibility: Display navigation links, handle item selection.
3. **`ChatWindow.jsx`:** Displays the conversation history and handles user input.
* Props: `messages` (array), `isLoading` (boolean), `onSendMessage` (function)
* Responsibility: Render messages, input field, send button, loading indicator.
4. **`Message.jsx`:** Renders a single chat message (user or AI).
* Props: `sender` (string - 'user' | 'ai'), `text` (string)
* Responsibility: Display individual message bubble.
5. **`ModelHub.jsx`:** Displays the list of available LLMs and download controls.
* Props: `models` (array), `onDownloadClick` (function), `downloadProgress` (object)
* Responsibility: List models, initiate downloads.
6. **`ModelCard.jsx`:** Represents a single LLM in the hub.
* Props: `model` (object - name, size, description, etc.), `downloadStatus` (string), `progress` (number), `onDownload` (function)
* Responsibility: Display model info, download button/progress.
7. **`SettingsPanel.jsx`:** Contains configuration options.
* Props: `config` (object), `onChange` (function)
* Responsibility: Render settings form, handle config changes.
8. **`LLM Engine Wrapper` (Internal/Abstracted):** Not a direct React component, but a module/class responsible for interacting with the WASM/native LLM inference backend.
* Methods: `loadModel(modelPath)`, `generateResponse(prompt, config)`, `getAvailableModels()`, `downloadModel(url, destPath)`
* Responsibility: Abstracting the complex LLM inference logic.
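The wrapper's method surface listed above can be made concrete with a short sketch. The `backend` object and its `load`/`infer`/`listLocalModels`/`download` methods are hypothetical stand-ins for the real WASM or native llama.cpp bridge:

```javascript
// Hedged sketch of the LLM Engine Wrapper module. The injected `backend`
// interface is an assumption; the real bridge would be the WASM module or
// the Electron/Tauri native binding.
class LLMEngineWrapper {
  constructor(backend) {
    this.backend = backend;
    this.modelPath = null;
  }

  async loadModel(modelPath) {
    await this.backend.load(modelPath); // may take seconds for multi-GB models
    this.modelPath = modelPath;
  }

  async generateResponse(prompt, config = {}) {
    if (!this.modelPath) throw new Error('No model loaded');
    // Tokenization, inference, and de-tokenization happen behind backend.infer.
    const defaults = { temperature: 0.7, maxTokens: 512, topP: 0.9 };
    return this.backend.infer(prompt, { ...defaults, ...config });
  }

  getAvailableModels() {
    return this.backend.listLocalModels();
  }

  async downloadModel(url, destPath) {
    return this.backend.download(url, destPath);
  }
}
```

Injecting the backend keeps the React layer testable: the UI can be developed against a stub before the WASM build exists.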
DATA MODEL:
- **`messages` state (in `ChatWindow` or global state):**
```json
[
{ "id": "msg_1", "sender": "user", "text": "Hello, can you tell me about current events?" },
{ "id": "msg_2", "sender": "ai", "text": "Of course, which current events are you interested in?" }
]
```
- **`models` state (in `ModelHub` or global state):**
```json
[
{
"id": "llama2-7b-chat",
"name": "Llama 2 7B Chat",
"description": "A chat-focused, 7-billion-parameter model developed by Meta AI.",
"size_gb": 4.7,
"downloadUrl": "http://example.com/models/llama-2-7b-chat.gguf",
"localPath": "/path/to/models/llama-2-7b-chat.gguf",
"status": "downloaded" // 'not_downloaded', 'downloading', 'downloaded', 'error'
},
// ... other models
]
```
- **`downloadProgress` state:**
```json
{
"llama2-7b-chat": 75 // Percentage
}
```
- **`settings` state (in `SettingsPanel` or global state):**
```json
{
"temperature": 0.7,
"maxTokens": 512,
"topP": 0.9,
"preferredDevice": "auto", // 'cpu', 'gpu', 'auto'
"modelDirectory": "~/yerel-zeka-models"
}
```
- **State Management:** Zustand store to hold `models`, `messages`, `settings`, `downloadProgress`, `isGenerating` flags.
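To make the store shape concrete, here is a minimal sketch. `createStore` is a hand-rolled stand-in that mimics zustand's vanilla store API (the real app would use zustand's `create`), and the action names `addMessage` and `setProgress` are illustrative assumptions:

```javascript
// Minimal zustand-like store: getState/setState/subscribe, with setState
// accepting either a partial object or an updater function.
function createStore(init) {
  let state;
  const listeners = new Set();
  const setState = (partial) => {
    const next = typeof partial === 'function' ? partial(state) : partial;
    state = { ...state, ...next };
    listeners.forEach((l) => l(state));
  };
  const getState = () => state;
  state = init(setState, getState);
  return {
    getState,
    setState,
    subscribe: (l) => (listeners.add(l), () => listeners.delete(l)),
  };
}

// The global state shape from the DATA MODEL section, plus example actions.
const useAppStore = createStore((set) => ({
  models: [],
  messages: [],
  downloadProgress: {},
  isGenerating: false,
  settings: { temperature: 0.7, maxTokens: 512, topP: 0.9, preferredDevice: 'auto' },
  addMessage: (msg) => set((s) => ({ messages: [...s.messages, msg] })),
  setProgress: (id, pct) =>
    set((s) => ({ downloadProgress: { ...s.downloadProgress, [id]: pct } })),
}));
```

In the real app, `subscribe` is what lets React components re-render when the slices they select change.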
ANIMATIONS & INTERACTIONS:
- **Hover Effects:** Subtle background color changes or slight scaling on buttons and clickable elements.
- **Transitions:** Smooth transitions for sidebar collapse/expand, modal openings/closings, and route changes (if applicable). Tailwind CSS's transition utilities will be used.
- **Loading States:**
* When downloading models: Progress bars on `ModelCard` components.
* When generating a response: A pulsing animation or a subtle typing indicator (`...`) in the chat UI, possibly disabling the input field temporarily.
* Initial app load: A simple splash screen or skeleton UI.
- **Micro-interactions:** Button click feedback, subtle animations when a new message appears in the chat.
EDGE CASES:
- **No Models Downloaded:** The Model Hub should display a message encouraging the user to download their first model, perhaps highlighting a recommended small model.
- **Download Failures:** Handle network errors or disk space issues gracefully. Display an error message to the user and allow retrying the download. Update model status to 'error'.
- **LLM Engine Errors:** If the LLM fails to load or run (e.g., due to incompatible hardware, corrupted model file), catch the error and display a user-friendly message in the chat interface or a notification. Provide guidance on potential solutions (check hardware, re-download model).
- **Unsupported Hardware:** If the user's system cannot support the chosen LLM acceleration (e.g., no compatible GPU for WebGPU), default to CPU inference and inform the user.
- **Input Validation:** Validate user inputs in the settings panel (e.g., temperature must be between 0 and 1). Ensure prompts sent to the LLM are within reasonable length limits if necessary.
- **Accessibility (a11y):** Use semantic HTML, ensure proper ARIA attributes, provide keyboard navigation support, and ensure sufficient color contrast, especially for the dark theme.
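The input-validation edge case can be sketched as a pure function over the settings object. The temperature bound follows the spec's stated 0-1 range; the other bounds are assumed constraints, not final product decisions:

```javascript
// Settings validation sketch for the SettingsPanel. Returns per-field errors
// so the UI can highlight individual inputs.
function validateSettings(input) {
  const errors = {};
  if (typeof input.temperature !== 'number' || input.temperature < 0 || input.temperature > 1)
    errors.temperature = 'Temperature must be between 0 and 1';
  if (typeof input.topP !== 'number' || input.topP <= 0 || input.topP > 1)
    errors.topP = 'top-p must be in (0, 1]';
  if (!Number.isInteger(input.maxTokens) || input.maxTokens < 1)
    errors.maxTokens = 'Max tokens must be a positive integer';
  if (!['cpu', 'gpu', 'auto'].includes(input.preferredDevice))
    errors.preferredDevice = "Device must be 'cpu', 'gpu', or 'auto'";
  return { valid: Object.keys(errors).length === 0, errors };
}
```

Running invalid input through this function yields a populated `errors` object that the form can render inline next to each field.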
SAMPLE DATA:
1. **`models` array sample:**
```json
[
{
"id": "mistral-7b-instruct-v0.2",
"name": "Mistral 7B Instruct v0.2",
"description": "Mistral AI's instruction-tuned 7B parameter model, known for its efficiency and performance.",
"size_gb": 4.1,
"downloadUrl": "http://example.com/models/mistral-7b-instruct-v0.2.gguf",
"localPath": null,
"status": "not_downloaded"
},
{
"id": "phi-2",
"name": "Phi-2",
"description": "Microsoft's small but powerful 2.7B parameter model.",
"size_gb": 1.6,
"downloadUrl": "http://example.com/models/phi-2.gguf",
"localPath": "/path/to/models/phi-2.gguf",
"status": "downloaded"
}
]
```
2. **`messages` array sample (after a few turns):**
```json
[
{ "id": "m1", "sender": "user", "text": "Explain the concept of local LLMs in simple terms."},
{ "id": "m2", "sender": "ai", "text": "Local LLMs are AI language models that run directly on your computer or device, instead of on a remote server. This means your data stays private, you don't need an internet connection to use them, and you have more control over how they work."},
{ "id": "m3", "sender": "user", "text": "What are the main benefits compared to cloud-based LLMs?"},
{ "id": "m4", "sender": "ai", "text": "The key benefits are enhanced privacy (no data leaves your device), greater control (no censorship or arbitrary bans), offline functionality, and potentially lower costs in the long run as you avoid subscription fees."}
]
```
3. **`settings` object sample:**
```json
{
"temperature": 0.8,
"maxTokens": 1024,
"topP": 0.95,
"preferredDevice": "gpu",
"modelDirectory": "C:\\Users\\User\\Documents\\LLM_Models"
}
```
4. **`downloadProgress` object sample:**
```json
{
"phi-2": 45
}
```
5. **Single User Message:**
```json
{ "id": "unique_msg_id", "sender": "user", "text": "Describe the potential of local LLMs for developers."}
```
6. **Single AI Response (Streaming simulation):**
```json
{ "id": "unique_msg_id", "sender": "ai", "text": "Local LLMs offer developers a powerful..."} // Followed by chunks if streaming is supported
```
7. **Model Metadata Structure:**
```json
{
"id": "codellama-7b",
"name": "Code Llama 7B",
"description": "LLM specialized for coding tasks.",
"size_gb": 4.1,
"tags": ["coding", "programming", "developer"]
}
```
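Sample 6's streaming simulation can be expressed as an async generator feeding a message accumulator; the function names below are illustrative, and in the UI each appended chunk would trigger a re-render:

```javascript
// Simulates a streaming backend: yields response chunks one at a time,
// optionally with a delay to mimic token-by-token generation.
async function* streamResponse(chunks, delayMs = 0) {
  for (const chunk of chunks) {
    if (delayMs) await new Promise((r) => setTimeout(r, delayMs));
    yield chunk;
  }
}

// Accumulates streamed chunks into a single AI message object,
// matching the `messages` data model above.
async function collectIntoMessage(id, stream) {
  const message = { id, sender: 'ai', text: '' };
  for await (const chunk of stream) {
    message.text += chunk; // in the chat UI, this append drives the live update
  }
  return message;
}
```

If the inference engine exposes a token callback instead of an async iterable, the same accumulator pattern applies with a thin adapter.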
DEPLOYMENT NOTES:
- **Build Process:** Use Vite (`npm run build`) for optimized production builds. For desktop deployment, consider packaging with Electron (which bundles Chromium and Node.js) or Tauri (which uses the system webview with a Rust backend and produces much smaller binaries); either can ship the necessary native inference dependencies.
- **Environment Variables:** Use `.env` files for managing API keys (if any are ever needed for metadata fetching) and build-time configurations. `VITE_` prefix for Vite variables.
- **Cross-Platform Compatibility:** The biggest challenge is the LLM engine. Ensure the WASM compilation or native binaries are available for Windows, macOS, and Linux. This might require separate build pipelines or careful dependency management.
- **Performance Optimizations:**
* Lazy loading components that are not immediately visible.
* Code splitting using Vite's capabilities.
* Efficient state management to prevent unnecessary re-renders.
* Optimize model loading times, potentially using background threads or workers.
* Leverage GPU acceleration via WebGPU or native bindings where possible.
- **File System Access:** For desktop apps (Electron/Tauri), request appropriate permissions for accessing the model download directory. For browser-based apps, this is highly restricted and might necessitate a companion desktop app or a different architecture.