PROJECT OVERVIEW:
Develop a single-page web application (SPA) that provides high-accuracy audio transcription. Its primary goal is to convert spoken language from uploaded audio files into text for professional and personal use. The core value proposition is a user-friendly interface built on state-of-the-art Automatic Speech Recognition (ASR) models, similar to the capabilities announced by Cohere Transcribe, with a focus on a streamlined user experience and integrated text analysis. The platform saves users the time and resources typically spent on manual transcription and surfaces insights from audio data through basic text analytics.
TECH STACK:
- Frontend Framework: React.js (using functional components and hooks)
- Styling: Tailwind CSS for rapid UI development and utility-first styling.
- State Management: Zustand for efficient and simple global state management.
- Routing: React Router DOM for handling client-side navigation (even for a single page, useful for future expansion).
- Audio Processing: The standard HTML5 `<audio>` element (HTMLMediaElement API) for playback control, with Web Workers for background processing if needed.
- API Interaction: Axios for making HTTP requests to a (hypothetical) backend API for transcription processing.
- Form Handling: React Hook Form for efficient and validated form submissions.
- Icons: React Icons library.
CORE FEATURES:
1. **Audio Upload:**
* **User Flow:** User clicks an 'Upload Audio' button or drags and drops an audio file into a designated area. The UI provides visual feedback during upload (progress bar, file name). Accepted formats: MP3, WAV, M4A.
* **Details:** The system should handle file validation (type and size limits). Upon successful upload, the file is staged for transcription.
2. **Transcription Processing:**
* **User Flow:** After upload, the user initiates the transcription process (e.g., by clicking a 'Transcribe' button). The UI displays a loading state with a clear message indicating that the audio is being processed. A real-time update or notification signals completion.
* **Details:** This feature simulates sending the audio file to a backend ASR service. For the MVP, this can be a placeholder that returns mock transcribed text after a delay. Assume a transcription API endpoint exists, and structure the code so the placeholder can later be swapped for a real request.
3. **Transcription Display & Editing:**
* **User Flow:** Once transcription is complete, the user sees the transcribed text in a read-only format within a text area. An 'Edit' button allows the user to modify the text directly. Changes are saved either automatically or via a 'Save' button.
* **Details:** A robust text editing component is required. The display should be clean and readable, and should leave room for clear demarcation between speakers if speaker labeling is added later.
4. **Basic Text Analysis:**
* **User Flow:** A dedicated panel or section displays basic analytics derived from the transcribed text. This includes word count, character count, and a list of the top N most frequent words (e.g., top 5).
* **Details:** This involves simple text processing algorithms to count words, characters, and word frequencies. The results should update dynamically as the user edits the transcription.
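The text analysis described above could be sketched as a single pure function. This is a minimal sketch under stated assumptions, not a prescribed implementation: the name `analyzeText`, the whitespace-based word splitting, the ASCII-only punctuation stripping, and the first-seen tie-breaking are all illustrative choices that the spec does not fix.

```javascript
// Hypothetical helper: name and tokenization rules are assumptions.
function analyzeText(text, topN = 5) {
  const trimmed = text.trim();
  // Words are whitespace-delimited tokens; an empty string yields no tokens.
  const tokens = trimmed === '' ? [] : trimmed.split(/\s+/);

  const counts = new Map();
  for (const token of tokens) {
    // Normalize for frequency counting: lowercase, then strip leading and
    // trailing characters that are not ASCII letters, digits, or apostrophes.
    const word = token.toLowerCase().replace(/^[^a-z0-9']+|[^a-z0-9']+$/g, '');
    if (word) counts.set(word, (counts.get(word) || 0) + 1);
  }

  const frequentWords = [...counts.entries()]
    .map(([word, count]) => ({ word, count }))
    // Array.prototype.sort is stable, so ties keep first-seen order.
    .sort((a, b) => b.count - a.count)
    .slice(0, topN);

  return {
    wordCount: tokens.length,
    characterCount: text.length, // counts spaces and punctuation
    frequentWords,
  };
}
```

Because the function is pure, it can be called on every edit (for example inside `useMemo` keyed on the text) so the panel stays in sync with the editor.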
UI/UX DESIGN:
- **Layout:** Single-page application layout. A main content area for audio upload and transcription editing, with a sidebar or top bar for navigation/status and analysis results. Clean, minimalist aesthetic.
- **Color Palette:** Primary: `#007AFF` (Blue - for actions, links), Secondary: `#F0F2F5` (Light Gray - for backgrounds, containers), Accent: `#2ECC71` (Green - for success messages, completion states), Text: `#1C1E21` (Dark Gray - for readability), Subtle Grays: `#8A8D91` (for secondary text), `#E0E0E0` (for borders, dividers).
- **Typography:** Use a clean, readable sans-serif font like Inter or Roboto. Headings: Bold, larger sizes. Body text: Regular weight, comfortable line height (e.g., 1.5).
- **Responsive Design:** Mobile-first approach. Utilize Tailwind CSS's responsive modifiers (e.g., `sm:`, `md:`, `lg:`). Ensure the layout adapts gracefully to various screen sizes, with touch targets appropriately sized on mobile.
- **Key Components:** Upload Area (Drag & Drop), File List, Transcription Editor (Textarea), Analysis Panel, Buttons (Upload, Transcribe, Edit, Save).
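One way to make the palette and typography above available as Tailwind utilities is to register them in the theme. The fragment below is a sketch assuming a Tailwind CSS v3-style `tailwind.config.js`; the color key names (`primary`, `surface`, `accent`, `ink`, `muted`, `divider`) and the `content` globs are hypothetical and should be adapted to the actual project layout.

```javascript
// tailwind.config.js (sketch; key names and content paths are assumptions)
/** @type {import('tailwindcss').Config} */
module.exports = {
  content: ['./index.html', './src/**/*.{js,jsx}'],
  theme: {
    extend: {
      colors: {
        primary: '#007AFF', // actions, links
        surface: '#F0F2F5', // backgrounds, containers
        accent: '#2ECC71', // success messages, completion states
        ink: '#1C1E21', // body text
        muted: '#8A8D91', // secondary text
        divider: '#E0E0E0', // borders, dividers
      },
      fontFamily: {
        sans: ['Inter', 'Roboto', 'sans-serif'],
      },
    },
  },
};
```

With this in place, classes such as `bg-primary`, `text-ink`, and `border-divider` map directly to the palette.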
COMPONENT BREAKDOWN:
1. **`App.jsx`:** Main application component. Manages overall layout, routing (if implemented), and global state initialization.
* Props: None
* Responsibility: Renders main layout and orchestrates other components.
2. **`Header.jsx`:** Top navigation bar.
* Props: `title` (string)
* Responsibility: Displays the application title and potentially user status/logo.
3. **`FileUpload.jsx`:** Handles audio file uploading.
* Props: `onFileUpload` (function - callback when file is uploaded successfully)
* Responsibility: Renders the drag-and-drop area and file input, handles file selection and upload initiation.
4. **`TranscriptionEditor.jsx`:** Displays and allows editing of the transcribed text.
* Props: `initialText` (string), `onTextChange` (function - callback when text is edited)
* Responsibility: Renders a textarea, manages its state, handles editing mode toggle.
5. **`AnalysisPanel.jsx`:** Shows basic text analysis results.
* Props: `text` (string)
* Responsibility: Calculates and displays word count, character count, and frequent words.
6. **`LoadingIndicator.jsx`:** Visual feedback during processing.
* Props: `message` (string)
* Responsibility: Displays a loading spinner or progress message.
7. **`AudioPlayer.jsx`:** (Optional for MVP, but good for UX) Simple audio playback controls.
* Props: `audioUrl` (string)
* Responsibility: Renders play/pause, timeline for the uploaded audio file.
DATA MODEL:
- **Global State (Zustand Store):**
```javascript
{
  file: null, // { name: string, size: number, type: string, url: string } or null
  transcription: {
    text: '', // The transcribed text
    isEditing: false, // Boolean to control edit mode
    isLoading: false, // Boolean to show loading state
    error: null // Error message if transcription fails
  },
  analysis: {
    wordCount: 0,
    characterCount: 0,
    frequentWords: [] // Array of { word: string, count: number }
  }
}
```
- **Mock Data Format (for `frequentWords`):**
```json
[
  { "word": "the", "count": 55 },
  { "word": "and", "count": 42 },
  { "word": "to", "count": 38 },
  { "word": "a", "count": 35 },
  { "word": "in", "count": 30 }
]
```
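The state shape above implies a few transitions for the `transcription` slice: start, success, and failure. The sketch below writes them as plain updater functions so they can be tested without React. All function names here are hypothetical. `applyUpdate` only imitates the top-level shallow merge that Zustand's `set` performs, and is included so the example runs on its own.

```javascript
// Initial state mirroring the store shape described above.
const initialState = {
  file: null,
  transcription: { text: '', isEditing: false, isLoading: false, error: null },
  analysis: { wordCount: 0, characterCount: 0, frequentWords: [] },
};

// Each updater receives the current state and returns only the slice that changes.
const startTranscription = (state) => ({
  transcription: { ...state.transcription, isLoading: true, error: null },
});

const finishTranscription = (text) => (state) => ({
  transcription: { ...state.transcription, text, isLoading: false },
});

const failTranscription = (message) => (state) => ({
  transcription: { ...state.transcription, isLoading: false, error: message },
});

// Stand-in for a store: shallow-merges the partial update at the top level.
const applyUpdate = (state, updater) => ({ ...state, ...updater(state) });
```

Inside a Zustand store the same updaters could be passed to `set` directly, for example `set(startTranscription)` or `set(finishTranscription(text))`. Nested slices are spread by hand because the merge is shallow.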
ANIMATIONS & INTERACTIONS:
- **File Upload:** Subtle background color change on hover for the drop zone. Smooth progress bar animation.
- **Button Clicks:** Slight scale-down effect on click.
- **Loading States:** A pulsing animation for the `LoadingIndicator` or a subtle fade-in/out effect.
- **Transitions:** Smooth transitions for showing/hiding the `AnalysisPanel` or entering/exiting edit mode in the `TranscriptionEditor`.
- **Micro-interactions:** Hover effect on list items if a list of files is displayed.
EDGE CASES:
- **No File Uploaded:** Disable the 'Transcribe' button. Show a prompt message in the upload area.
- **Invalid File Type:** Display an error message to the user upon selection.
- **Transcription Failure:** Show a clear error message in the `TranscriptionEditor` or a dedicated notification area. Log the error details.
- **Empty Transcription:** If the audio is silent or unrecognizable, display a message indicating no text was generated. The analysis panel should show zeros.
- **Large Files:** Implement chunking or provide clear feedback about processing time. Potentially limit file size in MVP.
- **Accessibility (a11y):** Ensure all interactive elements have proper ARIA attributes, keyboard navigability, sufficient color contrast, and semantic HTML structure.
- **Form Validation:** For any future forms, implement clear validation messages.
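The "No File Uploaded", "Invalid File Type", and "Large Files" cases above can be handled by one validation step before a file is staged. The following is a sketch only: `validateAudioFile` is a hypothetical name, the 25 MB cap is a placeholder because the spec does not state a limit, and the error strings are illustrative.

```javascript
// Assumed limit: the spec names accepted formats but no size cap, so 25 MB is a placeholder.
const MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024;
const ACCEPTED_EXTENSIONS = ['mp3', 'wav', 'm4a'];

// Accepts a browser File, or any object with `name`, `type`, and `size`.
function validateAudioFile(file) {
  if (!file) {
    return { valid: false, error: 'Please select an audio file.' };
  }
  const dotIndex = file.name.lastIndexOf('.');
  const extension = dotIndex === -1 ? '' : file.name.slice(dotIndex + 1).toLowerCase();
  // Browsers report MIME types inconsistently (sometimes empty), so the
  // extension is the primary check and the MIME type a secondary one.
  const mimeLooksLikeAudio = !file.type || file.type.startsWith('audio/');
  if (!ACCEPTED_EXTENSIONS.includes(extension) || !mimeLooksLikeAudio) {
    return { valid: false, error: 'Unsupported file type. Use MP3, WAV, or M4A.' };
  }
  if (file.size === 0) {
    return { valid: false, error: 'The selected file is empty.' };
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return { valid: false, error: 'File is too large (limit: 25 MB).' };
  }
  return { valid: true, error: null };
}
```

Client-side checks like this improve feedback but do not replace validation on the backend, since the extension and MIME type are both easy to spoof.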
SAMPLE DATA (Mock Transcription & Analysis):
1. **Audio File:** `meeting_notes_q3.mp3` (15 MB)
2. **Mock Transcription Text:** "Hello everyone, and thank you for joining today's Q3 review meeting. The primary agenda item is to discuss the performance metrics from the last quarter. We have seen significant growth in user engagement, particularly with the new feature launch. However, operational costs have also increased. We need to analyze the data carefully to understand the drivers behind these trends. Next steps involve finalizing the report and planning for Q4 initiatives. Any questions?"
3. **Analysis Results for Sample Text:**
* Word Count: 72
* Character Count: 469 (including spaces and punctuation)
* Frequent Words (case-insensitive, ties broken by first occurrence): `[{ "word": "the", "count": 7 }, { "word": "to", "count": 3 }, { "word": "and", "count": 2 }, { "word": "for", "count": 2 }, { "word": "we", "count": 2 }]`
4. **Another Mock Text (Different Topic):** "This is a test transcription to check the accuracy and speed of the new model. Speech recognition technology is evolving rapidly. We are exploring its potential applications in customer service and content creation. The goal is to provide seamless integration into existing workflows."
5. **Analysis Results for Second Text:**
* Word Count: 43
* Character Count: 284 (including spaces and punctuation)
* Frequent Words (case-insensitive, ties broken by first occurrence): `[{ "word": "is", "count": 3 }, { "word": "the", "count": 3 }, { "word": "to", "count": 2 }, { "word": "and", "count": 2 }, { "word": "this", "count": 1 }]`
6. **Empty/Silent Audio Scenario:**
* Transcription Text: ""
* Analysis Results: Word Count: 0, Character Count: 0, Frequent Words: []
7. **Error Scenario:**
* Display Message: "Transcription failed. Please try again later or upload a different file."
* Underlying state: `transcription.error = 'API Error: 500 Internal Server Error'`
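The success and error scenarios above can be exercised with a placeholder service that resolves or rejects after a delay. This is a sketch under assumptions: `mockTranscribe`, `runTranscription`, and the `shouldFail` switch are hypothetical names, and the default delay of 1500 ms is arbitrary. The raw error goes into `transcription.error`, while the UI shows the friendlier display message.

```javascript
const MOCK_TEXT =
  'This is a test transcription to check the accuracy and speed of the new model.';

const DISPLAY_ERROR =
  'Transcription failed. Please try again later or upload a different file.';

// Placeholder for the backend call: resolves with mock text after `delayMs`,
// or rejects when `shouldFail` is set, to simulate a server error.
function mockTranscribe(file, { delayMs = 1500, shouldFail = false } = {}) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) {
        reject(new Error('API Error: 500 Internal Server Error'));
      } else {
        resolve({ text: MOCK_TEXT });
      }
    }, delayMs);
  });
}

// Runs one attempt and returns the resulting `transcription` slice of state.
// `transcribe` is injectable so a real API call can replace the mock later.
async function runTranscription(file, transcribe = mockTranscribe) {
  try {
    const { text } = await transcribe(file);
    return { text, isEditing: false, isLoading: false, error: null };
  } catch (err) {
    return { text: '', isEditing: false, isLoading: false, error: err.message };
  }
}
```

A component would render `DISPLAY_ERROR` whenever `transcription.error` is non-null and log the underlying message for debugging.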
DEPLOYMENT NOTES:
- **Build:** Use standard React build tools (e.g., Vite or Create React App's build script). Ensure the output is optimized for production (minification, code splitting if applicable).
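- **Environment Variables:** Use `.env` files for API endpoints. The required prefix depends on the build tool: with Vite, use `VITE_API_ENDPOINT=https://api.example.com/transcribe` and read it from `import.meta.env`; with Create React App, use `REACT_APP_API_ENDPOINT=https://api.example.com/transcribe` and read it from `process.env`. Any variable bundled into the frontend is visible to users, so keep secret keys on the backend.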
- **Performance Optimizations:** Lazy load components where appropriate. Optimize images and assets. Wrap expensive components in `React.memo` so they skip re-rendering when their props are unchanged. Subscribe components only to the state slices they need to prevent unnecessary re-renders.
- **Hosting:** Use a static hosting platform such as Vercel, Netlify, or GitHub Pages for the frontend. The backend API (for actual transcription) would need a separate scalable hosting solution (e.g., AWS, Google Cloud, Azure).
- **CORS:** Ensure backend API is configured to handle Cross-Origin Resource Sharing from the frontend domain.
- **HTTPS:** Always use HTTPS for deployed applications.
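Once an endpoint is configured as described above, the upload request and its progress bar could be wired together roughly as follows. This is a sketch, not the backend's actual contract: `uploadAudio` is a hypothetical name, the multipart field name `file` is an assumption about what the API expects, and `httpClient` is assumed to be an Axios instance (or any object with a compatible `post` method), passed in so the function can be tested with a stub.

```javascript
// `httpClient` is assumed to be Axios or an object with a compatible `post`.
async function uploadAudio(httpClient, endpoint, file, onProgress = () => {}) {
  const form = new FormData();
  form.append('file', file); // field name 'file' is an assumption about the backend

  const response = await httpClient.post(endpoint, form, {
    // Axios calls this with `loaded` and, when computable, `total` (both in bytes).
    onUploadProgress: (event) => {
      if (event.total) {
        onProgress(Math.round((event.loaded / event.total) * 100));
      }
    },
  });
  return response.data;
}
```

In the app this might be called as `uploadAudio(axios, endpoint, file, setProgress)`, where `endpoint` comes from the environment variable and `setProgress` updates the progress bar state.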