
Shrutik Documentation

Welcome to the comprehensive documentation for Shrutik (শ্রুতিক), the open-source voice data collection platform designed to help communities build high-quality voice datasets in their native languages.

Shrutik means “listener” in Bengali, reflecting our mission to listen to and preserve diverse voices from around the world.

About This Documentation

This documentation is built with mdBook and provides comprehensive guides, API references, and tutorials for users, developers, and administrators.

Enhanced Features

  • Interactive Mermaid Diagrams: Zoom, pan, and view complex flowcharts in fullscreen
  • Professional Styling: Custom theme with Shrutik branding and improved readability
  • Responsive Design: Optimized experience on desktop and mobile devices
  • Status Badges: Color-coded indicators for different content types
  • Enhanced Navigation: Improved sidebar, search, and user experience

Interactive Diagram Controls

  • Zoom: Use mouse wheel or +/- buttons to zoom in/out
  • Pan: Drag to move around when zoomed in
  • Reset: Double-click or press ‘0’ to reset view
  • Fullscreen: Click the fullscreen button for better viewing
  • Mobile: Touch-friendly controls for mobile devices

Documentation Overview

Getting Started

Architecture & Design

Contributing

Additional Resources

Quick Navigation

For New Users

  1. Getting Started - Set up Shrutik in minutes
  2. Docker Local Setup - Run everything with Docker
  3. User Guide - Learn how to contribute voice data

For Developers

  1. Docker Local Setup - Quick Docker development setup
  2. Local Development - Native development environment
  3. Architecture Overview - Understand the system design
  4. API Reference - Integrate with Shrutik APIs
  5. Contributing Guide - Contribute code and features

For System Administrators

  1. Docker Local Setup - Deploy with Docker
  2. Deployment Guide - Production deployment strategies
  3. Monitoring & Health Checks - System monitoring

For Researchers & Data Scientists

  1. API Reference - Export datasets
  2. Architecture - Understand data structure
  3. Quality Control - Data quality processes

Visual Documentation

System Flows

Technical Diagrams

Development Resources

Setup & Configuration

Code Standards

Deployment Options

| Option | Complexity | Use Case | Documentation |
|--------|------------|----------|---------------|
| Docker Compose | Low | Development, Small Teams | Docker Deployment |
| Kubernetes | High | Production, Enterprise | Deployment Guide |
| Cloud Platforms | Medium | Managed Services | Deployment Guide |
| Bare Metal | Medium | On-Premises | Deployment Guide |

Community & Support

Get Help

Contribute

Stay Updated

Additional Resources

Research Papers

What’s New

Recent Updates

  • Performance Optimization - Added comprehensive caching and rate limiting
  • CDN Integration - Optimized audio delivery with CDN support
  • Enhanced Monitoring - Real-time performance metrics and dashboards
  • Security Improvements - Advanced authentication and authorization

Coming Soon

  • Mobile App - Native mobile applications for iOS and Android
  • AI Assistance - ML-powered transcription assistance
  • Multi-language UI - Interface translations for global accessibility
  • Cloud Integration - Enhanced cloud platform support

Need help? Join our Discord community or check our GitHub discussions.

Found an issue? Please report it on GitHub.

Want to contribute? Read our Contributing Guide to get started.


Together, we’re building a more inclusive digital future, one voice at a time.


Getting Started with Shrutik

Welcome to Shrutik! This guide will help you set up and start using the platform in just a few minutes.

Overview

Shrutik is a voice data collection platform that allows communities to contribute voice recordings and transcriptions in their native languages. You can either contribute data or set up your own instance of the platform.

Quick Setup Options

Option 1: Docker Setup (Recommended)

The fastest way to get Shrutik running is with Docker:

# Clone the repository
git clone https://github.com/Onuronon-lab/Shrutik.git
cd Shrutik

# Switch to the deployment-dev branch
git fetch origin
git switch deployment-dev

# Copy Docker environment configuration
cp .env.example .env

# Build images and start all services
docker compose up --build -d

Access the platform:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • API Documentation: http://localhost:8000/docs

Note: For detailed Docker setup instructions, see our comprehensive Docker Local Setup Guide for configuration details, troubleshooting, and switching between local/Docker environments.

Option 2: Local Development

For development or customization:

# Clone and setup
git clone https://github.com/Onuronon-lab/Shrutik.git
cd Shrutik

# Switch to the deployment-dev branch
git fetch origin
git switch deployment-dev

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

To start backend, frontend, and Celery worker, see the Local Setup Guide.

Verify Setup

Once you’ve successfully started the services using either Option 1 (Docker) or Option 2 (Local Development), confirm that everything is running correctly:

Check Backend Health

The backend provides a simple health endpoint to verify that the FastAPI server is up and running.

curl http://localhost:8000/health

Check Frontend

curl http://localhost:3000
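Rather than checking each endpoint by hand, the two curl checks above can be wrapped in a small script. This is an illustrative stdlib-only sketch; the URLs are the defaults from this guide:

```python
import urllib.request
import urllib.error

def check_service(url: str, timeout: float = 2.0) -> bool:
    """Return True if the service at `url` responds with HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error status
        return False

if __name__ == "__main__":
    for name, url in [("backend", "http://localhost:8000/health"),
                      ("frontend", "http://localhost:3000")]:
        print(f"{name}: {'OK' if check_service(url, timeout=1.0) else 'DOWN'}")
```

If either service reports DOWN, check the container logs as described in the Troubleshooting section.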

First Steps

For Contributors

  1. Register an Account: Visit http://localhost:3000 and create an account
  2. Start Recording: Begin with voice recordings or transcriptions
  3. Track Progress: Monitor your contributions in the dashboard

For Administrators

  1. Access Admin Panel: Login with your admin account
  2. Configure Languages: Add supported languages and scripts
  3. Manage Users: Review user registrations and assign roles
  4. Monitor Quality: Review transcription quality and consensus

Contributing Voice Data

Recording Guidelines

  • Environment: Record in a quiet environment
  • Equipment: Use a good quality microphone
  • Duration: Keep recordings between 2 and 10 seconds
  • Content: Read the provided text clearly and naturally

Transcription Guidelines

  • Accuracy: Transcribe exactly what you hear
  • Formatting: Follow language-specific formatting rules
  • Quality: Rate the audio quality honestly
  • Consensus: Multiple transcriptions improve dataset quality

Troubleshooting

Common Issues

Services won’t start:

# All services
docker compose logs -f

# Specific service (example: backend)
docker compose logs -f backend

# Restart all services
docker compose restart

# Restart a single service
docker compose restart backend

# Or check status
docker compose ps

Database connection errors:

# Stop services and remove volumes
docker compose down -v --remove-orphans
# (Optional) Clean unused Docker resources
docker system prune -f
# Rebuild and start all services
docker compose up -d --build

# Run migrations inside the backend container
docker compose exec backend python scripts/init-db.py
# If that fails, try the fallback
docker compose exec backend python scripts/simple-init.py


Permission errors:

# Fix file permissions
sudo chown -R $USER:$USER uploads/
chmod -R 755 uploads/

Getting Help

  • Documentation: Check our comprehensive docs
  • GitHub Issues: Report bugs and request features
  • Discord: Join our community for real-time help
  • Email: Contact us at onuronon.dev@gmail.com

Next Steps

Welcome to the Community

You’re now ready to start using Shrutik! Whether you’re contributing voice data, developing features, or deploying your own instance, you’re part of a global movement to make voice technology more inclusive.

Join our community channels to connect with other contributors and stay updated on the latest developments.

Docker Local Setup Guide

This guide explains how to run Shrutik completely with Docker on your local machine, including all the configuration changes needed to switch from local development to Docker.

Quick Docker Setup

Prerequisites

  • Docker 20.10+

  • Docker Compose 2.0+

  • Git

1. Clone the Repo

git clone https://github.com/Onuronon-lab/Shrutik.git
cd Shrutik

# Switch to the deployment-dev branch
git fetch origin
git switch deployment-dev

2. Configure Environment for Docker

Use the Docker-specific environment file:

cp .env.example .env

Make sure DATABASE_URL is set correctly in the .env file:

DATABASE_URL=postgresql://postgres:password@postgres:5432/voice_collection

When running inside Docker, services communicate using their Docker Compose service names.

Available Environment Files:

  • .env.example - Template with all available options

3. Configure Frontend

cd frontend
cp .env.example .env

4. Start All Containers

Use this when running the app for the first time or after changing Dockerfiles, requirements.txt, or package.json:

docker compose up -d --build

Regular use (no changes)

For normal daily use, when the images are already built:

docker compose up -d

Check service status:

docker compose ps

5. Initialize the Database

Run migrations:

docker compose exec backend alembic upgrade head

Create admin user:

docker compose exec backend python scripts/create_admin.py --name "Admin" --email admin@example.com

6. Access the Application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • API Documentation: http://localhost:8000/docs

Configuration Changes Explained

Key Differences: Local vs Docker

| Component | Local Development | Docker |
|-----------|-------------------|--------|
| Database URL | localhost:5432 | postgres:5432 |
| Redis URL | localhost:6379 | redis:6379 |
| Frontend API URL | http://localhost:8000 | http://localhost:8000 |
| File Paths | ./uploads | /app/uploads |

Development Workflow

Start services

docker compose up -d

Stop everything

docker compose down

Stop AND remove volumes (fresh reset)

docker compose down -v

View logs

docker compose logs -f

Specific service logs:

docker compose logs -f backend

Rebuild after changing requirements

docker compose build --no-cache
docker compose up -d

Shell into a container

docker compose exec backend bash

Check backend health

curl http://localhost:8000/health

Database Management

Run migrations:

docker compose exec backend alembic upgrade head

Auto-generate migration:

docker compose exec backend alembic revision --autogenerate -m "message"

Connect to PostgreSQL:

docker compose exec postgres psql -U postgres -d voice_collection

Redis Debugging

Test Redis:

docker compose exec redis redis-cli ping

Restart Redis:

docker compose restart redis

Troubleshooting

Port in use

Check:

sudo lsof -i :6379
sudo lsof -i :5432

Kill process:

sudo kill <pid>

Backend not starting

docker compose logs backend

Frontend not loading

docker compose logs frontend
docker compose build frontend --no-cache
docker compose up -d frontend

Local Development Guide

This guide covers setting up Shrutik for local development, including all the tools and configurations needed for contributing to the project.

Prerequisites

System Requirements

  • Python: 3.11 or higher
  • Node.js: 20 or higher
  • PostgreSQL: 15 or higher
  • Redis: 7 or higher
  • Git: Latest version

Recommended Tools

  • IDE: VS Code with Python and TypeScript extensions
  • API Testing: Postman or Insomnia
  • Database GUI: pgAdmin or DBeaver
  • Redis GUI: RedisInsight

Setup Instructions

1. Clone and Navigate

git clone https://github.com/Onuronon-lab/Shrutik.git
cd Shrutik

# Switch to the deployment-dev branch
git fetch origin
git switch deployment-dev

2. Backend Setup

Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Linux/Mac:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

Install Dependencies

# Install Python dependencies
pip install -r requirements.txt

Database Setup

# Start PostgreSQL (if not running)
sudo systemctl start postgresql  # Linux
brew services start postgresql   # Mac

# Switch to PostgreSQL user (Linux)
sudo -i -u postgres

# Create database
createdb voice_collection

# Exit postgres user shell (Linux)
exit

# Set environment variables
cp .env.example .env

Edit .env:

# Development Database
DATABASE_URL=postgresql://postgres:password@localhost:5432/voice_collection

# Redis
REDIS_URL=redis://localhost:6379/0

# Development Settings
DEBUG=true
USE_CELERY=true

# File Storage
UPLOAD_DIR=uploads
MAX_FILE_SIZE=104857600

# Security (use a secure key in production)
SECRET_KEY=dev-secret-key-change-in-production
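The application reads these values through its settings module (Pydantic, per the tech stack). As a rough stdlib-only sketch of the same pattern, with defaults mirroring the .env above (the class and field names here are illustrative, not Shrutik's actual settings module):

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    # Defaults mirror the .env file above; deployments override via environment.
    database_url: str = field(default_factory=lambda: os.environ.get(
        "DATABASE_URL",
        "postgresql://postgres:password@localhost:5432/voice_collection"))
    redis_url: str = field(default_factory=lambda: os.environ.get(
        "REDIS_URL", "redis://localhost:6379/0"))
    debug: bool = field(default_factory=lambda: os.environ.get(
        "DEBUG", "false").lower() == "true")
    max_file_size: int = field(default_factory=lambda: int(os.environ.get(
        "MAX_FILE_SIZE", "104857600")))

settings = Settings()
```

Because every value has an environment-variable override, the same code runs unchanged in local and Docker environments.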

Run Database Migrations

# Run database migrations
alembic upgrade head

# Create admin user
python scripts/create_admin.py --name "AdminUser" --email admin@example.com

Follow the prompts to create your first admin user.

3. Frontend Setup

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Copy environment file
cp .env.example .env

4. Start Development Services

Start Services

Terminal 1 - Backend:

source venv/bin/activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Celery Worker:

source venv/bin/activate
celery -A app.core.celery_app worker --loglevel=info

Terminal 3 - Frontend:

cd frontend
npm start

Development Configuration

Alembic Configuration

Alembic is configured to automatically use the correct database URL from your environment variables:

  • The alembic/env.py file reads from settings.DATABASE_URL
  • No manual configuration changes needed when switching environments
  • Migrations work seamlessly in both local and Docker environments

Switching Between Local and Docker

When switching between local development and Docker, you need to update these configurations:

1. Environment Variables (.env file)

Local Development:

DATABASE_URL=postgresql://postgres:password@localhost:5432/voice_collection
REDIS_URL=redis://localhost:6379/0

Docker:

DATABASE_URL=postgresql://postgres:password@postgres:5432/voice_collection
REDIS_URL=redis://redis:6379/0

2. Frontend API URL

The frontend's API URL (http://localhost:8000) stays the same for local Docker, since the browser accesses the backend from the host machine.

3. Quick Switch Commands

Switch to Docker:

# Stop local services
pkill -f uvicorn
pkill -f celery

# Update config for Docker
cp .env.example .env

# Start Services 
docker compose up -d

Switch to Local:

# Stop Docker
docker compose down

Then follow the instructions above to start the services locally.

Complete Docker Guide: For detailed Docker setup instructions, troubleshooting, and configuration explanations, see our Docker Local Setup Guide.

Testing

Backend Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=app

# Run specific test file
pytest tests/test_auth.py

# Run with verbose output
pytest -v

Frontend Tests

cd frontend

# Run tests
npm test

# Run with coverage
npm run test:coverage

# Run E2E tests
npm run test:e2e

Integration Tests

# Start test environment
docker compose -f docker-compose.test.yml up -d

# Run integration tests
pytest tests/integration/

# Cleanup
docker compose -f docker-compose.test.yml down -v

Debugging

Backend Debugging

VS Code Configuration

Create .vscode/launch.json:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "FastAPI Debug",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/venv/bin/uvicorn",
            "args": ["app.main:app", "--reload", "--host", "0.0.0.0", "--port", "8000"],
            "console": "integratedTerminal",
            "envFile": "${workspaceFolder}/.env.development"
        }
    ]
}

Logging Configuration

Enable debug logging in .env.development:

LOG_LEVEL=DEBUG

Frontend Debugging

Browser DevTools

  • Use React Developer Tools extension
  • Enable source maps for debugging TypeScript
  • Use Network tab to debug API calls

VS Code Configuration

Install recommended extensions:

  • ES7+ React/Redux/React-Native snippets
  • TypeScript Importer
  • Prettier - Code formatter
  • ESLint

Database Management

Migrations

# Create new migration
alembic revision --autogenerate -m "Description of changes"

# Apply migrations
alembic upgrade head

# Rollback migration
alembic downgrade -1

# Check migration status
alembic current

Database Reset

# Drop and recreate database
dropdb voice_collection
createdb voice_collection
alembic upgrade head
python scripts/create_admin.py

Common Development Tasks

Adding New API Endpoints

  1. Create schema in app/schemas/
  2. Add model in app/models/ (if needed)
  3. Implement service in app/services/
  4. Create router in app/api/
  5. Register router in app/main.py
  6. Add tests in tests/

Adding New Frontend Components

  1. Create component in frontend/src/components/
  2. Add TypeScript types in frontend/src/types/
  3. Implement API calls in frontend/src/services/
  4. Add routing in frontend/src/pages/
  5. Add tests in frontend/src/__tests__/

Database Schema Changes

  1. Modify models in app/models/
  2. Generate migration: alembic revision --autogenerate -m "description"
  3. Review and edit migration file if needed
  4. Apply migration: alembic upgrade head
  5. Update tests and documentation

Performance Optimization

Development Performance

# Point at the local development database
export DATABASE_URL="postgresql://postgres:password@localhost:5432/voice_collection"

# Disable Celery for faster startup
export USE_CELERY=false

# Use development Redis
export REDIS_URL="redis://localhost:6379/1"

Hot Reload Configuration

Backend hot reload is enabled by default with --reload flag.

Frontend hot reload configuration in frontend/next.config.js:

module.exports = {
  reactStrictMode: true,
  swcMinify: true,
  experimental: {
    esmExternals: false
  }
}

Additional Resources

Happy coding! 🎉

Shrutik Architecture Overview

This document provides a comprehensive overview of Shrutik’s system architecture, design principles, and technical decisions.

System Architecture

Shrutik follows a modern, microservices-inspired architecture with clear separation of concerns and scalable design patterns.

High-Level Architecture

graph TB
    subgraph "Presentation Layer"
        WEB[React Frontend]
        MOBILE[Mobile App]
        API_DOCS[API Documentation]
    end

    subgraph "API Gateway Layer"
        NGINX[Nginx Reverse Proxy]
        RATE_LIMIT[Rate Limiting]
        AUTH_MW[Authentication Middleware]
    end

    subgraph "Application Layer"
        API[FastAPI Backend]
        WORKER[Celery Workers]
        SCHEDULER[Task Scheduler]
    end

    subgraph "Business Logic Layer"
        AUTH_SVC[Authentication Service]
        VOICE_SVC[Voice Recording Service]
        TRANS_SVC[Transcription Service]
        CONSENSUS_SVC[Consensus Service]
        EXPORT_SVC[Export Service]
        ADMIN_SVC[Admin Service]
    end

    subgraph "Data Layer"
        POSTGRES[(PostgreSQL)]
        REDIS[(Redis)]
        FILES[File Storage]
    end

    subgraph "External Services"
        CDN[Content Delivery Network]
        EMAIL[Email Service]
        MONITORING[Monitoring & Logging]
    end

    WEB --> NGINX
    MOBILE --> NGINX
    NGINX --> API
    API --> AUTH_SVC
    API --> VOICE_SVC
    API --> TRANS_SVC
    WORKER --> CONSENSUS_SVC
    WORKER --> EXPORT_SVC
    
    AUTH_SVC --> POSTGRES
    VOICE_SVC --> POSTGRES
    VOICE_SVC --> FILES
    TRANS_SVC --> POSTGRES
    TRANS_SVC --> REDIS
    
    API --> REDIS
    WORKER --> REDIS
    
    FILES --> CDN
    API --> EMAIL
    API --> MONITORING

Design Principles

1. Modularity

  • Service-Oriented: Clear separation between different business domains
  • Loose Coupling: Services communicate through well-defined interfaces
  • High Cohesion: Related functionality grouped together

2. Scalability

  • Horizontal Scaling: Stateless services that can be scaled independently
  • Async Processing: Heavy operations handled by background workers
  • Caching Strategy: Multi-layer caching for performance optimization

3. Reliability

  • Error Handling: Comprehensive error handling and recovery mechanisms
  • Health Checks: Automated monitoring and alerting
  • Data Integrity: ACID transactions and data validation

4. Security

  • Authentication: JWT-based authentication with refresh tokens
  • Authorization: Role-based access control (RBAC)
  • Data Protection: Encryption at rest and in transit

5. Maintainability

  • Clean Code: Following Python and TypeScript best practices
  • Documentation: Comprehensive API and code documentation
  • Testing: High test coverage with unit, integration, and E2E tests

Technology Stack

Backend Technologies

| Component | Technology | Purpose |
|-----------|------------|---------|
| Web Framework | FastAPI | High-performance async API framework |
| Database | PostgreSQL | Primary data storage with ACID compliance |
| Cache/Queue | Redis | Caching, session storage, and message queue |
| Background Jobs | Celery | Async task processing |
| Audio Processing | Librosa, PyDub | Audio analysis and manipulation |
| Authentication | JWT | Stateless authentication |
| Validation | Pydantic | Data validation and serialization |
| ORM | SQLAlchemy | Database abstraction layer |
| Migrations | Alembic | Database schema migrations |

Frontend Technologies

| Component | Technology | Purpose |
|-----------|------------|---------|
| Framework | React 18 | Component-based UI framework |
| Meta Framework | Next.js | Full-stack React framework |
| Language | TypeScript | Type-safe JavaScript |
| Styling | Tailwind CSS | Utility-first CSS framework |
| State Management | Zustand | Lightweight state management |
| HTTP Client | Axios | Promise-based HTTP client |
| Audio Recording | MediaRecorder API | Browser audio recording |
| Testing | Jest, React Testing Library | Unit and integration testing |

Infrastructure Technologies

| Component | Technology | Purpose |
|-----------|------------|---------|
| Containerization | Docker | Application containerization |
| Orchestration | Docker Compose | Multi-container application management |
| Reverse Proxy | Nginx | Load balancing and SSL termination |
| Monitoring | Prometheus, Grafana | Metrics collection and visualization |
| Logging | Structured logging | Centralized log management |
| CI/CD | GitHub Actions | Automated testing and deployment |

Data Architecture

Database Schema Design

erDiagram
    %% Core Relationships
    USERS ||--o{ VOICE_RECORDINGS : "creates"
    USERS ||--o{ TRANSCRIPTIONS : "creates"
    USERS ||--o{ QUALITY_REVIEWS : "performs"
    USERS ||--o{ EXPORT_AUDIT_LOGS : "performs"
    USERS ||--o{ EXPORT_DOWNLOADS : "downloads"
    USERS ||--o| EXPORT_BATCHES : "creates (optional)"

    LANGUAGES ||--o{ SCRIPTS : "has"
    LANGUAGES ||--o{ VOICE_RECORDINGS : "recorded in"
    LANGUAGES ||--o{ TRANSCRIPTIONS : "transcribed in"

    SCRIPTS ||--o{ VOICE_RECORDINGS : "recorded from"

    VOICE_RECORDINGS ||--|{ AUDIO_CHUNKS : "divided into (1 to many)"

    AUDIO_CHUNKS ||--o{ TRANSCRIPTIONS : "has many"
    AUDIO_CHUNKS ||--o| TRANSCRIPTIONS : "has one consensus"

    TRANSCRIPTIONS ||--o{ QUALITY_REVIEWS : "reviewed by"

    EXPORT_BATCHES ||--o{ EXPORT_AUDIT_LOGS : "generates"
    EXPORT_BATCHES ||--o{ EXPORT_DOWNLOADS : "downloaded by"

    %% Entities with attributes
    USERS {
        int id PK
        string name
        string email UK
        string password_hash
        string role "(enum: userrole)"
        json meta_data
        timestamptz created_at
        timestamptz updated_at
    }

    LANGUAGES {
        int id PK
        string name
        string code UK
        timestamptz created_at
    }

    SCRIPTS {
        int id PK
        int language_id FK
        text text
        string duration_category "(enum: durationcategory)"
        json meta_data
        timestamptz created_at
        timestamptz updated_at
    }

    VOICE_RECORDINGS {
        int id PK
        int user_id FK
        int script_id FK
        int language_id FK
        string file_path
        float duration
        string status "(enum: recordingstatus)"
        json meta_data
        timestamptz created_at
        timestamptz updated_at
    }

    AUDIO_CHUNKS {
        int id PK
        int recording_id FK
        int chunk_index
        string file_path
        float start_time
        float end_time
        float duration
        text sentence_hint
        json meta_data
        timestamptz created_at
        int transcript_count
        boolean ready_for_export
        float consensus_quality
        int consensus_transcript_id FK "optional"
        int consensus_failed_count
    }

    TRANSCRIPTIONS {
        int id PK
        int chunk_id FK
        int user_id FK
        int language_id FK
        text text
        float quality
        float confidence
        boolean is_consensus
        boolean is_validated
        json meta_data
        timestamptz created_at
        timestamptz updated_at
    }

    QUALITY_REVIEWS {
        int id PK
        int transcription_id FK
        int reviewer_id FK
        string decision "(enum: reviewdecision)"
        float rating
        text comment
        json meta_data
        timestamptz created_at
    }

    EXPORT_BATCHES {
        int id PK
        string batch_id UK
        string archive_path
        string storage_type "(enum: storagetype)"
        int chunk_count
        bigint file_size_bytes
        json chunk_ids
        string status "(enum: exportbatchstatus)"
        boolean exported
        text error_message
        int retry_count
        string checksum
        int compression_level
        string format_version
        json recording_id_range
        json language_stats
        float total_duration_seconds
        json filter_criteria
        timestamptz created_at
        timestamptz completed_at
        int created_by_id FK "optional"
    }

    EXPORT_AUDIT_LOGS {
        int id PK
        string export_id
        int user_id FK
        string export_type
        string format
        json filters_applied
        int records_exported
        bigint file_size_bytes
        string ip_address
        string user_agent
        timestamptz created_at
    }

    EXPORT_DOWNLOADS {
        int id PK
        string batch_id FK
        int user_id FK
        timestamptz downloaded_at
        string ip_address
        string user_agent
    }

Data Flow Patterns

1. Voice Recording Data Flow

User Input → Frontend → API → Database → File Storage → Background Processing → Chunking → Database Update
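The chunking step above can be sketched as splitting a recording's duration into fixed-length windows. This is illustrative only; the real chunker may use silence detection rather than fixed boundaries:

```python
def chunk_boundaries(duration: float, chunk_len: float = 10.0):
    """Split a recording of `duration` seconds into (start, end) windows."""
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)  # last chunk may be shorter
        chunks.append((start, end))
        start = end
    return chunks
```

Each (start, end) pair would correspond to one AUDIO_CHUNKS row, with `chunk_index` given by its position in the list.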

2. Transcription Data Flow

User Request → API → Database Query → Cache Check → Response → User Input → Validation → Database Save → Consensus Trigger

3. Consensus Calculation Flow

Transcription Submit → Background Job → Collect Related → Calculate Similarity → Weight Quality → Update Consensus → Notify Users
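A minimal sketch of the "Calculate Similarity" step, using stdlib difflib to pick the transcript closest on average to all the others. The production consensus service also weights transcriber quality, which is omitted here:

```python
from difflib import SequenceMatcher

def consensus(transcripts: list) -> str:
    """Return the transcript most similar, on average, to all the others."""
    def avg_similarity(i: int) -> float:
        others = [t for j, t in enumerate(transcripts) if j != i]
        if not others:
            return 1.0  # a lone transcript is trivially the consensus
        me = transcripts[i]
        return sum(SequenceMatcher(None, me, o).ratio() for o in others) / len(others)
    best = max(range(len(transcripts)), key=avg_similarity)
    return transcripts[best]
```

With several agreeing transcripts and one outlier, the agreeing text wins; this mirrors why multiple transcriptions per chunk improve dataset quality.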

API Design

RESTful API Principles

Shrutik follows REST architectural principles with some pragmatic adaptations:

  • Resource-Based URLs: /api/recordings, /api/transcriptions
  • HTTP Methods: GET, POST, PUT, DELETE for CRUD operations
  • Status Codes: Proper HTTP status codes for different scenarios
  • JSON Format: Consistent JSON request/response format
  • Pagination: Cursor-based pagination for large datasets
  • Versioning: API versioning through URL path (/api/v1/)
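Cursor-based pagination can be sketched as encoding the last-seen id into an opaque cursor token. The function names and cursor format here are illustrative, not Shrutik's actual API:

```python
import base64
from typing import Optional

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()

def decode_cursor(cursor: Optional[str]) -> int:
    if not cursor:
        return 0  # no cursor: start from the beginning
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())

def paginate(rows: list, cursor: Optional[str], limit: int = 2):
    """Return rows with id greater than the cursor, plus a next_cursor."""
    last_id = decode_cursor(cursor)
    page = [r for r in rows if r["id"] > last_id][:limit]
    # A short page means we reached the end; no next cursor.
    next_cursor = encode_cursor(page[-1]["id"]) if len(page) == limit else None
    return page, next_cursor
```

Unlike offset pagination, the cursor stays stable when earlier rows are inserted or deleted, which matters for large, actively growing datasets.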

API Structure

/api/
├── auth/
│   ├── POST /login
│   ├── POST /register
│   ├── POST /refresh
│   └── POST /logout
├── recordings/
│   ├── GET /
│   ├── POST /sessions
│   ├── POST /upload
│   └── GET /{id}/progress
├── transcriptions/
│   ├── GET /
│   ├── POST /tasks
│   ├── POST /submit
│   └── POST /skip
├── chunks/
│   ├── GET /{id}/audio
│   └── GET /{id}/info
├── admin/
│   ├── GET /stats/platform
│   ├── GET /users
│   └── GET /performance/dashboard
└── export/
    ├── POST /dataset
    └── GET /jobs/{id}/status

Authentication & Authorization

sequenceDiagram
    participant C as Client
    participant A as API
    participant Auth as Auth Service
    participant DB as Database

    C->>A: POST /auth/login
    A->>Auth: Validate Credentials
    Auth->>DB: Check User
    DB-->>Auth: User Data
    Auth->>Auth: Generate JWT
    Auth-->>A: JWT + Refresh Token
    A-->>C: Authentication Response

    Note over C: Store JWT in memory/secure storage

    C->>A: GET /recordings (with JWT)
    A->>Auth: Validate JWT
    Auth->>Auth: Check Expiry & Signature
    Auth-->>A: User Context
    A->>A: Check Permissions
    A-->>C: Protected Resource

Performance Architecture

Caching Strategy

graph LR
    subgraph "Client Side"
        BROWSER[Browser Cache]
        LOCAL[Local Storage]
    end
    
    subgraph "CDN Layer"
        CDN[Content Delivery Network]
    end
    
    subgraph "Application Layer"
        API_CACHE[API Response Cache]
        DB_CACHE[Database Query Cache]
        SESSION[Session Cache]
    end
    
    subgraph "Database Layer"
        DB[(PostgreSQL)]
        REDIS[(Redis)]
    end
    
    BROWSER --> CDN
    CDN --> API_CACHE
    API_CACHE --> DB_CACHE
    DB_CACHE --> REDIS
    SESSION --> REDIS
    DB_CACHE --> DB

Performance Optimizations

Backend Optimizations

  • Connection Pooling: Database connection pooling with configurable limits
  • Query Optimization: Indexed queries and efficient SQL patterns
  • Async Processing: Non-blocking I/O for concurrent request handling
  • Background Jobs: Heavy operations moved to background workers
  • Response Compression: Gzip compression for API responses
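The payoff of response compression is easy to demonstrate: repetitive JSON payloads, typical of list endpoints, shrink dramatically under gzip.

```python
import gzip
import json

# A payload shaped like a list-endpoint response; repetition compresses well.
payload = json.dumps(
    [{"id": i, "status": "completed"} for i in range(200)]
).encode()
compressed = gzip.compress(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```

In practice the web framework or reverse proxy applies this transparently when the client sends `Accept-Encoding: gzip`.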

Frontend Optimizations

  • Code Splitting: Dynamic imports for reduced bundle size
  • Lazy Loading: Components and routes loaded on demand
  • Image Optimization: Optimized images with Next.js Image component
  • Caching: Aggressive caching of static assets and API responses
  • Service Workers: Offline functionality and background sync

Database Optimizations

  • Indexing Strategy: Proper indexes on frequently queried columns
  • Query Optimization: Efficient queries with proper joins and filters
  • Read Replicas: Separate read replicas for analytics queries
  • Partitioning: Table partitioning for large datasets
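Whether a query actually uses an index can be checked with EXPLAIN QUERY PLAN. This sketch uses stdlib sqlite3 for portability (Shrutik itself runs PostgreSQL, where EXPLAIN plays the same role); the table and index names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transcriptions (id INTEGER PRIMARY KEY, chunk_id INTEGER, text TEXT)"
)
con.execute("CREATE INDEX idx_transcriptions_chunk ON transcriptions (chunk_id)")

# Ask the planner how it would execute a lookup by chunk_id.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM transcriptions WHERE chunk_id = ?", (42,)
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # the plan should mention the index, not a full table scan
```

Running the same check on frequently queried columns (here, a foreign key used in joins) is the core of an indexing strategy.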

Security Architecture

Security Layers

graph TB
    subgraph "Network Security"
        FIREWALL[Firewall Rules]
        DDoS[DDoS Protection]
        SSL[SSL/TLS Encryption]
    end
    
    subgraph "Application Security"
        AUTH[Authentication]
        AUTHZ[Authorization]
        VALIDATION[Input Validation]
        SANITIZATION[Data Sanitization]
    end
    
    subgraph "Data Security"
        ENCRYPTION[Encryption at Rest]
        BACKUP[Secure Backups]
        AUDIT[Audit Logging]
    end
    
    FIREWALL --> AUTH
    DDoS --> AUTH
    SSL --> AUTH
    AUTH --> ENCRYPTION
    AUTHZ --> ENCRYPTION
    VALIDATION --> BACKUP
    SANITIZATION --> AUDIT

Security Measures

Authentication & Authorization

  • JWT Tokens: Stateless authentication with short-lived access tokens
  • Refresh Tokens: Secure token refresh mechanism
  • Role-Based Access: Granular permissions based on user roles
  • Session Management: Secure session handling with Redis
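The sign-and-verify idea behind JWTs can be sketched with stdlib hmac. This is a simplified illustration, not Shrutik's actual token implementation (which uses standard JWT handling per the tech stack), and the secret is the throwaway development value:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"dev-secret-key-change-in-production"  # never ship this value

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_token(payload: dict, ttl: int = 900) -> str:
    """Serialize claims with an expiry and append an HMAC-SHA256 signature."""
    body = _b64(json.dumps({**payload, "exp": int(time.time()) + ttl}).encode())
    sig = _b64(hmac.new(SECRET, body.encode(), hashlib.sha256).digest())
    return f"{body}.{sig}"

def verify_token(token: str) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    body, sig = token.rsplit(".", 1)
    expected = _b64(hmac.new(SECRET, body.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or wrongly signed token
    padded = body + "=" * (-len(body) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims if claims["exp"] > time.time() else None
```

Because verification needs only the shared secret, any API instance can validate a token without a database lookup, which is what makes the scheme stateless.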

Data Protection

  • Input Validation: Comprehensive input validation using Pydantic
  • SQL Injection Prevention: Parameterized queries with SQLAlchemy
  • XSS Protection: Content Security Policy and input sanitization
  • CSRF Protection: CSRF tokens for state-changing operations

Infrastructure Security

  • HTTPS Enforcement: All communications encrypted with TLS
  • Security Headers: Comprehensive security headers implementation
  • Rate Limiting: Protection against abuse and DoS attacks
  • File Upload Security: Secure file upload with type validation
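Rate limiting is commonly implemented as a token bucket. A minimal in-process sketch follows; in a deployment like the one described here, the real limiter would live at the gateway or in Redis so all instances share state:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: caller should return HTTP 429
```

A per-client bucket (keyed by user id or IP) gives each caller an independent budget while still allowing short bursts.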

Monitoring & Observability

Monitoring Stack

graph LR
    subgraph "Application"
        APP[Shrutik Application]
        METRICS[Metrics Collection]
        LOGS[Structured Logging]
        TRACES[Distributed Tracing]
    end
    
    subgraph "Collection"
        PROMETHEUS[Prometheus]
        LOKI[Loki]
        JAEGER[Jaeger]
    end
    
    subgraph "Visualization"
        GRAFANA[Grafana Dashboards]
        ALERTS[Alert Manager]
    end
    
    APP --> METRICS
    APP --> LOGS
    APP --> TRACES
    
    METRICS --> PROMETHEUS
    LOGS --> LOKI
    TRACES --> JAEGER
    
    PROMETHEUS --> GRAFANA
    LOKI --> GRAFANA
    JAEGER --> GRAFANA
    
    PROMETHEUS --> ALERTS

Key Metrics

Application Metrics

  • Request Rate: Requests per second by endpoint
  • Response Time: P50, P95, P99 response times
  • Error Rate: Error percentage by endpoint and status code
  • Throughput: Data processing throughput
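
The percentile latencies above can be computed from raw samples with the standard library. This is a generic sketch, not the platform's actual metrics pipeline (which would normally delegate to Prometheus):

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict:
    """Compute P50/P95/P99 from raw response-time samples in milliseconds."""
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```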

Business Metrics

  • User Engagement: Active users, session duration
  • Data Quality: Transcription accuracy, consensus rates
  • System Usage: Recording uploads, transcription submissions
  • Performance: Audio processing times, consensus calculation speed

Infrastructure Metrics

  • System Resources: CPU, memory, disk usage
  • Database Performance: Query times, connection pool status
  • Cache Performance: Hit rates, memory usage
  • Network: Bandwidth usage, connection counts

Deployment Architecture

Environment Strategy

graph LR
    subgraph "Development"
        DEV_LOCAL[Local Development]
        DEV_DOCKER[Docker Development]
    end
    
    subgraph "Testing"
        TEST_UNIT[Unit Tests]
        TEST_INTEGRATION[Integration Tests]
        TEST_E2E[E2E Tests]
    end
    
    subgraph "Staging"
        STAGING[Staging Environment]
        UAT[User Acceptance Testing]
    end
    
    subgraph "Production"
        PROD[Production Environment]
        MONITORING[Production Monitoring]
    end
    
    DEV_LOCAL --> TEST_UNIT
    DEV_DOCKER --> TEST_INTEGRATION
    TEST_UNIT --> TEST_E2E
    TEST_INTEGRATION --> STAGING
    TEST_E2E --> STAGING
    STAGING --> UAT
    UAT --> PROD
    PROD --> MONITORING

Deployment Pipeline

  1. Code Commit: Developer pushes code to repository
  2. Automated Testing: Unit, integration, and E2E tests run
  3. Build Process: Docker images built and tagged
  4. Staging Deployment: Automatic deployment to staging
  5. Manual Testing: QA and user acceptance testing
  6. Production Deployment: Manual approval and deployment
  7. Health Checks: Automated health verification
  8. Monitoring: Continuous monitoring and alerting

Future Architecture Considerations

Scalability Enhancements

  • Microservices: Further decomposition into microservices
  • Event-Driven Architecture: Event sourcing and CQRS patterns
  • Kubernetes: Container orchestration for better scaling
  • Service Mesh: Advanced service-to-service communication

Performance Improvements

  • Edge Computing: Edge nodes for global content delivery
  • Advanced Caching: Distributed caching with Redis Cluster
  • Database Sharding: Horizontal database partitioning
  • GraphQL: More efficient data fetching

AI/ML Integration

  • Automated Quality Assessment: ML-based quality scoring
  • Smart Chunk Assignment: AI-driven task assignment
  • Real-time Transcription: Automatic transcription assistance
  • Anomaly Detection: ML-based fraud and quality detection

This architecture provides a solid foundation for Shrutik’s current needs while maintaining flexibility for future growth and enhancements.

API Reference

This document provides comprehensive documentation for the Shrutik API, including authentication, endpoints, request/response formats, and examples.

🔗 Base URL

  • Development: http://localhost:8000

Authentication

Shrutik uses JWT (JSON Web Token) based authentication with refresh tokens for secure API access.

Authentication Flow

sequenceDiagram
    participant C as Client
    participant A as API
    participant DB as Database

    C->>A: POST /api/auth/login
    A->>DB: Validate credentials
    DB-->>A: User data
    A-->>C: JWT + Refresh Token

    Note over C: Store tokens securely

    C->>A: GET /api/recordings (with JWT)
    A->>A: Validate JWT
    A-->>C: Protected resource

    Note over C: JWT expires

    C->>A: POST /api/auth/refresh
    A->>A: Validate refresh token
    A-->>C: New JWT

Authentication Endpoints

Login

POST /api/auth/login
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "secure_password"
}

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "user": {
    "id": 1,
    "email": "user@example.com",
    "name": "John Doe"
  }
}

Register

POST /api/auth/register
Content-Type: application/json

{
  "email": "newuser@example.com",
  "password": "secure_password",
  "name": "Jane Smith",
  "preferred_language": "bn"
}

Refresh Token

POST /api/auth/refresh
Content-Type: application/json

{
  "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Logout

POST /api/auth/logout
Authorization: Bearer <access_token>

Using Authentication

Include the JWT token in the Authorization header for all protected endpoints:

Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
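
In client code this usually reduces to two small helpers. The names and the refresh-once policy below are illustrative, not part of an official SDK:

```python
def auth_headers(access_token: str) -> dict:
    """Build the Authorization header for protected endpoints."""
    return {"Authorization": f"Bearer {access_token}"}

def should_refresh(status_code: int, already_retried: bool) -> bool:
    """Refresh the JWT once on a 401, then give up to avoid retry loops."""
    return status_code == 401 and not already_retried
```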

Response Format

All API responses follow a consistent format:

Success Response

{
  "data": {
    // Response data
  },
  "message": "Operation successful",
  "timestamp": "2024-01-01T12:00:00Z"
}

Error Response

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid input data",
    "details": {
      "field": "email",
      "issue": "Invalid email format"
    }
  },
  "timestamp": "2024-01-01T12:00:00Z"
}

Pagination Response

{
  "data": [
    // Array of items
  ],
  "pagination": {
    "total": 150,
    "page": 1,
    "per_page": 20,
    "total_pages": 8,
    "has_next": true,
    "has_prev": false
  }
}
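
Client code can walk this envelope with a small generator. `fetch_page` is a stand-in for whatever HTTP call you use, so this is a sketch rather than official SDK code:

```python
from typing import Callable, Iterator

def iter_all_items(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item across pages using the pagination envelope above."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["data"]
        if not body["pagination"]["has_next"]:
            break
        page += 1
```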

Voice Recordings API

Create Recording Session

Start a new recording session for a specific script.

POST /api/recordings/sessions
Authorization: Bearer <token>
Content-Type: application/json

{
  "script_id": 123,
  "language_id": 1,
  "metadata": {
    "device_info": "iPhone 14",
    "environment": "quiet_room"
  }
}

Response:

{
  "session_id": "uuid-string",
  "script": {
    "id": 123,
    "content": "আমি বাংলায় কথা বলি।",
    "language": "Bengali",
    "difficulty": "easy"
  },
  "expires_at": "2024-01-01T14:00:00Z"
}

Upload Recording

Upload an audio file for a recording session.

POST /api/recordings/upload
Authorization: Bearer <token>
Content-Type: multipart/form-data

session_id: uuid-string
duration: 5.2
audio_format: wav
file_size: 1048576
sample_rate: 44100
channels: 1
audio_file: <binary_data>

Response:

{
  "recording_id": 456,
  "status": "uploaded",
  "processing_job_id": "job-uuid",
  "estimated_processing_time": 30
}

Get User Recordings

Retrieve paginated list of user’s recordings.

GET /api/recordings?skip=0&limit=20&status=processed
Authorization: Bearer <token>

Response:

{
  "recordings": [
    {
      "id": 456,
      "script_id": 123,
      "language": "Bengali",
      "duration": 5.2,
      "status": "processed",
      "chunks_count": 3,
      "created_at": "2024-01-01T12:00:00Z"
    }
  ],
  "total": 50,
  "page": 1,
  "per_page": 20,
  "total_pages": 3
}

Get Recording Progress

Check processing progress for a recording.

GET /api/recordings/456/progress
Authorization: Bearer <token>

Response:

{
  "recording_id": 456,
  "status": "processing",
  "progress_percentage": 75,
  "current_step": "chunking_audio",
  "estimated_completion": "2024-01-01T12:05:00Z",
  "chunks_created": 2
}
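
Clients typically poll this endpoint until the status leaves `processing`. A sketch with the fetch and sleep functions injected for testability (the helper name and defaults are assumptions):

```python
import time
from typing import Callable

def wait_until_processed(fetch_progress: Callable[[], dict],
                         poll_seconds: float = 2.0,
                         max_polls: int = 30,
                         sleep=time.sleep) -> dict:
    """Poll the progress endpoint until the recording leaves 'processing'."""
    for _ in range(max_polls):
        progress = fetch_progress()
        if progress["status"] != "processing":
            return progress
        sleep(poll_seconds)
    raise TimeoutError("recording did not finish processing in time")
```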

Transcriptions API

Get Transcription Task

Request audio chunks for transcription.

POST /api/transcriptions/tasks
Authorization: Bearer <token>
Content-Type: application/json

{
  "quantity": 5,
  "language_id": 1,
  "skip_chunk_ids": [10, 15, 20],
  "difficulty_preference": "mixed"
}

Response:

{
  "session_id": "transcription-session-uuid",
  "chunks": [
    {
      "id": 789,
      "recording_id": 456,
      "chunk_index": 1,
      "file_path": "/chunks/chunk_789.wav",
      "duration": 3.5,
      "sentence_hint": "Greeting phrase",
      "transcription_count": 2
    }
  ],
  "total_available": 1500
}

Submit Transcriptions

Submit transcriptions for audio chunks.

POST /api/transcriptions/submit
Authorization: Bearer <token>
Content-Type: application/json

{
  "session_id": "transcription-session-uuid",
  "transcriptions": [
    {
      "chunk_id": 789,
      "language_id": 1,
      "text": "আমি বাংলায় কথা বলি।",
      "quality": 4.5,
      "confidence": 0.95,
      "metadata": {
        "time_taken": 45,
        "difficulty_rating": 3
      }
    }
  ],
  "skipped_chunk_ids": [790]
}

Response:

{
  "submitted_count": 1,
  "skipped_count": 1,
  "transcriptions": [
    {
      "id": 1001,
      "chunk_id": 789,
      "text": "আমি বাংলায় কথা বলি।",
      "quality": 4.5,
      "is_consensus": false,
      "created_at": "2024-01-01T12:00:00Z"
    }
  ],
  "message": "Successfully submitted 1 transcriptions"
}

Skip Chunk

Skip a difficult or unclear audio chunk.

POST /api/transcriptions/skip
Authorization: Bearer <token>
Content-Type: application/json

{
  "chunk_id": 790,
  "reason": "poor_audio_quality",
  "comment": "Background noise makes it unclear"
}

Get User Transcriptions

Retrieve user’s transcription history.

GET /api/transcriptions?skip=0&limit=20&language_id=1
Authorization: Bearer <token>

Response:

{
  "transcriptions": [
    {
      "id": 1001,
      "chunk_id": 789,
      "text": "আমি বাংলায় কথা বলি।",
      "quality": 4.5,
      "confidence": 0.95,
      "is_consensus": true,
      "is_validated": true,
      "created_at": "2024-01-01T12:00:00Z"
    }
  ],
  "total": 100,
  "page": 1,
  "per_page": 20,
  "total_pages": 5
}

Audio Chunks API

Get Chunk Audio

Retrieve audio file for a specific chunk.

GET /api/chunks/789/audio
Authorization: Bearer <token>

Response: Binary audio data with optimized headers

Headers:

Content-Type: audio/wav
Cache-Control: public, max-age=3600
Accept-Ranges: bytes
Content-Length: 1048576
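
Because the endpoint advertises `Accept-Ranges: bytes`, a client can resume an interrupted download or fetch a slice with a Range header. A minimal helper (illustrative, not part of an official SDK):

```python
from typing import Optional

def range_header(start_byte: int, end_byte: Optional[int] = None) -> dict:
    """Build an HTTP Range header to resume or partially fetch chunk audio."""
    suffix = "" if end_byte is None else str(end_byte)
    return {"Range": f"bytes={start_byte}-{suffix}"}
```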

Get Chunk Info

Get metadata about an audio chunk.

GET /api/chunks/789/info
Authorization: Bearer <token>

Response:

{
  "chunk_id": 789,
  "recording_id": 456,
  "duration": 3.5,
  "start_time": 1.2,
  "end_time": 4.7,
  "transcription_count": 3,
  "file_size": 1048576,
  "optimized_url": "https://cdn.example.com/chunks/789.wav",
  "alternatives": [
    {
      "format": ".mp3",
      "url": "https://cdn.example.com/chunks/789.mp3",
      "mime_type": "audio/mpeg"
    }
  ]
}

Admin API

Platform Statistics

Get comprehensive platform statistics (admin only).

GET /api/admin/stats/platform
Authorization: Bearer <admin_token>

Response:

{
  "users": {
    "total": 1500,
    "active_last_30_days": 450,
    "new_this_month": 75
  },
  "recordings": {
    "total": 5000,
    "total_duration_hours": 250.5,
    "processed": 4800,
    "pending": 200
  },
  "transcriptions": {
    "total": 15000,
    "consensus_reached": 12000,
    "average_quality": 4.2
  },
  "languages": {
    "supported": 5,
    "most_active": "Bengali"
  }
}

User Management

Get users for management (admin only).

GET /api/admin/users?role=contributor&limit=50
Authorization: Bearer <admin_token>

Performance Dashboard

Get performance metrics (admin only).

GET /api/admin/performance/dashboard
Authorization: Bearer <admin_token>

Response:

{
  "system_metrics": {
    "cpu_usage": 45.2,
    "memory_usage": 67.8,
    "disk_usage": 23.1,
    "active_connections": 150
  },
  "cache_performance": {
    "hit_rate": 85.5,
    "memory_used": "512MB",
    "keys_count": 15000
  },
  "database_performance": {
    "connection_pool": {
      "total_connections": 20,
      "active_connections": 8,
      "idle_connections": 12
    },
    "slow_queries": 2
  }
}

Export API

Create Dataset Export

Request a dataset export job.

POST /api/export/dataset
Authorization: Bearer <token>
Content-Type: application/json

{
  "format": "csv",
  "language_ids": [1, 2],
  "include_audio": true,
  "quality_threshold": 4.0,
  "consensus_only": true,
  "date_range": {
    "start": "2024-01-01",
    "end": "2024-12-31"
  }
}

Response:

{
  "job_id": "export-job-uuid",
  "status": "queued",
  "estimated_completion": "2024-01-01T12:30:00Z",
  "estimated_size_mb": 150
}

Get Export Status

Check export job status.

GET /api/export/jobs/export-job-uuid/status
Authorization: Bearer <token>

Response:

{
  "job_id": "export-job-uuid",
  "status": "completed",
  "progress_percentage": 100,
  "download_url": "https://api.example.com/downloads/dataset-uuid.zip",
  "file_size_mb": 145.7,
  "expires_at": "2024-01-08T12:00:00Z"
}

Scripts API

Get Available Scripts

Retrieve scripts available for recording.

GET /api/scripts?language_id=1&difficulty=easy&limit=20
Authorization: Bearer <token>

Response:

{
  "scripts": [
    {
      "id": 123,
      "content": "আমি বাংলায় কথা বলি।",
      "language": {
        "id": 1,
        "name": "Bengali",
        "code": "bn"
      },
      "difficulty": "easy",
      "estimated_duration": 3.5,
      "recording_count": 25
    }
  ],
  "total": 500,
  "page": 1,
  "per_page": 20
}

Languages API

Get Supported Languages

Retrieve list of supported languages.

GET /api/languages

Response:

{
  "languages": [
    {
      "id": 1,
      "name": "Bengali",
      "code": "bn",
      "script": "Bengali",
      "active": true,
      "recording_count": 5000,
      "transcription_count": 15000
    },
    {
      "id": 2,
      "name": "Hindi",
      "code": "hi",
      "script": "Devanagari",
      "active": true,
      "recording_count": 3000,
      "transcription_count": 9000
    }
  ]
}

Search API

Search Transcriptions

Search through transcriptions (admin only).

GET /api/search/transcriptions?q=greeting&language_id=1&limit=20
Authorization: Bearer <admin_token>

Health Check

System Health

Check system health and status.

GET /health

Response:

{
  "status": "healthy",
  "checks": {
    "database": true,
    "redis": true,
    "disk_space": true,
    "memory": true
  },
  "performance": {
    "database_pool": {
      "total_connections": 20,
      "active_connections": 5
    },
    "cache_status": true
  },
  "timestamp": "2024-01-01T12:00:00Z"
}

Metrics

Performance Metrics

Get performance metrics (admin only).

GET /metrics
Authorization: Bearer <admin_token>

Error Codes

HTTP Status Codes

| Code | Description |
|------|-------------|
| 200 | Success |
| 201 | Created |
| 400 | Bad Request |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 422 | Validation Error |
| 429 | Rate Limited |
| 500 | Internal Server Error |

Custom Error Codes

| Code | Description |
|------|-------------|
| VALIDATION_ERROR | Input validation failed |
| AUTHENTICATION_FAILED | Invalid credentials |
| INSUFFICIENT_PERMISSIONS | User lacks required permissions |
| RESOURCE_NOT_FOUND | Requested resource not found |
| RATE_LIMIT_EXCEEDED | Too many requests |
| SESSION_EXPIRED | Recording/transcription session expired |
| FILE_TOO_LARGE | Uploaded file exceeds size limit |
| UNSUPPORTED_FORMAT | Audio format not supported |
| PROCESSING_ERROR | Audio processing failed |
| CONSENSUS_PENDING | Transcription consensus not yet reached |

Rate Limits

Default Limits

| User Type | Requests/Minute |
|-----------|-----------------|
| Anonymous | 60 |
| Authenticated | 300 |
| Admin | 1000 |
| Sworik Developer | 2000 |

Endpoint-Specific Limits

| Endpoint | Limit | Window |
|----------|-------|--------|
| /api/auth/login | 10/min | 1 minute |
| /api/recordings/upload | 20/min | 1 minute |
| /api/transcriptions/submit | 100/min | 1 minute |
| /api/chunks/*/audio | 10/sec | 1 second |

Rate Limit Headers

X-RateLimit-Limit: 300
X-RateLimit-Remaining: 299
X-RateLimit-Reset: 1640995200
Retry-After: 60
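
A client can honor these headers with a small helper. The fallback delay when `Retry-After` is missing is an assumption, not something the API specifies:

```python
from typing import Optional

def retry_delay(status_code: int, headers: dict,
                default: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status_code != 429:
        return None
    retry_after = headers.get("Retry-After")
    # Retry-After from this API is a number of seconds.
    return float(retry_after) if retry_after else default
```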

Security

API Security Best Practices

  1. Always use HTTPS in production
  2. Store JWT tokens securely (not in localStorage for web apps)
  3. Implement proper CORS policies
  4. Validate all inputs on client and server
  5. Use refresh tokens for long-lived sessions
  6. Implement rate limiting to prevent abuse
  7. Log security events for monitoring

Content Security Policy

Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; media-src 'self' blob:; connect-src 'self' wss:

SDKs and Libraries

JavaScript/TypeScript SDK

npm install @shrutik/sdk

import { ShrutikClient } from '@shrutik/sdk';

const client = new ShrutikClient({
  baseURL: 'https://api.yourdomain.com',
  apiKey: 'your-api-key'
});

// Get transcription task
const task = await client.transcriptions.getTask({
  quantity: 5,
  languageId: 1
});

// Submit transcription
await client.transcriptions.submit({
  sessionId: task.sessionId,
  transcriptions: [{
    chunkId: 789,
    text: 'Transcribed text',
    quality: 4.5
  }]
});

Python SDK

pip install shrutik-sdk

from shrutik import ShrutikClient

client = ShrutikClient(
    base_url='https://api.yourdomain.com',
    api_key='your-api-key'
)

# Get transcription task
task = client.transcriptions.get_task(
    quantity=5,
    language_id=1
)

# Submit transcription
client.transcriptions.submit(
    session_id=task.session_id,
    transcriptions=[{
        'chunk_id': 789,
        'text': 'Transcribed text',
        'quality': 4.5
    }]
)

Testing

API Testing with curl

# Login
curl -X POST https://api.yourdomain.com/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"password"}'

# Get recordings (with token)
curl -X GET https://api.yourdomain.com/api/recordings \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

# Upload recording
curl -X POST https://api.yourdomain.com/api/recordings/upload \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "session_id=uuid" \
  -F "duration=5.2" \
  -F "audio_format=wav" \
  -F "file_size=1048576" \
  -F "audio_file=@recording.wav"

For additional support, join our Discord community or check our GitHub repository.

Audio Processing Modes

The Voice Data Collection Platform supports two modes for audio processing to accommodate different development and deployment scenarios.

Processing Modes

1. Celery Processing - Asynchronous Background Tasks

When to use:

  • Production deployments
  • Development with full background processing
  • When you want non-blocking audio uploads
  • When processing large audio files

How it works:

  1. User uploads audio file
  2. File is saved and marked as UPLOADED
  3. Celery task is queued for background processing
  4. User gets immediate response
  5. Background worker processes audio into chunks
  6. Status updates to PROCESSED when complete
  7. User can check progress via API

Setup:

# In .env file
USE_CELERY=true

# Start services
redis-server
celery -A app.core.celery_app worker --loglevel=info
uvicorn app.main:app --reload

Benefits:

  • ✅ Non-blocking uploads
  • ✅ Scalable (multiple workers)
  • ✅ Retry mechanisms
  • ✅ Progress tracking
  • ✅ Monitoring via Flower

Drawbacks:

  • ❌ More complex setup
  • ❌ Requires Redis
  • ❌ Requires Celery workers

2. Synchronous Processing - Simple Development

When to use:

  • Quick local development
  • Testing without Celery setup
  • Simple deployments
  • When immediate results are needed

How it works:

  1. User uploads audio file
  2. File is saved and marked as PROCESSING
  3. Audio processing happens immediately in the request
  4. Chunks are created during the upload request
  5. Status updates to PROCESSED before response
  6. User gets complete results immediately

Setup:

# In .env file
USE_CELERY=false

# Start service
uvicorn app.main:app --reload

Benefits:

  • ✅ Simple setup (no Redis/Celery needed)
  • ✅ Immediate results
  • ✅ Easier debugging
  • ✅ No additional services required

Drawbacks:

  • ❌ Blocking uploads (slower response)
  • ❌ No retry mechanisms
  • ❌ Single-threaded processing
  • ❌ No progress tracking

Automatic Mode Detection

The system automatically detects which mode to use:

def _is_celery_available(self) -> bool:
    # 1. Check configuration
    if not settings.USE_CELERY:
        return False
    
    # 2. Check if workers are running
    try:
        inspect = celery_app.control.inspect()
        stats = inspect.stats()
        return stats is not None and len(stats) > 0
    except Exception:
        return False

Fallback Logic:

  1. If USE_CELERY=false → Always use synchronous processing
  2. If USE_CELERY=true but no workers → Fall back to synchronous processing
  3. If USE_CELERY=true and workers available → Use Celery processing
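
The fallback rules above can be sketched as a dispatcher with its collaborators injected. The function and parameter names are illustrative, not the platform's actual code:

```python
from typing import Callable

def process_upload(recording_id: int, use_celery: bool,
                   workers_available: Callable[[], bool],
                   queue_task: Callable[[int], None],
                   process_sync: Callable[[int], str]) -> str:
    """Apply the fallback logic: queue via Celery when possible, else inline."""
    if use_celery and workers_available():
        queue_task(recording_id)
        return "queued"
    # Either Celery is disabled or no workers responded: process inline.
    return process_sync(recording_id)
```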

API Behavior Differences

Upload Response

Celery Mode:

{
  "id": 123,
  "status": "uploaded",
  "message": "File uploaded successfully, processing queued"
}

Synchronous Mode:

{
  "id": 123,
  "status": "processed",
  "chunks_created": 5,
  "message": "File uploaded and processed successfully"
}

Progress Tracking

Celery Mode:

# Check progress
GET /api/recordings/123/progress
{
  "status": "processing",
  "progress": 45,
  "chunks_created": 0
}

# Later...
GET /api/recordings/123/progress
{
  "status": "processed", 
  "progress": 100,
  "chunks_created": 5
}

Synchronous Mode:

# Progress is always complete
GET /api/recordings/123/progress
{
  "status": "processed",
  "progress": 100,
  "chunks_created": 5
}

Configuration Options

Environment Variables

# Enable/disable Celery
USE_CELERY=true|false

# Celery configuration (when enabled)
REDIS_URL=redis://localhost:6379/0
JOB_MAX_RETRIES=3
JOB_RETRY_DELAY=60

Runtime Detection

The system logs which mode is being used:

# Celery mode
INFO: Queued audio processing task abc123 for recording 456

# Synchronous mode  
INFO: Celery not available, processing recording 456 synchronously...
INFO: Successfully processed recording 456 into 5 chunks

Development Workflow

For Frontend Development

Use synchronous mode for simplicity:

USE_CELERY=false
uvicorn app.main:app --reload

For Full-Stack Development

Use Celery mode to test complete workflow:

USE_CELERY=true
./scripts/start-local-dev.sh

For Production Testing

Always use Celery mode:

USE_CELERY=true
# + proper Redis/Celery setup

Monitoring and Debugging

Celery Mode Monitoring

# Check worker status
celery -A app.core.celery_app inspect active

# Monitor via Flower
celery -A app.core.celery_app flower --port=5555

# Check job status via API
GET /api/jobs/active

Synchronous Mode Debugging

# Check logs for processing errors
tail -f logs/app.log

# Processing happens in main thread
# Errors appear immediately in response

Performance Considerations

Celery Mode

  • Throughput: High (parallel processing)
  • Response Time: Fast (immediate return)
  • Resource Usage: Distributed across workers
  • Scalability: Horizontal (add more workers)

Synchronous Mode

  • Throughput: Limited (sequential processing)
  • Response Time: Slow (includes processing time)
  • Resource Usage: Single process
  • Scalability: Vertical only

Error Handling

Celery Mode

  • Automatic retries with exponential backoff
  • Failed tasks can be manually retried
  • Detailed error tracking in job monitoring
  • Notifications for failures

Synchronous Mode

  • Immediate error response
  • No automatic retries
  • Simpler error debugging
  • Direct error messages

Migration Between Modes

From Synchronous to Celery

  1. Set USE_CELERY=true
  2. Start Redis and Celery workers
  3. Existing processed recordings work normally
  4. New uploads use background processing

From Celery to Synchronous

  1. Set USE_CELERY=false
  2. Stop Celery workers (optional)
  3. Existing queued tasks will fail
  4. New uploads use synchronous processing

Note: In-progress Celery tasks will fail when switching to synchronous mode. Complete or cancel them first.

Best Practices

Development

  • Use synchronous mode for quick testing
  • Use Celery mode when testing full workflow
  • Monitor logs for processing errors

Production

  • Always use Celery mode
  • Set up proper monitoring
  • Configure retry mechanisms
  • Use multiple workers for scalability

Testing

  • Test both modes in CI/CD
  • Verify fallback behavior
  • Test error scenarios in both modes

Shrutik System Flowcharts

This section contains visual documentation of Shrutik’s system flows and processes using Mermaid diagrams. These flowcharts help developers and contributors understand the system architecture and data flow.

Available Flowcharts

Core System Flows

Technical Flows

How to Read These Diagrams

Symbols and Conventions

  • Rectangles: Processes or services
  • Diamonds: Decision points
  • Circles: Start/end points
  • Cylinders: Databases or storage
  • Clouds: External services
  • Arrows: Data flow direction

Color Coding

  • Blue: User interactions
  • Green: Successful operations
  • Red: Error conditions
  • Yellow: Processing/waiting states
  • Purple: External services

🔧 Updating Flowcharts

When making changes to the system:

  1. Review Affected Diagrams: Check which flowcharts need updates
  2. Update Mermaid Code: Modify the diagram code
  3. Test Rendering: Ensure diagrams render correctly
  4. Update Documentation: Sync with code changes

Mermaid Syntax Reference

graph TD
    A[Start] --> B{Decision?}
    B -->|Yes| C[Process]
    B -->|No| D[Alternative]
    C --> E[End]
    D --> E

📚 Additional Resources

Contributing

To contribute new flowcharts or update existing ones:

  1. Follow the naming convention: kebab-case.md
  2. Include a description and context
  3. Use consistent styling and colors
  4. Test diagram rendering
  5. Update this README if adding new diagrams

These visual guides complement our technical documentation and help make Shrutik more accessible to contributors of all backgrounds.

Voice Recording Flow

This flowchart details the complete process of voice recording in Shrutik, from user interaction to final storage and processing.

Complete Voice Recording Process

flowchart TD
    START([User Starts Recording]) --> AUTH{User Authenticated?}
    
    AUTH -->|No| LOGIN[Redirect to Login]
    LOGIN --> AUTH
    AUTH -->|Yes| SCRIPT[Get Script for Recording]
    
    SCRIPT --> PERM{Microphone Permission?}
    PERM -->|No| REQ_PERM[Request Microphone Access]
    REQ_PERM --> PERM_GRANT{Permission Granted?}
    PERM_GRANT -->|No| ERROR_PERM[Show Permission Error]
    PERM_GRANT -->|Yes| PERM
    
    PERM -->|Yes| SETUP[Setup Audio Recording]
    SETUP --> DISPLAY[Display Script Text]
    DISPLAY --> READY[Show Record Button]
    
    READY --> RECORD_START[User Clicks Record]
    RECORD_START --> RECORDING[Recording Audio...]
    
    RECORDING --> MONITOR{Monitor Recording}
    MONITOR --> CHECK_DURATION{Duration < Max?}
    CHECK_DURATION -->|No| AUTO_STOP[Auto Stop Recording]
    CHECK_DURATION -->|Yes| USER_STOP{User Stops?}
    
    USER_STOP -->|No| MONITOR
    USER_STOP -->|Yes| STOP_REC[Stop Recording]
    AUTO_STOP --> STOP_REC
    
    STOP_REC --> VALIDATE[Validate Audio]
    VALIDATE --> VALID{Audio Valid?}
    
    VALID -->|No| ERROR_AUDIO[Show Audio Error]
    ERROR_AUDIO --> READY
    
    VALID -->|Yes| PREVIEW[Show Audio Preview]
    PREVIEW --> USER_ACTION{User Action}
    
    USER_ACTION -->|Re-record| READY
    USER_ACTION -->|Cancel| CANCEL[Cancel Recording]
    USER_ACTION -->|Submit| PREPARE[Prepare Upload]
    
    PREPARE --> CREATE_SESSION[Create Recording Session]
    CREATE_SESSION --> SESSION_VALID{Session Created?}
    
    SESSION_VALID -->|No| ERROR_SESSION[Session Creation Error]
    SESSION_VALID -->|Yes| UPLOAD[Upload Audio File]
    
    UPLOAD --> UPLOAD_PROGRESS[Show Upload Progress]
    UPLOAD_PROGRESS --> UPLOAD_COMPLETE{Upload Complete?}
    
    UPLOAD_COMPLETE -->|No| UPLOAD_ERROR[Upload Error]
    UPLOAD_ERROR --> RETRY{Retry Upload?}
    RETRY -->|Yes| UPLOAD
    RETRY -->|No| CANCEL
    
    UPLOAD_COMPLETE -->|Yes| SAVE_DB[Save to Database]
    SAVE_DB --> QUEUE_PROCESSING[Queue for Processing]
    QUEUE_PROCESSING --> SUCCESS[Show Success Message]
    
    SUCCESS --> NEXT_ACTION{User Next Action}
    NEXT_ACTION -->|Record Another| SCRIPT
    NEXT_ACTION -->|View Progress| DASHBOARD[Go to Dashboard]
    NEXT_ACTION -->|Logout| LOGOUT[Logout User]
    
    CANCEL --> CLEANUP[Cleanup Resources]
    ERROR_PERM --> CLEANUP
    ERROR_SESSION --> CLEANUP
    CLEANUP --> END([End])
    
    DASHBOARD --> END
    LOGOUT --> END

    %% Background Processing (Async)
    QUEUE_PROCESSING -.-> BG_START[Background Processing Starts]
    BG_START -.-> VALIDATE_FILE[Validate Audio File]
    VALIDATE_FILE -.-> CHUNK_AUDIO[Intelligent Audio Chunking]
    CHUNK_AUDIO -.-> SAVE_CHUNKS[Save Audio Chunks]
    SAVE_CHUNKS -.-> UPDATE_STATUS[Update Recording Status]
    UPDATE_STATUS -.-> NOTIFY_USER[Notify User of Completion]

    %% Styling
    classDef userAction fill:#e3f2fd
    classDef process fill:#e8f5e8
    classDef decision fill:#fff3e0
    classDef error fill:#ffebee
    classDef success fill:#e0f2f1
    classDef background fill:#f3e5f5

    class START,RECORD_START,USER_STOP,USER_ACTION,NEXT_ACTION userAction
    class SETUP,DISPLAY,RECORDING,VALIDATE,PREPARE,UPLOAD,SAVE_DB process
    class AUTH,PERM,PERM_GRANT,CHECK_DURATION,VALID,SESSION_VALID,UPLOAD_COMPLETE,RETRY decision
    class ERROR_PERM,ERROR_AUDIO,ERROR_SESSION,UPLOAD_ERROR error
    class SUCCESS,NOTIFY_USER success
    class BG_START,VALIDATE_FILE,CHUNK_AUDIO,SAVE_CHUNKS,UPDATE_STATUS background

Process Breakdown

1. User Authentication & Setup

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant B as Backend
    participant DB as Database

    U->>F: Access Recording Page
    F->>B: Check Authentication
    B->>DB: Validate Session
    DB-->>B: User Data
    B-->>F: Authentication Status
    
    alt Not Authenticated
        F->>U: Redirect to Login
        U->>F: Login Credentials
        F->>B: Authenticate User
        B-->>F: JWT Token
    end
    
    F->>B: Request Script
    B->>DB: Get Available Script
    DB-->>B: Script Data
    B-->>F: Script Content
    F->>U: Display Script

2. Audio Recording Process

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant M as MediaRecorder
    participant V as Validation

    U->>F: Click Record Button
    F->>M: Start Recording
    M-->>F: Recording Started
    F->>U: Show Recording UI
    
    loop During Recording
        M->>F: Audio Data Chunks
        F->>V: Validate Duration
        V-->>F: Status Update
    end
    
    U->>F: Stop Recording
    F->>M: Stop Recording
    M-->>F: Final Audio Blob
    F->>V: Validate Audio Quality
    V-->>F: Validation Result
    F->>U: Show Preview/Options

3. File Upload & Processing

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant B as Backend
    participant S as Storage
    participant Q as Queue
    participant W as Worker

    U->>F: Submit Recording
    F->>B: Create Recording Session
    B-->>F: Session ID
    
    F->>B: Upload Audio File
    B->>S: Store Audio File
    S-->>B: File Path
    B->>Q: Queue Processing Job
    B-->>F: Upload Success
    F->>U: Show Success Message
    
    Q->>W: Process Audio File
    W->>S: Read Audio File
    W->>W: Validate & Chunk Audio
    W->>S: Save Audio Chunks
    W->>B: Update Status
    B->>F: Notify Completion (WebSocket)
    F->>U: Show Processing Complete

🔍 Validation Steps

Audio Quality Validation

  • Duration Check: 1-60 seconds
  • Format Validation: Supported audio formats
  • File Size: Maximum 100MB
  • Sample Rate: Minimum quality requirements
  • Noise Level: Basic noise detection
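
These gates can be expressed as one pure function. The duration and size limits come from the bullets above, while the format whitelist is an assumption for illustration:

```python
def validate_audio(duration_s: float, size_bytes: int,
                   audio_format: str) -> list[str]:
    """Return a list of problems; empty means the recording passes the gates."""
    problems = []
    if not 1 <= duration_s <= 60:
        problems.append("duration must be between 1 and 60 seconds")
    if size_bytes > 100 * 1024 * 1024:
        problems.append("file exceeds the 100 MB limit")
    # The accepted format set here is illustrative, not the platform's list.
    if audio_format.lower() not in {"wav", "mp3", "ogg", "webm"}:
        problems.append("unsupported audio format")
    return problems
```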

Security Validation

  • File Type: MIME type verification
  • Malware Scan: Basic security checks
  • User Permissions: Recording quota limits
  • Session Validation: Valid recording session

Performance Optimizations

Frontend Optimizations

  • Progressive Upload: Chunked file upload
  • Compression: Client-side audio compression
  • Caching: Cache user preferences and scripts
  • Offline Support: Queue recordings when offline

Backend Optimizations

  • Async Processing: Background job processing
  • Connection Pooling: Database connection optimization
  • Caching: Redis caching for frequent data
  • CDN Integration: Optimized file delivery

Error Handling

Common Error Scenarios

  1. Microphone Access Denied

    • Show clear instructions
    • Provide alternative options
    • Guide user through browser settings
  2. Network Connection Issues

    • Implement retry logic
    • Show connection status
    • Queue uploads for later
  3. File Upload Failures

    • Automatic retry with exponential backoff
    • Resume interrupted uploads
    • Clear error messages
  4. Audio Quality Issues

    • Real-time quality feedback
    • Recording tips and guidance
    • Option to re-record
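
The exponential backoff with jitter mentioned for upload retries can be sketched as a delay schedule; the parameters are illustrative defaults, not the platform's configured values:

```python
import random

def backoff_delays(max_retries: int = 3, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing retry delays, each randomized with jitter."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Full jitter spreads retries so clients don't stampede together.
        yield random.uniform(0, delay)
```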

Error Recovery

flowchart LR
    ERROR[Error Occurs] --> LOG[Log Error Details]
    LOG --> CLASSIFY{Error Type}
    
    CLASSIFY -->|Network| RETRY[Automatic Retry]
    CLASSIFY -->|Validation| USER_FIX[User Action Required]
    CLASSIFY -->|System| FALLBACK[Fallback Method]
    
    RETRY --> SUCCESS{Retry Success?}
    SUCCESS -->|Yes| CONTINUE[Continue Process]
    SUCCESS -->|No| USER_FIX
    
    USER_FIX --> GUIDE[Show User Guidance]
    FALLBACK --> ALTERNATIVE[Alternative Flow]
    
    GUIDE --> CONTINUE
    ALTERNATIVE --> CONTINUE

Monitoring & Analytics

Key Metrics

  • Recording Success Rate: Percentage of successful recordings
  • Average Recording Duration: User engagement metrics
  • Upload Success Rate: Technical performance metrics
  • Processing Time: Background job performance
  • Error Rates: System reliability metrics

User Experience Metrics

  • Time to First Recording: Onboarding effectiveness
  • Recording Abandonment Rate: UX friction points
  • Retry Attempts: Error recovery effectiveness
  • User Satisfaction: Quality ratings and feedback

This comprehensive flow ensures a smooth, reliable voice recording experience while maintaining high quality standards and robust error handling.

Transcription Workflow

Interactive Diagrams
Zoom & Pan Enabled

This flowchart details the complete transcription process in Shrutik, including task assignment, transcription submission, consensus building, and quality control.

💡 Pro Tip: All diagrams below are interactive! Use your mouse wheel to zoom, drag to pan, and double-click to reset. Click the fullscreen button for a better view of complex diagrams.

Complete Transcription Process

flowchart TD
    START([User Requests Transcription Task]) --> AUTH{User Authenticated?}
    
    AUTH -->|No| LOGIN[Redirect to Login]
    LOGIN --> AUTH
    AUTH -->|Yes| REQ_TASK[Request Transcription Task]
    
    REQ_TASK --> TASK_PARAMS[Specify Task Parameters]
    TASK_PARAMS --> FIND_CHUNKS[Find Available Chunks]
    
    FIND_CHUNKS --> FILTER[Apply Filters]
    FILTER --> EXCLUDE[Exclude User's Previous Work]
    EXCLUDE --> AVAILABLE{Chunks Available?}
    
    AVAILABLE -->|No| NO_CHUNKS[No Chunks Available]
    NO_CHUNKS --> SUGGEST[Suggest Alternatives]
    SUGGEST --> END_NO_WORK([End - No Work])
    
    AVAILABLE -->|Yes| SELECT[Select Random Chunks]
    SELECT --> CREATE_SESSION[Create Transcription Session]
    CREATE_SESSION --> LOAD_AUDIO[Load Audio Files]
    
    LOAD_AUDIO --> OPTIMIZE[Optimize Audio Delivery]
    OPTIMIZE --> PRESENT[Present Chunks to User]
    
    PRESENT --> USER_WORK[User Transcribes Audio]
    USER_WORK --> TRANSCRIBE{Transcription Action}
    
    TRANSCRIBE -->|Skip Chunk| SKIP_CHUNK[Record Skip Reason]
    TRANSCRIBE -->|Transcribe| ENTER_TEXT[Enter Transcription Text]
    TRANSCRIBE -->|Submit All| VALIDATE_SUBMISSION[Validate Submission]
    
    SKIP_CHUNK --> UPDATE_SKIP[Update Skip Metadata]
    UPDATE_SKIP --> NEXT_CHUNK{More Chunks?}
    
    ENTER_TEXT --> QUALITY_RATE[Rate Audio Quality]
    QUALITY_RATE --> CONFIDENCE[Set Confidence Level]
    CONFIDENCE --> SAVE_DRAFT[Save Draft Locally]
    SAVE_DRAFT --> NEXT_CHUNK
    
    NEXT_CHUNK -->|Yes| PRESENT
    NEXT_CHUNK -->|No| VALIDATE_SUBMISSION
    
    VALIDATE_SUBMISSION --> CHECK_REQUIRED{Required Fields?}
    CHECK_REQUIRED -->|Missing| SHOW_ERRORS[Show Validation Errors]
    SHOW_ERRORS --> USER_WORK
    
    CHECK_REQUIRED -->|Complete| SUBMIT[Submit Transcriptions]
    SUBMIT --> PROCESS_SUBMISSION[Process Submission]
    
    PROCESS_SUBMISSION --> VALIDATE_SESSION{Valid Session?}
    VALIDATE_SESSION -->|No| SESSION_ERROR[Session Error]
    SESSION_ERROR --> ERROR_RECOVERY[Error Recovery]
    
    VALIDATE_SESSION -->|Yes| CHECK_DUPLICATES{Check Duplicates}
    CHECK_DUPLICATES -->|Found| DUPLICATE_ERROR[Duplicate Error]
    DUPLICATE_ERROR --> ERROR_RECOVERY
    
    CHECK_DUPLICATES -->|None| SAVE_TRANSCRIPTIONS[Save Transcriptions]
    SAVE_TRANSCRIPTIONS --> UPDATE_STATS[Update User Stats]
    UPDATE_STATS --> TRIGGER_CONSENSUS[Trigger Consensus Calculation]
    
    TRIGGER_CONSENSUS --> SUCCESS[Show Success Message]
    SUCCESS --> CLEANUP_SESSION[Cleanup Session]
    CLEANUP_SESSION --> NEXT_ACTION{User Next Action}
    
    NEXT_ACTION -->|Continue| REQ_TASK
    NEXT_ACTION -->|View Progress| DASHBOARD[Go to Dashboard]
    NEXT_ACTION -->|Logout| LOGOUT[Logout User]
    
    ERROR_RECOVERY --> RETRY{Retry Submission?}
    RETRY -->|Yes| SUBMIT
    RETRY -->|No| SAVE_DRAFT
    
    DASHBOARD --> END_SUCCESS([End - Success])
    LOGOUT --> END_SUCCESS

    %% Background Consensus Process
    TRIGGER_CONSENSUS -.-> BG_CONSENSUS[Background Consensus Process]
    BG_CONSENSUS -.-> COLLECT_TRANSCRIPTIONS[Collect All Transcriptions for Chunk]
    COLLECT_TRANSCRIPTIONS -.-> CALCULATE_SIMILARITY[Calculate Text Similarity]
    CALCULATE_SIMILARITY -.-> WEIGHT_QUALITY[Weight by Quality Scores]
    WEIGHT_QUALITY -.-> DETERMINE_CONSENSUS[Determine Consensus Text]
    DETERMINE_CONSENSUS -.-> UPDATE_CONSENSUS[Update Consensus in Database]
    UPDATE_CONSENSUS -.-> NOTIFY_CONTRIBUTORS[Notify Contributors]

    %% Styling
    classDef userAction fill:#e3f2fd
    classDef process fill:#e8f5e8
    classDef decision fill:#fff3e0
    classDef error fill:#ffebee
    classDef success fill:#e0f2f1
    classDef background fill:#f3e5f5

    class START,USER_WORK,TRANSCRIBE,NEXT_ACTION userAction
    class REQ_TASK,FIND_CHUNKS,SELECT,LOAD_AUDIO,SAVE_TRANSCRIPTIONS process
    class AUTH,AVAILABLE,CHECK_REQUIRED,VALIDATE_SESSION,CHECK_DUPLICATES decision
    class SESSION_ERROR,DUPLICATE_ERROR,SHOW_ERRORS error
    class SUCCESS,NOTIFY_CONTRIBUTORS success
    class BG_CONSENSUS,COLLECT_TRANSCRIPTIONS,CALCULATE_SIMILARITY,WEIGHT_QUALITY background

Task Assignment Algorithm

flowchart LR
    subgraph "Task Request Parameters"
        LANG[Language Preference]
        QTY[Quantity Requested]
        SKIP[Skip List]
        DIFFICULTY[Difficulty Level]
    end
    
    subgraph "Filtering Process"
        ALL_CHUNKS[All Available Chunks]
        FILTER_LANG[Filter by Language]
        FILTER_USER[Exclude User's Work]
        FILTER_SKIP[Exclude Skip List]
        FILTER_STATUS[Filter by Status]
        PRIORITIZE[Prioritize by Need]
    end
    
    subgraph "Selection Strategy"
        RANDOM[Random Selection]
        BALANCED[Balance Difficulty]
        QUALITY[Quality Distribution]
        FINAL[Final Chunk List]
    end
    
    LANG --> FILTER_LANG
    QTY --> RANDOM
    SKIP --> FILTER_SKIP
    DIFFICULTY --> BALANCED
    
    ALL_CHUNKS --> FILTER_LANG
    FILTER_LANG --> FILTER_USER
    FILTER_USER --> FILTER_SKIP
    FILTER_SKIP --> FILTER_STATUS
    FILTER_STATUS --> PRIORITIZE
    
    PRIORITIZE --> RANDOM
    RANDOM --> BALANCED
    BALANCED --> QUALITY
    QUALITY --> FINAL
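The filtering pipeline above can be sketched as a single function over plain dicts. The field names (`language`, `status`, `transcribed_by`) are illustrative stand-ins for the real chunk model, and the difficulty-balancing step is omitted for brevity:

```python
import random

def assign_chunks(chunks, user_id, language, skip_ids, quantity, seed=None):
    """Filter out the user's own work and skipped chunks, then sample randomly."""
    pool = [
        c for c in chunks
        if c["language"] == language
        and c["status"] == "available"
        and user_id not in c["transcribed_by"]
        and c["id"] not in skip_ids
    ]
    rng = random.Random(seed)  # seedable for testing
    return rng.sample(pool, min(quantity, len(pool)))

chunks = [
    {"id": 1, "language": "bn", "status": "available", "transcribed_by": set()},
    {"id": 2, "language": "bn", "status": "available", "transcribed_by": {42}},
    {"id": 3, "language": "en", "status": "available", "transcribed_by": set()},
    {"id": 4, "language": "bn", "status": "complete", "transcribed_by": set()},
]
picked = assign_chunks(chunks, user_id=42, language="bn", skip_ids=set(), quantity=5)
print([c["id"] for c in picked])  # [1] — only chunk 1 survives every filter
```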

Consensus Algorithm

flowchart TD
    CHUNK[Audio Chunk] --> COLLECT[Collect All Transcriptions]
    COLLECT --> COUNT{Transcription Count}
    
    COUNT -->|< 3| NEED_MORE[Need More Transcriptions]
    COUNT -->|≥ 3| ANALYZE[Analyze Transcriptions]
    
    ANALYZE --> SIMILARITY[Calculate Text Similarity]
    SIMILARITY --> CLUSTER[Group Similar Transcriptions]
    CLUSTER --> WEIGHT[Apply Quality Weights]
    
    WEIGHT --> SCORE[Calculate Consensus Scores]
    SCORE --> THRESHOLD{Above Threshold?}
    
    THRESHOLD -->|No| NEED_MORE
    THRESHOLD -->|Yes| SELECT_CONSENSUS[Select Consensus Text]
    
    SELECT_CONSENSUS --> VALIDATE_CONSENSUS[Validate Consensus Quality]
    VALIDATE_CONSENSUS --> MARK_COMPLETE[Mark Chunk as Complete]
    
    NEED_MORE --> PRIORITY[Increase Priority for Assignment]
    MARK_COMPLETE --> UPDATE_CONTRIBUTORS[Update Contributor Stats]
    
    %% Consensus Calculation Details
    subgraph "Similarity Calculation"
        LEVENSHTEIN[Levenshtein Distance]
        SEMANTIC[Semantic Similarity]
        PHONETIC[Phonetic Matching]
        COMBINED[Combined Score]
    end
    
    SIMILARITY --> LEVENSHTEIN
    SIMILARITY --> SEMANTIC
    SIMILARITY --> PHONETIC
    LEVENSHTEIN --> COMBINED
    SEMANTIC --> COMBINED
    PHONETIC --> COMBINED
    COMBINED --> CLUSTER

Quality Control Process

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant B as Backend
    participant Q as Quality Engine
    participant DB as Database
    participant N as Notification

    U->>F: Submit Transcription
    F->>B: Send Transcription Data
    B->>DB: Save Transcription
    
    B->>Q: Trigger Quality Check
    Q->>DB: Get Related Transcriptions
    Q->>Q: Calculate Quality Metrics
    
    alt Quality Issues Detected
        Q->>DB: Flag for Review
        Q->>N: Notify Moderators
    else Quality Acceptable
        Q->>Q: Update Quality Score
    end
    
    Q->>B: Quality Assessment Complete
    B->>F: Update UI Status
    F->>U: Show Completion Status
    
    Note over Q: Background Consensus Process
    Q->>Q: Check Consensus Threshold
    
    alt Consensus Reached
        Q->>DB: Update Consensus Text
        Q->>N: Notify Contributors
    else Need More Transcriptions
        Q->>DB: Increase Chunk Priority
    end

Progress Tracking

Individual User Progress

graph LR
    subgraph "User Metrics"
        TOTAL[Total Transcriptions]
        ACCURACY[Accuracy Rate]
        SPEED[Average Speed]
        QUALITY[Quality Score]
    end
    
    subgraph "Achievements"
        BADGES[Achievement Badges]
        LEVELS[Experience Levels]
        STREAKS[Contribution Streaks]
        RANKINGS[Leaderboards]
    end
    
    TOTAL --> BADGES
    ACCURACY --> LEVELS
    SPEED --> RANKINGS
    QUALITY --> STREAKS

System-wide Progress

graph TD
    subgraph "Dataset Metrics"
        CHUNKS_TOTAL[Total Audio Chunks]
        CHUNKS_TRANSCRIBED[Transcribed Chunks]
        CONSENSUS_REACHED[Consensus Achieved]
        QUALITY_VALIDATED[Quality Validated]
    end
    
    subgraph "Language Coverage"
        LANG_SUPPORTED[Supported Languages]
        DIALECT_COVERAGE[Dialect Coverage]
        SPEAKER_DIVERSITY[Speaker Diversity]
        DOMAIN_COVERAGE[Domain Coverage]
    end
    
    CHUNKS_TOTAL --> CHUNKS_TRANSCRIBED
    CHUNKS_TRANSCRIBED --> CONSENSUS_REACHED
    CONSENSUS_REACHED --> QUALITY_VALIDATED
    
    QUALITY_VALIDATED --> LANG_SUPPORTED
    LANG_SUPPORTED --> DIALECT_COVERAGE
    DIALECT_COVERAGE --> SPEAKER_DIVERSITY
    SPEAKER_DIVERSITY --> DOMAIN_COVERAGE

Optimization Strategies

Performance Optimizations

  • Caching: Cache frequently accessed chunks and user data
  • Preloading: Preload next chunks while user works on current ones
  • CDN: Optimize audio delivery through CDN
  • Compression: Compress audio for faster loading
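The chunk-caching idea can be illustrated with a minimal TTL cache. In Shrutik this role is served by Redis; the dict-based version below only shows the expiry logic, and the 300-second TTL and key format are illustrative:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # evict expired entries lazily
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=300)
cache.set("chunk:17:audio_url", "/media/chunks/17.ogg")
print(cache.get("chunk:17:audio_url"))  # /media/chunks/17.ogg
```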

User Experience Optimizations

  • Smart Assignment: Assign chunks based on user expertise and preferences
  • Progress Indicators: Clear progress tracking and feedback
  • Keyboard Shortcuts: Efficient transcription interface
  • Auto-save: Prevent data loss with automatic saving

Quality Optimizations

  • Difficulty Balancing: Mix easy and challenging chunks
  • Context Provision: Provide helpful context and hints
  • Real-time Feedback: Immediate quality feedback
  • Consensus Weighting: Weight transcriptions by contributor reliability

Error Handling & Recovery

Common Error Scenarios

  1. Session Timeout

    • Auto-save work in progress
    • Seamless session renewal
    • Recovery of unsaved work
  2. Network Interruption

    • Offline work capability
    • Automatic retry mechanisms
    • Queue submissions for later
  3. Audio Loading Issues

    • Fallback audio formats
    • Progressive loading
    • Error reporting and alternatives
  4. Consensus Conflicts

    • Human review escalation
    • Weighted voting systems
    • Quality threshold adjustments

Recovery Mechanisms

flowchart LR
    ERROR[Error Detected] --> CLASSIFY{Error Classification}
    
    CLASSIFY -->|Temporary| AUTO_RETRY[Automatic Retry]
    CLASSIFY -->|User Error| USER_GUIDANCE[User Guidance]
    CLASSIFY -->|System Error| ESCALATE[Escalate to Support]
    
    AUTO_RETRY --> SUCCESS{Retry Success?}
    SUCCESS -->|Yes| CONTINUE[Continue Process]
    SUCCESS -->|No| USER_GUIDANCE
    
    USER_GUIDANCE --> RESOLVED{Issue Resolved?}
    RESOLVED -->|Yes| CONTINUE
    RESOLVED -->|No| ESCALATE
    
    ESCALATE --> SUPPORT[Support Intervention]
    SUPPORT --> CONTINUE

This comprehensive transcription workflow ensures high-quality data collection while providing an engaging and efficient experience for contributors.

Shrutik System Architecture

This diagram shows the high-level architecture of the Shrutik voice data collection platform, including all major components and their interactions.

Overall System Architecture

graph TB
    subgraph "Client Layer"
        WEB[Web Browser]
        MOBILE[Mobile App]
        API_CLIENT[API Client]
    end

    subgraph "Load Balancer & Proxy"
        NGINX[Nginx<br/>Load Balancer]
    end

    subgraph "Application Layer"
        FRONTEND[React Frontend<br/>Next.js]
        BACKEND[FastAPI Backend<br/>Python]
        WORKER[Celery Workers<br/>Background Jobs]
    end

    subgraph "Caching Layer"
        REDIS[(Redis<br/>Cache & Queue)]
    end

    subgraph "Database Layer"
        POSTGRES[(PostgreSQL<br/>Primary Database)]
        REPLICA[(PostgreSQL<br/>Read Replica)]
    end

    subgraph "Storage Layer"
        LOCAL_STORAGE[Local File Storage<br/>Audio Files]
        CDN[CDN<br/>Static Assets]
        BACKUP[Backup Storage<br/>S3/MinIO]
    end

    subgraph "External Services"
        SMTP[Email Service<br/>SMTP]
        MONITORING[Monitoring<br/>Prometheus/Grafana]
        LOGGING[Logging<br/>ELK Stack]
    end

    subgraph "Processing Pipeline"
        AUDIO_PROC[Audio Processing<br/>Librosa/PyDub]
        CONSENSUS[Consensus Engine<br/>Quality Control]
        EXPORT[Data Export<br/>Multiple Formats]
    end

    %% Client connections
    WEB --> NGINX
    MOBILE --> NGINX
    API_CLIENT --> NGINX

    %% Load balancer routing
    NGINX --> FRONTEND
    NGINX --> BACKEND

    %% Application connections
    FRONTEND --> BACKEND
    BACKEND --> REDIS
    BACKEND --> POSTGRES
    BACKEND --> REPLICA
    WORKER --> REDIS
    WORKER --> POSTGRES
    WORKER --> LOCAL_STORAGE

    %% Processing connections
    WORKER --> AUDIO_PROC
    WORKER --> CONSENSUS
    BACKEND --> EXPORT

    %% Storage connections
    BACKEND --> LOCAL_STORAGE
    FRONTEND --> CDN
    LOCAL_STORAGE --> BACKUP

    %% External service connections
    BACKEND --> SMTP
    BACKEND --> MONITORING
    BACKEND --> LOGGING

    %% Styling
    classDef client fill:#e1f5fe
    classDef app fill:#e8f5e8
    classDef data fill:#fff3e0
    classDef external fill:#f3e5f5
    classDef processing fill:#e0f2f1

    class WEB,MOBILE,API_CLIENT client
    class FRONTEND,BACKEND,WORKER app
    class POSTGRES,REPLICA,REDIS,LOCAL_STORAGE data
    class SMTP,MONITORING,LOGGING external
    class AUDIO_PROC,CONSENSUS,EXPORT processing

Component Descriptions

Client Layer

  • Web Browser: Primary interface for contributors using React/Next.js frontend
  • Mobile App: Future mobile application for voice contributions
  • API Client: External integrations and automated systems

Load Balancer & Proxy

  • Nginx: Handles SSL termination, load balancing, and static file serving
  • Routes requests to appropriate backend services
  • Implements rate limiting and security headers

Application Layer

  • React Frontend: User interface built with Next.js and TypeScript
  • FastAPI Backend: RESTful API server with automatic documentation
  • Celery Workers: Background job processing for audio tasks

Caching Layer

  • Redis: Serves multiple purposes:
    • Session storage and caching
    • Message queue for Celery
    • Rate limiting counters
    • Real-time data caching
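The "rate limiting counters" use can be sketched as a sliding-window limiter. In production this is typically an INCR/EXPIRE pattern on a Redis key; the in-memory version below shows the same logic, with illustrative limit and window values:

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, limit: int = 10, window_s: float = 60.0):
        self.limit, self.window_s = limit, window_s
        self._hits: dict[str, deque] = {}

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        hits = self._hits.setdefault(key, deque())
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()  # drop hits that fell outside the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

rl = RateLimiter(limit=3, window_s=60)
print([rl.allow("user:42") for _ in range(4)])  # [True, True, True, False]
```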

Database Layer

  • PostgreSQL Primary: Main database for all application data
  • PostgreSQL Replica: Read-only replica for analytics and reporting
  • Supports horizontal scaling and high availability

Storage Layer

  • Local File Storage: Audio files and uploads stored locally or on network storage
  • CDN: Content delivery network for static assets and optimized audio delivery
  • Backup Storage: Automated backups to S3-compatible storage

External Services

  • Email Service: SMTP for user notifications and system alerts
  • Monitoring: Prometheus and Grafana for system monitoring
  • Logging: Centralized logging with ELK stack or similar

Processing Pipeline

  • Audio Processing: Intelligent audio chunking and format conversion
  • Consensus Engine: Quality control and transcription consensus algorithms
  • Data Export: Multiple format support for dataset export

Data Flow Patterns

1. Voice Recording Flow

User → Frontend → Backend → Storage → Worker → Audio Processing → Database

2. Transcription Flow

User → Frontend → Backend → Database → Consensus Engine → Quality Metrics

3. API Request Flow

Client → Nginx → Backend → Cache/Database → Response → Client

🚀 Scalability Considerations

Horizontal Scaling

  • Frontend: Multiple instances behind load balancer
  • Backend: Stateless API servers can be scaled horizontally
  • Workers: Auto-scaling based on queue length
  • Database: Read replicas for query distribution

Performance Optimization

  • Caching: Multi-layer caching strategy with Redis
  • CDN: Global content delivery for static assets
  • Database: Connection pooling and query optimization
  • Background Jobs: Async processing for heavy operations

High Availability

  • Load Balancing: Multiple instances of each service
  • Database Replication: Primary-replica setup with failover
  • Health Checks: Automated monitoring and alerting
  • Backup Strategy: Regular automated backups

Security Architecture

Authentication & Authorization

  • JWT-based authentication with refresh tokens
  • Role-based access control (RBAC)
  • API key authentication for external clients

Data Protection

  • HTTPS/TLS encryption for all communications
  • Database encryption at rest
  • Secure file upload validation
  • Input sanitization and validation

Network Security

  • Firewall rules and network segmentation
  • Rate limiting and DDoS protection
  • Security headers and CORS configuration
  • Regular security audits and updates

Monitoring & Observability

Metrics Collection

  • Application performance metrics
  • System resource monitoring
  • Business metrics and analytics
  • Error tracking and alerting

Logging Strategy

  • Structured logging with correlation IDs
  • Centralized log aggregation
  • Log retention and archival policies
  • Security event logging

Health Checks

  • Service health endpoints
  • Database connectivity checks
  • External service dependency monitoring
  • Automated failover mechanisms

Deployment Architecture

Development Environment

  • Local development with Docker Compose
  • Hot reload for rapid development
  • Isolated test databases
  • Mock external services

Staging Environment

  • Production-like environment for testing
  • Automated deployment pipeline
  • Integration testing
  • Performance testing

Production Environment

  • Multi-zone deployment for high availability
  • Blue-green deployment strategy
  • Automated rollback capabilities
  • Comprehensive monitoring and alerting

This architecture supports Shrutik’s mission of democratizing voice technology while maintaining high performance, security, and scalability standards.

Contributing to Shrutik

Thank you for your interest in contributing to Shrutik! This guide will help you get started with contributing to our open-source voice data collection platform.

Ways to Contribute

Voice Data Contribution

  • Record Voice Samples: Contribute voice recordings in your native language
  • Transcribe Audio: Help transcribe audio clips to improve dataset quality
  • Quality Review: Review and validate transcriptions from other contributors
  • Language Support: Help add support for new languages and dialects

Code Contribution

  • Bug Fixes: Fix reported issues and improve stability
  • Feature Development: Implement new features and enhancements
  • Performance Optimization: Improve system performance and scalability
  • Testing: Write and improve test coverage
  • Documentation: Improve code documentation and API references

Documentation

  • User Guides: Improve setup and usage documentation
  • Developer Docs: Enhance technical documentation
  • Translations: Translate documentation to other languages
  • Tutorials: Create tutorials and examples

Design & UX

  • UI/UX Improvements: Enhance user interface and experience
  • Accessibility: Improve accessibility features
  • Mobile Responsiveness: Optimize for mobile devices
  • Branding: Improve visual design and branding

Getting Started

1. Set Up Development Environment

Follow our Local Development Guide to set up your development environment. Alternatively, you can set everything up with Docker; see the Docker Local Setup guide.

2. Find an Issue

  • Browse open issues
  • Look for issues labeled good first issue for beginners
  • Check issues labeled help wanted for areas needing assistance
  • Join our Discord to discuss ideas

3. Fork and Clone

# Fork the repository on GitHub
# Then clone your fork
git clone https://github.com/YOUR_USERNAME/shrutik.git
cd shrutik

# Add upstream remote
git remote add upstream https://github.com/Onuronon-lab/Shrutik.git

Development Workflow

1. Create a Branch

Important: All PRs must be submitted to the deployment-dev branch, not master.

Before starting development, please review our Engineering Conventions for branch naming, commit messages, and coding standards.

# Update deployment-dev branch
git checkout deployment-dev
git pull origin deployment-dev

# Create a feature branch following our naming convention
git checkout -b feat/your-feature-name
# or for bug fixes
git checkout -b fix/issue-number-description

2. Make Changes

  • Write code
  • Add tests for new functionality
  • Update documentation as needed
  • Ensure all tests pass

3. Commit Changes

Follow our Engineering Conventions for commit message format.

# Stage your changes
git add .

# Commit with a descriptive message following conventional commits
git commit -m "feat: add voice recording validation

- Add audio quality validation
- Implement duration checks
- Add error handling for invalid formats
- Update tests and documentation

Fixes #123"

4. Push and Create PR

# Push to your fork
git push origin feat/your-feature-name

# Create a Pull Request to deployment-dev on GitHub

PR Guidelines:

  • Target the deployment-dev branch (not master!)
  • Fill out the PR template completely
  • Ensure all CI checks pass
  • Code must be formatted (see Code Formatting section)

Code Formatting

We use automated code formatters to maintain consistent code style and eliminate formatting-related merge conflicts.

Tools & Configuration

  • Backend (Python): Black (88 chars), isort, flake8
  • Frontend (TypeScript/React): Prettier (100 chars), ESLint

Quick Setup

1. Install formatting tools:

pip install black isort flake8
cd frontend && npm install && cd ..

2. Set up pre-commit hooks (recommended):

./scripts/setup_pre_commit.sh

This auto-formats your code on every commit!

Using Pre-commit Hooks

Once set up, just commit normally:

git add .
git commit -m "feat: your changes"
# ✨ Code is automatically formatted before commit!

Before Submitting a PR

If not using pre-commit hooks, format manually:

# Format entire codebase
./scripts/format_code.sh

# Review changes
git diff

# Commit and push
git add .
git commit -m "style: format code"
git push

Manual Formatting Commands

# Format everything
./scripts/format_code.sh

# Backend only
black app/ tests/ scripts/
isort app/ tests/ scripts/

# Frontend only
cd frontend
npm run format
npm run lint:fix

CI/CD Checks

Our GitHub Actions workflow automatically checks formatting on all PRs to deployment-dev. If formatting fails:

./scripts/format_code.sh
git add .
git commit -m "style: fix formatting"
git push

Skipping Hooks (Emergency Only)

git commit --no-verify -m "emergency fix"

Note: Use sparingly! The CI will still check formatting.

Troubleshooting

  • Tools not found: pip install black isort flake8
  • Prettier not found: cd frontend && npm install
  • Hooks not running: pre-commit install

Style Guidelines

Python

# ✅ Good (Black formatted)
def calculate_total(items: list[dict], tax_rate: float = 0.1) -> float:
    """Calculate total with tax."""
    subtotal = sum(item["price"] for item in items)
    return subtotal * (1 + tax_rate)

TypeScript/React

// ✅ Good (Prettier formatted)
const UserCard = ({ name, email }: UserCardProps) => {
  return (
    <div className="user-card">
      <h2>{name}</h2>
      <p>{email}</p>
    </div>
  );
};

Benefits:

  • ✅ Zero formatting conflicts in PRs
  • ✅ Faster code reviews (focus on logic)
  • ✅ Consistent codebase
  • ✅ Automatic on every commit

For more details, see docs/FORMATTING.md

Commit Message Guidelines

We follow the Conventional Commits specification as outlined in our Engineering Conventions:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Types

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting, etc.)
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks

Examples

feat(auth): add OAuth2 authentication
fix(api): resolve transcription submission error
docs(readme): update installation instructions
test(voice): add unit tests for audio processing
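A subject line in this format can be checked mechanically, for example in a commit-msg hook. This sketch mirrors the type list above; the scope charset and the optional `!` breaking-change marker follow the Conventional Commits spec:

```python
import re

TYPES = "feat|fix|docs|style|refactor|test|chore"
PATTERN = re.compile(rf"^({TYPES})(\([a-z0-9-]+\))?(!)?: .+")

def is_valid_subject(subject: str) -> bool:
    """True when the subject matches `<type>[optional scope]: <description>`."""
    return bool(PATTERN.match(subject))

print(is_valid_subject("feat(auth): add OAuth2 authentication"))  # True
print(is_valid_subject("added oauth stuff"))                      # False
```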

Testing Guidelines

Running Tests

# Backend tests
pytest

# Frontend tests
cd frontend && npm test

# Integration tests
pytest tests/integration/

# E2E tests
cd frontend && npm run test:e2e

Writing Tests

Backend Tests (Python)

# tests/test_transcription.py
import pytest
from app.services.transcription_service import TranscriptionService

def test_create_transcription(db_session):
    """Test transcription creation."""
    service = TranscriptionService(db_session)
    transcription = service.create_transcription(
        chunk_id=1,
        user_id=1,
        text="Test transcription"
    )
    assert transcription.text == "Test transcription"

Frontend Tests (TypeScript/Jest)

// frontend/src/__tests__/VoiceRecorder.test.tsx
import { render, screen } from '@testing-library/react';
import VoiceRecorder from '../components/VoiceRecorder';

test('renders voice recorder component', () => {
  render(<VoiceRecorder />);
  const recordButton = screen.getByRole('button', { name: /record/i });
  expect(recordButton).toBeInTheDocument();
});

Test Coverage

  • Maintain minimum 80% test coverage
  • Write tests for all new features
  • Include edge cases and error scenarios
  • Test both happy path and error conditions

Coding Standards

Please refer to our Engineering Conventions for detailed coding standards and philosophy. The following sections provide specific implementation guidelines.

Python (Backend)

Code Style

  • Follow PEP 8 style guide
  • Use Black for code formatting
  • Use isort for import sorting
  • Use flake8 for linting
# Format code
black app/
isort app/

# Check linting
flake8 app/

Code Structure

# Good: Clear function with type hints and docstring
from typing import Optional
from sqlalchemy.orm import Session

def get_user_by_email(db: Session, email: str) -> Optional[User]:
    """
    Retrieve user by email address.

    Args:
        db: Database session
        email: User email address

    Returns:
        User object if found, None otherwise
    """
    return db.query(User).filter(User.email == email).first()

Error Handling

# Good: Specific exception handling
try:
    user = create_user(db, user_data)
except ValidationError as e:
    logger.error(f"User validation failed: {e}")
    raise HTTPException(status_code=400, detail=str(e))
except DatabaseError as e:
    logger.error(f"Database error: {e}")
    raise HTTPException(status_code=500, detail="Internal server error")

TypeScript/React (Frontend)

Code Style

  • Use Prettier for code formatting
  • Use ESLint for linting
  • Follow React best practices
  • Use TypeScript for type safety
# Format code
npm run format

# Check linting
npm run lint

Component Structure

// Good: Typed React component with proper structure
interface VoiceRecorderProps {
  onRecordingComplete: (audioBlob: Blob) => void;
  maxDuration?: number;
}

export const VoiceRecorder: React.FC<VoiceRecorderProps> = ({
  onRecordingComplete,
  maxDuration = 60
}) => {
  const [isRecording, setIsRecording] = useState(false);

  // Component logic here

  return (
    <div className="voice-recorder">
      {/* JSX here */}
    </div>
  );
};

Database

Migrations

# Good: Clear migration with proper naming
"""Add voice quality metrics

Revision ID: 001_add_voice_quality
Revises: 000_initial
Create Date: 2024-01-01 12:00:00.000000

"""
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column('transcriptions',
        sa.Column('quality_score', sa.Float, nullable=True))

def downgrade():
    op.drop_column('transcriptions', 'quality_score')

Documentation Standards

Code Documentation

  • Use clear, descriptive docstrings
  • Document all public functions and classes
  • Include parameter types and return values
  • Provide usage examples for complex functions

API Documentation

  • Use OpenAPI/Swagger annotations
  • Document all endpoints, parameters, and responses
  • Include example requests and responses
  • Document error codes and messages

User Documentation

  • Write clear, step-by-step instructions
  • Include screenshots and examples
  • Test all instructions on a fresh environment
  • Keep documentation up-to-date with code changes

Code Review Process

Submitting a Pull Request

  1. Title: Clear, descriptive title
  2. Description: Explain what and why
  3. Testing: Describe how you tested the changes
  4. Screenshots: Include for UI changes
  5. Breaking Changes: Document any breaking changes

PR Template

## Description

Brief description of changes

## Type of Change

- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing

- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed

## Screenshots (if applicable)

## Checklist

- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] Tests added/updated

Review Criteria

Reviewers will check for:

  • Functionality: Does the code work as intended?
  • Code Quality: Is the code clean and maintainable?
  • Testing: Are there adequate tests?
  • Documentation: Is documentation updated?
  • Performance: Are there any performance implications?
  • Security: Are there any security concerns?

Internationalization

Adding New Languages

  1. Language Configuration: Add language to app/models/language.py
  2. Frontend Translations: Add translations to frontend/src/locales/
  3. Backend Messages: Update error messages and notifications
  4. Documentation: Translate key documentation

Translation Guidelines

  • Use proper Unicode support for all scripts
  • Test with right-to-left languages
  • Consider cultural context in translations
  • Use native speakers for translation review

🎤 Voice Data Guidelines

Recording Quality

  • Environment: Quiet, echo-free environment
  • Equipment: Good quality microphone
  • Format: WAV or high-quality MP3
  • Duration: 2-10 seconds per clip
  • Content: Clear, natural speech

Transcription Guidelines

  • Accuracy: Transcribe exactly what is spoken
  • Formatting: Follow language-specific conventions
  • Punctuation: Include appropriate punctuation
  • Quality Rating: Rate audio quality honestly

Recognition

Contributor Recognition

  • Contributors are listed in our CONTRIBUTORS.md file
  • Significant contributors may be invited to join the core team
  • We highlight contributions in our release notes
  • Annual contributor appreciation events

Badges and Achievements

  • First-time contributor badge
  • Language champion badges
  • Code contributor levels
  • Community helper recognition

Getting Help

Community Support

  • Discord: Join our server for real-time help
  • GitHub Discussions: Ask questions and share ideas
  • Office Hours: Weekly community calls (schedule in Discord)

Mentorship Program

  • New contributors can request mentorship
  • Experienced contributors can volunteer as mentors
  • Structured onboarding for major contributions

Contact

  • General Questions: community@shrutik.org
  • Technical Issues: dev@shrutik.org
  • Security Issues: security@shrutik.org (private)

📜 Code of Conduct

We are committed to providing a welcoming and inclusive environment. Please read our Code of Conduct before contributing.

Our Standards

  • Be Respectful: Treat everyone with respect and kindness
  • Be Inclusive: Welcome contributors from all backgrounds
  • Be Collaborative: Work together towards common goals
  • Be Patient: Help others learn and grow

📄 License

By contributing to Shrutik, you agree that your contributions will be licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This ensures that all contributions remain available for educational and non-commercial use while requiring attribution to the original creators.


Thank you for contributing to Shrutik! Together, we’re building a more inclusive digital future. 🎉

Engineering Conventions & Philosophy

Why this exists

This document is not about rules for the sake of rules. It exists to reduce confusion, remove unnecessary discussion, and protect engineering quality.

Conventions are not constraints; they are agreements. Agreements let teams move fast without stepping on each other.


1. Philosophy (Read this first)

  • Engineering is about clarity, not cleverness.
  • If something needs explanation, it’s already slightly wrong.
  • Conventions exist so that:
    • No one has to ask questions repeatedly
    • No one has to justify decisions emotionally
    • The system explains itself

We don’t optimize for personal preference. We optimize for collective understanding and future maintainability.

Everything here follows one principle:

Do not raise unnecessary questions for the next person reading your work.

That next person might be your teammate. Or future you at 3 AM.


2. Branch Naming Convention

Branch names must follow this format:

<prefix>/<short-description>

Allowed prefixes

  • feat/ → New features
  • fix/ → Bug fixes
  • hotfix/ → Critical production fixes
  • docs/ → Documentation-only changes

Examples

  • feat/auth-verification
  • fix/password-reset-token
  • docs/api-guidelines

Why this matters

  • Branch lists should be scannable at a glance
  • Prefixes instantly communicate intent
  • Consistency removes cognitive load

If every branch uses a different word (feature/, new/, stuff/), the system slowly becomes noisy. Noise kills velocity.
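Because the format is mechanical, it can be enforced by a simple check. A sketch of such a validator (hypothetical; nothing in the repo currently runs this):

```python
import re

ALLOWED_PREFIXES = ("feat", "fix", "hotfix", "docs")
BRANCH_RE = re.compile(rf"^({'|'.join(ALLOWED_PREFIXES)})/[a-z0-9][a-z0-9-]*$")

def valid_branch(name: str) -> bool:
    """True if a branch name follows <prefix>/<short-description>."""
    return BRANCH_RE.match(name) is not None

print(valid_branch("feat/auth-verification"))  # True
print(valid_branch("stuff/random-things"))     # False
```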


3. Commit Message Convention

We follow Conventional Commits.

Format:

<type>(<scope>): <clear, concrete description>

Allowed types

  • feat → New functionality
  • fix → Bug fix
  • docs → Documentation
  • refactor → Code restructure without behavior change
  • test → Tests
  • chore → Tooling / config

Examples

  • feat(auth): add email verification flow
  • fix(auth): prevent reset token reuse
  • docs(readme): add setup instructions
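Messages in this format can be linted mechanically. A minimal sketch (hypothetical; covers only the types listed above, not the full Conventional Commits grammar):

```python
import re

COMMIT_RE = re.compile(
    r"^(feat|fix|docs|refactor|test|chore)"  # type
    r"(\([a-z0-9-]+\))?"                     # optional scope
    r": .+"                                  # description
)

def valid_commit(message: str) -> bool:
    """True if a commit message matches <type>(<scope>): <description>."""
    return COMMIT_RE.match(message) is not None

print(valid_commit("feat(auth): add email verification flow"))  # True
print(valid_commit("made it way more robust"))                  # False
```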

What commit messages are not

  • Not marketing
  • Not self-evaluation
  • Not emotion

Avoid words like:

  • strong
  • robust
  • powerful
  • improved (without context)

A commit message should describe what changed, not how good it feels.

If something is buggy → it’s wrong. If something works → that’s the baseline, not an achievement.


4. Pull Requests

  • A PR should do one logical thing
  • The title should summarize the change
  • The description should answer:
    • What changed?
    • Why was it needed?

No philosophy debates inside PRs. If a rule is violated, reviewers will request a change rather than open a discussion.


5. Source of Truth

  • WIP project docs are not the source of truth
  • External standards, official documentation, and established practices take priority

Always question:

  • outdated docs
  • informal assumptions
  • “this is how we’ve been doing it”

Engineering grows by questioning, not by accepting.


6. Ego & Engineering

  • Software is never perfect
  • Everything has limits
  • Everything breaks eventually

That’s exactly why we aim for:

  • clarity over cleverness
  • simplicity over ego
  • consistency over preference

Having strong opinions is good. Letting conventions decide instead of ego is better.


7. Final Note

These conventions are not optional. They exist so we can:

  • move faster
  • argue less
  • build things that last

If something here feels strict, that’s intentional. Discipline is what gives freedom later.

Clean systems scale. Messy ones don’t.

Follow the convention. Save your energy for real problems.

Code of Conduct

Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

Our Standards

Examples of behavior that contributes to a positive environment for our community include:

  • Demonstrating empathy and kindness toward other people
  • Being respectful of differing opinions, viewpoints, and experiences
  • Giving and gracefully accepting constructive feedback
  • Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
  • Focusing on what is best not just for us as individuals, but for the overall community
  • Using welcoming and inclusive language
  • Being respectful of differing cultural backgrounds and languages
  • Encouraging and supporting new contributors

Examples of unacceptable behavior include:

  • The use of sexualized language or imagery, and sexual attention or advances of any kind
  • Trolling, insulting or derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others’ private information, such as a physical or email address, without their explicit permission
  • Discrimination or harassment based on any protected characteristic
  • Other conduct which could reasonably be considered inappropriate in a professional setting

Language and Cultural Sensitivity

Given Shrutik’s mission to support underrepresented languages and communities:

  • Be respectful of all languages, dialects, and accents
  • Avoid making assumptions about language proficiency or cultural backgrounds
  • Be patient with non-native speakers of any language
  • Celebrate linguistic diversity and cultural differences
  • Provide translations or explanations when using technical terms
  • Be mindful that humor and expressions may not translate across cultures

Voice Data Contribution Guidelines

When contributing voice data or transcriptions:

  • Respect the privacy and consent of all speakers
  • Do not submit recordings without proper consent
  • Be honest about audio quality and transcription accuracy
  • Respect cultural and religious sensitivities in content
  • Follow platform guidelines for appropriate content
  • Report any inappropriate or harmful content

Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at conduct@shrutik.org. All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

1. Correction

Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

Consequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

2. Warning

Community Impact: A violation through a single incident or series of actions.

Consequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

3. Temporary Ban

Community Impact: A serious violation of community standards, including sustained inappropriate behavior.

Consequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

4. Permanent Ban

Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

Consequence: A permanent ban from any sort of public interaction within the community.

Reporting Guidelines

If you experience or witness unacceptable behavior, please report it by:

  1. Email: onuronon.dev@gmail.com
  2. Discord: Direct message to moderators
  3. GitHub: Use the report feature or contact maintainers

When reporting, please include:

  • Your contact information
  • Names (usernames, real names) of any individuals involved
  • Your account of what occurred, including any available records (screenshots, logs, etc.)
  • Any additional information that may be helpful

Response Process

  1. Acknowledgment: We will acknowledge receipt of your report within 24 hours
  2. Investigation: We will investigate the matter thoroughly and fairly
  3. Decision: We will make a decision based on our guidelines and communicate it to all parties
  4. Follow-up: We will follow up to ensure the resolution is effective

Appeals Process

If you disagree with a moderation decision:

  1. Send an appeal to appeals@shrutik.org within 30 days
  2. Include your reasoning and any additional information
  3. The appeal will be reviewed by different community leaders
  4. The appeal decision is final

Community Resources

Support Channels

  • Discord Community: https://discord.gg/9hZ9eW8ARk
  • GitHub Discussions: https://github.com/Onuronon-lab/Shrutik/discussions

Recognition

We believe in recognizing positive contributions to our community:

  • Community Champions: Monthly recognition for helpful community members
  • Mentorship Program: Opportunities to guide new contributors
  • Speaking Opportunities: Invitations to represent Shrutik at events
  • Contributor Spotlight: Featured stories of community members

Continuous Improvement

This Code of Conduct is a living document. We regularly review and update it based on:

  • Community feedback and suggestions
  • Evolving best practices in open source communities
  • Lessons learned from enforcement experiences
  • Changes in our community’s needs and composition

To suggest improvements, please:

  1. Open an issue on GitHub with the “code-of-conduct” label
  2. Join discussions in our Discord #community-guidelines channel
  3. Email suggestions to conduct@shrutik.org

Acknowledgments

This Code of Conduct is adapted from the Contributor Covenant, version 2.1, available at https://www.contributor-covenant.org/version/2/1/code_of_conduct.html.

Community Impact Guidelines were inspired by Mozilla’s code of conduct enforcement ladder.

For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.

Contact Information

  • General Conduct Questions: onuronon.dev@gmail.com

Remember: We’re all here because we believe in making voice technology more inclusive. Let’s work together to create a welcoming space where everyone can contribute their unique perspectives and talents.

Thank you for helping make Shrutik a welcoming, inclusive community for everyone! 🎤✨

Contributors

Thank you to all the amazing people who have contributed to Shrutik (শ্রুতিক)! This project exists because of the collective effort of developers, linguists, designers, and community members from around the world.

Core Team

Project Founders

  • [Ifrun Kader Ruhin] - Project Creator & Lead Developer
    • GitHub: @ifrunruhin12
    • Role: Architecture, Backend Development, Project Vision

Core Maintainers

  • [Maintainer Name] - Lead Frontend Developer

    • GitHub: @maintainer
    • Role: Frontend Architecture, UI/UX Design
  • [Maintainer Name] - DevOps & Infrastructure

    • GitHub: @devops
    • Role: Deployment, CI/CD, Performance Optimization

💻 Code Contributors

Major Contributors (50+ commits)

  • [Contributor Name] - @username
    • Contributions: Audio processing pipeline, performance optimizations
    • Languages: Python, JavaScript

Regular Contributors (10+ commits)

  • [Contributor Name] - @username

    • Contributions: API development, database design
    • Languages: Python, SQL
  • [Contributor Name] - @username

    • Contributions: Frontend components, accessibility improvements
    • Languages: TypeScript, React

First-Time Contributors

  • [Furqan Ahmed] - @furqanRupom
    • First contribution: Bug fix in audio validation
    • Date: 2025-11-02

Voice Data Contributors

Language Champions

These contributors have made significant voice data contributions in their native languages:

Bengali (বাংলা)

  • [Contributor Name] - 500+ recordings, 1000+ transcriptions
  • [Contributor Name] - 300+ recordings, 800+ transcriptions
  • [Contributor Name] - 200+ recordings, 600+ transcriptions

Quality Reviewers

  • [Reviewer Name] - 2000+ transcription reviews
  • [Reviewer Name] - 1500+ transcription reviews
  • [Reviewer Name] - 1200+ transcription reviews

Documentation Contributors

Documentation Team

  • [Doc Contributor] - @username

    • Contributions: API documentation, user guides
    • Specialty: Technical writing
  • [Doc Contributor] - @username

    • Contributions: Deployment guides, troubleshooting
    • Specialty: DevOps documentation

Translators

  • [Translator Name] - Bengali translation lead
  • [Translator Name] - Hindi translation lead
  • [Translator Name] - Tamil translation lead

Design Contributors

UI/UX Designers

  • [Designer Name] - @username

    • Contributions: User interface design, user experience research
    • Tools: Figma, Adobe XD
  • [Designer Name] - @username

    • Contributions: Logo design, branding, visual identity
    • Tools: Illustrator, Photoshop

Accessibility Experts

  • [A11y Expert] - @username
    • Contributions: Accessibility audits, WCAG compliance
    • Specialty: Screen reader optimization

Research Contributors

Academic Researchers

  • Dr. [Researcher Name] - [University/Institution]

    • Contributions: Voice quality metrics, consensus algorithms
    • Publications: [Link to relevant papers]
  • Prof. [Researcher Name] - [University/Institution]

    • Contributions: Linguistic analysis, dialect classification
    • Expertise: Computational linguistics

Data Scientists

  • [Data Scientist] - @username
    • Contributions: Quality control algorithms, statistical analysis
    • Tools: Python, R, Machine Learning

Discord Moderators

  • [Moderator Name] - Senior Moderator
  • [Moderator Name] - Community Moderator
  • [Moderator Name] - Technical Support Moderator

🏅 Special Recognition

Milestone Achievements

🥇 Gold Contributors (Exceptional Impact)

  • [Contributor Name] - First to reach 1000 voice contributions
  • [Contributor Name] - Implemented critical security features
  • [Contributor Name] - Led successful community outreach campaign

🥈 Silver Contributors (Significant Impact)

  • [Contributor Name] - Major performance optimizations
  • [Contributor Name] - Comprehensive testing framework
  • [Contributor Name] - Multi-language support implementation

🥉 Bronze Contributors (Notable Impact)

  • [Contributor Name] - Bug fixes and stability improvements
  • [Contributor Name] - Documentation improvements
  • [Contributor Name] - Community support and mentoring

Annual Awards (2026)

🏆 Contributor of the Year

[Winner Name] - For outstanding contributions across code, community, and voice data

🌟 Rising Star

[Winner Name] - For exceptional growth and impact as a new contributor

🤝 Community Champion

[Winner Name] - For building bridges between technical and linguistic communities

Technical Excellence

[Winner Name] - For innovative solutions and architectural improvements

📊 Contribution Statistics

Overall Stats (as of 2026)

  • Total Contributors: x+
  • Code Contributors: x
  • Voice Contributors: x+
  • Documentation Contributors: x
  • Countries Represented: x+
  • Languages Supported: x

🎯 Contribution Types

Code Contributions

  • Backend Development: x contributors
  • Frontend Development: x contributors
  • DevOps & Infrastructure: x contributors
  • Testing & QA: x contributors
  • Security: x contributors

Non-Code Contributions

  • Voice Recordings: x contributors
  • Transcriptions: x contributors
  • Quality Reviews: x contributors
  • Documentation: x contributors
  • Translation: x contributors
  • Design: x contributors
  • Community Management: x contributors

🚀 How to Join

Want to see your name here? Here’s how you can contribute:

For Developers

  1. Check our Contributing Guide
  2. Look for issues labeled good first issue
  3. Join our Discord for technical discussions

For Voice Contributors

  1. Visit our platform at onuronon.org
  2. Register and start recording in your native language
  3. Help transcribe and review audio from others

For Designers

  1. Check our design needs in GitHub issues
  2. Share your portfolio and design ideas
  3. Help improve user experience and accessibility

For Linguists & Researchers

  1. Join our research discussions on Discord
  2. Contribute to quality metrics and algorithms
  3. Help with linguistic analysis and validation

🙏 Acknowledgments

Special Thanks

  • Open Source Community - For the amazing tools and libraries we build upon
  • Academic Partners - For research collaboration and validation
  • Early Adopters - For testing and feedback during development
  • Funding Partners - For supporting the project’s growth
  • Language Communities - For trusting us with their voices and stories

Inspiration

This project is inspired by the belief that technology should serve all communities, regardless of the language they speak. We’re grateful to everyone who shares this vision and contributes to making it a reality.

Recognition Requests

If you’ve contributed to Shrutik but don’t see your name here:

  1. Open an issue with the “recognition” label
  2. Email us at onuronon.dev@gmail.com
  3. Message us on Discord

We want to make sure everyone gets the recognition they deserve!

🔄 Updates

This file is updated monthly to recognize new contributors. The next update is scheduled for the first week of each month.

Last Updated: October 2025


Thank you to everyone who makes Shrutik possible! Your contributions, big and small, are building a more inclusive digital future. ✨

“Alone we can do so little; together we can do so much.” - Helen Keller

Troubleshooting

This guide covers common issues and their solutions when working with Shrutik.

Docker Issues

Services Won’t Start

Problem: Docker services fail to start or crash immediately.

Solutions:

# Check logs for all services
docker compose logs -f

# Check logs for a specific service
docker compose logs -f backend
docker compose logs -f postgres
docker compose logs -f redis

# Restart all services
docker compose restart

# Clean restart (removes containers, networks, and volumes)
docker compose down -v --remove-orphans
docker system prune -f  # optional: remove unused Docker resources
docker compose up -d    # start services again

Port Already in Use

Problem: Error messages about ports 3000, 5432, 6379, or 8000 being in use.

Solutions:

# Find processes using ports
sudo lsof -i :8000
sudo lsof -i :3000
sudo lsof -i :5432
sudo lsof -i :6379

# Kill processes using specific ports
sudo lsof -ti:8000 | xargs kill -9
sudo lsof -ti:3000 | xargs kill -9

# Or use netstat
netstat -tulpn | grep :8000

Environment Configuration

Problem: Migrations fail because .env points services at localhost instead of the Docker service names.

Solution: Inside a container, localhost refers to the container itself, so the database and Redis hosts must use the Compose service names (postgres and redis):

# Incorrect 
DATABASE_URL=postgresql://postgres:password@localhost:5432/voice_collection

# Correct 
DATABASE_URL=postgresql://postgres:password@postgres:5432/voice_collection

# Incorrect 
REDIS_URL=redis://localhost:6379/0

# Correct 
REDIS_URL=redis://redis:6379/0

Database Migrations Not Applied

Problems:

  • alembic upgrade head was not run, or it failed.

  • Tables such as users, recordings, etc. are missing.

  • The application may return errors like: relation "users" does not exist.

Solutions:

# Run database migrations
alembic upgrade head

# Verify tables exist
psql -U postgres -d voice_collection -c "\dt"

⚠️ Always run migrations after configuring environment variables and before starting the backend or running tests.

Database Connection Issues

Problem: Backend can’t connect to PostgreSQL database.

Solutions:

# Check database status
docker compose exec postgres pg_isready -U postgres

# Check database logs
docker compose logs -f postgres

# Reset database and remove containers, volumes, and networks
docker compose down -v --remove-orphans

# Optional: prune unused Docker resources
docker system prune -f

# Start all services
docker compose up -d

# Run database migrations inside the backend container
docker compose exec backend alembic upgrade head

# Or use a custom initialization script if you have one
docker compose exec backend python scripts/init-db.py

Redis Connection Issues

Problem: Backend can’t connect to Redis.

Solutions:

# Test Redis connection
docker compose exec redis redis-cli ping

# Check Redis logs
docker compose logs -f redis

# Restart Redis
docker compose restart redis

Database Issues

Problem: PostgreSQL connection errors in local development.

Solutions:

# Check PostgreSQL status
sudo systemctl status postgresql

# Start PostgreSQL
sudo systemctl start postgresql

# Create database if missing
createdb voice_collection

# Run migrations
alembic upgrade head

Permission Errors

Problem: File permission errors, especially with uploads directory.

Solutions:

# Fix upload directory permissions
mkdir -p uploads
sudo chown -R $USER:$USER uploads/
chmod -R 755 uploads/

# Fix general project permissions
sudo chown -R $USER:$USER .

Application Issues

Admin User Creation Fails

Problem: Cannot create admin user or login fails.

Solutions:

# Ensure the database is migrated
# Local environment
alembic upgrade head  # see Local Database docs for details.

# Docker environment
docker compose exec backend alembic upgrade head  # see Docker Database docs for details

# Create admin user 
# Local
python scripts/create_admin.py --name "AdminUser" --email admin@example.com

# Docker
docker compose exec backend python scripts/create_admin.py --name "AdminUser" --email admin@example.com

# Check users in database
# Local
psql -U postgres -d voice_collection -c "SELECT * FROM users;"

# Docker
docker compose exec postgres psql -U postgres -d voice_collection -c "SELECT * FROM users;"

File Upload Issues

Problem: Audio file uploads fail or return errors.

Solutions:

  1. Check file size: Ensure files are under 100MB (default limit)
  2. Check file format: Supported formats: .wav, .mp3, .m4a, .flac, .webm
  3. Check permissions: Ensure uploads directory is writable
  4. Check disk space: Ensure sufficient disk space available

# Check upload directory
ls -la uploads/
df -h  # Check disk space
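Items 1 and 2 above can be pre-checked before attempting an upload. A sketch (hypothetical helper; the 100MB figure matches the default MAX_FILE_SIZE):

```python
import os

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".webm"}
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100MB, the default limit

def upload_ok(path: str) -> bool:
    """Check extension and size before attempting an upload."""
    ext = os.path.splitext(path)[1].lower()
    return ext in ALLOWED_EXTENSIONS and os.path.getsize(path) <= MAX_FILE_SIZE
```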

API Errors

Problem: API endpoints return 500 errors or unexpected responses.

Solutions:

# Check backend logs
docker compose logs -f backend

# Check API health
curl http://localhost:8000/health

# Check specific endpoint
curl -X GET http://localhost:8000/api/auth/me \
  -H "Authorization: Bearer YOUR_TOKEN"

Frontend Issues

Frontend Won’t Load

Problem: Frontend shows blank page or connection errors.

Solutions:

# Check frontend logs
docker compose logs -f frontend

# Verify API connection
curl http://localhost:8000/health

# Check environment variables
cat frontend/.env

Build Errors

Problem: Frontend build fails with dependency or compilation errors.

Solutions:

# Clear node modules and reinstall
cd frontend
rm -rf node_modules package-lock.json
npm install

# Clear Next.js cache
rm -rf .next

# Rebuild
npm run build

Debugging Tips

Enable Debug Logging

Add to your .env file:

DEBUG=true
LOG_LEVEL=DEBUG
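On the backend side, honoring these variables might look like the following (a sketch; the application's actual logging setup may differ):

```python
import logging
import os

# Fall back to INFO when LOG_LEVEL is unset or not a valid level name
level_name = os.getenv("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)
logging.basicConfig(level=level)

logging.getLogger(__name__).debug("debug logging enabled")
```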

Check Service Health

# Backend health check (works for both local and Docker)
curl http://localhost:8000/health

# Database connection

# Local PostgreSQL
pg_isready -U postgres -d voice_collection

# Docker PostgreSQL
docker compose exec postgres pg_isready -U postgres -d voice_collection

# Redis connection

# Local Redis
redis-cli ping

# Docker Redis
docker compose exec redis redis-cli ping

Monitor Resource Usage

# Docker resource usage
docker stats

# System resource usage
htop
df -h
free -h

Common Local Issues

Port already in use:

# Find and kill process using port 8000
lsof -ti:8000 | xargs kill -9

Database connection issues:

# Check PostgreSQL status
sudo systemctl status postgresql

# Restart PostgreSQL
sudo systemctl restart postgresql

# Create database if missing
createdb voice_collection

# Run migrations
alembic upgrade head

Redis connection issues:

# Check Redis status
redis-cli ping

# Start Redis
redis-server

Getting Help

If you’re still experiencing issues:

  1. Search existing issues: Check GitHub Issues
  2. Create a detailed issue, including:
    • Operating system and version
    • Docker/Docker Compose versions
    • Complete error messages
    • Steps to reproduce
    • Relevant log outputs
  3. Join community: Discord Server
  4. Check documentation: Review relevant sections in this documentation

Frequently Asked Questions

General Questions

What is Shrutik?

Shrutik (শ্রুতিক) is an open-source voice data collection platform designed to help communities build high-quality voice datasets in their native languages. The name “Shrutik” means “listener” in Bengali, reflecting our mission to listen to and preserve diverse voices.

What languages does Shrutik support?

Shrutik is designed to support any language. Currently, it comes pre-configured with Bengali (Bangla), but administrators can easily add support for additional languages through the admin interface.

Is Shrutik free to use?

Yes! Shrutik is free and open-source under the Creative Commons BY-NC-SA 4.0 License. You can use it for learning, education, and non-commercial projects. Commercial use requires separate permission.

Technical Questions

What are the system requirements?

For Docker (Recommended):

  • Docker 20.10+
  • Docker Compose 2.0+
  • 4GB RAM minimum, 8GB recommended
  • 10GB free disk space

For Local Development:

  • Python 3.11+
  • Node.js 18+
  • PostgreSQL 13+
  • Redis 6+
  • 8GB RAM recommended

How do I backup my data?

Database Backup:

# Create database backup
docker compose exec postgres pg_dump -U postgres voice_collection > backup.sql

# Restore from backup
docker compose exec -T postgres psql -U postgres voice_collection < backup.sql

File Uploads Backup:

# Backup uploads directory
tar -czf uploads-backup.tar.gz uploads/

Usage Questions

How do I add a new language?

  1. Log in as an admin user
  2. Go to the admin dashboard
  3. Navigate to “Languages” section
  4. Click “Add Language”
  5. Enter language name and ISO code
  6. Add scripts/texts for that language

What audio formats are supported?

Shrutik supports these audio formats:

  • WAV (recommended for quality)
  • MP3
  • M4A
  • FLAC
  • WebM

What’s the maximum file size for uploads?

The default maximum file size is 100MB. This can be configured in the environment variables:

MAX_FILE_SIZE=104857600  # 100MB in bytes

How does the transcription consensus system work?

Shrutik uses a multi-contributor consensus system:

  1. Multiple users transcribe the same audio
  2. The system compares transcriptions
  3. When transcriptions match (or are very similar), they’re marked as “consensus”
  4. High-consensus transcriptions are considered high-quality data
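The comparison in step 2 can be illustrated with a simple text-similarity ratio. A sketch using the standard library's difflib (illustrative only; the platform's actual consensus algorithm and threshold may differ):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical transcriptions."""
    return SequenceMatcher(None, a, b).ratio()

def has_consensus(transcriptions: list[str], threshold: float = 0.9) -> bool:
    """True if every pair of transcriptions is at least `threshold` similar."""
    return all(
        similarity(a, b) >= threshold
        for i, a in enumerate(transcriptions)
        for b in transcriptions[i + 1:]
    )

print(has_consensus(["hello world", "hello world", "hello  world"]))  # True
```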

Can I export my data?

Yes! Administrators can export data through the admin API:

  • Audio files and metadata
  • Transcriptions and consensus data
  • User statistics and contributions
  • Quality metrics

Development Questions

How do I contribute to Shrutik?

  1. Fork the repository on GitHub
  2. Set up your development environment
  3. Make your changes
  4. Write tests for new features
  5. Submit a pull request

See our Contributing Guide for detailed instructions.

How do I report bugs?

  1. Check if the issue already exists in GitHub Issues
  2. If not, create a new issue with:
    • Clear description of the problem
    • Steps to reproduce
    • Expected vs actual behavior
    • System information
    • Error logs

How do I request new features?

Create a feature request in GitHub Issues with:

  • Clear description of the feature
  • Use case and benefits
  • Proposed implementation (if you have ideas)

Can I customize the UI?

Yes! The frontend is built with Next.js and React. You can:

  • Modify the existing components
  • Add new pages and features
  • Customize styling and themes
  • Add support for new languages in the UI

Privacy and Security

How is user data protected?

Shrutik implements several security measures:

  • Password hashing with bcrypt
  • JWT token-based authentication
  • Role-based access control
  • Input validation and sanitization
  • CORS protection
  • Rate limiting

Can I run Shrutik offline?

Yes! Shrutik can run completely offline once deployed. All processing happens locally on your infrastructure.

How do I configure HTTPS?

For production deployments, configure HTTPS using:

  • Reverse proxy (nginx, Apache)
  • Load balancer with SSL termination
  • Cloud provider SSL certificates

Example nginx configuration is available in our deployment guides.

Community and Support

Where can I get help?

  1. Documentation: This documentation site
  2. GitHub Issues: For bugs and feature requests
  3. Discord: Join our community
  4. Email: Contact the maintainers

How can I stay updated?

  • Watch the GitHub repository for releases
  • Join our Discord community
  • Follow our social media channels
  • Subscribe to our newsletter (coming soon)

Can I hire someone to help with deployment?

While Shrutik is open-source and free, you can:

  • Hire freelance developers familiar with the stack
  • Contact the core team for consulting services
  • Engage with the community for paid support

Troubleshooting

The application won’t start

See our detailed Troubleshooting Guide for common issues and solutions.

I forgot my admin password

Reset your admin password:

# Using Docker
docker compose exec backend python scripts/create_admin.py --name "AdminUser" --email admin@example.com

# Local development
python scripts/create_admin.py --name "AdminUser" --email admin@example.com

This will create a new admin user or update the existing one.

The database is corrupted

If your database becomes corrupted:

  1. Stop all services
  2. Restore from backup (if available)
  3. Or reset the database:
# Stop and remove all containers, volumes, and networks
docker compose down -v --remove-orphans

# Optional: prune unused Docker resources
docker system prune -f

# Start services (build images if necessary)
docker compose up -d --build

# Wait a few seconds for Postgres and Redis to be ready

# Run database migrations
docker compose exec backend alembic upgrade head

# Or use a custom initialization script
docker compose exec backend python scripts/init-db.py

# Create Admin user
docker compose exec backend python scripts/create_admin.py

Still have questions?

If your question isn’t answered here:

  1. Check our Troubleshooting Guide
  2. Search GitHub Issues
  3. Join our Discord community
  4. Create a new issue on GitHub

We’re here to help!
