AI Medical Voicebot

March 2025

A multimodal AI-powered voicebot for healthcare that combines speech recognition, image analysis, and TTS for real-time medical insights.

Project Demo

Technologies Used

PythonGradioWhisper (GROQ)LLaMA-3.2 VisionElevenLabsGTTSdotenv

Project Links

Project Overview

This project features a voice-activated AI medical assistant that listens to voice input, transcribes it using Whisper-large-v3 via GROQ API, analyzes uploaded medical images with LLaMA-3.2 Vision, generates short professional diagnoses, and speaks the result using ElevenLabs or GTTS. Built in Python with a Gradio UI, it blends audio, vision, and NLP to create a seamless diagnostic experience.

Key Features

Voice Input & Transcription

Real-time transcription using Whisper-large-v3 via GROQ API.

Image-Based Diagnosis

Upload and analyze medical images using a powerful vision model.

Concise AI Insights

Generates short, professional responses tailored for healthcare.

Text-to-Speech Output

Converts AI responses into natural voice with ElevenLabs or GTTS.

Challenges

  • Combining multimodal AI (voice, vision, text) into a real-time flow
  • Ensuring quick and accurate transcriptions and image analysis
  • Integrating various APIs smoothly into a single UI

Key Learnings

  • Built a multimodal pipeline with real-time AI inference
  • Handled audio/image data and coordinated multiple services
  • Enhanced understanding of medical AI safety and clarity in generation

Project Gallery

AI Medical Voicebot screenshot 1
1 / 4
AI Medical Voicebot screenshot 2
2 / 4
AI Medical Voicebot screenshot 3
3 / 4
AI Medical Voicebot screenshot 4
4 / 4

Related Projects

You might also be interested in these projects.