Silent Voice - Neural Communication for the Paralyzed
Model Description
Silent Voice is a multimodal AI model that translates biosignals into natural language, designed to assist those who cannot speak.
Fine-tuned from Gemma 3N on 100,000+ medical-focused examples, it interprets:
- 👁️ Eye tracking patterns (sustained gaze, blinks, movement sequences)
- 💪 EMG signals (minimal muscle activity from jaw, face, limbs)
- 😊 Facial expressions (pain, happiness, distress, concentration with YOLO-ready annotations)
- 🤝 Multimodal combinations (eye + EMG + facial for robust communication)
- 🏥 Medical contexts (ALS, stroke, cerebral palsy patient conditions)
- ⚡ Urgency detection (automatic scaling from low to critical situations)
GitHub repo: https://github.com/0xroyce/silent-voice
Video Demonstration
Vimeo: Fine-tuned Gemma 3n in Real-time Recognition
Intended Research Use
Primary Users
- ALS/Motor Neurone Disease patients
- Severe Cerebral Palsy patients
- Locked-in Syndrome patients
- Stroke survivors with aphasia
- Spinal cord injury patients
Healthcare Applications
- Hospital ICU communication systems
- Home care assistive devices
- Rehabilitation centers
- Hospice care facilities
How to Use
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("0xroyce/silent-voice-multimodal")
tokenizer = AutoTokenizer.from_pretrained("0xroyce/silent-voice-multimodal")
# Example: Emergency situation
prompt = """Biosignal: Sustained gaze at top_left for 2s + Facial expression: severe pain
Context: ICU, ALS patient (advanced stage)
Urgency: critical"""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
# Decode only the newly generated tokens so the prompt is not echoed back
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Output: "I'm in severe pain! Please check my medication immediately!"
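The structured format above (Biosignal / Context / Urgency lines) can also be assembled programmatically. The helper below is a minimal sketch that reuses the model and tokenizer from the Quick Start; build_prompt and respond are illustrative names, not part of a released API.
def build_prompt(biosignal: str, context: str, urgency: str) -> str:
    """Assemble the three-line structured prompt used throughout this card."""
    return f"Biosignal: {biosignal}\nContext: {context}\nUrgency: {urgency}"

def respond(biosignal: str, context: str, urgency: str, max_new_tokens: int = 50) -> str:
    """Generate a response and return only the newly generated text."""
    inputs = tokenizer(build_prompt(biosignal, context, urgency), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(respond("Double blink detected", "bedroom, cerebral_palsy patient", "low"))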
Biosignal Input Examples
Eye Tracking
# Gaze and blink patterns
prompt = """Biosignal: Sustained gaze at top_left for 2s
Context: home, stroke patient
Urgency: high"""
# → "I need help urgently!"
prompt = """Biosignal: Double blink detected
Context: bedroom, cerebral_palsy patient
Urgency: low"""
# → "Yes, that's correct"
prompt = """Biosignal: Rapid blinking for 3s
Context: hospital_bed, ALS patient
Urgency: critical"""
# → "I'm in severe pain! Please get help immediately!"
EMG Signals
# Facial muscles
prompt = """Biosignal: EMG signal: jaw clench detected
Context: home, stroke patient
Urgency: low"""
# → "Yes, I understand"
prompt = """Biosignal: EMG signal: eyebrow raise detected
Context: therapy_session, cerebral_palsy patient
Urgency: low"""
# → "I have a question about this exercise"
# Minimal movements
prompt = """Biosignal: EMG signal: finger twitch detected
Context: home, ALS patient (early stage)
Urgency: low"""
# → "Please select this option"
Complex Multimodal
# Emergency
prompt = """Biosignal: Eye tracking: rapid blinking + EMG: jaw clench + Facial expression: severe pain
Context: hospital_bed, ALS patient (advanced stage)
Urgency: critical"""
# → "I'm in severe pain! Please get help immediately! Check my medication schedule!"
# Daily care
prompt = """Biosignal: Sustained gaze at window + Facial expression: moderate happiness
Context: living_room, cerebral_palsy patient
Urgency: low"""
# → "I'd like the curtains opened please. The sunshine would be nice."
# Complex communication
prompt = """Biosignal: Eye tracking: look at water bottle + EMG: weak jaw movement + Facial expression: concentration
Context: therapy_session, stroke patient
Urgency: medium"""
# → "I'm thirsty and would like some water during our session"
Advanced Usage with Context
# Contextual communication
context_prompt = """Biosignal: Sustained gaze at kitchen + Facial expression: moderate hunger
Context: home, cerebral_palsy patient
Urgency: low"""
# → "Good morning! I'm feeling hungry and would appreciate breakfast when convenient."
# Emergency situations
emergency_prompt = """Biosignal: Eye tracking: rapid blinking + EMG: all muscles tensed + Facial expression: severe distress
Context: home_alone, ALS patient (advanced stage)
Urgency: critical"""
# → "EMERGENCY! I need immediate medical assistance! Please call for help NOW!"
# Pain management
pain_prompt = """Biosignal: Facial expression: moderate pain + EMG: jaw clench
Context: hospital_bed, stroke patient
Urgency: high"""
# → "I'm experiencing significant pain. Could you please check my medication schedule?"
Training Details
Dataset
- 100,000+ examples with a structured format:
  - Biosignal patterns (eye tracking, EMG, facial expressions)
  - Medical contexts (ALS, stroke, cerebral palsy patients)
  - Urgency levels (low, medium, high, critical)
  - Visual data with YOLO-ready annotations
  - Multimodal combinations (35% eye, 25% EMG, 15% facial, 25% multimodal)
  - Real emergency and daily living scenarios
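For orientation, each training example pairs a structured biosignal description with a target utterance. The record below is a hypothetical illustration of that pairing; the field names are assumptions, not the released dataset schema.
# Hypothetical training record (illustrative field names, not the actual dataset schema)
example = {
    "prompt": (
        "Biosignal: Sustained gaze at water bottle + EMG: weak jaw movement\n"
        "Context: therapy_session, stroke patient\n"
        "Urgency: medium"
    ),
    "response": "I'm thirsty and would like some water during our session",
    "modalities": ["eye_tracking", "emg"],
    "urgency": "medium",
}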
Limitations and Biases
Current Limitations
- Response Time: ~2-3 seconds on GPU (may be too slow for some emergency situations; see the timing sketch after this list)
- Context Window: Limited to recent interactions
- Signal Quality: Assumes clean biosignal input
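The latency figure above can be checked on your own hardware with a simple timer; a minimal sketch, assuming the model and tokenizer from the Quick Start:
import time

prompt = "Biosignal: Double blink detected\nContext: bedroom, cerebral_palsy patient\nUrgency: low"
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=50)
print(f"Generation took {time.perf_counter() - start:.2f} s")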
Ethical Considerations
- Medical Device: NOT FDA approved - for research/experimental use only
- Privacy: Biosignals are sensitive medical data
- Consent: Ensure proper consent for data collection
- False Positives: May misinterpret signals - always verify critical communications
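Because critical outputs should always be verified, a deployment might require an explicit confirmation gesture before escalating. The sketch below is a hypothetical safeguard around the model, not part of it; wait_for_signal and send_alert are placeholder hooks for a real biosignal reader and notification system.
def confirm_before_alert(message: str, wait_for_signal, send_alert, timeout_s: float = 5.0) -> bool:
    """Dispatch a critical alert only after a confirming gesture (e.g. a double blink)."""
    print(f"Critical message detected: {message!r}. Double-blink to confirm.")
    if wait_for_signal("double_blink", timeout=timeout_s):  # placeholder: real biosignal check
        send_alert(message)                                  # placeholder: pager/nurse-call hook
        return True
    print("No confirmation received; alert withheld.")
    return False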
Real-World Impact
This model aims to enable:
- 🗣️ Independent communication for paralyzed individuals
- 🚨 Emergency alerts when patients cannot call for help
- ❤️ Emotional expression for those unable to speak
- 🏥 Better patient care through clear communication
- 👨‍👩‍👧‍👦 Family connections, helping maintain relationships
Technical Specifications
- Model Type: Gemma 3N Multimodal
- Input: Structured biosignal descriptions with context
- Output: Natural language responses
- Visual Data: YOLO-ready bounding boxes and facial landmarks
- Inference: FP16/INT4 quantization supported (see the loading sketch after this list)
- Deployment: Edge devices, hospitals, home care
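A minimal loading sketch for the quantization options mentioned above, assuming a CUDA device and that the checkpoint loads with bitsandbytes 4-bit quantization (exact memory savings depend on hardware):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# INT4 loading via bitsandbytes; drop quantization_config and keep torch_dtype for plain FP16
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "0xroyce/silent-voice-multimodal",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("0xroyce/silent-voice-multimodal")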
Visual Data Integration
The model includes comprehensive visual annotations:
- Bounding boxes: Normalized coordinates for YOLO training (see the format sketch after this list)
- Facial landmarks: 68-point facial feature detection
- Expression classification: Pain, happiness, distress, concentration
- Micro-expressions: Subtle movements lasting 40-500 ms
- Medical indicators: Facial asymmetry, drooping detection
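The bounding boxes described above follow the standard YOLO convention of a class index plus normalized center coordinates and size. A hypothetical annotation line might look like the following; the class ID and label name are illustrative, not the dataset's actual label map.
# YOLO label format: class_id x_center y_center width height (all normalized to [0, 1])
yolo_annotation = "3 0.512 0.347 0.281 0.305"  # e.g. class 3 = severe_pain face box (illustrative)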
Example Conversations
Daily Routine
Input: Biosignal: EMG signal: slight muscle tension + Facial expression: mild discomfort
Context: bedroom, ALS patient (moderate stage)
Urgency: medium
Output: Good morning! I'm getting a bit stiff lying here. Could you help me change position when you have a moment?
Input: Biosignal: Sustained gaze at food + EMG: weak jaw movement
Context: dining_room, stroke patient
Urgency: low
Output: I'm ready to eat now. Could we start with something soft?
Medical Needs
Input: Biosignal: Sustained gaze at medication + EMG: jaw clench
Context: bedroom, cerebral_palsy patient
Urgency: medium
Output: It's time for my medication. Please check the schedule.
Input: Biosignal: Facial expression: moderate pain + EMG: abdominal tension
Context: hospital_bed, stroke patient
Urgency: high
Output: I'm experiencing stomach discomfort. It might be related to my medication.
Emotional Expression
Input: Biosignal: Sustained gaze at person + Facial expression: moderate happiness
Context: with_family, ALS patient (advanced stage)
Urgency: low
Output: I love you. Thank you for being here with me.
Input: Biosignal: Facial expression: tears + EMG: relaxed baseline
Context: with_family, stroke patient
Urgency: low
Output: I'm feeling emotional but okay. These are happy tears.
Citation
If you use this model in research, please cite:
@misc{silent-voice-2025,
  title={Silent Voice: Neural Translation of Biosignals for Paralyzed Patients},
  author={0xroyce},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/0xroyce/silent-voice-multimodal}}
}
Acknowledgments
- Gemma team at Google for the base model
- Unsloth for efficient fine-tuning
- ARASAAC and Mulberry for AAC symbols
- The ALS, CP, and locked-in syndrome communities for inspiration
Future Directions
- 🌍 Multilingual support
- 🧠 Direct brain-computer interface integration
- 🤖 Real-time biosignal hardware integration
Note: This model is a research prototype and has been fine-tuned using Unsloth. For medical use, please consult with healthcare professionals and follow proper medical device protocols.
Remember: Every person deserves to be heard. This model is a step toward that goal. 💙