Silent Voice - Neural Communication for the Paralyzed

Model Description

Silent Voice is a multimodal AI model that translates biosignals into natural language, designed to assist those who cannot speak.

Fine-tuned from Gemma 3N on 100,000+ medical-focused examples, it interprets:

  • 👁️ Eye tracking patterns (sustained gaze, blinks, movement sequences)
  • 💪 EMG signals (minimal muscle activity from jaw, face, limbs)
  • 😊 Facial expressions (pain, happiness, distress, concentration with YOLO-ready annotations)
  • 🤝 Multimodal combinations (eye + EMG + facial for robust communication)
  • 🏥 Medical contexts (ALS, stroke, cerebral palsy patient conditions)
  • ⚡ Urgency detection (automatic scaling from low to critical situations)

GitHub Repo: https://github.com/0xroyce/silent-voice

Video Demonstration

Silent Voice Demo

Vimeo: Fine-tuned Gemma 3n in Real-time Recognition

Intended Research Use

Primary Users

  • ALS/Motor Neurone Disease patients
  • Severe Cerebral Palsy patients
  • Locked-in Syndrome patients
  • Stroke survivors with aphasia
  • Spinal cord injury patients

Healthcare Applications

  • Hospital ICU communication systems
  • Home care assistive devices
  • Rehabilitation centers
  • Hospice care facilities

How to Use

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xroyce/silent-voice-multimodal")
tokenizer = AutoTokenizer.from_pretrained("0xroyce/silent-voice-multimodal")

# Example: Emergency situation
prompt = """Biosignal: Sustained gaze at top_left for 2s + Facial expression: severe pain
Context: ICU, ALS patient (advanced stage)
Urgency: critical"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode only the newly generated tokens so the echoed prompt is not repeated
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Output: "I'm in severe pain! Please check my medication immediately!"

Biosignal Input Examples

Eye Tracking

# Sustained gaze patterns
prompt = """Biosignal: Sustained gaze at top_left for 2s
Context: home, stroke patient
Urgency: high"""
# → "I need help urgently!"

prompt = """Biosignal: Double blink detected
Context: bedroom, cerebral_palsy patient
Urgency: low"""
# → "Yes, that's correct"

prompt = """Biosignal: Rapid blinking for 3s
Context: hospital_bed, ALS patient
Urgency: critical"""
# → "I'm in severe pain! Please get help immediately!"

EMG Signals

# Facial muscles
prompt = """Biosignal: EMG signal: jaw clench detected
Context: home, stroke patient
Urgency: low"""
# → "Yes, I understand"

prompt = """Biosignal: EMG signal: eyebrow raise detected
Context: therapy_session, cerebral_palsy patient
Urgency: low"""
# → "I have a question about this exercise"

# Minimal movements
prompt = """Biosignal: EMG signal: finger twitch detected
Context: home, ALS patient (early stage)
Urgency: low"""
# → "Please select this option"

Complex Multimodal

# Emergency
prompt = """Biosignal: Eye tracking: rapid blinking + EMG: jaw clench + Facial expression: severe pain
Context: hospital_bed, ALS patient (advanced stage)
Urgency: critical"""
# → "I'm in severe pain! Please get help immediately! Check my medication schedule!"

# Daily care
prompt = """Biosignal: Sustained gaze at window + Facial expression: moderate happiness
Context: living_room, cerebral_palsy patient
Urgency: low"""  
# → "I'd like the curtains opened please. The sunshine would be nice."

# Complex communication
prompt = """Biosignal: Eye tracking: look at water bottle + EMG: weak jaw movement + Facial expression: concentration
Context: therapy_session, stroke patient
Urgency: medium"""
# → "I'm thirsty and would like some water during our session"

Advanced Usage with Context

# Contextual communication
context_prompt = """Biosignal: Sustained gaze at kitchen + Facial expression: moderate hunger
Context: home, cerebral_palsy patient
Urgency: low"""
# → "Good morning! I'm feeling hungry and would appreciate breakfast when convenient."

# Emergency situations
emergency_prompt = """Biosignal: Eye tracking: rapid blinking + EMG: all muscles tensed + Facial expression: severe distress
Context: home_alone, ALS patient (advanced stage)
Urgency: critical"""
# → "EMERGENCY! I need immediate medical assistance! Please call for help NOW!"

# Pain management
pain_prompt = """Biosignal: Facial expression: moderate pain + EMG: jaw clench
Context: hospital_bed, stroke patient
Urgency: high"""
# → "I'm experiencing significant pain. Could you please check my medication schedule?"

Training Details

Dataset

  • 100,000+ examples with a structured format (a hypothetical record is sketched after this list):
    • Biosignal patterns (eye tracking, EMG, facial expressions)
    • Medical contexts (ALS, stroke, cerebral palsy patients)
    • Urgency levels (low, medium, high, critical)
    • Visual data with YOLO-ready annotations
    • Modality mix: 35% eye tracking, 25% EMG, 15% facial expression, 25% multimodal combinations
    • Real emergency and daily living scenarios
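
The exact on-disk schema is not published here, but each example pairs a structured biosignal description with a target utterance; a hypothetical record (field names are illustrative, not the released schema) might look like this:

# Hypothetical training record; field names are illustrative, not the published schema.
example = {
    "biosignal": "Sustained gaze at top_left for 2s + Facial expression: severe pain",
    "context": "ICU, ALS patient (advanced stage)",
    "urgency": "critical",
    "response": "I'm in severe pain! Please check my medication immediately!",
}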


Limitations and Biases

Current Limitations

  1. Response Time: ~2-3 seconds on GPU (may be too slow for some emergency situations; a timing sketch follows this list)
  2. Context Window: Limited to recent interactions
  3. Signal Quality: Assumes clean biosignal input
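
The ~2-3 second figure can be sanity-checked on your own hardware with a simple timer (a sketch, assuming model and tokenizer are already loaded as in the Quick Start):

import time

prompt = """Biosignal: Sustained gaze at top_left for 2s + Facial expression: severe pain
Context: ICU, ALS patient (advanced stage)
Urgency: critical"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=50)
print(f"Generation took {time.perf_counter() - start:.2f} s")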

Ethical Considerations

  • Medical Device: NOT FDA approved - for research/experimental use only
  • Privacy: Biosignals are sensitive medical data
  • Consent: Ensure proper consent for data collection
  • False Positives: May misinterpret signals - always verify critical communications

Real-World Impact

This model aims to enable:

  • 🗣️ Independent communication for paralyzed individuals
  • 🚨 Emergency alerts when patients cannot call for help
  • ❤️ Emotional expression for those unable to speak
  • 🏥 Better patient care through clear communication
  • 👨‍👩‍👧‍👦 Family connections, helping maintain relationships

Technical Specifications

  • Model Type: Gemma 3N Multimodal (7.85B parameters)
  • Input: Structured biosignal descriptions with context
  • Output: Natural language responses
  • Visual Data: YOLO-ready bounding boxes and facial landmarks
  • Inference: FP16/INT4 quantization supported (see the quantized-loading sketch below)
  • Deployment: Edge devices, hospitals, home care
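
For the FP16/INT4 support noted above, one option on memory-constrained edge hardware is 4-bit loading through bitsandbytes. This is a sketch, assuming a CUDA device and that the checkpoint is compatible with standard transformers quantized loading:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantized loading (assumes the bitsandbytes package is installed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "0xroyce/silent-voice-multimodal",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("0xroyce/silent-voice-multimodal")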

Visual Data Integration

The model includes comprehensive visual annotations:

  • Bounding boxes: Normalized coordinates for YOLO training (see the conversion sketch after this list)
  • Facial landmarks: 68-point facial feature detection
  • Expression classification: Pain, happiness, distress, concentration
  • Micro-expressions: 40-500ms duration subtle movements
  • Medical indicators: Facial asymmetry, drooping detection
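
Because the bounding boxes use normalized YOLO coordinates, overlaying them on video frames only requires a small conversion step. The helper below is a sketch assuming the standard (x_center, y_center, width, height) convention; the function name is illustrative and not part of the released code.

# Convert a normalized YOLO box (x_center, y_center, width, height) to pixel corners.
def yolo_to_pixels(box, img_w, img_h):
    xc, yc, w, h = box
    x1 = int((xc - w / 2) * img_w)
    y1 = int((yc - h / 2) * img_h)
    x2 = int((xc + w / 2) * img_w)
    y2 = int((yc + h / 2) * img_h)
    return x1, y1, x2, y2

# Example: a face box annotated near the image center of a 640x480 frame
print(yolo_to_pixels((0.5, 0.5, 0.25, 0.3), img_w=640, img_h=480))  # (240, 168, 400, 312)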

Example Conversations

Daily Routine

Input: Biosignal: EMG signal: slight muscle tension + Facial expression: mild discomfort
Context: bedroom, ALS patient (moderate stage)
Urgency: medium
Output: Good morning! I'm getting a bit stiff lying here. Could you help me change position when you have a moment?

Input: Biosignal: Sustained gaze at food + EMG: weak jaw movement
Context: dining_room, stroke patient
Urgency: low
Output: I'm ready to eat now. Could we start with something soft?

Medical Needs

Input: Biosignal: Sustained gaze at medication + EMG: jaw clench
Context: bedroom, cerebral_palsy patient
Urgency: medium
Output: It's time for my medication. Please check the schedule.

Input: Biosignal: Facial expression: moderate pain + EMG: abdominal tension
Context: hospital_bed, stroke patient
Urgency: high
Output: I'm experiencing stomach discomfort. It might be related to my medication.

Emotional Expression

Input: Biosignal: Sustained gaze at person + Facial expression: moderate happiness
Context: with_family, ALS patient (advanced stage)
Urgency: low
Output: I love you. Thank you for being here with me.

Input: Biosignal: Facial expression: tears + EMG: relaxed baseline
Context: with_family, stroke patient
Urgency: low
Output: I'm feeling emotional but okay. These are happy tears.

Citation

If you use this model in research, please cite:

@misc{silent-voice-2025,
  title={Silent Voice: Neural Translation of Biosignals for Paralyzed Patients},
  author={0xroyce},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/0xroyce/silent-voice-multimodal}}
}

Acknowledgments

  • Gemma team at Google for the base model
  • Unsloth for efficient fine-tuning
  • ARASAAC and Mulberry for AAC symbols
  • The ALS, CP, and locked-in syndrome communities for inspiration

Future Directions

  • 🌍 Multilingual support
  • 🧠 Direct brain-computer interface integration
  • 🤖 Real-time biosignal hardware integration

Note: This model is a research prototype fine-tuned using Unsloth. For medical use, please consult with healthcare professionals and follow proper medical device protocols.

Remember: Every person deserves to be heard. This model is a step toward that goal. 💙
