Silent Voice - Neural Communication for the Paralyzed

Model Description

Silent Voice is a multimodal AI model that translates biosignals into natural language, designed to assist those who cannot speak.

Fine-tuned from Gemma 3N on 100,000+ medical-focused examples, it interprets:

  • 👁️ Eye tracking patterns (sustained gaze, blinks, movement sequences)
  • 💪 EMG signals (minimal muscle activity from jaw, face, limbs)
  • 😊 Facial expressions (pain, happiness, distress, concentration with YOLO-ready annotations)
  • 🤝 Multimodal combinations (eye + EMG + facial for robust communication)
  • 🏥 Medical contexts (ALS, stroke, cerebral palsy patient conditions)
  • ⚡ Urgency detection (automatic scaling from low to critical situations)

GitHub Repo: https://github.com/0xroyce/silent-voice

Video Demonstration

Silent Voice Demo

Vimeo: Fine-tuned Gemma 3n in Real-time Recognition

Intended Research Use

Primary Users

  • ALS/Motor Neurone Disease patients
  • Severe Cerebral Palsy patients
  • Locked-in Syndrome patients
  • Stroke survivors with aphasia
  • Spinal cord injury patients

Healthcare Applications

  • Hospital ICU communication systems
  • Home care assistive devices
  • Rehabilitation centers
  • Hospice care facilities

How to Use

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xroyce/silent-voice-multimodal")
tokenizer = AutoTokenizer.from_pretrained("0xroyce/silent-voice-multimodal")

# Example: Emergency situation
prompt = """Biosignal: Sustained gaze at top_left for 2s + Facial expression: severe pain
Context: ICU, ALS patient (advanced stage)
Urgency: critical"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode only the newly generated tokens so the echoed prompt is not repeated
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Output: "I'm in severe pain! Please check my medication immediately!"

Biosignal Input Examples

Eye Tracking

# Sustained gaze patterns
prompt = """Biosignal: Sustained gaze at top_left for 2s
Context: home, stroke patient
Urgency: high"""
# → "I need help urgently!"

prompt = """Biosignal: Double blink detected
Context: bedroom, cerebral_palsy patient
Urgency: low"""
# → "Yes, that's correct"

prompt = """Biosignal: Rapid blinking for 3s
Context: hospital_bed, ALS patient
Urgency: critical"""
# → "I'm in severe pain! Please get help immediately!"

EMG Signals

# Facial muscles
prompt = """Biosignal: EMG signal: jaw clench detected
Context: home, stroke patient
Urgency: low"""
# → "Yes, I understand"

prompt = """Biosignal: EMG signal: eyebrow raise detected
Context: therapy_session, cerebral_palsy patient
Urgency: low"""
# → "I have a question about this exercise"

# Minimal movements
prompt = """Biosignal: EMG signal: finger twitch detected
Context: home, ALS patient (early stage)
Urgency: low"""
# → "Please select this option"

Complex Multimodal

# Emergency
prompt = """Biosignal: Eye tracking: rapid blinking + EMG: jaw clench + Facial expression: severe pain
Context: hospital_bed, ALS patient (advanced stage)
Urgency: critical"""
# → "I'm in severe pain! Please get help immediately! Check my medication schedule!"

# Daily care
prompt = """Biosignal: Sustained gaze at window + Facial expression: moderate happiness
Context: living_room, cerebral_palsy patient
Urgency: low"""  
# → "I'd like the curtains opened please. The sunshine would be nice."

# Complex communication
prompt = """Biosignal: Eye tracking: look at water bottle + EMG: weak jaw movement + Facial expression: concentration
Context: therapy_session, stroke patient
Urgency: medium"""
# → "I'm thirsty and would like some water during our session"

Advanced Usage with Context

# Contextual communication
context_prompt = """Biosignal: Sustained gaze at kitchen + Facial expression: moderate hunger
Context: home, cerebral_palsy patient
Urgency: low"""
# → "Good morning! I'm feeling hungry and would appreciate breakfast when convenient."

# Emergency situations
emergency_prompt = """Biosignal: Eye tracking: rapid blinking + EMG: all muscles tensed + Facial expression: severe distress
Context: home_alone, ALS patient (advanced stage)
Urgency: critical"""
# → "EMERGENCY! I need immediate medical assistance! Please call for help NOW!"

# Pain management
pain_prompt = """Biosignal: Facial expression: moderate pain + EMG: jaw clench
Context: hospital_bed, stroke patient
Urgency: high"""
# → "I'm experiencing significant pain. Could you please check my medication schedule?"

Training Details

Dataset

  • 100,000+ examples with a structured format (a hypothetical record is sketched after this list):
    • Biosignal patterns (eye tracking, EMG, facial expressions)
    • Medical contexts (ALS, stroke, cerebral palsy patients)
    • Urgency levels (low, medium, high, critical)
    • Visual data with YOLO-ready annotations
    • Modality mix: 35% eye tracking, 25% EMG, 15% facial expression, 25% multimodal combinations
    • Real emergency and daily living scenarios
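
The exact on-disk schema is not published here, but each example pairs a structured biosignal description with a target utterance; a hypothetical record (field names are illustrative, not the released schema) might look like this:

# Hypothetical training record; field names are illustrative, not the published schema.
example = {
    "biosignal": "Sustained gaze at top_left for 2s + Facial expression: severe pain",
    "context": "ICU, ALS patient (advanced stage)",
    "urgency": "critical",
    "response": "I'm in severe pain! Please check my medication immediately!",
}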


Limitations and Biases

Current Limitations

  1. Response Time: ~2-3 seconds on GPU (may be too slow for some emergency situations; a timing sketch follows this list)
  2. Context Window: Limited to recent interactions
  3. Signal Quality: Assumes clean biosignal input
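
The ~2-3 second figure can be sanity-checked on your own hardware with a simple timer (a sketch, assuming model and tokenizer are already loaded as in the Quick Start):

import time

prompt = """Biosignal: Sustained gaze at top_left for 2s + Facial expression: severe pain
Context: ICU, ALS patient (advanced stage)
Urgency: critical"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=50)
print(f"Generation took {time.perf_counter() - start:.2f} s")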

Ethical Considerations

  • Medical Device: NOT FDA approved - for research/experimental use only
  • Privacy: Biosignals are sensitive medical data
  • Consent: Ensure proper consent for data collection
  • False Positives: May misinterpret signals - always verify critical communications

Real-World Impact

This model aims to enable:

  • 🗣️ Independent communication for paralyzed individuals
  • 🚨 Emergency alerts when patients cannot call for help
  • ❤️ Emotional expression for those unable to speak
  • 🏥 Better patient care through clear communication
  • 👨‍👩‍👧‍👦 Family connections, helping maintain relationships

Technical Specifications

  • Model Type: Gemma 3N Multimodal (7.85B parameters)
  • Input: Structured biosignal descriptions with context
  • Output: Natural language responses
  • Visual Data: YOLO-ready bounding boxes and facial landmarks
  • Inference: FP16/INT4 quantization supported (see the quantized-loading sketch below)
  • Deployment: Edge devices, hospitals, home care
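
For the FP16/INT4 support noted above, one option on memory-constrained edge hardware is 4-bit loading through bitsandbytes. This is a sketch, assuming a CUDA device and that the checkpoint is compatible with standard transformers quantized loading:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantized loading (assumes the bitsandbytes package is installed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "0xroyce/silent-voice-multimodal",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("0xroyce/silent-voice-multimodal")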

Visual Data Integration

The model includes comprehensive visual annotations:

  • Bounding boxes: Normalized coordinates for YOLO training (see the conversion sketch after this list)
  • Facial landmarks: 68-point facial feature detection
  • Expression classification: Pain, happiness, distress, concentration
  • Micro-expressions: 40-500ms duration subtle movements
  • Medical indicators: Facial asymmetry, drooping detection
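
Because the bounding boxes use normalized YOLO coordinates, overlaying them on video frames only requires a small conversion step. The helper below is a sketch assuming the standard (x_center, y_center, width, height) convention; the function name is illustrative and not part of the released code.

# Convert a normalized YOLO box (x_center, y_center, width, height) to pixel corners.
def yolo_to_pixels(box, img_w, img_h):
    xc, yc, w, h = box
    x1 = int((xc - w / 2) * img_w)
    y1 = int((yc - h / 2) * img_h)
    x2 = int((xc + w / 2) * img_w)
    y2 = int((yc + h / 2) * img_h)
    return x1, y1, x2, y2

# Example: a face box annotated near the image center of a 640x480 frame
print(yolo_to_pixels((0.5, 0.5, 0.25, 0.3), img_w=640, img_h=480))  # (240, 168, 400, 312)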

Example Conversations

Daily Routine

Input: Biosignal: EMG signal: slight muscle tension + Facial expression: mild discomfort
Context: bedroom, ALS patient (moderate stage)
Urgency: medium
Output: Good morning! I'm getting a bit stiff lying here. Could you help me change position when you have a moment?

Input: Biosignal: Sustained gaze at food + EMG: weak jaw movement
Context: dining_room, stroke patient
Urgency: low
Output: I'm ready to eat now. Could we start with something soft?

Medical Needs

Input: Biosignal: Sustained gaze at medication + EMG: jaw clench
Context: bedroom, cerebral_palsy patient
Urgency: medium
Output: It's time for my medication. Please check the schedule.

Input: Biosignal: Facial expression: moderate pain + EMG: abdominal tension
Context: hospital_bed, stroke patient
Urgency: high
Output: I'm experiencing stomach discomfort. It might be related to my medication.

Emotional Expression

Input: Biosignal: Sustained gaze at person + Facial expression: moderate happiness
Context: with_family, ALS patient (advanced stage)
Urgency: low
Output: I love you. Thank you for being here with me.

Input: Biosignal: Facial expression: tears + EMG: relaxed baseline
Context: with_family, stroke patient
Urgency: low
Output: I'm feeling emotional but okay. These are happy tears.

Citation

If you use this model in research, please cite:

@misc{silent-voice-2025,
  title={Silent Voice: Neural Translation of Biosignals for Paralyzed Patients},
  author={0xroyce},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/0xroyce/silent-voice-multimodal}}
}

Acknowledgments

  • Gemma team at Google for the base model
  • Unsloth for efficient fine-tuning
  • ARASAAC and Mulberry for AAC symbols
  • The ALS, CP, and locked-in syndrome communities for inspiration

Future Directions

  • 🌍 Multilingual support
  • 🧠 Direct brain-computer interface integration
  • 🤖 Real-time biosignal hardware integration

Note: This model is a research prototype fine-tuned using Unsloth. For medical use, please consult with healthcare professionals and follow proper medical device protocols.

Remember: Every person deserves to be heard. This model is a step toward that goal. 💙
