Indore (Madhya Pradesh): Indian Institute of Management Indore researcher Prof Prabin Kumar Panigrahi has co-authored a new study proposing an optimised real-time sign language detection model designed to improve inclusive communication in enterprise and metaverse environments.
Published in the journal Enterprise Information Systems, the study, titled "Inclusive Enterprise Communication and Training: An Optimised YOLO11 Real-Time Sign Language Detection Model for Inclusive Communication", explores how advanced computer vision and machine learning can enhance accessibility for hearing-impaired users in digital workplaces.
The research addresses a growing challenge in enterprise communication systems: enabling accurate, low-latency sign language recognition in immersive virtual environments such as the metaverse. While earlier sign language detection systems demonstrated strong gesture recognition capabilities, many struggled with real-time performance, computational efficiency, and accuracy under dynamic conditions. Existing YOLO-based models, though effective at object detection, were not specifically optimised for the subtle hand-gesture variations that sign language interpretation requires.
To bridge this gap, the researchers developed a three-layer enterprise communication framework powered by an optimised YOLO11 model. The first layer integrates with enterprise systems such as ERP and CRM platforms, virtual office environments, and collaboration tools including Microsoft Teams and Zoom to capture live video streams.
The second layer processes these video frames using the YOLO11-based sign language recognition engine, enabling real-time detection and classification of American Sign Language gestures. The final layer converts recognised gestures into subtitles, workflow commands, meeting interactions, and training feedback that can be integrated into enterprise databases and collaborative applications.
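The three layers described above can be pictured as a simple pipeline. The following is only a minimal sketch, not the authors' implementation: the names `capture_layer`, `recognition_layer`, `output_layer` and `stub_detector` are hypothetical, the 0.5 confidence threshold is an assumption, and a stub stands in for the YOLO11 recognition engine so that the example runs without a trained model or a video source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Detection:
    letter: str        # recognised ASL letter
    confidence: float  # detector confidence in [0, 1]


def capture_layer(frames: Iterable) -> Iterator[Tuple[int, object]]:
    """Layer 1: yield indexed frames from a meeting or virtual-office stream."""
    for index, frame in enumerate(frames):
        yield index, frame


def recognition_layer(
    stream: Iterable[Tuple[int, object]],
    detector: Callable[[object], List[Detection]],
    min_conf: float = 0.5,  # assumed threshold, not taken from the study
) -> Iterator[Tuple[int, Detection]]:
    """Layer 2: run the detector on each frame and keep the most confident hit."""
    for index, frame in stream:
        hits = [d for d in detector(frame) if d.confidence >= min_conf]
        if hits:
            yield index, max(hits, key=lambda d: d.confidence)


def output_layer(recognised: Iterable[Tuple[int, Detection]]) -> str:
    """Layer 3: turn recognised letters into subtitle text."""
    return "".join(det.letter for _, det in recognised)


def stub_detector(frame) -> List[Detection]:
    """Hypothetical stand-in for the YOLO11 model: each demo 'frame' is
    already a (letter, confidence) pair."""
    letter, confidence = frame
    return [Detection(letter, confidence)]


frames = [("H", 0.93), ("I", 0.88), ("X", 0.20)]  # last frame is low confidence
subtitle = output_layer(recognition_layer(capture_layer(frames), stub_detector))
print(subtitle)  # prints "HI"
```

In a real deployment the stub would be replaced by a call to the trained model, and the output layer would route text to subtitles, workflow commands or training feedback rather than returning a string.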
According to the study, the proposed framework combines genetic algorithm-based hyperparameter optimisation with YOLO11 model training to improve both computational efficiency and detection accuracy. Hyperparameters such as learning rate, momentum, weight decay, and data augmentation settings were optimised to support deployment in low-latency, 6G-enabled metaverse environments.
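To illustrate the general idea of a genetic algorithm searching over training hyperparameters, here is a minimal sketch under stated assumptions. It is not the study's procedure: the search ranges in `BOUNDS`, the population settings, and the `toy_fitness` function are all hypothetical. In practice the fitness of each candidate would come from training YOLO11 with those settings and measuring validation accuracy, which is far too slow for a short example, so a cheap stand-in score is used instead.

```python
import random

# Hypothetical search ranges; the article does not give the actual ones.
BOUNDS = {
    "learning_rate": (1e-4, 1e-1),
    "momentum": (0.6, 0.98),
    "weight_decay": (0.0, 1e-3),
}


def random_individual(rng):
    """Sample one candidate hyperparameter set uniformly within BOUNDS."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def toy_fitness(ind):
    """Stand-in for 'train the model and return validation mAP'.
    Scores candidates by closeness to an arbitrary, made-up target."""
    target = {"learning_rate": 0.01, "momentum": 0.9, "weight_decay": 5e-4}
    return -sum(
        ((ind[k] - target[k]) / (hi - lo)) ** 2 for k, (lo, hi) in BOUNDS.items()
    )


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.3, scale=0.1):
    """Add Gaussian noise to some genes, clamped to the allowed range."""
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0, scale * (hi - lo))))
    return child


def evolve(fitness, generations=30, pop_size=20, elite=4, seed=0):
    """Keep the best `elite` candidates each generation and refill the
    population with mutated crossovers of those survivors."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(best)
```

Because the best candidates are carried over unchanged each generation, the top score can never get worse as the search proceeds; the cost is that every fitness evaluation in a real setting means a full or partial training run.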
The model was trained using the American Sign Language Letters Dataset and achieved strong performance metrics, including a precision score of 0.925, recall of 0.865, mAP50 of 0.9368, and mAP50-95 of 0.8065. The researchers said the system outperformed baseline YOLO11 models and earlier YOLO versions while maintaining lower computational requirements with just 2.59 million parameters, making it suitable for lightweight enterprise and metaverse devices.
The paper emphasises that sign language recognition in enterprise environments is not merely a technological innovation but an accessibility necessity. In hybrid and remote workplaces, communication barriers can significantly impact participation, collaboration, and training outcomes for hearing-impaired employees. By embedding gesture recognition capabilities directly into enterprise information systems, organisations could create more inclusive digital workspaces and training ecosystems.
The researchers noted, however, that the current model primarily focuses on isolated sign language letters rather than continuous signing sequences. Future work will involve integrating temporal modules, conducting real-time deployment testing, and evaluating latency in full-scale 6G-enabled environments.