
Technology for speech and language recognition continues to evolve rapidly and is already used in devices such as Amazon’s Alexa and Apple’s Siri. As dialogue artificial intelligence (AI) systems develop, adding emotional intelligence is an important milestone: in addition to understanding users’ language, a system with this capability could recognize their emotional state and provide a more empathetic response, enhancing the user experience and strengthening the bond between human and machine. 

Multimodal sentiment analysis is a group of methods that give an AI dialogue system the capability to recognize sentiment. These methods automatically assess the speaker’s psychological state by analyzing tone of voice, facial expressions, and posture, all of which are essential for human-focused AI applications. As a result, we may be one step closer to an AI system with advanced capabilities that is able to analyze and understand the emotions of the person using it and produce an appropriate response. 
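As a rough illustration of how such a system might combine modalities, here is a minimal early-fusion sketch. All feature names, values, and weights below are invented for the example and are not taken from any actual system described in the article:

```python
# Minimal sketch of early-fusion multimodal sentiment scoring.
# Feature values and weights are illustrative placeholders.

def fuse_modalities(features: dict[str, list[float]]) -> list[float]:
    """Concatenate per-modality feature vectors (early fusion)."""
    fused = []
    for name in sorted(features):  # sorted for a deterministic order
        fused.extend(features[name])
    return fused

def sentiment_score(fused: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Simple linear scorer: a positive score reads as positive sentiment."""
    assert len(fused) == len(weights)
    return sum(f * w for f, w in zip(fused, weights)) + bias

# Hypothetical per-modality features for one dialogue turn
turn = {
    "voice_tone": [0.2, -0.1],  # e.g. pitch and energy deltas
    "facial":     [0.5],        # e.g. smile intensity
    "posture":    [-0.3],       # e.g. lean-back measure
}
fused = fuse_modalities(turn)
score = sentiment_score(fused, weights=[0.4, 0.1, 0.8, 0.2])
print(round(score, 3))  # → 0.31
```

A real system would learn the weights from labeled dialogue data rather than hand-set them; the point here is only the fusion step, where each modality contributes its own slice of the combined feature vector.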

Today, most emotional assessment methods analyze only observable data and do not take into account additional information, such as physiological signals. Researchers at the Japan Advanced Institute of Science and Technology and the Institute of Scientific and Industrial Research at Osaka University, Japan, have combined physiological signals with multimodal sentiment analysis to better understand human emotions. People can hide their emotions outwardly, but biological processes, such as heart rate, can reveal their true state. 

To determine how much users enjoyed conversations with an AI, the team analyzed approximately 2,500 exchanges of AI dialogue with 26 participants, using a data set called “Hazumi1911” that combines speech recognition, tone-of-voice sensing, facial expression and posture recognition, and physiological response recognition. The participants’ biological responses proved more valuable than those obtained through voice and face recognition, and by combining language information with biological signals, the AI achieved an assessment comparable to that of a human. 
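To illustrate how physiological signals could be folded into such an analysis, here is a hedged sketch that derives crude heart-rate features and appends them to hypothetical language features. The feature definitions and numbers are assumptions made for illustration and do not reflect the actual Hazumi1911 data set:

```python
# Sketch: extending language features with physiological signals
# (heart-rate-derived features here) before classification.
# All names and numbers are illustrative assumptions.

def heart_rate_features(rr_intervals_ms: list[float]) -> list[float]:
    """Crude physiological features: mean heart rate and RR-interval spread."""
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    bpm = 60000.0 / mean_rr  # beats per minute from mean RR interval
    var = sum((rr - mean_rr) ** 2 for rr in rr_intervals_ms) / len(rr_intervals_ms)
    return [bpm / 100.0, var ** 0.5 / 100.0]  # roughly rescaled to ~[0, 1]

# Hypothetical language features from the same dialogue turn
language = [0.6, -0.2]  # e.g. positive-word rate, hedging rate
physio = heart_rate_features([820.0, 790.0, 805.0, 830.0])
combined = language + physio  # simple early fusion of the two sources
print(len(combined))  # → 4
```

The intuition matches the study’s finding: even if facial and vocal cues are masked, signals like heart rate still carry information about the speaker’s state, so adding them to the fused vector gives the classifier something the observable modalities cannot hide.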

It can be concluded that identifying physiological signals in humans may pave the way for advanced dialogue AI systems that enhance the connection between human and machine.