Voice-controlled AI solution combines advanced vision and voice technologies

Renesas Electronics Corporation has announced the joint development, with Syntiant, of a voice-controlled multimodal AI solution that enables low-power, contactless operation for image processing in vision AI-based IoT and edge systems, including self-checkout machines, security cameras, video conference systems, and smart appliances such as robotic cleaning devices.

The new solution combines the company’s RZ/V Series vision AI MPU and the low-power multimodal, multi-feature Syntiant NDP120 Neural Decision Processor to provide advanced voice and image processing capabilities. The joint solution offers always-on functionality with quick voice-triggered activation from standby mode to implement object recognition, facial recognition, and other vision-based tasks that are crucial functions in security cameras and other systems. For example, while user-defined voice cues drive activation and system operation, vision AI recognition tracks operator behaviour and controls operation or issues a warning when suspicious actions are identified.

The multimodal architecture makes it simpler to produce contactless user experiences for vision AI-based systems. Using a dedicated, power-efficient chip for voice recognition reduces standby power consumption and also speeds system development, because the voice software can be developed independently of the vision AI functionality.

“We anticipate that demand for multimodal systems that use multiple streams of input information – both image and voice – will increase moving forward as a way to improve both ease of use and safety,” said Hiroto Nitta, senior vice president and head of SoC Business in the IoT and Infrastructure Business Unit at Renesas. “Through the collaboration between Renesas, a leader in low-power image AI technology, and Syntiant, a leader in voice AI technology, we will accelerate the adoption of low-power, ultra-small smart voice AI technology in embedded systems and deliver new combined solutions to customers globally.”

“Voice-based user interfaces will make it possible for customers to deliver new user experiences that bring the next generation of innovative ideas from concept to reality,” said Syntiant CEO Kurt Busch. “We’ve already shipped more than 15 million of our deep learning NDPs globally to enable always-on voice in a wide variety of consumer and industrial IoT applications. Our collaboration with Renesas delivers a powerful, low-power voice and image solution that is certain to accelerate traction among a global customer base in a variety of devices and use cases.”

The new voice-controlled multimodal AI solution employs multiple mutually compatible devices from the wider Renesas portfolio to provide customers with an elevated prototyping platform for faster time to market and reduced risk. The new solution is part of the company’s Winning Combinations, which offer compelling analog, power, and embedded processing product combinations that help customers accelerate their designs and get to market faster.