Multimodal AI solution eases smart voice design in embedded vision systems

Update: August 4, 2021

Renesas Electronics Corp. and Syntiant Corp. have jointly developed a voice-controlled multimodal artificial intelligence (AI) solution that enables low-power contactless operation for image processing in vision AI-based IoT and edge systems. Applications for the multimodal AI solution include self-checkout machines, security cameras, video conference systems, and smart appliances such as robotic cleaning devices.

Delivering voice and image processing capabilities, the solution combines the Renesas RZ/V Series vision AI microprocessor unit (MPU) and the low-power multimodal Syntiant NDP120 Neural Decision Processor. The joint solution features always-on functionality with quick voice-triggered activation from standby mode to perform object recognition, facial recognition, and other vision-based tasks.

(Source: Renesas Electronics)

In one cited application example, user-defined voice cues drive activation and system operation, while vision AI recognition tracks operator behavior and either controls operation or issues a warning when suspicious actions are detected.
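
The flow described here, an always-on voice processor that wakes the vision side, which then watches for suspicious behavior, can be pictured as a small state machine. The Python sketch below is purely illustrative and is not based on any Renesas or Syntiant API; the class name, the phrases, and the "suspicious" label are all hypothetical stand-ins for whatever the real firmware would report.

```python
from enum import Enum, auto


class State(Enum):
    STANDBY = auto()        # only the voice processor is listening
    VISION_ACTIVE = auto()  # vision processor is awake and classifying frames


class MultimodalController:
    """Hypothetical controller; all names and events are assumptions."""

    def __init__(self, wake_word="start checkout", stop_word="stop"):
        self.wake_word = wake_word
        self.stop_word = stop_word
        self.state = State.STANDBY
        self.warnings = []

    def on_voice(self, phrase):
        """Handle a phrase recognized by the always-on voice processor."""
        if self.state is State.STANDBY and phrase == self.wake_word:
            self.state = State.VISION_ACTIVE
        elif self.state is State.VISION_ACTIVE and phrase == self.stop_word:
            self.state = State.STANDBY
        return self.state

    def on_vision(self, label):
        """Handle a label produced by the vision side for the current frame."""
        if self.state is not State.VISION_ACTIVE:
            return None  # vision side is asleep in standby, nothing to do
        if label == "suspicious":
            self.warnings.append(label)
            return "warn"
        return "ok"
```

In this sketch, vision events are ignored until the wake phrase arrives, which mirrors the idea that only the low-power voice path needs to run while the system is in standby.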


“The multimodal architecture makes it easier to create contactless user experiences for vision AI-based systems,” said Renesas. “Using a dedicated, power-efficient chip for voice recognition reduces standby power consumption while speeding up system development because it is possible to develop software independently of the vision AI functionality.”

The Renesas RZ/V Series MPU for vision AI incorporates the company’s dynamically reconfigurable processor-AI (DRP-AI) accelerator and combines high-precision AI inference with high power efficiency. This efficiency eliminates the need for thermal management devices such as heat sinks or cooling fans, reducing the bill of materials (BOM) cost and making it possible to integrate vision AI into a wide range of embedded applications, said Renesas.


The Syntiant NDP120 chip incorporates advanced AI capabilities that can be used to implement high-precision, hands-free voice functions, including speaker recognition, keyword detection, multiple wake words, and local command recognition. The NDP120, packaged with the Syntiant Core 2 neural network inference engine, can also run multiple applications simultaneously while limiting battery power consumption to 1 mW.

The voice-controlled multimodal AI solution uses multiple mutually compatible devices from Renesas’ portfolio and is part of the company’s Winning Combinations reference designs that feature analog, power, and embedded processing product combinations. The reference design for the multimodal AI solution is available now, including circuit diagrams and BOM lists.
