up:: π€ Artificial Intelligence
type:: #π
status:: #π/π
tags:: #on/ai
topics:: Artificial Super Intelligence (ASI), Artificial General Intelligence (AGI)
links:: ChatGPT
Multimodal AI
Multimodal AI combines various data modalities, such as text, images, and speech, within a single model.
Traditionally, deep learning models are trained on a single data source. For example, an NLP model is trained on text, a computer vision model is trained on an image dataset, and an acoustic model is trained on speech for tasks such as wake-word detection and noise cancellation. This kind of ML is single-modal AI, because the model's output maps to one data type: text, images, or speech.
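A minimal sketch of the single-modal idea (illustrative only, in PyTorch; the class names and sizes are assumptions, not from the source): each model consumes exactly one data type.

```python
# Illustrative sketch: two separate single-modal models, each tied to one data type.
import torch.nn as nn

class TextClassifier(nn.Module):
    """Single-modal NLP model: consumes only tokenized text."""
    def __init__(self, vocab_size=10_000, embed_dim=128, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)
        return self.fc(pooled)

class ImageClassifier(nn.Module):
    """Single-modal vision model: consumes only image tensors."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, images):                    # (batch, 3, H, W)
        features = self.conv(images).mean(dim=(2, 3))
        return self.fc(features)
```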
Multimodal AI, on the other hand, combines several modalities, such as text, images, and speech, to create systems closer to human perception. DALL-E from OpenAI is a recent example of multimodal AI that generates images from text prompts. Google's Multitask Unified Model (MUM) improves the search experience by ranking results using contextual information drawn from 75 different languages. Another example is NVIDIA's GauGAN2 model, which uses text-to-image generation to produce photorealistic images from text inputs.
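A minimal late-fusion sketch of the multimodal idea (illustrative only; not how DALL-E, MUM, or GauGAN2 are actually built): text and image features are encoded separately and then combined into one joint prediction.

```python
# Illustrative sketch: a simple multimodal classifier that fuses text and image features.
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, num_classes=5):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, embed_dim)
        self.image_conv = nn.Conv2d(3, embed_dim, kernel_size=3, padding=1)
        # Fusion layer: concatenated text + image features feed one shared head.
        self.fusion = nn.Linear(embed_dim * 2, num_classes)

    def forward(self, token_ids, images):
        text_feat = self.text_embed(token_ids).mean(dim=1)      # (batch, embed_dim)
        image_feat = self.image_conv(images).mean(dim=(2, 3))   # (batch, embed_dim)
        joint = torch.cat([text_feat, image_feat], dim=-1)
        return self.fusion(joint)

# Usage: a single forward pass takes both modalities at once.
model = MultimodalClassifier()
tokens = torch.randint(0, 10_000, (2, 12))
images = torch.randn(2, 3, 32, 32)
logits = model(tokens, images)                                   # (2, 5)
```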