Alibaba Unveils Two Open-Sourced AI Models that Understand Images

Coinspeaker

Chinese technology behemoth Alibaba Group is pushing the boundaries of artificial intelligence (AI) by introducing two open-source large vision language models (LVLMs). The company said the two AI tools, Qwen-VL and Qwen-VL-Chat, can understand images and respond to complex queries better than its other models.

The company’s cloud unit, Alibaba Cloud, developed and trained both models. According to reports, the firm said that Qwen-VL was designed as a more sophisticated offshoot of its 7-billion-parameter model, Tongyi Qianwen. The model can process image and text prompts together, and its capabilities range from answering open-ended questions about diverse images to generating image captions.

Qwen-VL-Chat, on the other hand, was designed to handle more intricate interactions. Powered by advanced alignment techniques, the model can compose poetry and narratives based on input images, summarize the content of multiple pictures, and even solve complex mathematical questions embedded within images.

Alibaba Exploring AI Capabilities

These two technologies are poised to redefine the landscape of AI capabilities, offering a remarkable fusion of image comprehension and text interaction in English and Chinese.

The company said the Qwen-VL model was trained on both image and text data. According to Alibaba, it can handle larger images (448×448 resolution, four times as many pixels) than similar models that can only work with small images (224×224 resolution).

The AI technology also showed impressive abilities in tasks involving pictures and language during training. Alibaba disclosed that the AI tool could describe photos without prior information, answer questions about pictures, and even detect objects in images.

The second model, Qwen-VL-Chat, also showcased its skills in conversations about pictures. According to the company, the model performed exceptionally well in both Chinese and English on a benchmark test devised by Alibaba Cloud.

Like the first model, Qwen-VL-Chat outperformed other AI tools in understanding and discussing the relationship between words and images. The benchmark comprised more than 300 photographs and 800 questions across 27 categories.

Commitment to Open-Source Technologies

Alibaba revealed its intention to provide the two AI models as open-source solutions to the global community. Once the preparations are concluded, these tools will be freely available to anyone worldwide. The move allows the development of AI applications without the need for extensive system training, resulting in reduced expenses.

Earlier this month, the company made waves by open-sourcing two of its other AI models, Qwen-7B and Qwen-7B-Chat. The move attracted many developers, with the two models recording over 400,000 downloads combined within a month of their unveiling.
