Mistral Announces Pixtral 12B Multimodal AI Model with ‘Computer Vision’ Feature


Mistral on Wednesday released its first multimodal artificial intelligence (AI) model, the Pixtral 12B. The AI firm, known for its open-source large language models (LLMs), has also made the model available on GitHub and Hugging Face for users to download and test. Notably, despite being multimodal, Pixtral can only process images and answer questions about them using computer vision technology. Two dedicated components, a vision encoder and a vision adapter, have been added for this functionality. It cannot generate images the way text-to-image tools such as Stable Diffusion or Midjourney do.

Mistral Releases Pixtral 12B

Known for its minimalist announcements, Mistral released the AI model through its official account on X (formerly known as Twitter), sharing a magnet link in a post. The total file size of Pixtral 12B is 24GB, and running the model will require an NPU-equipped PC or a PC with a powerful GPU.
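The 24GB figure lines up with a simple back-of-the-envelope estimate. The short Python sketch below multiplies the reported parameter count by the storage size of each weight. Note that the 16-bit precision is an assumption made for illustration (Mistral's post does not state it), and the helper name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw model weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# 12 billion parameters, as reported for Pixtral 12B.
PARAMS = 12_000_000_000

# Assumed storage precisions, for illustration only.
for label, nbytes in [("32-bit", 4), ("16-bit", 2), ("8-bit", 1)]:
    print(f"{label} weights: {weight_size_gb(PARAMS, nbytes):.0f} GB")
```

Under the 16-bit assumption, the estimate comes to 24GB, which matches the reported download size. Actual memory use while running the model would be higher, since activations and other runtime buffers also need space.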

Pixtral 12B comes with 12 billion parameters and is built on the company’s existing Nemo 12B AI model. Mistral highlights that the model uses a Gaussian Error Linear Unit (GeLU) activation in the vision adapter and 2D Rotary Position Embedding (RoPE) in the vision encoder.
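For readers unfamiliar with the two terms, the Python sketch below shows the textbook form of each. It is a generic illustration, not Mistral's implementation: the function names, the base of 10000, and the choice to split the feature vector into a row half and a column half are all assumptions.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x multiplied by the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive pairs of features by an angle that depends on pos."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def rope_2d(vec, row, col):
    """Assumed 2D variant: the first half of the features encodes the patch
    row, the second half encodes the patch column.
    len(vec) must be divisible by 4."""
    half = len(vec) // 2
    return rope_1d(vec[:half], row) + rope_1d(vec[half:], col)


patch = [1.0, 0.0, 0.5, -0.5, 0.25, 0.75, -1.0, 2.0]
print(gelu(1.0))
print(rope_2d(patch, row=3, col=5))
```

Because each pair of features is only rotated, the length of the vector does not change. Only the relative angles shift, which is how an image patch's position on the 2D grid can be encoded.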

Specifically, users can upload image files or URLs to Pixtral 12B and it should be able to answer questions about the image such as identifying objects, counting the number of objects, and sharing additional information. Since it is built on Nemo, this model will be able to perform all common text-based tasks as well.

A Reddit user posted an image of Pixtral 12B’s benchmark scores, and it appears that the model outperforms Claude 3 Haiku and Phi-3 Vision in multimodal capabilities on the ChartQA benchmark. It also outperformed both rival AI models on the Massive Multitask Language Understanding (MMLU) benchmark, which tests knowledge and reasoning.

The Mistral AI model can be fine-tuned and used under the Apache 2.0 license, TechCrunch reports, citing a company spokesperson. This means that the output of the model can be used for personal or commercial purposes without any restrictions. Additionally, Sophia Yang, head of developer relations at Mistral, clarified in a post that Pixtral 12B will soon be available on Le Chat and La Plateforme.

For now, users can download the model directly using the magnet link provided by the company. Alternatively, the model weights are also hosted on Hugging Face and GitHub.


