New Theories / Re: How close are we from building a virtual universe?
« on: Today at 03:48:43 »
Grok Vision - First Multimodal Model from XAi
Grok 1.5 Vision Shows STUNNING Performance | Beats GPT-4, Claude and Gemini 1.5
Quote
X.ai just announced Grok-1.5 Vision. It's their new multimodal model that can understand images and write code from flow diagrams, just like GPT-4.
Quote
GROK: The universe is a dynamic system, so an accurate virtual universe must also be dynamic, i.e. change with time to reflect the real universe. Such an accurate, dynamic virtual universe must be able to understand the information it gets from its sensors and other inputs. The RealWorldQA benchmark is a way forward.
https://x.ai/blog/grok-1.5v
Introducing Grok-1.5V, our first-generation multimodal model. In addition to its strong text capabilities, Grok can now process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs. Grok-1.5V will be available soon to our early testers and existing Grok users.
Capabilities
Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs. We are particularly excited about Grok's capabilities in understanding our physical world. Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding. For all datasets below, we evaluate Grok in a zero-shot setting without chain-of-thought prompting.
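To make the evaluation setup concrete: "zero-shot without chain-of-thought" means each benchmark question is posed once, with no worked examples in the prompt and no "think step by step" instruction. Here is a minimal sketch of that kind of scoring loop; the function names and toy data are illustrative assumptions, not x.ai's actual evaluation code or API.

```python
# Hypothetical sketch of zero-shot multiple-choice scoring.
# `model_answer` stands in for a call to some multimodal model;
# it is an assumed interface, not a real x.ai API.

def score_zero_shot(examples, model_answer):
    """Score a model zero-shot: one question in, one answer out,
    with no few-shot examples and no chain-of-thought prompting."""
    correct = 0
    for ex in examples:
        # The prompt contains only the question and its options --
        # nothing that demonstrates or elicits step-by-step reasoning.
        prompt = ex["question"] + "\nOptions: " + ", ".join(ex["options"])
        if model_answer(prompt) == ex["answer"]:
            correct += 1
    return correct / len(examples)

# Toy stand-in for a spatial-understanding benchmark, for illustration.
toy_examples = [
    {"question": "Which side of the road is the car on?",
     "options": ["left", "right"], "answer": "left"},
    {"question": "Is the traffic light red?",
     "options": ["yes", "no"], "answer": "yes"},
]
toy_model = lambda prompt: "left" if "car" in prompt else "yes"
accuracy = score_zero_shot(toy_examples, toy_model)
```

On this toy set the stand-in model answers both questions correctly, so `accuracy` is 1.0; a real benchmark run would swap in actual image-question pairs and a real model call.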