Last week, Google launched its new AI, or rather its new large language model, dubbed Gemini. The Gemini 1.0 model comes in three versions: Gemini Nano is designed for on-device tasks, Gemini Pro is meant to be the best option for a wide range of tasks, and Gemini Ultra is Google’s largest language model, built to handle the most complex tasks you can give it.
Something Google was keen to highlight at the launch was that Gemini Ultra outperformed the latest version of OpenAI’s GPT-4 in 30 of the 32 benchmarks most commonly used to measure the capabilities of language models. The tests cover everything from reading comprehension and math problems to writing Python code and image analysis. In some of the tests, the difference between the two models was only a few tenths of a percentage point, while in others it was as much as ten percentage points.
Perhaps Gemini Ultra’s most impressive achievement, however, is that it is the first language model to beat human experts on the Massive Multitask Language Understanding (MMLU) benchmark, where Gemini Ultra and the experts face problem-solving tasks in 57 different fields, ranging from math and physics to medicine, law, and ethics. Gemini Ultra achieved a score of 90.0 percent, while the human expert baseline it was compared against “only” scored 89.8 percent.
The launch of Gemini will be gradual. Last week, Gemini Pro became available to the public as Google’s chatbot Bard started using a modified version of the language model, and Gemini Nano now powers a number of features on Google’s Pixel 8 Pro smartphone. Gemini Ultra isn’t ready for the public yet: Google says it is still undergoing safety testing and is only being shared with a handful of developers and partners, as well as experts in AI responsibility and safety. The plan, however, is to make Gemini Ultra available to the public via Bard Advanced when it launches early next year.
Microsoft has now countered Google’s claim that Gemini Ultra beats GPT-4 by running GPT-4 through the same tests again, this time with slightly modified prompts. In November, Microsoft researchers published work on something they call Medprompt, a combination of prompting strategies designed to coax better results out of a language model. You may have noticed how the answers you get from ChatGPT, or the images you get from Bing’s image creator, change when you reword your request slightly. That concept, taken much further, is the idea behind Medprompt.
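To make the idea concrete, here is a minimal, hypothetical sketch of Medprompt-style prompting: a few worked examples are prepended to the question, the model is asked to reason step by step, and the multiple-choice options are shuffled across several runs before a majority vote picks the final answer. The `query_model` function is a placeholder for whatever LLM API you actually use; this is an illustration of the general technique, not Microsoft’s implementation.

```python
import random
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a real call to your LLM of choice."""
    # Returning a fixed letter keeps the sketch self-contained and runnable.
    return "A"

def build_prompt(question: str, choices: list[str], examples: list[str]) -> str:
    # Few-shot examples first, then the question with (possibly shuffled) choices,
    # then an instruction to reason step by step before answering.
    lettered = [f"{chr(65 + i)}. {c}" for i, c in enumerate(choices)]
    return (
        "\n\n".join(examples)
        + f"\n\nQuestion: {question}\n"
        + "\n".join(lettered)
        + "\nThink step by step, then answer with a single letter."
    )

def medprompt_style_answer(question: str, choices: list[str],
                           examples: list[str], runs: int = 5) -> str:
    votes = Counter()
    for _ in range(runs):
        shuffled = random.sample(choices, k=len(choices))  # choice-shuffle ensembling
        letter = query_model(build_prompt(question, shuffled, examples)).strip()[:1]
        idx = ord(letter.upper()) - 65
        if 0 <= idx < len(shuffled):
            votes[shuffled[idx]] += 1  # map the letter back to the underlying choice
    return votes.most_common(1)[0][0]  # majority vote across runs

if __name__ == "__main__":
    q = "Which planet is closest to the Sun?"
    opts = ["Venus", "Mercury", "Earth", "Mars"]
    shots = ["Question: What is 2 + 2?\nA. 3\nB. 4\nThink step by step... Answer: B"]
    print(medprompt_style_answer(q, opts, shots))
```

The point of the shuffle-and-vote step is to cancel out the model’s bias toward particular answer positions, one of the cheap tricks that, combined with good few-shot examples, can noticeably move benchmark scores without retraining the model.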
Using Medprompt, Microsoft managed to make GPT-4 outperform Gemini Ultra on a number of the 30 tests Google previously highlighted, including MMLU, where GPT-4 with Medprompt-style prompts reached a score of 90.10 percent. Which language model will dominate in the future remains to be seen; the battle for the AI throne is far from over.
This article was translated from Swedish to English and originally appeared on pcforalla.se.
Author: Kristian Kask