Researchers from Google and the University of California, Berkeley, have proposed a new method for scaling artificial intelligence (AI) called "inference-time search." This approach has a model generate many candidate answers to a single query and then select the best one. According to the researchers, the method improves model performance without any additional training. Some outside experts, however, have questioned how broadly the idea applies.
Traditionally, improving AI has meant training large language models (LLMs) on ever larger datasets and, more recently, spending more computing power at inference time. Both have become standard practice at leading AI laboratories. The newly proposed method takes the second route: it generates numerous possible answers and selects the optimal one. TechCrunch notes that this technique can significantly improve response accuracy, even with smaller or older models.
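The loop described above — sample many candidates, score each one, keep the best — can be sketched in a few lines of Python. This is an illustrative stand-in, not the researchers' actual implementation: `toy_model`, `toy_verifier`, and `best_of_n` are hypothetical names, and the toy verifier assumes the answer can be checked exactly, which (as the critics below point out) is the method's key requirement.

```python
import random

def best_of_n(model, verifier, prompt, n=200):
    """Draw n independent candidate answers and return the one
    the verifier scores highest (ties go to the first candidate)."""
    candidates = [model(prompt) for _ in range(n)]
    return max(candidates, key=verifier)

# Toy stand-ins: a "model" that is right only 30% of the time,
# and a verifier that can check correctness exactly.
random.seed(0)

def toy_model(prompt):
    # Returns the correct answer (42) with probability 0.3,
    # otherwise a random wrong answer.
    return 42 if random.random() < 0.3 else random.randint(0, 41)

def toy_verifier(answer):
    # Scores 1.0 for the known-correct answer, 0.0 otherwise.
    return 1.0 if answer == 42 else 0.0

print(best_of_n(toy_model, toy_verifier, "what is 6 * 7?", n=200))
```

With 200 samples, an individually unreliable model almost certainly produces at least one correct candidate, and the verifier's job is simply to recognize it — which mirrors the 200-answer figure the paper's authors cite.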
As an example, researchers cited the Gemini 1.5 Pro model, released by Google in early 2024. According to them, when using inference-time search, Gemini 1.5 Pro outperformed OpenAI’s o1-preview model on math and science tests. Eric Zhao, one of the paper’s authors, highlighted: “By simply randomly selecting 200 answers and checking them, Gemini 1.5 clearly outperforms o1-preview and even approaches o1.”
Experts Question Practicality of the Approach
Despite the promising results, some experts remain skeptical. They argue that the improvement seen is not groundbreaking and that the method has significant limitations. Matthew Guzdial, an AI researcher at the University of Alberta, pointed out that the approach only works in scenarios where the correctness of an answer can be clearly verified. For many real-world problems, he believes, this is simply not feasible.
Similarly, Mike Cook, a researcher at King's College London, remarked that the method does not enhance AI's reasoning capabilities but merely works around existing weaknesses. He explained: "If a model is wrong 5% of the time, then by testing 200 options, those errors will just become more noticeable." The core issue, he noted, is that the approach does not make AI models fundamentally smarter; it spends extra computational effort to filter for the best answer. In practical applications, that can become too costly to be worthwhile.
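Cook's point reduces to simple arithmetic, using the 5% error rate and 200 samples from the article. The figures below are a back-of-the-envelope sketch, assuming errors are independent across samples:

```python
# Sampling does not fix an unreliable model; it surfaces more of its
# errors for a verifier to catch.
error_rate = 0.05   # model wrong 5% of the time (figure from the article)
samples = 200       # candidates drawn per query (figure from the article)

# On average, 10 of the 200 candidates are wrong and must be filtered out.
expected_errors = error_rate * samples
print(expected_errors)  # 10.0

# The chance that all 200 candidates happen to be correct is tiny,
# so a reliable verifier is doing essentially all of the work.
p_all_correct = (1 - error_rate) ** samples
print(p_all_correct)
```

This is why the critics stress verifiability: the approach only pays off when checking an answer is much cheaper and more reliable than producing one.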
Nevertheless, the search for more scalable and efficient AI solutions continues, notes NIX Solutions. As models demand ever more computing resources, researchers remain focused on finding new ways to improve AI reasoning without unnecessary expense. We'll keep you updated as more advances in this area emerge.