News

Phi-4 and an rStar-Math paper suggest that compact, specialized models can provide powerful alternatives to the industry’s largest systems.
Driven by new technology called OpenAI o1, the chatbot can test various strategies and try to identify mistakes as it tackles complex tasks.
Telling AI model to “take a deep breath” causes math scores to soar in study. DeepMind used AI models to optimize their own prompts, with surprising results.
On the MATH benchmark of competition-level math word problems, for example, Meta's model posted a score of 73.8, compared to GPT-4o's 76.6 and Claude 3.5 Sonnet's 71.1.
AIME uses other AI models to evaluate a model’s performance, while MATH is a collection of word problems. QwQ-32B-Preview can solve logic puzzles and answer reasonably challenging math questions ...
A pilot professional-development program funded by the National Science Foundation introduces elementary school teachers to a method of advanced problem-solving.