Новая модель Qwen3-VL-32B показала результаты, сравнимые с Gemini 2.5 Flash
Daily Papers - Hugging Face
new
Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
Subscribe
90% for humans, yet surpass the widely used GPT-4o (59%). The best performing open-source model Qwen3-VL-32B achieves similar accuracies as Gemini 2.5 Flash (64%). We also show that MMRB2 performance strongly correlates with downstream task success using Best-of-N sampling and conduct an in-depth analysis that shows key areas to improve the reward models goi
Читать источник →