I would consider any system that solves similar problems to be AGI. What I suspect will happen is that this benchmark will saturate long before any such system exists.
That's basically why they built this benchmark. By posing challenging, seemingly unfair questions, the system is forced to exhibit at least some generalisation ability that it previously did not possess.
When I look at the benchmark questions, they all seem to exploit the fact that LLMs are bad at composing subtasks. They might be able to solve each individual subtask, but not the combination of them.
Solving that would still be a far cry from AGI.
Because of benchmark leakage / contamination?