VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluating instruction-following vision-language models for real-world use. Our starting point is curating 70 'instruction families' that we envision instruction-tuned vision-language models should be able to address. Extending beyond evaluations like...