
SemBench: A Benchmark for Semantic Query Processing Engines

Jiale Lao
Andreas Zimmerer
Olga Ovcharenko
Tianji Cong
Matthew Russo
Gerardo Vitagliano
Michael Cochez
Fatma Özcan
Gautam Gupta
Thibaud Hottelier
H. V. Jagadish
Kris Kissel
Sebastian Schelter
Andreas Kipf
Immanuel Trummer

November 03, 2025

We present a benchmark targeting a novel class of systems: semantic query processing engines. These systems inherently rely on the generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with semantic operators, which are configured by natural language instructions and evaluated via LLMs, enabling users to perform various operations on multimodal data.
Our benchmark introduces diversity across three key dimensions: scenarios, modalities, and operators. Included are scenarios ranging from movie review analysis to medical question-answering. Within these scenarios, we cover different data modalities, including images, audio, and text. Finally, the queries involve a diverse set of operators, including semantic filters, joins, mappings, ranking, and classification operators.
We evaluate three academic systems (LOTUS, Palimpzest, and ThalamusDB) and one industrial system, Google BigQuery, on our benchmark. Although these results reflect a snapshot of systems under continuous development, our study offers crucial insights into their current strengths and weaknesses, illuminating promising directions for future research.
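
To make the notion of a semantic operator concrete, the following is a minimal sketch of a semantic filter, written under stated assumptions rather than taken from any of the evaluated systems. The names `sem_filter` and `stub_llm_judge` are hypothetical, and the judge is a keyword-based stand-in for an LLM call so that the example runs without a model. In a real engine, the natural language instruction would be sent to an LLM together with each row's content.

```python
from typing import Callable

Row = dict[str, str]


def stub_llm_judge(instruction: str, text: str) -> bool:
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    A real engine would pass `instruction` and `text` to a model and parse
    a yes/no answer. This stub ignores the instruction and only checks for
    a few positive words.
    """
    positive_words = {"great", "excellent", "loved"}
    return any(word in text.lower() for word in positive_words)


def sem_filter(
    rows: list[Row],
    column: str,
    instruction: str,
    judge: Callable[[str, str], bool] = stub_llm_judge,
) -> list[Row]:
    """Keep the rows whose `column` value satisfies the instruction,
    as decided by `judge`."""
    return [row for row in rows if judge(instruction, row[column])]


# Toy data loosely modeled on the movie review scenario.
reviews = [
    {"id": "1", "review": "I loved this movie, great cast."},
    {"id": "2", "review": "Dull plot and flat acting."},
    {"id": "3", "review": "An excellent soundtrack."},
]

kept = sem_filter(reviews, "review", "The review is positive.")
print([row["id"] for row in kept])
```

Passing the judge as a parameter keeps the relational part of the operator (scanning rows and keeping matches) separate from the model call, which is where cost and quality trade-offs of the kind a benchmark measures would arise.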