Modifying an existing sort order with offset-value codes

Abstract: Sorting in databases can exploit an existing sort order if it is related to the desired sort order. In some cases, an external merge sort can become multiple internal sorts; in other cases, sorting can become merging, i.e., merge sort without run generation, e.g., without an initial quicksort phase. If the input includes offset-value codes, they readily map to offset-value codes for the output, enabling efficient comparisons in merge steps and in subsequent query operations.
There are many cases in which the existing sort order and existing offset-value codes can help creating a desired sort order. While today’s database systems implement only the simplest cases, i.e., changing the sort order from 𝐴, 𝐵 to 𝐴 or from 𝐴 to 𝐴, 𝐵, interesting orderings and offset-value codes permit substantial savings in more complex cases such as changing an existing sort order of 𝐴, 𝐵,𝐶, 𝐷 to 𝐴,𝐶, 𝐵, 𝐷. Our research introduces the required techniques and measures achievable savings.

Bio: Goetz Graefe published surveys on database concurrency control, logging and recovery, indexing, sorting, query execution, and query optimization. Since graduating from the University of Wisconsin–Madison as an MS in 1984 and a PhD in 1987 with a thesis on extensible database query optimization in EXODUS, he invented the Exchange operator for parallel query execution as well as the Volcano and Cascades frameworks for database query optimization, watched as academia and industry widely copied and adopted these designs, received an ACM SIGMOD Test of Time award and the inaugural IEEE ICDE Influential Paper award for these inventions, served as the query processing architect of Microsoft SQL Server in its formative years, researched robust query performance, concurrency control, logging, and instant recovery as an HP Fellow at HP Labs, shared in an ACM Software Systems award, and received the 2017 ACM SIGMOD Edgar F. Codd Innovations award. He is now the Technical Lead of the F1 Query system widely used within Google.