Articles, Papers, Podcasts, Webcasts

Query Optimization

Techniques
Classic
Cost-based
Magic sets
Move-around
Parallel
Plans
Primer
Similarity
Summary

DB2
Statistics

Oracle
Real world

SQL Server
T-SQL

Sponsor Links

Query optimization is an important skill for SQL developers and database administrators (DBAs). In order to improve the performance of SQL queries, developers and DBAs need to understand the query optimizer and the techniques it uses to select an access path and prepare a query execution plan. Query tuning involves a knowledge of techniques such as cost-based and heuristic-based optimizers, plus the tools an SQL platform provides for explaining a query execution plan.

Featured Links

Issues in Real-World Query Optimization and Processing
Håkan Jakobsson (Oracle).

Notes from a guest lecture at Stanford.

Dr. Daniela Florescu discusses query optimization in a video interview (running time 9:05). Video and MP3 / podcast formats.

Generic Database Cost Models for Hierarchical Memory Systems
Stefan Manegold, Peter Boncz, Martin L. Kersten

Estimating query cost for cascading cache memory and main-memory databases is different than estimating disk i/o to access indexes and data pages. This VLDB paper explores the issues.

From the abstract: In this article, we propose a generic technique to create accurate cost functions for database operations. We identify a few basic memory access patterns and provide cost functions that estimate their access costs for each level of the memory hierarchy. The cost functions are parameterized to accommodate various hardware characteristics appropriately. Combining the basic patterns, we can describe the memory access patterns of database operations. The cost functions of database operations can automatically be derived by combining the basic patterns’ cost functions accordingly.

Cost-Based Optimization for Magic: Algebra and Implementation
Praveen Seshadri et al

Abstract: Magic sets rewriting is a well-known optimization heuristic for complex decision-support queries. There can be many variants of this rewriting even for a single query, which differ greatly in execution performance. We propose cost-based techniques for selecting an efficient variant from the many choices.
Our first contribution is a practical scheme that models magic sets rewriting as a special join method that can be added to any cost-based optimizer....
Our second contribution is a formal algebraic model of magic sets rewriting, based on an extension of the multiset relational algebra, which cleanly defines the search space and can be used in a rule-based optimizer.

Getting Started with Oracle Query Optimization
This useful primer uses examples based on a university database.

Tuning Microsoft SQL Server queries is an excerpt from Transact-SQL Programming by Kevin Kline, Andrew Zanevsky, Lee Gould.

An Overview of Query Optimization in Relational Systems by Surajit Chaudhuri of Microsoft Research discusses classic optimization techniques.

Plan Selection Based on Query Clustering
Antara Ghosh, Jignashu Parikh, Vibhuti Sengar, Jayant Haritsa (Indian Institute of Science)

This is a VLDB 2002 presentation about query plan caching and recycling. One retrieval technique is to use exact matches. Another is to use similarity searching, as implemented by PLASTIC (PLAN Selection Through Incremental Clustering).

Query Optimization in Oracle9i
This white paper is a detailed examination of Oracle query optimization.

User-Optimizer Communication using Abstract Plans in Sybase ASE

SQL Query Optimization
This document summarizes techniques from Readings in Database Systems, 3rd Edition (edited by Stonebraker and Hellerstein)

A TPC-D Model for Database Query Optimization in Cascades

This is an interesting bit of research funded by the NSF and DARPA.

Abstract: This work describes a model (called Model D) for optimizing the TPC-D queries developed under the Cascades Optimizer Framework. Cascades is an extensible framework for building rule-based, top-down database query optimizers.

Model D is limited to a subset of SQL sufficient for representing the TPC-D queries, with a very simple physical model (no parallelism or buffer management), a limited number of physical operators and a limited search space. The 17 TPC-D queries were hand parsed into a logical algebra query tree and submitted to the optimizer.

Using DB2 Statistics for optimization (IBM article).

Optimization of SQL Queries for Parallel Machines
Waqar Hasan

This is a doctoral thesis often cited in research about parallel query processing.

Recent Advances in Query Optimization
S. Sudarshan, IIT

This is an informative tutorial that discusses query rewriting, parameterized query optimization, data cube queries, materialized views, query caching and other optimization techniques.

Expressing and Optimizing Similarity-based Queries in SQL discusses an approach (similarity search) that's also at the heart of the WHIRL search engine from AT&T Research.

A Formal Perspective on the View Selection Problem
Rada Chirkova , Alon Halevy , Dan Suciu

Abstract: The view selection problem is to choose a set of views to materialize over a database schema, such that the cost of evaluating a set of workload queries is minimized and such that the views fit into a pre-specified storage constraint. The two main applications of the view selection problem are materializing views in a database to speed up query processing, and selecting views to materialize in a data warehouse to answer decision support queries. In addition, view selection is a core problem for intelligent data placement over a wide-area network for data integration applications and data management for ubiquitous computing. We describe several fundamental results concerning the view selection problem. We consider the problem for views and workloads that consist of equality-selection, project and join queries, and show that the complexity of the problem depends crucially on the quality of the estimates that a query optimizer has on the size of the views it is considering to materialize. When a query optimizer has good estimates of the sizes of the views, we show that an optimal choice of views may involve a number of views that is exponential in the size of the database schema. On the other hand, when an optimizer uses standard estimation heuristics, we show that the number of necessary views and the expression size of each view are polynomially bounded.

Query Optimization by Predicate Move-Around
Alon Y. Levy, Inderpal Singh Mumick, Yehoshua Sagiv

Abstract: A new type of optimization for SQL queries, called predicate move-around, is introduced. It is shown how this optimization considerably improves the efficiency of evaluating SQL queries, especially queries that have query graphs with a large number of query blocks (which is a typical situation when queries are defined in terms of multiple views and sub-queries). Predicate move-around is a generalization of and has many advantages over the traditional predicate pushdown. One key advantage is that predicate move-around precedes pushdown by pulling predicates up the query graph. As a result, predicates that appear in the query in one part of the graph can be moved around the graph and applied also in other parts of graph. Moreover, predicate move-around optimization can move a wider class of predicates in a wider class of queries as compared to the standard predicate-pushdown techniques. In addition to the usual comparison and arithmetic predicates, other predicates that can be moved around are the EXISTS and NOT EXISTS clauses, the EXCEPT clause, and functional dependencies. The proposed optimization can also move predicates through aggregation and across query blocks (in the query graph) that cannot be merged into one block. Moreover, the method can also infer new predicates when existing predicates are moved through aggregation or when certain functional dependencies are known to hold. Finally, the predicate move-around algorithm is easy to implement on top of existing query optimizers.

Application of a Hardware Synthesis Technique for Database Query Optimization
Vivek Komaragiri, Mitchell A. Thornton (Mississippi State University), Rolf Drechsler (Siemens AG)

Abstract: The size of a typical modern database is on the order of hundreds of gigabytes (GB) and the need for developing systems for processing such huge amounts of data has emerged. A highly efficient information retrieval method is required to make the process quicker and easier. This paper focuses on this aspect and presents a method to optimize an SQL query for efficient information retrieval. It is shown that an SQL query can be optimized using AND/OR graphs.