Saturday, July 28, 2012

Physical data distribution

Tuning application queries is one of the biggest challenges for DBAs supporting production environments. The lack of knowledge about the application data model makes the tuning difficult. I remember when I worked as a development DBA: based on my knowledge of the application design, I could tell whether an access path was optimal just by looking at the explain plan. It was easy to identify how a particular query should run.

 

Without application knowledge, improvements may still be obtained by considering the table sizes and the number of rows that can be eliminated as early as possible in the query, based on the join and filter conditions. When tables are accessed in the expected order, rows are eliminated early and the query generates no more I/O operations (physical or logical) than necessary. It is pretty common to see problems created by just a few SQL statements whose non-optimal execution plans generate so much unnecessary physical I/O that the overall performance of the database is degraded.

 

Tuning OLTP queries is normally less complicated than tuning queries running on data warehouse systems, as the queries tend to be smaller and simpler. Last week I was working on a query used to show offers to customers contacting the call center. Most of the time (99%) the query runs very fast, in less than 100 milliseconds. But some executions take minutes, raising the average execution time for the query to 20 seconds; I mention the average only to give a better idea of how far apart the execution times are.

 

The execution time is related to the number of rows accessed by the query: fast executions access a few rows, while the slow ones access a few thousand. The query joins 7 tables, but only 3 are really relevant: one because it is the most selective table (it is the driving table), and the other two because they hold most of the records accessed by the query, making them the top contributors to the high execution times. This is the simplified query I am going to use to show how rows that are physically unordered (relative to the join predicates) can impact response time due to the high number of I/O operations:

 

SELECT /*+ ALL_ROWS */
       ... -- 1 indexed column from T1, 5 columns from T2, and 14 columns from T7
  FROM T1,
       T2,
       T7
 WHERE T1.UID = :B1
   AND T2.PER_ID = T1.ROW_ID
   AND T7.ID = T2.ROW_ID
   AND T7.FLAG = 'Y'

 

Predicate                       Rows
access(T1.UID='4850')              1
access(T2.PER_ID=T1.ROW_ID)    4,973
access(T7.ID=T2.ROW_ID)        4,893
filter(T7.FLAG='Y')                3
access(T3.ROW_ID=T7.ROW_ID)        3

 

The query was taking 41 seconds.

 

I/O Statistics
Consistent gets    19,281
Physical reads      7,585
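The post does not say how the per-predicate row counts and I/O figures above were captured; one common way to obtain them (an assumption on my part, not necessarily the method used here) is to run the statement with the GATHER_PLAN_STATISTICS hint and then display the runtime statistics for the cursor:

SELECT /*+ ALL_ROWS GATHER_PLAN_STATISTICS */
       ... -- same column list as the simplified query above
  FROM T1,
       T2,
       T7
 WHERE T1.UID = :B1
   AND T2.PER_ID = T1.ROW_ID
   AND T7.ID = T2.ROW_ID
   AND T7.FLAG = 'Y';

-- In the same session, show actual rows and buffer gets for each plan step:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));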

 

The following table shows the number of blocks that contain at least one row accessed by the query, before and after reorganizing the physical order of the rows in tables T2 and T7.

 

Table    Rows     Before (blocks)    After (blocks)
T2      4,973               4,426                75
T7      4,893                 120                23
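Block counts like the ones above can be derived from the ROWIDs of the rows involved. A minimal sketch for T2, using the standard DBMS_ROWID package and the literal '4850' from the access predicate shown earlier:

-- Count the distinct (file, block) pairs holding the sample-case rows in T2.
SELECT COUNT(DISTINCT DBMS_ROWID.ROWID_RELATIVE_FNO(T2.ROWID) || '.' ||
                      DBMS_ROWID.ROWID_BLOCK_NUMBER(T2.ROWID)) AS blocks
  FROM T2
 WHERE T2.PER_ID IN (SELECT T1.ROW_ID FROM T1 WHERE T1.UID = '4850');

The same count for T7 would use T7.ROWID with the corresponding join through T2.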

 

The reorganization was pretty simple, as I was working only with the records related to a sample case rather than reorganizing the entire tables (a SQL sketch of the steps follows the list):

 

- Created a temporary table with the relevant records;
- Deleted the records from the current table;
- Inserted the records back using a select ordered by the predicate column (PER_ID for T2 and ID for T7).
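Here is a sketch of those steps for T2, assuming a scratch table name (TMP_T2 is my own placeholder, not from the original post) and the sample case identified by T1.UID = '4850':

-- 1. Keep a copy of the sample-case rows.
CREATE TABLE TMP_T2 AS
SELECT * FROM T2
 WHERE PER_ID IN (SELECT ROW_ID FROM T1 WHERE T1.UID = '4850');

-- 2. Delete them from the original table.
DELETE FROM T2
 WHERE PER_ID IN (SELECT ROW_ID FROM T1 WHERE T1.UID = '4850');

-- 3. Insert them back physically ordered by the join predicate column.
INSERT INTO T2
SELECT * FROM TMP_T2
 ORDER BY PER_ID;

COMMIT;

T7 would follow the same pattern, ordered by ID.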

 

After reorganizing tables T2 and T7, the query was taking 4.05 seconds.

 

I/O Statistics
Consistent gets    11,357
Physical reads      3,133

 

Analyzing the access path, it is possible to reduce the number of I/O operations a little more. Oracle is using an index on column ID to access table T7 when joining it with T2. The column FLAG, used as a filter condition, is not part of the index, so Oracle needs to visit the table in order to check whether the FLAG value satisfies the filter condition (T7.FLAG='Y'). Adding the column FLAG to the index allows Oracle to check the condition without accessing the table. After adding the FLAG column to the index (a sketch of the change follows the statistics), these are the elapsed time and statistics for the query:

 

Elapsed time: 2.44 seconds

 

I/O Statistics
Consistent gets    10,081
Physical reads      2,819
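For illustration, the index change might look like the following; the index names are assumptions, as the original post does not give them:

-- Replace the single-column index on T7(ID) with one that also covers FLAG,
-- so the filter can be evaluated in the index without visiting the table.
DROP INDEX T7_ID_IDX;

CREATE INDEX T7_ID_FLAG_IDX ON T7 (ID, FLAG);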

 

Keeping the physical order of rows aligned with a specific logical order can be a nightmare. When the application inserts data through batch or bulk loads, it may be simple to keep the physical order of rows aligned with a logical order (an index). But normally data is inserted at different times and without any order, as a result of online processing. It would be necessary to schedule table reorganizations on a regular basis to keep the physical order close to the logical order; I support databases with tables measured in hundreds of gigabytes, holding active (not historical) data and accessed 24x7x365, where reorganization is simply not feasible.

 

Concerning other access paths (for example, T2 was reorganized by PER_ID, but it has several other indexes used to satisfy different queries), the reorganization approach might decrease the efficiency of table access through those other indexes. In this specific case of table T2, where the data was spread over thousands of blocks, the reorganization concentrated the rows into just a few hundred, improving all access paths.

 

Besides, this is a RAC database (for the sake of sanity I ran all tests on the same node, and yes, I used the Test environment :o), so the reorganization approach also has a good chance of increasing interconnect traffic, as the rows are grouped into fewer (hot) blocks.

 

Bottom line: there is no magic. A bad design, or a design that does not easily reflect the business requirements, will cost money to fix, either to re-engineer the application or to acquire more powerful hardware.
