Access Path Scientific Analysis Part III

Now that we have an understanding of how we can influence access paths using hints and session parameter changes, let’s continue our discussion by reviewing the various types of indexes as well as indexing strategies that affect Oracle access path selection. We’ll complete this series next week when we use all of the information we have learned to perform our own scientific analysis on Oracle optimization.

Introduction to Oracle Indexes
Generally, the fastest way to access Oracle data is with an index. The Oracle database contains several different indexing types that are designed to provide complementary performance functionality.

Cardinality vs Selectivity
You’ll hear a lot of technicians use the terms “cardinality” and “selectivity”.   You will also notice that a lot of people use the terms interchangeable (me included).   They aren’t interchangeable.  I’ll have to reread this bog to make sure I don’t!    Cardinality is the number of distinct values for a given column in a table.  A table with good cardinality would be a table that has many unique rows.  The Oracle 11G manual defines cardinality as “The ratio of distinct values to the number of table rows. A column with only two distinct values in a million-row table would have low cardinality.”   The 11G manual defines selectivity as “In a query, the measure of how many rows from a row set pass a predicate test”.   If the selectivity of your predicate is restrictive (a few number of rows are returned based on what you are searching for) it is said to have “good selectivity”.    So you could have skewed data that makes the same query have either high, or low, selectivity – based on the column value that you are searching for.  The query that searches for the value “FOOT” in the last name column (and “FOOT” occurs 10 times in the column being searched on), would have better selectivity than a query that searches for the last name of “SMITH” which occurs a thousand times in that column.

While standard B-tree indexes are most effective for columns containing a high number of different values (high cardinality), bitmapped indexes are most appropriate for columns with a relatively limited number (low cardinality) of different values. The low cardinality statement above comes with several caveats. Instead of spending this entire blog on the bitmap index/low selectivity issue, please turn to Julian Dyke’s presentation titled “Bitmap Index Internals”. One of my favorite blogs on bitmap indexes was written by Mark Rittman. Oracle also provides function-based indexes to allow index access using SQL that contains column manipulations in the WHERE clause. Administrators supporting large data stores use partitioned indexes to decompose large index structures into smaller, more manageable pieces called index partitions. Starting with the 8i release, Oracle places index data in the separate index partitions based on the index’s partitioning key.

Before we begin our reviewing some of the more popular Oracle index types, let me provide you with a few thoughts on indexes in general.

How Many Indexes Can I Build?
This subject has always been a matter for great debate. The DBA must balance the performance of SELECT statements with their DML (INSERT, UPDATE and DELETE) counterparts. SELECT statements that return a limited number of rows from a large table yet access non-indexed columns will suffer from poor performance. Conversely, if you have too many indexes on a particular table, DML statements may be adversely affected.

The DBA must take the business requirements, application processing workload and workload scheduling into consideration when determining how many indexes to build. If you compare the performance improvements an index makes on a SELECT statement to the negative affect it has on DML statements, you will find that the benefits of building the index usually far outweigh the performance drawbacks.

Indexes on columns in the WHERE clause of SELECT statements can reduce query times by minutes and even hours. The creation of additional indexes may add additional time to on-line transactions that execute DML statements. Additional indexes will have the greatest negative impact on DML statements that access a large number of rows. The more rows that are inserted, deleted or changed, the greater the negative impact will be. Traditionally, programs that process large volumes of rows are scheduled to execute during off-hours.

The DBA must also consider the needs of the business. What process is more important to the business unit – getting the data in or getting the data out? Who complains the most? Is it the business user that must wait minutes for their transaction or hours for their report to retrieve data or the business user that is waiting an extra few seconds for their update transaction to complete? Is the nightly batch window tight on time?

The DBA will need to find out how much time the additional indexes add to programs that process large volumes of rows. In addition, the DBA must determine when these programs run. If they run at night or do not require high-performance, consider building the index.

If the transaction update performance requirements are excessive (dot com applications are one example), keep the number of indexes to a minimum.

Index Monitoring
Determining if an index will increase performance is a pretty straightforward process. The administrator is focusing their tuning efforts on a particular query and is able to gather the specific information necessary to assist in the decision making process.

Dropping unused indexes is also an important part of application tuning. We learned previously that indexes force Oracle to occur additional I/O every time a row is inserted or deleted into the table they are built upon. Every update of the table’s columns incurs additional I/O to all indexes defined on those columns. Unused indexes also waste space and add unnecessary administrative complexity. Since unused indexes (excluding those used to enforce integrity constraints) do not add any benefits to the environment, why keep them?

Determining if indexes were being used in releases prior to Oracle9i was a time consuming and error-prone process. EXPLAIN plan and trace output could be used but there was no single mechanism that monitored index usage at the database level.

Starting with release 9i, the Oracle database simplifies the index usage monitoring process by providing the ALTER INDEX……… MONITOR USAGE command. The statement below turns monitoring on for the index SCOTT.EMPIDX while the second statement ends the monitoring session:

ALTER INDEX scott.empidx MONITORING USAGE;
ALTER INDEX scott.empidx NOMONITORING USAGE;

The V$OBJECT_USAGE table can then be accessed to determine if the index was used during the monitoring session. When the session is started, Oracle clears the information in V$OBJECT_USAGE for the index being monitored and enters a new start time identifying when the index monitoring session started. After the index monitoring session is concluded, the USED column in the V$OBJECT_USAGE table will contain the value ‘YES’ if the index was used during the monitoring session and the value ‘NO’ if it was not.

Parameters that Impact Index Usage
The parameters listed below influence the Oracle cost-based optimizer to favor or not favor index access. Please note that this is not an all inclusive list. It is a listing of parameters that I feel have the greatest chance to influence the optimizer to choose, or not choose, index access paths. We’ll use these parameters during our scientific analysis.

  • OPTIMIZER_MODE = first_rows or first_rows_nnnn – The optimizer chooses the best plan for fast delivery of the first few rows or the first nnnn rows. The first_rows_nnn replaces the first_rows parameter in later Oracle releases. The first_rows is available for backward compatibility. More often than not, that access path will include an index. This optimizer mode tends to favor nested loop joins over hash and merge scan.  It is important to note that using this mode is not a switch. It won’t change each and every table scan and hash join to index access and the nested loop join method. The optimizer, at times, will favor index access and nested loop joins. The reverse goes for the all_rows optimization mode below.
  • OPTIMIZER_MODE = all_rows – The optimizer chooses the best plan for fast delivery of all of the rows that queries return. The optimizer may decide to choose a full table scan over index access and hash joins instead of nested loop.
  • OPTIMIZER_INDEX_COST_ADJ = xxxx – This parameter lets you tune the optimizer to be more or less index “friendly.” It allows the administrators to influence the optimizer to make it more or less prone to selecting an index access path over a full table scan. Here’s an excellent series of blogs by Richard Foote on this parameter.    I have used this parameter a few times when I was freezing access paths using stored outlines.  More on that later.
  • OPTIMIZER_INDEX_CACHING = xxxx – You set this parameter to a value between 0 and 100 to indicate the percentage of the index blocks the optimizer should assume are in the cache. Setting this parameter to a higher value makes the indexes on the inner table of a nested loop joins look less expensive to the optimizer.  Richard Foote’s blog on this parameter is excellent.

Index Affects on Access Paths
If a table only contains a few hundred rows, queries may run faster if the optimizer chooses to read all of the blocks in the table as opposed to using an index. The I/O generated traversing the index blocks to get to the table row entries would be higher than if Oracle read just the blocks allocated to the table being accessed.

What if we access a larger table using a column with few unique values (poor cardinality)? As we learned previously, cardinality describes the number of different values stored in a column.  If our statement contains a WHERE clause that searches for a column value that is contained in 90% of the table’s rows, it is best that Oracle, once again, read each and every row in that table. Conversely if a WHERE clause searches for a column value that appears in 1% of the table rows, it would be beneficial for the optimizer to choose an index.

Here are some examples of index access paths that we will see during our testing:

  • Index only – Oracle is able to read all of the data required to satisfy the query’s data needs from the index structure alone.  An old DBA trick (from the mainframe days), is to add the columns that are used in the SELECT clause behind the columns in the index that are used in the WHERE clause.   The query you are attempting to tune will then stay within the index structure and not be required to access the table.
  • Index to table – Oracle searches through the index structure looking for the key value(s), then uses row identifiers to probe the table to satisfy the data request.
  • Index unique scans – The SQL statement accesses an index using a column (or columns) that are defined in a unique or primary key index with an equality condition.
  • Index range scans – Oracle scans a set of entries in the index to satisfy a query.
  • Index skip scans – Oracle is able to break down a multi-column index and view them as smaller subindexes. This is achieved by Oracle “skipping” the leading columns in the index and using columns that appear later in the index’s definition.
  • Full scans – Oracle scans all of the index entries. Kind of like a tablescan on an index. Oracle drops down to the leaf blocks and traverses the leaf blocks using the leaf pointers.
  • Fast full scan – Oracle uses multi-block reads to read both leaf and non-leaf blocks. Non-leaf (branch blocks) are discarded.

But indexes influence more than just index-related access paths. Indexes can also impact the type of join operations used. Here’s an “over the top” example to clarify. If you are joining two tables together in a SQL statement on join columns that have good selectivity (returns relatively few rows compared to the table size), Oracle will favor index access paths and a nested loop join. If you don’t have indexes on the join columns, Oracle may choose table scans using a hash join instead of the nested loop join method.

Index Types
Let’s continue our discussion by reviewing some of the more popular types of indexes: B-Tree, Bitmap and Function.

B-Tree Indexes
A traditional B-Tree index stores the key values and pointers in an inverted tree structure. The key to good B-Tree index performance is to build the index on columns having a lot of different values.  Oracle is able to quickly bypass rows that do not meet the search criteria when searching through indexes built on columns having a high degree of cardinality.

Bitmap Indexes
As I stated previously in this blog, there is a sense of confusion about bitmap indexes and the benefits they provide to columns with low cardinality. Instead of spending this entire blog on the bitmap index/low cardinality issue, please turn to Julian Dyke’s presentation titled “Bitmap Index Internals”.

The optimizer can be stubborn at times. It can be particularly stubborn when you want it to choose a single bitmapped index for an access path. A single bitmap index may not be chosen at all. The optimizer will be more inclined to choose bitmapped indexes as an access path if it can use multiple bitmapped indexes simultaneously. That’s where the benefits of bitmaps are realized.

Bitmap Indexes and Concurrency
Anyone accustomed to database programming understands the potential for concurrency problems. When one application program tries to update data that is in the process of being changed by another, the DBMS must sometimes forbid access until the modification is complete in order to ensure data integrity.

Each entry in a B-Tree index entry contains a single ROWID. When the index entry is locked during an update, a single row is affected. A bitmap lock affects a range of entries which could have a negative impact on other transactions attempting to update rows already locked.  Julian explains in his presentation that updates to Bitmap indexes in 10G may not be as expensive at they were in earlier database releases.

Locking issues affect data manipulation operations in Oracle. As a result, bitmapped indexes are not always appropriate for OLTP applications that have a high level of concurrent insert, update and delete operations. Concurrency is usually not an issue in a data warehousing environment where the data is maintained by bulk loads, inserts and updates.

Function Based Indexes
Oracle 8i solved an indexing problem that had been affecting database performance for close to a decade. Before Oracle8i, any SQL statement that contained a function or expression on the columns being searched on in the WHERE clause could not use an index.

For example, the statement:

SELECT * FROM employee_table
WHERE Upper(first_name) = ‘CHRIS’;

would not use an index. A full table scan would be required to retrieve the desired result set. We now know that we are able to use B-tree and bitmap indexes to speed query performance.

In Oracle8i and later releases, we are able to build both bitmap and B-tree indexes on columns containing the aforementioned functions or expressions. The following index could be used to increase performance of the query:

CREATE INDEX upper_first_name on employee_table (upper(first_name));

Function based indexes will affect the performance of DML statements that manipulate the columns contained in the function based index. The more complex of the expression used, the more time the database will require to update the index.

11G Invisible Indexes
11G introduced invisible indexes. They aren’t totally invisible as we can learn from Richard Foote’s blog on invisible indexes. Richard’s blog entry points you to another blog entry that contains a demo on invisible indexes. Please take a look at this information. You can see by the date that it is a bit old but the information is useful.  As I stated in an earlier blog, I will point you to websites that I feel can be trusted.  Richard Foote is the Oracle index expert and I visit his blog often.

I think that invisible indexes do provide benefits to DBAs who want to build the index but have the opportunity to “turn it off” and “turn it on”  when desired.

There are folks that spend dozens of hours learning about access paths and the affects that indexes, hints, statistics and parameters have on Oracle optimization.  The positions that I have held (except for my stint as an Oracle instructor), always pulled me in several non-technical directions that prevented me from spending as much time learning the internals as I would have liked.  I always seemed to be getting phone calls from managers asking me to make sure my staff had their new database built by the end of the day, getting everyone ready for the next big application turnover, running server consolidation/server deconsolidation, server migration, new application implementation projects.  That and all of the HR activities that surrounds DBA managers and DBA management.  There’s nothing like those phone calls from your local application department Vice President telling you “one of your DBAs is scaring my application developers”.

Wrapup
The intent of this blog was not to provide you with an all-inclusive education on Oracle indexes.  We are laying the groundwork to begin our own scientific analysis of Oracle access paths.   We want to begin testing our own hypothesis.    We can rely upon Richard and Jonathan and Mark to provide us with a base of information but the intent of this lesson is to learn for ourselves.  As I stated previously, we need to begin experimenting on our own to fully understand the Oracle optimization process. In my next blog, we’ll use the information we learned in this series to start our experimentation.
Thanks for reading,

Chris Foot
Oracle Aceace_2
Director Of Service Delivery

Share and Enjoy:
  • Digg
  • StumbleUpon
  • del.icio.us
  • LinkedIn
  • RSS
  • PDF
  • Google Bookmarks

You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

3 Comments »

 
  • Peter Moore Peter Moore says:

    I’ve always found that index monitoring is an idea that sounds a lot more useful than it actually is.

    In order to find out if an index is ever really used, you have to monitor it for a long period of time – just in case it’s one of those indexes that’s only used in weekly/monthly/annual reports or batch runs.

    However, if you analyze a table (or run any of the gather_stats routines) then any indexes attached to the table will be marked as ‘used’, even though they haven’t actually been used in a real sense. Which immediately invalidates the monitoring which was so carefully setup.

  • Narendra Narendra says:

    Chris,

    Thanks for yet another nice post. I hope you don’t get distracted too much by your non-technical activities. :)

  • Chris Foot Chris Foot says:

    You are exactly correct Peter. I need to add that to the post! That is a definite problem with index monitoring, stats will flip the “used switch”

 

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Spam Protection by WP-SpamFree