Cardinality in Oracle
Cardinality means the number of something but it gets used in a variety of contexts.
It depends a bit on context.
- When you’re building a data model, cardinality often refers to the number of rows in table A that relate to table B.That is, are there 1 row in B for every row in A (1:1), are there N rows in B for every row in A (1:N), are there M rows in B for every N rows in A (N:M), etc.
- When you are looking at things like whether it would be more efficient to use a b*-tree index or a bitmap index or how selective a predicate is, cardinality refers to the number of distinct values in a particular column.If you have a
PERSON
table, for example,GENDER
is likely to be a very low cardinality column (there are probably only two values inGENDER
) whilePERSON_ID
is likely to be a very high cardinality column (every row will have a different value).
- When you are looking at query plans, cardinality refers to the number of rows that are expected to be returned from a particular operation.
There are probably other situations where people talk about cardinality using a different context and mean something else.
Types Of Index in Oracle
There are two types of indexe in oracle
- BitMap (for low cardinality data)
- B-Tree (for high cardinality data)
Cardinality On Index Choice
The advantages of using bitmap indexes are greatest for columns in which the ratio of the number of distinct values to the number of rows in the table is small. We refer to this ratio as the degree of cardinality. A gender column, which has only two distinct values (male and female), is optimal for a bitmap index.
For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A bitmap index on this column can outperform a B-tree index, particularly when this column is often queried in conjunction with other indexed columns.
B-tree indexes are most effective for high-cardinality data: that is, for data with many possible values, such as customer_name
or phone_number
. In a data warehouse, B-tree indexes should be used only for unique columns or other columns with very high cardinalities (that is, columns that are almost unique). The majority of indexes in a data warehouse should be bitmap indexes.
In ad hoc queries and similar situations, bitmap indexes can dramatically improve query performance. AND
and OR
conditions in the WHERE
clause of a query can be resolved quickly by performing the corresponding Boolean operations directly on the bitmaps before converting the resulting bitmap to rowids. If the resulting number of rows is small, the query can be answered quickly without resorting to a full table scan.
Example : Bitmap Index
The following shows a portion of a company’s customers
table.
SELECT cust_id, cust_gender, cust_marital_status, cust_income_level FROM customers; CUST_ID C CUST_MARITAL_STATUS CUST_INCOME_LEVEL ---------- - -------------------- --------------------- ... 70 F D: 70,000 - 89,999 80 F married H: 150,000 - 169,999 90 M single H: 150,000 - 169,999 100 F I: 170,000 - 189,999 110 F married C: 50,000 - 69,999 120 M single F: 110,000 - 129,999 130 M J: 190,000 - 249,999 140 M married G: 130,000 - 149,999 ...
Because cust_gender
, cust_marital_status
, and cust_income_level
are all low-cardinality columns (there are only three possible values for marital status, two possible values for gender, and 12 for income level), bitmap indexes are ideal for these columns. Do not create a bitmap index on cust_id
because this is a unique column. Instead, a unique B-tree index on this column provides the most efficient representation and retrieval.
Bitmap Indexes and Nulls
Unlike most other types of indexes,bitmap indexes include rows that have NULL
values. Indexing of nulls can be useful for some types of SQL statements, such as queries with the aggregate function COUNT
.
Example : Bitmap Index
SELECT COUNT(*) FROM customers WHERE cust_marital_status IS NULL;
This query uses a bitmap index on cust_marital_status
. Note that this query would not be able to use a B-tree index, because B-tree indexes do not store the NULL
values.
SELECT COUNT(*) FROM customers;
Any bitmap index can be used for this query because all table rows are indexed, including those that have NULL
data. If nulls were not indexed, the optimizer would be able to use indexes only on columns with NOT NULL
constraints.
Bitmap Join Index Restrictions and Requirements
Join results must be stored, therefore, bitmap join indexes have the following restrictions:
- Parallel DML is only supported on the fact table. Parallel DML on one of the participating dimension tables will mark the index as unusable.
- Only one table can be updated concurrently by different transactions when using the bitmap join index.
- No table can appear twice in the join.
- You cannot create a bitmap join index on an index-organized table or a temporary table.
- The columns in the index must all be columns of the dimension tables.
- The dimension table join columns must be either primary key columns or have unique constraints.
- The dimension table column(s) participating the join with the fact table must be either the primary key column(s) or with the unique constraint.
- If a dimension table has composite primary key, each column in the primary key must be part of the join.
- The restrictions for creating a regular bitmap index also apply to a bitmap join index. For example, you cannot create a bitmap index with the
UNIQUE
attribute
Using B-Tree Indexes in Data Warehouses
A B-tree index is organized like an upside-down tree. The bottom level of the index holds the actual data values and pointers to the corresponding rows, much as the index in a book has a page number associated with each index entry.
In general, use B-tree indexes when you know that your typical query refers to the indexed column and retrieves a few rows. In these queries, it is faster to find the rows by looking at the index. However, using the book index analogy, if you plan to look at every single topic in a book, you might not want to look in the index for the topic and then look up the page. It might be faster to read through every chapter in the book. Similarly, if you are retrieving most of the rows in a table, it might not make sense to look up the index to find the table rows. Instead, you might want to read or scan the table.
B-tree indexes are most commonly used in a data warehouse to enforce unique keys. In many cases, it may not even be necessary to index these columns in a data warehouse, because the uniqueness was enforced as part of the preceding ETL processing, and because typical data warehouse queries may not work better with such indexes. B-tree indexes are more common in environments using third normal form schemas. In general, bitmap indexes should be more common than B-tree indexes in most data warehouse environments.
Choosing Between Local Indexes and Global Indexes
B-tree indexes on partitioned tables can be global or local. With Oracle8i and earlier releases, Oracle recommended that global indexes not be used in data warehouse environments because a partition DDL statement (for example, ALTER
TABLE
… DROP
PARTITION
) would invalidate the entire index, and rebuilding the index is expensive. In Oracle Database 10g, global indexes can be maintained without Oracle marking them as unusable after DDL. This enhancement makes global indexes more effective for data warehouse environments.
However, local indexes will be more common than global indexes. Global indexes should be used when there is a specific requirement which cannot be met by local indexes (for example, a unique index on a non-partitioning key, or a performance requirement).
Bitmap indexes on partitioned tables are always local.
Top Posts
- RANK, DENSE_RANK, FIRST and LAST Analytic Functions
- Partitioned Tables And Indexes in Oracle
- GROUP BY Functions in Oracle
- PIVOT Function in Oracle
- Oracle Architecture – SQL Query Processing