Search This Blog

Monday, May 20, 2013

Teradata Secondary Index

Secondary Indexes
Secondary indexes are similar to Primary Index because they define a path that will deliver the data quickly to meet the users’ expected goals. A secondary index is an alternate path to the data. They can be defined as a Unique Secondary Index (USI) or a Non-Unique Secondary Index (NUSI). Without any secondary indexes your data warehouse could be skating on thin ice!
When it comes to working with large amounts of data that is centrally located, demands for performance to access this data is key. So what can a user do to influence the way data is accessed? The first rule of thumb, which is essential when it comes to working with centralized databases today, is to know your data. Second, understand how Teradata manages data distribution and what a user can do to enhance performance. A query that utilizes a Primary Index in the WHERE column is the fastest path to the data. A query that utilizes a Secondary Index will provide an alternate path to the data and be the second fastest access method. This chapter is dedicated to secondary indexes.

Secondary Indexes

Secondary Indexes provide another path to access data. Let’s say that you were planning a road trip to your hometown. To determine the best way to get there, you need to utilize a map. This map will give you many alternatives to plan your trip. In this case, you need to get their ASAP. So you choose the best route to get there in the shortest period of time. Secondary indexes work very similar to this above example because they provide another path to the data. Teradata allows up to 32 secondary indexes per table. Keep in mind; row distribution of records does not occur when secondary indexes are defined. The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary indexes (part of base table) are stored. Keep in mind that Secondary Indexes (when defined) do take up additional space.
Secondary Indexes are frequently used in a WHERE clause. The Secondary Index can be changed or dropped at any time. However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.
There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.
A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:
·       Satisfying complex conditions
·       Processing aggregates
·       Value comparisons
·       Matching character combinations
·       Joining tables

Below is a general illustration of a secondary index subtable row:
First Row : Secondary Index Subtable Columns 
Second Row: Secondary Index Column Length

Secondary Index Value   Secondary Index Row Id Primary Index Row ID
Actual Length            8 Bytes                          8 Bytes

Unique Secondary Index (USI)

A Unique Secondary Index (USI) is a two-AMP operation that will always return a maximum of one row. An USI will make your queries burn with speed. When a user creates a Unique Secondary Index, Teradata automatically creates a Secondary Index Subtable. The subtable will contain the:
·       Secondary Index Value
·       Secondary Index Row ID
·       Primary Index Row ID
When a user writes an SQL query that has an USI in the WHERE clause, the Parsing Engine will Hash the Secondary Index Value. The output is the Row Hash, which points to a bucket in the Hash Map. That bucket contains an AMP number and the Parsing Engine then knows which AMP contains the Secondary Index Subtable pertaining to the requested USI information. The PE will direct the chosen AMP to look-up the Row Hash in the Subtable. The AMP will check to see if the Row Hash exists in the Subtable and double check the subtable row with the actual secondary index value. Then, the AMP will pass the Primary Index Row ID back up the BYNET network. This request is directed to the AMP with the base table row, which is then easily retrieved.

USI Subtable Example

When a USI is designated on a table, each AMP will build a subtable. So if you create 32 USI indexes on a table, then each AMP will build 32 separate subtables. Therefore, choose your Secondary Indexes wisely, because space is used when these indexes are created. When a user inputs SQL that utilizes a USI in the WHERE clause, then Teradata will know that either one row or no rows can be returned. Reason, the column in the WHERE clause is unique.
The following example illustrates the how a USI Subtable is created and how it works to speed up queries.

The above example is designed to illustrate in theory exactly how Teradata works with Base Rows and Secondary Index Subtables. Take your time and I will lead you through the key points. Try and remember what this book has already taught you.
First, take a look at the top table. The table name is NAME_TABLE. The table was created with a Non-Unique Primary Index on the column LNAME. Notice that both names of DAVIS have hashed to the same AMP. All NUPI columns with the same value will always hash to the same AMP.
I have also given each base row an abbreviated Row ID. The abbreviated Row ID is four bits for the Row Hash and one bit for the Uniqueness Value separated by a comma. Notice that both names of Davis have the same Row Hash in 0000, but the Uniqueness Values are 1 and 2.
When we decided to CREATE a Unique Secondary Index (USI) on the column EMP, a secondary index subtable is automatically created on every AMP. For each Base Row, a secondary index subtable row is created. It is hash distributed across the AMPs and placed in the secondary index subtable. Here is how that happened.
Each EMP column is run through the Hashing Algorithm and the output is a Row Hash. This Row Hash points to a bucket in the Hash Map, which directs the row to an AMP. That particular AMP is responsible for the secondary index row. Each AMP holds a portion of the secondary index rows, and in theory, each AMP holds an equal amount of secondary index rows.
Notice that each secondary index subtable row (at the bottom of the diagram) has the EMP value, the EMP row hash and uniqueness value (EMP Row ID) and the most important column is the base table Row ID. The base table Row ID is a pointer to the row in the base table.
Now that the secondary index subtable is set up it is ready for action.

Command: Select * from Table where EMP=99
The PE realizes that EMP is a Unique Secondary Index and so the plan is to find the secondary index subtable row, which will then point us directly to the base row.
The PE hashes the value in EMP, which is 99 and the output is the Row Hash. It happens to be 1111. The PE takes the numbers of the Row Hash and it points to a bucket in the Hash Map. The bucket contains the AMP number where the secondary index row resides. The AMP number is AMP 1. The PE tells AMP 1 to look in its secondary index subtable for EMP 99, which has a Row Hash of 1111.AMP 1 scans its subtable and finds Row Hash 1111, and then double checks to see if the EMP value is 99. When it finds out both are true, it returns the base table Row ID. That base table Row ID is 0000, 2 and the AMP passes this number up the BYNET. The BYNET knows which AMP holds the base row and sends a message to this AMP to get the row. The row for Davis Ellie 99 is returned to the user.


NUSI Subtable Example

When a NUSI is designated on a table, each AMP will build a subtable. So if you create 32 NUSI indexes on a table, then each AMP will build 32 separate subtables. Therefore, choose your Secondary Indexes wisely, because space is used when these indexes are created. The following example illustrates the how a NUSI Subtable is created and how it works to speed up queries.

The above example is designed to logically illustrate how Teradata works with base table rows and NUSI subtables. Take your time and I will lead you through the key points. Try and remember what this book has already taught you.
First, take a look at the top table. The table name is NAME_TABLE. The table was created with a Non-Unique Primary Index on the column LNAME. Notice that both names of DAVIS have hashed to the same AMP. All NUPI columns with the same value will always hash to the same AMP.
I have also given each base row an abbreviated Row ID. The abbreviated Row ID is four bits for the Row Hash and one bit for the Uniqueness Value separated by a comma. Notice both names of Davis have the same Row Hash in 0000, but the Uniqueness Values are 1 and 2.
If we decide to CREATE a Secondary Index (NUSI) on the column FNAME, a secondary index subtable was automatically created on every AMP. The FNAME secondary index is a Non-Unique Secondary Index (NUSI).The key points to a NUSI subtable are that each AMP will keep their own subtable. This subtable only holds information about the base rows on that AMP. This is called an AMP local secondary index subtable. It would be like every family keeping a record of their own children. So, each non-unique secondary index subtable on an AMP points to base table rows on that same AMP.
Another interesting thing about NUSI subtables is that for each value only one record is created. Notice in the picture, I have placed arrows pointing to the row on the second AMP. Both people on this AMP have a FNAME of Ray. In the subtable, we show the value of Ray, the Row Hash of Ray (so the AMP can find it quicker), and the base table Row ID for every Ray on that AMP.
Now that the secondary index subtable is set up it is ready for action.

Command: Select * from Table where F_NAME='RAY'
The PE realizes that FNAME is a Non-Unique Secondary Index and so the plan is to have each AMP check to see if they have a Ray in their subtable. For speed, the PE hashes Ray and comes up with a Row Hash of 1111. The PE tells each AMP to look in its secondary index subtable for FNAME Ray, which has a Row Hash of 1111.Each AMP scans its subtable looking for Row Hash 1111, and then double checks to see if the FNAME value is ‘Ray’. If an AMP can’t find the subtable row it knows it has nobody named ‘Ray’ and won’t participate. If an AMP finds a subtable row for ‘Ray’ it will then retrieve all base Row IDs in the table for ‘Ray’.
The rows in the above case are returned for:
Smith Ray 77
Jones Ray 34

Value-Ordered NUSI

A Value-Ordered NUSI is a Non-Unique Secondary Index that can be of any Integer Type (Numeric and Four bytes or less). Value-Ordered NUSIs are created because you want to use the command BETWEEN and look for a range of values. Teradata treats Value-Ordered NUSIs similar to ordinary NUSI indexes, but with one minor difference.
Similarly, when a Value-Ordered NUSI is created Teradata automatically creates and maintains a secondary index subtable. Teradata also keeps the subtables AMP local so each AMP is responsible for keeping track of the base rows it owns.
The only difference is the way each AMP orders the subtable rows in its subtable. The rows will be ordered not in Row Hash sequence, but in order of the values themselves. That way January 1, 2000 comes before January 2, 2000. Dates are an excellent example where Value-Ordered NUSIs make sense. When a user asks for information WHERE the Order_Date is BETWEEN ‘2001/01/01’ and ‘2001/01/30’, each AMP can gather up their rows quickly because they are ordered by the data value. That’s what Order By Values means.
The syntax for creating a Value-Ordered NUSI is:

CREATE INDEX(Dept) ORDER BY VALUES on EMPLOYEE
The requirements for a Value-Ordered Index is the following:
·       Cannot be unique or a Primary Index
·       Must only be used for NUSI‘s
·       It can be updated with SQL or Multiload
·       It can be any integer data type




Secondary Index Summary

·       You can have up to 32 secondary indexes for a table.
·       Secondary Indexes provide an alternate path to the data.
·       The two types of secondary indexes are USI and NUSI.
·       Every secondary index defined causes each AMP to create a subtable.
·       USI subtables are hash distributed.
·       NUSI subtables are AMP local.
·       USI queries are Two-AMP operations.
·       NUSI queries are All-AMP operations, but not Full Table Scans.
·       Value-Ordered NUSIs can be any non-unique index of integer type.
·       Always Collect Statistics on all NUSI indexes.
·       The PE will decide if a NUSI is strongly selective and worth using over a Full Table Scan.
·       Use the Explain function to see if a NUSI is being utilized or if bitmapping is taking place










3 comments:

  1. Very informative ! Excellent examples !!

    ReplyDelete
  2. Thanks for Information Teradata Online Training is one of the most emerging technologies in market. As more and more organization are moving their data warehouse in Teradata database, so the demand of Teradata Professionals are high. We at TeradataTech started giving online training sessions for all folks who are interested in learning this technology. People who are interested in learning the basics and advance features of Teradata Development can benefit from this training.

    ReplyDelete
  3. Excellent content ! Thanks for sharing

    ReplyDelete