Secondary Indexes
Secondary
indexes are similar to Primary Index because they define a path that will deliver the data
quickly to meet the users’ expected goals. A secondary index is an alternate
path to the data. They can be defined as a Unique Secondary Index (USI) or a
Non-Unique Secondary Index (NUSI). Without any secondary indexes your data
warehouse could be skating on thin ice!
When
it comes to working with large amounts of data that is centrally located,
demands for performance to access this data is key. So what can a user do to
influence the way data is accessed? The first rule of thumb, which is essential
when it comes to working with centralized databases today, is to know your
data. Second, understand how Teradata manages data distribution and what a user
can do to enhance performance. A query that utilizes a Primary Index in the
WHERE column is the fastest path to the data. A query that utilizes a Secondary
Index will provide an alternate path to the data and be the second fastest
access method. This chapter is dedicated to secondary indexes.
Secondary Indexes
Secondary
Indexes provide another path to access data. Let’s say that you were planning a
road trip to your hometown. To determine the best way to get there, you need to
utilize a map. This map will give you many alternatives to plan your trip. In
this case, you need to get their ASAP. So you choose the best route to get
there in the shortest period of time. Secondary indexes work very similar to
this above example because they provide another path to the data. Teradata
allows up to 32 secondary indexes per table. Keep in mind; row distribution of
records does not occur when secondary indexes are defined. The value of
secondary indexes is that they reside in a subtable and are stored on all AMPs,
which is very different from how the primary indexes (part of base table) are
stored. Keep in mind that Secondary Indexes (when defined) do take up
additional space.
Secondary
Indexes are frequently used in a WHERE clause. The Secondary Index can be
changed or dropped at any time. However, because of the overhead for index
maintenance, it is recommended that index values should not be frequently
changed.
There
are two different types of Secondary Indexes, Unique Secondary Index (USI), and
Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely
efficient. A USI is considered a two-AMP operation. One AMP is utilized to
access the USI subtable row (in the Secondary Index subtable) that references
the actual data row, which resides on the second AMP.
A
Non-Unique Secondary Index is an All-AMP operation and will usually require a
spool file. Although a NUSI is an All-AMP operation, it is faster than a full
table scan.
Secondary
indexes can be useful for:
· Satisfying complex conditions
· Processing aggregates
· Value comparisons
· Matching character combinations
· Joining tables
Below
is a general illustration of a secondary index subtable row:
First Row : Secondary Index Subtable Columns
Second Row: Secondary Index Column Length
Secondary Index Value Secondary Index Row Id Primary Index Row ID
Actual Length 8 Bytes 8 Bytes
Unique Secondary Index (USI)
A
Unique Secondary Index (USI) is a two-AMP operation that will always return a
maximum of one row. An USI will make your queries burn with speed. When a user
creates a Unique Secondary Index, Teradata automatically creates a Secondary
Index Subtable. The subtable will contain the:
· Secondary Index Value
· Secondary Index Row ID
· Primary Index Row ID
When
a user writes an SQL query that has an USI in the WHERE clause, the Parsing
Engine will Hash the Secondary Index Value. The output is the Row Hash, which
points to a bucket in the Hash Map. That bucket contains an AMP number and the
Parsing Engine then knows which AMP contains the Secondary Index Subtable
pertaining to the requested USI information. The PE will direct the chosen AMP
to look-up the Row Hash in the Subtable. The AMP will check to see if the Row
Hash exists in the Subtable and double check the subtable row with the actual
secondary index value. Then, the AMP will pass the Primary Index Row ID back up
the BYNET network. This request is directed to the AMP with the base table row,
which is then easily retrieved.
USI Subtable Example
When
a USI is designated on a table, each AMP will build a subtable. So if you
create 32 USI indexes on a table, then each AMP will build 32 separate
subtables. Therefore, choose your Secondary Indexes wisely, because space is
used when these indexes are created. When a user inputs SQL that utilizes a USI
in the WHERE clause, then Teradata will know that either one row or no rows can
be returned. Reason, the column in the WHERE clause is unique.
The
following example illustrates the how a USI Subtable is created and how it
works to speed up queries.
The
above example is designed to illustrate in theory exactly how Teradata works
with Base Rows and Secondary Index Subtables. Take your time and I will lead
you through the key points. Try and remember what this book has already taught
you.
First,
take a look at the top table. The table name is NAME_TABLE. The table was
created with a Non-Unique Primary Index on the column LNAME. Notice that both
names of DAVIS
have hashed to the same AMP. All NUPI columns with the same value will always
hash to the same AMP.
I
have also given each base row an abbreviated Row ID. The abbreviated Row ID is
four bits for the Row Hash and one bit for the Uniqueness Value separated by a
comma. Notice that both names of Davis
have the same Row Hash in 0000, but the Uniqueness Values are 1 and 2.
When
we decided to CREATE a Unique Secondary Index (USI) on the column EMP, a
secondary index subtable is automatically created on every AMP. For each Base
Row, a secondary index subtable row is created. It is hash distributed across
the AMPs and placed in the secondary index subtable. Here is how that happened.
Each
EMP column is run through the Hashing Algorithm and the output is a Row Hash.
This Row Hash points to a bucket in the Hash Map, which directs the row to an
AMP. That particular AMP is responsible for the secondary index row. Each AMP
holds a portion of the secondary index rows, and in theory, each AMP holds an
equal amount of secondary index rows.
Notice
that each secondary index subtable row (at the bottom of the diagram) has the
EMP value, the EMP row hash and uniqueness value (EMP Row ID) and the most
important column is the base table Row ID. The base table Row ID is a pointer
to the row in the base table.
Now
that the secondary index subtable is set up it is ready for action.
The
PE realizes that EMP is a Unique Secondary Index and so the plan is to find the
secondary index subtable row, which will then point us directly to the base
row.
The
PE hashes the value in EMP, which is 99 and the output is the Row Hash. It
happens to be 1111. The PE takes the numbers of the Row Hash and it points to a
bucket in the Hash Map. The bucket contains the AMP number where the secondary
index row resides. The AMP number is AMP 1. The PE tells AMP 1 to look in its
secondary index subtable for EMP 99, which has a Row Hash of 1111.AMP 1 scans
its subtable and finds Row Hash 1111, and then double checks to see if the EMP
value is 99. When it finds out both are true, it returns the base table Row ID.
That base table Row ID is 0000, 2 and the AMP passes this number up the BYNET.
The BYNET knows which AMP holds the base row and sends a message to this AMP to
get the row. The row for Davis Ellie 99 is returned to the user.
NUSI Subtable Example
When
a NUSI is designated on a table, each AMP will build a subtable. So if you
create 32 NUSI indexes on a table, then each AMP will build 32 separate
subtables. Therefore, choose your Secondary Indexes wisely, because space is
used when these indexes are created. The following example illustrates the how
a NUSI Subtable is created and how it works to speed up queries.
The
above example is designed to logically illustrate how Teradata works with base
table rows and NUSI subtables. Take your time and I will lead you through the
key points. Try and remember what this book has already taught you.
First,
take a look at the top table. The table name is NAME_TABLE. The table was
created with a Non-Unique Primary Index on the column LNAME. Notice that both
names of DAVIS
have hashed to the same AMP. All NUPI columns with the same value will always
hash to the same AMP.
I
have also given each base row an abbreviated Row ID. The abbreviated Row ID is
four bits for the Row Hash and one bit for the Uniqueness Value separated by a
comma. Notice both names of Davis
have the same Row Hash in 0000, but the Uniqueness Values are 1 and 2.
If
we decide to CREATE a Secondary Index (NUSI) on the column FNAME, a secondary
index subtable was automatically created on every AMP. The FNAME secondary
index is a Non-Unique Secondary Index (NUSI).The key points to a NUSI subtable
are that each AMP will keep their own subtable. This subtable only holds
information about the base rows on that AMP. This is called an AMP local secondary
index subtable. It would be like every family keeping a record of their own
children. So, each non-unique secondary index subtable on an AMP points to base
table rows on that same AMP.
Another
interesting thing about NUSI subtables is that for each value only one record
is created. Notice in the picture, I have placed arrows pointing to the row on
the second AMP. Both people on this AMP have a FNAME of Ray. In the subtable,
we show the value of Ray, the Row Hash of Ray (so the AMP can find it quicker),
and the base table Row ID for every Ray on that AMP.
Now
that the secondary index subtable is set up it is ready for action.
Command: Select * from Table where F_NAME='RAY'
The
PE realizes that FNAME is a Non-Unique Secondary Index and so the plan is to
have each AMP check to see if they have a Ray in their subtable. For speed, the
PE hashes Ray and comes up with a Row Hash of 1111. The PE tells each AMP to
look in its secondary index subtable for FNAME Ray, which has a Row Hash of
1111.Each AMP scans its subtable looking for Row Hash 1111, and then double
checks to see if the FNAME value is ‘Ray’. If an AMP can’t find the subtable
row it knows it has nobody named ‘Ray’ and won’t participate. If an AMP finds a
subtable row for ‘Ray’ it will then retrieve all base Row IDs in the table for
‘Ray’.
The
rows in the above case are returned for:
Smith
Ray 77
Jones
Ray 34
Value-Ordered NUSI
A
Value-Ordered NUSI is a Non-Unique Secondary Index that can be of any Integer
Type (Numeric and Four bytes or less). Value-Ordered NUSIs are created because
you want to use the command BETWEEN and look for a range of values. Teradata
treats Value-Ordered NUSIs similar to ordinary NUSI indexes, but with one minor
difference.
Similarly,
when a Value-Ordered NUSI is created Teradata automatically creates and
maintains a secondary index subtable. Teradata also keeps the subtables AMP
local so each AMP is responsible for keeping track of the base rows it owns.
The
only difference is the way each AMP orders the subtable rows in its subtable.
The rows will be ordered not in Row Hash sequence, but in order of the values
themselves. That way January 1, 2000 comes before January 2, 2000. Dates are an
excellent example where Value-Ordered NUSIs make sense. When a user asks for
information WHERE the Order_Date is BETWEEN ‘2001/01/01’ and ‘2001/01/30’, each
AMP can gather up their rows quickly because they are ordered by the data value.
That’s what Order By Values means.
The
syntax for creating a Value-Ordered NUSI is:
CREATE INDEX(Dept) ORDER BY VALUES on EMPLOYEE
The
requirements for a Value-Ordered Index is the following:
· Cannot be unique or a Primary Index
· Must only be used for NUSI‘s
· It can be updated with SQL or
Multiload
· It can be any integer data type
Secondary Index Summary
· You can have up to 32 secondary
indexes for a table.
· Secondary Indexes provide an
alternate path to the data.
· The two types of secondary indexes
are USI and NUSI.
· Every secondary index defined causes
each AMP to create a subtable.
· USI subtables are hash distributed.
· NUSI subtables are AMP local.
· USI queries are Two-AMP operations.
· NUSI queries are All-AMP operations,
but not Full Table Scans.
· Value-Ordered NUSIs can be any
non-unique index of integer type.
· Always Collect Statistics on all
NUSI indexes.
· The PE will decide if a NUSI is
strongly selective and worth using over a Full Table Scan.
· Use the Explain
function to see if a NUSI is being utilized or if bitmapping is taking place
Very informative ! Excellent examples !!
ReplyDeleteThanks for Information Teradata Online Training is one of the most emerging technologies in market. As more and more organization are moving their data warehouse in Teradata database, so the demand of Teradata Professionals are high. We at TeradataTech started giving online training sessions for all folks who are interested in learning this technology. People who are interested in learning the basics and advance features of Teradata Development can benefit from this training.
ReplyDeleteExcellent content ! Thanks for sharing
ReplyDelete