
Thursday, May 30, 2013

Teradata Space Consumption Queries

Space Consumption


To find the space consumed by tables and databases, and to find out how much free space is available, the following queries are useful. There is a separate query for each of the following.

1. Total Space Available in a Database
2. Used Space in a Database
3. Free Space in a Database
4. Space Consumed by Each Table
5. Space Consumed by Each Table, by Creator Name

We will make use of some of the system tables that store this space information.

Total Space, Used Space and Free Space available in Database :

The query below gives the total space allocated to the database, the space used, and the free space remaining. This helps identify whether the database is nearing full capacity so that additional space can be added or existing space cleaned up.

SELECT
        DATABASENAME AS DATABASE_NAME,
        SUM(MAXPERM)/(1024*1024*1024) AS TOTAL_SPACE_IN_GB,
        SUM(CURRENTPERM)/(1024*1024*1024) AS USED_SPACE_IN_GB,
        TOTAL_SPACE_IN_GB - USED_SPACE_IN_GB AS FREE_SPACE_IN_GB
FROM
        DBC.DISKSPACE
WHERE
        DATABASENAME = '<DATABASE_NAME>'
GROUP BY 1;

Size of Each Table:

The query below finds the size of each table, i.e. the space consumed by it. This is useful for finding the biggest tables in a database and deciding whether they are still needed, so that space can be freed up.

SELECT
        DATABASENAME,
        TABLENAME,
        SUM(CURRENTPERM)/(1024*1024*1024) AS "TABLE SIZE IN GB"
FROM
        DBC.TABLESIZE
WHERE
        DATABASENAME = '<DATABASE_NAME>' AND TABLENAME = '<TABLE_NAME>'
GROUP BY 1,2;

Note: Remove the TABLENAME filter if you want to find the sizes of all the tables in the database.
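
To spot the biggest tables quickly, the same DBC.TABLESIZE query can be sorted by size. This is only a sketch of that variation (the descending sort is the only change from the query above):

SELECT
        DATABASENAME,
        TABLENAME,
        SUM(CURRENTPERM)/(1024*1024*1024) AS "TABLE SIZE IN GB"
FROM
        DBC.TABLESIZE
WHERE
        DATABASENAME = '<DATABASE_NAME>'
GROUP BY 1,2
ORDER BY 3 DESC;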

Table Size by Creator/Owner Name:

This helps identify tables created by other users in the database, so you can check with them whether each table is still needed and drop it to free up space if not.

SELECT
        S.DATABASENAME,
        S.TABLENAME,
        T.CREATORNAME,
        SUM(S.CURRENTPERM)/(1024*1024*1024) AS "TABLE SIZE IN GB"
FROM
        DBC.TABLESIZE S
JOIN
        DBC.TABLES T
ON
        S.TABLENAME = T.TABLENAME AND S.DATABASENAME = T.DATABASENAME
WHERE
        S.DATABASENAME = '<DATABASE_NAME>'
GROUP BY 1,2,3;

Monday, May 20, 2013

Teradata Locks



TERADATA LOCKS

Locking in Teradata is automatic and cannot be turned off for normal tables. Four types of locks are used:

Types of Lock

EXCLUSIVE
WRITE
READ
ACCESS

EXCLUSIVE - The resource is temporarily owned and is not available to other users until the lock is released. This lock is normally obtained at the database or table level when the structure/definition is being modified.
WRITE - A data block has been retrieved from disk with the intention of modifying one or more rows. The block containing the modified data will be written back to disk.
READ - A data block has been retrieved from disk. No changes will be made to the data.
ACCESS - Affectionately called a Dirty Read lock. Allows a SELECT to read data that is locked for WRITE. It is a very minimal form of locking.


This chart shows the automatic locking in Teradata for each type of SQL command:

Type of Lock   Caused by                        Locks Blocked for Other Users
EXCLUSIVE      DDL                              EXCLUSIVE, WRITE, READ, ACCESS
WRITE          INSERT, UPDATE, DELETE           EXCLUSIVE, WRITE, READ
READ           SELECT                           EXCLUSIVE, WRITE
ACCESS         SELECT (LOCKING FOR ACCESS)      EXCLUSIVE

The resource that is locked depends on the SQL command requested by the user. The lock may be set at the database, view, table, or row level.

This chart shows the impact of these levels of locking:

If Locked at     Resource(s) unavailable to other users
DATABASE         All tables, views, macros and triggers owned by the database/user.
VIEW             All tables referenced in the View.
TABLE            All rows in the table.
ROW              All rows with the same row hash.


All SQL commands automatically request a lock. The Teradata RDBMS attempts to lock
the resource at the lowest level possible. The lowest level is a row lock. However,
Teradata places more importance on performance than resource availability. This implies that the optimizer has the last say in the locking level that is used.

For instance, an UPDATE has the option of locking at the table or row level. The
optimizer knows that when an entire table is locked, all other users must wait to read
even a single row from the table. However, when only a row is WRITE locked, other
users still have access to the table, and only have to wait if they need to read the row
currently locked. Therefore, normally row level locks are preferable so that rows have a
maximum availability for users. This is especially important if another user is requesting
a UPI value not used in the UPDATE. This type of locking provides more opportunity
for concurrency of user requests and better overall performance.

However, the optimizer also knows when all rows in a table are going to be changed. It
could use row-level locking to allow as much access as possible. However, eventually
all rows are locked. Also, it knows that to lock a row and then read a row over and over
again takes longer than locking the table once, reading all rows as fast as possible, and
then releasing all locks at once. A full table scan needs all rows. Therefore, the normal
row level lock will be escalated to a table level lock for speed on a full table scan.
Additionally, by locking the table, it eliminates the potential for a deadlock between
multiple user requests.

Regardless of the approach to locking that the optimizer chooses, eventually all access to locked rows is denied for a period of time. The chart above indicates that a
WRITE lock blocks other WRITE locks requested by other users. Additionally, all
READ lock requests are also blocked because the current data is being changed and,
therefore, not available until the change is finished. This is where the ACCESS lock can be useful.
The same chart also shows that the WRITE lock does not block an ACCESS lock.
Therefore, a user can request an ACCESS lock for a SELECT instead of the default
READ lock. This does, however, mean that the data read may or may not be the latest
version. Hence the nickname “Dirty Read.” This is commonly done in views.

 To request a locking change, the LOCKING FOR modifier can be used. It is written
ahead of the SQL statement to modify the way it executes.



These are the various syntax formats of the LOCKING Modifier:

LOCKING [<table-name>] FOR <desired-locking> [NOWAIT]
LOCKING ROW FOR <desired-locking>
LOCKING DATABASE <database-name> FOR <desired-locking>
LOCKING VIEW <view-name> FOR <desired-locking>
LOCKING TABLE <table-name> FOR <desired-locking>

The first syntax listed above defaults to a ROW level lock using the desired lock. So, the
first two LOCKING requests do the same thing. To make the command shorter, the
LOCKING can be abbreviated to LOCK.

The first syntax format also shows the NOWAIT option. It indicates that if a resource is
not available the statement should not wait. Instead, it will instantly ABORT. There is
another option called MODE that can be used. However, it does not do anything and is
there strictly for compatibility with DB/2 SQL. Lastly, also for compatibility, the FOR
can be changed to an IN. The NOWAIT is available for all locking requests.

The other specification in the above formats names the database object to lock. In reality, multiple LOCKING modifiers might exist on a single SELECT. For
instance, in a join operation an ACCESS lock might be requested for one table and not
the other.
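
For example, the sketch below (Sales_table comes from the OLAP examples later in this blog; Product_table and Product_Name are made up for illustration) requests an ACCESS lock on one table in a join while the other table keeps its default READ lock:

LOCKING TABLE Sales_table FOR ACCESS
SELECT S.Product_ID
       ,S.Daily_Sales
       ,P.Product_Name
FROM Sales_table S
JOIN Product_table P
ON S.Product_ID = P.Product_ID ;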

This chart shows which SQL commands can use the modifier:

Modifier    SQL Allowed with Modifier        Level
EXCLUSIVE   INSERT, UPDATE, DELETE, SELECT   Table, View or Row
WRITE       SELECT                           Table, View or Row
READ        SELECT                           Table, View or Row
ACCESS      SELECT                           Table, View or Row

The above chart shows that SELECT is the only command that can use the LOCKING FOR modifier for anything other than EXCLUSIVE. This is because most locks cannot be downgraded to a lesser lock without causing potential data integrity issues. Since a SELECT does not change data, its lock can be downgraded safely.

It is very common to use ACCESS locking when creating a view. Since most views only SELECT rows, a WRITE lock is never needed. Plus, if maintenance is being performed on a table, selecting rows through a view with an ACCESS lock is not delayed by the WRITE lock. So, users are happy and don’t call to complain that the “system is slow.”
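
As a minimal sketch (the view name is made up; Customer_table and customer_name match the EXPLAIN example below), such a view might look like:

CREATE VIEW Customer_view AS
LOCKING ROW FOR ACCESS
SELECT customer_name
FROM Customer_table ;

Any SELECT through Customer_view then reads with an ACCESS lock instead of the default READ lock.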

Another time to use the LOCKING modifier is for multi-step transactions. Consider this
situation: The first step is a SELECT and obtains a READ lock. This lock allows other
users to also SELECT from the table with a READ lock. Then, the next step of the
transaction is an UPDATE. It must now upgrade the READ lock to a WRITE lock.

This upgrade of the lock cannot occur while other users have a READ lock on the
resource. Therefore, the transaction must wait for the READ locks to disappear. This
might dramatically increase the time to complete the maintenance transaction. Therefore, by upgrading the initial default READ lock to a WRITE lock for the SELECT, the potential for a delay in the middle of the transaction is eliminated.
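
A hedged sketch of such a transaction in Teradata (BTET) mode follows; the table, column, and key value are made up for illustration. The initial SELECT asks for a WRITE lock so the UPDATE that follows never has to upgrade a READ lock:

BT;

LOCKING TABLE Employee_table FOR WRITE
SELECT Salary
FROM Employee_table
WHERE Employee_no = 1232578 ;

UPDATE Employee_table
SET Salary = Salary * 1.05
WHERE Employee_no = 1232578 ;

ET;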

The next SELECT uses the ACCESS lock, as is common in a view:

EXPLAIN
LOCKING ROW FOR ACCESS
SELECT *
FROM Customer_table
WHERE customer_name LIKE 'Billy%' ;

Explanation
 1) First, we lock MIKEL.Customer_table for access.
 2) Next, we do an all-AMPs RETRIEVE step from MIKEL.Customer_table by
   way of an all-rows scan with a condition of (
   "MIKEL.Customer_table.Customer_name LIKE 'Billy%'") into Spool 1,
   which is built locally on the AMPs. The size of Spool 1 is
   estimated with no confidence to be 4 rows. The estimated time for
   this step is 0.15 seconds.
 3) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
 -> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.15 seconds.


Since the locking modifier can name the table, each table may use different locking when multiple tables are referenced in the same SQL statement. 


Teradata OLAP Functions


OLAP Functions

 
The OLAP functions are built into the Teradata database to provide data mining capabilities and trend analysis. These functions provide processing not available using the standard aggregation. As mentioned earlier, an aggregate eliminates the detail data from a row. These OLAP functions provide the result of their operation and display the detail data values used in the function.

This technique produces output that is somewhat like the Teradata extensions WITH and WITH...BY. The similarity is that the detail row data is also displayed as part of the answer set. However, since the output is in row format and not in a report format like WITH, the OLAP functions may be performed on any table or view and may be used to populate tables in conjunction with INSERT/SELECT. The biggest difference is that they can be used in Queryman, unlike WITH.

The list below covers some of the important OLAP functions.

Cumulative Sum Using the CSUM Function

The process of creating a cumulative sum means that data values in sequential rows are added together. The same addition can be done with the SUM aggregate, but with some major differences that are contrasted here.

The Cumulative Sum (CSUM) function provides a running or cumulative total for a column’s numeric value. This allows users to see what is happening with column totals over an ongoing progression. The results will be sorted in ascending or descending order and the sort list can consist of a single or multiple columns, listed as sort keys.

The syntax for CSUM is:
SELECT CSUM( <column-name>, <sort-key> [ASC | DESC]
                             [, <sort-key> [ASC | DESC] … ] )
FROM <table-name>
[GROUP BY <column-name> [, <column-name> … ] ]
;

The CSUM command uses the first parameter as the column containing a numeric value to sum. This value will be added to the previous data values and provide a running or cumulative answer.

The second parameter is entered as a sort key that sequences the output rows. This column determines the major sort sequence of the detail data from the row along with the CSUM. By default, the sort sequence is ascending (ASC). The DESC can be specified to request a descending (highest to lowest) sequence. Optionally, additional sort keys can be entered to specify one or more minor sort sequences.

To understand the use of CSUM, we will use a table that stores sales data for all products for each store on a daily basis. The CSUM function can be used to show the daily sales data for any or all of the products and accumulate the sales data for both running and final totals.

The column specified in this CSUM should contain the sales dollar figure for each day and the sort key as the date. Then, the WHERE clause provides a beginning and ending date for the query.
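
As a sketch of that description (the alias and the date range are illustrative; Sales_table, Sale_Date and Daily_Sales are the names used in the later examples):

SELECT Product_ID
       ,Sale_Date
       ,Daily_Sales
       ,CSUM(Daily_Sales, Sale_Date) AS Running_Sales
FROM Sales_table
WHERE Sale_Date BETWEEN 1001001 AND 1001003 ;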

Cumulative Sum with Reset Capabilities

The CSUM has the ability to reset values throughout the accumulation process to obtain a subtotal. It uses the GROUP BY designation to specify a data column that, when the value changes, causes the accumulation value to be reset back to zero.

Using CSUM and GROUP BY

A cumulative sum may be reset to zero at a specified breakpoint by merely adding a GROUP BY clause to the query when using the CSUM function. The GROUP BY provides for subtotals within the output.
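
For instance, adding GROUP BY Product_ID to the sketch above restarts the cumulative sum each time the product changes, giving a running subtotal per product (again only a sketch using the same assumed Sales_table):

SELECT Product_ID
       ,Sale_Date
       ,Daily_Sales
       ,CSUM(Daily_Sales, Sale_Date) AS Running_Sales
FROM Sales_table
GROUP BY Product_ID ;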


Cumulative Sum with Reset Capabilities

As mentioned with the CSUM, this version of the cumulative sum can be reset to provide the equivalent of a subtotal. This method uses the ANSI standard SUM with the OVER and PARTITION BY designators to specify a data value that, when it changes,
causes the accumulation value to be reset back to zero.

SUM Using SUM and OVER / PARTITION BY

Normally aggregate functions and OLAP functions are incompatible. This is because
aggregates provide only the final single row answer and eliminate row detail data.
Conversely, the OLAP functions provide the row detail data and the answer. Using this
ANSI syntax, the SUM aggregate can be made to act more as an OLAP function to
provide both the answer and the row detail.

As seen above, CSUM is a Teradata Extension. It may use the GROUP BY designation
to reset the accumulation process for the equivalent of a subtotal. The ANSI method
does not use GROUP BY. Instead, it uses OVER to designate that a partition or group
can be established using the PARTITION BY designator.

One of the major advantages of the PARTITION is that each column that is a SUM can
be partitioned by a different value, whereas there can only be a single GROUP BY in a
SELECT.

The following ANSI syntax is used with SUM to provide OLAP functionality:
SELECT
            SUM(<column-name>) OVER ( PARTITION BY <column-name>
                  ORDER BY <column-name> [ASC | DESC]
                           [,<column-name> [ASC | DESC] ] )
FROM <table-name>
;

Below, the previously used SELECT is again modified with SUM and OVER /
PARTITION to show the equivalent cumulative sales with a reset on a change in the date
as seen with the CSUM:
SELECT
        Product_ID
        ,Sale_Date
        ,Daily_Sales
        ,SUM(Daily_Sales) OVER ( PARTITION BY Sale_Date
                              ORDER BY Sale_Date
                              ROWS UNBOUNDED PRECEDING)
                                     (FORMAT '$$$$,$$$.99') AS Like_CSum
FROM Sales_table
WHERE Sale_Date between 1001001 and 1001003 ;

The PARTITION BY has the same effect here as the GROUP BY does in the proprietary Teradata extension OLAP functions.

There are two advantages to using this syntax. First, it is the ANSI standard. Second, and the biggest advantage, it is compatible with other OLAP functions because the detail data is retained, and aggregate functions can be used within the formulas for derived data.


Partitioning Data Using the QUANTILE Function


A Quantile is used to divide rows into a number of partitions of roughly the same number
of rows in each partition. The percentile is the QUANTILE most commonly used in
business. This means that the request is based on a value of 100 for the number of
partitions. It is also possible to have quartiles (based on 4), tertiles (based on 3) and
deciles (based on 10).

By default, both the QUANTILE column and the QUANTILE value itself will be output
in ascending sequence. If desired, the ORDER BY clause may be used to reorder the output for display. Here the order of the output does not change its meaning, unlike a summation where the values are being added together and all need to appear in the proper sequence.

The syntax of the QUANTILE function is:

SELECT QUANTILE (<partitions>, <column-name> ,<sort-key> [DESC | ASC])
FROM <table-name>
[QUALIFY QUANTILE (<column-name>) {< | > | = | <= | >=} <number-of-rows>]
;

The next SELECT determines the percentile for every row in the Sales table based on the daily sales amount and sorts it into sequence by the value being partitioned, in this case the daily sales amount:

SELECT Product_ID
       ,Sale_Date
       ,Daily_Sales
       ,QUANTILE (100, Daily_Sales )
FROM Sales_table
WHERE Product_ID < 3000 and Sale_Date > 1000930 ;

QUALIFY to Find Products in the top Partitions

Like the aggregate functions, OLAP functions must read all required rows before
performing their operation. Therefore, the WHERE clause cannot be used to filter on their results. Where the
aggregates use HAVING, the OLAP functions use QUALIFY. The QUALIFY
evaluates the result to determine which rows to return.

The following SELECT uses a QUALIFY to show only the products that sell in the top
60 Percentile:

SELECT Product_ID
        ,Sale_Date ,Daily_Sales
        ,QUANTILE (100, Daily_Sales, Sale_Date ) AS "Percentile"
FROM Sales_table
QUALIFY "Percentile" >= 60 ;

Ranking Data using RANK

The Ranking function (RANK) permits a column to be evaluated and compared, either
based on high or low order, against all other rows to create the output set. The order will
be sorted by default in descending sequence of the ranking column, which correlates to
descending rank.

This style of selecting the best and the worst has been available using SQL in the past.
However, it was very involved and required extensive coding and logic in order to make
it work. The new RANK function replaces all the elaborate SQL previously required to
create this type of output.

The output of the RANK function is the highest or the lowest data values in the column,
depending on the sort requested. A query can return a specified number of the “best”
rows (highest values) or the “worst” rows (lowest values). These capabilities and output
options will be demonstrated below.

Here is the syntax for RANK:

SELECT RANK( <column-name> [DESC | ASC] )
FROM <table-name>
[GROUP BY <column-name> [,<column-number> ] ] [ { ASC | DESC } ]
[QUALIFY RANK(<column-name>) {< | <=} <number-of-rows> ]
;


QUALIFY to Find the Top Best or Bottom Worst

Such a ranked report could be created without the RANK column at all: simply list the rows in descending sequence by the sales amount. With a small
number of rows, the best and the worst are readily apparent. However, when there are
hundreds or millions of rows, returning all the rows takes far too much time.
Instead, it is preferable to return only the rows desired. For instance, the best 20 or the
worst 20 might be needed. Like the QUANTILE function, the RANK function uses a
QUALIFY clause to allow control of how many rows to output in the final result.

The following SELECT is the same as the above, but uses the QUALIFY to limit the
output to the best 3 (highest values) rows:

SELECT
           Product_ID
           ,Sale_Date
           ,Daily_Sales
           ,RANK(Daily_Sales)
FROM Sales_table
QUALIFY RANK(Daily_Sales) < 4 ;


Using RANK with GROUP BY

As previously seen, the RANK function permits a column to be ranked, either based on
high or low order, against other rows. The GROUP BY can be used in conjunction with
a RANK function to change the ranking function’s scope. This allows a check on which
items were in the top sales bracket for each group, such as each product or each store.

The following SELECT ranks the daily sales for each product using the GROUP BY and
creates an alias for the RANK column to use in the QUALIFY to find the best 2 days:

SELECT
           Product_ID
           ,Sale_Date
           ,Daily_Sales
           ,RANK(Daily_Sales) AS Ranked
FROM Sales_table
GROUP BY Product_ID
QUALIFY Ranked <= 2
;




Teradata Secondary Index

Secondary Indexes
Secondary indexes are similar to the Primary Index in that they define a path that delivers the data quickly to meet the users’ goals. A secondary index is an alternate path to the data. They can be defined as a Unique Secondary Index (USI) or a Non-Unique Secondary Index (NUSI). Without any secondary indexes, your data warehouse could be skating on thin ice!
When it comes to working with large amounts of centrally located data, performance in accessing that data is key. So what can a user do to influence the way data is accessed? The first rule of thumb, which is essential when working with centralized databases today, is to know your data. Second, understand how Teradata manages data distribution and what a user can do to enhance performance. A query that utilizes the Primary Index in the WHERE clause is the fastest path to the data. A query that utilizes a Secondary Index provides an alternate path to the data and is the second fastest access method. This post is dedicated to secondary indexes.

Secondary Indexes

Secondary Indexes provide another path to access data. Let’s say that you were planning a road trip to your hometown. To determine the best way to get there, you would use a map, which gives you many alternative routes. In this case, you need to get there ASAP, so you choose the route that gets you there in the shortest period of time. Secondary indexes work very much like this example because they provide another path to the data. Teradata allows up to 32 secondary indexes per table. Keep in mind that row distribution of records does not change when secondary indexes are defined. Secondary indexes reside in subtables that are stored on all AMPs, which is very different from how the primary index (part of the base table) is stored. Also keep in mind that Secondary Indexes, when defined, do take up additional space.
Secondary Indexes are frequently used in a WHERE clause. The Secondary Index can be changed or dropped at any time. However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.
There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.
A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:
·       Satisfying complex conditions
·       Processing aggregates
·       Value comparisons
·       Matching character combinations
·       Joining tables
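
To make the DDL concrete, here is a minimal sketch (Employee_table, Emp_No and Dept_No are assumed names, not from a real schema). A USI is created with the UNIQUE keyword, a NUSI without it, and either can be dropped later:

CREATE UNIQUE INDEX (Emp_No) ON Employee_table;
CREATE INDEX Dept_NUSI (Dept_No) ON Employee_table;
DROP INDEX Dept_NUSI ON Employee_table;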

Below is a general illustration of a secondary index subtable row. The first row shows the subtable columns and the second row shows the length of each:

Secondary Index Value   Secondary Index Row ID   Primary Index Row ID
Actual length           8 Bytes                  8 Bytes

Unique Secondary Index (USI)

A Unique Secondary Index (USI) is a two-AMP operation that will always return a maximum of one row. A USI will make your queries burn with speed. When a user creates a Unique Secondary Index, Teradata automatically creates a Secondary Index Subtable. The subtable will contain the:
·       Secondary Index Value
·       Secondary Index Row ID
·       Primary Index Row ID
When a user writes an SQL query that has a USI in the WHERE clause, the Parsing Engine will hash the Secondary Index Value. The output is the Row Hash, which points to a bucket in the Hash Map. That bucket contains an AMP number, so the Parsing Engine then knows which AMP holds the Secondary Index Subtable row pertaining to the requested USI value. The PE will direct the chosen AMP to look up the Row Hash in the subtable. The AMP will check whether the Row Hash exists in the subtable and double check the subtable row against the actual secondary index value. Then, the AMP will pass the Primary Index Row ID back up the BYNET network. The request is directed to the AMP with the base table row, which is then easily retrieved.

USI Subtable Example

When a USI is designated on a table, each AMP will build a subtable. So if you create 32 USI indexes on a table, then each AMP will build 32 separate subtables. Therefore, choose your Secondary Indexes wisely, because space is used when these indexes are created. When a user submits SQL that utilizes a USI in the WHERE clause, Teradata knows that either one row or no rows can be returned, because the column in the WHERE clause is unique.
The following example illustrates how a USI subtable is created and how it works to speed up queries.

The above example is designed to illustrate in theory exactly how Teradata works with Base Rows and Secondary Index Subtables. Take your time and I will lead you through the key points. Try to remember what has already been covered.
First, take a look at the top table. The table name is NAME_TABLE. The table was created with a Non-Unique Primary Index on the column LNAME. Notice that both names of DAVIS have hashed to the same AMP. All NUPI columns with the same value will always hash to the same AMP.
I have also given each base row an abbreviated Row ID. The abbreviated Row ID is four digits for the Row Hash and one digit for the Uniqueness Value, separated by a comma. Notice that both names of Davis have the same Row Hash of 0000, but the Uniqueness Values are 1 and 2.
When we decide to CREATE a Unique Secondary Index (USI) on the column EMP, a secondary index subtable is automatically created on every AMP. For each base row, a secondary index subtable row is created. It is hash distributed across the AMPs and placed in the secondary index subtable. Here is how that happens.
Each EMP column is run through the Hashing Algorithm and the output is a Row Hash. This Row Hash points to a bucket in the Hash Map, which directs the row to an AMP. That particular AMP is responsible for the secondary index row. Each AMP holds a portion of the secondary index rows, and in theory, each AMP holds an equal amount of secondary index rows.
Notice that each secondary index subtable row (at the bottom of the diagram) has the EMP value, the EMP row hash and uniqueness value (the EMP Row ID), and, most importantly, the base table Row ID. The base table Row ID is a pointer to the row in the base table.
Now that the secondary index subtable is set up it is ready for action.

Command: SELECT * FROM NAME_TABLE WHERE EMP = 99
The PE realizes that EMP is a Unique Secondary Index and so the plan is to find the secondary index subtable row, which will then point us directly to the base row.
The PE hashes the value in EMP, which is 99, and the output is the Row Hash. It happens to be 1111. The PE takes the Row Hash, which points to a bucket in the Hash Map. The bucket contains the AMP number where the secondary index row resides, in this case AMP 1. The PE tells AMP 1 to look in its secondary index subtable for EMP 99, which has a Row Hash of 1111. AMP 1 scans its subtable and finds Row Hash 1111, and then double checks that the EMP value is 99. When both are true, it returns the base table Row ID. That base table Row ID is 0000, 2, and the AMP passes it up the BYNET. The BYNET knows which AMP holds the base row and sends a message to that AMP to get the row. The row for Davis Ellie 99 is returned to the user.


NUSI Subtable Example

When a NUSI is designated on a table, each AMP will build a subtable. So if you create 32 NUSI indexes on a table, then each AMP will build 32 separate subtables. Therefore, choose your Secondary Indexes wisely, because space is used when these indexes are created. The following example illustrates how a NUSI subtable is created and how it works to speed up queries.

The above example is designed to logically illustrate how Teradata works with base table rows and NUSI subtables. Take your time and I will lead you through the key points. Try to remember what has already been covered.
First, take a look at the top table. The table name is NAME_TABLE. The table was created with a Non-Unique Primary Index on the column LNAME. Notice that both names of DAVIS have hashed to the same AMP. All NUPI columns with the same value will always hash to the same AMP.
I have also given each base row an abbreviated Row ID. The abbreviated Row ID is four digits for the Row Hash and one digit for the Uniqueness Value, separated by a comma. Notice both names of Davis have the same Row Hash of 0000, but the Uniqueness Values are 1 and 2.
If we decide to CREATE a Secondary Index (NUSI) on the column FNAME, a secondary index subtable is automatically created on every AMP. The FNAME secondary index is a Non-Unique Secondary Index (NUSI). The key point of a NUSI subtable is that each AMP keeps its own subtable, which only holds information about the base rows on that AMP. This is called an AMP-local secondary index subtable. It would be like every family keeping a record of their own children. So, each non-unique secondary index subtable on an AMP points to base table rows on that same AMP.
Another interesting thing about NUSI subtables is that for each value only one record is created. Notice in the picture, I have placed arrows pointing to the row on the second AMP. Both people on this AMP have a FNAME of Ray. In the subtable, we show the value of Ray, the Row Hash of Ray (so the AMP can find it quicker), and the base table Row ID for every Ray on that AMP.
Now that the secondary index subtable is set up it is ready for action.

Command: SELECT * FROM NAME_TABLE WHERE FNAME = 'Ray'
The PE realizes that FNAME is a Non-Unique Secondary Index, so the plan is to have each AMP check whether it has a Ray in its subtable. For speed, the PE hashes Ray and comes up with a Row Hash of 1111. The PE tells each AMP to look in its secondary index subtable for FNAME Ray, which has a Row Hash of 1111. Each AMP scans its subtable looking for Row Hash 1111, and then double checks that the FNAME value is ‘Ray’. If an AMP can’t find the subtable row, it knows it has nobody named ‘Ray’ and won’t participate. If an AMP finds a subtable row for ‘Ray’, it will then retrieve all base table Row IDs for ‘Ray’ on that AMP.
The rows in the above case are returned for:
Smith Ray 77
Jones Ray 34

Value-Ordered NUSI

A Value-Ordered NUSI is a Non-Unique Secondary Index that can be of any integer type (numeric and four bytes or less). Value-Ordered NUSIs are created when you want to use the BETWEEN keyword and look for a range of values. Teradata treats Value-Ordered NUSIs similarly to ordinary NUSI indexes, but with one minor difference.
As with an ordinary NUSI, when a Value-Ordered NUSI is created Teradata automatically creates and maintains a secondary index subtable. Teradata also keeps the subtables AMP local, so each AMP is responsible for keeping track of the base rows it owns.
The only difference is the way each AMP orders the subtable rows in its subtable. The rows will be ordered not in Row Hash sequence, but in order of the values themselves. That way January 1, 2000 comes before January 2, 2000. Dates are an excellent example where Value-Ordered NUSIs make sense. When a user asks for information WHERE the Order_Date is BETWEEN ‘2001/01/01’ and ‘2001/01/30’, each AMP can gather up their rows quickly because they are ordered by the data value. That’s what Order By Values means.
The syntax for creating a Value-Ordered NUSI is:

CREATE INDEX (Dept) ORDER BY VALUES ON Employee;
The requirements for a Value-Ordered Index are the following:
·       Cannot be unique or a Primary Index
·       Must only be used for NUSIs
·       It can be updated with SQL or MultiLoad
·       It can be any integer data type
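
A short sketch of the date scenario described above (Order_Table and Order_Date are assumed names; the ANSI date literals are only illustrative):

CREATE INDEX (Order_Date) ORDER BY VALUES ON Order_Table;

SELECT *
FROM Order_Table
WHERE Order_Date BETWEEN DATE '2001-01-01' AND DATE '2001-01-30' ;

Because the subtable rows are ordered by the Order_Date values themselves, each AMP can read its qualifying range of rows directly instead of probing hash by hash.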




Secondary Index Summary

·       You can have up to 32 secondary indexes for a table.
·       Secondary Indexes provide an alternate path to the data.
·       The two types of secondary indexes are USI and NUSI.
·       Every secondary index defined causes each AMP to create a subtable.
·       USI subtables are hash distributed.
·       NUSI subtables are AMP local.
·       USI queries are Two-AMP operations.
·       NUSI queries are All-AMP operations, but not Full Table Scans.
·       Value-Ordered NUSIs can be any non-unique index of integer type.
·       Always Collect Statistics on all NUSI indexes (see the sketch after this list).
·       The PE will decide if a NUSI is strongly selective and worth using over a Full Table Scan.
·       Use EXPLAIN to see if a NUSI is being utilized or if bitmapping is taking place.
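
As a minimal sketch of those last two points, using the NAME_TABLE/FNAME example from the NUSI section above:

COLLECT STATISTICS ON NAME_TABLE INDEX (FNAME);

EXPLAIN
SELECT *
FROM NAME_TABLE
WHERE FNAME = 'Ray' ;

The EXPLAIN output will show whether the optimizer uses the FNAME NUSI or falls back to a full table scan.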