
Wednesday, May 15, 2013

Teradata Utilities



The article below describes the various Teradata utilities and their usage.

BTEQ:

BTEQ is a batch-mode utility for submitting SQL requests to the Teradata database. It runs on every supported platform, from laptop to mainframe, and can both export data from and import data to a client system.

Instead of checking error code or error level after each statement, use the following setting at the beginning of the script. This sets the error severity level at which BTEQ terminates to 1.
.MAXERROR 1

Thus, any error condition which results in an error level of 1 or greater will cause the program to terminate, returning the error code that caused the MAXERROR condition to trigger. (MAXERROR is NOT the maximum number of errors allowed).

Under certain conditions, it may be desirable that a particular error does not cause the session to exit. One example is dropping a table or view without aborting the session when that table or view does not exist. This can be accomplished by placing the DROP statement first and .MAXERROR 1 as the second statement.
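A minimal sketch of this pattern (the logon values, database and table names below are placeholders):

.LOGON tdpid/loaduser,password

/* the DROP runs before .MAXERROR is set, so error 3807
   (object does not exist) cannot abort the script */
DROP TABLE sandbox.stg_customer;

/* from here on, any error of severity 1 or higher terminates BTEQ */
.MAXERROR 1

CREATE TABLE sandbox.stg_customer
    (cust_id INTEGER, cust_name VARCHAR(50))
PRIMARY INDEX (cust_id);

.LOGOFF
.QUIT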

If you need the DROP statement to appear after other SQL statements (so it cannot be the first statement), set the severity associated with its error code instead:
.SET ERRORLEVEL 3807 SEVERITY 0
Then to reset the error level to be checked again you do the following:
.SET ERRORLEVEL 3807 SEVERITY 8
(8 is the default error level of the 3807 error, table/view does not exist).
You can find ERRORLEVEL for specific ERRORCODE in the BTEQ manual.
If you are going to use this functionality, it is very important to reset the ERRORLEVEL back to its default before processing any statement where you want that error condition to abort the script.
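For example, to let a mid-script DROP fail quietly and then restore the default behavior (the database and table names are hypothetical):

/* ... earlier SQL statements ... */
.SET ERRORLEVEL 3807 SEVERITY 0
DROP TABLE sandbox.work_tbl;         /* a 3807 here no longer aborts the script */
.SET ERRORLEVEL 3807 SEVERITY 8      /* restore the default severity */
CREATE TABLE sandbox.work_tbl (id INTEGER) PRIMARY INDEX (id);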
If you need loop-like behavior in a BTEQ script, two commands help: .REPEAT n, which runs the next SQL request n times (n being how many times you want to repeat the loop), and .IF ACTIVITYCOUNT ... THEN .GOTO label, which branches on the row count of the previous request. Note that .GOTO can only jump forward in the script.
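A sketch of both constructs (table names are placeholders):

/* run the next request 5 times */
.REPEAT 5
INSERT INTO sandbox.run_log VALUES (CURRENT_TIMESTAMP);

/* branch on the result of the previous request:
   skip the load step when the staging table is empty */
SELECT 1 FROM sandbox.stg_customer SAMPLE 1;
.IF ACTIVITYCOUNT = 0 THEN .GOTO SKIPLOAD
INSERT INTO sandbox.customer SELECT * FROM sandbox.stg_customer;
.LABEL SKIPLOAD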

BTEQ Export

All BTEQ export processes should use the CLOSE option of the .EXPORT command and .SET RETRY OFF to ensure that the process aborts immediately upon a DBMS restart. Otherwise, the export will reconnect its sessions when Teradata becomes available again, retransmitting rows already sent.
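A minimal sketch of such an export (the file path and table name are placeholders):

.SET RETRY OFF
.EXPORT DATA FILE = /tmp/customer.dat, CLOSE
SELECT * FROM sandbox.customer;
.EXPORT RESET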

BTEQ Import

All BTEQ import processes populating empty tables should be preceded by a delete of the target table, for restartability. Import does not automatically reconnect sessions after a Teradata restart; the job must be restarted manually.
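A sketch of this restartable pattern (names and the file path are placeholders):

/* empty the target first so the whole job can simply be re-run */
DELETE FROM sandbox.customer ALL;

.IMPORT DATA FILE = /tmp/customer.dat
.REPEAT *
USING (cust_id INTEGER, cust_name VARCHAR(50))
INSERT INTO sandbox.customer (cust_id, cust_name)
VALUES (:cust_id, :cust_name);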


FASTLOAD:


Purpose: Loads large amounts of data from an external file into an empty table at high speed.

Restrictions:
A single FastLoad job can load only one empty table.
Tables defined with referential integrity, secondary indexes, join indexes, hash indexes or triggers cannot be loaded with FastLoad.
Duplicate rows cannot be loaded into a multiset table with FastLoad.

FastLoad has two phases, Data Acquisition and Application, and requires a separate error table for each phase. Use the BEGIN LOADING statement to specify (and create) the two error tables for the operation. The error tables can reside in the same database as the target table or in a different database.

BEGIN LOADING [dbname.]table_name
      ERRORFILES [dbname.]err_table1, [dbname.]err_table2
      [CHECKPOINT integer]
      [INDICATORS];
.
.
.
.
END LOADING;
LOGOFF;

We must remove the error tables before re-running the same load job, or it will terminate with an error condition.

The CHECKPOINT option defines points in a job where FastLoad pauses to record that Teradata has processed a specified number of input records. When checkpoints are used, we do not have to re-run the entire FastLoad job if it stops before completion; FastLoad uses the checkpoint information in the restart log table to determine the restart location. Checkpoints slow FastLoad processing, so set CHECKPOINT large enough that checkpoints are taken every 10 to 15 minutes.

An ERRLIMIT count should be specified on all FastLoad utility control cards to pause the load when an unusually high number of constraint violations is directed to error table 1. For large tables (such as fact tables), this number should be approximately 1000; the ERRLIMIT for smaller tables, such as dimension tables, is table specific.

END LOADING indicates that all data rows have been transmitted and that the second phase can begin. Omitting it means the load is incomplete and will be restarted later; the table being loaded enters the "FastLoad Paused" state and cannot be accessed via SQL. If END LOADING was omitted by mistake, submit a script containing only the BEGIN LOADING and END LOADING statements; this restarts phase 2 only.
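A sketch of such a phase-2-only restart script (the error-table names must match those of the original job; all names here are placeholders):

LOGON tdpid/loaduser,password;
BEGIN LOADING sandbox.stg_customer
      ERRORFILES sandbox.err_cust_1, sandbox.err_cust_2;
END LOADING;
LOGOFF;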

We can specify the max and min number of sessions that can be used by the FastLoad script. Specify the max value equal to the number of AMPs. If we specify a max value larger than the number of available AMPs, FastLoad limits the sessions to one per working AMP. Min is optional and defaults to 1.
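Putting the pieces above together, a minimal FastLoad script might look like the following sketch (logon values, names and the file path are placeholders; leftover error tables from a failed run must be dropped before re-running):

SESSIONS 12;                       /* at most one session per working AMP is used */
ERRLIMIT 1000;                     /* pause after 1000 rows land in error table 1 */
LOGON tdpid/loaduser,password;
BEGIN LOADING sandbox.stg_customer
      ERRORFILES sandbox.err_cust_1, sandbox.err_cust_2
      CHECKPOINT 1000000;          /* record progress every 1,000,000 input rows */
DEFINE cust_id   (INTEGER),
       cust_name (VARCHAR(50))
       FILE = /data/customer.dat;
INSERT INTO sandbox.stg_customer VALUES (:cust_id, :cust_name);
END LOADING;
LOGOFF;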

FAST EXPORT:

Purpose: Exports large volumes of formatted data from Teradata to a host file or user-written application.

Restrictions:
The SELECT cannot use equality conditions on a Primary Index or USI (single-AMP requests are not supported).
The SELECT cannot use the WITH option to generate total or subtotal response rows.


The FastExport job may consist of multiple SELECT statements, which will be executed sequentially by FastExport.
In the .EXPORT statement we specify the definition of the output data file. The supported data file formats are:
TEXT – fixed-width format
VARTEXT – character-delimited format
BINARY – binary format, a 2-byte length indicator followed by n bytes of data

The data can be exported in sorted order, but note that requesting sorted data adds work (overhead and time) on Teradata.

We can specify the max and min number of sessions that can be used by the FastExport script. Specify the max value equal to the number of AMPs. If we specify a max value larger than the number of available AMPs, FastExport limits the sessions to one per working AMP. Min is optional and defaults to 1.
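A minimal FastExport sketch under the same placeholder names (the restart log table is required):

.LOGTABLE sandbox.fexp_restart_log;
.LOGON tdpid/expuser,password;
.BEGIN EXPORT SESSIONS 12;
.EXPORT OUTFILE /data/customer_extract.dat
        MODE RECORD FORMAT BINARY;
SELECT cust_id, cust_name
FROM   sandbox.customer
ORDER BY cust_id;               /* sorted output: extra work for Teradata */
.END EXPORT;
.LOGOFF;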


MULTI LOAD:


Purpose: Used for loading, updating or deleting data to and from populated tables, typically with batch inputs from a host file.

Restrictions:
Cannot process tables defined with USIs, Referential Integrity, Join Indexes, Hash Indexes, or Triggers.
No data retrieval capability.
Import tasks require use of Primary Index.


MultiLoad supports up to five target tables per script. Tables may contain pre-existing data, and MultiLoad can perform INSERTs, UPDATEs, DELETEs and UPSERTs.

In the BEGIN MLOAD statement, define one work table for each target table. If work tables are not named, they are created with the target table name prefixed by 'WT_'. Make sure the database holding the work tables has enough space; MultiLoad loads the data into the work table before applying it to the target table's data blocks.

Two error tables need to be defined for each target table. If they are not named in BEGIN MLOAD, they default to the table name prefixed with 'ET_' and 'UV_': 'ET_tablename' is the acquisition-phase error table and 'UV_tablename' is the application-phase error table.

.BEGIN MLOAD
    TABLES tname1, tname2,…
    WORKTABLES wt_table1, wt_table2,…
    ERRORTABLES et_table1 uv_table1, et_table2 uv_table2, …
    [CHECKPOINT integer]
    [INDICATORS];
.
.
.
.
.END MLOAD;

Use the RELEASE MLOAD tablename; statement if the MLOAD did not complete successfully and there is no desire to restart it. This returns the target table to general availability. Error tables should be dropped manually, as RELEASE MLOAD will not drop them automatically.
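For example, to abandon a failed MultiLoad against a hypothetical sandbox.customer table and clean up its default support tables (the restart log table, whose name is user-chosen, should be dropped as well):

RELEASE MLOAD sandbox.customer;
DROP TABLE sandbox.WT_customer;   /* work table */
DROP TABLE sandbox.ET_customer;   /* acquisition-phase error table */
DROP TABLE sandbox.UV_customer;   /* application-phase error table */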

In the .IMPORT statement we specify the definition of the input data file. The supported data file formats are:
TEXT – fixed-width format
VARTEXT – character-delimited format

We can specify the max and min number of sessions that can be used by the MLOAD script. Specify the max value equal to the number of AMPs. If we specify a max value larger than the number of available AMPs, MLOAD limits the sessions to one per working AMP. Min is optional and defaults to 1.
MultiLoad jobs may or may not benefit from dropping and recreating indexes. This depends on the size of the table, the number of indexes, and the size of the file being multiloaded. Tests need to be conducted to determine which method performs best. If drop/create index is not done, these members should exist, but contain comments only.

An ERRLIMIT count should be specified on all MultiLoad utility control cards to pause the load when an unusually high number of constraint violations is directed to the 'ET' error table. For large tables (such as fact tables), this number should be approximately 1000; the ERRLIMIT for smaller tables, such as dimension tables, is table specific.
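A sketch of a complete MultiLoad upsert job tying the pieces above together (all names, the delimiter and the file path are placeholders; with FORMAT VARTEXT every input field must be defined as VARCHAR, and Teradata converts on insert):

.LOGTABLE sandbox.ml_cust_log;
.LOGON tdpid/loaduser,password;
.BEGIN MLOAD TABLES sandbox.customer
       WORKTABLES sandbox.wt_customer
       ERRORTABLES sandbox.et_customer sandbox.uv_customer
       ERRLIMIT 1000
       SESSIONS 12;
.LAYOUT cust_layout;
.FIELD in_cust_id   * VARCHAR(10);
.FIELD in_cust_name * VARCHAR(50);
.DML LABEL upsert_cust
     DO INSERT FOR MISSING UPDATE ROWS;   /* upsert: insert when the update finds no row */
  UPDATE sandbox.customer
  SET    cust_name = :in_cust_name
  WHERE  cust_id   = :in_cust_id;
  INSERT INTO sandbox.customer (cust_id, cust_name)
  VALUES (:in_cust_id, :in_cust_name);
.IMPORT INFILE /data/customer_delta.dat
        FORMAT VARTEXT '|'
        LAYOUT cust_layout
        APPLY upsert_cust;
.END MLOAD;
.LOGOFF;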

Multiload Tips and Techniques

MultiLoad is a Teradata load utility used mostly to append rows to a table. It is the fastest way to do this, but is substantially slower than FastLoad: although runtime is affected by the number of duplicate primary index values, MultiLoad runs approximately 2-3 times slower than FastLoad. A sample runtime on a 2-node, 12-AMP NCR box: 1,000,000 300-byte rows in 1 hour (unique primary index).
MultiLoad usually runs faster when secondary indexes are dropped before the load and recreated afterwards. MultiLoad does not support unique secondary indexes; they must be dropped prior to the load.
MultiLoad restart options depend on whether the job abends in the acquisition or the application phase. The syslist will indicate which phase it is in; if you are unsure, run the RELEASE MLOAD command: if the job is in the application phase, the command will not be accepted. See the MultiLoad restart procedures for more details on restart options.
MultiLoad can also be used to perform a large number of deletes efficiently, using the MultiLoad DELETE task.
Any operation performed using MLOAD cannot be rolled back.
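The MultiLoad DELETE task mentioned above can be sketched as follows (names and the date are placeholders; the WHERE clause must not be an equality condition on the primary index):

.LOGTABLE sandbox.ml_del_log;
.LOGON tdpid/loaduser,password;
.BEGIN DELETE MLOAD TABLES sandbox.customer;
DELETE FROM sandbox.customer
WHERE  load_dt < DATE '2013-01-01';
.END MLOAD;
.LOGOFF;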



         
