Teradata Interview Questions: Teradata Data Placement

Teradata Data Placement

Because Teradata was built for large data warehouses, its architects knew that data placement and table management could be a full-time job. That is why they designed Teradata to manage the data automatically. Nobody had ever attempted this incredible feat. The Teradata designers dreamed of things that never were and made them so.

Managing table space, disks, and other system administration functions in a data warehouse can be a nightmare. Teradata makes the DBA’s job far easier by letting the system handle these difficult functions. Teradata not only spreads the data evenly but also retrieves it quickly, because it always knows which AMP holds a particular row: the row’s Primary Index value is hashed, and that hash determines the owning AMP.

Teradata always attempts to spread data evenly so that each AMP manages approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP holds a portion of all 200 tables. This method of data distribution is central to Teradata’s design.
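
The routing that produces this even spread can be sketched in a few lines of Python. In broad strokes, Teradata hashes a row’s Primary Index value into a 32-bit row hash, uses the high-order bits of that hash as a hash bucket number, and looks the bucket up in a hash map to find the owning AMP. This is a minimal sketch under stated assumptions: the MD5-based row_hash is a hypothetical stand-in for Teradata’s own hashing algorithm, and the four AMPs, the 16-bit bucket width, and the round-robin HASH_MAP are illustrative choices.

```python
import hashlib

NUM_AMPS = 4
BUCKET_BITS = 16                  # assumption: 16-bit hash bucket numbers
NUM_BUCKETS = 1 << BUCKET_BITS    # 65,536 buckets

# Hypothetical hash map: buckets are dealt out round-robin, so every AMP
# owns the same number of buckets.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value: str) -> int:
    """Stand-in 32-bit row hash (NOT Teradata's real hashing algorithm)."""
    digest = hashlib.md5(pi_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value: str) -> int:
    """Route a row: Primary Index value -> row hash -> bucket -> AMP."""
    bucket = row_hash(pi_value) >> (32 - BUCKET_BITS)  # high-order bits
    return HASH_MAP[bucket]


if __name__ == "__main__":
    counts = [0] * NUM_AMPS
    for emp_no in range(10_000):
        counts[amp_for(str(emp_no))] += 1
    print(counts)  # roughly 2,500 rows land on each AMP
```

Because the same value always hashes to the same bucket, the system can later compute which AMP owns a row instead of searching for it.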

There are some significant benefits to handling data this way. First, the disk is often the biggest bottleneck in a system. Because each AMP has its own virtual disk and each table is spread among the AMPs, no single disk becomes a bottleneck.

Second, when each AMP holds nearly the same number of table rows, no single AMP becomes a bottleneck. The AMPs retrieve all or part of the data in parallel, so no AMP sits idle while others are still chugging away. Baseball legend Casey Stengel once said, “It’s easy to get good players. Gettin’ ’em to play together, that’s the hard part.” AMPs love to work together in parallel.
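
A quick calculation shows why an even spread matters: the AMPs work in parallel, so a full-table read finishes only when the busiest AMP finishes. This is a minimal sketch, assuming a hypothetical scan rate of one million rows per second on every AMP and ignoring BYNET and PE overhead; the skew_factor helper is an illustrative name, not a Teradata function.

```python
def elapsed_seconds(rows_per_amp, rows_per_second=1_000_000):
    """All AMPs scan in parallel, so the query ends when the busiest AMP does."""
    return max(rows_per_amp) / rows_per_second


def skew_factor(rows_per_amp):
    """Busiest AMP's rows divided by the average rows per AMP (1.0 = perfectly even)."""
    average = sum(rows_per_amp) / len(rows_per_amp)
    return max(rows_per_amp) / average


# The same 10 million rows on four AMPs, placed two different ways.
even = [2_500_000, 2_500_000, 2_500_000, 2_500_000]
skewed = [7_000_000, 1_000_000, 1_000_000, 1_000_000]

for name, layout in (("even", even), ("skewed", skewed)):
    print(f"{name}: elapsed={elapsed_seconds(layout):.1f}s skew={skew_factor(layout):.1f}")
```

With the even spread the read takes 2.5 seconds. With the skewed spread it takes 7.0 seconds: three AMPs finish after one second and sit idle while the fourth keeps working.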

Third, each AMP is unaware of any data except its own portion. An AMP can read or write only the rows it actually owns. This makes retrieving a particular row very efficient, because every AMP focuses on its own work.

Fourth, each AMP automatically groups all of its rows by the table they come from. Have you ever been to a large aquarium and seen one of those displays shaped like a very tall, clear cylinder? As you walk around the glass, you notice the fish tend to swim in schools. Teradata does the same with the rows on each AMP to boost performance. When you ask for data from a given table, an AMP goes immediately to that table’s group of rows and selects what you need. It does not have to look through the rows of many other tables first.

This is how parallel processing works: the AMPs retrieve data in parallel and then pass it over the BYNET to the PE, which ensures the data is delivered to the user. Keep in mind that the BYNET is Teradata’s internal network, across which the PEs and the AMPs communicate.
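
The division of labor between the AMPs and the PE can be modeled with two toy classes. This is a minimal sketch, assuming Python threads as a stand-in for AMP parallelism and a plain return value as a stand-in for the BYNET. The Amp, ParsingEngine, and owning_amp names are hypothetical, and owning_amp uses a simple modulo rather than Teradata’s hashing.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Toy AMP: owns only its own rows, kept in one group per table."""

    def __init__(self, amp_id):
        self.amp_id = amp_id
        self.tables = defaultdict(list)  # table name -> this AMP's rows of that table

    def store(self, table, row):
        self.tables[table].append(row)

    def scan(self, table):
        # Jump straight to this table's group; rows of other tables are never read.
        return list(self.tables.get(table, []))


def owning_amp(amps, key):
    # Hypothetical stand-in for hashing: integer key modulo the number of AMPs.
    return amps[key % len(amps)]


class ParsingEngine:
    """Toy PE: sends work to the AMPs and gathers their answers for the user."""

    def __init__(self, amps):
        self.amps = amps

    def select_all(self, table):
        # Every AMP scans its own portion concurrently; the PE combines the parts
        # (the return trip stands in for the BYNET).
        with ThreadPoolExecutor(max_workers=len(self.amps)) as pool:
            parts = list(pool.map(lambda amp: amp.scan(table), self.amps))
        return [row for part in parts for row in part]

    def select_by_key(self, table, key_field, key):
        # Only the AMP that owns the key is asked; the other AMPs do no work.
        amp = owning_amp(self.amps, key)
        return [row for row in amp.scan(table) if row[key_field] == key]


amps = [Amp(amp_id) for amp_id in range(1, 5)]
for emp_no in range(1, 9):
    owning_amp(amps, emp_no).store("Employee", {"emp_no": emp_no})
for order_no in range(100, 112):
    owning_amp(amps, order_no).store("Order", {"order_no": order_no})

pe = ParsingEngine(amps)
print(len(pe.select_all("Employee")))             # 8 rows, two from each AMP
print(pe.select_by_key("Employee", "emp_no", 5))  # [{'emp_no': 5}]
```

A full-table request fans out to every AMP, while a single-row request by key touches only the one AMP that owns that row.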

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: “Employee,” “WebLog,” and “Order.” Each AMP holds a portion of the rows of every table. AMP1, for example, holds one-fourth of the Employee table rows, one-fourth of the WebLog table rows, and one-fourth of the Order table rows.
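
The four-AMP, three-table layout can also be simulated to confirm that hashing gives each AMP about one-fourth of every table. This is a minimal sketch under stated assumptions: the table row counts are hypothetical, and amp_for uses the first four bytes of an MD5 digest as a stand-in for Teradata’s hashing algorithm.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4
TABLE_SIZES = {"Employee": 4_000, "WebLog": 40_000, "Order": 20_000}  # hypothetical row counts


def amp_for(pi_value):
    # Stand-in hash (not Teradata's algorithm): first 4 bytes of MD5, modulo AMP count.
    h = int.from_bytes(hashlib.md5(str(pi_value).encode()).digest()[:4], "big")
    return h % NUM_AMPS  # 0-based AMP index


def layout():
    """Return {table: Counter(amp index -> row count)} for the toy system."""
    return {
        table: Counter(amp_for(key) for key in range(size))
        for table, size in TABLE_SIZES.items()
    }


if __name__ == "__main__":
    for table, per_amp in layout().items():
        shares = [per_amp[amp] / TABLE_SIZES[table] for amp in range(NUM_AMPS)]
        # Print AMPs as AMP1..AMP4 to match the text's numbering.
        print(table, " ".join(f"AMP{amp + 1}={share:.1%}" for amp, share in enumerate(shares)))
```

Each line of output should show every AMP holding close to 25% of that table’s rows.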

Plus, the data is spread evenly across the AMPs for all tables. If a query asks for all rows in the Employee table, each AMP retrieves its Employee rows in parallel and then passes them to the PE via the BYNET. Because the Employee rows are spread evenly among all AMPs, each AMP should finish reading at about the same time.

Also, notice how each AMP keeps each table separate. Just like schools of fish, the rows of the Employee table are grouped together, as are the rows of the WebLog table and the rows of the Order table. This is an important key in a data warehouse environment, because many queries read millions of rows to produce a single answer. Performance improves when a table’s rows are grouped together, because Teradata can then bring whole blocks of that table’s rows into memory at once.
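
A toy model shows why grouping a table’s rows pays off when data is read a block at a time. This is a minimal sketch, assuming a hypothetical fixed block size of 100 rows and modeling one AMP’s storage as a flat list of table names in on-disk order; it ignores how Teradata actually organizes its data blocks.

```python
BLOCK_SIZE = 100  # hypothetical: rows per data block


def blocks_touched(storage, table, block_size=BLOCK_SIZE):
    """Count the distinct blocks that hold at least one row of `table`.

    `storage` lists one AMP's rows in on-disk order; each entry is a table name.
    """
    return len({pos // block_size for pos, t in enumerate(storage) if t == table})


tables = ["Employee", "WebLog", "Order"]
rows_each = 1_000

# Rows of each table stored together, as the text describes.
grouped = [t for t in tables for _ in range(rows_each)]
# The same rows stored interleaved (one row of each table in turn), for comparison.
interleaved = [t for _ in range(rows_each) for t in tables]

print("grouped:", blocks_touched(grouped, "Employee"), "blocks")          # 10 blocks
print("interleaved:", blocks_touched(interleaved, "Employee"), "blocks")  # 30 blocks
```

When the Employee rows sit together, reading them means loading 10 blocks; when they are mixed with the other tables’ rows, the same 1,000 rows are scattered over 30 blocks.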


Monday, May 20, 2013
