IBM Information Server 8.X (DataStage): Parallel Transformer Stage Properties

DataStage: What is the Transformer Stage?

DataStage provides several stages for loading data into a data warehouse or data marts. The stages are classified into General, Database, Development and Debug, File, Processing, Real Time, and so on; the Transformer stage is a Processing stage. This article explores the options available on the stage, such as Execution mode and Preserve partitioning.

Stage properties Advanced tab:

In the Advanced tab, the following options are available:

  • Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced tab. In sequential mode the data is processed by the conductor node.
  • Combinability mode. This is Auto by default, which allows WebSphere DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
  • Preserve partitioning. This is set to Propagate by default, which sets or clears the partitioning in accordance with what the previous stage has set. You can also select Set or Clear. If you select Set, the stage will request that the next stage preserves the partitioning as is.
  • Node pool and resource constraints. Select this option to constrain parallel execution to the node pool or pools or resource pool or pools specified in the grid. The grid allows you to make choices from drop down lists populated from the Configuration file.
  • Node map constraint. Select this option to constrain parallel execution to the nodes in a defined node map. You can define a node map by typing node numbers into the text box or by clicking the browse button to open the Available Nodes dialog box and selecting nodes from there. You are effectively defining a new node pool for this stage (in addition to any node pools defined in the Configuration file).
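The way Execution mode, a node pool constraint, and a node map constraint narrow down where a stage runs can be pictured with a small Python model. This is only a conceptual sketch, not DataStage code or its API: the `Node` class, the `nodes_for_stage` function, the node names, and the pool names are all hypothetical, and treating the first listed node as the conductor is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one node entry in the Configuration file."""
    name: str
    pools: frozenset = field(default_factory=frozenset)


def nodes_for_stage(config_nodes, mode="parallel", node_pool=None,
                    node_map=None, conductor=None):
    """Return the nodes a stage would run on in this simplified model."""
    if mode == "sequential":
        # Sequential mode: one node does all the work. Assumption: the first
        # configured node acts as the conductor unless one is passed in.
        return [conductor if conductor is not None else config_nodes[0]]

    selected = list(config_nodes)
    if node_pool is not None:
        # Node pool constraint: keep only nodes that belong to the pool.
        selected = [n for n in selected if node_pool in n.pools]
    if node_map is not None:
        # Node map constraint: keep only the explicitly listed nodes.
        selected = [n for n in selected if n.name in node_map]
    return selected


# Hypothetical three-node configuration.
nodes = [
    Node("node1", frozenset({"default", "sort"})),
    Node("node2", frozenset({"default"})),
    Node("node3", frozenset({"default", "sort"})),
]

for label, chosen in [
    ("parallel, no constraints", nodes_for_stage(nodes)),
    ("parallel, pool 'sort'", nodes_for_stage(nodes, node_pool="sort")),
    ("parallel, node map", nodes_for_stage(nodes, node_map={"node2"})),
    ("sequential", nodes_for_stage(nodes, mode="sequential")),
]:
    print(label, "->", [n.name for n in chosen])
```

In this model each constraint can only shrink the set of eligible nodes, which matches the description above of a node map as effectively defining a new, smaller node pool for the stage.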

Properties Surrogate Key Tab:

Set the Source type field to Flat File or DBSequence.

Transformer stage: Input page

Partitioning tab:

The Partitioning tab allows you to specify how the incoming data is partitioned or collected when it is input to the Transformer stage. It also allows you to specify that the data should be sorted on input.

By default the Transformer stage will attempt to preserve partitioning of incoming data, or use its own partitioning method according to what the previous stage in the job dictates.

The Partitioning tab also allows you to specify that data arriving on the input link should be sorted. The sort is always carried out within data partitions. If the stage is partitioning incoming data the sort occurs after the partitioning. If the stage is collecting data, the sort occurs before the collection. The availability of sorting depends on the partitioning method chosen.

  • Perform Sort. Select this to specify that data coming in on the link should be sorted. Select the column or columns to sort on from the Available list.
  • Stable. Select this if you want to preserve previously sorted data sets. This is the default.
  • Unique. Select this to specify that, if multiple records have identical sorting key values, only one record is retained. If stable sort is also set, the first record is retained.
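The interaction of partitioning, the per-partition sort, and the Unique option can be illustrated with a short Python sketch. It is not how DataStage implements these options; `partition_by_key` and `sort_partition` are hypothetical helper names, the rows are made-up sample data, and the modulus-style partitioning assumes integer key values. Python's `sorted` is always stable, so the sketch shows only the stable case, where records with equal keys keep their incoming order.

```python
from itertools import groupby
from operator import itemgetter


def partition_by_key(rows, key, num_partitions):
    """Split rows into partitions by key value (assumes integer keys)."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[row[key] % num_partitions].append(row)
    return parts


def sort_partition(rows, key, unique=False):
    """Sort one partition on key; with unique=True keep one row per key."""
    # sorted() is stable: rows with equal keys stay in arrival order.
    ordered = sorted(rows, key=itemgetter(key))
    if not unique:
        return ordered
    # Stable sort plus unique: the first row of each key group is retained.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Made-up input rows arriving on the link.
rows = [
    {"id": 3, "val": "a"},
    {"id": 1, "val": "b"},
    {"id": 3, "val": "c"},
    {"id": 2, "val": "d"},
    {"id": 1, "val": "e"},
    {"id": 4, "val": "f"},
]

# Partition first, then sort inside each partition.
for number, part in enumerate(partition_by_key(rows, "id", 2)):
    print(number, [(r["id"], r["val"]) for r in sort_partition(part, "id", unique=True)])
```

Because the sort runs inside each partition after partitioning, the output is ordered within a partition but not across partitions. Choosing a partitioning method that sends all rows with the same key to the same partition is what makes the Unique option meaningful here.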

Preserves Sort Order:

Select this if you know that the rows being input to the Transformer stage have been sorted and you want to preserve the sort order.

The Transformer stage is one of the most important stages and is used very often when designing DataStage jobs. Its stage options, together with the ability to define local variables, call routines and transforms, and write derivations, set it apart from the other stages available in DataStage.

Source by Yogi Talakanti
