Oracle Database for DataStage connector
Use the Oracle Database for DataStage connector in DataStage® to access an Oracle database system and perform read, write, and load functions.
Prerequisite
Create the connection. For instructions, see Connecting to a data source in DataStage and the Oracle Database for DataStage connection.
Reading data from the Oracle Database for DataStage connector as a source
To configure the connector to read rows in an Oracle table or view, you must specify the source table or view or define a complete SELECT statement or PL/SQL block.
The available properties for the Read mode depend on whether you choose Select or PL/SQL mode.
- Reading data from the Oracle Database for DataStage database
- From the job design canvas, double-click the Oracle Database for DataStage connector.
- On the Output tab, click Usage.
- Set Read mode to Select or PL/SQL.
- If you set Read mode to Select, use one of these methods to specify the source of the data:
- Set Generate SQL at runtime to Yes, and then enter the name of the table or view in the Table name property. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the user who is currently connected. From the Table scope property, you can choose what part of the table to access.
- Set Generate SQL at runtime to No, and then specify the SELECT statement in the Select statement property.
- Set Generate SQL at runtime to No, and then enter the fully qualified file name of the file that contains the SQL statement in the Select statement property.
- If you set Read mode to PL/SQL, use one of these methods to specify the source of the data:
- Enter the PL/SQL block manually in the PL/SQL block property.
- Enter the fully qualified file name of the file that contains the PL/SQL block in the PL/SQL block property.
The PL/SQL block that you specify must begin with the keyword DECLARE or BEGIN and must end with the keyword END, and you must enter a semicolon after the END keyword. For an illustration, see the sketch after this procedure.
- You can configure optional properties:
- In the Session tab:
- Prefetch row count: Set a value other than 1 for this property to prefetch that many rows from the Oracle database.
- Prefetch buffer size: Enter a buffer size; the connector fetches as many records as can fit in the specified buffer.
- Transfer BFILE contents: Transfer the contents of the external file instead of the file location when transferring BFILE column values.
You can specify Before/After SQL statements that will run once before or after any data is processed, or run once on each node before or after any data is processed.
- In the Transaction tab:
- Isolation level: Specify the degree to which the data that is being accessed by the Oracle Database for DataStage connector is locked or isolated from other concurrently executing transactions, units of work, or processes.
- Record count: The number of records per transaction. The value 0 means all available records.
- Mark end of wave: Choose how the connector handles batch processing.
- Click Save.
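For reference, the following is a minimal sketch of a PL/SQL block that satisfies the connector's requirements: it starts with DECLARE (or BEGIN), and its final END keyword is followed by a semicolon. The table name and logic are hypothetical placeholders, not connector-generated code.

```sql
-- A minimal sketch of a PL/SQL block for Read mode = PL/SQL.
-- The SALES table and the logic are hypothetical placeholders.
DECLARE
  v_total NUMBER := 0;
BEGIN
  SELECT COUNT(*) INTO v_total FROM sales;
  -- Application-specific processing would go here.
END;
```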
Reading partitioned data
In a job that uses multiple nodes, each node that is specified for the stage reads a distinct subset of data from the source table.
- Reading partitioned data from the Oracle Database for DataStage database
- On the Output tab, set Enable partitioned reads to Yes.
- Set Read mode to Select, and then define the SELECT statement that the connector uses at run time:
- Set Generate SQL at runtime to Yes, and then enter the name of the table or view in the Table name property. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the user who is currently connected. The connector automatically generates and runs the SELECT statement.
To read data from a particular partition of a partitioned table, set the Table scope property to Single partition and specify the name of the partition in the Partition name property. The connector then automatically adds a PARTITION(partition_name) clause to the SELECT statement that is generated (see the sketch after this procedure). To read data from a particular subpartition of a composite partitioned table, set the Table scope property to Single subpartition.
- Set Generate SQL at runtime to No, and then specify the SELECT statement in the Select statement property. You can enter the SQL statement or enter the fully qualified file name of the file that contains the SQL statement.
- Choose the Partitioned read method that you want to use. The default partitioned read method is Rowid range.
- If you choose the Rowid range or Minimum and maximum range partitioned read method, in the Partition or subpartition name for partitioned reads property, specify the name of the partition or subpartition that the partitioned read method uses.
If you do not specify a value for the Partition or subpartition name for partitioned reads property, the connector uses the entire table as input for the partitioned read method.
- If you choose the Modulus or the Minimum and maximum range partitioned read method, in the Column name for partitioned reads property, enter the name of the column from the source table to use for the method. The column must be an existing column in the table, must be of the NUMBER(p) data type, where p is the precision, and must have a scale of zero.
- In the Table name for partitioned reads property, specify the name of the table that the partitioned read method uses to define the subsets of data that each node reads from the source table. If you do not specify a table name, the connector uses the value of the Generate SQL at runtime property to determine the table name. If Generate SQL at runtime is set to Yes, the connector uses the table name that is specified in the Table name property. If Generate SQL at runtime is set to No, the connector looks at the SELECT statement that is specified in the Select statement property and uses the first table name that is specified in the FROM clause.
- Click Save.
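As a point of reference, when Table scope is set to Single partition, the generated SELECT statement includes a PARTITION clause similar to the following hedged sketch. The schema, table, partition, and subpartition names are hypothetical.

```sql
-- Sketch of the kind of SELECT statement the connector generates
-- when Table scope is Single partition. MYSCHEMA.SALES and
-- SALES_Q1 are hypothetical names.
SELECT * FROM myschema.sales PARTITION (sales_q1);

-- For Table scope = Single subpartition, a SUBPARTITION clause
-- is used instead:
SELECT * FROM myschema.sales SUBPARTITION (sales_q1_east);
```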
Writing data to the Oracle Database for DataStage connector as a target
To configure the connector to write rows to an Oracle table or writable view, you must specify the target table or view or define the SQL statements or PL/SQL block.
- Writing data to the Oracle Database for DataStage
- From the job design canvas, double-click the Oracle Database for DataStage connector.
- On the Input tab, click Usage.
- Specify how the Oracle Database for DataStage connector writes data to an Oracle table or writable view:
Table 1. Methods for writing data to an Oracle Database for DataStage table or writable view

Automatically generate the SQL at run time:
- Set Generate SQL at runtime to Yes.
- Set Write mode to Insert, Insert new rows only, Update, Delete, Insert then update, Update then insert, or Delete then insert.
- Enter the name of the target table in the Table name property.

Enter the SQL manually:
- Set Generate SQL at runtime to No.
- Set Write mode to Insert, Insert new rows only, Update, Delete, Insert then update, Update then insert, or Delete then insert.
- Enter SQL statements in the fields that correspond to the write mode that you selected (see the sketch after this procedure).

Specify a PL/SQL block:
- Set Write mode to PL/SQL.
- Enter the PL/SQL block in the PL/SQL block property. The PL/SQL block must begin with the keyword DECLARE or BEGIN and end with the keyword END, and you must include a semicolon after the END keyword.

Bulk load the data:
- Set Write mode to Bulk load.
- Enter the name of the table in the Table name property. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the user who is currently connected.
- Click Save.
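When you enter the SQL manually, the statements typically reference columns on the input link. The following hedged sketch assumes a hypothetical target table MYSCHEMA.CUSTOMERS and an input link with columns CUST_ID and CUST_NAME; it uses the ORCHESTRATE.column_name placeholder syntax that DataStage connectors use to refer to link columns.

```sql
-- Sketch of user-defined statements for Write mode = Update then insert.
-- ORCHESTRATE.column_name refers to a column on the input link;
-- the table and column names here are hypothetical.
UPDATE myschema.customers
   SET cust_name = ORCHESTRATE.CUST_NAME
 WHERE cust_id = ORCHESTRATE.CUST_ID;

INSERT INTO myschema.customers (cust_id, cust_name)
VALUES (ORCHESTRATE.CUST_ID, ORCHESTRATE.CUST_NAME);
```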
Configuring bulk loading of data
When you use the Oracle Database for DataStage connector to bulk load data to an Oracle database, you can enable or disable constraints and triggers. You can also configure the date cache, manage indexes, set options for bulk record loading, and enable manual mode.
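The procedure below has the connector disable constraints before the load and re-enable them afterward, recording violating rows in an exceptions table. For orientation only, here is a hedged sketch of the equivalent Oracle SQL; the connector issues comparable operations for you, and the table, constraint, and exceptions-table names are hypothetical.

```sql
-- Hedged sketch of disabling a constraint before a bulk load and
-- re-enabling it afterward. All object names are hypothetical.
ALTER TABLE myschema.orders DISABLE CONSTRAINT orders_fk;

-- ... the bulk load runs here ...

-- Rows that violate the constraint are recorded in the exceptions table.
ALTER TABLE myschema.orders ENABLE CONSTRAINT orders_fk
  EXCEPTIONS INTO myschema.orders_exceptions;
```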
- Setting up the bulk load
- In the Write mode property, choose the Bulk load method and specify the table to write to in the Table name property.
- Configure the connector to disable constraints before it bulk loads data and enable constraints after it bulk loads data:
- Set Perform operations before bulk load to Yes.
- Set Disable constraints to Yes.
- Set Perform operations after bulk load to Yes.
- Set Enable constraints to Yes.
- In the Exceptions table name field, enter the name of the exceptions table.
If the exceptions table does not exist, the connector creates it.
If the exceptions table already exists, the connector deletes any data that is in the table and then uses it.
- Configure the connector to disable triggers before it bulk loads data and enable triggers after it bulk loads data:
- Set Perform operations before bulk load to Yes.
- Set Disable triggers to Yes.
- Set Perform operations after bulk load to Yes.
- Set Enable triggers to Yes.
- To control how to handle table indexes during a bulk load, set the Index maintenance option property.
- To rebuild indexes after a bulk load:
- Set Perform operations after bulk load to Yes.
- Set Rebuild indexes to Yes.
- Optional: To enable or disable parallelism and logging to the redo log when the index is rebuilt, specify nondefault values for the Parallel clause and Logging clause properties.
By default, parallel and logging clauses are not included in the ALTER INDEX statement.
- Optional: To stop the job if an index rebuild statement fails, set Fail on error for index rebuilding to Yes.
If an index rebuild fails, the connector logs a fatal error.
- If you plan to bulk load data into tables that contain DATE or TIMESTAMP columns, enable and configure the date cache:
- Set Use Oracle date cache to Yes.
- Optional: In the Cache size property, enter the maximum number of entries that the cache stores.
- Optional: Set Disable cache when full to Yes.
When the number of entries in the cache reaches the number that is specified in the Cache size property and the next lookup in the cache results in a miss, the cache is disabled.
- Set options to control bulk record loading:
- Set Array size to a value from 1 to 999,999,999. The default is 2,000.
- Set the Allow concurrent load sessions property to Yes.
- Set Buffer size to a value from 4 to 100,240, which represents the buffer size in KB. The default is 1,024.
- To enable manual mode:
- Set Manual mode to Yes.
- Optional: In the Directory for data and control files property, specify a directory to save the control and data files to.
- Optional: In the Control file name property, specify a name for the control file.
- In the Data file name property, specify the name of the data file.
If you do not specify a value for the data file name, the connector generates the name in the servername_tablename.dat format.
- In the Load options property, specify the bulk load options to include in the control file that the connector generates.
The value contains parameters that are passed to the Oracle SQL*Loader utility when the utility is invoked to process the control and data files. The default value is OPTIONS(DIRECT=FALSE,PARALLEL=TRUE).
The DIRECT=FALSE parameter tells the Oracle SQL*Loader to use the conventional path load instead of the direct path load. The PARALLEL=TRUE parameter tells the utility that the data can be loaded in parallel from multiple concurrent sessions.
The word OPTIONS and the parentheses must be included in the value that is specified for the property. The connector saves this property value to the control file that is generated and does not check the syntax of the value. For a sketch of a generated control file, see the example after this procedure.
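For orientation, the following is a hedged sketch of the kind of SQL*Loader control file that manual mode produces, with the default Load options value on the OPTIONS line. The data file, table, and column names are hypothetical; the actual file layout is generated by the connector.

```sql
-- Sketch of a generated SQL*Loader control file in manual mode.
-- The Load options property value appears verbatim on the OPTIONS line.
-- File, table, and column names are hypothetical.
OPTIONS(DIRECT=FALSE,PARALLEL=TRUE)
LOAD DATA
INFILE 'servername_customers.dat'
APPEND
INTO TABLE myschema.customers
FIELDS TERMINATED BY ','
(cust_id, cust_name)
```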
Writing partitioned data
In a job that uses multiple nodes, records that arrive on the input link of the connector are distributed across multiple nodes. Then, the records are written in parallel from all of the nodes to the target database.
The default partition type is Auto, which selects the partition type based on the various settings for the stages in the job. Instead of using Auto, it is better to select a partition type based on your knowledge about the actual data and the target table that the connector writes to at run time.
- Setting a partitioned method
- On the job design canvas, double-click the Oracle Database for DataStage connector.
- On the Input tab, click the Partitioning tab.
- Choose the Partition type:
Table 2. Partitioning types for the Oracle Database for DataStage connector

- Auto: DataStage determines the best partitioning method to use depending on the type of stage and what the previous stage in the job has done.
- Entire: Every instance of a stage on every processing node receives the complete data set as input. You are most likely to use this partitioning method with stages that create lookup tables from their input.
- Random: Records are randomly distributed across all processing nodes. Like round robin, random partitioning can rebalance the partitions of an input data set to guarantee that each processing node receives an approximately equal-sized partition.
- Round robin: The first record goes to the first processing node, the second to the second processing node, and so on. When DataStage reaches the last processing node in the system, it starts over. This method is useful for resizing partitions of an input data set that are not equal in size.
- Same: The stage that uses the data set as input performs no repartitioning and takes as input the partitions output by the preceding stage. With this partitioning method, records stay on the same processing node; that is, they are not redistributed. DataStage uses this method when passing data between stages in your job.
- Modulus: Partitioning is based on a key column modulo the number of partitions. The modulus partitioner assigns each record of an input data set to a partition of its output data set as determined by a specified key field in the input data set (see the sketch after this table).
- Hash: Partitioning is based on a function of one or more columns (the hash partitioning keys) in each record. The hash partitioner examines one or more fields of each input record (the hash key fields). Records with the same values for all hash key fields are assigned to the same processing node.
- Range: Divides a data set into approximately equal-sized partitions, each of which contains records with key columns within a specified range. This method is also useful for ensuring that related records are in the same partition.
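To make the Modulus method concrete: a record is assigned to the partition given by its key value modulo the number of partitions. The following hedged sketch, which assumes a hypothetical ORDERS table with a numeric ORDER_ID key and a 4-node configuration, shows the mapping.

```sql
-- Illustration of modulus partitioning on a 4-node configuration:
-- a record with key 27 is assigned to partition MOD(27, 4) = 3.
-- The MYSCHEMA.ORDERS table and ORDER_ID column are hypothetical.
SELECT order_id, MOD(order_id, 4) AS target_partition
FROM myschema.orders;
```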