Batch deployment input details for SPSS models
Follow these rules when you are specifying input details for batch deployments of SPSS models.
Data type summary table:
Data | Description |
---|---|
Type | data references, inline |
File formats | CSV |
Data sources
Input or output data references:
- Local or managed assets from the space
- Connected (remote) assets from these sources:
Notes:
- For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure Access key and Secret key, also known as HMAC credentials.
- For SPSS deployments, these data sources are not compliant with Federal Information Processing Standard (FIPS):
  - Cloud Object Storage
  - Cloud Object Storage (infrastructure)
  - Storage volumes
- Table names that are provided in input and output data references are ignored. Table names that are referred to in the SPSS model are used during the batch deployment.
- Use SQL Pushback to generate SQL statements for IBM SPSS Modeler operations that can be “pushed back” to, or run in, the database to improve performance. SQL Pushback is supported only by:
  - Db2
  - SQL Server
  - Netezza Performance Server
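The notes above can be turned into a small pre-flight check. The following is a minimal sketch, not part of any IBM client library: the function names, the connection-type labels used as plain strings, and the credential keys `access_key` and `secret_key` are all assumptions made for illustration.

```python
# Connection-type labels copied from the notes above; treating them as
# plain strings is an assumption of this sketch.
HMAC_REQUIRED = {"Cloud Object Storage", "Cloud Object Storage (infrastructure)"}
FIPS_NONCOMPLIANT = HMAC_REQUIRED | {"Storage volumes"}
SQL_PUSHBACK_SUPPORTED = {"Db2", "SQL Server", "Netezza Performance Server"}


def review_connection(conn_type, credentials=None, fips_required=False):
    """Return a list of warnings for one connection (hypothetical helper)."""
    warnings = []
    credentials = credentials or {}
    if conn_type in HMAC_REQUIRED:
        # "access_key" and "secret_key" are assumed key names for HMAC credentials.
        missing = [k for k in ("access_key", "secret_key") if not credentials.get(k)]
        if missing:
            warnings.append(f"{conn_type}: missing HMAC credentials: {', '.join(missing)}")
    if fips_required and conn_type in FIPS_NONCOMPLIANT:
        warnings.append(f"{conn_type}: not FIPS compliant for SPSS deployments")
    return warnings


def supports_sql_pushback(conn_type):
    """True if the notes above list this database for SQL Pushback."""
    return conn_type in SQL_PUSHBACK_SUPPORTED
```

Calling `review_connection` for each connection before you create the job surfaces missing credentials early, instead of at job run time.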
Using connected data for a batch deployment
An SPSS Modeler flow can have a number of import and export nodes for data. If the nodes use database connections, they must be configured with the table names in the data sources and targets. These table names are used later for batch jobs. Use Data Asset nodes for importing data and Data Asset Export nodes for exporting data. When you are configuring the nodes, choose the table name from Connections; don't choose a data asset in your project. Set the nodes and table names before you save and deploy the model to Watson Machine Learning.
When you deploy the model to a deployment space, check that the nodes connect to a supported database in the deployment space. In a batch deployment of the model, the connection details are taken from the input and output data references, but the input and output table names are taken from the SPSS Modeler model. The input and output table names that are provided in the connected data references are ignored.
For batch deployment of an SPSS model that uses a Cloud Object Storage connection, make sure that the SPSS model has only one input data asset node and one output data asset node.
Supported combinations of input and output sources
You must specify compatible data sources and targets for the batch job input and the output. If you specify incompatible data sources and targets, you get an error when you try to run the batch job.
These combinations are supported for batch jobs:
SPSS model input/output | Batch deployment job input | Batch deployment job output |
---|---|---|
File | Local, managed, or referenced data asset or connection asset (file) | Remote data asset or connection asset (file) or name |
Database | Remote data asset or connection asset (database) | Remote data asset or connection asset (database) |
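As a rough illustration, the table above can be encoded as a lookup and checked before a job is submitted. This is a sketch only: the short labels such as `local_asset` and `connection_asset_file` are hypothetical names chosen for this example and are not values used by the service.

```python
# Supported job input/output kinds per SPSS model node kind, transcribed
# from the table above. All labels are hypothetical shorthand.
SUPPORTED = {
    "file": {
        "input": {"local_asset", "managed_asset", "referenced_asset",
                  "connection_asset_file"},
        "output": {"remote_asset_file", "connection_asset_file", "name"},
    },
    "database": {
        "input": {"remote_asset_database", "connection_asset_database"},
        "output": {"remote_asset_database", "connection_asset_database"},
    },
}


def is_supported(model_io, job_input, job_output):
    """Return True if the combination appears in the table above."""
    row = SUPPORTED.get(model_io)
    if row is None:
        return False
    return job_input in row["input"] and job_output in row["output"]
```

For example, `is_supported("file", "local_asset", "name")` is true, while a file-based model paired with a database output is rejected.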
Specifying multiple inputs
If you are specifying multiple inputs for an SPSS model deployment with no schema, specify an ID for each element in input_data_references.
For more information, see Using multiple data sources for an SPSS job.
In this example, when you create the job, provide three input entries with the IDs sample_db2_conn, sample_teradata_conn, and sample_googlequery_conn, and select the required connected data for each input.
{
  "deployment": {
    "href": "/v4/deployments/<deploymentID>"
  },
  "scoring": {
    "input_data_references": [{
        "id": "sample_db2_conn",
        "name": "DB2 connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_teradata_conn",
        "name": "Teradata connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_googlequery_conn",
        "name": "Google bigquery connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      }],
    "output_data_references": {
      "id": "sample_db2_conn",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }
  }
}
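When a job has several inputs, writing each entry by hand is repetitive. The sketch below builds the same kind of input list from a mapping of IDs to assets. The helper name `build_input_references` and the asset and space IDs in the usage example are hypothetical; only the field names and the href pattern come from the example above.

```python
def build_input_references(assets, space_id):
    """Build input_data_references entries (hypothetical helper).

    assets maps each input ID to a (display name, asset ID) pair.
    Because the IDs are dictionary keys, each ID appears only once.
    """
    refs = []
    for ref_id, (name, asset_id) in assets.items():
        refs.append({
            "id": ref_id,
            "name": name,
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": f"/v2/assets/{asset_id}?space_id={space_id}"
            },
        })
    return refs


# Usage with made-up asset and space IDs:
inputs = build_input_references(
    {
        "sample_db2_conn": ("DB2 connection", "asset-1"),
        "sample_teradata_conn": ("Teradata connection", "asset-2"),
    },
    space_id="space-1",
)
```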
Specifying data references programmatically
If you are specifying input and output data references programmatically:
- Data source reference type depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- SPSS jobs support multiple data source inputs and a single output. If the schema is not in the metadata for the model when you saved it, you must enter id manually and select a data asset for each connection. If the schema is provided in the metadata for the model, id names are populated automatically by using metadata. You select the data asset for the corresponding ids in Watson Studio. For more information, see Using multiple data sources for an SPSS job.
- To create a local or managed asset as an output data reference, the name field must be specified for output_data_reference so that a data asset is created with the specified name. You cannot specify an href that refers to an existing local data asset.
  Connected data assets that refer to supported databases can be created in the output_data_references only when the input_data_references also refers to one of these sources.
- If you are creating a job by using the Python client, you must provide the connection name that is referred to in the data nodes of the SPSS model in the id field, and the data asset href in location.href for the input and output data references of the deployment job payload. For example, you can construct the job payload like this:

    job_payload_ref = {
        client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
            "id": "DB2Connection",
            "name": "drug_ref_input1",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href1>
            }
        }, {
            "id": "Db2 WarehouseConn",
            "name": "drug_ref_input2",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href2>
            }
        }],
        client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <output_asset_href>
            }
        }
    }
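Some of the rules in this list can be checked locally before a job is created. The following is a minimal sketch that works on plain dictionaries; `validate_references` is a hypothetical helper and not part of the Python client. Requiring the input IDs to be unique is an assumption of this sketch, and the check cannot tell whether an href points to an existing local data asset.

```python
def validate_references(input_refs, output_ref):
    """Return a list of problems found in the data references (hypothetical)."""
    problems = []

    # Every input needs an id that matches a connection name in the model.
    ids = [ref.get("id") for ref in input_refs]
    present = [i for i in ids if i]
    if len(present) != len(ids):
        problems.append("every input data reference needs an id")
    # Assumption of this sketch: ids must not repeat.
    if len(set(present)) != len(present):
        problems.append("input ids must be unique")

    # SPSS jobs support a single output, so expect one object, not a list.
    if not isinstance(output_ref, dict):
        problems.append("exactly one output data reference is expected")
    else:
        has_name = bool(output_ref.get("name"))
        has_href = bool((output_ref.get("location") or {}).get("href"))
        if not (has_name or has_href):
            problems.append("output needs a name (new asset) or a location href")
    return problems
```

An empty list means that none of these checks failed; it does not guarantee that the service accepts the payload.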