Data integration tutorial: Replicate data
Take this tutorial to set up data replication between a source and a target data store with the Data integration use case of the data fabric trial. Your goal is to use Data Replication to integrate the credit score information from the provider's Db2 on Cloud data source into Golden Bank's Event Streams instance by setting up a near-real-time, continuous replication feed with efficient data capture from the source database. Event Streams is a high-throughput message bus built with Apache Kafka. It is optimized for event ingestion into IBM Cloud and event stream distribution between your services and applications. For more information about Event Streams, see the Learn more section.
The story for the tutorial is that Golden Bank needs to adhere to a new regulation that prohibits lending to underqualified loan applicants. As a data engineer at Golden Bank, you need to provide access to the most up-to-date credit scores of loan applicants. These credit scores are sourced from a Db2 on Cloud database owned by an external provider and continuously delivered into Golden Bank's Event Streams hub. The application uses the data in the Event Streams hub to look up credit scores for mortgage applicants and determine loan approval for qualified applicants.
The following animated image provides a quick preview of what you’ll accomplish by the end of the tutorial.
Preview the tutorial
In this tutorial, you will complete these tasks:
- Set up the prerequisites.
- Task 1: Set up Event Streams.
- Task 2: View the credit score data.
- Task 3: Create a connection to your Event Streams instance.
- Task 4: Associate the Data Replication service with your project.
- Task 5: Set up data replication.
- Task 6: Run data replication.
- Task 7: Verify data replication.
- Cleanup (Optional)
Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.
This video provides a visual method to learn the concepts and tasks in this documentation.
Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.
Use the video picture-in-picture
The following animated image shows how to use the video picture-in-picture and table of contents features:
Get help in the community
If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.
Set up your browser windows
For the best experience with this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window so that you can switch easily between the two. Consider arranging the two browser windows side by side to make it easier to follow along.
Set up the prerequisites
Sign up for Cloud Pak for Data as a Service
You must sign up for Cloud Pak for Data as a Service and provision the necessary services for the Data integration use case.
- If you have an existing Cloud Pak for Data as a Service account, then you can get started with this tutorial. If you have a Lite plan account, only one user per account can run this tutorial.
- If you don't have a Cloud Pak for Data as a Service account yet, then sign up for a data fabric trial.
Watch the following video to learn about data fabric in Cloud Pak for Data.
Verify the necessary provisioned services
To preview this task, watch the video beginning at 01:29.
Follow these steps to verify or provision the necessary services:
- In Cloud Pak for Data, verify that you are in the Dallas region. If not, click the region drop-down list, and then select Dallas.
- From the Navigation menu, choose Services > Service instances.
- Use the Product drop-down list to determine whether a Data Replication service instance already exists.
- If you need to create a Data Replication service instance, click Add service.
  - Select Data Replication.
  - Select the Lite plan.
  - Click Create.
- Wait while the Data Replication service is provisioned, which might take a few minutes to complete.
- Repeat these steps to verify or provision the following additional services:
  - watsonx.ai Studio
  - Cloud Object Storage
  - Event Streams (you might be prompted to log in to your IBM Cloud account)
Check your progress
The following image shows the provisioned service instances. You are now ready to create the sample project.
Create the sample project
To preview this task, watch the video beginning at 02:19.
If you already have the sample project for this tutorial, then skip to Task 1. Otherwise, follow these steps:
- Access the Data integration tutorial sample project in the Resource hub.
- Click Create project.
- If prompted to associate the project with a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
- Click Create.
- Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.
- Click the Assets tab to see the connections, connected data asset, and the notebook.
Check your progress
The following image shows the Assets tab in the sample project. You are now ready to start the tutorial.
Task 1: Set up Event Streams
To preview this task, watch the video beginning at 03:05.
As part of the Prerequisites, you provisioned a new Event Streams instance. Now, you need to set up that service instance. In this task, you will:
- Create a topic to store the data replicated from the source data in Db2 on Cloud. The topic is the core of Event Streams flows. Data passes through a topic from producing applications to consuming applications.
- Copy sample code that contains the bootstrap server information necessary to set up data replication.
- Create credentials that you will use to create a connection to the service in the project.
Follow these steps:
- Return to the IBM Cloud console Resources list.
- Expand the Integration section.
- Click the service instance name for your Event Streams instance to view the instance details.
- First, to create the topic, click the Topics page.
  - Click Create topic.
  - For the Topic name, type golden-bank-mortgage.
  - Click Next.
  - In the Partitions section, accept the default value, and click Next.
  - In the Message retention section, accept the default value, and click Create topic.
  - Open a text editor, and paste the topic name golden-bank-mortgage into the text file to use later.
- Next, back on the Topics page, click Connect to this service to retrieve the connection information.
  - Copy the value in the Bootstrap server field. The bootstrap server is required when creating a connection to the Event Streams instance in your project.
  - Paste the bootstrap server value into the same text file to use later.
  - Click the Sample code tab.
  - Copy the value in the Sample configuration properties field. You will use some properties from this snippet to connect securely to the service.
  - Paste the sample code into the same text file to use later.
  - Click the X to close the Connect to this service panel.
- Lastly, to create the credentials, click the Service credentials page.
  - Click New credential.
  - Accept the default name, or change it if you prefer.
  - For the Role, accept the default value of Manager.
  - Expand the Advanced options section.
  - In the Select Service ID field, select Auto Generate.
  - Click Add.
  - Next to the new credentials, click the Copy to clipboard icon.
  - Paste the credentials into the same text file to use later.
Your text file should contain all of the following information:
TOPIC NAME: golden-bank-mortgage
BOOTSTRAP SERVER FIELD
broker-5-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-1-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-2-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-0-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-3-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-4-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093
SAMPLE CODE
bootstrap.servers=broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="token" password="<APIKEY>";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.endpoint.identification.algorithm=HTTPS
CREDENTIALS
{
"api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"apikey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"bootstrap_endpoints": "broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"iam_apikey_description": "Auto-generated for key crn:v1:bluemix:public:messagehub:us-south:a/a53b11fc95fcca4d96484d0de5f3bc3c:6b5a2cb2-74ef-432d-817f-f053873e7ed2:resource-key:96372942-5d26-4c59-8ca4-41ab6766ba91",
"iam_apikey_name": "Service credentials-1",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/a53b11fc95fcca4d96484d0de5f3bc3c::serviceid:ServiceId-4773bed1-f423-43ea-adff-469389dca54c",
"instance_id": "6b5a2cb2-74ef-432d-817f-f053873e7ed2",
"kafka_admin_url": "https://pqny71x0b9vh7nwh.svc11.us-south.eventstreams.cloud.ibm.com",
"kafka_brokers_sasl": [
"broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093"
],
"kafka_http_url": "https://pqny71x0b9vh7nwh.svc11.us-south.eventstreams.cloud.ibm.com",
"password": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"user": "token"
}
Check your progress
The following image shows the Topics page for your Event Streams instance in IBM Cloud. You are now ready to create a connection to the Event Streams instance in your project.
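The sample configuration properties saved in the text file during this task use Java-style key=value lines, while the notebook in Task 7 asks for values under names such as bootstrap_servers, security_protocol, and sasl_mechanism. As a rough illustration only, the following Python sketch parses such a snippet and picks out those values. The function names parse_properties and to_connection_settings and the example.com broker addresses are hypothetical placeholders, not part of the tutorial or of any IBM tooling.

```python
# Hypothetical stand-in for the "Sample configuration properties" snippet.
# The broker host names are placeholders, not real Event Streams brokers.
SAMPLE_PROPERTIES = """\
bootstrap.servers=broker-0.example.com:9093,broker-1.example.com:9093
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="token" password="<APIKEY>";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
"""


def parse_properties(text):
    """Return a dict built from 'key=value' lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, because values such as
        # sasl.jaas.config contain '=' characters themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def to_connection_settings(props):
    """Pick out the values that the Task 7 notebook asks for by name."""
    brokers = [b.strip() for b in props["bootstrap.servers"].split(",") if b.strip()]
    return {
        "bootstrap_servers": brokers,
        "security_protocol": props["security.protocol"],
        "sasl_mechanism": props["sasl.mechanism"],
    }


settings = to_connection_settings(parse_properties(SAMPLE_PROPERTIES))
for name, value in settings.items():
    print(f"{name}: {value}")
```

Splitting the broker list into separate entries also makes it easier to spot a truncated or duplicated address before you paste the value elsewhere.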
Task 2: View the credit score data
To preview this task, watch the video beginning at 05:06.
The sample project includes a connection to the Db2 on Cloud instance where the source data is stored. Follow these steps to view the connection asset and the credit score data:
- Return to your Cloud Pak for Data as a Service browser tab. You will see the Data integration project. If you don't see the project, then follow these steps:
  - From the Navigation menu, choose Projects > View all projects.
  - Click the Data integration project to open it.
- On the Assets tab, click All assets.
- Locate the Data Fabric Trial - Db2 on Cloud - Source connection asset.
- Locate the CREDIT_SCORE connected data asset.
- Click the CREDIT_SCORE asset to see a preview. This data asset maps to the CREDIT_SCORE table in the BANKING schema in the provider's Db2 on Cloud instance. It includes information about the mortgage applicants such as ID, name, address, and credit score. You want to set up data replication for this data asset.
- Click the Data integration project name in the navigation trail to return to the project.
Check your progress
The following image shows the credit score data asset in the sample project. You are now ready to create a connection to the Event Streams service in this project.
Task 3: Create a connection to your Event Streams instance
To preview this task, watch the video beginning at 05:34.
To set up replication, you also need a connection to the new Event Streams instance that you provisioned as part of the Prerequisites. You create this connection by using the information that you gathered in Task 1. Follow these steps to create the connection asset:
- On the Assets tab, click New asset > Connect to a data source.
- Select the Apache Kafka connector, and then click Next.
- For the Name, type Event Streams.
- In the Connection details section, complete the following fields:
  - Kafka server host name: Paste the bootstrap server value from the text file you created in Task 1.
  - Secure connection: Select SASL_SSL.
  - User principal name: Paste the user value from the service credentials in your text file. This value is usually token.
  - Password: Paste the password value from the service credentials in your text file.
- Click Test connection.
- When the test is successful, click Create. If the test is not successful, verify the information you copied and pasted from your text file, and try again. If prompted to confirm creating the connection without setting location and sovereignty, click Create again.
- Click All assets to see the new connection.
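If you would rather not copy the connection values by hand, the following Python sketch shows one possible way to pull them out of the service credentials JSON saved in Task 1. It is an illustration under assumptions: the helper name connection_fields is hypothetical, the credentials shown are shortened placeholders, and the sketch assumes that the bootstrap_endpoints value in the credentials lists the same brokers as the Bootstrap server field.

```python
import json

# Placeholder credentials in the same shape as the service credentials JSON.
# The host names and secrets are not real values.
CREDENTIALS_JSON = """
{
  "apikey": "xxxxxxxx",
  "bootstrap_endpoints": "broker-0.example.com:9093,broker-1.example.com:9093",
  "password": "xxxxxxxx",
  "user": "token"
}
"""


def connection_fields(credentials_text):
    """Map service credentials onto the Connection details fields."""
    creds = json.loads(credentials_text)
    required = ("bootstrap_endpoints", "user", "password")
    missing = [key for key in required if not creds.get(key)]
    if missing:
        raise ValueError("credentials are missing: " + ", ".join(missing))
    return {
        "Kafka server host name": creds["bootstrap_endpoints"],
        "Secure connection": "SASL_SSL",
        "User principal name": creds["user"],
        "Password": creds["password"],
    }


for field, value in connection_fields(CREDENTIALS_JSON).items():
    # Mask the password so that it does not end up in notebook output.
    shown = "********" if field == "Password" else value
    print(f"{field}: {shown}")
```

A parse error from json.loads at this point usually means that the pasted credentials were cut off, which is worth ruling out before you retry Test connection.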
Check your progress
The following image shows the Assets tab in the sample project showing the new Event Streams connection asset. You are now ready to associate the Data Replication service with this project.
Task 4: Associate the Data Replication service with your project
To preview this task, watch the video beginning at 06:32.
To use the Data Replication service in your project, you need to associate your service instance with the project. Follow these steps to associate the Data Replication service with the Data integration project:
- In the Data integration project, click the Manage tab.
- Click the Services and integrations page.
- Click Associate service.
- Check the box next to your Data Replication service instance.
- Click Associate.
- Click Cancel to return to the Services and integrations page.
Check your progress
The following image shows the Services and integrations page with the Data Replication service listed. You are now ready to set up data replication.
Task 5: Set up data replication
To preview this task, watch the video beginning at 06:53.
Now you can create a Data Replication asset to start continuous data replication between the Db2 on Cloud source and the Event Streams target. Follow these steps to set up data replication:
- Click the Assets tab in the project.
- Click New asset > Replicate data.
- For the Name, type CreditScoreReplication.
- Click Source options.
- On the Source options page, select Data Fabric Trial - Db2 on Cloud - Source from the list of connections.
- Click Select data.
- On the Select data page, select the BANKING schema > CREDIT_SCORE table.
- Click Target options.
- On the Target options page, select Event Streams from the list of connections.
- In the Default topic field, paste the topic name created in Task 1, golden-bank-mortgage.
- Accept the default values for the rest of the fields, and click Review.
- Review the summary, and click Create.
Check your progress
The following image shows the CreditScoreReplication screen with replication stopped. You are now ready to run data replication.
Task 6: Run data replication
To preview this task, watch the video beginning at 07:54.
After creating the Data Replication asset, you can run data replication and view information about the replication status. Follow these steps to run data replication:
- On the CreditScoreReplication screen, click the Run icon to start the replication process.
  If this is your first time running a Data Replication asset, you might be prompted to provide an API key. Data replication assets use your personal IBM Cloud API key to execute replication operations securely without disruption. If you want to use a specific API key, then click the Settings icon.
  - If you have an existing API key, click Use existing API key, paste the API key, and click Save.
  - If you don't have an existing API key, click Generate new API key, and then click Generate. Save the API key for future use, and then click Close.
- In the Event logs section, click the Refresh icon to see any new messages.
- After a few minutes, the message Completed initial synchronization for table "BANKING"."CREDIT_SCORE" displays in the Event logs section.
  From this point forward, any changes to the BANKING.CREDIT_SCORE table in the Db2 on Cloud instance will be detected automatically and replicated to the target.
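If you copy the Event logs messages into a text file or a list, a small script can confirm which tables have finished their initial synchronization. The following Python sketch is only an illustration: it assumes the message wording quoted in the step above, the function name synchronized_tables is hypothetical, and the first entry in the sample list is a placeholder rather than a real Data Replication message.

```python
import re

# Matches the completion message quoted in this task and captures
# the schema and table names from the two quoted identifiers.
SYNC_PATTERN = re.compile(
    r'Completed initial synchronization for table "(?P<schema>[^"]+)"\."(?P<table>[^"]+)"'
)


def synchronized_tables(log_lines):
    """Return (schema, table) pairs for every completion message found."""
    tables = []
    for line in log_lines:
        match = SYNC_PATTERN.search(line)
        if match:
            tables.append((match["schema"], match["table"]))
    return tables


event_log = [
    "(any other event log message)",  # placeholder, not a real log entry
    'Completed initial synchronization for table "BANKING"."CREDIT_SCORE"',
]

done = synchronized_tables(event_log)
print(done)
if ("BANKING", "CREDIT_SCORE") in done:
    print("Initial synchronization finished for BANKING.CREDIT_SCORE")
```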
Check your progress
The following image shows the CreditScoreReplication screen with replication running and messages in the Event logs section. You are now ready to monitor replication by watching the status of the replication asset and its events and metrics, and to verify that the data is being replicated.
Task 7: Verify data replication
To preview this task, watch the video beginning at 09:03.
You can use Python code to verify that the credit score data was replicated into Golden Bank's Event Streams hub. The sample project includes a Jupyter notebook containing the sample Python code. Follow these steps to edit and run the code in the notebook:
- Click the Data integration project name in the navigation trail to return to the project.
- Click the Assets tab.
- Click All assets.
- Click the Overflow menu at the end of the row for the Monitor data replication notebook, and choose Edit.
- Run the first code cell to install the kafka-python library.
- Edit the second cell using the information you saved to a text file from Task 1.
  - topic: Paste the topic name. This value is golden-bank-mortgage.
  - bootstrap_servers: Paste the bootstrap server value from your text file, which should look similar to this value:
    broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093, broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093, broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093, broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093, broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093, broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093
  - sasl_plain_username: Paste the user value from the service credentials in the text file. This value is usually token.
  - security_protocol: Paste the security.protocol value from the text file. This value is usually SASL_SSL.
  - sasl_mechanism: Paste the sasl.mechanism value from the text file. This value is usually PLAIN.
  - sasl_plain_password: Paste the password value from the service credentials in the text file.
- After completing all of the values, run the code in the second cell to provide the connection information for your Event Streams instance.
- Run the code in the third cell to consume records from your Event Streams topic.
- Run the code in the fourth cell to print the messages captured into your consumer object.
- Review the output showing the content of the messages delivered by replication into your Event Streams topic. Compare that to the CREDIT_SCORE data asset you viewed in Task 2.
- Click File > Save to save the Jupyter notebook with your stored credentials.
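The notebook's cells are not reproduced in this tutorial. As a hedged sketch of what consuming the topic with the kafka-python library might look like, the following Python code builds the connection settings named in the steps above and reads whatever records are available. The helper names build_consumer_kwargs, decode_value, and read_messages are hypothetical, and the code is not the notebook's actual content. Because the format of the replicated records is not described here, decode_value simply tries JSON and falls back to plain text.

```python
import json


def build_consumer_kwargs(bootstrap_servers, username, password,
                          security_protocol="SASL_SSL", sasl_mechanism="PLAIN"):
    """Assemble keyword arguments for kafka-python's KafkaConsumer."""
    # Split the pasted value into one entry per broker and drop stray spaces.
    servers = [s.strip() for s in bootstrap_servers.split(",") if s.strip()]
    return {
        "bootstrap_servers": servers,
        "security_protocol": security_protocol,
        "sasl_mechanism": sasl_mechanism,
        "sasl_plain_username": username,
        "sasl_plain_password": password,
        "auto_offset_reset": "earliest",  # start from the oldest retained record
        "consumer_timeout_ms": 10000,     # stop iterating after 10 s without records
    }


def decode_value(raw):
    """Decode a record value as JSON when possible, otherwise as text."""
    if raw is None:
        return None
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except ValueError:
        return text


def read_messages(topic, **consumer_kwargs):
    """Read records from the topic until the consumer timeout elapses."""
    from kafka import KafkaConsumer  # provided by the kafka-python package

    consumer = KafkaConsumer(topic, **consumer_kwargs)
    try:
        return [decode_value(record.value) for record in consumer]
    finally:
        consumer.close()


# Example call with the values from your text file (requires network access):
# kwargs = build_consumer_kwargs("<bootstrap servers>", "token", "<password>")
# for message in read_messages("golden-bank-mortgage", **kwargs):
#     print(message)
```

The import of KafkaConsumer sits inside read_messages so that the settings and decoding helpers can be tried out even where kafka-python is not installed.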
Check your progress
The following image shows the Monitor data replication notebook after running the code successfully.
As a data engineer at Golden Bank, you set up continuous access to the most up-to-date credit scores of loan applicants by configuring data replication between the CREDIT_SCORE table in the Db2 on Cloud source database and a topic in Event Streams. If an applicant's credit score changes, Golden Bank's mortgage approvers will have near-real-time access to the change.
Cleanup (Optional)
If you would like to retake the tutorials in the Data integration use case, delete the following artifacts.
Artifact | How to delete |
---|---|
Data Replication and Event Streams service instances | 1. From the Navigation menu, choose Services > Service instances. 2. Click the Action menu next to the service name, and choose Delete. |
Data integration sample project | Delete a project |
Next steps
- Try other tutorials.
- Sign up for another Data fabric use case.
Learn more