Detecting entities with a custom transformer model
If you don't have a fixed set of terms, or you cannot express the entities that you want to detect as regular expressions, you can build a custom transformer model. The model is based on the pretrained Slate IBM Foundation model.
Because the model is pretrained, you can build multilingual models; you don't need a separate model for each language.
You need sufficient training data to achieve high quality (2,000 to 5,000 training examples per entity type).
Training transformer models is CPU and memory intensive. The predefined environments are not large enough to complete the training. Create a custom notebook environment with more CPU and memory, and use it to run your notebook. If you have GPUs available, it's highly recommended to use them. See Creating your own environment template.
Input data format
The training data is represented as an array of JSON objects. Each JSON object represents one training instance and must have a `text` and a `mentions` field. The `text` field is the training sentence, and `mentions` is an array of JSON objects with the text, type, and location of each mention:
```
[
  {
    "text": str,
    "mentions": [{
      "location": {
        "begin": int,
        "end": int
      },
      "text": str,
      "type": str
    }, ...]
  }, ...
]
```
Example:
```json
[
  {
    "id": 38863234,
    "text": "I'm moving to Colorado in a couple months.",
    "mentions": [{
      "text": "Colorado",
      "type": "Location",
      "location": {
        "begin": 14,
        "end": 22
      }
    },
    {
      "text": "couple months",
      "type": "Duration",
      "location": {
        "begin": 28,
        "end": 41
      }
    }]
  }
]
```
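In this format, `begin` and `end` are character offsets into the sentence, with `end` exclusive, so each mention's `text` should equal the corresponding slice of the sentence. A small standard-library sanity check along these lines (the helper name is ours, not part of watson_nlp) can catch misaligned offsets before training:

```python
# Verify that each mention's location matches its text: begin/end are
# character offsets into the sentence, with end exclusive.
record = {
    "text": "I'm moving to Colorado in a couple months.",
    "mentions": [
        {"text": "Colorado", "type": "Location",
         "location": {"begin": 14, "end": 22}},
        {"text": "couple months", "type": "Duration",
         "location": {"begin": 28, "end": 41}},
    ],
}

def mentions_consistent(rec):
    """Return True if every mention's offsets slice out its own text."""
    return all(
        rec["text"][m["location"]["begin"]:m["location"]["end"]] == m["text"]
        for m in rec["mentions"]
    )
```

Running `mentions_consistent` over every record in your training files is a cheap way to spot off-by-one offset errors.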
Training your model
The transformer algorithm uses a pretrained Slate model.
For a list of the available Slate models, see the following table:
| Model | Description |
|---|---|
| pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased | Generic, multi-purpose model |
| pretrained-model_slate.125m.finance_many_transformer_en_cased | Model pretrained on finance content |
| pretrained-model_slate.110m.cybersecurity_many_transformer_en_uncased | Model pretrained on cybersecurity content |
| pretrained-model_slate.125m.biomedical_many_transformer_en_cased | Model pretrained on biomedical content |
To get the options available for configuring Transformer training, enter:
```python
help(watson_nlp.workflows.entity_mentions.transformer.Transformer.train)
```
Sample code
```python
import watson_nlp
from watson_nlp.toolkit.entity_mentions_utils.train_util import prepare_stream_of_train_records_from_JSON_collection

# Load the syntax models for all languages to be supported
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
syntax_models = [syntax_model]

# Load the pretrained Slate model
pretrained_model_resource = watson_nlp.load('<pretrained Slate model>')

# Prepare the train and dev data.
# entity_train_data is a directory with one or more JSON files in the
# input format specified above.
train_data_stream = prepare_stream_of_train_records_from_JSON_collection('entity_train_data')
dev_data_stream = prepare_stream_of_train_records_from_JSON_collection('entity_train_data')

# Train a transformer workflow model
trained_workflow = watson_nlp.workflows.entity_mentions.transformer.Transformer.train(
    train_data_stream=train_data_stream,
    dev_data_stream=dev_data_stream,
    syntax_models=syntax_models,
    template_resource=pretrained_model_resource,
    num_train_epochs=3,
)
```
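The `entity_train_data` directory that the stream preparation function reads can be created with the standard `json` module. A minimal sketch (the directory and file names here are just illustrations):

```python
import json
import os

# Write training records, in the input format described above, into the
# directory that prepare_stream_of_train_records_from_JSON_collection reads.
records = [
    {
        "text": "I'm moving to Colorado in a couple months.",
        "mentions": [
            {"text": "Colorado", "type": "Location",
             "location": {"begin": 14, "end": 22}},
        ],
    },
]

os.makedirs('entity_train_data', exist_ok=True)
with open(os.path.join('entity_train_data', 'train.json'), 'w') as f:
    json.dump(records, f)
```

Each file in the directory must contain an array of training instances; you can split a large data set across several files.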
Applying the model on new data
Apply the trained transformer workflow model on new data by using the `run()` method, as you would with any of the existing pre-trained blocks.
Code sample
```python
trained_workflow.run('Bruce is at Times Square')
```
Storing and loading the model
The custom transformer model can be stored like any other model, as described in Saving and loading custom models, by using `ibm_watson_studio_lib`.
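One way to store a workflow that was saved to a local folder is to zip the folder and upload the archive as a project asset. The folder name, asset name, and the `wslib.save_data` call below are assumptions for illustration; only the zipping itself, which uses the standard library, is shown runnable:

```python
import shutil

def zip_model_folder(model_folder, zip_base):
    # shutil.make_archive returns the path of the created .zip archive
    return shutil.make_archive(zip_base, 'zip', model_folder)

# Hypothetical usage inside a notebook, where wslib comes from the
# inserted project token (names are assumptions, not fixed API values):
# zip_path = zip_model_folder('trained_workflow_folder', 'trained_workflow')
# with open(zip_path, 'rb') as f:
#     wslib.save_data('trained_workflow', f.read())
```

Storing the model as a zipped asset matches the download-and-extract steps in the loading instructions below.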
To load the custom transformer model, extra steps are required:

1. Ensure that you have an access token on the Access control page on the Manage tab of your project. Only project admins can create access tokens. The access token can have Viewer or Editor access permissions. Only editors can inject the token into a notebook.

2. Add the project token to the notebook by clicking More > Insert project token from the notebook action bar, and then run the cell. Running the inserted hidden code cell creates a `wslib` object that you can use with functions in the `ibm-watson-studio-lib` library. For information on the available `ibm-watson-studio-lib` functions, see Using ibm-watson-studio-lib for Python.

3. Download and extract the model to your local runtime environment:

   ```python
   import zipfile

   model_zip = 'trained_workflow_file'
   model_folder = 'trained_workflow_folder'
   wslib.download_file('trained_workflow', file_name=model_zip)
   with zipfile.ZipFile(model_zip, 'r') as zip_ref:
       zip_ref.extractall(model_folder)
   ```

4. Load the model from the extracted folder:

   ```python
   trained_workflow = watson_nlp.load(model_folder)
   ```
Parent topic: Creating your own models