Fair is fast, and fast is fair: IBM Slate Foundation models for NLP
More throughput and less bias in Natural Language Processing
Once again, IBM watsonx.ai teamed up with Watson NLP for a major upgrade of the Watson Data Science NLP capabilities. Read “teamed up” as: we grabbed the latest and greatest models from our NLP colleagues and sprinkled them generously across all our data science offerings.
Our goal was to tackle the three witches of NLP: bias, high compute expense and limited adaptability. Let’s find out how we fared…
“But wait, what’s Slate?”, you ask? Slate is a family of encoder-only large language models built by IBM. You can’t use them for generative AI (i.e., they won’t write this blog entry for me), but they are fast and effective for enterprise NLP tasks like sentiment analysis, entity extraction and classification.
Fine-tune Slate models on your data — but try our already fine-tuned models FIRST
Our “base” Slate model is called pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased.
That’s quite a mouthful, so let’s tease the name apart: It’s a Slate model, i.e., an encoder-only model similar to RoBERTa, with 153 million parameters. We distilled knowledge from a larger Slate model into this one to keep it reasonably small, which improves inference times. Finally, it was trained on a multilingual, uncased corpus of data.
You can fine-tune this Slate model on your own data to create custom models for
- classification
- entity extraction
- sentence and document sentiment detection
- targeted sentiment detection
Here’s an example of how to create a fine-tuned classification model:
import watson_nlp
from watson_nlp.blocks.classification.transformer import Transformer
from watson_core.data_model.streams.resolver import DataStreamResolver
training_data_file = "train_data.json"
# Step 1: create datastream from training data
data_stream_resolver = DataStreamResolver(target_stream_type=list, expected_keys={'text': str, 'labels': list})
train_stream = data_stream_resolver.as_data_stream(training_data_file)
# Step 2: load pre-trained Slate model
pretrained_model_resource = watson_nlp.load('pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased')
# Step 3: Fine-tune your model, based on the Slate model
classification_model = Transformer.train(train_stream, pretrained_model_resource)
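Once training completes, you can try the resulting model right away. Here’s a minimal sketch, assuming the fine-tuned transformer block accepts raw text in its run method (if your block expects syntax analysis results instead, pass those in):
# Run the freshly fine-tuned classification model on new text
# (assumption: this block takes raw text directly)
prediction = classification_model.run("I'd like to change my billing address.")
print(prediction)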
Each fine-tuning follows the same three steps:
- Step 1: Prepare a JSON file with the training data. The format differs depending on your fine-tuning use case. For classification, each entry contains the text to classify and its classification labels (plural, as we support more than one label per entry); a sample file is sketched after this list.
- Step 2: Load the “base” Slate model.
- Step 3: Start training.
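To make Step 1 concrete, here is a hypothetical classification training file in the {'text': str, 'labels': list} shape that the data stream resolver above expects. The texts and labels are made up, and the exact file layout may differ from what your version of the library expects, so treat this as a sketch:
import json
# Hypothetical training examples: one text and a list of labels per entry
train_examples = [
    {"text": "I want to cancel my subscription.", "labels": ["cancellation"]},
    {"text": "How do I update my credit card?", "labels": ["billing"]},
    {"text": "The app crashes when I open my invoice.", "labels": ["billing", "technical_issue"]},
]
# Write the examples to the file referenced in the training snippet above
with open("train_data.json", "w") as f:
    json.dump(train_examples, f)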
For all fine-tuning use cases:
- Expect to bring at least 1,000 examples. This is significantly less data than prior approaches required, especially for models that work across multiple languages. Still, it’s worth checking whether a simpler approach works just as well: training an SVM ensemble can yield good classification results, and it is faster to train and apply to new data. Dictionaries and regular expressions don’t require any training data for entity detection and are a good alternative for detecting “simpler”, more regular entities.
- Use a GPU environment for training. In watsonx.ai and Cloud Pak for Data as a Service, those are the GPU notebook environments. They have the Watson NLP library pre-installed, just like the NLP Runtime 23.1 and NLP Runtime 22.2 environments. In Cloud Pak for Data, it’s the GPU environment that you can install as an add-on.
Fine-tuned NLP models available out-of-the-box
Before you embark on your fine-tuning exercise, check out the Slate models that we have already fine-tuned for several use cases:
- entity extraction: entity-mentions_transformer-workflow_multilingual_slate.153m.distilled
- relationship detection: relations_transformer-workflow_multilingual_slate.153m.distilled
- sentence and document sentiment detection: sentiment-aggregated_transformer-workflow_multilingual_slate.153m.distilled
- targeted sentiment detection: targets-sentiment_transformer-workflow_multilingual_slate.153m.distilled
To make using them even easier, these models are available as workflows. They include all necessary preprocessing as part of the model. You just invoke the run method with your input text, like so:
import watson_nlp
# Load Target Sentiment model for English
targets_sentiment_model = watson_nlp.load('targets-sentiment_transformer-workflow_multilingual_slate.153m.distilled')
# Run the target sentiment model on the input text
targets_sentiments = targets_sentiment_model.run('The rooms are nice, but the bed was not very comfortable.')
# Print the targets with the associated sentiment
print(targets_sentiments)
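The run method returns a prediction object rather than plain text. If you want to post-process the targets instead of just printing them, the prediction objects can usually be converted to standard Python structures; a small sketch, assuming the common to_dict() helper on Watson NLP prediction objects:
# Convert the prediction object into a plain Python dict for post-processing
# (assumption: the prediction object exposes to_dict())
result = targets_sentiments.to_dict()
print(result.keys())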
In our previous NLP environment (NLP Runtime 22.2), you had to know that targeted sentiment requires the output of syntax analysis, and add that intermediate step yourself:
import watson_nlp
# Load Syntax and the Target Sentiment model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
targets_sentiment_model = watson_nlp.load('targets-sentiment_sequence-bert_multi_stock')
# Run the syntax model on the input text
syntax_result = syntax_model.run('The rooms are nice, but the bed was not very comfortable.')
# Run the targets sentiment model on the syntax results
targets_sentiments = targets_sentiment_model.run(syntax_result)
# Print the targets with the associated sentiment
print(targets_sentiments)
If you’re using NLP in Runtime 22.2 today, consider upgrading to Runtime 23.1 — you can remove the intermediate step from your code, and you’ll get an improved targeted sentiment model!
And if you’re missing the call to watson_nlp.download() in the above code snippet — that step is no longer necessary in watsonx.ai and Cloud Pak for Data as a Service! We now “pre-load” the NLP models in our cloud environments for faster load times. You can keep the download() call in your Runtime 22.2 notebooks, but remove it when switching to Runtime 23.1 — it’s no longer supported there.
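For reference, in Runtime 22.2 the download typically happened right where the model was loaded. A sketch of that older pattern (your notebooks may phrase it slightly differently):
# Runtime 22.2 style: download the model explicitly before loading it
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))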
Reduced bias through careful training data selection and filtering
All large language models are trained on incredible amounts of data. Before trusting these models to work on your enterprise data, wouldn’t it be nice if:
- Your model provider carefully vetted and balanced all data sources that went into the model, and ensured that each source’s intellectual property is respected according to its license?
- Your model provider removed hate, abuse and profanity (aka “HAP”) from the data before it went into training the model?
If you say “yes” to both, read on. Otherwise, I have a picture for you:
The notebook snippet above runs sentiment analysis on the same three sentences with two models. One of them is our “Slate” sentiment model. I hope you can spot the difference — it’s our training data selection and HAP filtering at work!
Foundation models that do not require GPUs
Slate models are fairly powerful, yet small enough to run effectively in runtime environments as small as 1 CPU and 8 GB of RAM — whether you use them for batch inference through Jupyter notebooks in watsonx.ai and Watson Studio, or for online inference through Python functions in Watson Machine Learning on Cloud Pak for Data.
To further improve their runtime characteristics, we offer our fine-tuned Slate foundation models in two variants:
- GPU-optimized: These models run in both GPU and CPU environments, with a significantly higher throughput in GPU environments (up to an order of magnitude).
- CPU-optimized: These models run only on CPUs. They need more memory than the GPU-optimized variant, but their throughput on a CPU is higher than running the GPU-optimized variant on a CPU. These models have a -cpu suffix, e.g., targets-sentiment_transformer-workflow_multilingual_slate.153m.distilled-cpu
So, if you have a GPU available, use our GPU-optimized models. In all other cases, use the CPU-optimized models (when available). Here’s an easy switch you can include in your notebooks to load the right model, based on the environment:
import os
import json
import watson_nlp
# Detect whether the runtime environment provides a GPU
gpu_available = False
try:
    hw_spec = json.loads(os.environ['RUNTIME_HARDWARE_SPEC'])
    if 'num_gpu' in hw_spec:
        gpu_available = True
except (KeyError, ValueError):
    pass
# Load the GPU-optimized or the CPU-optimized model variant accordingly
entity_model_name = 'entity-mentions_transformer-workflow_multilingual_slate.153m.distilled'
if gpu_available:
    entity_model = watson_nlp.load(entity_model_name)
else:
    entity_model = watson_nlp.load(entity_model_name + '-cpu')
This optimization is not limited to our fine-tuned models: if you fine-tune a Slate model yourself, you can save the resulting model as either GPU-optimized or CPU-optimized — or save it twice, once per variant. Just specify the cpu_format parameter when saving the model:
from ibm_watson_studio_lib import access_project_or_space
# Connect to the current project (or deployment space)
wslib = access_project_or_space({...})
# Save the fine-tuned model as a CPU-optimized asset
wslib.save_data('<model name>',
                data=my_finetuned_model.as_bytes(cpu_format=True),
                overwrite=True)
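And if you want to keep both variants, you can save the same model twice. A sketch, assuming that cpu_format=False produces the GPU-optimized serialization and using my_classifier as a purely illustrative asset name:
# Hypothetical example: store one GPU-optimized and one CPU-optimized copy
wslib.save_data('my_classifier',
                data=my_finetuned_model.as_bytes(cpu_format=False),
                overwrite=True)
wslib.save_data('my_classifier-cpu',
                data=my_finetuned_model.as_bytes(cpu_format=True),
                overwrite=True)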
Let Watson NLP de-witch your text analytics challenges
So, did we achieve our goals? Let’s see…
- Bias: While there’s always more to be done, the new release of our NLP capabilities improves trust and transparency in our models. And further good news: we’ll bring the same level of trust to additional models, step by step.
- High compute expense: We tackled this from two angles — providing out-of-the-box fine-tuned models for specific use cases, so you don’t have to spend resources on fine-tuning yourself, and making CPU-optimized models available that reduce the need for GPUs.
- Limited adaptability: In NLP Runtime 22.2, we only supported custom classification models. We now expanded this to custom entity, sentiment and targeted sentiment models, all based on the same Slate model.
I think I’ll leave the verdict to you — try out our new NLP capabilities in watsonx.ai and let us know what we did well and what we can improve!