# Tibert
Tibert is a `transformers`-compatible reproduction of the model from the paper *End-to-end Neural Coreference Resolution* (3), with several modifications. Among these:
- Usage of BERT (or any BERT variant) as an encoder, as in *BERT for Coreference Resolution: Baselines and Analysis* (2)
- Support for singletons, as in *Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues* (4)
- Hierarchical merging, as in *Coreference in Long Documents using Hierarchical Entity Merging* (1)
It can be installed with `pip install tibert`.
Here is an example of using the simple prediction interface:
```python
from tibert import BertForCoreferenceResolution, predict_coref_simple
from tibert.utils import pprint_coreference_document
from transformers import BertTokenizerFast

model = BertForCoreferenceResolution.from_pretrained(
    "compnet-renard/bert-base-cased-literary-coref"
)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

annotated_doc = predict_coref_simple(
    "Sli did not want the earpods. He didn't like them.", model, tokenizer
)

pprint_coreference_document(annotated_doc)
```
This results in:

```
>>> (0 Sli ) did not want the earpods. (0 He ) didn't like them.
```
A more advanced prediction interface is available:
```python
from transformers import BertTokenizerFast
from tibert import predict_coref, BertForCoreferenceResolution
from tibert.utils import pprint_coreference_document

model = BertForCoreferenceResolution.from_pretrained(
    "compnet-renard/bert-base-cased-literary-coref"
)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

documents = [
    "Sli did not want the earpods. He didn't like them.",
    "Princess Liana felt sad, because Zarth Arn was gone. The princess went to sleep.",
]

annotated_docs = predict_coref(documents, model, tokenizer, batch_size=2)

for doc in annotated_docs:
    pprint_coreference_document(doc)
```
This results in:

```
>>> (0 Sli ) did not want the earpods . (0 He ) didn't like them .
>>> (0 Princess Liana ) felt sad , because (1 Zarth Arn ) was gone . (0 The princess ) went to sleep .
```
The predicted coreference chains can be accessed using the `.coref_chains` attribute:
```python
annotated_doc = predict_coref_simple(
    "Princess Liana felt sad, because Zarth Arn was gone. The princess went to sleep.",
    model,
    tokenizer,
)
print(annotated_doc.coref_chains)
```

```
>>> [[Mention(tokens=['The', 'princess'], start_idx=11, end_idx=13), Mention(tokens=['Princess', 'Liana'], start_idx=0, end_idx=2)], [Mention(tokens=['Zarth', 'Arn'], start_idx=6, end_idx=8)]]
```
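Each chain is a list of `Mention` objects. As a minimal sketch, relying only on the `tokens` attribute visible in the output above, the chains can be printed back as plain-text spans:

```python
# Sketch: print each predicted chain as a list of plain-text mention spans.
# Relies only on the `tokens` attribute shown in the output above.
for i, chain in enumerate(annotated_doc.coref_chains):
    print(f"chain {i}:", [" ".join(mention.tokens) for mention in chain])
```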
## Hierarchical Merging

Hierarchical merging reduces RAM usage and computation when performing inference on long documents. To use it, the user provides the text cut into chunks. The model predicts coreference for each chunk separately, so the whole document never has to be loaded into memory at once. Hierarchical merging then merges the chunk-level predictions, which allows scaling to arbitrarily long documents. See (1) for more details. Hierarchical merging can be used as follows:
```python
from tibert import BertForCoreferenceResolution, predict_coref
from tibert.utils import pprint_coreference_document
from transformers import BertTokenizerFast

model = BertForCoreferenceResolution.from_pretrained(
    "compnet-renard/bert-base-cased-literary-coref"
)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

chunk1 = "Princess Liana felt sad, because Zarth Arn was gone."
chunk2 = "She went to sleep."

annotated_doc = predict_coref(
    [chunk1, chunk2], model, tokenizer, hierarchical_merging=True
)

pprint_coreference_document(annotated_doc)
```
This results in:
```
>>> (1 Princess Liana ) felt sad , because (0 Zarth Arn ) was gone . (1 She ) went to sleep .
```
Even though the mentions *Princess Liana* and *She* are not in the same chunk, hierarchical merging still resolves this case correctly.
## Training a model
Aside from the `tibert.train.train_coref_model` function, it is possible to train a model from the command line. Training a model requires installing the `sacred` library. Here is the most basic example:
```sh
python -m tibert.run_train with\
    dataset_path=/path/to/litbank/repository\
    out_model_dir=/path/to/output/model/directory
```
The following parameters can be set (taken from the config function in `./tibert/run_train.py`):
| Parameter | Default Value |
|---|---|
| `batch_size` | `1` |
| `epochs_nb` | `30` |
| `dataset_name` | `"litbank"` |
| `dataset_path` | `"~/litbank"` |
| `mentions_per_tokens` | `0.4` |
| `antecedents_nb` | `350` |
| `max_span_size` | `10` |
| `mention_scorer_hidden_size` | `3000` |
| `sents_per_documents_train` | `11` |
| `mention_loss_coeff` | `0.1` |
| `bert_lr` | `1e-5` |
| `task_lr` | `2e-4` |
| `dropout` | `0.3` |
| `segment_size` | `128` |
| `encoder` | `"bert-base-cased"` |
| `out_model_dir` | `"~/tibert/model"` |
| `checkpoint` | `None` |
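Any of these parameters can be overridden on the command line with sacred's `key=value` syntax, as in the basic example above. The values below are purely illustrative:

```sh
# Illustrative parameter overrides using sacred's `with key=value` syntax;
# the values chosen here are examples, not recommendations.
python -m tibert.run_train with\
    dataset_path=/path/to/litbank/repository\
    out_model_dir=/path/to/output/model/directory\
    batch_size=2\
    epochs_nb=10
```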
One can monitor training metrics by adding run observers using command-line flags; see the sacred documentation for more details.
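For example, assuming a standard sacred setup, a `FileStorageObserver` can be attached with sacred's `-F` flag so that run information is written to a directory of your choice (the path below is arbitrary):

```sh
# Sketch: attach sacred's FileStorageObserver via the -F flag;
# /path/to/runs is an arbitrary example directory for run metrics.
python -m tibert.run_train with\
    dataset_path=/path/to/litbank/repository\
    out_model_dir=/path/to/output/model/directory\
    -F /path/to/runs
```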
## References
1. Gupta, T., Hatzel, H. O., and Biemann, C. (2024). Coreference in Long Documents using Hierarchical Entity Merging.
2. Joshi, M., Levy, O., Zettlemoyer, L., and Weld, D. (2019). BERT for Coreference Resolution: Baselines and Analysis.
3. Lee, K., He, L., Lewis, M., and Zettlemoyer, L. (2017). End-to-end Neural Coreference Resolution.
4. Xu, L. and Choi, J. D. (2021). Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues.