NLP research group - e-rte-3-ita

e-RTE-3-it

The e-RTE-3-it dataset is an emended, enriched, and manually curated version of the Italian RTE-3 dataset, which is the Italian translation of the English Textual Entailment dataset used in the RTE-3 Challenge.

What is the Italian RTE-3 dataset?

The Italian RTE-3 dataset (RTE3-ITA) is the Italian translation of the English Textual Entailment dataset used in the RTE-3 Challenge.
Like its English counterpart, it is composed of a development set and a test set, each containing 800 text/hypothesis (T/H) pairs. RTE3-ITA has the following characteristics:

all T/H pairs were translated into Italian by a professional translator
all information related to the English T/H pairs (e.g., the length of T and the task) was imported into the Italian dataset
all the Italian T/H pairs were judged for entailment; in 15 cases, the Italian judgment disagreed with the English label.


The e-RTE-3-it Dataset

In the e-RTE-3-it dataset, each text/hypothesis pair, in addition to its 'entailment', 'contradiction', or 'neutrality' label, has been enriched with:

an explanation for the label itself;
the level of confidence with which the annotators could write the explanation;
in cases where the annotators did not agree with the original label, an alternative label along with an explanation for the new label.
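To make the enrichment concrete, the sketch below models one enriched pair as a Python dataclass. It is a hypothetical illustration, not the format of the released files: the class name `EnrichedPair`, the field names, and the use of an integer for confidence are all assumptions, since this page does not specify the actual keys or the confidence scale.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    # The three labels named in the dataset description.
    ENTAILMENT = "entailment"
    CONTRADICTION = "contradiction"
    NEUTRALITY = "neutrality"


@dataclass
class EnrichedPair:
    """One T/H pair with e-RTE-3-it-style enrichments.

    Hypothetical schema: field names and types are assumptions,
    not the keys used in the released dataset files.
    """
    pair_id: str
    text: str
    hypothesis: str
    label: Label
    explanation: str                          # explanation for the label
    confidence: int                           # annotator confidence (scale assumed)
    alt_label: Optional[Label] = None         # set only when annotators disagreed
    alt_explanation: Optional[str] = None     # explanation for alt_label

    def __post_init__(self) -> None:
        # An alternative label and its explanation must appear together.
        if (self.alt_label is None) != (self.alt_explanation is None):
            raise ValueError("alt_label and alt_explanation must be set together")

    @property
    def annotators_disagreed(self) -> bool:
        # True when the annotators proposed a label different from the original.
        return self.alt_label is not None
```

Keeping the alternative label optional mirrors the description: it exists only for pairs where the annotators did not agree with the original label.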

Furthermore, in e-RTE-3-it, label mismatches between RTE-3-it and the original English dataset have been emended, so that the labels of e-RTE-3-it and the original RTE-3 dataset fully coincide.
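One way to verify this alignment is to compare gold labels pair by pair. The following is a minimal sketch assuming both datasets have already been loaded into dictionaries that map a shared pair ID to its label; the function name `label_mismatches` and this input format are hypothetical. Applied to RTE3-ITA, such a check would be expected to report the 15 disagreements mentioned above; applied to e-RTE-3-it, it should report none.

```python
from typing import Dict, List


def label_mismatches(italian: Dict[str, str], english: Dict[str, str]) -> List[str]:
    """Return the IDs of pairs whose Italian label differs from the English one.

    Both arguments map a pair ID to its gold label (assumed input format).
    Pairs present on only one side are also reported, since they cannot be aligned.
    """
    all_ids = set(italian) | set(english)
    # A missing entry yields None, which never equals a real label.
    return sorted(pid for pid in all_ids if italian.get(pid) != english.get(pid))
```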

The e-RTE-3-it dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Download: https://github.com/andreazaninello/e-rte3-it

Contact: Andrea Zaninello (azaninello@fbk.eu)
