Lexical Resources and Corpora
Lexical Resources
MultiWordNet - A multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet
WordNet Domains - A lexical resource created by augmenting WordNet with domain labels; it includes WordNet-Affect
SentiWords - A high-coverage resource containing roughly 155,000 words associated with a sentiment score
MapNet - A FrameNet-to-WordNet mapping
QALL-ME Ontology - A domain-specific ontology for question answering in the domain of tourism
Sensicon - A sensorial lexicon that associates English words with senses
LICO - A lexicon for Italian discourse connectives
Corpora
CORPS - A corpus of political speeches tagged with specific audience reactions, such as applause or laughter
I-CAB - An annotated corpus consisting of 525 news stories taken from a local newspaper
Evalita NER2011 Dataset - The Dataset of the Evalita 2011 Named Entity Recognition Task
CRIPCO - A corpus of Italian news stories annotated with information about person cross-document coreference
SWiiT - Italian Wikipedia automatically annotated with entity mentions
MultiSemCor - An English/Italian parallel corpus
T-PAS - Typed Predicate Argument Structures for Italian
Causal-TimeBank - The TimeBank corpus taken from the TempEval-3 task, annotated with causal information
QALL-ME Benchmark - Annotated spoken requests in the tourism domain (Italian, Spanish, English and German)
Textual Entailment Specialized Data Sets - RTE-5 pairs annotated with linguistic phenomena and monothematic pairs
Wikisents for FrameNet - Wikipedia sentences with frame labels in English and Italian
RTE-3-Ita - Italian version of the English RTE-3 dataset
Fact-Ita Bank - A subpart of Ita-TimeBank annotated with factuality information
ACEtoWiki - An extension of the English ACE 2005 Corpus with ground-truth links to Wikipedia
Textual Entailment Graph Dataset - A gold standard dataset of entailment graphs for English and Italian
Pilot Task of EVENTI @ Evalita 2014 - Test data set of the EVENTI Pilot Task on "Temporal Processing of Historical Texts"
SemEval2015 TimeLine Dataset - Dataset of the SemEval-2015 Task "TimeLine: Cross-Document Event Ordering"
NewsReader MEANTIME Corpus - A semantically annotated corpus of 480 news articles in 4 languages
NE-annotated-tweets-AL - Tweets annotated with Named Entities following the NEEL-IT guidelines
WItaC - NewsReader Wikinews Italian Corpus - The Italian section of the NewsReader MEANTIME corpus
Contrast-Ita Bank - A corpus annotated with discourse contrast relations in Italian
KRAUTS - A Temporally Annotated News Corpus in German
DPD - A manually annotated Italian corpus of diary entries written by diabetic patients
COSMIANU - Corpus Of Social Media Italian Annotated with Nominal Utterances - A manually annotated corpus of around 66,000 tokens
e-RTE-3-it - An emended, enriched, and manually curated version of the Italian RTE-3 dataset