COSMIANU
COSMIANU is an Italian corpus of social media texts manually annotated with different types of Nominal Utterances (NUs).
In particular, COSMIANU consists of semi-synchronous forms of computer-mediated communication, i.e. blogs, forums, newsgroups, and social networks (66,013 tokens in total), taken from the Web2Corpus IT, a balanced corpus of over one million words. These texts consist of discussions between users on a wide range of topics (from politics to popular singers). As a result, users in most cases interact with each other, creating a dialogic environment rich in verbal crossfire and quotations.
The NUs appearing in the corpus have been annotated and further marked with the following attributes:
Verbal coordinate
Non-verbal coordinate
Verbal subordinate
Ellipsis
Metadata
Distribution License
COSMIANU is licensed under a Creative Commons Attribution 4.0 International License.
If you use COSMIANU, please cite the following paper:
Gloria Comandini, Manuela Speranza, and Bernardo Magnini. Effective Communication without Verbs? Sure! Identification of Nominal Utterances in Italian Social Media Texts. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, 10-12 December 2018.
To obtain the corpus, please fill in the request form below with your details (they will be stored in a database at FBK): Request form for download