COSMIANU is an Italian corpus of social media texts annotated manually with different types of Nominal Utterances (NUs).

In particular, COSMIANU consists of semi-synchronous forms of computer-mediated communication, i.e. blogs, forums, newsgroups, and social networks (66,013 tokens in total), taken from the Web2Corpus IT, a balanced corpus of over one million words. These texts consist of discussions between users across a wide range of topics (from politics to popular singers). In most cases, users therefore interact with each other, creating a dialogic environment rich in verbal crossfire and quotes.

Nominal utterances (NUs) appearing in the corpus have been annotated and further marked with the following attributes:

  • Verbal coordinate

  • Non-verbal coordinate

  • Verbal subordinate

  • Ellipsis

  • Metadata

Distribution License

COSMIANU is licensed under a Creative Commons Attribution 4.0 International License.

If you use COSMIANU, please cite the following paper:

To obtain the corpus, please fill in the request form below with your details (they will be stored in a database at FBK): Request form for download