Frequency norms from social media
The FRACSS model
The SICK dataset
SICK (Sentences Involving Compositional Knowledge) is a large dataset of human intuitions on English sentences, collected through crowdsourcing. The dataset includes about 10,000 sentence pairs, each annotated for the degree of semantic relatedness and the type of entailment relation. The data were prepared specifically to capture compositional aspects of meaning: they minimize elements such as named entities, world knowledge, and idioms, and focus on phenomena of linguistic interest (lexical variation, syntactic alternations, negation). Although the dataset is first and foremost aimed at the validation of computational models (and was indeed employed in a SemEval shared task), it can also be profitably used for psycholinguistic purposes. The dataset is described in a series of papers (Marelli et al., LREC 2014; Marelli et al., SemEval 2014; Bentivogli et al., under review) that can be downloaded, along with the dataset itself, from the link above.