posted on 2019-03-06, 14:15 authored by Silvana Castano, Alfio Ferrara, Enrico Gallinucci, Matteo Golfarelli, Stefano Montanelli, Lorenzo Mosca, Stefano Rizzi, Cristian Vaccari
Social Business Intelligence (SBI) is the discipline that combines corporate data with social content to let decision makers analyze the trends perceived from the environment. SBI poses research challenges in several areas, such as IR, data mining, and NLP; unfortunately, SBI research is often restrained by the lack of publicly available, real-world data for experimenting with approaches, and by the difficulties in determining a ground truth. To fill this gap we present SABINE, a modular dataset in the domain of European politics. SABINE includes 6 million bilingual clips crawled from 50 000 web sources, each associated with metadata and sentiment scores; an ontology with 400 topics, their occurrences in the clips, and their mapping to DBpedia; and two multidimensional cubes for analyzing and aggregating sentiment and semantic occurrences. We also propose a set of research challenges that can be addressed using SABINE; remarkably, the presence of an expert-validated ground truth ensures the possibility of testing approaches to the whole SBI process as well as to each single task.
Funding
Supported by the Italian Ministry of Education “Future in Research 2012” initiative (code RBFR12BKZH) for the project “Building Inclusive Societies and a Global Europe Online: Political Information and Participation on Social Media in Comparative Perspective” (www.webpoleu.net).
School
Social Sciences and Humanities
Department
Communication and Media
Published in
International Semantic Web Conference
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume
11137 LNCS
Pages
70 - 85
Citation
CASTANO, S. ... et al, 2018. SABINE: A multi-purpose dataset of semantically-annotated social content. IN: Vrandecic, D. ... et al (eds). The Semantic Web – ISWC 2018, International Semantic Web Conference, Monterey, CA, USA, 8-12 October 2018, pp.70-85.
This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence. Full details of this licence are available at: https://creativecommons.org/licenses/by-nc-nd/4.0/