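Resources, methods and tools for the understanding, identification and classification of various forms of socially unacceptable discourse in the information society (Slovene: Viri, metode in orodja za razumevanje, prepoznavanje in razvrščanje različnih oblik družbeno nesprejemljivega diskurza v informacijski družbi)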

General information

Code: J7-8280
Period: 1 May 2017 - 30 April 2020
Annual scope: 0.60 FTE (2017)
Project leader at FDV: Prof. Dr. Vasja Vehovar

Abstract

Socially unacceptable discourse, such as hate, discriminatory, offensive or threatening speech, is by no means a new phenomenon. It has, however, recently gained significant momentum due to a number of substantial societal, cultural and economic changes. Furthermore, the boom of information and communication technology and the speed at which information spreads on the Internet have given such discourse practices an unprecedented reach and impact that can only be studied and efficiently mitigated with interdisciplinary methods and automatic approaches. The project combines state-of-the-art quantitative and qualitative multidisciplinary approaches, which will be employed to investigate the use of socially unacceptable discourse in its sociocultural context. The use of novel data-driven approaches on unstructured and semi-structured data will move the frontiers of the traditional humanities and social sciences. As a side effect, the project will also support the development of the new field of Digital Humanities and Social Sciences, which combines tools and methods from computer science with those of the humanities and social sciences.

In the scope of the project we will construct large corpora of Slovene computer-mediated communication in general and socially unacceptable discourse (SUD) in particular, which will serve as the basis for our empirically based research. The collected corpora will be highly structured, and their texts will be linguistically processed and enriched with various metadata. We will develop a typology of socially unacceptable discourse and its targets, and manually annotate a representative sample of texts with this typology. This will result in a gold-standard dataset for researching such communication. By using machine learning techniques on this dataset, an automatic method to flag and categorise SUD texts and their targets will be developed and applied to the compiled corpora.

Interdisciplinary sociolinguistic analyses will be performed on the basis of the collected and processed resources, focusing on migrants and Islamophobia, homophobia and gay rights, and sexism and misogyny. We will use the methodologies and instruments of corpus linguistics, critical discourse analysis and inferential statistics. These approaches will be supplemented with a corpus analysis of the legal aspects of socially unacceptable discourse and with surveys on its perception in Slovene society. The project will organise an international interdisciplinary workshop and publish a monograph. The project will provide free and open access to its research results through the research infrastructure CLARIN.SI and the Social Science Data Archive. The research data will consist of the developed language resources and software. All legal and ethical issues with regard to personal data distribution will be taken into account. Through this, the project will also support the move to open science, enabling reproducibility of its research results.

Research Organisation

http://www.sicris.si/public/jqm/prj.aspx?lang=eng&opt=2&subopt=403&hits=1&id=12545&search_term=J7-8280

Researchers

http://www.sicris.si/public/jqm/prj.aspx?lang=eng&opt=2&subopt=402&hits=1&id=12545&search_term=J7-8280

Citations for bibliographic records

http://www.sicris.si/public/jqm/prj.aspx?lang=eng&opt=2&subopt=400&hits=1&id=12545&search_term=J7-8280
