27 February 2017
Who can be trusted on the internet?
In a project funded by the Austrian Science Fund FWF, scientists examined how the credibility of information from the web can be assessed in order to obtain more effective data from platforms such as Flickr.
Recently, the credibility of information from the internet has become a hot topic, given the political dimension of fake news and the impact it has on democratic processes. Internet companies such as Facebook, which is particularly affected by this issue, are challenged by the fact that they depend on computerised methods for the selection of content. The success of the internet is based mainly on the automated processing of information: algorithms, not human beings, determine the results search engines will show. When it comes to fake news, there are no appropriate methods available. As people at Facebook emphasise, the truth is particularly difficult to identify in many cases. What does seem possible, however, is an assessment of credibility. This was the aim of an international research project funded by the Austrian Science Fund FWF under the lead of principal investigator Allan Hanbury from TU Wien. “We have seen that there was no clear definition of the credibility of online content”, says Hanbury. “It was our objective to find a better model for assessing credibility and to experiment with it.”
According to Hanbury, credibility can be assessed on the basis of a number of criteria. Some relate to the source of the information. Does the source have expertise in this area? Do people consider it trustworthy? Other criteria involve the information itself. How high is its level of quality? This might involve detecting typos in documents, for instance. How reliable is the information? This question revolves around how strongly the quality of information from a given source fluctuates.
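As a rough illustration only, the four criteria named above (expertise, trustworthiness, quality, reliability) could be combined into a single score as sketched below. This is not the project's model: the `Source` record, the 0-to-1 scales, the use of the standard deviation as a measure of fluctuation and the unweighted average are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Source:
    """Hypothetical record for one source; every score is assumed to lie in [0, 1]."""
    expertise: float
    trustworthiness: float
    quality_scores: List[float]  # one quality score per document from this source


def reliability(quality_scores: List[float]) -> float:
    """Assumed measure: the less the quality fluctuates, the higher the reliability.

    For values in [0, 1] the population standard deviation is at most 0.5,
    so 1 - 2 * pstdev stays within [0, 1].
    """
    if not quality_scores:
        raise ValueError("at least one quality score is required")
    return max(0.0, 1.0 - 2.0 * pstdev(quality_scores))


def credibility(source: Source) -> float:
    """Unweighted mean of the four criteria (the equal weighting is an assumption)."""
    return mean([
        source.expertise,
        source.trustworthiness,
        mean(source.quality_scores),
        reliability(source.quality_scores),
    ])


steady = Source(expertise=0.8, trustworthiness=0.8, quality_scores=[0.7, 0.7, 0.7])
erratic = Source(expertise=0.8, trustworthiness=0.8, quality_scores=[0.4, 1.0, 0.7])
print(credibility(steady) > credibility(erratic))
```

In this sketch two sources with the same average quality receive different scores because the second one's quality fluctuates more, which is the intuition behind the reliability criterion.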
Credibility of images and search engines
Hanbury's team studied image search on social media, particularly on Flickr. The idea was to examine ‘tags’, i.e. annotations or labels attached to photographs, and to assess their credibility. Such tags could be ‘water’, ‘mountain’ or ‘beach’. An algorithm assesses the credibility of these tags according to certain criteria, for instance the image captions, the regularity with which a user posts and how many images he or she puts online, without analysing the content of the images themselves. All this information is fed into a program that tries to attach the correct tags to images.
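To make the idea more concrete, the following minimal sketch scores a tag using only the three signals mentioned above: the caption, the regularity of a user's posts and the number of images uploaded. It never looks at the image itself. The function names, the weights, the saturation point of 100 images and the assumption that more uploads count in a user's favour are purely illustrative and are not taken from the project.

```python
import re
from statistics import mean, pstdev
from typing import List, Tuple


def posting_regularity(gaps_in_days: List[float]) -> float:
    """Assumed measure: 1 / (1 + coefficient of variation) of the gaps between posts."""
    if not gaps_in_days or mean(gaps_in_days) == 0:
        return 0.0
    variation = pstdev(gaps_in_days) / mean(gaps_in_days)
    return 1.0 / (1.0 + variation)


def tag_in_caption(tag: str, caption: str) -> bool:
    """True if the tag appears as a whole word in the caption (case-insensitive)."""
    return tag.lower() in re.findall(r"\w+", caption.lower())


def tag_credibility(
    tag: str,
    caption: str,
    gaps_in_days: List[float],
    image_count: int,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # illustrative weights
) -> float:
    """Weighted sum of caption match, posting regularity and upload volume."""
    caption_signal = 1.0 if tag_in_caption(tag, caption) else 0.0
    regularity = posting_regularity(gaps_in_days)
    volume = min(image_count / 100.0, 1.0)  # saturates at 100 images (assumption)
    w_caption, w_regularity, w_volume = weights
    return w_caption * caption_signal + w_regularity * regularity + w_volume * volume


plausible = tag_credibility("beach", "Sunset at the beach", [1, 1, 1], 200)
doubtful = tag_credibility("mountain", "Sunset at the beach", [1, 30, 2], 5)
print(plausible > doubtful)
```

A retrieval system could then keep only tags whose score exceeds a chosen threshold, or rank search results by it.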
The scholars were able to show that an algorithm using credibility criteria was very reliable in finding the correct tags for certain images. By taking credibility criteria into account, the effectiveness of automated information retrieval on the internet can thus be increased. Demonstrating that was one of the objectives of the project. The project team broke new ground with their attempt to analyse not only the credibility of content but also the credibility of the systems that gather information.
“Search engines and recommendation systems have a big impact on which posts and documents people see”, says Hanbury. They can arrange the results by relevance, but can also push paid content to a higher place in the ranking. “We have explored whether it is possible to assess the credibility of search engines.” That issue was examined for the first time in the context of this project and turned out to be very challenging. In the absence of knowledge about the programming of a search engine it was almost impossible, Hanbury conceded.
Health on the web
Misinformation is a highly sensitive issue not only in the political arena, but also when it comes to medical content. “People suffering from protracted illnesses often seek comprehensive information on the internet about their condition and slowly turn into experts themselves”, explains Allan Hanbury. The scholar warns that in some cases it is precisely the top-ranked content that provides erroneous information in an attempt to make money out of people’s hopes and despair. The organisation Health On the Net (HON) has been dealing with this phenomenon since 1995. Their fight for the quality of online medical information is, however, burdened by the enormous amount of data involved. “This makes manual assessment impossible.” In collaboration with HON, the scholars from TU Wien tested the method developed in their project for an automated assessment of credibility. “The framework that we have set up can be used in the future to measure credibility”, notes Hanbury.
Allan Hanbury (http://allan.hanbury.eu/doku.php) is a Senior Researcher in the Information and Software Engineering group (ifs) (http://www.ifs.tuwien.ac.at), Institute of Software Technology and Interactive Systems, Vienna University of Technology. His research interests include data science, information retrieval, and the evaluation of information retrieval systems and algorithms. He is the coordinator of various EU projects and Austrian research projects.
Ralf Bierig, Cristina Serban, Alexandra Siriteanu, Mihai Lupu, Allan Hanbury: A System Framework for Concept- and Credibility-Based Multimedia Retrieval. ICMR 2014: 543
João Palotti et al.: How users search and what they search for in the medical domain. Information Retrieval Journal 19.1-2 (2016): 189-224
Image and text will be available as of Monday, 27th February 2017, from 9.00 am CET at: http://scilog.fwf.ac.at/en
Dr. Allan Hanbury
1040 Vienna, Austria
T +43 / 1 / 58801-188310
Austrian Science Fund FWF
Haus der Forschung
1090 Vienna, Austria
T +43 / 1 / 505 67 40 – 8117