System approach to integration of labels from crowdsourcing campaigns

Oleg Nurmukhametov, of the Ural Federal University, Russia, explored how to get the most out of citizen science by using a machine learning algorithm to increase the reliability of the data produced.

Oleg Nurmukhametov


Introduction

Crowdsourcing is a new approach to performing tasks, in which a group of volunteers distributed worldwide takes the place of an expert. Recent results show that crowdsourcing is an efficient tool for annotating large datasets. One of the most successful citizen science projects is Geo-Wiki, the main goal of which is to improve a global land cover map using crowdsourcing techniques. Labels collected from the crowd can, however, be of low quality, as volunteers are often non-experts and may be unreliable. Advanced methods for integrating annotations are therefore needed to increase the reliability of estimates.

Methods

Analysis of data received from non-experts is a challenging task that requires a systems approach. The research thus included analysis of all the main steps of a crowdsourcing campaign: preparation of images for validation, task assignment, and aggregation of collected votes. During the research we used methods of computer vision and machine learning. All numerical experiments and hypothesis testing were based on an actual dataset from the Geo-Wiki project.

Results

i) We proposed strategies for the preparation of a dataset of images for effective crowdsourcing campaigns in the future. The strategies were applied to existing datasets and were shown to be effective.

ii) Methods were proposed for solving problems of inconsistency in volunteers’ votes. Analysis of methods was based on numerical experiments.

iii) The original Geo-Wiki algorithm of image distribution among volunteers was studied. Drawbacks were analyzed in detail. We suggested and tested possible improvements to the algorithm.

iv) New ways were found of constructing a dataset for experts to obtain information on the reality on the ground. Different approaches were proposed and compared.

v) A machine learning algorithm for aggregation of volunteers’ votes was used. The algorithm showed a higher accuracy than a majority-voting heuristic algorithm.
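To make the comparison with majority voting concrete, the following is a minimal Python sketch of the majority-voting baseline next to one simple alternative: weighting each volunteer's vote by their accuracy on expert-labelled control images. It is not the machine learning algorithm used in the research, which this summary does not specify. The function names, the `(volunteer, image, label)` vote format, the smoothing scheme, and the toy labels are all assumptions made for illustration.

```python
from collections import defaultdict


def _best_label(scores):
    # Highest score wins; ties go to the alphabetically first label
    # so that the result is deterministic.
    return max(sorted(scores), key=scores.get)


def majority_vote(votes):
    """votes: iterable of (volunteer_id, image_id, label) -> {image_id: label}."""
    counts = defaultdict(lambda: defaultdict(int))
    for _, image_id, label in votes:
        counts[image_id][label] += 1
    return {image_id: _best_label(c) for image_id, c in counts.items()}


def volunteer_accuracy(votes, expert_labels):
    """Smoothed share of correct answers per volunteer on expert-labelled images."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for volunteer_id, image_id, label in votes:
        if image_id in expert_labels:
            total[volunteer_id] += 1
            correct[volunteer_id] += int(label == expert_labels[image_id])
    # Add-one smoothing (an assumption) keeps estimates away from 0 and 1.
    return {v: (correct[v] + 1) / (total[v] + 2) for v in total}


def weighted_vote(votes, weights, default_weight=0.5):
    """Like majority_vote, but each vote counts with its volunteer's weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for volunteer_id, image_id, label in votes:
        scores[image_id][label] += weights.get(volunteer_id, default_weight)
    return {image_id: _best_label(s) for image_id, s in scores.items()}


# Toy data (hypothetical): "c1" and "c2" are control images with expert labels.
votes = [
    ("a", "c1", "forest"), ("b", "c1", "crop"), ("c", "c1", "crop"),
    ("a", "c2", "water"), ("b", "c2", "crop"), ("c", "c2", "crop"),
    ("a", "x", "forest"), ("b", "x", "crop"), ("c", "x", "crop"),
]
expert_labels = {"c1": "forest", "c2": "water"}

weights = volunteer_accuracy(votes, expert_labels)
print(majority_vote(votes)["x"])           # two unreliable volunteers outvote "a"
print(weighted_vote(votes, weights)["x"])  # the reliable volunteer's vote dominates
```

In this toy case the two schemes disagree on image `x`, which illustrates why an aggregation method that models volunteer reliability can outperform a plain majority count when some volunteers are consistently wrong.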

Conclusions

Crowdsourcing is a new and powerful tool, but it remains inadequately studied. We thus used and elaborated systems analysis methods in this developing field. The research yielded empirical knowledge that can inform new policies for organizing crowdsourcing campaigns and other citizen science projects. In contrast to other studies, we performed a numerical assessment using real-life data (not a synthetic dataset) containing results from the Geo-Wiki crowdsourcing project. We also provided new tools for an emerging field: global land cover maps.

Supervisors

Dmitry Shchepashchenko, Ecosystems Services and Management Program, IIASA

Artem Baklanov, Advanced Systems Analysis Program, IIASA

Note

Oleg Nurmukhametov, of the Ural Federal University, Russia, is a citizen of Russia. He was funded by the Russian Trust Fund and worked in the Advanced Systems Analysis Program during the YSSP.

Please note these Proceedings have received limited or no review from supervisors and IIASA program directors, and the views and results expressed therein do not necessarily represent IIASA, its National Member Organizations, or other organizations supporting the work.



Last edited: 03 February 2016
