On the Wild Cricket Tales citizen science game, one of the tricky problems is grading player created data in terms of quality. The idea is to get people to help the research by tagging videos to measure behaviour of the insect beasts – but we need to accept that there will be a lot of ‘noise’ in the data, how can we detect this and filter it away? Also it would be great if we can detect and acknowledge players who are successful at hunting out and spotting interesting things, or people who are searching through lots of videos. As we found making the camouflage citizen science games, you don’t need much to grab people’s attention if the subject matter is interesting (which is very much the case with this project), but a high score table seems to help. We can also have one per cricket or burrow so that players can more easily see their progress – the single egglab high score table got very difficult to feature on after a few thousand players or so.
We have two separate but related problems – acknowledging players and filtering the data, so it probably makes sense if they can be linked. A commonly used method, which we did with egglab too (also for example in Google’s reCAPTCHA which is also crowdsourcing text digitisation as a side effect) is to get compare multiple people’s results on the same video, but then we still need to bootstrap the scoring from something, and make sure we acknowledge people who are watching videos no one has seen yet, as this is also important.
Below is a simple naive scoring system for calculating a score simply by quantity of events found on a video – we want to give points for finding some events, but over some limit we don’t want to reward endless clicking. It’s probably better if the score stops at zero rather than going negative as shown here, as games should never really ‘punish’ people like this!
Once we have a bit more data we can start to cluster events to detect if people are agreeing. This can give us some indication of the confidence of the data for a whole video, or a section of it – and it can also be used to figure out a likelihood of an individual event being valid using the sum of neighbouring events weighted by distance via a simple drop-off function.
If we do this for all the player’s events over a single video we can get an indication of how consistent they are with other players. We could also recursively weight this by a player’s historical scores – so ‘trusted’ players could validate new ones – this is probably a bit too far at this point, but it might be an option if we pre-stock some videos with data from the researchers who are trained with what is important to record.