Cricket Tales is an ambitious citizen science project built on 438 days of CCTV footage from the Wild Crickets Research group, the only record of its kind of insects behaving in the wild. It turns out that insects have more complex lives and more individuality than we thought, and the game is a way of helping to uncover this more precisely. For Foam Kernow this was also a significant project: the biggest production that all three of us have worked on together.
My favorite aspect of this project is that the movies offer a strangely different way of viewing an ecosystem: tiny, close-up areas of a perfectly normal field in northern Spain. The footage runs 24 hours a day, with infrared at night, and records a couple of frames a second only when movement is detected. Some of the videos are triggered simply by moving shadows, but there are plenty of moments that we wouldn’t normally notice: worms and bugs of all kinds going about their lives, sudden appearances of larger animals or swarms of ants, dew condensing at dawn. Then there are the crickets themselves, most with tags stuck to them so we can tell which is which, but otherwise living in their normal habitat in their normal way. Compared with insects studied in lab conditions, it’s not surprising they behave in more complex ways.
These screenshots are from the Spanish version, as I’m particularly proud of that (it was my first experience using GNU gettext with Django).
We combined the task of watching the one-minute movies with the ability to build houses for the crickets: we needed to give people a way to leave something behind, something that marks progress on this gigantic collective task. You get to design a little house for each burrow, and your name is recorded on the meadow until the next person takes over by watching more videos.
We’ve had plenty of conversations about what kind of people take part in this sort of citizen science activity, and what their motivations might be. We ask a couple of questions when people sign up, and this is something we are interested in researching more generally across our projects. In this case, we care more about depth of involvement than about attracting thousands of brief encounters: it only takes a few motivated people to make the researchers’ jobs much easier and provide the data they need.
For me, a bigger objective of Cricket Tales is to present more diverse and personal views of the world that surrounds us, which tends to go unnoticed. Being asked to contemplate a tiny organism’s view of the world for a minute can be quite an eye-opener.
Work on Cricket Tales over the last few weeks has been concerned with scaling everything to handle the sheer amount of data involved. The numbers are big: we’re starting with the footage from 2013 as a test (a ‘smaller’ year), where 145 cameras recorded a total of 438 days’ worth of video of cricket burrows. Our video processing robot is currently chopping this up into 211,889 sped-up one-minute clips and encoding each into WebM, Ogg and H.264 MP4 for maximum browser compatibility. It looks like this will take a few months in total; over the last week or so we have processed 8,000 videos.
We now have a framework in place that will support this quantity of data. For example, we can keep processing video continuously while people are tagging, and swap videos in and out so we don’t need to fit them all on disk at once. The database contains an entry for every video with a status (currently ‘not encoded’, ‘ready’ and ‘complete’). Movies could be marked complete once some number of players have watched them, or once some correlation metric on their tags has been met, and then their files can be deleted from the disk. In terms of feasibility, 68K people played the camouflage citizen science games over the last year, so if we managed to attract that many people we’d need each of them to view about three one-minute clips to get the entire dataset watched once.
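As a rough check on "a few months", here is the arithmetic as a tiny Python helper. It assumes the recent rate of about 8,000 clips a week stays constant, which is only an assumption, and `weeks_to_finish` is a hypothetical name:

```python
def weeks_to_finish(total_clips: int, clips_done: int, clips_per_week: float) -> float:
    """Estimate the weeks needed to encode the remaining clips at a steady rate."""
    if clips_per_week <= 0:
        raise ValueError("clips_per_week must be positive")
    remaining = max(0, total_clips - clips_done)
    return remaining / clips_per_week


# 2013 test year: 211,889 clips at roughly 8,000 clips per week (assumed constant).
total_weeks = weeks_to_finish(211_889, 0, 8_000)
print(f"{total_weeks:.1f} weeks, about {total_weeks * 7 / 30.4:.0f} months")
# prints: 26.5 weeks, about 6 months
```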
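A minimal sketch of that status lifecycle in Python, assuming the simplest completion rule mentioned (a fixed number of views; `required_views=3` is a made-up value). The class and field names are hypothetical, and on the real site this state would live in the database rather than in an in-memory object:

```python
from dataclasses import dataclass
from enum import Enum


class VideoStatus(Enum):
    NOT_ENCODED = "not encoded"
    READY = "ready"
    COMPLETE = "complete"


@dataclass
class Video:
    name: str
    required_views: int = 3  # hypothetical completion threshold
    status: VideoStatus = VideoStatus.NOT_ENCODED
    views: int = 0
    on_disk: bool = False

    def mark_encoded(self) -> None:
        """Called by the processing robot once the clip's files have been written."""
        if self.status is not VideoStatus.NOT_ENCODED:
            raise ValueError(f"{self.name} is already encoded")
        self.status = VideoStatus.READY
        self.on_disk = True

    def record_view(self) -> None:
        """Count one viewing; retire the clip once it has enough views."""
        if self.status is not VideoStatus.READY:
            raise ValueError(f"{self.name} is not available to watch")
        self.views += 1
        if self.views >= self.required_views:
            self.status = VideoStatus.COMPLETE
            self.on_disk = False  # stands in for deleting the encoded files
```

A rule based on how well players' tags correlate could replace the view count inside `record_view` without changing the rest of the flow.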
This is a fictional map of the crickets’ burrows, displaying the name of the player who has contributed most to each one so far. It is one idea for the ‘play’ element, which is the next big thing to consider. This is a balance between making it quick and fun, and making people feel they are contributing to something bigger and can see a tangible result for their efforts. Do we aim for something that takes five minutes of someone’s time and get enough people to make it work, or do we aim for a smaller number of more dedicated players who keep coming back? We have loads of options at this point, and to a large extent this depends on precisely what the researchers need, so we need to do some more thinking together at this stage.
On the Wild Cricket Tales citizen science game, one of the tricky problems is grading player-created data for quality. The idea is to get people to help the research by tagging videos to measure the behaviour of the insect beasts, but we need to accept that there will be a lot of ‘noise’ in the data. How can we detect this and filter it away? It would also be great if we could detect and acknowledge players who are successful at hunting out and spotting interesting things, or who are searching through lots of videos. As we found making the camouflage citizen science games, you don’t need much to grab people’s attention if the subject matter is interesting (which is very much the case with this project), but a high score table seems to help. We could also have one per cricket or burrow so that players can more easily see their progress: the single egglab high score table became very difficult to get onto after a few thousand players or so.
We have two separate but related problems, acknowledging players and filtering the data, so it probably makes sense to link them. A commonly used method, which we used with egglab too (and which Google’s reCAPTCHA also uses, crowdsourcing text digitisation as a side effect), is to compare multiple people’s results on the same video. But then we still need to bootstrap the scoring from something, and make sure we acknowledge people who are watching videos no one has seen yet, as this is also important.
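One way the map could decide whose name to show, sketched in Python under two assumptions: contribution is counted as videos watched per burrow, and the name only changes when someone strictly overtakes the current holder, so ties keep the existing player. `Meadow` and `record_view` are hypothetical names:

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class Meadow:
    """Tracks which player is named on each burrow of the meadow map."""

    def __init__(self) -> None:
        # burrow id -> Counter of player name -> videos watched for that burrow
        self.views: Dict[str, Counter] = defaultdict(Counter)
        # burrow id -> player currently shown on the map
        self.leader: Dict[str, str] = {}

    def record_view(self, burrow: str, player: str) -> None:
        counts = self.views[burrow]
        counts[player] += 1
        current: Optional[str] = self.leader.get(burrow)
        # Only a strict overtake changes the name; a tie leaves the holder in place.
        if current is None or counts[player] > counts[current]:
            self.leader[burrow] = player
```

Requiring a strict overtake means a newcomer has to watch one more video than the current holder, which matches the idea of "taking over" a burrow.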
Below is a simple, naive system for calculating a score purely from the quantity of events found on a video: we want to give points for finding some events, but beyond some limit we don’t want to reward endless clicking. It’s probably better if the score stops at zero rather than going negative as shown here, as games should never really ‘punish’ people like this!
Once we have a bit more data we can start to cluster events to detect whether people are agreeing. This can give us some indication of confidence in the data for a whole video, or a section of it. It can also be used to estimate the likelihood of an individual event being valid, using the sum of neighbouring events weighted by distance via a simple drop-off function.
If we do this for all of a player’s events on a single video, we get an indication of how consistent they are with other players. We could also recursively weight this by a player’s historical scores, so ‘trusted’ players could validate new ones. This is probably going a bit too far at this point, but it might be an option if we pre-stock some videos with data from the researchers, who are trained in what is important to record.
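A minimal sketch of such a curve in Python, clamped at zero as suggested. The limit, points per event and penalty are made-up parameters, not values from the project:

```python
def naive_score(num_events: int, limit: int = 5, points_per_event: int = 10,
                penalty_per_extra: int = 10) -> int:
    """Score one player's tagging of one video purely by how many events they added."""
    if num_events < 0:
        raise ValueError("num_events must be non-negative")
    if num_events <= limit:
        # Up to the limit, every event found earns points.
        return num_events * points_per_event
    # Beyond the limit, each extra event costs points to discourage endless clicking...
    extra = num_events - limit
    score = limit * points_per_event - extra * penalty_per_extra
    # ...but the score bottoms out at zero rather than going negative.
    return max(0, score)
```

With these values the score peaks at 50 for five events, falls back to 30 at seven, and stays at 0 from ten events onwards.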
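A sketch of the per-event likelihood in Python, under assumptions: an event is only compared with other players' events of the same kind, times are seconds into the clip, and the drop-off is linear over a hypothetical 3-second radius. The names are illustrative, and a different drop-off shape (e.g. Gaussian) could replace `dropoff` without changing the rest:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    player: str
    kind: str    # which behaviour was tagged (hypothetical labels)
    time: float  # seconds into the clip


def dropoff(distance: float, radius: float = 3.0) -> float:
    """Linear drop-off: weight 1 at distance 0, falling to 0 at `radius` and beyond."""
    return max(0.0, 1.0 - distance / radius)


def event_support(event: Event, events: List[Event], radius: float = 3.0) -> float:
    """Sum of nearby same-kind events from *other* players, weighted by distance."""
    return sum(
        dropoff(abs(other.time - event.time), radius)
        for other in events
        if other.player != event.player and other.kind == event.kind
    )


def video_confidence(events: List[Event], radius: float = 3.0) -> float:
    """Average support across all events on a video (0.0 if nothing was tagged)."""
    if not events:
        return 0.0
    return sum(event_support(e, events, radius) for e in events) / len(events)
```

Excluding a player's own events keeps someone from "agreeing" with themselves by clicking repeatedly.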
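A sketch of the per-player version in Python, using the same kind of hypothetical event representation and linear drop-off. It implements only one non-recursive step of the trust idea: each other player's agreement is multiplied by a fixed trust weight (defaulting to 1.0). Where those weights would come from, for instance researchers' pre-stocked tags, is left open:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    player: str
    kind: str    # which behaviour was tagged (hypothetical labels)
    time: float  # seconds into the clip


def dropoff(distance: float, radius: float) -> float:
    """Linear drop-off: weight 1 at distance 0, falling to 0 at `radius` and beyond."""
    return max(0.0, 1.0 - distance / radius)


def player_consistency(player: str, events: List[Event],
                       trust: Optional[Dict[str, float]] = None,
                       radius: float = 3.0) -> float:
    """Mean trust-weighted agreement with other players for one player on one video."""
    trust = trust or {}
    own = [e for e in events if e.player == player]
    if not own:
        return 0.0
    total = 0.0
    for mine in own:
        for other in events:
            if other.player != player and other.kind == mine.kind:
                weight = trust.get(other.player, 1.0)  # unknown players count as 1.0
                total += weight * dropoff(abs(other.time - mine.time), radius)
    return total / len(own)
```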
For the Wild Cricket Tales project I’m using Django again: the all-new 1.7 version, which, amongst lots of other shiny things, has an all-new migrations system. This lets you change the underlying database ‘model’ for your site, and it automatically tracks your changes and applies them to the data (via Python scripts it generates, which you can tweak if needed).
However, if you trash them (my 0001_initial.py was deleted; I’m not sure what by, or why I didn’t have them in version control, but that’s another question), it’s helpful to know how to start again from scratch, and the official documentation doesn’t have much to say on this topic yet. This is what I cobbled together. It requires that your current database matches the code state exactly:
Delete all existing migrations:
Remove references to them from the database – presumably you need to be a little more selective here if you are running more than one app:
sqlite3 db.sqlite3 "delete from django_migrations"
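Create a new initial migration from the current models (without --empty, which would produce a migration containing no operations, so a later makemigrations would try to create every table again). Here my_app stands for your app’s label, which must be a valid Python identifier:
python manage.py makemigrations my_app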
Fake the initial migration. This leaves the migrations system assuming that the current database state matches the code, but doesn’t actually apply the changes (i.e. create the tables again):
python manage.py migrate --fake
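If several apps share the database, the delete step above can be made selective by filtering on the app column of Django’s django_migrations table (its columns are id, app, name and applied). A small sketch using Python’s standard sqlite3 module; `forget_app_migrations` is a hypothetical helper and my_app a placeholder label:

```python
import sqlite3


def forget_app_migrations(conn: sqlite3.Connection, app_label: str) -> int:
    """Delete one app's rows from django_migrations; return how many were removed."""
    with conn:  # the connection context manager commits on success, rolls back on error
        cursor = conn.execute(
            "DELETE FROM django_migrations WHERE app = ?", (app_label,)
        )
    return cursor.rowcount
```

For example, forget_app_migrations(sqlite3.connect("db.sqlite3"), "my_app"). The one-line equivalent with the command-line tool is sqlite3 db.sqlite3 "delete from django_migrations where app='my_app'".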
I’m still a little unclear how migrations in source control work across development/production servers etc, but time will tell…
We have a brand new citizen science project starting with the Wild Crickets research group at Exeter University! These researchers are examining how evolution works in insects in their natural environment, rather than in lab conditions. To do this they have hundreds of CCTV cameras recording the burrows of field crickets, resulting in many hundreds of hours of footage. This footage needs to be watched to identify the various events that make up the life stories of the insects. It turns out that each individual has quite distinct characteristics, and we thought it would be fun to open up this process and make it into a citizen science project. Partly this is to get some help and speed up the job, but the vast quantity of material (hundreds of thousands of hours in total) also has its own appeal, and it would be great to use it for a creative project like this.
Here is an example of the footage, rather sped up – check out the frog and the sudden switch to daylight mode:
This is the first interface sketch. My plan is to focus on the individual insects and visualise the information coming together by displaying their characteristics, along with which players are their ‘biggest fans’, i.e. the people who’ve put the most time into tagging them:
The video tagging interface itself is the focus of the first prototype I’m working on at the moment. It is built on a combination of Django and popcorn.js, with a database storing the relationships between crickets, their movies and the events that people create. Below is the first attempt: the buttons add new events as the movie plays, which are recorded in the database and displayed on the timeline bar at the bottom. Currently all players can see all the events globally, so how to handle that is one of the first things to figure out.
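To make those relationships concrete without a Django setup, here is a plain-Python stand-in using dataclasses. The real prototype uses Django models, and these fields are assumptions rather than its actual schema. The helper sketches one possible answer to the global-visibility question: show a player only their own events on a movie’s timeline unless asked otherwise:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cricket:
    tag: str  # hypothetical: the identifier on the tag stuck to the cricket


@dataclass(frozen=True)
class Movie:
    name: str
    cricket: Cricket  # like a foreign key: one cricket has many movies


@dataclass(frozen=True)
class Event:
    movie: Movie
    player: str
    kind: str    # which button was pressed (made-up labels)
    time: float  # seconds from the start of the clip


def events_for_timeline(events: List[Event], movie: Movie, player: str,
                        show_everyone: bool = False) -> List[Event]:
    """Events to draw on one movie's timeline bar for one player, in time order."""
    chosen = [e for e in events
              if e.movie == movie and (show_everyone or e.player == player)]
    return sorted(chosen, key=lambda e: e.time)
```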