Train a Robot? Why bother, when he can just look it up?
The Semantic Robot Vision Challenge was set up to find robots which could locate an object in real space, after only seeing it in cyberspace. We spoke to Professor Jim Little and Dr Per-Erik Forssen about their winning robot, Curious George...
Chris - What was the big challenge you were trying to overcome here?
Jim - Well, the Semantic Robot Vision Challenge was a contest to bring together computer vision scientists and roboticists. The challenge involved a robot learning how to find a group of objects in a room.
Chris - Why's that such a challenge?
Jim - Well, we'd like to develop robot home assistants, or intelligent devices to assist in the home, and a robot has to know the various and unusual objects that live in a place with us. We've gotten some to recognise particular objects, like a box of tissues or a cola bottle, but to work in a home a robot needs to see and understand objects like chairs and cups and tables, and they're much more challenging and interesting to recognise.
Chris - So if a person for instance said, "I'd like a cup", because their cup is different to every other cup the robot has ever seen, you need a robot that can then intuitively work out what a cup must be. That sounds impossible. How do you go about doing that?
Jim - Well in this particular case we looked at images we got from the web by looking up the word 'cup' on search engines and we tried to find characteristics that cup images might share, such as the circular opening of the top and more or less cylindrical sides, or the handle on the cups, and these have appearances that we can try to recognise in the images, and then when we go look for them in the room we can find the object by identifying these features again.
Chris - So it's just going on Google and trawling through images that it sees of things fitting the tag 'cup' and then deciding that must be what a cup looks like. So how does it decode the picture to work out what the parts are? How does it attach the same amount of importance to the hole in the top and the handle as to the shape?
Jim - Currently we just use techniques for finding interesting and distinct points on the object. Other groups work more on the shape of the boundary of the object. But we've come a long way towards being able to recognise these distinctive features from different viewpoints and different images, and in the challenge what we did was look for features that showed up many times in different images of cups, for example.
Chris - So what if someone was really nasty and they mislabelled a picture of a cup and it's actually a saucer, and it says cup and saucer, but it's only a saucer. Would your robot then be fooled?
Jim - It would, but what it does is it tries to gather lots of evidence, so if the features show up many times in the images, it recognises that this feature is useful for cups and that the other one was irrelevant. In fact, going to Google to get images means you get lots of images, most of which are useful but not all of them.
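The evidence-gathering idea Jim describes here can be illustrated with a small voting sketch. This is only a toy under stated assumptions, not the Curious George team's actual method: it assumes each web image has already been reduced to a set of feature labels (the labels such as "round_rim" are made up for illustration), and the function name `recurring_features` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def recurring_features(images, min_fraction=0.5):
    """Return the feature labels seen in at least min_fraction of the images.

    images: list of sets of hashable feature labels, one set per web image.
    (Hypothetical representation; a real system would work with descriptors
    extracted from the pixels.)
    """
    if not images:
        return set()
    counts = Counter()
    for features in images:
        # Count each feature at most once per image, so one busy image
        # cannot outvote the rest.
        counts.update(set(features))
    needed = min_fraction * len(images)
    return {label for label, n in counts.items() if n >= needed}


# Toy search results for the tag 'cup'; the last one is a mislabelled saucer.
web_images = [
    {"round_rim", "handle", "cylinder_side"},
    {"round_rim", "handle", "logo"},
    {"round_rim", "cylinder_side", "handle"},
    {"round_rim", "flat_disc"},
]

cup_features = recurring_features(web_images, min_fraction=0.75)
print(sorted(cup_features))
```

In this toy data the saucer's "flat_disc" feature appears in only one of the four images, so it falls below the threshold and is dropped, while the features shared by most of the cup images survive. A single mislabelled image therefore contributes little, which is the point Jim makes about gathering lots of evidence.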
Chris - If I could just switch across to Per, what were the major problems you had to overcome to make this happen?
Per - You have this problem when you search the internet: you have many images that match the same tag, and we've tried various ways of filtering out the bad images. If you have a cartoon of an object, or if you have a person drawing something by hand, it doesn't match as well with the real world, so that was one big problem we encountered. Another thing was when we actually went out into the environment and started looking for the objects; we had to somehow limit the search, so we had our robot being interest-driven. The problem with the environment we had in the competition was that it had many interesting things, so the robot looked at things other than the object.
Chris - Jim, when you actually did the semantic robot challenge, what was the competition like? What were other people wheeling out?
Jim - There were other small robots. Ours was large; we had a large robot with a stereo camera on it, a simple still image camera, all on a pan-tilt unit. The other competitors also had many different cameras and we all had small platforms that allowed us to walk around amongst the tables that composed the contest region. The objects we were looking for were either on the tops of the tables or on the floor, kind of separated from each other, scattered around the room.
Chris - How successful were your robots? Presumably they put things there that the robots had no chance of ever having seen, so they would have to teach themselves to recognise them. So how successful were you?
Jim - We did well. There were 15 objects we were asked to find and of those 15 we found 7. 'Finding' an object means returning to home base with a picture of the object and a rectangle drawn on the image to say exactly where the object is. We did very well on specific objects, like brands of particular potato chips or chocolate bars. Much harder, though, is to find generic objects like red peppers, or cups, or vacuum cleaners. We succeeded in getting a red pepper but we think it was by accident, because we happened to find a picture on the web that looked very similar to the pepper that we actually found.
Chris - So home-help robots, but not for people who happen to have a whole shelf of peppers. Not for people in Italy then?
Jim - Haha, apples are hard too...