He published Apple’s first research paper on AI, which then won the Best Paper Award at the Conference on Computer Vision and Pattern Recognition. He then cofounded Apple’s Central Research Group for Artificial Intelligence. Now, he’s working as a Research Lead and Interim Head of Research at Google Cloud AI. His work has contributed to the development of autonomous vehicles, Face ID in the iPhone X, facial microexpression detection, and sign language translation – it also landed him on Forbes’ 30 Under 30 in Science for 2018.
With a vision that artificial intelligence can improve lives, Tomas Pfister has poured himself into AI research, hoping to have a positive impact on the world with this powerful technology. In this exclusive interview, Pfister discusses the origins of his scientific interest, what made his Apple paper so innovative, and his current research with Google.
Innovation & Tech Today: What first got you interested in science and technology, specifically artificial intelligence?
Tomas Pfister: As a child I was always fascinated by computers. I started early: I played various Nintendo computer games, programmed on a Commodore 64, built my first website when I was 10, and started working as a system administrator when I was 12.
What initially fascinated me was the seemingly infinite depth I could explore with computers without any physical limits – the only apparent limitation on what I could achieve was my mind.
At the same time, growing up, as a typical INTJ personality, I was always very observant of my surroundings, and particularly fascinated by human psychology and behaviors. One of my favorite activities was watching people’s faces and trying to guess what’s on their minds.
These two passions, for computer science and psychology, led to my first foray into artificial intelligence in 2009, when I was studying computer science at Cambridge University. A professor there, and now a friend, Peter Robinson, was working on recognizing emotions from facial expressions and speech using artificial intelligence.
This application felt to me like the perfect combination of my two passions. I went on to pursue that line of computer science/psychology work for quite a few years, starting with recognizing emotion from speech, then recognizing normal facial expressions, and ending with recognizing facial micro-expressions – very short facial expressions that reveal emotions we try to hide but that are difficult for humans to detect…
After these experiences, I went on to do my Ph.D. at Oxford University with Andrew Zisserman, a world leader in computer vision, an application area of AI focused on helping computers understand what they see.
There, I applied my knowledge of AI to translating sign language to text to help the deaf communicate more easily and naturally.
I&T: Could you describe what your award-winning research paper for Apple entailed?
TP: Current AI systems need a lot of data to learn how to do the task we want them to do.
For example, to train an autonomous system to understand what it sees on the street, the standard method would be to collect a large dataset of traffic situations, and then have human annotators annotate the different objects in those images, e.g. pedestrians and cars.
The issue is that to train an accurate AI system, the datasets must be very large (commonly tens or hundreds of millions of images) and diverse (different weather conditions, different car/pedestrian locations, different traffic situations), and having human annotators annotate millions of images is tedious and expensive.
What we did in that paper was develop a method for training AI systems using synthetic data from computer game-style simulators. The benefit of generating images with such a simulator is that it’s easy to place objects in various locations in the scene and artificially vary conditions such as weather and traffic, all while knowing exactly where the objects are located, so there is no need for human annotation.
But these images from the simulator aren’t perfect, just like even the best computer games today don’t look perfectly realistic. This is problematic for an AI system, as it may learn to only detect ‘game-like’ cars and fail to detect cars in the real world. To prevent that issue, we developed a method that learns what real-world images look like and is then able to change the synthetic images so that they look more like real-world images. In that paper, we applied this method to many real-world problems and showed that AI systems trained on these synthetic images are much more accurate and robust.
I&T: What does your current research with Google look like?
TP: Rather than focusing on AI for consumer products such as Face ID or autonomous cars, I’m now leading efforts to develop AI for organizations that aren’t tech giants. Think small businesses, big businesses, universities, hospitals, non-profits, or even your local barber shop.
These organizations require solving technical problems that are quite different from the traditional consumer product AI problems.
The first problem I’m working on is what I call the “small data” problem. It’s common for the problems we’re tackling to only have small annotated datasets available. For example, in medicine, some rare diseases may only have a few examples to show the computer, and in other scenarios ample data is available but the organizations do not have the necessary financial resources to have humans annotate large datasets.
The second technical problem I’m tackling is interpretability of AI systems. Many current AI systems are essentially “black boxes,” which output a prediction without an explanation. This is not satisfactory for generating trust between the AI system and the user, and could lead to biased predictions going uncaught. For example, a doctor would find it helpful to know not just whether a particular patient has cancer, but also what led the computer to make that prediction.
This lack of interpretability has become a large barrier to non-AI researchers applying AI to their fields, and it’s an important problem to solve. For both of these problems, any solutions we develop will be highly impactful across many industries and companies, and should help spread useful AI around the world.
I&T: Do you have an ultimate goal for all of your research into AI?
TP: The driving force behind all my work is developing technology to improve people’s lives. AI is an amazingly powerful and flexible technology that can be applied to help solve meaningful and important problems in almost every area of life.
With whatever I do, I’m always trying to maximize both scientific impact and real-world impact: to conduct high-quality research that both advances the science of AI and is useful for solving real, meaningful problems. One such problem I’m currently very excited about and toying around with is AI for mental and spiritual well-being.
I&T: You’ve worked for two tech giants now. What is one of the greatest things you’ve learned from being at those companies?
TP: The greatest thing I have learned would probably be that AI, developed rightly and responsibly, can have such a significant positive impact on a variety of meaningful real-world problems. A more practical lesson is that when thinking of how to tackle a problem, one should not just think outside the box, but think like there is no box.
Rather than starting with the constraints, I find it really helpful to first ask what the best and most amazing thing we could possibly do would be, and then figure out how to do it given the constraints, or bend the constraints.
The article Tomas Pfister is Leading the Artificial Intelligence Revolution first appeared on Inno & Tech Today.