SAN FRANCISCO – Dozens of databases of people’s faces are being compiled without their knowledge by companies and researchers, with many of the images then being shared around the world, in what has become a vast ecosystem fueling the spread of facial recognition technology.
The databases are pulled together with images from social networks, photo websites, dating services like OkCupid and cameras placed in restaurants and on college quads. While there is no precise count of the data sets, privacy activists have pinpointed repositories that were built by Microsoft, Stanford University and others, with one holding over 10 million images and another with two million.
The face compilations are being driven by the race to create leading-edge facial recognition systems. This technology learns how to identify people by analyzing as many digital pictures as possible using “neural networks,” which are complex mathematical systems that require vast amounts of data to build pattern recognition.
Tech giants like Facebook and Google have most likely amassed the largest face data sets, which they do not distribute, according to research papers. But other companies and universities have widely shared their image troves with researchers, governments and private enterprises in Australia, China, India, Singapore and Switzerland for training artificial intelligence, according to academics, activists and public papers.
Companies and labs have gathered facial images for more than a decade, and the databases are merely one layer to building facial recognition technology. But people often have no idea that their faces are in them. And while names are typically not attached to the photos, individuals can be recognized because each face is unique to a person.
Questions about the data sets are rising because the technologies that they have enabled are now being used in potentially invasive ways. Documents released last week revealed that Immigration and Customs Enforcement officials employed facial recognition technology to scan motorists’ photos in trying to identify immigrants who are in the U.S. illegally.
The FBI also spent more than a decade using such systems to compare driver’s license and visa photos against the faces of suspected criminals, according to a Government Accountability Office report last month. On Wednesday, a congressional hearing tackled the government’s use of the technology.
There is no oversight of the data sets. Activists and others said they were angered by the possibility that people’s likenesses had been used to build ethically questionable technology and that the images could be misused. At least one face database created in the United States was shared with a company in China that has been linked to ethnic profiling of the country’s minority Uighur Muslims.
Over the past several weeks, some companies and universities, including Microsoft and Stanford, removed their face data sets from the internet because of privacy concerns. But given that the images were already so well distributed, they are most likely still being used in the United States and elsewhere, researchers and activists said.
“You come to see that these practices are intrusive, and you realize that these companies are not respectful of privacy,” said Liz O’Sullivan, who oversaw one of these databases at the artificial intelligence startup Clarifai. She said she left the New York-based company in January to protest such practices.
“The more ubiquitous facial recognition becomes,” she said, “the more exposed we all are to being part of the process.”