How algorithms feed on photos found on the internet

Clearview AI, PimEyes… these names might not ring a bell, but they may well know your face. These two companies specialize in collecting photos on the internet. Thanks to facial recognition algorithms, they can put a name to a stranger's face in just a few seconds.

This is what worries a group of NGOs that filed complaints against Clearview with the authorities of several European countries last week. “The complaints were lodged with data protection authorities in France, Greece, Austria, Italy and the United Kingdom”, the NGO Privacy International (PI) explained last Thursday.

In their complaints, PI and other organizations (the Italian Hermes Center for Transparency and Digital Human Rights, the Greek Homo Digitalis and the Austrian Noyb) denounce Clearview AI's use of an “automated image retrieval device”, which scours the internet and extracts every image in which it detects a human face.

Scraping to identify and match

In internet jargon, this is called “web scraping”. Using small computer programs, companies like Clearview automatically download millions of images that are freely accessible on the web. Their favorite hunting ground: social networks such as Facebook, Instagram, Twitter or LinkedIn.
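To give an idea of how little code such a program needs, here is a minimal sketch in Python of the extraction step only. It is not Clearview's actual tool, which is not public: the page content, the addresses and the names `ImageLinkCollector` and `extract_image_urls` are all invented for illustration, and nothing is downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the address of the page.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Returns the list of image addresses referenced in an HTML page."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    return collector.image_urls


# Hypothetical profile page, hard-coded so that no site is contacted.
SAMPLE_PAGE = """
<html><body>
  <h1>Holiday album</h1>
  <img src="/photos/beach.jpg" alt="At the beach">
  <img src="https://cdn.example.com/party.png">
  <img alt="no source here">
</body></html>
"""

print(extract_image_urls(SAMPLE_PAGE, "https://social.example.com/user/42"))
```

Run on the sample page, the sketch lists the two images that have an address and ignores the third. A real scraper would repeat this over millions of pages and then download each file.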

Then, using an algorithm, these images are processed to create a biometric database, access to which is sold “to the police and to private companies in various countries”, the complainants deplore.
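The algorithm itself is not described by the company. A common approach in facial recognition, assumed here purely for illustration, is to turn each face into a list of numbers and to compare those lists. The sketch below uses made-up four-number vectors and an arbitrary threshold of 0.9; real systems work with far longer vectors produced by a neural network.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two vectors: close to 1.0 means very alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, database, threshold=0.9):
    """Returns (name, score) for the closest face, or None if none is close."""
    best_name, best_score = None, -1.0
    for name, vector in database.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None
    return best_name, best_score


# Invented "face vectors" for three invented people (assumption).
FACE_DATABASE = {
    "Alice": [0.9, 0.1, 0.3, 0.7],
    "Bob": [0.2, 0.8, 0.6, 0.1],
    "Chloe": [0.5, 0.5, 0.9, 0.2],
}

stranger = [0.88, 0.12, 0.33, 0.69]  # almost the same numbers as "Alice"
unknown = [-0.5, 0.9, -0.7, 0.1]     # resembles nobody in the database

print(best_match(stranger, FACE_DATABASE))
print(best_match(unknown, FACE_DATABASE))
```

The first query is matched to “Alice”, the second returns nothing. The threshold is the sensitive setting: too low and strangers are confused with one another, too high and genuine matches are missed.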

This is even one of Clearview's main selling points: its website highlights the various American police forces that use its services. “We believe law enforcement should have the best tools at their disposal to help them resolve investigations”, argues the company.

“European data protection law is very clear on the purposes for which a company can use our data”, observes Ioannis Kouvakas, a lawyer at PI. “Extracting our unique facial features, and sharing them with the police and other groups, is completely against what an internet user might expect”, he adds.

British and Australian data protection regulators, for their part, launched a joint investigation into the American company in July 2020.

Illegal “mass surveillance”

In February 2021, a report by the Privacy Commissioner of Canada found that Clearview was carrying out illegal “mass surveillance” in Canada. The report noted that the company had built a database of “more than three billion images of faces”. The company withdrew from the Canadian market during the investigation.

For the moment, this has not been enough to undermine the activities of the American company, which appears in the ranking of the 100 most influential companies compiled by the American magazine Time.

Clearview, which is not open to individuals, is not alone in this burgeoning facial recognition market. Take PimEyes, a website that can be used free of charge with limited features. The principle is simple: you upload an image to the search engine. A few seconds later, dozens of matching portraits appear.

According to the Washington Post, PimEyes is “one of the most powerful face finding tools on the planet. In less than a second, it can browse over 900 million images from the internet and find matches with amazing precision.”

What is stopping them? Literally nothing

For Stephanie Hare, a specialist who has been warning about facial recognition for several years, the power of PimEyes raises questions. “What is stopping them? Literally nothing”, she told the Washington Post. She added: “The people who put these photos on the internet, with their kids, their parents, the people who might be vulnerable, they’re not doing it in order to feed a database that companies can then monetize.”

Nevertheless, the system exists… and it is easy to imagine how it could be hijacked by malicious people. This service “could be used by stalkers” [stalking refers to seeking out information about a person, which can turn into obsession or even harassment, editor's note], warns the BBC.

On its home page, PimEyes defends itself against such criticism by emphasizing one particular use of its service: the protection of… privacy.

The site makes it possible to receive an alert each time a photo that “matches” the target being searched for appears in the database. “We believe that you have the right to be on the internet and to protect your privacy and your image. Using the latest artificial intelligence and machine learning technologies, we help you […] to defend yourself against scams, identity theft or against those who use your image illegally.”

“Sensitive data”

In any case, PimEyes sometimes returns very precise results, as shown by a test carried out by a CNN journalist who tried the tool with a photo of herself. Surprise: an image taken at a wedding in 2013 resurfaced. “I had not noticed at the time that my photo was being taken, but that is not the most striking thing. What stands out is that I am barely in the photo: at the right of the frame, only part of my face can be seen, in profile.”

PimEyes does not save the images found on the web, but it does keep a record of the facial measurements taken from the portraits found online. As for the images uploaded to the search engine to find matches, the company says it deletes them after 48 hours.
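How PimEyes enforces this deadline is not public. As a rough sketch, a 48-hour retention rule can come down to a periodic clean-up that compares each upload's timestamp with the current time. The upload log, the identifiers and the function name `purge_expired_uploads` below are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=48)  # the period quoted by the company


def purge_expired_uploads(uploads, now):
    """Keeps only the uploads that are younger than the retention period."""
    return {
        upload_id: uploaded_at
        for upload_id, uploaded_at in uploads.items()
        if now - uploaded_at < RETENTION
    }


# Hypothetical upload log: identifier -> time the image was submitted.
uploads = {
    "query-001": datetime(2021, 6, 1, 9, 0),    # 51 hours before "now"
    "query-002": datetime(2021, 6, 2, 18, 30),  # 17.5 hours before "now"
    "query-003": datetime(2021, 6, 3, 8, 45),   # about 3 hours before "now"
}

now = datetime(2021, 6, 3, 12, 0)
print(sorted(purge_expired_uploads(uploads, now)))
```

Only the two most recent uploads survive the clean-up. Note that such a rule covers the uploaded images only: the facial measurements taken from photos found online, which are what make matching possible, are kept.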

As the website of the Belgian Data Protection Authority (APD) writes, such biometric data “have been expressly elevated to the rank of sensitive data because the context in which they are processed could give rise to significant risks for our rights and freedoms”.

“What about private companies that hold the largest image databases to date? And facial recognition technologies that make it possible, starting from a name, to find every image of a person on the network and the web? What about the use of these methods in public places as well?”, asks the APD without providing a definitive answer. It also stresses that “there are still few institutional opinions on these issues”.

See you in ten years?

The question remains: what will become of vacation photos, snapshots of birthdays, parties with friends innocently posted on social networks in a decade or more? Who will own the servers that contain them today? Who will decide on their use and purpose?

A real-world example dates back to 2019, when a database called MegaFace caught the attention of the New York Times. “How photos of your children reinforce surveillance technology”, ran the newspaper's headline. Yahoo, which owned Flickr at the time, had released these images in 2014 under Creative Commons licenses for research purposes, in particular for the University of Washington.

The database was then used in many contexts other than university research, as a site specializing in the subject details. Or how the memories of yesterday feed the algorithms of tomorrow.
