Until April, Microsoft boasted of having the largest collection of faces that anyone could use to train facial-recognition algorithms. Since then, the once publicly available dataset has quietly disappeared.
As the Financial Times reports, Microsoft quietly deleted the dataset after the paper called attention to privacy and ethical issues, including use of the dataset by military researchers.
Microsoft did not immediately respond to a request for comment from Fortune. But it told the Financial Times: "The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed."
The now-deleted dataset contained more than 10 million faces culled from websites like Flickr, which host photographs uploaded under a Creative Commons license, meaning many can be used free of copyright concerns.
The name of the Microsoft dataset, MS Celeb, was chosen because many of the images it contains are of famous people who live public lives. Many of the other faces in the set, however, belong to people who are not celebrities, including journalists and privacy researchers, and who were not aware their images had been included.
Microsoft is hardly the only company to assemble large datasets by scraping photos from the open Internet. In January, IBM announced it was sharing a collection of 1 million faces in the name of promoting more diversity in artificial intelligence. Meanwhile, a website called Megapixels identifies several other massive collections as part of a bid to halt what it describes as a "growing crisis of authoritarian biometric surveillance."
While many of the facial recognition sets are culled from public websites like Flickr, that is not the only way companies obtain pictures of faces. As a recent Fortune investigation revealed, startups have been using photo collection apps to surreptitiously collect millions of faces, while other companies have been scanning public collections of mug shots.