Artificial Intelligence Helps the Blind ‘See’ Facebook

Today, Facebook introduced a new feature that automatically generates text descriptions of pictures using advanced object recognition technology.
Until now, people using screen readers would only hear the name of the person who shared the photo, followed by the term “photo” when they came upon an image in News Feed. Now they will get a richer description of what’s in a photo. For instance, someone could now hear, “Image may contain three people, smiling, outdoors.”
The Facebook researchers noted that it took nearly ten months to roll the feature out publicly, as they had to train their deep learning models to recognize more than just the people in the images. People mostly care about who is in a photo and what they are doing, but sometimes the background of the photo is what makes it interesting or significant.
While that may be intuitive to humans, it is quite challenging to teach a machine to provide as much useful information as possible while acknowledging the social context.

Their neural network models have a million parameters, but the researchers carefully selected a set of about 100 concepts to describe, based on how prominent each concept is in photos and how accurately the visual recognition system detects it. They also favored concepts with very specific meanings, like smiling, jewelry, cars, and boats, and avoided those open to interpretation. Currently, they ensure that their object detection algorithm reaches a minimum precision of 0.8 for each concept.
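The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not Facebook's implementation: the function names, the per-concept precision values, and the fallback string are assumptions made up for the example. Only the 0.8 threshold and the "Image may contain ..." phrasing come from the article.

```python
from typing import Dict, List

MIN_PRECISION = 0.8  # minimum precision stated in the article


def select_concepts(precision_by_concept: Dict[str, float],
                    min_precision: float = MIN_PRECISION) -> List[str]:
    """Keep only the concepts whose measured precision meets the threshold."""
    return sorted(
        concept
        for concept, precision in precision_by_concept.items()
        if precision >= min_precision
    )


def describe(detected: List[str], allowed: List[str]) -> str:
    """Build a screen-reader description from detected concepts.

    Concepts outside the allowed list are dropped; detection order is kept.
    """
    kept = [concept for concept in detected if concept in allowed]
    if not kept:
        # Assumed fallback: the plain label used before the feature existed.
        return "photo"
    return "Image may contain " + ", ".join(kept) + "."


# Hypothetical precision measurements, invented for this example.
precisions = {"people": 0.93, "smiling": 0.88, "outdoors": 0.84, "pizza": 0.61}
allowed = select_concepts(precisions)

print(describe(["people", "smiling", "outdoors", "pizza"], allowed))
```

In this sketch, "pizza" is left out of the description because its made-up precision of 0.61 falls below the threshold, which mirrors the idea of describing only what the system recognizes reliably.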