Facebook this week announced a GPU-accelerated model designed for shopping. The model uses AI to automatically identify consumer goods from images to help make them shoppable. GrokNet, a universal computer vision system, can identify items in categories such as fashion, auto, and home decor.
The model is in production today and is available for buyers and sellers in Facebook Marketplace. Facebook says the goal of the project is to develop a new way to shop on Facebook’s platforms.
Because Marketplace lets people post, buy, and sell products across a wide variety of categories, the company said it needed a single model that could correctly identify products across all of these fine-grained categories. That meant the team had to combine a large number of data sets, types of supervision, and loss functions in one model.
“This is a huge AI challenge because optimizing and fine-tuning hyperparameters for one task can sometimes reduce the effectiveness of another. For example, optimizing a model to recognize cars well might mean it’s not as good at recognizing patterns on clothing,” the Facebook researchers stated in a recent post, Powered by AI: Advancing product understanding and building new shopping experiences.
The model was built, trained, and deployed with 83 loss functions across seven data sets to combine multiple verticals in a single space. “This universal model allows us to leverage many more sources of information, which increases our accuracy and outperforms our single vertical-focused models,” the team said.
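To make the idea of training one model against many loss functions concrete, here is a minimal sketch of a weighted multi-task loss. It is not Facebook's implementation: the task names, logits, and weights below are hypothetical, and the sketch assumes each task is a plain softmax classification whose losses are simply summed with fixed weights.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one example, given raw logits and the true class index."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_index]

def combined_loss(task_outputs, task_weights):
    """Weighted sum of per-task losses.

    task_outputs: {task_name: (logits, target_index)}
    task_weights: {task_name: weight}  (hypothetical values, not from the source)
    """
    total = 0.0
    for name, (logits, target) in task_outputs.items():
        total += task_weights[name] * softmax_cross_entropy(logits, target)
    return total

# Two made-up tasks standing in for the 83 losses mentioned in the article.
outputs = {
    "fashion_category": ([2.0, 0.5, -1.0], 0),
    "auto_make": ([0.1, 0.2, 3.0], 2),
}
weights = {"fashion_category": 1.0, "auto_make": 0.5}
print(combined_loss(outputs, weights))
```

The weights are where the tension the researchers describe shows up: raising one task's weight pulls the shared model toward that task, possibly at the expense of the others.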
GrokNet is based on the ResNeXt-101 architecture. It was pretrained on 3.5 billion images labeled with 17,000 hashtags, then fine-tuned on Facebook’s data sets using Distributed Data Parallel training across 12 hosts with 8 GPUs each, for a total of 96 NVIDIA V100 GPUs, according to the company’s paper, GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce.
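The arithmetic behind that training layout can be sketched in a few lines. The host and GPU counts come from the article. The function names and the strided sharding scheme are illustrative assumptions, not details from the paper.

```python
NUM_HOSTS = 12      # from the article
GPUS_PER_HOST = 8   # from the article

def world_size(num_hosts, gpus_per_host):
    """Total number of worker processes, assuming one process per GPU."""
    return num_hosts * gpus_per_host

def global_rank(host_index, local_gpu_index, gpus_per_host):
    """Unique id of a worker across all hosts (0-based)."""
    return host_index * gpus_per_host + local_gpu_index

def shard_indices(num_samples, rank, world):
    """Strided split (an assumed scheme): worker `rank` takes every `world`-th sample."""
    return list(range(rank, num_samples, world))

world = world_size(NUM_HOSTS, GPUS_PER_HOST)
print(world)                                   # total GPUs
print(global_rank(11, 7, GPUS_PER_HOST))       # rank of the last GPU on the last host
print(shard_indices(200, 0, world))            # samples seen by worker 0
```

In data-parallel training, each worker holds a full copy of the model, processes its own shard of each batch, and gradients are averaged across workers before the weights are updated.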
“Created end-to-end with Facebook-developed tools, including PyTorch, this system is 2x more accurate than previous product recognition systems we’ve used,” the company said. “This has allowed us to greatly improve search and filtering on Marketplace so people can find products with very specific materials, styles, and colors (like a yellow mid-century loveseat). With this new unified model, the system is able to detect exact, similar (via related attributes), and co-occurring products across billions of photos.”
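Finding exact and similar products typically comes down to comparing embedding vectors. The sketch below assumes cosine similarity over small, made-up embeddings. The catalog entries, vector values, and function names are hypothetical, and the source does not say which similarity measure GrokNet uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, catalog, top_k=2):
    """Return the names of the top_k catalog items closest to the query embedding."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions.
catalog = {
    "yellow_loveseat": [0.9, 0.1, 0.0],
    "yellow_armchair": [0.8, 0.3, 0.1],
    "blue_sedan": [0.0, 0.2, 0.9],
}
query = [1.0, 0.1, 0.0]
print(most_similar(query, catalog))  # → ['yellow_loveseat', 'yellow_armchair']
```

A brute-force scan like this would not scale to billions of photos. A production system would more plausibly rely on an approximate nearest-neighbor index.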
In the announcement, the company also highlighted its state-of-the-art clothing segmentation model, which was likewise trained on NVIDIA GPUs.
“While these systems are fragmented right now, incorporating everything into one system is the ambitious challenge,” the Facebook team said.