A typical search engine indexes several billion documents per day.
Building a ranking model that can surface pertinent documents from the indexed document set in response to a user query is one of its core imperatives.
To accomplish this, documents are grouped on user query relevance, domains, subdomains, and so on, and ranking is performed within each group.
The initial ranking is based on the relevance judgement of each document with respect to a query.
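To make the idea of ranking within groups concrete, here is a minimal Python sketch. It assumes each training row carries a group identifier, a document identifier, and a relevance score; the function name `rank_within_groups` and the sample data are hypothetical and are not part of XGBoost.

```python
from collections import defaultdict


def rank_within_groups(rows):
    """Rank documents inside each group by descending score.

    rows: iterable of (group_id, doc_id, score) tuples (hypothetical layout).
    Returns {group_id: [doc_id, ...]} ordered from most to least relevant.
    """
    groups = defaultdict(list)
    for group_id, doc_id, score in rows:
        groups[group_id].append((score, doc_id))

    ranked = {}
    for group_id, members in groups.items():
        # Sort by score descending; break ties by doc_id so the order is deterministic.
        members.sort(key=lambda m: (-m[0], m[1]))
        ranked[group_id] = [doc_id for _, doc_id in members]
    return ranked


# Rows from different groups arrive interleaved, as they might in a real dataset.
rows = [
    ("q1", "a", 0.2), ("q2", "x", 0.9), ("q1", "b", 0.7),
    ("q2", "y", 0.4), ("q1", "c", 0.5),
]
ranking = rank_within_groups(rows)
print(ranking)
```

Documents are only ever compared with other documents in the same group, so each group can be processed independently of the others.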
This post describes an approach taken to accelerate ranking algorithms on the GPU.
XGBoost uses the LambdaMART ranking algorithm (for boosted trees), which takes a pairwise-ranking approach and minimizes pairwise loss by sampling many pairs.
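As an illustration of what a pairwise gradient looks like, the sketch below computes first-order gradients of the plain pairwise logistic loss for a single group. It is a simplification and not XGBoost's implementation: it enumerates every pair instead of sampling, and it omits the ranking-metric weighting that LambdaMART applies to each pair. The function name `pairwise_gradients` is hypothetical.

```python
import math


def pairwise_gradients(scores, labels):
    """Gradient of the pairwise logistic loss w.r.t. each score in one group.

    For every pair (i, j) where labels[i] > labels[j], the loss term is
    log(1 + exp(-(s_i - s_j))). Assumption: all pairs are enumerated and
    weighted equally, unlike a sampled, metric-weighted LambdaMART objective.
    """
    n = len(scores)
    grad = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                # rho = sigmoid(-(s_i - s_j)); large when the pair is mis-ordered.
                rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                grad[i] -= rho  # descending this gradient raises s_i
                grad[j] += rho  # and lowers s_j
    return grad


# One group of three documents, all currently scored equally.
scores = [0.0, 0.0, 0.0]
labels = [2, 1, 0]
grad = pairwise_gradients(scores, labels)
print(grad)
```

Each pair contributes independently to the gradient, which is what makes this computation a natural candidate for running across many GPU threads.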
For this post, we discuss leveraging the large number of cores available on the GPU to massively parallelize these computations.
The number of training instances in these datasets typically runs on the order of several million, scattered across tens of thousands of groups.
Training was already supported on the GPU, so this post is primarily concerned with supporting the gradient computation for ranking on the GPU.