Prostate cancer is expected to be the leading source of new cancer cases in men and the second most frequent cause of cancer death after lung cancer. It is also very hard to detect: small lesions can make up just a fraction of 1% of the tissue surface.
To help solve the problem, researchers from Cornell University and the Memorial Sloan Kettering Cancer Center, a cancer treatment and research institution in New York City, developed a deep learning-based approach that more accurately detects cancer.
Using the center’s biopsy dataset, the team developed a state-of-the-art system that can be considered clinically relevant, the researchers said.
“Until recently, studies relied on datasets in the order of few hundreds of slides which are not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset consisting of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset,” the researchers stated in their research paper. “Given the size of our dataset it is possible for us to train a deep learning model under the Multiple Instance Learning (MIL) assumption where only the overall slide diagnosis is necessary for training, avoiding all the expensive pixel-wise annotations that are usually part of supervised learning approaches.”
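To make the MIL assumption concrete, here is a minimal Python sketch of the textbook formulation: a slide is treated as a bag of tiles, the slide is positive if at least one tile is positive, and the slide-level label is the only supervision. The function names, the max-pooling aggregation, and the toy probabilities are illustrative assumptions, not the researchers' actual code or data.

```python
from typing import List, Sequence, Tuple


def slide_score(tile_probs: Sequence[float]) -> float:
    """Aggregate tile-level tumor probabilities into one slide-level score.

    Assumes max-pooling: a slide is as suspicious as its most suspicious tile.
    """
    if not tile_probs:
        raise ValueError("a slide must contain at least one tile")
    return max(tile_probs)


def select_training_tiles(
    slides: Sequence[Tuple[Sequence[float], int]]
) -> List[Tuple[int, int]]:
    """Pick the top-ranked tile of each slide and pair it with the slide label.

    Each input item is (tile_probs, slide_label). The returned
    (tile_index, slide_label) pairs could serve as training examples,
    so no pixel-wise annotation is needed.
    """
    pairs = []
    for tile_probs, label in slides:
        top = max(range(len(tile_probs)), key=lambda i: tile_probs[i])
        pairs.append((top, label))
    return pairs


# Hypothetical toy data: two slides with made-up tile probabilities.
toy_slides = [
    ([0.10, 0.90, 0.30], 1),  # slide labeled cancerous
    ([0.20, 0.10], 0),        # slide labeled benign
]
training_pairs = select_training_tiles(toy_slides)
```

In this sketch, a tiny lesion that occupies a single tile is enough to drive the slide score, which is why max-style aggregation suits cases where cancer covers a small fraction of the tissue.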
Using seven NVIDIA DGX-1 systems, each containing eight Tesla V100 GPUs, and the cuDNN-accelerated PyTorch deep learning framework, the team trained their neural networks on the new dataset to detect prostate cancer. Their best model achieved an AUC of 0.98 and a false negative rate of 4.8% on a test set consisting of 1,824 slides.
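For readers unfamiliar with the two metrics, the following Python sketch shows how AUC and false negative rate are commonly computed from slide-level scores. The helper names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration only; they are not taken from the paper.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive slide scores
    higher than a randomly chosen negative one (ties count as half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def false_negative_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of truly positive slides that were predicted negative."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("need at least one positive")
    missed = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return missed / positives


# Hypothetical toy data, unrelated to the study's test set.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

toy_auc = auc(toy_labels, toy_scores)
toy_fnr = false_negative_rate(toy_labels, toy_preds)
```

AUC is independent of any threshold, while the false negative rate depends on where the decision threshold is set, so the two numbers describe different aspects of a model's performance.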
“Given the current efforts in digitizing the pathology workflow, approaches like ours can be extremely effective in building decision support systems that can be effectively deployed in the clinic,” the researchers said.
The research was recently published on arXiv.