Groundbreaking Research from SNIoE Selected for Prestigious BMVC 2024 Conference



Interest in machine learning (ML) techniques has surged worldwide over the past few years. Impressed by the ability of ML tools to classify data and quickly find patterns where humans struggle, researchers have been steadily expanding the applicability of these methods. Computer vision, the subfield of ML in which models take images and videos as input, has advanced especially quickly. Besides applied areas like robotics and surveillance, computer vision is poised to become valuable in healthcare by improving both the accuracy and speed of diagnostics based on medical images.

Against this backdrop, three researchers from the Shiv Nadar Institution of Eminence (SNIoE) have developed a groundbreaking framework called UnSegArmaNet that tackles some of the most fundamental problems in computer vision when applied to medical images. In recognition of their efforts, this work has been accepted for presentation at one of the most prestigious international conferences on computer vision: the British Machine Vision Conference (BMVC) 2024, scheduled to be held in November in Glasgow, UK.

Associate Professor Snehasis Mukherjee and Assistant Professors Saurabh Janardan Shigwan and Nitin Kumar from the Department of Computer Science and Engineering, SNIoE, are the minds behind this breakthrough. Although they come from varied academic backgrounds, they found their interests intersecting and culminating in the conception of this innovative framework.

Dr. Shigwan started exploring computer vision techniques while pursuing his master’s degree, and delved into their applications to medical imaging during his Ph.D. studies thanks to his mentor at the time. Dr. Kumar, on the other hand, originally focused on natural language processing before his Ph.D. studies, with the goal of simplifying complex texts. He then joined a project on medical image processing and computing, which drew him further into academic and industry endeavours in this area. Dr. Mukherjee, who had been interested in computer vision since the early days of his career, had previously worked mainly on deep learning techniques rather than on medical images. Together, the three professors decided to collaborate on applying deep learning-based methods to the analysis of medical images, and more specifically to the task of image segmentation.

Expanding on the meaning of this term, Dr. Shigwan says: “In essence, image segmentation consists of highlighting regions of interest from an image. Applications for image segmentation can be found everywhere, from social media to satellite image processing and medical diagnostics. As an example, let’s say we have a CT scan of the brain. A doctor may want to accurately highlight and segment a brain tumour in the CT slices. This could enable them to accurately measure how much it has grown over the course of a patient’s disease.”

The benefits of medical image segmentation assisted by computer vision models are twofold. The first advantage is potentially higher accuracy, as machine learning models have in some studies been shown to outperform professionals in medical image classification and segmentation. The other, equally important advantage is time savings. Image segmentation is a laborious process, which limits the time doctors can spend on meeting with patients, training staff, conducting academic research, and various other responsibilities. Unfortunately, modern ML methods often fall short of providing this second advantage because training the models is itself a massive time sink.

This is one of the key issues that UnSegArmaNet addresses. The underlying architecture of the proposed pipeline, called a Graph Neural Network (GNN), processes an input image as a graph, which is a set of nodes connected by edges. The nodes are ‘features’ that correspond to small patches in the input image, and the edges are a measure of the similarity between different features. 
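
The graph representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_patch_graph`, the use of cosine similarity, the threshold of 0.5 and the random feature vectors are all assumptions made purely for illustration.

```python
import numpy as np


def build_patch_graph(features, threshold=0.5):
    """Build a weighted adjacency matrix from per-patch feature vectors.

    Hypothetical helper: cosine similarity and the threshold value are
    illustrative assumptions, not details taken from UnSegArmaNet.
    """
    # Scale every feature vector to unit length so that dot products
    # between rows are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T

    # Keep an edge only where two patches are sufficiently similar,
    # and remove self-loops.
    adjacency = np.where(similarity > threshold, similarity, 0.0)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency


# 16 hypothetical image patches, each described by an 8-dimensional feature.
rng = np.random.default_rng(0)
patch_features = rng.normal(size=(16, 8))
adjacency = build_patch_graph(patch_features)
```

Each row of `patch_features` plays the role of a node, and each non-zero entry of `adjacency` is an edge whose weight measures how similar two patches are.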

To obtain this graph representation from an image, the researchers employed a pretrained vision transformer (ViT). A ViT is a deep-learning tool used in the computer vision domain to extract a set of features from images, which can then be used in the GNN, much as ChatGPT relies on transformers that operate on text to produce its output. By using a ViT pretrained on generic images, the proposed approach circumvents the need for a large training dataset of manually annotated and segmented medical images. Additionally, because the model is unsupervised, it adjusts its internal parameters directly from a set of input images, without requiring human-labelled examples in the training phase.

Another innovation in UnSegArmaNet is the use of autoregressive moving average (ARMA) filters. While a GNN typically calculates and updates the links between spatially close features (that is, nearby image patches), ARMA does something similar for more distant, or ‘higher order,’ neighbours. “The benefit of using a GNN is that by representing the image as patches, we can easily extract local features, which will be specific to locally confined regions in the image. However, it is important to note that, in image segmentation, some features are local, and some are global. Since global features stem from patches that are far away from each other, we need to leverage global information as well to extract these features. ARMA gives us a way to flexibly combine the local and global features to achieve this goal,” explains Dr. Mukherjee. By combining these cutting-edge methods, the UnSegArmaNet pipeline achieves remarkable performance in medical image segmentation, surpassing even that of supervised frameworks while keeping computational complexity relatively low.
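
The way an ARMA filter reaches distant neighbours can be sketched with the generic recursion used in ARMA graph convolutions, in which the node state is repeatedly mixed with its neighbours and combined with the original features through a skip connection. The code below is a sketch under stated assumptions and not the UnSegArmaNet code: the function names, the toy four-node graph and the fixed weights are hypothetical.

```python
import numpy as np


def normalized_adjacency(adjacency):
    """Symmetrically normalise an adjacency matrix: D^{-1/2} A D^{-1/2}."""
    degree = adjacency.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    mask = degree > 0
    inv_sqrt[mask] = degree[mask] ** -0.5
    return adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]


def arma_stack(adjacency, x, w_init, w, v, num_iterations=3):
    """Apply the recursion state <- ReLU(M @ state @ W + x @ V).

    Each iteration moves information one hop further along the graph,
    while the skip term x @ V keeps the original local features in play.
    """
    m = normalized_adjacency(adjacency)
    state = np.maximum(m @ x @ w_init + x @ v, 0.0)
    for _ in range(num_iterations - 1):
        state = np.maximum(m @ state @ w + x @ v, 0.0)
    return state


# Toy path graph 0 - 1 - 2 - 3 (a hypothetical stand-in for image patches).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

# A single feature that is non-zero only on node 0.
x = np.array([[1.0], [0.0], [0.0], [0.0]])

# Illustrative fixed weights; the skip connection is switched off (V = 0)
# so that only the effect of propagation is visible.
w_init = np.ones((1, 1))
w = np.ones((1, 1))
v = np.zeros((1, 1))

one_hop = arma_stack(path, x, w_init, w, v, num_iterations=1)
three_hops = arma_stack(path, x, w_init, w, v, num_iterations=3)
```

In this toy setting, node 3 receives no signal from node 0 after one iteration but does after three, which is the sense in which stacking iterations lets the filter use ‘higher order’ neighbours. In a trained model the weight matrices would be learned rather than fixed.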

The three researchers are grateful to SNIoE for the environment it provides to bolster such collaborative research, and are eager to present their findings at BMVC 2024. Dr. Kumar notes, “While attending the conference, there will be researchers from top universities and industry in the healthcare and computer vision domains. We will present this paper in a poster session, so people will come and ask questions directly in a sort of intellectual give-and-take process. If new collaborations eventually come out of this, it will highlight and give visibility to both our achievements and the institution.”

Finally, with eyes on the future, the researchers hope more students in India and elsewhere decide to apply for Ph.D.s in computer vision, a field where invested and eager individuals are much needed. They also advise students in the field to dedicate time to honing their mathematical knowledge and skills. “My message to aspiring researchers is to take mathematics seriously, since it is definitely going to help actually create new designs. Don’t just focus solely on coding and programming,” says Dr. Shigwan. Dr. Mukherjee concurs: “If you just go with coding, you will be at most doing what was already done in a different way. Only mathematics can give you the expertise to go for completely new solutions, which lets you tackle bigger problems.”

We wish the team a great and productive time at BMVC 2024, and hope that their future endeavours keep pushing the boundaries of the field of computer vision!