Outside of work, I enjoy playing badminton and swimming. I'm also
passionate about making a positive impact on society, which led me to initiate Ralith Milith, an
anti-drug society in Kashmir.
"Wonder is the feeling of a philosopher, and philosophy begins in wonder." -
Socrates.
News
Dec 2024 - Nominated as Diversity Intern by Microsoft Canada! [Video].
Oct 2024 - Received the RISE MICCAI travel grant to attend MICCAI 2024.
Aug 2020 - Organized a career counseling session on GRE preparation at NIT Srinagar.
Research
My research focuses on the integration of Computer Vision with Medical AI and
Assistive Robotics. Specifically, I am interested in developing advanced computer
vision systems for medical imaging analysis, including applications such as
automated diagnosis and treatment planning. Additionally, I explore the
intersection of AI and robotics to enhance assistive technologies, aiming to
improve quality of life through innovations in healthcare. My work
often involves deep learning, image processing techniques, and the application
of AI-driven solutions in real-world scenarios.
We focus on the problem of Unsupervised Domain Adaptation (UDA) for breast cancer detection from
mammograms (BCDM). Recent advancements show that masked image modeling serves as a robust pretext
task for UDA, but these techniques struggle with breast abnormalities such as masses, asymmetries,
and micro-calcifications. Recognizing these challenges, we introduce a transformer-based
Domain-invariant Mask Annealed Student Teacher autoencoder (D-MASTER) framework, which adaptively
masks and reconstructs multiscale feature maps to enhance the model's ability to capture reliable
target-domain features. Experimental results show significant improvement over state-of-the-art
techniques on multiple datasets, and we will publicly release source code and pre-trained models
to promote reproducible research.
Federated learning enables collaborative machine learning while preserving data privacy, and
tailored federated learning addresses client heterogeneity to learn personalized models. This
paper investigates the detrimental effects of federated averaging (FedAvg) on focal
modulation-based transformers in heterogeneous data scenarios and proposes TransFed, a novel
transformer-based federated learning framework with a learn-to-tailor approach: a server-side
hypernetwork generates client-specific projection matrices for the focal modulation layers.
TransFed demonstrates superior performance on non-IID pneumonia classification datasets and
advances focal modulation in decentralized environments.
In
clinical applications, X-Ray
technology plays a crucial role in noninvasive
examinations like mammography, providing essential
anatomical information about patients. However, the
inherent radiation risk associated with X-Ray procedures
raises significant concerns. X-Ray reconstruction is
crucial in medical imaging for creating detailed visual
representations of internal structures and for facilitating
diagnosis and treatment without invasive procedures.
Recent advancements in deep learning (DL) have shown
promise in X-Ray reconstruction. Nevertheless,
conventional DL methods often necessitate the
centralized aggregation of large datasets for training,
collected under specific scanning protocols.
This requirement results in notable domain shifts and
privacy issues. To address these challenges, we
introduce the Hierarchical Framework based Federated
Learning method (HF-Fed) for customized X-Ray Imaging.
HF-Fed addresses the challenges in X-Ray imaging
optimization by decomposing the problem into two
components: local data adaptation and holistic X-Ray
Imaging. It employs a hospital-specific hierarchical
framework and a shared common imaging network called
Network of Networks (NoN) for these tasks. The emphasis
of the NoN is on acquiring stable features from a
variety of data distributions. A hierarchical
hypernetwork extracts domain-specific hyperparameters,
conditioning the NoN for customized X-Ray
reconstruction. Experimental results demonstrate
HF-Fed’s competitive performance, offering a promising
solution for enhancing X-Ray imaging without the need
for data sharing. This study significantly contributes
to the evolving body of literature on the potential
advantages of federated learning in the healthcare
sector. It offers valuable insights for policymakers and
healthcare providers alike.
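To illustrate the conditioning idea at the heart of HF-Fed, below is a minimal, hypothetical PyTorch sketch of a shared reconstruction network whose feature-wise scale and shift are produced per hospital by a small hypernetwork. Class names such as SimpleNoN and HospitalHyperNet, the embedding sizes, and the affine conditioning are illustrative assumptions, not the paper's actual implementation.

# Illustrative sketch only (not the HF-Fed code): a shared imaging backbone whose
# normalization scale/shift are produced per hospital by a small hypernetwork.
import torch
import torch.nn as nn

class SimpleNoN(nn.Module):
    # Toy stand-in for the shared "Network of Networks" reconstruction backbone.
    def __init__(self, channels=32):
        super().__init__()
        self.conv1 = nn.Conv2d(1, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x, scale, shift):
        # Hospital-specific conditioning applied as a feature-wise affine transform.
        h = torch.relu(self.conv1(x))
        h = h * scale.view(1, -1, 1, 1) + shift.view(1, -1, 1, 1)
        return self.conv2(h)

class HospitalHyperNet(nn.Module):
    # Maps a learned hospital embedding to the conditioning parameters of the NoN.
    def __init__(self, num_hospitals, channels=32, emb_dim=16):
        super().__init__()
        self.embed = nn.Embedding(num_hospitals, emb_dim)
        self.to_params = nn.Linear(emb_dim, 2 * channels)

    def forward(self, hospital_id):
        scale, shift = self.to_params(self.embed(hospital_id)).chunk(2, dim=-1)
        return scale.squeeze(0), shift.squeeze(0)

# Usage: condition the shared network for hospital 0 on a dummy X-ray tensor.
non, hyper = SimpleNoN(), HospitalHyperNet(num_hospitals=5)
scale, shift = hyper(torch.tensor([0]))
reconstruction = non(torch.randn(1, 1, 64, 64), scale, shift)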
Cataract surgery is the most common
surgical procedure globally, with a disproportionately
higher burden in developing countries. While automated
surgical video analysis has been explored in general
surgery, its application to ophthalmic procedures
remains limited. Existing works primarily focus on
phacoemulsification (Phaco) cataract surgery, an expensive technique that is not accessible
in regions where cataract treatment is most needed. In
contrast, Manual Small-Incision Cataract Surgery (MSICS)
is the preferred low-cost, faster alternative in
high-volume settings and for challenging cases. However,
no dataset exists for MSICS. To address this gap, we
introduce Sankara-MSICS, the first comprehensive dataset
containing 53 surgical videos annotated for 18 surgical
phases and 3,527 frames with 13 surgical tools at the
pixel level. We benchmark this dataset on
state-of-the-art models and present ToolSeg, a novel
framework that enhances tool segmentation by introducing
a phase-conditional decoder and a simple yet effective
semi-supervised setup leveraging pseudo-labels from
foundation models. Our approach significantly improves
segmentation performance, achieving a 23.77% to 38.10%
increase in mean Dice scores, with a notable boost for
tools that are less prevalent and small. Furthermore, we
demonstrate that ToolSeg generalizes to other surgical
settings, showcasing its effectiveness on the CaDIS
dataset.
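As a rough illustration of what a phase-conditional decoder can look like, the following PyTorch sketch modulates decoder features with a surgical-phase embedding using FiLM-style scale and shift. The layer sizes, the FiLM mechanism, and names like PhaseConditionalDecoder are assumptions for illustration and do not reproduce ToolSeg's actual architecture.

# Illustrative sketch only: a segmentation decoder whose features are modulated by
# a surgical-phase embedding (FiLM-style). Names and sizes are hypothetical.
import torch
import torch.nn as nn

class PhaseConditionalDecoder(nn.Module):
    def __init__(self, in_ch=256, num_phases=18, num_tools=13, emb_dim=64):
        super().__init__()
        self.phase_emb = nn.Embedding(num_phases, emb_dim)
        self.film = nn.Linear(emb_dim, 2 * in_ch)      # per-channel scale and shift
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 64, 2, stride=2), nn.ReLU(),
            nn.Conv2d(64, num_tools, 1),                 # per-pixel tool logits
        )

    def forward(self, feats, phase_id):
        gamma, beta = self.film(self.phase_emb(phase_id)).chunk(2, dim=-1)
        gamma = gamma.view(-1, feats.size(1), 1, 1)
        beta = beta.view(-1, feats.size(1), 1, 1)
        return self.up(feats * gamma + beta)

# Usage with a dummy encoder feature map and phase label.
decoder = PhaseConditionalDecoder()
logits = decoder(torch.randn(1, 256, 32, 32), torch.tensor([4]))   # -> (1, 13, 64, 64)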
One
of the biggest issues facing humanity in the
twenty-first century is climate change, as shown by the
increasing sea levels, melting glaciers, and frequent
storms. Accurate temperature forecasting is crucial for
understanding and mitigating its impacts. Cutting-edge
data-driven models for temperature forecasting typically employ recurrent neural networks (RNNs),
with certain models integrating attention mechanisms. However, the sequential processing of RNNs
limits parallelization, especially for longer sequences. To address this, we present a new method
for temperature prediction based on the FocalNet Transformer architecture. Operating in a
multi-tensor format, the proposed Focal-modulation Attention Encoder (FATE) framework integrates
tensorized modulation to capture the spatial and temporal nuances of meteorological data.
Comparative assessments against
existing transformer encoder architectures, 3D CNNs,
LSTM, and ConvLSTM demonstrate our model’s superior
ability to capture nuanced patterns inherent in the
data, particularly in the context of temperature
prediction. We also introduce a new labeled dataset,
Climate Change Parameter Dataset (CCPD), which
encompasses 40 years of data from the Jammu and Kashmir (J&K) region on seven
key parameters that influence climate change, supporting
further research in this area. Experiments on
real-world benchmark temperature datasets from weather
stations in the USA, Canada, and Europe demonstrate
accuracy improvements of 12%, 23%, and 28%, respectively,
compared to existing SOTA models. In addition, we
achieved state-of-the-art results on our CCPD dataset
with a 24% improvement. To interpret FATE, we introduce
two modulation scores derived from the tensorial modulation
process; these scores clarify the model's decision-making
and highlight the key climate change parameters. For
reproducible research, we will release the source code,
pre-trained FATE model, and CCPD dataset.
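For intuition on the focal modulation mechanism that FATE builds on, here is a simplified, single-tensor 1D variant for a weather time series written in PyTorch. The kernel sizes, number of focal levels, and the FocalModulation1D class are illustrative assumptions and omit FATE's tensorized, multi-tensor formulation.

# Simplified 1D focal modulation over a weather time series (illustrative only).
import torch
import torch.nn as nn

class FocalModulation1D(nn.Module):
    def __init__(self, dim, focal_levels=3, kernel=3):
        super().__init__()
        self.f = nn.Linear(dim, 2 * dim + (focal_levels + 1))  # query, context, gates
        self.h = nn.Conv1d(dim, dim, 1)                          # modulator projection
        self.proj = nn.Linear(dim, dim)
        self.levels = focal_levels
        # Depthwise convs with growing receptive fields aggregate local-to-global context.
        self.focal_layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(dim, dim, kernel + 2 * k, padding=(kernel + 2 * k) // 2, groups=dim),
                nn.GELU(),
            )
            for k in range(focal_levels)
        ])

    def forward(self, x):                        # x: (batch, time, dim)
        q, ctx, gates = torch.split(self.f(x), [x.size(-1), x.size(-1), self.levels + 1], dim=-1)
        ctx = ctx.transpose(1, 2)                # (batch, dim, time) for the convolutions
        ctx_all = 0
        for k, layer in enumerate(self.focal_layers):
            ctx = layer(ctx)
            ctx_all = ctx_all + ctx * gates[..., k].unsqueeze(1)
        # Global context term gated by the last gate.
        ctx_all = ctx_all + ctx.mean(-1, keepdim=True) * gates[..., self.levels].unsqueeze(1)
        modulator = self.h(ctx_all).transpose(1, 2)
        return self.proj(q * modulator)

# Usage: 24 hourly readings of 7 weather parameters embedded into 32-d tokens.
block = FocalModulation1D(dim=32)
out = block(torch.randn(8, 24, 32))              # -> (8, 24, 32)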
We focus on the problem of Unsupervised Domain
Adaptation (UDA) for breast cancer detection from
mammograms (BCDM). Recent advancements have shown that
masked image modeling serves as a robust pretext task
for UDA. However, when applied to cross-domain BCDM,
these techniques struggle with breast abnormalities such
as masses, asymmetries, and micro-calcifications.
Recognizing these challenges, we introduce a
transformer-based Domain-invariant Mask Annealed Student
Teacher autoencoder (D-MASTER) framework. D-MASTER
adaptively masks and reconstructs multiscale feature
maps, enhancing the model’s ability to capture reliable
target domain features. Experimental results show a
significant improvement over state-of-the-art techniques
on multiple datasets. To promote reproducible research,
we will publicly release source code and pre-trained
models.
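The following PyTorch snippet sketches the general idea of annealed feature masking with a reconstruction loss. The linear annealing schedule, the mask ratios, and the helper names are assumptions for illustration only and are not taken from the D-MASTER implementation.

# Illustrative sketch of annealed feature masking and reconstruction (not D-MASTER's code).
import torch
import torch.nn.functional as F

def annealed_mask_ratio(step, total_steps, start=0.75, end=0.25):
    # Linearly anneal the fraction of masked feature positions during training
    # (the schedule and endpoints here are assumed, not the paper's values).
    t = min(step / max(total_steps, 1), 1.0)
    return start + t * (end - start)

def mask_and_reconstruct_loss(feat, decoder, step, total_steps):
    # Mask a feature map (B, C, H, W), reconstruct it, and return the MSE on masked positions.
    b, c, h, w = feat.shape
    ratio = annealed_mask_ratio(step, total_steps)
    keep = torch.rand(b, 1, h, w, device=feat.device) > ratio   # True = visible
    recon = decoder(feat * keep)                                  # decoder fills in masked regions
    masked = (~keep).expand_as(feat)
    return F.mse_loss(recon[masked], feat[masked])

# Usage with a dummy 1x1-conv "decoder" on a random multiscale feature map.
decoder = torch.nn.Conv2d(256, 256, 1)
loss = mask_and_reconstruct_loss(torch.randn(2, 256, 32, 32), decoder, step=100, total_steps=1000)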
Federated learning has emerged as a
promising paradigm for collaborative machine learning, enabling multiple clients to
jointly train a model while preserving data privacy. Tailored federated learning takes
this concept further by accommodating client heterogeneity and facilitating the learning
of personalized models. While the utilization of transformers within federated learning
has attracted significant interest, there remains a need to investigate the effects of
federated learning algorithms on the latest focal modulation-based transformers. In this
paper, we investigate this relationship and uncover the detrimental effects of the federated
averaging (FedAvg) algorithm on focal modulation, particularly in scenarios with
heterogeneous data. To address this challenge, we propose TransFed, a novel
transformer-based federated learning framework that not only aggregates model parameters
but also learns tailored Focal Modulation for each client. Instead of employing a
conventional customization mechanism that maintains client-specific focal modulation
layers locally, we introduce a learn-to-tailor approach that fosters client
collaboration, enhancing scalability and adaptation in TransFed. Our method incorporates
a hypernetwork on the server, responsible for learning personalized projection matrices
for the focal modulation layers. This enables the generation of client-specific keys,
values, and queries. Furthermore, we provide an analysis of adaptation bounds for
TransFed using the learn-to-customize mechanism. Through intensive experiments on
datasets related to pneumonia classification, we demonstrate that TransFed, in
combination with the learn-to-tailor approach, achieves superior performance in
scenarios with non-IID data distributions, surpassing existing methods. Overall,
TransFed paves the way for leveraging focal modulation in federated learning, advancing
the capabilities of focal modulated transformer models in decentralized
environments.
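A minimal sketch of the learn-to-tailor idea follows: a server-side hypernetwork maps a learned client embedding to a client-specific projection matrix for a focal modulation layer. The shapes, the ClientHyperNetwork name, and the single-matrix output are illustrative assumptions rather than TransFed's actual implementation.

# Sketch of a server-side hypernetwork producing client-specific projection matrices
# (shapes and names are hypothetical).
import torch
import torch.nn as nn

class ClientHyperNetwork(nn.Module):
    def __init__(self, num_clients, dim=64, emb_dim=32, hidden=128):
        super().__init__()
        self.client_emb = nn.Embedding(num_clients, emb_dim)
        self.mlp = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim * dim))
        self.dim = dim

    def forward(self, client_id):
        # One personalized (dim x dim) projection matrix per client.
        return self.mlp(self.client_emb(client_id)).view(-1, self.dim, self.dim)

# Server side: generate the projection for client 3 and ship it alongside the global weights.
hyper = ClientHyperNetwork(num_clients=10)
proj_client3 = hyper(torch.tensor([3]))   # -> (1, 64, 64)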
Human pose estimation is the process of continuously monitoring a person's actions and movements
by capturing the key points that describe their pose. Using accurate, remote human posture
recognition, a guided practice framework can be built that enables people to learn and exercise
activities like yoga, fitness, and dancing without the help of a personal trainer. This work
proposes a framework to detect and recognize various yoga and exercise poses and to help
individuals practice them correctly. The popular BlazePose model extracts key points at the
student's end, and the extracted key points are fed to the Human Pose Juxtaposition model (HPJT)
to compare the student's pose with the instructor's. The model assesses the correctness of the
pose by comparing the extracted key points and gives feedback to the student if any corrections
need to be made. The proposed model is trained on 40+ yoga and exercise poses; we evaluated its
performance with mAP, and the model achieved an accuracy of 99.04%. The results show that anyone
can use the proposed framework in real time to practice exercise, yoga, dance, etc., at their own
location with precision and accuracy, without the help of a physical instructor, leading to a
healthy life.
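As a simple illustration of keypoint-based pose comparison (not the HPJT model itself), the NumPy sketch below normalizes 2D keypoints, such as the 33 landmarks BlazePose produces, and scores how closely a student's pose matches the instructor's via cosine similarity. The normalization and scoring choices are assumptions for illustration.

# Illustrative pose-comparison sketch: normalize keypoints and score pose similarity.
import numpy as np

def normalize_keypoints(kps):
    # Center keypoints (N, 2) at their mean and scale to unit size to remove position/scale.
    kps = kps - kps.mean(axis=0)
    return kps / (np.linalg.norm(kps) + 1e-8)

def pose_similarity(student_kps, instructor_kps):
    # Cosine similarity between flattened, normalized keypoint sets (1.0 = identical pose).
    s = normalize_keypoints(student_kps).ravel()
    t = normalize_keypoints(instructor_kps).ravel()
    return float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t) + 1e-8))

# Usage with 33 random (x, y) keypoints, the number of landmarks BlazePose produces.
student = np.random.rand(33, 2)
instructor = student + 0.01 * np.random.randn(33, 2)   # nearly the same pose
print(pose_similarity(student, instructor))             # close to 1.0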
In this paper, we present a Climate Change Parameter Dataset (CCPD) intended to support
state-of-the-art results on parameters that affect climate change, including forest cover, water
bodies, agriculture and vegetation, population, temperature, construction, and air index. The
dataset can be used by the research community to validate claims made in relation to climate
change. The research community has been deeply involved in extending the use of machine learning
algorithms to the effects of climate change; however, the lack of sufficient data on climate
change parameters has limited research in this domain. By presenting this dataset, we aim to
facilitate researchers. The dataset provides a large variety of statistical and satellite data
acquired through various image processing techniques and on-ground data collection. The data is
collected in abundance for a specific region, and various machine learning techniques are then
used to extract the useful data for each parameter separately. We call this amalgam of processed
data the CCPD dataset. The CCPD dataset contains over 6,000 data points across all seven
parameters and covers data from 1960 onwards. We hope this dataset will aid the research
community in tackling climate change with the help of AI.
The area of computer vision has gone through exponential growth and advancement over the past
decade, mainly due to the introduction of effective deep-learning methodologies and the
availability of massive data. This has resulted in the incorporation of intelligent computer
vision schemes to automate a wide range of tasks. In this paper, we work along similar lines and
propose an integrated system for robotic arms that handles fruit identification, classification,
counting, and mask generation through semantic segmentation. The current practice of doing these
processes manually is time-consuming and not feasible for large fields. Multiple works have
therefore been proposed to automate harvesting tasks and minimize the overall overhead; however,
there is a lack of an integrated system that can automate all these processes together. As a
result, we propose one such approach based on different machine learning techniques, using the
most effective learning technique with computer vision capability for each process. The result is
an integrated, intelligent, end-to-end computer vision-based system to detect, classify, count,
and identify apples. In this system, we modified the YOLOv3 algorithm to detect and count apples
effectively, and the proposed scheme works even under variable lighting conditions. The system
was trained and tested on a standard benchmark, MinneApple. Experimental results show an average
accuracy of 91%.
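As a toy illustration of the counting step that follows detection, the sketch below filters YOLO-style detections by confidence and tallies per-class counts. The detection tuple format and class names such as ripe_apple are hypothetical and not the project's actual interface.

# Illustrative sketch: count fruit from detector output after confidence filtering.
from collections import Counter

def count_fruit(detections, conf_thresh=0.5):
    # detections: list of (class_name, confidence, bbox) tuples; returns per-class counts.
    counts = Counter()
    for cls, conf, _bbox in detections:
        if conf >= conf_thresh:
            counts[cls] += 1
    return counts

# Usage with dummy YOLO-style detections.
dets = [("ripe_apple", 0.91, (10, 20, 60, 70)),
        ("ripe_apple", 0.48, (80, 15, 120, 60)),   # below threshold, ignored
        ("raw_apple", 0.77, (150, 40, 190, 90))]
print(count_fruit(dets))   # Counter({'ripe_apple': 1, 'raw_apple': 1})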
Higher Secondary
Class X with 92.6%
Class XII with 91.8%
Languages I can connect in
English
Kashmiri
Urdu
Hindi
Arabic
Persian
Community
In my academic journey, I transitioned into a research-focused lifestyle, driven by a deep curiosity
to explore
Computer Vision and Medical Imaging. My path has been dynamic, with a domain shift from
IoT to robotics, reflecting a continuous pursuit of learning and innovation.
I am an advocate for open-source contributions, as they foster collective growth and help both
educators and learners critically engage with the wealth of information available online. The
guidance and support
of my academic mentors and peers have been pivotal in shaping my early career, and their influence
continues to
inspire me.
Building on this foundation, I am actively mentoring undergraduate and master’s students in
computer vision, and I look forward to supporting more learners in the future. I particularly
encourage students with diverse backgrounds or unconventional academic paths, similar to my own, to
connect and
explore opportunities for growth and research.
If you are interested, send an introductory email that includes:
- A brief introduction about yourself.
- Your academic background and areas of interest.
- Your CV (optional but preferred).