Outside of work, I enjoy playing badminton and swimming. I'm also
passionate about making a positive impact on society, which led me to initiate Ralith Milith, an
anti-drug society in Kashmir.
"Wonder is the feeling of a philosopher, and philosophy begins in wonder." -
Socrates.
News
Dec 2024 - Nominated as Diversity Intern by Microsoft Canada! [Video].
Oct 2024 - Received the RISE
MICCAI travel grant to attend MICCAI 2024.
Aug 2020 - Organized a career counseling session on GRE preparation at NIT Srinagar.
Research
My research focuses on the integration of Computer Vision with Medical AI and
Assistive Robotics. Specifically, I am interested in developing advanced computer
vision systems for medical imaging analysis, including applications such as
automated diagnosis and treatment planning. Additionally, I explore the
intersection of AI and robotics to enhance assistive technologies, aiming to
improve quality of life through innovations in healthcare. My work
often involves deep learning, image processing techniques, and the application
of AI-driven solutions in real-world scenarios.
We focus on the problem of Unsupervised Domain Adaptation (UDA) for breast
cancer detection from mammograms (BCDM). Recent advancements have shown that
masked image modeling serves as a robust pretext task for UDA. However, when
applied to cross-domain BCDM, these techniques struggle with breast
abnormalities such as masses, asymmetries, and micro-calcifications, in part
because the regions of interest are typically much smaller than in natural
images. This often results in more false positives per image (FPI) and
significant noise in the pseudo-labels typically used to bootstrap such
techniques. Recognizing these challenges, we introduce a transformer-based
Domain-invariant Mask Annealed Student-Teacher autoencoder (D-MASTER)
framework. D-MASTER adaptively masks and reconstructs multi-scale feature
maps, enhancing the model's ability to capture reliable target-domain
features. D-MASTER also includes adaptive confidence refinement to filter
pseudo-labels, ensuring that only high-quality detections are considered. We
also provide a bounding-box-annotated subset of 1,000 mammograms from the RSNA
Breast Screening Dataset (referred to as RSNA-BSD1K) to support further
research in BCDM. We evaluate D-MASTER on multiple BCDM datasets acquired from
diverse domains. Experimental results show significant improvements of 9% and
13% in sensitivity at 0.3 FPI over state-of-the-art UDA techniques on the
publicly available INBreast and DDSM benchmarks, respectively. We also report
improvements of 11% and 17% on our in-house and RSNA-BSD1K datasets,
respectively. To promote reproducible research and address the scarcity of
accessible resources in BCDM, we will publicly release the source code and
pre-trained D-MASTER model, along with the RSNA-BSD1K annotations.
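The confidence-refinement idea described above can be sketched as a score-threshold filter over teacher detections, with the threshold annealed over training. The function name, threshold values, and annealing schedule below are illustrative assumptions, not D-MASTER's actual implementation:

```python
import numpy as np

def refine_pseudo_labels(boxes, scores, base_thresh=0.8, step=0,
                         anneal=0.01, min_thresh=0.5):
    """Keep only high-confidence teacher detections as pseudo-labels.

    The threshold starts strict and is relaxed over training steps --
    an illustrative stand-in for adaptive confidence refinement.
    """
    thresh = max(min_thresh, base_thresh - step * anneal)
    keep = scores >= thresh
    return boxes[keep], scores[keep]

# Toy teacher output: three candidate boxes with confidences.
boxes = np.array([[10, 10, 50, 50], [20, 20, 60, 60], [5, 5, 15, 15]])
scores = np.array([0.95, 0.55, 0.30])

kept_boxes, kept_scores = refine_pseudo_labels(boxes, scores, step=0)
print(len(kept_boxes))  # 1: only the 0.95 detection survives the strict early threshold
```

Filtering noisy pseudo-labels in this way is what keeps the student from being bootstrapped on the detector's own false positives.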
Federated learning has emerged as a promising paradigm for collaborative
machine learning, enabling multiple clients to jointly train a model while
preserving data privacy. Tailored federated learning takes this concept
further by accommodating client heterogeneity and facilitating the learning of
personalized models. While the use of transformers within federated learning
has attracted significant interest, the effects of federated learning
algorithms on the latest focal-modulation-based transformers remain to be
investigated. In this paper, we investigate this relationship and uncover the
detrimental effects of the federated averaging (FedAvg) algorithm on focal
modulation, particularly in scenarios with heterogeneous data. To address this
challenge, we propose TransFed, a novel transformer-based federated learning
framework that not only aggregates model parameters but also learns tailored
focal modulation for each client. Instead of employing a conventional
customization mechanism that maintains client-specific focal modulation layers
locally, we introduce a learn-to-tailor approach that fosters client
collaboration, enhancing scalability and adaptation in TransFed. Our method
incorporates a hypernetwork on the server that learns personalized projection
matrices for the focal modulation layers, enabling the generation of
client-specific keys, values, and queries. Furthermore, we provide an analysis
of adaptation bounds for TransFed under the learn-to-tailor mechanism. Through
extensive experiments on pneumonia classification datasets, we demonstrate
that TransFed, in combination with the learn-to-tailor approach, achieves
superior performance in scenarios with non-IID data distributions, surpassing
existing methods. Overall, TransFed paves the way for leveraging focal
modulation in federated learning, advancing the capabilities of
focal-modulated transformer models in decentralized environments.
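The server-side hypernetwork idea can be sketched as a small network that maps a learnable per-client embedding to that client's projection weights. The dimensions, names, and linear-only architecture below are illustrative assumptions, not TransFed's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

class ClientHyperNet:
    """Toy hypernetwork: maps a learnable client embedding to a
    personalized projection matrix (illustrative sketch only)."""
    def __init__(self, num_clients, embed_dim=16, d_model=32):
        self.client_embed = rng.normal(size=(num_clients, embed_dim))
        self.W_head = rng.normal(size=(embed_dim, d_model * d_model)) * 0.01
        self.d_model = d_model

    def __call__(self, client_id):
        z = self.client_embed[client_id]   # client-specific embedding
        flat = z @ self.W_head             # hypernetwork forward pass
        return flat.reshape(self.d_model, self.d_model)

hyper = ClientHyperNet(num_clients=4)
W0, W1 = hyper(0), hyper(1)
print(W0.shape)             # (32, 32)
print(np.allclose(W0, W1))  # False: each client gets its own projection
```

Because the personalization lives in the shared hypernetwork rather than in client-local layers, every client's update improves the mapping used by all clients, which is the collaboration benefit the learn-to-tailor approach targets.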
In clinical applications, X-ray technology plays a crucial role in
non-invasive examinations such as mammography, providing essential anatomical
information about patients. However, the inherent radiation risk associated
with X-ray procedures raises significant concerns. X-ray reconstruction is
crucial in medical imaging for creating detailed visual representations of
internal structures, facilitating diagnosis and treatment without invasive
procedures. Recent advancements in deep learning (DL) have shown promise in
X-ray reconstruction. Nevertheless, conventional DL methods often necessitate
the centralized aggregation of large datasets for training, following specific
scanning protocols. This requirement results in notable domain shifts and
privacy issues. To address these challenges, we introduce the Hierarchical
Framework-based Federated Learning method (HF-Fed) for customized X-ray
imaging. HF-Fed addresses the challenges in X-ray imaging optimization by
decomposing the problem into two components: local data adaptation and
holistic X-ray imaging. It employs a hospital-specific hierarchical framework
and a shared common imaging network, called the Network of Networks (NoN), for
these tasks. The emphasis of the NoN is on acquiring stable features from a
variety of data distributions. A hierarchical hypernetwork extracts
domain-specific hyperparameters, conditioning the NoN for customized X-ray
reconstruction. Experimental results demonstrate HF-Fed's competitive
performance, offering a promising solution for enhancing X-ray imaging without
the need for data sharing. This study contributes to the evolving body of
literature on the potential advantages of federated learning in the healthcare
sector and offers valuable insights for policymakers and healthcare providers.
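Federated schemes like the one above train across hospitals without pooling raw scans; the core server-side step in such setups is a weighted average of client parameters. A generic FedAvg-style sketch (a standard technique, not HF-Fed's hierarchical procedure):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average client model parameters, weighted by local dataset size.

    Generic FedAvg-style aggregation -- a simplified stand-in for the
    server-side step in federated imaging frameworks.
    """
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hospitals with different amounts of local data.
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]

avg = federated_average(weights, sizes)
print(avg)  # [3.5 4.5]
```

Only these aggregated parameters cross hospital boundaries, which is why the patient data itself never needs to be shared.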
Cataract surgery is the most common
surgical procedure globally, with a disproportionately
higher burden in developing countries. While automated
surgical video analysis has been explored in general
surgery, its application to ophthalmic procedures
remains limited. Existing works primarily focus on Phaco
cataract surgery, an expensive technique not accessible
in regions where cataract treatment is most needed. In
contrast, Manual Small-Incision Cataract Surgery (MSICS)
is the preferred low-cost, faster alternative in
high-volume settings and for challenging cases. However,
no dataset exists for MSICS. To address this gap, we
introduce Sankara-MSICS, the first comprehensive dataset
containing 53 surgical videos annotated for 18 surgical
phases and 3,527 frames with 13 surgical tools at the
pixel level. We benchmark state-of-the-art models on
this dataset and present ToolSeg, a novel
framework that enhances tool segmentation by introducing
a phase-conditional decoder and a simple yet effective
semi-supervised setup leveraging pseudo-labels from
foundation models. Our approach significantly improves
segmentation performance, achieving a 23.77% to 38.10%
increase in mean Dice scores, with a notable boost for
tools that are less prevalent and small. Furthermore, we
demonstrate that ToolSeg generalizes to other surgical
settings, showcasing its effectiveness on the CaDIS
dataset.
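The mean Dice score used above to quantify segmentation gains follows the standard definition, shown here for context rather than as the paper's exact evaluation code:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy 4x4 tool masks: prediction overlaps ground truth on 2 of 3 pixels.
pred = np.zeros((4, 4)); pred[0, :2] = 1      # predicts 2 pixels
target = np.zeros((4, 4)); target[0, :3] = 1  # ground truth has 3 pixels
print(round(dice_score(pred, target), 2))  # 0.8 -> 2*2 / (2+3)
```

Because Dice normalizes by the combined mask sizes, it remains informative for the small, rarely seen tools where the reported gains were largest.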
One of the biggest issues facing humanity in the twenty-first century is
climate change, as shown by rising sea levels, melting glaciers, and frequent
storms. Accurate temperature forecasting is crucial for understanding and
mitigating its impacts. Cutting-edge data-driven models for temperature
forecasting typically employ recurrent neural networks (RNNs), with certain
models integrating attention mechanisms. However, the sequential processing of
RNNs limits parallelization, especially for longer sequences. To address this,
we present a new method for temperature prediction based on the FocalNet
Transformer architecture. By operating in a multi-tensor format, the proposed
Focal-modulation Attention Encoder (FATE) framework leverages the spatial and
temporal nuances of meteorological data by integrating tensorized modulation.
Comparative assessments against existing transformer encoder architectures,
3D CNNs, LSTMs, and ConvLSTMs demonstrate our model's superior ability to
capture nuanced patterns inherent in the data, particularly in the context of
temperature prediction. We also introduce a new labeled dataset, the Climate
Change Parameter Dataset (CCPD), which encompasses 40 years of data from the
J&K region on seven key parameters that influence climate change, supporting
further research in this area. Experiments on real-world benchmark temperature
datasets from weather stations in the USA, Canada, and Europe demonstrate
accuracy improvements of 12%, 23%, and 28%, respectively, compared to existing
SOTA models. In addition, we achieve state-of-the-art results on our CCPD
dataset with a 24% improvement. To interpret FATE, we introduce two modulation
scores derived from the tensorial modulation process; these scores clarify the
model's decision making and highlight key climate change parameters. For
reproducible research, we will release the source code, pre-trained FATE
model, and CCPD dataset.
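At its core, focal modulation replaces query-key attention with a gated aggregation of context: each position's features are modulated element-wise by a summarized context vector. A heavily simplified 1-D sketch with illustrative shapes and pooling (real FocalNets use learned projections and multi-level depth-wise convolutions, and this is not the FATE architecture):

```python
import numpy as np

def simple_focal_modulation(x):
    """Simplified focal modulation over a sequence of feature vectors.

    Instead of pairwise query-key attention, each position is modulated
    (element-wise gated) by a pooled context summary -- which is why the
    cost scales linearly in sequence length rather than quadratically.
    """
    context = x.mean(axis=0, keepdims=True)  # global context summary
    gate = 1.0 / (1.0 + np.exp(-x))          # sigmoid gate per position
    return x + gate * context                # modulated features

seq = np.random.default_rng(0).normal(size=(5, 8))  # 5 time steps, 8 features
out = simple_focal_modulation(seq)
print(out.shape)  # (5, 8)
```

The per-element gate values in a learned version of this operation are what the modulation scores mentioned above inspect to attribute predictions to input parameters.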
Human pose estimation is the process of continuously tracking a person's
actions and movements, usually by capturing the key points that describe the
person's pose. A guided practice framework that enables people to learn and
perform activities such as yoga, fitness, and dancing remotely and accurately,
without the help of a personal trainer, can be built on top of human pose
recognition. This work proposes a framework to detect and recognize various
yoga and exercise poses to help individuals practice them correctly. The
popular BlazePose model extracts key points on the student's side, and the
extracted key points are fed to the Human Pose Juxtaposition model (HPJT) to
compare the student's pose with the instructor's. The model assesses the
correctness of the pose by comparing the extracted key points and gives
feedback to the student if any corrections need to be made. The proposed model
was trained on 40+ yoga and exercise poses and evaluated with mAP, achieving
an accuracy of 99.04%. The results show that anyone can use the proposed
framework in real time to practice exercise, yoga, dance, etc., at their own
location, without a physical instructor, with precision and accuracy, leading
to a healthier life.
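Comparing a student's pose with an instructor's can be sketched as measuring joint-angle differences between corresponding key points. The joint triplets and tolerance below are illustrative assumptions, not the HPJT model's actual logic:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by points a-b-c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_feedback(student, instructor, triplets, tol_deg=15.0):
    """Flag joints whose student angle deviates from the instructor's."""
    issues = []
    for (a, b, c) in triplets:
        diff = abs(joint_angle(*student[[a, b, c]]) -
                   joint_angle(*instructor[[a, b, c]]))
        if diff > tol_deg:
            issues.append((b, round(diff, 1)))
    return issues

# Toy 2-D key points: shoulder (0), elbow (1), wrist (2).
instructor = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # straight arm
student = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])     # arm bent 90 deg
print(pose_feedback(student, instructor, [(0, 1, 2)]))  # [(1, 90.0)]
```

Working with joint angles rather than raw coordinates makes the comparison invariant to where the student stands relative to the camera.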
In this paper, we present the Climate Change Parameter Dataset (CCPD),
covering parameters that affect climate change, including forest cover, water
bodies, agriculture and vegetation, population, temperature, construction, and
air index. The dataset can be used by the research community to validate
claims made in relation to climate change. The research community has been
deeply involved in extending machine learning algorithms to the effects of
climate change; however, the lack of sufficient data on climate change
parameters has limited research in this domain. By presenting this dataset, we
aim to facilitate researchers. The dataset provides a large variety of
statistical and satellite data acquired through various image processing
techniques and on-ground data collection. Data is collected in abundance for a
specific region, and various machine learning techniques are then used to
extract the useful data for each parameter separately. We call this amalgam of
processed data the CCPD dataset. CCPD contains over 6,000 data points across
all seven parameters and covers data from 1960 onwards. We hope this dataset
will aid the research community in tackling climate change with the help of
AI.
The area of computer vision has gone through exponential growth and
advancement over the past decade, mainly due to the introduction of effective
deep-learning methodologies and the availability of massive data. This has
led to intelligent computer vision schemes being incorporated to automate a
number of different tasks. In this paper, we work along similar lines: we
propose an integrated system for robotic-arm development that addresses fruit
identification, classification, counting, and mask generation through semantic
segmentation. The current practice of performing these processes manually is
time-consuming and not feasible for large fields. Consequently, multiple works
have been proposed to automate harvesting tasks and minimize the overall
overhead; however, there is a lack of an integrated system that can automate
all of these processes together. We therefore propose one such approach based
on different machine learning techniques, using the most effective learning
technique with computer vision capability for each process. The result is an
integrated, intelligent, end-to-end computer-vision-based system to detect,
classify, count, and identify apples. In this system, we modify the YOLOv3
algorithm to detect and count apples effectively, and the proposed scheme
works even under variable lighting conditions. The system was trained and
tested using a standard benchmark, MinneApple. Experimental results show an
average accuracy of 91%.
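Counting distinct apples from detector output typically requires suppressing overlapping boxes first. A minimal IoU-based non-maximum suppression sketch (a standard post-processing technique, not the paper's modified YOLOv3 code):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def count_apples(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best box, drop overlaps, count survivors."""
    order = np.argsort(scores)[::-1]  # highest confidence first
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return len(kept)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]])
scores = np.array([0.9, 0.8, 0.95])
print(count_apples(boxes, scores))  # 2: the two overlapping boxes merge
```

Without this suppression step, the same apple detected twice would inflate the count, which matters when the output drives a harvesting robot.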
Higher Secondary
Class X with 92.6%
Class XII with 91.8%
Languages I can communicate in
English
Kashmiri
Urdu
Hindi
Arabic
Persian
Community
In my academic journey, I transitioned into a research-focused lifestyle, driven by a deep curiosity to explore
Computer Vision and Medical Imaging. My path has been dynamic, with a domain shift from
IoT to robotics, reflecting a continuous pursuit of learning and innovation.
I am an advocate for open-source contributions, as they foster collective growth and help both
educators and learners critically engage with the wealth of information available online. The guidance and support
of my academic mentors and peers have been pivotal in shaping my early career, and their influence continues to
inspire me.
Building on this foundation, I am actively mentoring undergraduate and master’s students in
computer vision, and I look forward to supporting more learners in the future. I particularly
encourage students with diverse backgrounds or unconventional academic paths, similar to my own, to connect and
explore opportunities for growth and research.
If you are interested, send an introductory email that includes:
- A brief introduction about yourself.
- Your academic background and areas of interest.
- Your CV (optional but preferred).