Muzammal Naseer

I received my Ph.D. from the Australian National University, Australia, in 2020; my thesis was titled “Novel Concepts and Designs for Adversarial Attacks and Defenses.” I am an assistant professor in the Computer Science Department of the College of Computing and Mathematical Sciences at Khalifa University.

I am interested in building Robust Intelligent Systems. My research focuses on robust visual-spatial and temporal perception; understanding and explaining AI behavior through adversarial machine learning; representation learning through self-learning (self-supervision, self-distillation, self-critique, self-reflection); and defining the role of large language models (LLMs) in building robust AI systems across applications in security and the life sciences.

⚡ Top-Venue Papers

  • One AAAI 2025 paper accepted.
  • One TPAMI 2025 paper accepted.
  • Four CVPR 2024 papers accepted.
  • One ICLR 2024 paper accepted.
  • One AAAI 2024 paper accepted (Oral, Top 9.0%).
  • One NeurIPS 2023 paper accepted.
  • Three ICCV 2023 papers accepted.
  • Three CVPR 2023 papers accepted.
  • One ICLR 2023 paper accepted.
  • One TPAMI 2022 paper accepted.
  • One CVPR 2022 paper accepted (Oral, Top 5.0%).
  • One ICLR 2022 paper accepted (Spotlight, Top 5.0%).
  • One NeurIPS 2021 paper accepted (Spotlight, Top 3.0%).
  • Two ICCV 2021 papers accepted.
  • One CVPR 2020 paper accepted (Oral, Top 5.7%).
  • One NeurIPS 2019 paper accepted.

🔥 Notable

  • One ACCV 2024 paper accepted (Oral, Top 5.6%, Best Student Paper Runner-up).
  • Four MICCAI 2024 papers accepted.
  • One MICCAI 2023 paper accepted (Early Accept, Top 14.0%).
  • One BMVC 2022 paper accepted (Oral, Top 9.5%).
  • One ACCV 2022 paper accepted (Oral, Top 14.6%).
GeoChat: Grounded Large Vision-Language Model for Remote Sensing

**Multi-modal Large Language Model**, **CVPR 2024**. A VLM for remote sensing dialogue and analysis.

Jan 12, 2024

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

**Text-to-Image Model**, **ICLR 2024**. Leveraging LLMs to generate complex scenes zero-shot.

Jan 12, 2024

S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

**Vision-Language Model**, **AAAI 2024**, **Oral**, **Top 9.5%**. Self-structural alignment of foundational models for zero-shot classification.

Dec 12, 2023

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

**Vision-Language Model**, **NeurIPS 2023**. Test-time distribution alignment of foundational models for zero-shot generalization.

Nov 12, 2023

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

**Vision-Language Model**, **ICCV 2023**. Self-regularization for foundational vision-language models during fine-tuning.

Jul 13, 2023

FLIP: Cross-domain Face Anti-spoofing with Language Guidance

**Vision-Language Model**, **ICCV 2023**. Face anti-spoofing by adapting foundational vision-language models like CLIP.

Jul 13, 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

**Visual-Spatial and Temporal Perception**, **ICCV 2023**. An efficient architecture based on spatio-temporal focal modulation for video action recognition.

Jul 13, 2023

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

**MICCAI 2023**. Frequency domain adversarial training for robust medical segmentation.

May 25, 2023

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

**Vision-Language Model**, **CVPR 2023**. Adapting vision-language foundational models like CLIP for video recognition.

Feb 27, 2023

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

**Self-Learning**, **CVPR 2023**. Generalized novel category discovery through prompting.

Feb 27, 2023