Publications

Geochat: Grounded large vision-language model for remote sensing

**Multi-modal Large Language Model**, **CVPR 2024** VLM for remote sensing dialogue and analysis.

Jan 12, 2024

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

**Text-to-Image Model**, **ICLR 2024** Leveraging LLMs to generate complex scenes zero-shot.

Jan 12, 2024

S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

**Vision-Language Model**, **AAAI 2024**, **Oral**, **Top 9.5%** Self-structural alignment of foundational models for zero-shot classification.

Dec 12, 2023

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

**Vision-Language Model**, **NeurIPS 2023** Test-time alignment of foundational models for zero-shot generalization.

Nov 12, 2023

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

**Vision-Language Model**, **ICCV 2023** Self-regularization for foundational vision-language models during fine-tuning.

Jul 13, 2023

FLIP: Cross-domain Face Anti-spoofing with Language Guidance

**Vision-Language Model**, **ICCV 2023** Face anti-spoofing by adapting foundational vision-language models like CLIP.

Jul 13, 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

**Visual-Spatial and Temporal Perception**, **ICCV 2023** An efficient network that uses spatio-temporal focal modulation for video action recognition.

Jul 13, 2023

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

**MICCAI 2023** Frequency domain adversarial training for robust medical segmentation.

May 25, 2023

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

**Vision-Language Model**, **CVPR 2023** Adapting foundational vision-language models like CLIP for video recognition.

Feb 27, 2023

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

**Self-Learning**, **CVPR 2023** Novel class discovery through prompting.

Feb 27, 2023