
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan. November 2023. Abstract: The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text prompts for unseen domains. While effective, this overlooks the key cause for performance…
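For context, here is a minimal sketch of the entropy-minimization test-time prompt tuning that the abstract refers to as prior work (not the distribution-alignment method this paper proposes), assuming a CLIP-like model. The names `text_encoder`, `prompt_ctx`, and `image_features`, as well as the step count and learning rate, are illustrative placeholders rather than the paper's API:

```python
import torch
import torch.nn.functional as F

def entropy(logits):
    # Shannon entropy of the softmax distribution, averaged over the batch.
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()

def test_time_tune(image_features, text_encoder, prompt_ctx, steps=1, lr=1e-3):
    # image_features: (B, dim), assumed L2-normalized CLIP image embeddings.
    # prompt_ctx: learnable context vectors prepended to class-name tokens.
    prompt_ctx = prompt_ctx.detach().clone().requires_grad_(True)
    optimizer = torch.optim.AdamW([prompt_ctx], lr=lr)
    for _ in range(steps):
        text_features = text_encoder(prompt_ctx)           # (num_classes, dim)
        text_features = F.normalize(text_features, dim=-1)
        logits = 100.0 * image_features @ text_features.t()
        loss = entropy(logits)  # minimize prediction entropy at test time
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return prompt_ctx.detach()
```

The tuned context vectors are then reused to classify the test sample; only the prompt is updated, while the model weights stay frozen.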


S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Shahbaz Khan. December 2023. Abstract: Large-scale pre-trained Vision-Language Models (VLMs) have proven effective for zero-shot classification. Despite the success, most traditional VLM-based methods are restricted by the assumption of partial source supervision or ideal target vocabularies, which rarely satisfy the open-world scenario. In this paper, we aim at a more challenging setting,…


LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka. January 2024. Abstract: Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling at generating images from short, single-object descriptions, these models often struggle to faithfully capture all the nuanced details within longer and more…


GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan. January 2024. Abstract: Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly in Remote Sensing (RS) scenarios, leading to inaccurate or fabricated information when presented with RS domain-specific queries…