1. ShareGPT4V: Improving Large Multi-modal Models with Better Captions - NSTL (National Science and Technology Library)

Lin Chen | Jinsong Li... - "Computer Vision - ECCV 2024, Part XVII" - European Conference on Computer Vision - 2025 - pp. 370-387 - 18 pages

Abstract: detailed captions enable more nuanced vision-language | brief captions or VQA data; 2) Cutting-edge LMMs can | captions, exhibiting remarkable improvements across a | range from thousands to millions of captions. Drawing | ShareGPT4V, consisting of 100K high-quality captions
Keywords: Large multi-modal models | Modality alignment | High-quality captions

2. Parrot Captions Teach CLIP to Spot Text

Yiqi Lin | Conghui He... - "Computer Vision - ECCV 2024, Part XLII" - European Conference on Computer Vision - 2025 - pp. 368-385 - 18 pages

Abstract: dataset LAION-2B, the captions also densely parrot | text content and around 30% of captions words are | whether these parrot captions shape the text spotting | criteria. We show that training with parrot captions | Despite CLIP being the foundation model in
Keywords: Image-Text dataset | Text spotting bias

3. DreamLIP: Language-Image Pre-training with Long Captions

Kecheng Zheng | Yifei Zhang... - "Computer Vision - ECCV 2024, Part XVIII" - European Conference on Computer Vision - 2025 - pp. 73-90 - 18 pages

Abstract: requires lengthy captions (e.g., with 10 sentences | benefit from long captions. To figure this out, we first | resulting captions under a contrastive learning framework | dynamically sample sub-captions from the text label to | Language-image pre-training largely relies on
Keywords: Language-image pre-training | Long caption | Multi-modal learning

4. Misinformation in Reels, Influence of Contextual Superimposed Texts in Short Videos

Andrew Bartlett | Waheeb Yaqub... - "Web Information Systems Engineering - WISE 2024, Part II" - International Conference on Web Information Systems Engineering - 2025 - pp. 3-14 - 12 pages

Abstract: captions that create doubt in viewers' minds. With the | Australia to examine the impact of misleading captions on | Short videos and reels have emerged as a new method for conveying visual information on social media. These videos often include misleading superimposed
Keywords: Fake short videos | Fake news | Social media | Context captions | Perceptions | Misinformation

5. MCANet: Multimodal Caption Aware Training-Free Video Anomaly Detection via Large Language Model

Prabhu Prasad Dev | Raju Hazari... - "Pattern Recognition, Part XXXII" - International Conference on Pattern Recognition - 2025 - pp. 362-379 - 18 pages

Abstract: captions produced by the audio captioning model. The | utilizes image-text similarities to clean noisy captions | generated by the image captioning model, while the second | Towards Video Anomaly Detection (VAD), existing methods require labor-intensive data collection
Keywords: Video anomaly detection | Large language model | Vision language model | Audio language model | Multimodal captions

6. Interactive Video Search with Multi-modal LLM Video Captioning

Yu-Tong Cheng | Jiaxin Wu... - "MultiMedia Modeling, Part V" - International Conference on MultiMedia Modeling - 2025 - pp. 302-309 - 8 pages

Abstract: use LLM to generate video captions for a large video | -grained captions for test video collections to enable | detailed captions in our interactive video retrieval | captions are effective in improving the search accuracy | multi-model LLM on video captioning. Specifically, we
Keywords: Interactive video retrieval | Multi-modal LLM | Video captioning

7. Controllable Contextualized Image Captioning: Directing the Visual Narrative Through User-Defined Highlights

Shunqi Mao | Chaoyi Zhang... - "Computer Vision - ECCV 2024, Part L" - European Conference on Computer Vision - 2025 - pp. 464-481 - 18 pages

Abstract: generate image captions given specific contextual | the model to tailor captions that resonate with the | captions. P-Ctrl conditions the model generation on | highlight by prepending captions with highlight-driven | assess the quality of the controlled captions alongside
Keywords: Contextualized image captioning | Large multimodal model | Controllable text generation

8. It's Just Another Day: Unique Video Captioning by Discriminative Prompting

Toby Perrett | Tengda Han... - "Computer Vision - ACCV 2024, Part III" - Asian Conference on Computer Vision - 2025 - pp. 275-293 - 19 pages

Abstract: given identical captions, which makes it difficult to | generate unique captions. We introduce two benchmarks for | demonstrate that captions generated by CDP improve text-to | captioning: Given multiple clips with the same caption, we | identifies it. We propose Captioning by Discriminative
Keywords: Uniqueness | Video captioning | Egocentric | Movies

9. Deneb: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning

Kazuki Matsuda | Yuiga Wada... - "Computer Vision - ACCV 2024, Part III" - Asian Conference on Computer Vision - 2025 - pp. 166-182 - 17 pages

Abstract: ability to compare candidate captions with multifaceted | reference captions. To address this shortcoming, we | captions. To train Deneb, we construct the diverse and | captioning, with a particular focus on robustness against | In this work, we address the challenge of
Keywords: Vision and language | Hallucination | Image captioning | Metrics

10. ControlCap: Controllable Region-Level Captioning

Yuzhong Zhao | Yue Liu... - "Computer Vision - ECCV 2024, Part XXXVIII" - European Conference on Computer Vision - 2025 - pp. 21-38 - 18 pages

Abstract: frequent captions but miss the less frequent ones. In | captions within a few sub-spaces containing the control | frequent captions, alleviating the caption degeneration | Region-level captioning is challenged by the | captioning (Control-Cap) approach, which introduces control
Keywords: Controllable captioning | Caption degeneration | Region-level captioning

Search query: captions