Abstract: Person text-image matching, also known as text-based person search, aims to retrieve images of specific pedestrians using text descriptions. Although person text-image matching has made ...
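For illustration only, the retrieval task described above can be framed as ranking a gallery of pedestrian images by their similarity to a text query. The sketch below is a minimal version under stated assumptions. It assumes the text description and each gallery image have already been mapped to embedding vectors by some encoder that is not shown. It uses cosine similarity as the matching score. The function names `cosine` and `rank_gallery` and the toy vectors are hypothetical. This is not the method of the abstract's authors.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_gallery(query_emb: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery image ids sorted from most to least similar to the text query.

    Hypothetical helper: assumes text and image embeddings share one vector space.
    """
    scores = {img_id: cosine(query_emb, emb) for img_id, emb in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy usage with made-up 2-D embeddings (illustrative values only).
gallery = {"img_a": [1.0, 0.0], "img_b": [0.6, 0.8], "img_c": [0.0, 1.0]}
text_query = [0.0, 1.0]
print(rank_gallery(text_query, gallery))  # ['img_c', 'img_b', 'img_a']
```

In a real system, the encoders are typically trained so that matching text-image pairs score higher than non-matching ones. The ranking step itself stays this simple.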
Abstract: Large language model (LLM)-based image captioning can describe objects not explicitly observed in the training data; yet novel objects occur frequently, necessitating the ...