Dr. Thomas Mensink
Research Scientist @ Google Research Amsterdam
Guest Researcher @ CV-Group, University of Amsterdam

Email: lastname -at- google.com
Publications | CV | Google Scholar | arXiv | GitHub | Twitter | LinkedIn


  • [apr 2024] Proceedings (PMLR) available for How (not) to ensemble LVLMs.
  • [apr 2024] Preprint: HAMMR: HierArchical MultiModal React agents for generic VQA is available on arxiv.
  • [spring 2024] I’ll serve as AC for ECCV 2024.
  • [nov 2023] How (not) to ensemble LVLMs was accepted for the I Can’t Believe It’s Not Better Workshop (co-located with NeurIPS 2023).
  • [oct 2023] Preprint: How (not) to ensemble LVLMs for VQA (arXiv) with Lisa (intern), Lluis, Mostafa, Fantine and Jasper. Abstract: In the recent work on Encyclopedic-VQA the authors examine a wide variety of models to solve their task: from vanilla LVLMs, to models including the caption as extra context, to models augmented with Lens-based retrieval of Wikipedia pages. Intuitively these models are highly complementary, which should make them ideal for ensembling. Indeed, an oracle experiment shows potential gains from 48.8% accuracy (the best single model) all the way up to 67% (best possible ensemble). So it is a trivial exercise to create an ensemble with substantial real gains. Or is it?
  • [oct 2023] I will attend ICCV 2023.
  • [sept 2023] I will attend NCCV 2023.
  • [sept 2023] Infinite Class Mixup accepted for BMVC 2023!
  • [aug 2023] Encyclopedic-VQA accepted for ICCV 2023!
  • [june 2023] Preprint. Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories (arXiv) with Jasper, Lluis, Arushi, Felipe, Howard, Fei, André, and Vitto. Abstract: Encyclopedic-VQA is a dataset with 1M VQA triplets featuring visual questions about detailed properties of fine-grained categories and instances. This dataset poses a hard challenge for large vision+language models (PaLI achieves only 13.0% accuracy), while retrieval-augmented methods perform much better.
  • [may 2023] Preprint. Infinite Class Mixup – a project with Pascal Mettes (from UvA) is available on arXiv.
  • [april 2023] ICML 2023: the ViT-22B paper (arXiv) was accepted for an oral presentation!
  • ECCV 2022: Outstanding Reviewer recognition! I’m recognised as one of the outstanding reviewers for the ECCV 2022 conference.
  • ECCV 2022 two papers accepted! Both The Missing Link: Finding label relations across datasets (arxiv) and How stable are transferability metrics? (arxiv) have been accepted as posters to ECCV 2022. With Andrea, Michal, Jasper and Vitto!
  • CVPR 2022 two papers accepted on transferability for semantic segmentation and for classification: Transferability Metrics for Selecting Source Model Ensembles (arxiv, Oral) & Transferability Estimation using Bhattacharyya Class Separability (arxiv). Congrats Michal & Andrea!
  • Accepted paper. TPAMI 2021, Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types, with Jasper, Alina, Michael, and Vitto!
  • Accepted paper. CVIU 2021, Automatic Generation of Dense Non-rigid Optical Flow. Congrats An!
  • Accepted paper. ICML 2021, Neural Feature Matching in Implicit 3D Representations, with Yunlu, Basura, Hakan, and Stratis.
  • Accepted paper. ICLR 2021, Calibration of Neural Networks using Splines, with Kartik, Amir, Ajanthan, Cristian and Richard.
  • Accepted paper. WACV 2021, Multi-Loss Weighting with Coefficient of Variations, with Rick, Sezer and Theo.
  • Accepted paper. WACV 2021, EDEN: Synthetic Dataset of Enclosed Garden Scenes, with Hoang-An, Partha, Sezer and Theo.
  • Accepted paper. CoRL 2020, Range Conditioned Dilated Convolutions, with Alex, Pei, Drago and Cristian. My first full Google paper.
  • Award. With Florent and Jorge, we received the Koenderink Award 2020 for fundamental contributions in computer vision that have withstood the test of time for our paper: Improving the Fisher Kernel for Large-Scale Image Classification, from ECCV 2010.