My name is Lihu Chen (陈立虎), and I am currently a postdoc at Inria Saclay. Before that, I obtained my PhD in the DIG team of Télécom Paris, which is a member of the Institut Polytechnique de Paris. I was co-supervised by Fabian Suchanek and Gaël Varoquaux. My research topics include:

  • Efficient Language Models. Check out our small models for biomedical entity disambiguation (AAAI 21), out-of-vocabulary words (ACL 22), and short text representations (EACL 24).
  • Information Extraction. I am interested in Entity Linking (AAAI 21, EACL 23) and Knowledge Base Completion (ACL@Matching 23).
  • Hallucinations in LLMs. Reconfidencing answers generated by LLMs (preprint).
  • Interpretability and Analysis of LLMs. Studying the properties of positional encodings (EMNLP 23).

GitHub 🐦 Twitter 🎓 Scholar 🤗 Hugging Face


Publications

  1. A Lightweight Neural Model for Biomedical Entity Linking
    Lihu Chen, Gaël Varoquaux, and Fabian M. Suchanek
    In AAAI 2021
    | 💻 [code] | 📑 [slide] |
  2. Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
    Lihu Chen, Gaël Varoquaux, and Fabian M. Suchanek
    In ACL 2022 (Oral)
    | 💻 [code] | 📑 [slide] | 📊 [poster] |
  3. GLADIS: A General and Large Acronym Disambiguation Benchmark
    Lihu Chen, Gaël Varoquaux, and Fabian M. Suchanek
    In EACL 2023 (Oral)
    | 💻 [code] | 🕹️ [demo] | 📑 [slide] |
  4. Who’s Speaking? Predicting Speaker Profession from Speech
    Yaru Wu, Lihu Chen, Benjamin Elie, Fabian Suchanek, Ioana Vasilescu, and Lori Lamel
    In International Congress of Phonetic Sciences (ICPhS) 2023
  5. Knowledge Base Completion for Long-Tail Entities
    Lihu Chen, Simon Razniewski, and Gerhard Weikum
    In ACL 2023 (MATCHING workshop)
    | 💻 [code] | 💻 [code-mpi] |
  6. Towards Efficient, General and Robust Entity Disambiguation Systems
    Lihu Chen
    PhD thesis
  7. The Locality and Symmetry of Positional Encodings
    Lihu Chen, Gaël Varoquaux, and Fabian M. Suchanek
    In Findings of EMNLP 2023
    | 💻 [code] | 📊 [poster] |
  8. Learning High-Quality and General-Purpose Phrase Representations
    Lihu Chen, Gaël Varoquaux, and Fabian M. Suchanek
    In Findings of EACL 2024
    | 💻 [code] | 💾 [data] | 🤗 PEARL-small | 🤗 PEARL-base | 🤗 PEARL Benchmark |
  9. Reconfidencing LLMs from the Grouping Loss Perspective
    Lihu Chen, Alexandre Perez-Lebel, Fabian M. Suchanek, and Gaël Varoquaux
    Preprint

Education and Experience


Teaching

I have worked as a teaching assistant, mainly for the following courses:

2020

  • Knowledge Base Construction, Télécom Paris
  • Natural Language and Speech Processing, École Polytechnique
  • Databases, Télécom Paris

Service

Program Committee & Reviewer

  • 2022: PVLDB Reproducibility (External Reviewer); EMNLP
  • 2023: ACL; EMNLP; Machine Learning (External Reviewer)
  • 2024: ACL ARR; ESWC

Talks

  • Sep 2020, “A Lightweight Neural Model for Biomedical Entity Linking”, at SoDa Team and DIG Team, Paris, France
  • Sep 2021, “A Simple Yet Effective Positional Module”, at JDSE, CentraleSupélec, Saclay, France
  • May 2022, “Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost”, at ACL 22, Dublin, Ireland
  • Feb 2023, “Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost”, at Université Paris-Saclay, Saclay, France
  • May 2023, “GLADIS: A General and Large Acronym Disambiguation Benchmark”, at EACL 23, Dubrovnik, Croatia
  • Dec 2023, “The Locality and Symmetry of Positional Encodings”, at Peking University, Beijing, China

Misc

  • My GitHub
  • Human Languages: Mandarin Chinese (native); English (fluent); French (basic)
  • Travel Photos

Contact

Email: [firstname].[lastname][AT]inria.fr