About

I am currently a Machine Learning Engineer at Apple. Before that, I was a PhD student and postdoc at LMU Munich under the supervision of Prof. Dr. Hinrich Schütze. My current research interests are multilingual NLP, representation learning, low-resource processing, interpretability of embeddings, and position encodings.


Publications

Wine is not v i n. On the Compatibility of Tokenizations across Languages.
Antonis Maronikolakis*, Philipp Dufter*, Hinrich Schütze.
emnlp-findings 2021. [Paper] * equal contribution

Graph Algorithms for Multiparallel Word Alignment.
Ayyoob ImaniGooghari*, Masoud Jalili Sabet*, Lütfi Kerem Şenel, Philipp Dufter, François Yvon, Hinrich Schütze.
emnlp 2021. [Paper] * equal contribution

BERT Cannot Align Characters.
Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze.
insights workshop 2021 (co-located with emnlp 2021). [Paper]

ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus.
Ayyoob ImaniGooghari, Masoud Jalili Sabet, Philipp Dufter, Michael Cysouw, Hinrich Schütze.
acl 2021 - demos. [Paper] [Code]

Static Embeddings as Efficient Knowledge Bases?
Philipp Dufter*, Nora Kassner*, Hinrich Schütze.
naacl 2021. [Paper] [Code] * equal contribution

Position Information in Transformers: An Overview.
Philipp Dufter*, Martin Schmitt*, Hinrich Schütze.
arxiv 2021. [Paper] * equal contribution

Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models.
Nora Kassner*, Philipp Dufter*, Hinrich Schütze.
eacl 2021. Best short paper award. [Paper] [Data] [Code] * equal contribution

Semantic Text Segment Classification of Structured Technical Content.
Julian Höllig, Philipp Dufter, Michaela Geierhos, Wolfgang Ziegler, Hinrich Schütze.
nlbd 2021. [Paper]

Locating Language-Specific Information in Contextualized Embeddings.
Sheng Liang, Philipp Dufter, Hinrich Schütze.
arxiv 2021. [Paper]

Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention.
Philipp Dufter, Martin Schmitt, Hinrich Schütze.
coling 2020. [Paper] [Code]

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations.
Sheng Liang, Philipp Dufter, Hinrich Schütze.
coling 2020. [Paper]

Modeling Graph Structure via Relative Position for Better Text Generation from Knowledge Graphs.
Martin Schmitt, Leonardo F. R. Ribeiro, Philipp Dufter, Iryna Gurevych, Hinrich Schütze.
textgraphs-15 workshop (co-located with naacl 2021). [Paper]

Identifying Necessary Elements for BERT’s Multilinguality.
Philipp Dufter, Hinrich Schütze.
emnlp 2020. [Paper] [Code]

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings.
Masoud Jalili Sabet*, Philipp Dufter*, François Yvon, Hinrich Schütze.
emnlp-findings 2020. [Paper] [Code] [Demo] * equal contribution

Subword Sampling for Low Resource Word Alignment.
Ehsaneddin Asgari*, Masoud Jalili Sabet*, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze.
arxiv 2020. [Paper] * equal contribution

Quantifying the Contextualization of Word Representations with Semantic Class Probing.
Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze.
emnlp-findings 2020. [Paper]

Analytical Methods for Interpretable Ultradense Word Embeddings.
Philipp Dufter, Hinrich Schütze.
emnlp 2019. [Paper] [Code] [Supplementary] [Presentation]

Multilingual Embeddings Jointly Induced from Contexts and Concepts: Simple, Strong and Scalable.
Philipp Dufter, Mengjie Zhao, Hinrich Schütze.
arxiv 2018. [Paper]

Embedding Learning through Multilingual Concept Induction.
Philipp Dufter, Mengjie Zhao, Martin Schmitt, Alexander Fraser, Hinrich Schütze.
acl 2018. [Paper] [Poster] [Resources]


Theses

Branch-and-Cut Algorithms for the Distributionally Robust Capacitated Vehicle Routing Problem. MSc thesis, supervised by Prof. Wolfram Wiesemann. [Paper partly based on the thesis]

Positively Excited Random Walks on Integers. BSc thesis, supervised by Prof. Noam Berger Steiger.


Teaching & Supervisions

A Comparative Study of Positional Information in Self-Attention Artificial Neural Networks. Master's thesis. 2020.

Semantic Text Classification Using Deep Learning. Master's thesis. 2020.

Predicting Commonsense Knowledge Using Pretrained Language Models. Bachelor's thesis. 2020.

Neural Methods in Document Similarity Detection and Information Retrieval. Master's thesis (co-supervised). 2019.

Cross-Lingual Named Entity Recognition. Master's thesis (co-supervised). 2019.

Web Crawling of a Bavarian Low-Resource Corpus. Bachelor's thesis. 2019.

Teaching assistant for "Basics of Computational Linguistics". Master's program in Computational Linguistics. 2018/2019.

Deep Learning for Text Classification. Bachelor's thesis. 2018.

Machine Learning for Automated Detection of Fake News. Bachelor's thesis. 2018.

Deep Learning for Extraction of Opinion Entities. Bachelor's thesis. 2017.

Teaching assistant for "Probability Theory". Bachelor's program in Mathematics. 2014.

Teaching assistant for "Mathematics II". Bachelor's program in Civil Engineering. 2013/2014.


Community

Reviewer for ACL21, NAACL21, EACL21, EMNLP20, COLING20, ACL19, NAACL18, EMNLP18.


Talks

2021 Berlin Machine Learning Meetup

2021 Conference on Hate Speech Detection, Hildesheim

2020 EMNLP main conference / SIGTYP workshop

2019 EMNLP main conference

2019 Applied.ai

2019 Munich Datageeks Meetup