At last, a computer that understands you like your mother.
--1985, McDonnell-Douglas ad (Lee, 2004)


  • Job opening: 2-year postdoctoral researcher, see here
  • I have been awarded an Amazon Research Award to work on Transfer Learning in Natural Language Processing (Multi-task Sequence Learning under Adverse Conditions) (IT University of Copenhagen press release)

  • Current/Upcoming

    Recent events

    • December 2018: I gave an invited talk on Multi-task Learning in Natural Language Processing at Turku AI Meetup, Finland, December 19, 2018
    • December 2018: I gave an invited talk on Transfer Learning in Turku, Finland, December 19, 2018
    • June 2018: I gave an invited talk at the LCT day 2018 in Nancy, France, June 27, 2018
    • 2017-2018: NAACL 2018 area chair
    • June 2018: Invited talk at Uppsala University, Lectures on Language Technology and Machine Learning, Uppsala, June 8, 2018
    • June 2018: WiNLP 2018 career panel discussion, New Orleans, June 2, 2018
    • June 2018: NAACL 2018 ethics in NLP panel discussion, New Orleans, June 3, 2018
    • June 2018: keynote speaker at the NAACL workshop on Stylistic Variation, New Orleans, June 5, 2018
    • June 2018: keynote speaker at the NAACL workshop on Subword & Character Level Models in NLP (SCLeM), New Orleans, June 6, 2018
    • March 2018: visiting scholar at University of Malta
    • March 2018: talk at the Natural Language Processing MeetUp at the University of Zurich, March 20, 2018
    • March 2018: talk at Women in Data Science (WiDS) 2018, Zurich, March 21, 2018
    • November 2017: Visited Edinburgh NLP to give a talk at the ILLC colloquium series (November 24, 2017)
    • September 2017: I won the IJCNLP 2017 shared task on multilingual customer feedback analysis (ranked 1st / 12 teams)!
    • September 6-11: I'll be at EMNLP 2017, Copenhagen
    • July 28-29: Excited to be an invited speaker at the Google NLU (Natural Language Understanding) workshop, New York
    • July 2017: keynote speaker at PyData, Berlin, July 2017
    • 2016-2017: ACL 2017 area co-chair (for tagging, chunking and parsing)
    • 2016-2017: EACL 2017 student research workshop senior faculty advisor


    What we need are Natural Language Processing (NLP) models that are more robust: that work better on unexpected input (like new domains or new languages) and can be trained from semi-automatically or weakly annotated data from a variety of sources. My research focuses on bringing NLP one step closer to this goal, by combining fortuitous data with proper machine learning algorithms to enable robust language technology.
    I am interested in learning under sample selection bias (domain adaptation, transfer learning), annotation bias (embracing annotator disagreements in learning) and generally, semi-supervised and weakly-supervised machine learning applied to cross-domain and cross-language natural language processing.

    Fortuitous data

    Ultimately, NLP should be able to handle any language and any domain. However, there is still a long way to go! Our models need training data, but annotated data is biased and scarce. One way to address this training data sparsity is to leverage data that has so far been neglected or that rests in non-obvious places. Examples of exploiting such fortuitous data [1] include using hyperlinks to build more robust part-of-speech taggers or named-entity recognizers, learning from annotator disagreement, and using behavioral data such as gaze or keystrokes [2] to inform NLP. Read more:

    1. Barbara Plank. What to do about non-standard (or non-canonical) language in NLP. In KONVENS 2016. [arXiv]
    2. Barbara Plank. Keystroke dynamics as signal for shallow syntactic parsing. The 26th International Conference on Computational Linguistics (COLING). Osaka, Japan. [arXiv]
    3. Barbara Plank, Anders Johannsen and Željko Agić. Improving language technology with fortuitous data, ESSLLI 2016 summer school.

    Research group

    • Sigrid Klerke (postdoc)
    • Philip Ströbel (external PhD)
    • Current (external) Master's theses: Reinard van Dalen (Groningen), Jovana Urosevic (Malta, co-supervised with Lonneke van der Plas)

    Supervised PhD students

    • Hessel Haagsma (co-supervision with Johan Bos, 2017)
    • Johannes Bjerva (co-supervision with Johan Bos, 2017): defended (December 2017); now Postdoc at University of Copenhagen


    Selected publications (more)

    • Barbara Plank and Željko Agić. Distant supervision from disparate sources for low-resource part-of-speech tagging. In Proceedings of EMNLP 2018. [arXiv]
    • Sebastian Ruder and Barbara Plank. Strong Baselines for Neural Semi-supervised Learning under Domain Shift. In ACL 2018, Melbourne, Australia. [arXiv]
    • Sebastian Ruder and Barbara Plank. Learning to select data for transfer learning with Bayesian Optimization. In EMNLP 2017, Copenhagen, Denmark. [arXiv]
    • Héctor Martínez Alonso and Barbara Plank. When is multitask learning effective? Semantic sequence prediction under varying data conditions. In EACL (long). [pdf] [arXiv]
    • Barbara Plank. Keystroke dynamics as signal for shallow syntactic parsing. The 26th International Conference on Computational Linguistics (COLING). Osaka, Japan. [arXiv] (finalist for the best paper award)
    • Johannes Bjerva, Barbara Plank and Johan Bos. Semantic Tagging with Deep Residual Networks. The 26th International Conference on Computational Linguistics (COLING). Osaka, Japan. [arXiv]
    • Chloe Braud, Barbara Plank and Anders Søgaard. Multi-view and multi-task training of RST discourse parsers. The 26th International Conference on Computational Linguistics (COLING). [pdf]
    • Barbara Plank. What to do about non-standard (or non-canonical) language in NLP. In KONVENS 2016. [pdf] [arXiv]
    • Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter and Anders Søgaard. Multilingual Projection for Parsing Truly Low-Resource Languages. In [TACL], 2016.
    • Barbara Plank, Anders Søgaard and Yoav Goldberg. Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. In ACL (short), 2016. [arXiv]
    • Ben Verhoeven, Walter Daelemans and Barbara Plank. TwiSty: a Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling. In LREC 2016.
    • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat and Barbara Plank. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. To appear in JAIR. [JAIR]

    Recent talks

    • November 2, 2018: EMNLP 2018 talk on "Distant Supervision from Disparate Sources" [Vimeo video]
    • March 19, 2018, Zurich NLP MeetUp "Transfer Learning in NLP", Zurich [Tube Switch video]
    • July 28, 2017, Google Research NLU workshop, New York
    • July 1, 2017: PyData 2017 Berlin, Natural Language Processing: Challenges and Next Frontiers [YouTube]
    • March 28, 2017, Geneva: "What to do about non-canonical data in NLP"
    • March 27, 2017, Geneva: "Multi-task learning in NLP: What? How? When?"
    • March 14, 2017, Keynote at the Nuance Research Conference (NRC) 2017: "Beyond text: fortuitous data and deep multi-task learning for processing non-standard text"
    • March 10, 2017, Milan: "Introduction to Natural Language Processing"
    • YRNLP, Osaka Japan, December 10, 2016, Young Researcher in Natural Language Processing in Japan: "Variety in research, research in variety"


    Professional Service

    • Chair & board member:
      • NoDaLiDa 2019 general chair
      • WiDS 2019 Copenhagen main organizer (with Natalie Schluter)
      • ACL 2019 workshop chair
      • NAACL 2018 area chair (Multilingual NLP including Phonology, Morphology and Word Segmentation)
      • ESSLLI 2018 Chair for Language and Computation
      • ACL 2017 area chair (Tagging, Chunking, Syntax and Parsing)
      • EACL 2017 Student research workshop faculty advisor
      • Editorial board member Computational Linguistics journal (2017-2019)
      • Editorial board member TACL journal (2018-2020)
      • ACL 2016 publicity chair
      • EMNLP 2015 publicity chair
    • Program committee for conferences: AAAI 2018, 2017, 2016; NIPS 2018, 2017, 2016; ACL 2018, 2017, 2016, 2015, 2014, 2013; EMNLP 2016, 2015, 2014; NAACL 2019, 2016; CoNLL 2017, 2016, 2015; COLING 2016, 2014; KONVENS 2016; IJCNLP 2014; *SEM 2015;
    • Program committee for workshops (selected): NAACL SRW 2016; CL4LC 2016; DADA 2016; MWE 2016, 2015; LAW 2016; L&V 2016; NoDaLiDa 2015, 2013; NLPIT 2016, 2015; IWPT 2015; SemEval 2015; IJCAI 2013; CLIN 20;
    • Journals: PLOS ONE, 2016; Computational Linguistics; Information Processing and Management Journal 2013; Journal of Logic and Computation special issue, 2012; IMIX project book chapter 2011; JIS 2016;

    Bio, Teaching & more

    Short Bio

    • since May 2018: Associate Professor (tenured), IT University of Copenhagen (ITU)
    • Apr 2016-Mar 2018: Assistant Professor (tenured), University of Groningen (RUG)
    • Sep 2014-Mar 2016: Assistant Professor, CST, University of Copenhagen (UCPH)
    • Aug 2013-Aug 2014: Postdoc, CST, Copenhagen Lowlands
    • Nov 2011-Jun 2013: Postdoc, DISI, Trento LiMoSiNe project
    • 2007-2011: Ph.D., cum laude, University of Groningen
    • MSc European Masters Program in Language and Communication Technologies (EM-LCT), cum laude. Joint degree from the University of Bozen-Bolzano (Italy) and University of Amsterdam (UvA, The Netherlands) (2007).
    • BSc, Computer Science, University of Bozen-Bolzano (2005).


    Teaching

    • 2017-2018:
      • Deep Learning for Social Media Processing (course given as visiting scholar in Malta)
      • Language Technology project (i.e., project-based intro to Deep Learning for NLP, Master's level)
      • Collecting Data (Master in Digital Humanities)
      • Shared Task (Master's level)
      • Bachelorscriptie Informatiekunde (Bachelor's thesis, Information Science)
      • Computationele Grammatica (Computational Grammar)
      • Inl. wetensch. onderzoek / Introduction to research methods
      • Digital Skills
    • 2016-2017:
      • Language Technology project (Master's)
      • Collecting Data (new Master in Digital Humanities)
      • Bachelorscriptie Informatiekunde (Bachelor's thesis, Information Science)
      • Computationele Grammatica (Computational Grammar)
      • Inl. wetensch. onderzoek / Introduction to research methods
    • Summer 2016: ESSLLI 2016 summer school on Fortuitous data, Bozen-Bolzano
    • Spring 2016: Language Technology Project, RUG
    • Spring 2016: Language Processing 2, UCPH (initial lectures before departure)
    • Autumn 2015: Cognitive Science 1, UCPH
    • Spring 2015: Language Processing 2, UCPH
    • Autumn 2014: Cognitive Science 1, UCPH

    Code & Data

    Press & Media