Schedule – Advanced Language Processing Winter School

The speakers will provide pre-recorded lectures. Free slots allow you to watch those lectures before the Q&A sessions. The schedule can be downloaded as an ics calendar here (bottom left).

CET	Monday 18/01	Tuesday 19/01	Wednesday 20/01	Thursday 21/01	Friday 22/01
8-9				Gather town Poster Session 2	Zoom Q&A Tim Baldwin
9-10				Gather town Poster Session 2
10-11			Social		Slack Lab Session Isabelle Augenstein
11-12		Zoom Q&A Claire Gardent	Social	Zoom Q&A Grzegorz Chrupała
12-13
13-14
14-15	Slack Lab Session Kyunghyun Cho	Gather town Poster Session 1			Zoom Q&A Isabelle Augenstein
15-16		Gather town Poster Session 1
16-17			Zoom Q&A Laurent Besacier
17-18	Zoom Q&A Kyunghyun Cho			Gather town Poster Session 3
18-19			Zoom Q&A Yejin Choi	Gather town Poster Session 3

Poster Sessions

Session 1

A1 Yanai Elazar
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
[Abstract]
A growing body of work makes use of probing in order to investigate the working of neural models, often considered black boxes. Recently, an ongoing debate emerged surrounding the limitations of the probing paradigm. In this work, we point out the inability to infer behavioral conclusions from probing results, and offer an alternative method which focuses on how the information is being used, rather than on what information is encoded. Our method, Amnesic Probing, follows the intuition that the utility of a property for a given task can be assessed by measuring the influence of a causal intervention which removes it from the representation. Equipped with this new analysis tool, we can ask questions that were not possible before, e.g. is part-of-speech information important for word prediction? We perform a series of analyses on BERT to answer these types of questions. Our findings demonstrate that conventional probing performance is not correlated to task importance, and we call for increased scrutiny of claims that draw behavioral or causal conclusions from probing results.
A3 Katsiaryna Krasnashchok
RUNE: automating GDPR compliance based on natural language contracts.
[Abstract]
ASGARD is a multi-faceted research project at EURA NOVA, in collaboration with the Walloon region of Belgium, dedicated to tackling the highest barriers between R&D and industry and delivering applicable and effective state-of-the-art solutions. The RUNE track’s goal is to create an open-source end-to-end system to facilitate the implementation of privacy by design for the business – a task that became increasingly important after May 2018, when the GDPR came into force. The system is aimed at ensuring compliance for business activities involving personal data, based on written privacy policies and data processing agreements. One of the biggest challenges of the track is the information extraction from written legal documents: permissions, prohibitions and obligations regarding various personal data categories. For the system to function properly, the extraction should be accurate and complete. Additionally, the lack of appropriate models and annotated data for post-GDPR privacy policies creates another level of complexity for the project. So far we have successfully introduced a conceptual model for privacy policies, and our current challenge is to populate the model using the data and resources available to us. We utilize the pre-GDPR dataset of privacy policies and map its attributes to our model to produce the rules to be used in the downstream applications. Anticipating some level of inaccuracy, we allow and encourage human intervention in our system, which will make it reliable and provide feedback to further improve the rule extraction.
A4 Nora Kassner
Are Pretrained Language Models Symbolic Reasoners Over Knowledge?
[Abstract]
How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs seem to learn to apply some symbolic reasoning rules correctly but struggle with others, including two-hop reasoning. Further analysis suggests that even the application of learned reasoning rules is flawed. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.
A5 Vít Novotný
When FastText Pays Attention: Efficient Estimation of Word Representations Using Positional Weighting
[Abstract]
Since the seminal work of Mikolov and colleagues, word vectors of log-bilinear models have found their way into many NLP applications. Later, Mikolov and colleagues equipped their log-bilinear model with positional weighting that allowed them to reach state-of-the-art performance on the word analogy task. Although the positional model improves accuracy on the intrinsic word analogy task, prior work has neglected qualitative evaluation of its linguistic properties as well as quantitative evaluation on extrinsic end tasks. We open-source the positional model and we evaluate it using qualitative and quantitative tasks. We show that the positional model captures information about parts of speech and self-information. We also show that the positional model consistently outperforms non-positional models on text classification and language modeling.
A7 Marina Speranskaya
Ranking vs. Classifying: Measuring Knowledge Base Completion Quality
[Abstract]
Knowledge base completion (KBC) methods aim at inferring missing facts from the information present in a knowledge base (KB). Such a method thus needs to estimate the likelihood of candidate facts and ultimately to distinguish between true facts and false ones to avoid compromising the KB with untrue information. In the prevailing evaluation paradigm, however, models do not actually decide whether a new fact should be accepted or not but are solely judged on the position of true facts in a likelihood ranking with other candidates. We argue that consideration of binary predictions is essential to reflect the actual KBC quality, and propose a novel evaluation paradigm, designed to provide more transparent model selection criteria for a realistic scenario. We construct the data set FB14k-QAQ with an alternative evaluation data structure: instead of single facts, we use KB queries, i.e., facts where one entity is replaced with a variable, and construct corresponding sets of entities that are correct answers. We evaluate a number of state-of-the-art KB embeddings models on our new benchmark. The differences in relative performance between ranking-based and classification-based evaluation that we observe in our experiments confirm our hypothesis that good performance on the ranking task does not necessarily translate to good performance on the actual completion task. Our results motivate future work on KB embedding models with better prediction separability, and, as a first step in that direction, we propose a simple variant of TransE that encourages thresholding and achieves a significant improvement in classification F1 score.
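The gap between ranking-based and classification-based evaluation described in this abstract can be illustrated with a minimal sketch. The scores, the threshold of 0.5, the entity names and both helper functions are hypothetical and are not taken from the poster or from FB14k-QAQ; the sketch only shows how a perfect ranking can coexist with a poor binary decision.

```python
def reciprocal_rank(scores, true_answers):
    """Reciprocal rank of the best-ranked true answer for one query."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    for position, entity in enumerate(ranked, start=1):
        if entity in true_answers:
            return 1.0 / position
    return 0.0


def f1_at_threshold(scores, true_answers, threshold):
    """F1 of the binary decision 'accept every candidate scoring >= threshold'."""
    accepted = {entity for entity, score in scores.items() if score >= threshold}
    true_positives = len(accepted & true_answers)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(accepted)
    recall = true_positives / len(true_answers)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model scores for the candidate answers of one KB query.
scores = {"alice": 0.95, "bob": 0.93, "carol": 0.91, "dave": 0.90}
true_answers = {"alice"}

rr = reciprocal_rank(scores, true_answers)                  # true answer ranked first
f1 = f1_at_threshold(scores, true_answers, threshold=0.5)   # but all four are accepted
```

Here the true answer is ranked first, yet every candidate clears the threshold, so the classification F1 stays low. This is the kind of poor prediction separability that the abstract argues ranking metrics hide.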
A8 Nikolai Ilinykh
Multi-Head Self-Attention in Transformers for Image Captioning: Preliminary Study on Input Features and Heads’ Roles
[Abstract]
Using transformer architectures (Vaswani et al., 2017) has become very popular for various Vision and Language tasks (Tan et al., 2019; Herdade et al., 2019; Lu et al., 2019). These networks’ strength is in the self-attention mechanism, which learns to connect different image representation parts with the generated caption’s words. However, little research has been done on interpreting and explaining the transformer’s attention to image objects for image captioning. This poster will describe our initial experiments on investigating how attention heads in the simple transformer architecture can be explained to have particular ‘roles’ in terms of their confidence/focus on the image objects. We perform multiple experiments with different image feature representations (top-down vs bottom-up as defined by Anderson et al., 2018) and look at the differences between the attention weights that various heads assign to the image objects. Our preliminary results indicate that some heads spread their attention towards the whole image, while others seem to be focused on particular objects. In the future, we are planning to investigate if it is possible to identify each confident head’s role. For example, an attention head can focus on the largest objects in the scene, as shown by its confidence score.
A9 Qixiang Fang
Predicting responses to survey questions from question embeddings
[Abstract]
Social constructs like personal values and political orientation are traditionally measured with survey questions. Well-designed survey questions should thus be able to elicit responses that accurately capture the constructs of interest. Research shows that such responses are influenced by not only the meaning (i.e. the underlying construct of interest) but also the form of the questions (e.g. language style; question length). It has been, however, a challenge to incorporate these two aspects in prediction models of survey responses. It is the goal of our project to look into the possibility of using language models and sentence vectors to tackle this problem. We will focus on three tasks: 1) construction of a corpus of survey questions with which a language model can be pre-trained; 2) representation of survey questions as vectors; 3) prediction of survey question responses.
A10 Paul Lerner
Fusing Text, Image and Knowledge for Question Answering about Entities
[Abstract]
While Visual Question Answering (VQA) has spawned numerous works in recent years, most popular benchmarks are limited to questions about coarse-grained object categories (e.g. person, car) that do not require any knowledge beyond vision (e.g. “What color is the bus?”). We aim at answering questions about entities (e.g. “When was the bus company founded?”) by exploiting the interactions between language, vision and knowledge. Recent work (Shah et al., 2019) has laid a first stone, but is limited to automatically created questions about persons; we plan to extend it to various entity types, using existing textual QA datasets. This raises many challenges, including the visual representation of non-person entities, e.g. a company can be associated with its logo, but also with its main products or even its CEO.
A11 Malvina Nikandrou
Continual Learning in Visual Dialog
[Abstract]
Grounded Visual Dialog refers to the problem of developing agents that are capable of natural language interactions in a visual context. In cooperative settings of goal-oriented dialog, agents are required to perform two types of grounding: multimodal symbol grounding, as currently applied in tasks such as Visual Question Answering, and conversational grounding, where two speakers coordinate their contextual understanding. While humans are able to continually adapt to new contexts and reuse information from past conversations, artificial agents suffer from catastrophic forgetting when trained progressively on a sequence of tasks. We formulate a continual learning setting based on the GuessWhat?! dataset and compare how successfully two popular approaches can prevent catastrophic forgetting.
A12 Nadine El Naggar
Using Symbolic Stacks with Neural Networks to Improve Compositionality in NLP
[Abstract]
Deep Neural Networks (NNs) have become the state-of-the-art technique for Natural Language Processing (NLP) tasks. They learn very effectively from large amounts of data, but they do not learn like humans. Humans exhibit systematic compositionality. Recent research indicates that NNs require huge training sets because they lack systematic compositionality and learn by memorisation. The lack of systematic learning in NNs is evident in their inability to learn hierarchical grammatical structures. Traditional NLP applications often rely on Abstract Data Types (ADTs) such as stacks and queues to learn grammatical structures. We found in experiments that standard NNs do not learn the inversion of subsequences, as in the SCAN task, and the integration of ADTs is a natural way to address this problem. The goal of our research is to develop methods for tight integration of symbolic ADTs with NNs to achieve compositional generalisation in NLP tasks. We are starting with a stack model and the approach is to create synthetic gradients that enable direct integration into learning with backpropagation.
A13 Bryan Eikema
The Inadequacy of the Mode in Neural Machine Translation
[Abstract]
Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode. We argue that the evidence corroborates the inadequacy of MAP decoding more than casts doubt on the model and its training algorithm. In this work, we show that translation distributions do reproduce various statistics of the data well, but that beam search strays from such statistics. We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT’s statistical assumptions nor MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account the translation distribution holistically. We show that an approximation to minimum Bayes risk decoding gives competitive results confirming that NMT models do capture important aspects of translation well in expectation.
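The contrast between picking the mode and using a decision rule over the whole translation distribution can be sketched as follows. This is a minimal, sampling-based approximation to minimum Bayes risk decoding under stated assumptions: the sample sentences are invented, the most frequent sample stands in for the mode, and the token-overlap F1 utility is a simple stand-in rather than the utility used by the authors.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Token-overlap F1, used here as a simple stand-in utility function."""
    hyp, ref = hypothesis.split(), reference.split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples, utility=unigram_f1):
    """Return the sample with the highest average utility against all samples."""
    def expected_utility(candidate):
        return sum(utility(candidate, other) for other in samples) / len(samples)
    return max(samples, key=expected_utility)


# Hypothetical samples drawn from a translation model.
samples = [
    "the",
    "the",
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the cat lay on the mat",
]

most_frequent = Counter(samples).most_common(1)[0][0]  # stand-in for the mode
best = mbr_decode(samples)                             # consensus translation
```

In this toy sample the single most frequent output is a degenerate one-word string, while the MBR choice is the sentence that agrees most with the other samples on average.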
A14 Emily Allaway
Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations
[Abstract]
Stance detection is an important component of understanding hidden influences in everyday life. Since there are thousands of potential topics to take a stance on, most with little to no training data, we focus on zero-shot stance detection: classifying stance from no training examples. In this work, we present a new dataset for zero-shot stance detection that captures a wider range of topics and lexical variation than in previous datasets. Additionally, we propose a new model for stance detection that implicitly captures relationships between topics using generalized topic representations and show that this model improves performance on a number of challenging linguistic phenomena.
A15 Mauricio Mazuecos
On the role of effective and referring questions in GuessWhat?!
[Abstract]
Task success is the standard metric used to evaluate referential visual dialogue systems. In this paper we propose two new metrics that evaluate how each question contributes to the goal. First, we measure how effective each question is by evaluating whether the question discards objects that are not the referent. Second, we define referring questions as those that univocally identify one object in the image. We report the new metrics for human dialogues and for state-of-the-art publicly available models on GuessWhat?!. Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models. With respect to the second metric, humans ask referring questions at the end of the dialogue, confirming their guess before guessing. Human dialogues that use this strategy have a higher task success but models do not seem to learn it.
A16 Ravfogel Shauli
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
[Abstract]
The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. Our method is based on repeated training of linear classifiers that predict a certain property we aim to remove, followed by projection of the representations on their null-space. By doing so, the classifiers become oblivious to that target property, making it hard to linearly separate the data according to it. While applicable for multiple uses, we evaluate our method on bias and fairness use-cases, and show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
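The loop described in this abstract (train a linear classifier for the property, project the representations onto its null-space, repeat) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation: a bias-free perceptron stands in for the linear classifier, each iteration removes a single direction, and the data, the iteration count and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(X, y, epochs=50):
    """Fit a bias-free perceptron on binary labels; returns the weight vector."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            predicted = 1 if dot(w, x) > 0 else 0
            if predicted != label:
                sign = 1.0 if label == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
    return w


def project_out(X, w):
    """Project every row of X onto the null-space of the direction w."""
    norm_sq = dot(w, w)
    return [[xi - dot(x, w) / norm_sq * wi for xi, wi in zip(x, w)] for x in X]


def inlp(X, y, iterations=2):
    """Repeatedly train a classifier and remove the direction it relies on."""
    directions = []
    for _ in range(iterations):
        w = train_perceptron(X, y)
        if dot(w, w) < 1e-12:  # the classifier found nothing left to use
            break
        X = project_out(X, w)
        directions.append(w)
    return X, directions


# Toy representations; the property y is mostly encoded along the first axis.
X = [[2.0, 0.5, 1.0], [1.5, -0.3, 0.2], [-2.0, 0.4, 0.9], [-1.8, -0.6, 0.1]]
y = [1, 1, 0, 0]
X_clean, directions = inlp(X, y)
```

After the loop, every cleaned representation is orthogonal (up to floating-point error) to each direction a classifier relied on, which is what makes the property hard to recover linearly along those directions.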
A17 Pierre Colombo
Heavy-tailed Representations, Text Polarity Classification and Data Augmentation
[Abstract]
The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora which have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which allows the points far away from the distribution bulk to be analyzed using the framework of multivariate extreme value theory. In particular, a classifier dedicated to the tails of the proposed embedding is obtained which exhibits a scale invariance property exploited in a novel text generation method for label-preserving dataset augmentation. Experiments on synthetic and real text data show the relevance of the proposed framework and confirm that this method generates meaningful sentences with a controllable attribute, e.g. positive or negative sentiment.
A18 Liesbeth Allein
You are how you write and what you share: Leveraging correlations between user and news article writing style for disinformation detection
[Abstract]
In this ongoing research, we investigate to what extent the social identity of a Twitter user correlates with the identity of the news article they share, and whether there are correlations signaling disinformation sharing. Following the Human Stylome Hypothesis, a person’s identity can be derived from their writing style. We therefore focus on correlations and disassociations between the writing style of a user and that of the news article they share.

Session 2

B1 Heereen Shim
From Free-Text to Structured Information: Free-text Sleep Diary Analysis
[Abstract]
A sleep diary, which is a log of sleep-related events, is a widely used method to assess the quality of sleep. This research project is motivated by the potential use case of a dialogue-based sleep diary. The goal of this study is to build an automated natural language processing (NLP) system that extracts structured time-related information from such a free-text sleep diary to support sleep assessment. In this poster, we show the preliminary results of building a neural network-based NLP system that detects sleep-related events and extracts timestamps from time expressions by reformulating the problem as a machine translation task.
B2 Maël Fabien
BertAA – Bert fine-tuning for Authorship Attribution
[Abstract]
Identifying the author of a given text can be useful in historical literature, plagiarism detection, or criminal investigations. Authorship Attribution has been well studied and mostly relies on a feature engineering step to extract features related to the style, the content, or other characteristics of a text. More recently, Convolutional Neural Networks or Siamese Networks have been explored for Authorship Attribution. In this paper, we introduce BertAA, a fine-tuning of a pre-trained BERT language model with an additional dense layer and a softmax activation to perform authorship classification. This approach overcomes the burden of feature engineering and reaches competitive performance on the Enron Email, Blog Authorship, and IMDb (and IMDb62) datasets, up to 5.3% (relative) above current state-of-the-art approaches. We performed an exhaustive analysis that helped us to identify the strengths and weaknesses of the proposed method. In addition, we evaluate the impact of including additional features (e.g. stylometric and hybrid features) in an ensemble approach, improving the macro-averaged F1-Score by 2.7% (relative) on average.
B3 Joel Niklaus
ESRA: An End-to-End System for Re-Identification and Anonymization of Swiss Court Rulings
[Abstract]
Justice should be open and transparent to ensure the public understanding of court decisions. On the other hand, each person should have the right to privacy and in particular the right to be forgotten. With this work we try to find a balance in this antagonism. The literature for anonymization of unstructured text documents is thin and for court decisions virtually non-existent. We plan to implement an end-to-end system for anonymization and re-identification of Swiss court decisions. This system will serve as a proof of concept that both the re-identification of a large part of manually anonymized court decisions is possible and that re-identification can be made significantly harder with the automated anonymization of our system. Our system will relieve legal experts of the burdensome task of manually anonymizing court decisions. Additionally, we hope to advance the knowledge in the field of text anonymization in general which will also serve many other fields.
B4 Adrian Doyle
New Challenges and a Very Old Language: Exploring the efficacy of modern solutions for challenges associated with tokenizing Old Irish.
[Abstract]
Old Irish text, written between the 7th and 10th centuries, represents a language which had only recently adopted the Roman alphabet, a factor which lends itself to extreme morphological variability. This, when added to the non-standard spelling, frequent code-mixing with Latin, and spacing based primarily on stressed morphemes, makes for a language which can be difficult to automatically parse. Even tokenization and part-of-speech tagging, rudimentary precursors to downstream NLP techniques, prove to be quite difficult as the traditional grammar of the language makes frequent reference to the ways in which words may be combined, often without clarifying where word boundaries ultimately lie. In the case of ‘infixed pronouns’ the word representing the object of a verb occurs between two morphemes, each morpheme being a part of the verb. These morphemes should, therefore, make up a single token together, though separated by the pronoun. My research to date has focused on the potential application to Old Irish of established techniques for tokenization of languages which do not make standard use of white space to separate words. A character-level LSTM-based RNN model has shown some success in this area, though the relative scarcity of surviving text makes it difficult to train neural models on sufficient quantities of linguistic data. I have also shown that doctoral and post-doctoral annotators do not agree strongly when asked to manually tokenize, leading to my current work on the development of a new word-separation standard intended to enable tokenization while respecting the traditional grammar of Old Irish.
B5 Irina Nikishina
Studying Taxonomy Enrichment on Diachronic WordNet Versions
[Abstract]
Ontologies, taxonomies and thesauri have always been in high demand in a large number of NLP tasks. However, most studies are focused on the creation of lexical resources rather than maintaining the existing ones and keeping them up-to-date. At the same time, the manual annotation process is too costly: it is time-consuming and requires language or domain experts. In this paper we address the problem of taxonomy enrichment — given words that are not included in a taxonomy, associate each word with the appropriate hypernyms from it. Namely, we explore the possibilities of taxonomy extension in a resource-poor setting. We present a bunch of methods which are applicable to a large number of languages and are even able to identify some cases of polysemy which were not reflected in taxonomies. We find that this task does not benefit from context-informed embeddings (BERT), but can make use of Wiktionary and possibly other dictionaries. We create a novel English dataset for training and evaluation of taxonomy enrichment systems and describe a technique of creating such datasets for other languages.
B6 Sardana Ivanova
Tools for supporting language learning for Sakha
[Abstract]
Our work presents an overview of the available linguistic resources for the Sakha language, and presents new tools for supporting language learning for Sakha. The essential resources include a morphological analyzer, digital dictionaries, and corpora of Sakha texts. Based on these resources, we implement a language-learning environment for Sakha in the Revita CALL platform. We extended an earlier, preliminary version of the morphological analyzer/transducer, built on the Apertium finite-state platform. The analyzer currently has an adequate level of coverage, between 86% and 89% on two Sakha corpora. Revita is a freely available online language learning platform for learners beyond the beginner level. We describe the tools for Sakha currently integrated into the Revita platform. To the best of our knowledge, at present, this is the first large-scale project undertaken to support intermediate-advanced learners of a minority Siberian language.
B7 Michal Bien
RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation
[Abstract]
Semi-structured text generation is a non-trivial problem. Although recent years have brought many improvements in natural language generation, thanks to the development of neural models trained on large-scale datasets, these approaches still struggle with producing structured, context- and commonsense-aware texts. Moreover, it is not clear how to evaluate the quality of generated texts. To address these problems, we introduce RecipeNLG, a novel dataset of cooking recipes.
B9 Mayank Jobanputra
Unsupervised Question Answering for Fact-Checking
[Abstract]
Recent Deep Learning (DL) models have succeeded in achieving human-level accuracy on various natural language tasks such as question-answering, natural language inference (NLI), and textual entailment. These tasks require not only contextual knowledge but also reasoning abilities to be solved efficiently. In this paper, we propose an unsupervised question-answering based approach for a similar task, fact-checking. We transform the FEVER dataset into a Cloze-task by masking named entities provided in the claims. To predict the answer token, we utilize a fine-tuned BERT. The classifier computes the label based on the correctly answered questions and a threshold. Currently, the classifier is able to classify the claims as “SUPPORTS” and “MANUAL_REVIEW”. This approach achieves a label accuracy of 80.2% on the development set and 80.25% on the test set of the transformed dataset.
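The masking and thresholding steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the named entities are taken as given, `answer_fn` is a hypothetical stand-in for the fine-tuned BERT, the threshold values are arbitrary, and the example claim and lookup table are invented rather than drawn from FEVER.

```python
def mask_entities(claim, entities, mask_token="[MASK]"):
    """Turn a claim into one cloze question per named entity."""
    return [(claim.replace(entity, mask_token, 1), entity) for entity in entities]


def label_claim(questions, answer_fn, threshold=0.5):
    """Label a claim from the share of cloze questions answered correctly."""
    correct = sum(1 for question, gold in questions if answer_fn(question) == gold)
    ratio = correct / len(questions) if questions else 0.0
    return "SUPPORTS" if ratio >= threshold else "MANUAL_REVIEW"


# Invented claim; the dictionary below plays the role of the answering model.
claim = "Marie Curie was born in Warsaw"
questions = mask_entities(claim, ["Marie Curie", "Warsaw"])
fake_model = {
    "[MASK] was born in Warsaw": "Marie Curie",
    "Marie Curie was born in [MASK]": "Paris",
}

lenient = label_claim(questions, fake_model.get, threshold=0.5)
strict = label_claim(questions, fake_model.get, threshold=0.75)
```

With one of two questions answered correctly, the claim passes a threshold of 0.5 but is sent to manual review at 0.75, which shows how the threshold trades coverage against caution.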
B10 Roman Lyapin
Cross-Lingual Transfer For Japanese Question Answering
[Abstract]
Recent years have been marked by rapid progress in NLP using transformers and transfer learning. A typical workflow involves pretraining a large model on a variation of the language modeling task (e.g. masked language modeling) and finetuning the resulting model on downstream tasks (e.g. SQuAD, SuperGLUE). This approach has worked well and has helped to improve performance on a wide range of NLP tasks. This strategy, however, might be difficult to replicate for languages other than English. While there has recently been progress on preparing corpora (e.g. Wikipedia, Common Crawl) and training Bert models for more languages (e.g. CamemBERT, FlauBERT, Deepset German Bert, Tohoku Japanese Bert), there is still a shortage of labeled non-English datasets that can be used for finetuning (with the notable exception of SQuAD-like datasets such as FQuAD or SberQuad). The alternative lies in training models for all languages simultaneously, and there are recent results (e.g. mBart) that show that multilingual pretraining allows efficient zero-shot transfer between languages. The research I want to present extends this idea and shows that multilingual Bert and XLM-Roberta models are also capable of cross-lingual transfer. Specifically, I demonstrate that these models can do Japanese QA after finetuning on English SQuAD and explore how such cross-lingual transfer holds for other language pairs.
B12 Rohan Kumar
Logic Constrained Pointer Networks for Interpretable Textual Similarity
[Abstract]
Systematically discovering semantic relationships in text is an important and extensively studied area in Natural Language Processing, with various tasks such as entailment, semantic similarity, etc. Decomposability of sentence-level scores via subsequence alignments has been proposed as a way to make models more interpretable. We study the problem of aligning components of sentences leading to an interpretable model for semantic textual similarity. In this paper, we introduce a novel pointer network based model with a sentinel gating function to align constituent chunks, which are represented using BERT. We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional. Finally, to guide the network with structured external knowledge, we introduce first-order logic constraints based on ConceptNet and syntactic knowledge. The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task, showing large improvements over the existing solutions.
B14 Oksana Dereza
Diachronic Word Embeddings for Historical Languages: the Case of Early Irish
[Abstract]
The surge of interest in distributional semantics has lately reached historical linguistics. The recently emerged concept of diachronic, or dynamic, embeddings transforms the task of language modelling into the task of modelling language change. This work is aimed at finding an optimal solution to this problem for historical languages, such as Old and Middle Irish, Gothic, Latin, Ancient Greek, Old Church Slavonic, etc. Most of the published work on diachronic word embeddings is focused on semantic change in modern languages and covers only a short time span, not exceeding two centuries. The scope of languages that have been used in experiments so far is quite narrow, and none of the proposed methods has been tested in non-ideal conditions, with hindering factors such as high spelling variation, substantial grammatical changes or the lack of data. Given the open challenges outlined in the previous section, we would like to focus on historical languages as they allow us to address several aspects of diachronic language modelling that have not yet received proper attention. Firstly, working with a longer period of time makes it possible to track not only semantic shifts, but also developments in morphology and syntax, which evolve more slowly than the lexicon. Secondly, historical language data tends to be both scarce and inconsistent, which provides a perfect test case to evaluate how robust existing algorithms are and to map out ways of improving them.
B15 Michal Štefánik
On Eliminating Inductive Biases of Deep Language Models
[Abstract]
The development of current state-of-the-art language models is heavily focused on the Transformer architecture, which can be pre-trained on vast corpora and relatively quickly fine-tuned to a specific task. In the pre-training stage, these models are usually trained on Masked Language Modeling, while in the fine-tuning stage, models are trained to minimise cross-entropy on either token-level or sequence-level tasks. While these fine-tuning objectives can be successfully followed, this often comes at the price of a loss of generality of the network. Significantly, such generality loss is often misperceived during the training process, as both the loss and the measured performance are tightly bound to the optimised objective. As a consequence, the network becomes prone to what Kahneman, in humans, calls Availability of Heuristics: the system seeks every shortcut heuristic that would allow it to cut down the loss any further. Our experiments show this problem with neural translators, where the model overfits to a specific length of parallel corpus pairs, which cannot be observed in the BLEU reported on its own validation data set, and with summarisation, where the model learns to write syntactically coherent output but fails to frame and propagate the key points of the input. Again, being able to properly match declension and other morphology has a beneficial, yet misleading, impact on validation ROUGE. This poster aims to familiarise the audience with this common flaw, analyse its causes, and outline and promote possible research directions that could help our language systems on the way towards higher levels of generality.
B16 Gyuwan Kim
Large Product Key Memory for Pretrained Language Models
[Abstract]
Product key memory (PKM), proposed by Lample et al. (2019), improves prediction accuracy by increasing model capacity efficiently with insignificant computational overhead. However, its empirical application has been limited to causal language modeling. Motivated by the recent success of pretrained language models (PLMs), we investigate how to incorporate large PKM into PLMs that can be finetuned for a wide variety of downstream NLP tasks. We define a new memory usage metric, and careful observation using this metric reveals that most memory slots remain outdated during the training of PKM-augmented models. To train better PLMs by tackling this issue, we propose simple but effective solutions: (1) initialization from the model weights pretrained without memory and (2) augmenting PKM by adding it to a feed-forward network rather than replacing one. We verify that both of them are crucial for the pretraining of PKM-augmented PLMs, enhancing memory utilization and downstream performance.
B17 Aman Sinha
When Information Retrieval met Knowledge Graph
[Abstract]
The amount of literature in any research field is increasing day by day, and selecting relevant materials by manually investigating each one is not feasible. Information retrieval (IR) techniques provide us with tools to extract information in a qualitative manner in terms of similarity, relevance, and other measurable heuristics. However, general IR tools cannot be applied to every field because of their limited domain knowledge. Recently, domain-specific representation methods have been shown to yield better representations and can be handy for this purpose. The graphical aspect of such data is also acknowledged by the research community. Therefore, we propose to build a graph representation model (GReMIE) for extracting relevant information that could be used to reduce the search space, and to compare the effect of the increasing granularity of linguistic content in the models.

Session 3

C1 Anna Liednikova
Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge
[Abstract]
A key bottleneck for developing dialog models is the lack of adequate training data. Due to privacy issues, dialog data is even scarcer in the health domain. We propose a novel method for creating dialog corpora, which we apply to the health domain. We want to show that this approach not only allows for the semi-automatic creation of large quantities of training data, but also provides a natural way of guiding learning and a novel method for assessing the quality of human-machine interactions.
C3 Mitzy Gabriela Sánchez Sánchez
Humor Detection in Spanish Tweets
[Abstract]
Automatic recognition of humor is a complex task because humor is ambiguous and differs among people, even those with similar backgrounds. This means that humor has not been fully characterized. Although several approaches to detecting humor have been presented, most of them are for the English language. This work aims to identify humor in Spanish by applying several machine learning techniques, such as text classification and regression. We propose to work with a corpus of Spanish tweets extracted from humorous and non-humorous accounts.
C4 Christopher Klamm
Stop the Shutdown: Tracking Public Reactions to Policy Responses to the Coronavirus Pandemic around the World.
[Abstract]
Crises (e.g., COVID-19) have put enormous pressure on policy-makers to make critical strategy decisions under hard time constraints. In times of crisis, policy-makers may use different strategies to attempt to constrain the crisis. The acceptance of these strategies can hardly be anticipated, and it is often unclear which strategies are more effective at convincing citizens to implement protective measures for reducing the infection rate. With colleagues, I try to answer the following question using a large volume of unstructured data (e.g., Twitter posts in multiple languages): How do citizens respond to the strategies adopted by policy-makers in different geographical areas?
C5 Vésteinn Snæbjarnarson
ByteBERT: Masked language modeling for morphologically rich languages (IceBERT)
[Abstract]
Generic neural language models such as BERT have in recent years shown great success as an off-the-shelf component ready to be used for the development of more complex tasks such as question answering and named entity recognition. The standard approach to input representation for language modeling represents words either whole or as subwords and ignores character-level information. It has been shown that this approach can be detrimental, particularly when word formation is not performed by compounding (as in English) or concatenation (as in Danish). We propose a modified masked language model objective where the targets are not fixed subwords (or whole words). This is achieved with a modified transformer architecture that convolves over character-level features. To verify our hypothesis, we compare results on downstream tasks for the morphologically rich language Icelandic and the rather analytic English.
C6 Harry Kamdem Fezeu
Fair language tech
[Abstract]
Presenting the early results of our work to make sure ASR technologies benefit people around the world fairly.
C7 Cristian Cardellino
Exploration of Multiple Supervised Tasks in e-Commerce Marketplace
[Abstract]
In the context of the marketplace of an e-Commerce company, there are multiple tasks that need to be addressed on a daily basis. Examples of these tasks are the detection of counterfeit or forbidden products and the estimation of measures, among others. In this scenario, methods that guarantee good performance over different tasks, with little time invested in their implementation, are bound to be the best ones. We explored several baselines and models, including the use of navigation information, in order to find the algorithms that best served multiple tasks.
C10 Ronald Cardenas Acosta
Unsupervised Extractive Summarization by Human Memory Simulation
[Abstract]
Summarization systems face the core challenge of identifying and selecting important information. In this paper, we tackle the problem of content selection in unsupervised extractive summarization of long, structured documents. We introduce a wide range of heuristics that leverage cognitive representations of content units and how these are retained or forgotten in human memory. We find that properties of these representations of human memory can be exploited to capture relevance of content units in scientific articles. Experiments show that our proposed heuristics are effective at leveraging cognitive structures and the organization of the document (i.e. sections of an article), and automatic and human evaluations provide strong evidence that these heuristics extract more summary-worthy content units.
C11 Allmin Susaiyah
Neural NLG system for Behaviour Insight Text Synthesis
[Abstract]
Insight texts are generated from personal health data to improve user behaviour. Current synthesis of such insight texts is performed using templates and rules. Although robust, these do not generalise well. In this poster, we show our experiments using neural NLG systems to study their understanding of the structure of existing templates and their ability to generalise them.
C12 Daryna Dementieva
It’s harder to spread fakes across many languages: multilingual evidence improves fake news detection
[Abstract]
Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases. As a result, it is becoming essential to develop fake news detection technologies. While substantial work has been done in this direction, one of the limitations of the current approaches is that these models focus on only one language and do not use multilingual information. In this work, we propose a new technique based on multilingual evidence that can be used for fake news detection and can improve existing approaches. The hypothesis that multilingual evidence is useful as a feature for fake news detection is confirmed by two experiments based on a set of known true and fake news. Moreover, we show that integration of our feature into a baseline fake news detection system (Pérez-Rosas et al., 2018) yields significant improvements.
C13 Joseph Marvin Imperial
A Simple Post-Processing Technique for Improving Readability Assessment of Texts using Word Mover’s Distance
[Abstract]
Assessing the proper difficulty levels of reading materials or texts in general is the first step towards effective comprehension and learning. In this study, we improve the conventional methodology of automatic readability assessment by incorporating the Word Mover’s Distance (WMD) of ranked texts as an additional post-processing technique to further ground the difficulty level given by a model. Results of our experiments on three multilingual datasets in Filipino, German, and English show that the post-processing technique outperforms previous vanilla and ranking-based models using SVM.
C14 Amit Moryossef
Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation
[Abstract]
Data-to-text generation can be conceptually divided into two parts: ordering and structuring the information (planning), and generating fluent language describing the information (realization). Modern neural generation systems conflate these two steps into a single end-to-end differentiable system. We propose to split the generation process into a symbolic text-planning stage that is faithful to the input, followed by a neural generation stage that focuses only on realization. For training a plan-to-text generator, we present a method for matching reference texts to their corresponding text plans. For inference time, we describe a method for selecting high-quality text plans for new inputs. We implement and evaluate our approach on the WebNLG benchmark. Our results demonstrate that decoupling text planning from neural realization indeed improves the system’s reliability and adequacy while maintaining fluent output. We observe improvements both in BLEU scores and in manual evaluations. Another benefit of our approach is the ability to output diverse realizations of the same input, paving the way to explicit control over the generated text structure.
C15 Hadeel Al-Negheimish
Discrete Reasoning Templates for Natural Language Understanding
[Abstract]
State-of-the-art models in natural language reading comprehension are usually based on Language Models with millions of parameters that have been pre-trained on massive amounts of data. Yet, these models suffer when they are asked to combine information from multiple parts of a text or to reason about a certain part to derive an answer. We work on developing a complementary approach based on reasoning templates that combines the power of the contextualized representations these LMs provide with symbolic reasoning. We evaluate it on a subset of the DROP dataset for discrete and symbolic reasoning over the content of paragraphs and find that it is competitive with SOTA while being interpretable and requiring little supervision.
C17 Kelvin Han
Generating varied questions from meaning representations
[Abstract]
The task of automatically generating questions has recently gained renewed attention, in part due to its potential to support and further develop the wider task of question answering, as well as recent advancements in approaches and tools in the field of NLP. As part of my Master 2 internship (ending in August 2020), I am studying the task of generating questions from knowledge bases. Specifically, I have crowdsourced questions pertaining to RDF triples present in the WebNLG dataset, and I have adopted a sequence-to-sequence approach to generate these questions, involving the pre-training and fine-tuning of different Transformer-based models.