Intel Domain Leader: Moshe Wasserblat
The Conversational Speech Understanding technology deals with understanding, analyzing, and extracting valuable insight from human-to-human verbal and/or textual interactions (e.g., meetings). Unlike existing human-to-machine solutions (e.g., Siri), the challenges posed by conversational understanding are currently unaddressed by the industry. It is a generic technology that can enable multiple capabilities critical to emerging usages (e.g., meeting assistants, business analytics, and customer experience).
The key developments will include integrated speech and text understanding, natural-language knowledge graph representation, personal user modeling (e.g., behavioral patterns), event and relation extraction, and discourse analysis (e.g., argumentation and deliberation).
The projects
- Universal Semantics (UCCA)
- Automatic Measurement of Transcription Quality
- Holistic Inference for Natural Language Processing
- Open Information Extraction Knowledge Graphs
- Unsupervised Extraction of Relations and Events
- Hybrid Models for Minimally Supervised Information Extraction from Conversations
- Syntactic and Semantic Reranking of Speech Interaction Data
- Topic Dependent Language Modeling
- Providing People with Arguments during Persuasive Discussion
Universal Semantics (UCCA)
Academia Researcher(s): Prof. Ari Rappoport, Bar Ilan University
Participating Student(s):
Roy Schwartz
Elior Sulem
Effi Levi
Daniel Hershcovich
Research Project Summary:
The goal of the project is to develop a universal semantics-based annotation and analysis scheme (UCCA) for natural language, and to see how it can be used in real-world applications. Specifically, the scheme is based on semantics and is relatively syntax-independent, which means that applications in which syntax is unstable could benefit greatly from UCCA annotation. Example applications include those related to free speech (e.g., conversation understanding), because it is well known that the syntax of free speech is “incorrect” when compared to written standards. An additional application area is multilingual applications such as machine translation, because syntax is unstable across languages while semantics is relatively stable. The claim here is extremely ambitious, because all existing language theory and applications (except those using only the trivial bag-of-words method) take the necessity of formal syntax for granted.
Ari Rappoport – Publications
- Roy Schwartz, Roi Reichart, Ari Rappoport, Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives, NAACL 2016
- Dana Rubinstein, Effi Levi, Roy Schwartz and Ari Rappoport, “How Well Do Distributional Models Capture Different Types of Semantic Knowledge?”, in Proceedings of ACL 2015
- Roy Schwartz, Roi Reichart and Ari Rappoport, “Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction”, in Proceedings of CoNLL 2015
Automatic Measurement of Transcription Quality
Academia Researcher(s): Prof. Moshe Koppel, Bar Ilan University
Participating Student: Roee Aharoni
Research Project Summary:
Text categorization methods will be used to distinguish texts obtained via automated transcription from native written texts. We hypothesize that the error rate of such experiments (as measured through cross-validation) is strongly correlated with transcription quality. Furthermore, markers of automated transcription can be identified for the purpose of improving automated transcription methods.
The method promises a cheap and quick way to evaluate a transcription system; it requires neither a reference set (i.e., reliable manual transcriptions) nor any voice files. A possible bonus is automatic improvement of such transcription systems.
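The evaluation idea above can be illustrated with a minimal sketch. The summary only says "text categorization methods", so the Naive Bayes classifier, the function names, and the toy data below are assumptions made for illustration, not the project's actual setup. The sketch trains a classifier to tell transcribed text from written text and reports its cross-validation error rate, the quantity the project hypothesizes is correlated with transcription quality.

```python
import math
from collections import Counter

def train_nb(docs):
    """Collect per-label token counts from (tokens, label) pairs."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest add-one-smoothed log score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab) + 1  # +1 for unseen tokens
        score = math.log(label_counts[label] / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_val_error(docs, k=5):
    """k-fold cross-validation error rate of the transcribed-vs-written classifier."""
    errors = 0
    for fold in range(k):
        test = docs[fold::k]
        train = [d for i, d in enumerate(docs) if i % k != fold]
        model = train_nb(train)
        errors += sum(predict_nb(model, toks) != label for toks, label in test)
    return errors / len(docs)

# Hypothetical toy corpus: (tokens, label) pairs, labels "asr" or "written".
toy = [(["um", "uh", "like"], "asr"), (["therefore", "moreover"], "written")] * 5
print(cross_val_error(toy))
```

Under the stated hypothesis, an error rate near chance level would mean the transcripts are hard to tell apart from native written text, while a low error rate would mean the classifier finds clear markers of automated transcription.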
Moshe Koppel – Publications
- Automatic Detection of Machine Translated Text and Translation Quality Estimation. Association for Computational Linguistics (ACL), Baltimore, Maryland, 2014
Holistic Inference for Natural Language Processing
Academia Researcher(s): Prof. Amir Globerson, Tel Aviv University
Participating Students:
Yoav Wald
Hillel Taub Tabib
Research Project Summary:
A key goal of machine learning is to build algorithms that can communicate using natural language. A major obstacle to achieving this goal is that prior knowledge and context are hard to represent, and even harder to use when reasoning about text. We propose to design algorithms that use elaborate models of the world (e.g., knowledge bases, both existing and constructed) for holistic natural language understanding. This involves many challenges, from designing scalable algorithms for inference to unsupervised algorithms that can learn from vast amounts of text.
Amir Globerson – Publications
- Nir Rosenfeld, Mor Nitzan, Amir Globerson, Discriminative Learning of Infection Models.
Open Information Extraction Knowledge Graphs
Academia Researcher(s): Prof. Ido Dagan, Bar Ilan University
Participating Student: Vered Shwartz
Research Project Summary:
We propose to develop a method for constructing user-specific knowledge graphs that represent propositions extracted automatically from natural language texts. The first phase of the project will focus on the construction of a concept entailment graph, as a necessary first step towards the construction of proposition graphs and support for other types of relations (e.g., contradiction, causality).
The challenges include the development of methods for automatic acquisition of inference rules to identify textual entailment relations between concepts and propositions, the projection of general knowledge rules on user-specific graphs (also regarding the context-sensitivity of the rules), and active enrichment of the knowledge graph for target-concepts.
More broadly, hierarchical personal world knowledge may be useful in various settings, such as meeting/personal assistance applications, for the tasks of topic segmentation / agenda tracing, opinion mining / sentiment analysis, session enrichment and Information Retrieval. If successful, it will contribute to the development of these and similar applications.
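As a rough illustration of what a concept entailment graph could look like as a data structure, the sketch below stores entailment rules as directed edges and answers entailment queries by graph reachability. The class name, the method names, and the example concepts are hypothetical; the summary does not specify a representation, and treating entailment as plain transitive reachability is a simplifying assumption.

```python
from collections import defaultdict

class EntailmentGraph:
    """Directed graph in which an edge A -> B means 'concept A entails concept B'."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_rule(self, premise, hypothesis):
        """Record a single entailment rule premise -> hypothesis."""
        self.edges[premise].add(hypothesis)

    def entails(self, premise, hypothesis):
        """Check whether hypothesis is reachable from premise (transitive entailment)."""
        seen, stack = set(), [premise]
        while stack:
            node = stack.pop()
            if node == hypothesis:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

# Hypothetical example concepts.
graph = EntailmentGraph()
graph.add_rule("espresso", "coffee")
graph.add_rule("coffee", "beverage")
print(graph.entails("espresso", "beverage"))
```

Entailment is directional, so the reverse query (from "beverage" to "espresso") is not implied by these rules.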
Ido Dagan – Publications
- Adding Context to Semantic Data-Driven Paraphrasing
- Improving Hypernymy Detection with an Integrated Path-based and Distributional Method. Vered Shwartz, Yoav Goldberg and Ido Dagan (Submitted to ACL 2016)
Unsupervised Extraction of Relations and Events
Academia Researcher(s): Prof. Ronen Feldman, Hebrew University
Participating Students:
Zvi Ben Ami
Rachel Shapira
Research Project Summary:
We propose to develop an unsupervised method for extracting relations and events from free-form text. Both relations and events are of great value for deeper semantic understanding of text. In the first phase of the project, we will focus on developing novel extraction methods for text with proper grammatical structure, such as formal letters or news articles, as well as for less formal text, such as email correspondence, for which conventional parsers need to be adapted. At a later stage, we will adapt the system to process even less grammatical text generated by ASR (Automatic Speech Recognition) systems. The research is expected to affect the way relations and events are extracted, and it can further serve as a framework for other information extraction tasks. The industry-wide impact of the project is expected to extend to many NLP applications, such as information retrieval, query search, summarization, sentiment analysis, and more.
Hybrid Models for Minimally Supervised Information Extraction from Conversations
Academia Researcher(s): Prof. Roi Reichart, Technion
Research Project Summary:
We propose to develop hybrid machine learning models for processing conversation transcripts: models that integrate traditional feature-based components with components that encode declarative expert knowledge about the nature of the task and the properties of the domain. While hybrid models have proven useful for a variety of Natural Language Processing (NLP) tasks in many edited-text domains, they have not yet been applied to the conversation domain. In this work we will therefore extend the reach of hybrid models to the conversation domain, with the example application of minimally supervised extraction of entities, relations, and event templates as a means of producing a summary of the conversation. In the first phase of the project, we will focus on developing novel extraction methods for high-quality conversation transcripts, i.e., the scenario where the content of the conversation is decoded with a minimal error rate. At a later stage, we will adapt the system to the more realistic case where the ASR (Automatic Speech Recognition) system transcribing the conversation outputs a noisy transcription of its content.
Roi Reichart – Publications
- Roy Schwartz, Roi Reichart, Ari Rappoport, Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives, NAACL 2016
Syntactic and Semantic Reranking of Speech Interaction Data
Academia Researcher(s): Prof. Yoav Goldberg, Bar Ilan University
Research Project Summary:
Intel’s conversation understanding system relies crucially on a speech recognition component, which often provides incorrect analyses. The project aims to improve the quality of the speech recognition output by integrating linguistic cues (both syntactic and semantic) from the conversational domain into the scoring of candidate outputs. By taking into account linguistic cues as well as the conversational context (what has been said so far), we hope to increase the quality of the speech recognition output and, as a consequence, improve the accuracy of the conversation understanding system as a whole. The project tackles important and as yet under-explored areas in the NLP community: identification of malformed input, and working at the document level rather than the sentence level.
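The reranking idea can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the interpolation weight, and the word-overlap score are all hypothetical, and the overlap with the conversational context is only a toy stand-in for the syntactic and semantic cues the project describes.

```python
def rerank(candidates, context_tokens, weight=0.5):
    """Reorder ASR candidates by recognizer score plus a context-based bonus.

    candidates: list of (text, asr_log_score) pairs from an n-best list.
    context_tokens: words already said in the conversation so far.
    weight: assumed interpolation weight for the contextual cue.
    """
    context = set(context_tokens)

    def combined(item):
        text, asr_score = item
        tokens = text.lower().split()
        # Toy linguistic cue: fraction of candidate words seen in the context.
        overlap = sum(tok in context for tok in tokens) / max(len(tokens), 1)
        return asr_score + weight * overlap

    return sorted(candidates, key=combined, reverse=True)

# Hypothetical n-best list and conversation context.
n_best = [("wreck a nice beach", -1.0), ("recognize speech", -1.2)]
context = ["speech", "recognition", "system"]
print(rerank(n_best, context)[0][0])
```

With the weight set to zero the function falls back to the recognizer's own ordering, which makes it easy to compare outputs with and without the contextual cue.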
Yoav Goldberg – Publications
- Hillel Taub-Tabib, Yoav Goldberg, Amir Globerson, Template Kernels for Dependency Parsing
Topic Dependent Language Modeling
Academia Researcher(s): Prof. Jacob Goldberger, Bar Ilan University
Participating Student: Aviad Ovadia
Research Project Summary:
Current speech recognition systems are based on general-domain statistical language models built from a large text corpus. Language models, however, may change considerably when moving from one domain to another: the vocabulary and the word frequencies depend strongly on the specific conversation. In this research project we aim to improve the language modeling layer of state-of-the-art speech recognition systems. Our approach is based on training a group of n-gram language models. We define the model as a mixture of standard n-gram models and use the EM algorithm to find the optimal mixture model. When integrating our model into a speech recognition system, we first decide which language model is relevant for the current conversation and then apply the selected model to obtain improved recognition performance.
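A simplified sketch of the mixture idea is given below. It makes several assumptions that go beyond the summary: the component models are fixed unigram distributions rather than full n-gram models, EM estimates only the mixture weights for one conversation, and the function names, the probability floor, and the toy topic models are hypothetical.

```python
def em_mixture_weights(tokens, models, iterations=50, floor=1e-6):
    """Estimate mixture weights over fixed unigram models with EM.

    tokens: words observed in the current conversation.
    models: list of dicts mapping word -> probability (one per topic).
    floor: assumed probability for words a model has never seen.
    """
    k = len(models)
    weights = [1.0 / k] * k
    if not tokens:
        return weights
    for _ in range(iterations):
        totals = [0.0] * k
        for tok in tokens:
            # E-step: responsibility of each component for this token.
            joint = [w * m.get(tok, floor) for w, m in zip(weights, models)]
            norm = sum(joint)
            for j in range(k):
                totals[j] += joint[j] / norm
        # M-step: new weights are the average responsibilities.
        weights = [t / len(tokens) for t in totals]
    return weights

def select_model(tokens, models):
    """Pick the index of the component with the largest estimated weight."""
    weights = em_mixture_weights(tokens, models)
    return max(range(len(models)), key=lambda j: weights[j])

# Hypothetical topic-specific unigram models.
sports = {"goal": 0.5, "team": 0.4, "budget": 0.1}
finance = {"budget": 0.5, "stock": 0.4, "team": 0.1}
print(select_model(["goal", "team", "goal", "team"], [sports, finance]))
```

The selection step mirrors the two-stage use described above: first decide which language model fits the current conversation, then apply that model during recognition.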
Jacob Goldberger – Publications
- Alan Bekker, Hayit Greenspan and Jacob Goldberger, Multi-view deep learning architecture for classification of breast microcalcifications, ISBI, 2016.
Providing People with Arguments during Persuasive Discussion
Intel Mentor: Yoram Zehavi
Academia Researcher(s): Prof. Sarit Kraus, Bar Ilan University
Participating Student(s):
Ariel Rosenfeld
Osnat Drein
Jonathan Azaria
Research Project Summary:
Thus far, we have developed an advice provision agent for deliberations in a single interaction, where the sole goal of the agent was to provide arguments that the user would find beneficial. The success of the agent was shown in extensive experiments with people. We are currently developing an agent for repeated interactions on different topics, which is a very challenging task. Next, we propose to expand our methodology to account for repeated interactions in persuasive discussions. That is, the agent supports its user by providing persuasive arguments to use in order to convince the other participant to change position on a given topic or to take a desired action. The agent’s success is measured both by the observed change in the other participant’s position or actions and by the user’s reported satisfaction.
The main challenge of this work is the development of efficient algorithms for the generation of contextual and compelling arguments fitting the participants’ preferences, style and personality. This process must be done in real-time.
Sarit Kraus – Publications
- Ariel Rosenfeld and Sarit Kraus. “Providing arguments in discussions based on the prediction of human argumentative behavior.” ACM Transactions on Interactive Intelligent Systems, 2016.