DeText: A deep NLP framework for intelligent text understanding
July 28, 2020
Natural language processing (NLP) technologies are widely deployed to process rich natural language text data for search and recommender systems. Achieving high-quality search and recommendation results requires that information, such as user queries and documents, be processed and understood efficiently and effectively. In recent years, rapidly developing deep learning models have proven successful at improving various NLP tasks, indicating vast potential for further improving the accuracy of search and recommender systems.
Deep learning-based NLP technologies like BERT (Bidirectional Encoder Representations from Transformers) have recently made headlines for showing significant improvements in areas such as semantic understanding when contrasted with prior NLP techniques. However, exploiting the power of BERT in search and recommender systems is a non-trivial task, due to the heavy computation cost of BERT models. In this blog post, we will introduce DeText, a state-of-the-art open source NLP framework for text understanding. We will also describe how DeText grants new capabilities to popular NLP models, and illustrate how neural ranking is designed and developed in DeText.
Introduction to NLP
Natural language text data is generally represented by a sequence of words, from which we aim to mine semantic information for serving specific functions in search and recommendation. Processing natural language data with deep learning technologies can be organized into 6 representative deep NLP tasks, as illustrated in Figure 1:
- Ranking, with typical applications in document ranking for search and recommendation
- Text classification to predict labels on an input sequence, with typical applications like intent prediction and spam classification
- Sequence tagging to predict labels on each word of an input sequence, with typical applications in named entity recognition (NER), such as query tagging, and part-of-speech (POS) tagging
- Sequence completion to generate a completed sequence from a partial sequence, with typical applications like query autocomplete and smart compose
- Sequence generation to generate a sequence from an input sequence, with typical applications like query expansion and machine translation
- Unsupervised representation learning to learn language representations and serve as a foundational language model (e.g., BERT) for the other 5 deep NLP tasks
Figure 1: Representative deep NLP tasks
DeText: A deep neural text understanding framework
The above deep NLP tasks play important roles in the search and recommendation ecosystem, and are crucial in improving various enterprise applications. To provide a unified technical solution, we developed DeText, a deep NLP framework for intelligent text understanding to support these NLP tasks in search and recommendation. Currently, DeText can support ranking, classification, and sequence completion—3 of the 6 representative tasks.
Our logo is inspired by the sloth: Relax like a sloth, and let DeText do the understanding for you
DeText is analogous to the accessories and attachments that come with power tools, such as a cordless drill. While the drill’s “engine” may be inherently powerful, it is important to use the right attachment to achieve your desired result.
Similarly, with DeText, users can swap NLP models depending on the type of task and leverage the models to make search and recommendation systems better than ever before. For example, at LinkedIn, we might use LiBERT (BERT trained on LinkedIn data) to better understand the meaning behind a text query and capture user intention in search (e.g., a search for “sales consultant at Insights” is looking for sales consultant jobs at the company, Insights). But what about using our pre-trained model to also rank the relatedness of different documents to each other, in addition to the user's input query? Through the magic of DeText, we're able to quickly apply the efficiencies in an area like understanding the meaning of a word sequence (semantic meaning) to another area like how word sequences might be related (classification) or which word sequences (documents) might also be relevant to a query (ranking) given one or several inputs. This is a capability that is nontrivial with vanilla BERT or other NLP models, but is easy to do with DeText.
DeText has been applied in various applications at LinkedIn, including search/recommendation ranking, query intent classification, and query autocompletion. In open sourcing the code, our hope is that the research and industry communities will benefit from adopting the framework.
Combining ranking and semantic meaning
Ranking is one of the most common components in search and recommender systems, such as feed/ads ranking (Facebook), web page ranking (Google), movie ranking (Netflix), job ranking (LinkedIn), and product ranking (Amazon). A common part of all these products is dealing with text data. For example, given a particular query (optional in recommendation), a member profile, and a list of documents, the product needs to rank the documents in descending order of relevance. Therefore, a successful ranking model needs to understand the semantics of text data, e.g., identifying similar words and disambiguating word senses.
Recent developments in deep learning-based NLP technologies have greatly deepened the understanding of text semantics with the use of neural networks like CNN (convolutional neural network) and LSTM (long short-term memory). Furthermore, to enhance contextual modeling, BERT has been proposed and has shown significant improvements in various NLP tasks over existing techniques.
However, applying BERT in ranking is a nontrivial task. First, there is no standard way to leverage BERT efficiently and effectively. Second, existing approaches generally compute query and document embeddings together. This rules out pre-computing document embeddings, so these approaches cannot feasibly be integrated into commercial search engines and recommender systems because of online latency concerns. With DeText, we are able to successfully exploit pre-trained BERT models for ranking in a production environment. We further extend DeText so that multiple text encoders are available, such as CNN, LSTM, or BERT.
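To make the latency argument concrete, here is a minimal sketch of why encoding the query and the documents separately matters. It is not DeText code: `encode` is a hypothetical toy encoder standing in for CNN/LSTM/BERT, and the document IDs and texts are made up for illustration. The point is only that document embeddings can be computed once offline, leaving a single query encoding and a few dot products for serving time.

```python
import math

def encode(text, dim=64):
    """Toy stand-in for a text encoder (CNN/LSTM/BERT in a real system).

    Hashes each token into a bucket and L2-normalizes the counts.
    Hypothetical placeholder, not a real semantic encoder.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # Sum of character codes keeps the bucket deterministic across runs.
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))

# Offline: encode every document once and store the result.
documents = {
    "job_1": "sales consultant",
    "job_2": "software engineer",
    "job_3": "senior sales manager",
}
doc_index = {doc_id: encode(text) for doc_id, text in documents.items()}

def rank(query, index):
    """Online: encode only the query, then score it against stored embeddings."""
    q = encode(query)
    scores = {doc_id: cosine(q, emb) for doc_id, emb in index.items()}
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank("sales consultant", doc_index)
```

An approach that feeds the query and document through the encoder together would have to run the encoder once per query-document pair at request time, which is the cost the representation-based design avoids.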
DeText provides a sophisticated design to support neural ranking with the following advantages:
- Support for state-of-the-art semantic understanding models (CNN/LSTM/BERT)
- Balance between efficiency and effectiveness (Figure 2)
- High flexibility in module configurations
One important design principle is to achieve a good balance between efficiency and effectiveness to meet the industry standard. Regarding efficiency, we use a representation-based modeling structure that enables parallel computation and ease of deployment on various representation components without loss of relevance strength. Regarding effectiveness, clients can choose state-of-the-art text encoders and leverage them in end-to-end training for specific applications. In addition, traditional hand-crafted features are carefully handled and combined with deep features to maximize relevance performance.
Figure 2: DeText Neural Ranking Framework
As illustrated in Figure 2, given multiple source texts (queries, user profiles), target texts (documents), and traditional features, a DeText ranking model first computes the semantic relatedness between the sources and the targets. These semantic features are then combined with the hand-crafted traditional features to generate a final score for the source-target relevance. A DeText ranking model has multiple components, each highly configurable:
- Input text data: The input text data are generalized as source and target texts. The source could be queries (in search systems) or user profiles (in recommender systems). The target could be the documents. Both source and target could have multiple fields.
- Word embedding layer: The sequence of words is transformed into an embedding matrix.
- Text embedding layer: DeText provides CNN/LSTM/BERT to extract text embedding. CNN/LSTM are provided as a lightweight solution with small latency. In other cases where complicated semantic meaning extraction is needed, BERT can be used.
- Interaction layer: Multiple interaction methods are available to compute deep features from the source and the target embeddings: Cosine Similarity, Hadamard Product, Concatenation, etc.
- Traditional feature processing: Feature normalization and element-wise rescaling are applied to the hand-crafted traditional features. This helps ensure that the deep learning models are at least as good as the shallow models.
- MLP layer: The deep features from the interaction layer are concatenated with the traditional features. These features are fed into a Multilayer Perceptron (MLP) layer to compute the final target score. The hidden layer in MLP is able to extract the non-linear combination of deep features and traditional features.
- LTR layer: The last layer is the learning-to-rank layer that takes multiple target scores as input. DeText provides the flexibility of choosing pointwise, pairwise, or listwise LTR, as well as LambdaRank. In applications focusing on relative ranking, pairwise and listwise LTR can be used. When modeling the click probability is important, pointwise LTR can be used.
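The components above can be sketched end to end in a few lines. This is an illustrative sketch under stated assumptions, not the DeText implementation: the embeddings, the single traditional feature, its mean and standard deviation, and the MLP weights are all hand-picked hypothetical values, and the two loss functions are textbook pointwise (sigmoid cross-entropy) and pairwise (logistic) losses rather than the exact losses DeText uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def interaction_features(source_emb, target_emb):
    """Interaction layer: cosine similarity, Hadamard product, concatenation."""
    hadamard = [x * y for x, y in zip(source_emb, target_emb)]
    return [cosine(source_emb, target_emb)] + hadamard + source_emb + target_emb

def normalize(features, means, stds):
    """Traditional feature processing: standardize with precomputed statistics."""
    return [(f - m) / s for f, m, s in zip(features, means, stds)]

def mlp_score(x, w_hidden, b_hidden, w_out, b_out):
    """MLP layer: one hidden ReLU layer followed by a linear output unit."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def pointwise_loss(score, label):
    """Sigmoid cross-entropy on one score; suits click-probability modeling."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def pairwise_loss(score_pos, score_neg):
    """Logistic loss on the score margin; only the relative order matters."""
    return math.log(1.0 + math.exp(-(score_pos - score_neg)))

# Hypothetical embeddings; in practice these come from the text embedding layer.
source = [0.6, 0.8]
relevant_target = [0.6, 0.8]
irrelevant_target = [0.8, -0.6]

# One hypothetical traditional feature per target, with assumed mean and std.
means, stds = [2.0], [1.0]

# Hand-picked weights for illustration: hidden unit 0 reads the cosine feature,
# hidden unit 1 reads the normalized traditional feature (8 inputs in total).
w_hidden = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
b_hidden = [0.0, 0.0]
w_out, b_out = [1.0, 0.5], 0.0

def score(target_emb, raw_traditional):
    """Concatenate deep and traditional features, then apply the MLP."""
    x = interaction_features(source, target_emb) + normalize(raw_traditional, means, stds)
    return mlp_score(x, w_hidden, b_hidden, w_out, b_out)

score_rel = score(relevant_target, [3.0])
score_irr = score(irrelevant_target, [1.0])
```

With these values, the relevant target scores higher than the irrelevant one, so `pairwise_loss(score_rel, score_irr)` is smaller than the loss for the reversed order. In a trained model the weights would of course be learned from data rather than set by hand.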
We have applied DeText in various LinkedIn applications. In search ranking, we have observed single- to double-digit lifts in business metrics for people search, job search, and help center search. In these cases, the source text is the query, and the target texts are the document fields. For example, in job search (Figure 3), the document fields are job title and company. The comparisons are BERT vs. CNN/LSTM, and CNN/LSTM vs. traditional models. Offline improvements in recommendation ranking have also been observed, with online A/B tests planned. In addition, significant online improvements have been achieved in classification tasks such as query intent classification, and in sequence completion tasks such as query autocomplete.
Figure 3: Examples of search ranking systems at LinkedIn
Getting started with DeText
DeText uses the TensorFlow Estimator framework with TF-ranking integrated. If you’re interested in trying DeText for your application, please check out our GitHub repo. DeText is an active research project, and feedback and contributions from the community are welcome.
DeText is developed in the AI Foundations team at LinkedIn: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long. We thank the following members for their advice, support and collaboration: Mingzhou Zhou, Yu Gan, Zhoutong Fu, Ananth Sankar, Liang Zhang, Bee-Chung Chen, Deepak Agarwal, Sandeep Jha, Zimeng Yang, Yu Liu, Jiadong Yin, Song Yan, Dylan Wang, Abhi Lad, Qi Guo, Haifeng Zhao, Meng Meng, Badrul Sarwar, Swanand Wakankar, Vitaly Abdrashitov, Cagri Ozcaglar, Nitin Panjwani, Lijun Peng, David Pardoe, Yi Zhang, Onkar Dalal, Kevin Kao, Jeffrey Lee, Caleb Johnson, Zinan Xing, Zeke Huang, Michael Chernyak, Xiaoxia Feng, Vikas Jain, Scott Banachowski, Pei-Lun Liao, Ann Yan, Yu Gong, Haitong Tian, Jimmy Guo, Keqiu Hu, Zhe Zhang, Sathish Kumar Palaniswami, Xinling Dai, James Gatenby, and Rachel Zhao.
Weiwei Guo, Huiji Gao, Jun Shi, Bo Long, Liang Zhang, Bee-Chung Chen, and Deepak Agarwal. "Deep Natural Language Processing for Search and Recommender Systems." In KDD 2019.
Weiwei Guo, Huiji Gao, Jun Shi, and Bo Long. "Deep Natural Language Processing for Search Systems." In SIGIR 2019.
Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, Ananth Sankar, Zimeng Yang, Qi Guo, Liang Zhang, Bo Long, Bee-Chung Chen, and Deepak Agarwal. "DeText: A Deep Text Ranking Framework with BERT." In CIKM 2020.
Sida Wang, Weiwei Guo, Huiji Gao, and Bo Long. "Efficient Neural Query Auto Completion." In CIKM 2020.