LinkedIn has a vast quantity of data. While much of the data is structured—graph nodes and edges, normalized fields in database records—a great deal of it is simply natural language text. Attaching structure and meaning to this text is essential to LinkedIn’s overall mission of connecting its members to opportunity.
The NLP team provides natural language processing (NLP) tools, analyses and features for use throughout the entire company. We analyze text in English and many other languages, covering more than 95 percent of people on the planet.
Our vision is to use natural language understanding to help corroborate and complete the Economic Graph. We want to find entities, relations and economic events that correlate with and complement LinkedIn's understanding of the world's economy, so that we as a company can help every person and organization everywhere reach their fullest potential. For us, an entity is a person, an organization, a location or one of the other types of entities that LinkedIn cares about, such as a title or a skill.
Our mission is to take unstructured text, analyze it along with information from our structured and semi-structured sources and produce useful structured representations for all of LinkedIn's current and future product areas. These areas comprise content (e.g., SlideShare, online courses, groups and discussions), jobs, feed, SEO, search and ads. Crucial to this mission is providing structured analyses in the primary and secondary languages of all of LinkedIn’s members worldwide.
Our main projects are:
- BaseNLP: This is our multilingual NLP pipeline that provides all the analyses of a full-blown information extraction engine, tailored to the type of text LinkedIn needs to analyze. The processing steps include language identification, sentence breaking, tokenization/segmentation, stemming/lemmatization, part-of-speech tagging and entity mention detection, and will soon include syntactic parsing, coreference resolution, entity resolution and relation/event extraction.
- PolyglotMT: We have in-house machine translation (MT) models that can work in conjunction with out-of-the-box models to provide state-of-the-art automatic translation from one language to another. We are using MT to make more LinkedIn content available to more people on the planet, and to help spur the creation of more content in the languages of the world.
- Query understanding: For query understanding, we apply specialized versions of the models developed for BaseNLP. The structured representations we infer from query text allow our search platform to find more meaningful results, and to find them with more precision.
- Large-scale language modeling: A language model estimates the likelihood of any sequence of words, and is essential for many different NLP and related tasks, such as authorship detection, parsing, machine translation or speech recognition. Our infrastructure can train models on web-scale data, i.e., billions of documents, and can provide real-time probability estimates for sentences.
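
To make the idea of "estimating the likelihood of a sequence of words" concrete, here is a minimal sketch of a bigram language model in Python. It is a toy illustration only, not a description of LinkedIn's infrastructure: the class name `BigramLM`, the method `log_prob`, whitespace tokenization and add-one (Laplace) smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter

# Hypothetical sentence boundary markers used only in this sketch.
BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word appears as a left context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        vocab = set()
        for sentence in sentences:
            tokens = self._tokenize(sentence)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
            vocab.update(tokens[1:])     # every token that can be predicted, including EOS
        # Reserve one extra slot so unseen words still get non-zero probability.
        self.vocab_size = len(vocab) + 1

    @staticmethod
    def _tokenize(sentence):
        # Assumption: lowercase whitespace tokenization; real pipelines are far richer.
        return [BOS] + sentence.lower().split() + [EOS]

    def log_prob(self, sentence):
        """Return the natural-log probability of the sentence under the model."""
        tokens = self._tokenize(sentence)
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


corpus = [
    "the engineer posts a job",
    "the recruiter posts a job",
]
model = BigramLM(corpus)
fluent = model.log_prob("the engineer posts a job")
scrambled = model.log_prob("job a posts engineer the")
print(fluent > scrambled)
```

A sentence whose word pairs were seen in training receives a higher log-probability than a scrambled version of the same words, which is the signal that tasks such as machine translation or speech recognition use to rank candidate outputs. Production systems typically replace the counting and smoothing shown here with more sophisticated estimators and distributed storage.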