Lessons from 1.5 Years of our AI & Data Reading Group
December 3, 2018
Group photo of the reading group for Data Standardization, Knowledge Graphs, Natural Language Understanding, and Conversational AI at LinkedIn, celebrating the group’s 17-month anniversary
For the past 17 months, a group of roughly 20 people of different ages, genders, and backgrounds has been meeting every Friday at LinkedIn’s Silicon Valley HQ. At these meetings, they discuss cutting-edge research in specialized areas of machine learning and artificial intelligence (AI). Seeing this group in action, you’d be forgiven for assuming that they were reviewing literature for the editorial board of a scientific journal, or working on a new top-secret project. However, the group is neither of these things: these gatherings are actually the regular meetings of the LinkedIn Data & AI Reading Group.
Scaling research knowledge, expertise, and curiosity
Encouraging knowledge-sharing and learning within any engineering or research organization is important. This is especially true at a company like LinkedIn, where the valuable data contained in our Economic Graph can be combined with the latest data science and AI techniques to create new solutions to problems affecting our members and customers. However, researchers and practitioners in fields like AI, data science, and machine learning face a unique challenge when it comes to reviewing, assimilating, and discussing the newest developments: according to the 2017 AI Index Report, the number of new AI papers published each year has increased by more than 9x since 1996.
To help tackle this problem of scale, one way to encourage intellectual enrichment and collaboration is to hold regular meetings of like-minded peer researchers, much as college and university programs do. Bee-Chung Chen and I initiated the reading group, and Staff Software Engineer Jaewon Yang has led it independently since its launch in July 2017.
Creating the group
At first, this new group at LinkedIn focused specifically on the review of literature related to chatbots. We would sit together every Friday and discuss a cutting-edge paper—just like we used to do when reviewing our homework back in school. Over time, the focus of the group broadened to include research in fields such as conversational AI systems, natural language understanding, data standardization, and knowledge graphs.
Here is the format we’ve been following. Each week, one person volunteers to present a paper, and announces their choice on the Monday before the meeting. To help the presenter pick a paper, we maintain a wiki list of interesting candidates. In the meeting, the presenter gives the audience a deep dive on the paper. After we cover its content, we discuss high-level takeaways, such as what we could leverage in our own work and the strengths and weaknesses of the paper. In some cases, we decide to try the main ideas of a paper in our own projects, such as using cross-lingual word embeddings, using multiple rules to annotate our data sets, or dividing a compound word into subwords.
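To make the last of those ideas concrete, here is a minimal sketch of greedy longest-match subword segmentation, one common way to divide a compound word into subwords. The vocabulary and function name are hypothetical, chosen purely for illustration:

```python
def split_subwords(word, vocab):
    """Greedily split a word into the longest subwords found in vocab.

    Falls back to single characters when no vocabulary entry matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here; emit one character.
            pieces.append(word[i])
            i += 1
    return pieces

# Hypothetical subword vocabulary for illustration.
vocab = {"data", "base", "stand", "ization"}
print(split_subwords("database", vocab))  # ['data', 'base']
```

Production systems typically learn the subword vocabulary from data (for example, with byte-pair encoding), but the segmentation step follows this same longest-match spirit.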
Below are just a few of the interesting AI topics that we have covered in the past 17 months:
- Knowledge graph based question answering
- Semantic parsing
- Dialogue management systems
- Translating natural language to database queries
- Entity identification and resolution
- Machine reading comprehension
- Sequence to sequence models
- Natural language understanding
- Machine translation
- Cross-lingual word embeddings
- Machine learning for taxonomy generation
- Generating training data from weak supervision
Across these topics, we’ve focused especially on state-of-the-art deep learning methods that can tremendously improve the way an AI agent understands the semantic meaning of natural language sentences, much like a human brain does. Below are three example papers that were featured in our reading group:
Word Translation Without Parallel Data (Conneau et al., 2018) is a paper about developing word-to-word alignments between two languages without human supervision.
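As a rough illustration of the alignment idea in this line of work, the refinement step can be cast as an orthogonal Procrustes problem: given embeddings for pairs of words assumed to translate each other, find the rotation that maps one language’s vectors onto the other’s. The sketch below uses synthetic data in place of real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for seed-dictionary embeddings:
# X holds "source language" vectors, Y the "target language" ones,
# related here by a hidden rotation R_true.
X = rng.normal(size=(50, 8))
R_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Y = X @ R_true

# Orthogonal Procrustes solution: W = U V^T from the SVD of X^T Y
# minimizes ||X W - Y|| over orthogonal matrices W.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.allclose(X @ W, Y))  # True: the hidden rotation is recovered
```

The paper’s contribution is learning such a mapping adversarially, without any seed dictionary; this sketch shows only the closed-form alignment step on idealized data.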
Phrase-Based & Neural Unsupervised Machine Translation (Lample et al., 2018) is a paper on developing sentence-to-sentence translation models without a bilingual text corpus.
End-to-End Memory Networks (Sukhbaatar et al., 2015) is a paper about developing a neural network that can remember short conversations and answer simple questions based on the memory.
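The core mechanism in that last paper is a soft-attention read over stored memory vectors. Here is a minimal numpy sketch of a single memory “hop,” with randomly generated stand-ins for the learned embeddings (all names are illustrative, not taken from the paper’s code):

```python
import numpy as np

def memory_hop(query, mem_in, mem_out):
    """One attention hop over memory, in the spirit of end-to-end
    memory networks: weight each memory slot by its match to the
    query, then read out a weighted sum of output embeddings."""
    scores = mem_in @ query                 # similarity per memory slot
    p = np.exp(scores - scores.max())
    p /= p.sum()                            # softmax attention weights
    o = p @ mem_out                         # attended memory readout
    return query + o                        # updated controller state

# Random stand-ins for learned sentence embeddings (5 slots, dim 4).
rng = np.random.default_rng(0)
query = rng.normal(size=4)
mem_in = rng.normal(size=(5, 4))
mem_out = rng.normal(size=(5, 4))
state = memory_hop(query, mem_in, mem_out)
print(state.shape)  # (4,)
```

In the full model, several such hops are stacked and the final state is projected to a softmax over candidate answers.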
We’ve applied our learnings to develop deep learning methods for many LinkedIn applications. For example, we have used neural networks to find the right articles to answer member questions in the LinkedIn Help Center, to segment job posting text into different sections, and to build an internal analytics chatbot (Anabot) that handles questions about LinkedIn product metrics.
In a KDD 2018 tutorial, titled "End-to-end goal-oriented question answering," we presented a special deep dive on the more than 100 recent papers on question answering, discussing their common technical components along with our own practical experiences. On Jan. 28, we will also present a tutorial on this topic at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
We also covered engineering designs that enable end-to-end systems in two LinkedIn use cases:
Analytics Bot (Anabot) is an internal Q&A bot that can answer questions about company metrics. The data analytics team uses the bot to address questions they get from business partners.
Help Center Search is an AI model we developed that can understand members’ questions in the Help Center article search.
In organizing a reading group, we’ve learned that topics with strong connections to our daily work lead to the most fruitful discussions and actionable plans. For example, if a paper is about cross-lingual word embeddings, we can discuss during the meeting how to apply it to our i18n standardization work. We’ve also learned that presenting a paper takes time and effort, so we’ve set a culture in which everyone volunteers to present a paper at some point, no matter how senior they are. We think that this culture of equality is a key reason why the group has sustained itself for 17 months.
Reading and discussing papers as a group raises the knowledge level of the entire group. When we started the reading group, we mostly discussed basic concepts. But after 17 months, we’ve built a deeper understanding that we can put into practice in our work.
Learning is an attitude and a lifelong mission. We gratefully thank all the members of this reading group for staying in the same boat for the past 17 months. Such long-term, cross-team collaborations help us learn not just in school, but throughout our lives. We are continually learning for mastery, for work, and for life.