Getting to Know Qingbo Hu

June 23, 2017

Qingbo Hu is a Senior Business Analytics Associate in LinkedIn’s Analytics Data Mining team. His team provides end-to-end data mining solutions and builds machine learning models to support our partners from LinkedIn’s various business lines.


Prior to joining LinkedIn as a full-time employee, Qingbo completed an internship with us on the Analytics Data Mining team. He has also interned with Walmart’s e-commerce unit on the SEO team. He holds a Ph.D. in computer science from the University of Illinois at Chicago.


What are some of the coolest projects that you and your team have been working on?
Among all the projects my team and I have worked on, I want to highlight three. The first is Magnet, a powerful business intelligence engine that provides analysis, feature reasoning, prediction, and more for our business partners, sales agents, and other stakeholders. The system leverages many cutting-edge big data technologies, such as Apache Spark and Hadoop, to efficiently analyze millions of business records and provide actionable insights and predictions.

The second project is related to Search Engine Optimization (SEO) for web pages. To improve the search rankings of certain LinkedIn web pages, we designed a framework that extracts features related to those pages and trains machine learning models to optimize their link structure. We also designed a Natural Language Processing (NLP) pipeline that automatically generates a text summary for those pages to improve their content quality.

Last but not least, we are also involved in projects related to LinkedIn’s Economic Graph Research program. We build machine learning models and powerful graph processing tools to help us extract insightful information from LinkedIn’s Economic Graph, which contains more than 500 million members, more than 10 million job postings, and many other entities and their relationships. The project I introduced in my talk at Spark Summit 2017 was one of the graph processing tools we have developed to handle and analyze social activity graphs on LinkedIn.

Speaking of Spark Summit, can you tell us more about your current research and recent presentations?
While working on my Ph.D. program, I specialized in research topics in data mining and machine learning. To be more specific, I am especially interested in problems related to social network analysis and, in a more general form, information network analysis. For example, how can we use mathematical models to capture and describe the information propagation process on social networks? How can we infer and quantify the influence between two users in a social network? In the summer of 2015, when I worked as an intern at LinkedIn, I developed a graph-based algorithm that helps sales agents more successfully identify potential enterprise customers. Compared to traditional machine learning models, such as Logistic Regression and Random Forest, the new model is much more accurate in the unique problem settings of B2B sales. This project led to two patents jointly filed by LinkedIn, as well as a research paper at the WWW 2016 conference.

After I joined LinkedIn as a full-time employee, I continued to work on various machine learning and data mining tasks, as well as creating graph processing tools. For example, at this year’s Spark Summit, I was invited to give a talk about a tool we developed for processing multi-label graphs. Unlike single-label graphs, the nodes and edges in multi-label graphs can have various labels to mark the different types of nodes and edges. It’s challenging to extract features and analyze the graphs at the label level in a scalable way. However, due to their generality, multi-label graphs are ubiquitous in today’s social networks and in many other types of networks, and they are closely tied to numerous real-life applications. The tool we have developed uses the GraphX library in Apache Spark to support scalable multi-label graph processing and analysis. For example, it is able to compute PageRank scores for ~2 million nodes and ~76 million edges with 3 different labels in 30-40 minutes. The slides and the video of this talk are already published and can be accessed publicly.
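To make the idea of label-level analysis concrete, here is a minimal single-machine sketch in plain Python. It is not the tool described above and does not use GraphX; it only illustrates one possible reading of "PageRank per label", assuming each edge carries exactly one label and that a separate PageRank is computed over the edges of each label. The function name `pagerank_by_label`, the edge format, and the example labels are all hypothetical.

```python
def pagerank_by_label(edges, damping=0.85, iterations=50):
    """Compute one PageRank vector per edge label.

    edges: iterable of (src, dst, label) tuples (assumed format).
    Returns {label: {node: score}}; scores for each label sum to 1.
    """
    edges = list(edges)
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    n = len(nodes)

    # Group outgoing links by label: {label: {src: [dst, ...]}}.
    out_links_by_label = {}
    for src, dst, label in edges:
        out_links_by_label.setdefault(label, {}).setdefault(src, []).append(dst)

    results = {}
    for label, out_links in out_links_by_label.items():
        rank = {node: 1.0 / n for node in nodes}
        for _ in range(iterations):
            # Rank held by nodes with no outgoing edge of this label
            # is redistributed uniformly so that total rank stays 1.
            dangling = sum(rank[node] for node in nodes if node not in out_links)
            base = (1.0 - damping) / n + damping * dangling / n
            new_rank = {node: base for node in nodes}
            for src, targets in out_links.items():
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            rank = new_rank
        results[label] = rank
    return results


if __name__ == "__main__":
    # Hypothetical social activity graph with two edge labels.
    example_edges = [
        ("a", "b", "follow"),
        ("b", "a", "follow"),
        ("a", "c", "message"),
    ]
    for label, scores in pagerank_by_label(example_edges).items():
        print(label, scores)
```

A distributed implementation would partition the graph and iterate in parallel rather than loop over Python dictionaries, but the per-label grouping step is the same idea.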

What is something about you not found on your LinkedIn profile?
I am a big fan of water sports: swimming, snorkeling, kayaking, etc. As a result, unlike people who go to Lake Tahoe every winter for skiing, I go there every summer for water sports because the water is so clean. I am also an enthusiastic pet lover. I have a 10-year-old Shih Tzu, Bingo, who has his own Instagram page and a group of fans. I am also a fan of the Golden State Warriors, and I am SUPER excited about their 2017 championship title!

What are your favorite things to do when you’re not at the office?
I learned to play guitar when I was a sophomore in college. Since then, I have occasionally played guitar for fun in my leisure time. I am also a BIG foodie, which means I not only enjoy good food but also constantly try out different recipes to share with my friends. I also go to the theater to watch movies often, and hardcore sci-fi movies are my favorite!