Behind "Big Data" and "AI": Elements of Modern Data Science
April 5, 2018
Anyone who follows tech industry news has heard of “big data” and “AI.” Although neither term has an industry-wide definition, most people agree that both play an increasingly important role and that we need to understand and use them better in our personal and professional lives. But wouldn’t it be interesting to look beyond the marketing hype for a moment and talk about the people who solve real business problems with these technologies: who they are, which skills they have, and what they actually do? If so, read on, and we’ll share some practical aspects of how data science and analytics work behind the scenes at LinkedIn.
With the mission to “Drive understanding and impactful decision making through rigorous use of data,” the Analytics team at LinkedIn has been growing quickly, with no sign of slowing down (we currently have 20+ openings worldwide). We constantly look for top talent to join us in building LinkedIn’s Economic Graph and analyzing it to discover insights that create economic opportunities for our members and customers. When evaluating new candidates for our team, we typically interview them in the following areas.
Although it doesn’t sound as cool as “data science,” and people refer to it by many different names (hence the multiple forward-slashes in this section heading), this is one of the core skills we believe a data scientist must have. It covers everything you have to do to understand your data and get it ready for deeper analysis, data mining, machine learning, and so on. Unlike most of what’s taught in school, in the real world you almost never get “perfect” data, especially when you work with cutting-edge technology that changes every day. This means you have to know the business context of the data well enough to interpret it, clean it, and transform it into a consumable format. It may sound easy, but it really is not. This step can easily take more than 50% of the time on a project, even if you are skilled at it and get it right the first time. LinkedIn, as a fast-growing business whose products change quickly to meet our members’ and customers’ professional needs, is no exception. Typically, we ask candidates to perform a series of data manipulations on a particular dataset, such as aggregation, computing distributions, and ordering, using a programming language such as SQL, R, or Python. The goal is not to test exact syntax, but rather the approach and thought process, and how well candidates make reasonable judgments based on the business context.
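To make this kind of exercise concrete, here is a minimal sketch in plain Python (standard library only, standing in for SQL, R, or a dataframe library). The dataset, its field layout, and the cleaning rules (trimming and upper-casing country codes, dropping exact duplicates, discarding missing or negative counts) are hypothetical assumptions for illustration, not an actual LinkedIn interview question.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw page-view log: (member_id, country, page_views), messy on purpose.
RAW_ROWS = [
    ("m1", " US", "12"),
    ("m2", "us", "7"),
    ("m3", "IN", ""),      # missing value
    ("m4", "in ", "30"),
    ("m5", "BR", "-3"),    # invalid: negative count
    ("m6", "BR", "5"),
    ("m1", " US", "12"),   # exact duplicate of the first row
]


def clean(rows):
    """Normalize country codes, drop exact duplicates, and discard missing or invalid counts."""
    seen = set()
    cleaned = []
    for member_id, country, views in rows:
        record = (member_id, country.strip().upper(), views.strip())
        if record in seen:
            continue
        seen.add(record)
        if not record[2].isdecimal():  # rejects "", "-3", and other non-numeric strings
            continue
        cleaned.append((record[0], record[1], int(record[2])))
    return cleaned


def views_by_country(rows):
    """Aggregate total page views per country, ordered from highest to lowest."""
    totals = defaultdict(int)
    for _, country, views in rows:
        totals[country] += views
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


rows = clean(RAW_ROWS)
print(views_by_country(rows))         # [('IN', 30), ('US', 19), ('BR', 5)]
print(median(v for _, _, v in rows))  # 9.5: median page views per cleaned row
```

With this toy input, cleaning keeps four of the seven rows. The judgment calls, such as whether a negative count should be dropped or investigated with the team that owns the data, matter more here than the code itself.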
Experiment design and A/B testing
Statistics knowledge is a must-have for a data scientist. In particular, knowing how to design experiments and run A/B tests for different business use cases is essential. We want every team member to understand basic statistics concepts (e.g., hypothesis testing, mean/median, variance, probability distributions, sample size calculation, and power calculation), design and analyze experiments, and apply them in a business setting. We expect candidates not only to have the theoretical knowledge, but also to know how to use it proactively to guide product development scientifically. This includes how to design success metrics, how to set up an experiment plan, and how to provide timely insights that guide ramping a test from a small pilot to 100% of the member base, which often requires several iterations to get the product features right for member satisfaction and the desired business outcomes. For junior data scientists, we focus more on a basic understanding of key statistics concepts and the business sense candidates show when applying those concepts to real use cases. For senior candidates, we expect relevant industry experience and in-depth statistics knowledge, so that they can not only answer questions but also drive toward solutions that optimize both member experience and business impact.
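Sample size calculation and hypothesis testing can be sketched for the common case of comparing two conversion rates. This is a minimal example assuming a two-sided test, members randomly and independently assigned to control and treatment, and a normal approximation (the pooled two-proportion z-test); the function names and example rates are hypothetical. It relies on `statistics.NormalDist` from the Python standard library (3.8+).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Members needed in EACH arm to detect p_control -> p_treatment (two-sided test)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    spread = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    )
    return ceil(spread ** 2 / (p_treatment - p_control) ** 2)


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Pooled two-proportion z-test. Returns (absolute lift, z statistic, two-sided p-value)."""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    p_pool = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the z-test is undefined")
    z = (p_t - p_c) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_t - p_c, z, p_value


# Hypothetical example: 10% baseline conversion, hoping to detect a lift to 11%.
print(sample_size_per_group(0.10, 0.11))  # roughly 14,750 members per arm
print(two_proportion_z_test(1000, 10000, 1100, 10000))
```

A one-point lift on a 10% baseline needs on the order of 15,000 members per arm at α = 0.05 and 80% power, which illustrates why a small pilot alone may lack the traffic to detect an effect of that size. The z-test also assumes enough conversions in each group for the normal approximation to hold; with very small counts, an exact test is a safer choice.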
Statistical modeling/Machine learning
Statistical modeling or machine learning skills are required for a data scientist to do the job well. We look at the candidate’s ability to formalize a business problem as a machine learning problem, select the proper modeling algorithms, and build models following the right process of training, testing, and validation. A data scientist on the Analytics team typically doesn’t spend the majority of their time on modeling. However, we have found that knowing common machine learning algorithms and how to apply them in specific business contexts is key to a successful data science career. Without a proper understanding of this area, it’s easy to misinterpret data, which can lead to flawed decisions or worse outcomes. The constant debates around “correlation versus causation” and “wrong data is worse than no data” are good examples. Equally important is picking the right algorithm (knowing the pros and cons of each, e.g., logistic regression, linear regression, decision trees, and deep learning) for the type of business problem you are solving.
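The training, validation, and testing process can be illustrated with a small logistic regression written from scratch. This is a minimal sketch on hypothetical synthetic data (one feature, labels generated with Gaussian noise); the split sizes, candidate L2 strengths, learning rate, and epoch count are arbitrary assumptions. In practice you would typically use a library such as scikit-learn rather than hand-rolled gradient descent.

```python
import math
import random


def make_data(n, seed=0):
    """Hypothetical synthetic data: one feature; the label is more likely 1 when x is large."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        y = 1 if x + rng.gauss(0, 1) > 0 else 0
        data.append((x, y))
    return data


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fit_logistic(data, l2=0.0, lr=0.1, epochs=300):
    """Batch gradient descent on L2-regularized log loss; returns (weight, bias)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * (grad_w / n + l2 * w)  # the bias is not regularized
        b -= lr * grad_b / n
    return w, b


def accuracy(model, data):
    """Share of examples where the 0.5-threshold prediction matches the label."""
    w, b = model
    return sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in data) / len(data)


data = make_data(600)
train, valid, test = data[:400], data[400:500], data[500:]  # 400 / 100 / 100 split

# Choose the regularization strength using the validation set only...
best_l2 = max([0.0, 0.01, 0.1, 1.0], key=lambda l2: accuracy(fit_logistic(train, l2=l2), valid))
# ...then report performance once on the untouched test set.
final = fit_logistic(train, l2=best_l2)
print(best_l2, accuracy(final, test))
```

The key design choice is that the test set is used exactly once, after the hyperparameter has been chosen on the validation set; tuning against the test set would make its accuracy an optimistic estimate.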
Besides the technical skills (or “hard skills”) we check during the interview, we also closely evaluate candidates’ communication, project management, and influence skills, all of which we consider equally important. The art of being a data scientist includes effectively influencing others based on what you’ve found in the data, which is often the hardest part of driving a data-driven decision-making culture. The questions we ask can include: How do you summarize your findings clearly and succinctly? How do you handle stakeholders who are not convinced by the analysis results? How do you answer questions about your algorithms or methodology from people who are not technical? And how do you manage a project that isn’t going as planned and turn it around? Ultimately, the goal is to take the insights generated from the analysis and use them to influence critical decisions that drive business impact. The “hard skills” and “soft skills” must work together for a data scientist to succeed.
Case studies and problem-solving
Being a data scientist is not an easy job, especially because you must understand business use cases really well to solve problems in a data-driven way. This requires good business domain knowledge, critical analytical thinking, familiarity with root cause analysis, and the ability to communicate results effectively to influence business decisions. Here we assess a candidate’s ability to solve a business case with the right analytical approach and reasonable data intuition, as well as the ability to make relevant, actionable recommendations based on data insights. The case studies can come from business domains such as products, marketing, or sales, all based on what you would experience day to day at work. We make this a wildcard module that lets candidates choose a business domain they are more interested in or more experienced with, so they can fully demonstrate their business sense. This also helps us identify the sub-teams within Analytics that best fit each candidate.
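Root cause analysis of a metric change often starts by asking whether segments started performing differently or the traffic mix shifted between segments. The sketch below uses one common technique, a mix/rate (shift-share) decomposition, on hypothetical desktop and mobile numbers. It is an illustration under those assumptions, not a description of how LinkedIn’s case studies are run or scored, and it assumes both periods contain the same segments.

```python
# Hypothetical (sessions, signups) per device segment for two periods.
BEFORE = {"desktop": (6000, 600), "mobile": (4000, 200)}
AFTER = {"desktop": (4000, 400), "mobile": (6000, 300)}


def overall_rate(segments):
    """Overall rate = total numerator / total denominator across segments."""
    return sum(n for _, n in segments.values()) / sum(d for d, _ in segments.values())


def decompose_rate_change(before, after):
    """Split the change in an overall rate into per-segment mix and rate effects.

    Uses the exact identity  R1 - R0 = sum((w1 - w0) * r0) + sum(w1 * (r1 - r0)),
    where w is a segment's share of the denominator and r is its rate.
    Assumes both periods contain the same segments.
    """
    total0 = sum(d for d, _ in before.values())
    total1 = sum(d for d, _ in after.values())
    effects = {}
    for seg, (d0, n0) in before.items():
        d1, n1 = after[seg]
        w0, w1 = d0 / total0, d1 / total1
        r0, r1 = n0 / d0, n1 / d1
        effects[seg] = {"mix": (w1 - w0) * r0, "rate": w1 * (r1 - r0)}
    return effects


print(overall_rate(BEFORE), overall_rate(AFTER))
for seg, eff in decompose_rate_change(BEFORE, AFTER).items():
    print(seg, eff)
```

In this toy example the overall rate falls from 8% to 7% even though neither segment’s own rate changed: the entire drop is a mix effect from traffic moving toward mobile, the lower-converting segment. A product regression and a traffic shift call for very different recommendations, which is why separating the two is a useful first step.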
For more information
Thanks for reading this far. If you feel you have all the skills mentioned in this blog post and are passionate about LinkedIn’s vision of “Creating economic opportunity for every member of the global workforce,” you can apply for current job openings with the Analytics team here. You can also read other blog posts by the team to learn more about Analytics and Data Science at LinkedIn.