Monitoring business performance data with ThirdEye smart alerts
June 25, 2020
At LinkedIn, ThirdEye is used for business and platform health metrics monitoring, keeping track of a variety of metrics across production infrastructure, AI model performance, and key business indicators (e.g., page views or click counts). It’s a key quality assurance system for two reasons: its rule- or model-based anomaly detection reduces false alarms, and its multiple interactive root cause analysis tools help metrics owners narrow down the cause of an anomaly.
In this blog post, we explain how ThirdEye smart alerts and automated dashboards helped the LinkedIn Premium business operations team monitor key metrics, such as new free trial signups, for the timely detection of outliers in business performance data.
Data-driven business decisions through anomaly detection
LinkedIn has an extremely complex data ecosystem operating with 8K+ services, 2K+ tracking events, 8K+ deployments, and 300+ experiments every day. Uncovering data blind spots in this growing data was critical to the success of Premium business health monitoring, but it was becoming difficult to do at scale. The Premium business operations team, in collaboration with the ThirdEye (anomaly detection) team, discovered a clear opportunity to uncover data blind spots quickly and efficiently by leveraging the existing tools for both business metric and system performance monitoring.
For any subscription business at LinkedIn, it is critical to monitor member signups from various channels (e.g., new free trials) on a regular basis to understand business performance and respond with a clear, actionable plan. The Premium business operations team conducts business performance management on a daily and weekly basis, with a primary focus on weekly changes. There are various ways to pinpoint where members sign up, such as attribution (in-product vs. marketing), country, and device (e.g., desktop vs. mobile). During performance management, it's imperative that the team is able to identify the exact driver behind anomalies in signups so that it can quickly address issues and continue tracking to business health success plans.
Despite this rigorous performance management process, the team didn't have a formalized way to automatically track granular changes in signups. Tracking codes indicate the exact source in consumer products (or marketing campaigns) where members sign up and are the most granular way to track changes. However, there are thousands of tracking events to monitor, and it is extremely time-consuming to track the performance of each individual tracking code.
It was a constant challenge for the team to monitor subscription-based member signups from various channels in a granular way across this complex ecosystem. This led to data blind spots, missed opportunities, and delays in detecting and remediating issues that impacted the business. The best available option was to manually look at the aggregated data in dashboards, but this was not enough to extract actionable insights or perform root cause analysis.
Eventually, as the data grew, the team realized that the time needed to monitor and track changes in metrics would only increase. At one point, the team had to contend with over 1,700 separate tracking codes for online A/B tests while depending on an ad hoc system for identifying anomalies among our key business metrics. There was no centralized way to see the changes in several metrics simultaneously at a glance.
ThirdEye as a ready-made solution
To uncover data blind spots in the subscription trend data accurately and quickly, the Premium business operations team sought to leverage automated tools to detect changes in metrics. At LinkedIn, ThirdEye is a well-known solution that grew from the need to unify various teams and business lines across monitoring use cases without incurring the technical costs of supporting multiple monitoring infrastructures. The team collaborated with the ThirdEye team to leverage ThirdEye’s smart alerts and dashboards to automate Premium business performance management at scale.
A “before and after” comparison makes the value of leveraging ThirdEye clear.
Before leveraging ThirdEye, the team relied on ad hoc analysis and manual data mining techniques that:
- Tracked changes in business metrics, particularly for changes in signups, at the type level.
- Provided a deep dive into business data tracking codes only when an outlier was detected at the type level.
- Manually tracked gradual changes in business trends and metrics on an ad hoc basis, particularly for changes in survival and renewal rate.
After leveraging ThirdEye, the team was able to:
- Automatically track and detect outliers in the top 200 business data tracking events, which contribute several hundred signups per week.
- See a high-level view of gradual changes in business trends and metrics that were automatically sent via email every week, with deep-dives conducted as needed.
Key ThirdEye capabilities
ThirdEye anomaly detection is leveraged for smart alerts that catch outliers in business performance data before they escalate into negative business impact. All business metrics at LinkedIn are stored in Pinot. ThirdEye provides a shared infrastructure for outlier detection and user-interactive data analysis of various system and business metrics. It connects to a large number of data sources to gather information and learns over time to generate more relevant detection and analysis results through user interaction.
In our use case, ThirdEye provides answers to questions from various executives across the company about deviations in subscription metrics, such as member signups. Small changes in these metrics can be attributed to anything from regional holidays and minor configuration issues to outages of entire data centers, so it’s important to have a single platform that provides visibility into all possible causes. Along with “Smart Alerts,” the ThirdEye platform supplies potential root causes, so users do not need to farm these questions out to a large number of specialized analysts and coordinate across multiple responses.
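To make the idea of narrowing down a deviation more concrete, here is a minimal Python sketch of a dimension breakdown: it compares signups between two weeks and ranks the values of one dimension (such as device or country) by how much of the total change each one accounts for. This is an illustration only and not ThirdEye's root cause analysis; the function name `rank_contributors`, the record layout, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def rank_contributors(baseline, current, dimension):
    """Rank values of `dimension` by their share of the total change in signups.

    `baseline` and `current` are lists of dicts such as
    {"country": "US", "device": "mobile", "signups": 120} (hypothetical layout).
    """
    def totals(rows):
        out = defaultdict(int)
        for row in rows:
            out[row[dimension]] += row["signups"]
        return out

    before, after = totals(baseline), totals(current)
    total_delta = sum(after.values()) - sum(before.values())

    ranked = []
    for value in sorted(set(before) | set(after)):
        delta = after[value] - before[value]
        # Share of the overall change explained by this dimension value.
        share = delta / total_delta if total_delta else 0.0
        ranked.append((value, delta, share))
    # Largest absolute movers first.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Made-up numbers for illustration only.
last_week = [
    {"country": "US", "device": "desktop", "signups": 100},
    {"country": "US", "device": "mobile", "signups": 100},
    {"country": "IN", "device": "mobile", "signups": 50},
]
this_week = [
    {"country": "US", "device": "desktop", "signups": 90},
    {"country": "US", "device": "mobile", "signups": 50},
    {"country": "IN", "device": "mobile", "signups": 50},
]

for value, delta, share in rank_contributors(last_week, this_week, "device"):
    print(f"{value}: change={delta:+d}, share of total change={share:.0%}")
```

Running the same function once per dimension (attribution, country, device) gives a quick first cut at which slice is driving a change before a deeper investigation.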
Over the last two years, ThirdEye’s capabilities have continued to evolve, and the tool is now hosted by the Apache Incubator as part of the Pinot project.
A core feature of ThirdEye is automated anomaly detection at scale. It allows the team to track changes in signups with unprecedented detail and ease by tracking changes in metrics down to the tracking code level. Some features include:
- Configuration set to measure changes in the top 200 business data tracking events, with anomalies in business metrics monitored every week.
- Automated configuration allows for more efficiency in measuring changes in metrics, down to the tracking code level.
- A regular cadence for tracking allows for a structured and easy process for monitoring changes in metrics.
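The setup described in this list can be approximated in a few lines of Python. The sketch below keeps the top N tracking codes by signup volume and flags a code when its latest weekly count deviates from the median of its earlier weeks by more than a robust z-score threshold (based on the median absolute deviation). This is an assumed, simplified rule chosen for illustration; it is not the detection logic ThirdEye uses, and the names `top_codes`, `robust_z`, and `detect_outliers`, the data layout, and the threshold of 3.5 are hypothetical.

```python
from statistics import median


def top_codes(weekly_counts, n=200):
    """Return the n tracking codes with the highest total signups.

    `weekly_counts` maps a tracking code to a list of weekly signup counts,
    oldest first (hypothetical layout).
    """
    return sorted(weekly_counts, key=lambda c: sum(weekly_counts[c]), reverse=True)[:n]


def robust_z(history, latest):
    """Robust z-score of `latest` against `history` using median and MAD."""
    center = median(history)
    mad = median(abs(x - center) for x in history)
    if mad == 0:
        # Flat history: any move away from the center is treated as extreme.
        if latest == center:
            return 0.0
        return float("inf") if latest > center else float("-inf")
    # 0.6745 scales MAD so the score is comparable to a standard z-score.
    return 0.6745 * (latest - center) / mad


def detect_outliers(weekly_counts, n=200, threshold=3.5, min_weeks=5):
    """Flag top-n codes whose latest week is an outlier versus prior weeks."""
    flagged = {}
    for code in top_codes(weekly_counts, n):
        series = weekly_counts[code]
        if len(series) <= min_weeks:
            continue  # Not enough history to judge the latest week.
        history, latest = series[:-1], series[-1]
        score = robust_z(history, latest)
        if abs(score) > threshold:
            flagged[code] = score
    return flagged


# Made-up weekly counts for illustration only.
counts = {
    "code_a": [100, 104, 98, 101, 103, 99, 40],  # sharp drop in the latest week
    "code_b": [50, 52, 49, 51, 50, 48, 51],      # stable
    "code_c": [5, 4, 6, 5, 5, 4, 5],             # low volume, excluded when n=2
}
print(detect_outliers(counts, n=2))
```

A median-based score is used here because a single bad week in the history does not inflate the baseline the way a mean and standard deviation would; a production system would also need to account for seasonality and holidays.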
Automated dashboards share meaningful insights; however, they require manual monitoring to uncover data blind spots. It’s important that dashboards show changes over time, filtered by various criteria, and are automatically sent via email, providing necessary context in a digestible way. Features include:
- Dashboards tracking year-long trends for most relevant cuts of signup, survival, and renewal data.
- Automated emails including a visualization of changes create an easy way to quickly understand changes at a high-level without repeatedly having to log in to dashboards.
- Visualization of long-term trends allowing users to quickly identify areas for potential deep dives.
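As a rough illustration of what such an emailed summary might contain, the Python sketch below turns a year of weekly values into a short plain-text digest with week-over-week and year-over-year changes. The function names, the metric name, and the message layout are hypothetical, and actually sending the message (for example, through a mail relay) is left out.

```python
def pct_change(new, old):
    """Percentage change from old to new, or None when old is zero."""
    return None if old == 0 else (new - old) / old * 100.0


def weekly_digest(metric, weekly_values):
    """Build a plain-text summary from weekly values, oldest first.

    Expects at least 53 weekly points so that the latest week can be
    compared with the week 52 weeks earlier.
    """
    if len(weekly_values) < 53:
        raise ValueError("need at least 53 weekly values")
    latest = weekly_values[-1]
    wow = pct_change(latest, weekly_values[-2])
    yoy = pct_change(latest, weekly_values[-53])

    def fmt(value):
        return "n/a" if value is None else f"{value:+.1f}%"

    return "\n".join([
        f"Metric: {metric}",
        f"Latest week: {latest}",
        f"Week over week: {fmt(wow)}",
        f"Year over year: {fmt(yoy)}",
    ])


# Made-up series: 52 flat weeks followed by a 10% increase.
series = [1000] * 52 + [1100]
print(weekly_digest("signups", series))
```

The same function could be called once per metric and cut (signups, survival, renewal) to assemble a single weekly message.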
The Premium business operations team automated business performance management at scale by leveraging the solutions mentioned above to track a variety of topline metrics, such as signups, survival rates, and renewal rates, at different intervals with no manual intervention required.
In today’s digital age, monitoring business performance and tracking relevant insights are important in empowering managers and C-level executives to make smarter decisions around productivity and costs. In fact, we can think of data as the raw resource that business decisions are based on, and ThirdEye smart alerts as the environment that allows leaders to act on generated information swiftly and accurately.
By leveraging ThirdEye’s strengths in monitoring anomalies, comparing values, and automated detection, we saw value across the following factors:
- Enhanced strategic planning
- Improved management decisions
- Advanced performance reporting
I would like to thank the ThirdEye development team: Xiaohui Sun, Akshay Rai, Jihao Zhang, and Harley Jackson (Analytics Platform), Tie Wang, Kexin Nie, and Yen-Jung Chang (AI/ML), Suhil Srinivas (Premium Product), and Tyler James (Business Ops) for the great collaboration, contribution, and support for sharing their learnings on business performance data monitoring at scale using ThirdEye Smart Alerts.
I would like to thank our ThirdEye alumni: Kishore Gopalkrishna, Ravi Aringunram, Praveen Gujar, Tushar Shanbhag, Kajal Kamdar, Alex Pucher, Steve McClung, Neha Pawar, Selene Chew, Long Huynh, Yves Yuen, and the numerous collaborators inside and outside of LinkedIn over the past years. Without your work and dedication, this would not have been possible. Special thanks to the reviewers of this blog Kishore Gopalkrishna, Pardhu Gunnam, Xiaohui Sun, and Akshay Rai. Finally, we would like to thank Kapil Surlaker, Deepak Agarwal, Eric Baldeschwieler, Kamal Duggireddy, Bo Long, and Igor Perisic for their consistent support for ThirdEye’s vision from the executive level.