Seth is the Senior Data Scientist at Demandbase, where he works on artificial intelligence across all of Demandbase’s solutions. He joined the company through the acquisition of Spiderbook, where he was the Chief Scientist. He has also been a data scientist at Twitter, Marin Software, and Qforma, as well as a Fellow at XSeed Capital. He has a PhD in Computational Mathematics from Stanford University, where he also received his MS degree.
Networks are inherently messy. Often, the basic properties of how things are connected to one another are well understood, but the result of these connections is massive complexity. And yet, networks can be the most effective way to model a complex system. The observed behavior of such a system can often be reduced to the interactions between simpler entities. For example, whether or not a hashtag will go viral on Twitter largely depends on the size of the follower network of the users who tweet it.
The interactions between businesses form another complex network. The actions of one business will change the behavior of many other (seemingly unrelated) businesses in surprising ways. Every business is connected to the others through a network of interactions. In many ways, business deals are actually new connections being added to the network. One company might be partnered with another company, which is a supplier to yet another company, which is in turn in direct competition with another still. In fact, the most important events that a company goes through are direct changes to its local network: a new partnership, a lawsuit, and an acquisition are all new connections being made between businesses.
Demandbase’s Artificial Intelligence (AI) and machine learning technology is founded on the observation that the best B2B salespeople use the network of companies to their advantage. When they are looking for their next deal, they might target the customers of their competitors, or they may use a mutual customer as a reference in their initial outreach. In both cases, the salesperson is using the existing connections between their company and the prospective customer to get the deal signed. Our hypothesis was that existing company connections are key in creating new customer connections. In fact, recent studies show that 84% of all B2B decision makers rely on referrals in their buying process.
So in order to predict which business you should sell to next, we built a network of a million companies and 10 million connections between them. Building this network, however, was no small task. We had to invent a state-of-the-art natural language processing engine that can identify the connections between companies described in billions of documents (press releases, news articles, blogs, job posts, and everything else online) every month. Our engine learns industry-specific language in order to work for all companies and industries. It also understands important properties of the connections it identifies, such as when the connection was created and which decision makers at each company were involved. The result is a company network with millions of connections that gives us a global view of the interactions between businesses.
The idea of mutual connections being predictive of new connections appears in other types of networks as well. Facebook uses the number of mutual connections in the friendship network to recommend new friends to you: the more mutual friends you and another person share, the more likely you are to know that person. In fact, we did an apples-to-apples comparison between the company network we constructed and the Facebook friendship network. The figure below plots the number of mutual connections between any two companies (or Facebook users) in the network against the probability that they are connected. It shows that mutual connections are even more predictive of a connection between companies than they are of friendship between people.
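The measurement behind that comparison can be sketched in a few lines. This is only an illustration on an invented five-company network: the company labels and the helper names `mutual_connections` and `connection_rate_by_mutuals` are assumptions made for this example, not part of Demandbase's system, and a network this small shows the computation rather than the trend.

```python
from itertools import combinations

# Hypothetical toy network: company -> set of companies it is connected to.
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C"},
}

def mutual_connections(graph, u, v):
    """Number of companies connected to both u and v."""
    return len((graph[u] & graph[v]) - {u, v})

def connection_rate_by_mutuals(graph):
    """Map k -> fraction of company pairs with k mutual connections that are linked."""
    totals, linked = {}, {}
    for u, v in combinations(sorted(graph), 2):
        k = mutual_connections(graph, u, v)
        totals[k] = totals.get(k, 0) + 1
        linked[k] = linked.get(k, 0) + (v in graph[u])
    return {k: linked[k] / totals[k] for k in sorted(totals)}

rates = connection_rate_by_mutuals(network)
```

Plotting `rates` against the number of mutual connections, for a real network of realistic size, would give a curve of the kind described above.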
This observation led us to the discovery that identifying your next customer is actually a link prediction problem. Link prediction algorithms use machine learning and other techniques to predict the creation of new connections in a wide range of networks. We can leverage these techniques to predict customers. The resulting algorithm is a machine learning classifier that uses over 200 different inputs, not just mutual connections. Some of these inputs are simple, like revenue or geography, but others are the results of network analysis algorithms:
Personalized PageRank, which is a variation on Google’s original PageRank algorithm, is used by social networks to recommend interesting content to their users. It is a measure of how often one user will land on another user’s social profile by clicking through the network randomly. We use it to measure how likely two companies are to interact with each other if they randomly explored their network connections.
Matrix Factorization is used by many recommendation engines. Netflix, for example, can use it to find underlying patterns in movie watching data. The technique makes the assumption that the movie tastes of most Netflix users are governed by a small number of dimensions, such as how much they like romantic comedies or whether they prefer Seth Rogen movies. If you like Quentin Tarantino movies and love Westerns, it is a reasonable prediction that you’ll like Django Unchained. Matrix factorization determines what these hidden dimensions are and how they apply to each user. We apply the same mathematics to discover the hidden dimensions that govern company connections. These dimensions might include a business’s likelihood to buy from Silicon Valley startups or whether or not a business outsources its marketing.
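As a rough illustration of these two ideas, the sketch below runs both on tiny invented data. Nothing here reflects Demandbase's actual implementation: the company names, the `personalized_pagerank`, `factorize`, and `predict` helpers, the restart probability, and the learning-rate settings are all assumptions chosen for readability, and stochastic gradient descent is just one simple way to fit a factorization.

```python
import random

# Hypothetical toy network: company -> companies it is connected to (undirected).
network = {
    "Acme": ["Bolt"],
    "Bolt": ["Acme", "Cobalt"],
    "Cobalt": ["Bolt", "Delta"],
    "Delta": ["Cobalt"],
}

def personalized_pagerank(graph, source, restart=0.15, iterations=50):
    """Random walk with restart: at each step the walker jumps back to
    `source` with probability `restart`, otherwise follows a random link."""
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[source] += restart
        for node, mass in scores.items():
            neighbors = graph[node]
            if not neighbors:  # dead end: send the walker back to the source
                nxt[source] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        scores = nxt
    return scores

# Hypothetical purchase matrix: rows are buyers, columns are suppliers,
# and 1 means the buyer is a known customer of the supplier.
purchases = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

def factorize(matrix, rank=2, steps=1000, lr=0.05, reg=0.01, seed=0):
    """Approximate matrix[i][j] by dot(P[i], Q[j]) using `rank` hidden dimensions."""
    rng = random.Random(seed)
    rows, cols = len(matrix), len(matrix[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(cols)]
    for _ in range(steps):
        for i in range(rows):
            for j in range(cols):
                err = matrix[i][j] - sum(P[i][k] * Q[j][k] for k in range(rank))
                for k in range(rank):
                    p, q = P[i][k], Q[j][k]
                    # Gradient step on squared error with L2 regularization.
                    P[i][k] += lr * (err * q - reg * p)
                    Q[j][k] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Score for buyer i and supplier j from the learned hidden dimensions."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

ppr = personalized_pagerank(network, "Acme")
P, Q = factorize(purchases)
```

Here `ppr` ranks companies by how often a random walk that keeps restarting at Acme visits them, and `predict(P, Q, i, j)` scores how well buyer `i` and supplier `j` match on the learned hidden dimensions. Either number could serve as one input among many to a classifier.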
These mathematical tools offer a global perspective on company connections. Because we have the company network, our algorithm can ask how well a prospective customer connection fits into the entire network.
The network-based approach also allows the algorithm to look at both sides of the customer connection. The company network contains all of the companies that your prospective customer has bought from in the past. The algorithm, for example, compares your prospective customer’s revenue to that of your existing customers, but equally importantly it compares your revenue to that of the prospect’s existing suppliers. If the prospective customer doesn’t typically buy from companies as small as yours, or if they don’t like to do business out of state, then it doesn’t matter how much they look like your ideal customer. The company network gives a uniquely broad perspective on your would-be deal.
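A two-sided comparison of this kind might look like the sketch below. The revenue figures, the company names, and the `two_sided_revenue_features` helper are all hypothetical, and using a ratio to the median is just one plausible way to express "typical" size; the actual features used in production are not described here.

```python
from statistics import median

# Hypothetical data: annual revenue in millions of dollars.
revenue = {"You": 5, "CustA": 40, "CustB": 60,
           "Prospect": 50, "SupX": 400, "SupY": 600}
customers_of = {"You": ["CustA", "CustB"]}
suppliers_of = {"Prospect": ["SupX", "SupY"]}

def two_sided_revenue_features(seller, prospect):
    """Compare the deal from both ends of the would-be connection."""
    typical_customer = median(revenue[c] for c in customers_of[seller])
    typical_supplier = median(revenue[s] for s in suppliers_of[prospect])
    return {
        # Does the prospect look like the seller's existing customers?
        "prospect_vs_customers": revenue[prospect] / typical_customer,
        # Does the seller look like the prospect's existing suppliers?
        "seller_vs_suppliers": revenue[seller] / typical_supplier,
    }

features = two_sided_revenue_features("You", "Prospect")
```

In this invented case the prospect is the same size as a typical existing customer, but the seller is about a hundredth the size of the prospect's typical supplier, which is exactly the kind of mismatch a one-sided comparison would miss.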
The network-based model that chooses which companies to sell to is one of many machine learning models that make up Demandbase’s technology. Identifying the prospective customer’s budget for your product, finding the right decision makers, and crafting customized messaging all involve separate but equally in-depth machine learning models. The network, however, allows all of this analysis to focus on key companies that have a high likelihood of a customer connection. As they have done in so many other applications, networks have provided a much clearer and more actionable perspective on how connections between businesses are formed.