John wiley sons data mining techniques for marketing sales_13 docx

470643 c11.qxd 3/8/04 11:17 AM Page 380
Chapter 11

Using Thematic Clusters to Adjust Zone Boundaries

The goal of the clustering project was to validate editorial zones that already existed. Each editorial zone consisted of a set of towns assigned one of the four clusters described above. The next step was to manually increase each zone’s purity by swapping towns with adjacent zones. For example, Table 11.1 shows that all of the towns in the City zone are in Cluster 1B except Brookline, which is Cluster 2. In the neighboring West 1 zone, all the towns are in Cluster 2 except for Waltham and Watertown, which are in Cluster 1B. Swapping Brookline into West 1 and Watertown and Waltham into City would make it possible for both editorial zones to be pure in the sense that all the towns in each zone would share the same cluster assignment. The new West 1 would be all Cluster 2, and the new City would be all Cluster 1B. As can be seen in the map in Figure 11.12, the new zones are still geographically contiguous. Having editorial zones composed of similar towns makes it easier for the Globe to provide sharper editorial focus in its localized content, which should lead to higher circulation and better advertising sales.

Table 11.1 Towns in the City and West 1 Editorial Zones

TOWN         EDITORIAL ZONE   CLUSTER ASSIGNMENT
Brookline    City             2
Boston       City             1B
Cambridge    City             1B
Somerville   City             1B
Needham      West 1           2
Newton       West 1           2
Wellesley    West 1           2
Waltham      West 1           1B
Weston       West 1           2
Watertown    West 1           1B

Lessons Learned

Automatic cluster detection is an undirected data mining technique that can be used to learn about the structure of complex databases. By breaking complex datasets into simpler clusters, automatic clustering can be used to improve the performance of more directed techniques. By choosing different distance measures, automatic clustering can be applied to almost any kind of data.
It is as easy to find clusters in collections of news stories or insurance claims as in astronomical or financial data.

Clustering algorithms rely on a similarity metric of some kind to indicate whether two records are close or distant. Often, a geometric interpretation of distance is used, but there are other possibilities, some of which are more appropriate when the records to be clustered contain non-numeric data.

One of the most popular algorithms for automatic cluster detection is K-means. The K-means algorithm is an iterative approach to finding K clusters based on distance. The chapter also introduced several other clustering algorithms. Gaussian mixture models are a variation on the K-means idea that allows for overlapping clusters. Divisive clustering builds a tree of clusters by successively dividing an initial large cluster. Agglomerative clustering starts with many small clusters and gradually combines them until there is only one cluster left. Divisive and agglomerative approaches allow the data miner to use external criteria to decide which level of the resulting cluster tree is most useful for a particular application.

This chapter introduced some technical measures for cluster fitness, but the most important measure for clustering is how useful the clusters turn out to be for furthering some business goal.

CHAPTER 12

Knowing When to Worry: Hazard Functions and Survival Analysis in Marketing

Hazards. Survival. These very terms conjure up scary images, whether a shimmering-blue, ball-eating golf hazard or something a bit more frightful from a Stephen King novel, a hatchet movie, or some reality television show. Perhaps such dire associations explain why these techniques are not frequently associated with marketing. If so, this is a shame. Survival analysis, which is also called time-to-event analysis, is nothing to worry about.
Exactly the opposite: survival analysis is very valuable for understanding customers. Although the roots and terminology come from medical research and failure analysis in manufacturing, the concepts are tailor made for marketing. Survival tells us when to start worrying about customers doing something important, such as ending their relationship. It tells us which factors are most correlated with the event. Hazards and survival curves also provide snapshots of customers and their life cycles, answering questions such as: “How much should we worry that this customer is going to leave in the near future?” or “This customer has not made a purchase recently; is it time to start worrying that the customer will not return?”

The survival approach is centered on the most important facet of customer behavior: tenure. How long customers have been around provides a wealth of information, especially when tied to particular business problems. How long customers will remain customers in the future is a mystery, but a mystery that past customer behavior can help illuminate. Almost every business recognizes the value of customer loyalty. As we see later in this chapter, a guiding principle of loyalty (that the longer customers stay around, the less likely they are to stop at any particular point in time) is really a statement about hazards.

The world of marketing is a bit different from the world of medical research. For one thing, the consequences of our actions are much less dire: a patient may die from poor treatment, whereas the consequences in marketing are merely measured in dollars and cents. Another important difference is the volume of data. The largest medical studies have a few tens of thousands of participants, and many draw conclusions from just a few hundred.
When trying to determine mean time between failure (MTBF) or mean time to failure (MTTF), manufacturing lingo for how long to wait until an expensive piece of machinery breaks down, conclusions are often based on no more than a few dozen failures. In the world of customers, tens of thousands is the lower limit, since customer databases often contain data on millions of customers and former customers.

Much of the statistical background of survival analysis is focused on extracting every last bit of information out of a few hundred data points. In data mining applications, the volumes of data are so large that statistical concerns about confidence and accuracy are replaced by concerns about managing large volumes of data.

The importance of survival analysis is that it provides a way of understanding time-to-event characteristics, such as:

■ When a customer is likely to leave
■ The next time a customer is likely to migrate to a new customer segment
■ The next time a customer is likely to broaden or narrow the customer relationship
■ The factors in the customer relationship that increase or decrease likely tenure
■ The quantitative effect of various factors on customer tenure

These insights into customers feed directly into the marketing process. They make it possible to understand how long different groups of customers are likely to be around, and hence how profitable these segments are likely to be. They make it possible to forecast numbers of customers, taking into account both new acquisition and the decline of the current base. Survival analysis also makes it possible to determine which factors, both those at the beginning of customers’ relationships as well as later experiences, have the biggest effect on customers’ staying around the longest. And the analysis can be applied to things other than the end of the customer tenure, making it possible to determine when another event (such as a customer returning to a Web site) is no longer likely to occur.
A good place to start with survival is with visualizing customer retention, which is a rough approximation of survival. After this discussion, we move on to hazards, the building blocks of survival. These are in turn combined into survival curves, which are similar to retention curves but more useful. The chapter ends with a discussion of Cox Proportional Hazard Regression and other applications of survival analysis. Along the way, the chapter provides particular applications of survival in the business context. As with all statistical methods, there is a depth to survival that goes far beyond this introductory chapter, which is consciously trying to avoid the complex mathematics underlying these techniques.

Customer Retention

Customer retention is a concept familiar to most businesses that are concerned about their customers, so it is a good place to start. Retention is actually a close approximation to survival, especially when considering a group of customers who all start at about the same time. Retention provides a familiar framework to introduce some key concepts of survival analysis such as customer half-life and average truncated customer tenure.

Calculating Retention

How long do customers stay around? This seemingly simple question becomes more complicated when applied to the real world. Understanding customer retention requires two pieces of information:

■ When each customer started
■ When each customer stopped

The difference between these two values is the customer tenure, a good measurement of customer retention. Any reasonable database that purports to be about customers should have this data readily accessible. Of course, marketing databases are rarely simple. There are two challenges with these concepts. The first challenge is deciding on what is a start and stop, a decision that often depends on the type of business and available data.
The second challenge is technical: finding these start and stop dates in available data may be less obvious than it first appears.

For subscription and account-based businesses, start and stop dates are well understood. Customers start magazine subscriptions at a particular point in time and end them when they no longer want to pay for the magazine. Customers sign up for telephone service, a banking account, ISP service, cable service, an insurance policy, or electricity service on a particular date and cancel on another date. In all of these cases, the beginning and end of the relationship is well defined.

Other businesses do not have such a continuous relationship. This is particularly true of transactional businesses, such as retailing, Web portals, and catalogers, where each customer’s purchases (or visits) are spread out over time, or may be one-time only. The beginning of the relationship is clear: usually the first purchase or visit to a Web site. The end is more difficult but is sometimes created through business rules. For instance, a customer who has not made a purchase in the previous 12 months may be considered lapsed. Customer retention analysis can produce useful results based on these definitions. A similar area of application is determining the point in time after which a customer is no longer likely to return (there is an example of this later in the chapter).

The technical side can be more challenging. Consider magazine subscriptions. Do customers start on the date when they sign up for the subscription? Do customers start when the magazine first arrives, which may be several weeks later? Or do they start when the promotional period is over and they start paying? Although all three questions are interesting aspects of the customer relationship, the focus is usually on the economic aspects of the relationship.
Costs and/or revenue begin when the account starts being used (that is, on the issue date of the magazine) and end when the account stops. For understanding customers, it is definitely interesting to have the original contact date and time, in addition to the first issue date (are customers who sign up on weekdays different from customers who sign up on weekends?), but this is not the beginning of the economic relationship. As for the end of the promotional period, this is really an initial condition or time-zero covariate on the customer relationship. When the customer signs up, the initial promotional period is known. Survival analysis can take advantage of such initial conditions for refining models.

What a Retention Curve Reveals

Once tenures can be calculated, they can be plotted on a retention curve, which shows the proportion of customers that are retained for a particular period of time. This is actually a cumulative histogram, because customers who have tenures of 3 months are included in the proportions for 1 month and 2 months. Hence, a retention curve always starts at 100 percent.

For now, let’s assume that all customers start at the same time. Figure 12.1, for instance, compares the retention of two groups of customers who started at about the same point in time 10 years ago. The points on the curve show the proportion of customers who were retained for 1 year, for 2 years, and so on. Such a curve starts at 100 percent and gradually slopes downward. When a retention curve represents customers who all started at about the same time, as in this case, it is a close approximation to the survival curve.

Differences in retention among different groups are clearly visible in the chart. These differences can be quantified. The simplest measure is to look at retention at particular points in time. After 10 years, for instance, 24 percent of the regular customers are still around, and only about a third of them even make it to 5 years.
Premium customers do much better. Over half make it to 5 years, and 42 percent have a customer lifetime of at least 10 years.

[Figure 12.1 Retention curves show that high-end customers stay around longer. Vertical axis: Percent Survived (0% to 100%); horizontal axis: Tenure (Months after Start), 0 to 120; series: High End, Regular.]

Another way to compare the different groups is by asking how long it takes for half the customers to leave: the customer half-life (although the statistical term is the median customer lifetime). The median is a useful measure because the few customers who have very long or very short lifetimes do not affect it. In general, medians are not sensitive to a few outliers.

Figure 12.2 illustrates how to find the customer half-life using a retention curve. This is the point where exactly 50 percent of the customers remain, which is where the 50 percent horizontal grid line intersects the retention curve. The customer half-life for the two groups shows a much starker difference than the 10-year survival: the premium customers have a median lifetime of close to 7 years, whereas the regular customers have a median of roughly 2 years.

Finding the Average Tenure from a Retention Curve

The customer half-life is useful for comparisons and easy to calculate, so it is a valuable tool. It does not, however, answer an important question: “How much, on average, were customers worth during this period of time?” Answering this question requires having an average customer worth per unit of time and an average retention for all the customers. The median cannot provide this information because the median only describes what happens to the one customer in the middle, the customer at exactly the 50 percent rank. A question about average customer worth requires an estimate of the average remaining lifetime for all customers.
There is an easy way to find the average remaining lifetime: average customer lifetime during the period is the area under the retention curve. There is a clever way of visualizing this calculation, which Figure 12.3 walks through.

[Figure 12.2 The median customer lifetime is where the retention curve crosses the 50 percent point. Vertical axis: Percent Survived (0% to 100%); horizontal axis: Tenure (Months after Start), 0 to 120; series: High End, Regular.]

First, imagine that the customers all lie down with their feet lined up on the left. Their heads represent their tenure, so there are customers of all different heights (or widths, because they are horizontal) for customers of all different tenures. For the sake of visualization, the longer tenured customers lie at the bottom holding up the shorter tenured ones. The line that connects their noses counts the number of customers who are retained for a particular period of time (remember the assumption that all customers started at about the same point in time). The area under this curve is the sum of all the customers’ tenures, since every customer lying horizontally is being counted.

Dividing the vertical axis by the total count produces a retention curve. Instead of count, there is a percentage. The area under the curve is the total tenure divided by the count of customers: voilà, the average customer tenure during the period of time covered by the chart.

TIP The area under the customer retention curve is the average customer lifetime for the period of time in the curve. For instance, for a retention curve that has 2 years of data, the area under the curve represents the two-year average tenure.

This simple observation explains how to obtain an estimate of the average customer lifetime. There is one caveat when some customers are still active. The average is really an average for the period of time under the retention curve.
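
The half-life and area-under-the-curve calculations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tenures are made-up whole months for customers who all started at the same time, a tenure equal to the horizon stands for a customer still active at the end of the window, and the function names (`retention_curve`, `half_life`, `average_truncated_tenure`) are hypothetical rather than taken from the book.

```python
from typing import List, Optional


def retention_curve(tenures: List[int], horizon: int) -> List[float]:
    """Proportion of customers whose tenure is at least t, for t = 0..horizon."""
    n = len(tenures)
    return [sum(1 for x in tenures if x >= t) / n for t in range(horizon + 1)]


def half_life(curve: List[float]) -> Optional[int]:
    """First period at which retention has dropped to 50 percent or lower.

    Returns None when the curve never reaches 50 percent within the window.
    """
    for t, retained in enumerate(curve):
        if retained <= 0.5:
            return t
    return None


def average_truncated_tenure(curve: List[float]) -> float:
    """Area under the retention curve, as one-period-wide bars for t = 1..horizon."""
    return sum(curve[1:])


# Hypothetical tenures in months (assumption: the window is 6 months long,
# so a value of 6 means the customer was still active when the window ended).
tenures = [1, 2, 2, 3, 5, 6, 6, 6]
curve = retention_curve(tenures, horizon=6)

print("retention curve:", curve)
print("half-life (months):", half_life(curve))
print("average truncated tenure (months):", average_truncated_tenure(curve))
```

With whole-period tenures, the summed bars equal the plain average of the tenures capped at the horizon, which is one way to sanity-check the area calculation against the stacked-customers picture.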
Consider the earlier retention curves in this chapter. These retention curves were for 10 years, so the area under the curves is an estimate of the average customer lifetime during the first 10 years of their relationship. For customers who are still active at 10 years, there is no way of knowing whether they will all leave at 10 years plus one day or whether they will all stick around for another century. For this reason, it is not possible to determine the real average until all customers have left.

[Figure 12.3 Average customer tenure is calculated from the area under the retention curve. Top panel (vertical axis: Number of Customers; horizontal axis: time): A group of customers with different tenures are stacked on top of each other. Each bar represents one customer. At each point in time, the edges count the number of customers active at that time. Notice that the sum of all the areas is the sum of all the customer tenures. Bottom panel (vertical axis: Proportion of Customers): Making the vertical axis a proportion instead of a count produces a curve that looks the same. This is a retention curve. The area under the retention curve is the average customer tenure.]

This value, called truncated mean lifetime by statisticians, is very useful. As shown in Figure 12.4, the better customers have an average 10-year lifetime of 6.1 years; the other group has an average of 3.7 years. If, on average, a customer is worth, say, $100 per year, then the premium customers are worth $610 – $370 = $240 more than the regular customers during the 10 years after they start, or about $24 per year. This $24 might represent the return on a retention program designed specifically for the premium customers, or it might give an upper limit of how much to budget for such retention programs.

Looking at Retention as Decay

Although we don’t generally advocate comparing customers to radioactive materials, the comparison is useful for understanding retention.
Think of customers as a lump of uranium that is slowly, radioactively decaying into lead. Our “good” customers are the uranium; the ones who have left are the lead. Over time, the amount of uranium left in the lump looks something like our retention curves, with the perhaps subtle difference that the timeframe for uranium is measured in billions of years, as opposed to smaller time scales.

[...]

... were forced to leave, so they are censored at the point of attrition instead of being considered stopped. All the data from before they left is included in the calculation of the hazard functions for voluntary attrition, since they remained as customers before then.

Figure 12.8 Using censoring makes it possible to develop hazard models for voluntary attrition that include customers who were forced ...

... only in terms of their data. There is no general shape, no parametric form, no grand theory of customer decay. The data is the message. Hazard probabilities extend this idea. As discussed here, they are an example of a nonparametric statistical approach: letting the data speak instead of finding a special function to speak for it. Empirical hazard probabilities simply let the historical data determine what ...

... 1, whereas a probability never does. Also, a rate seems less intuitive for many survival problems encountered with customers. The method for calculating hazards in this chapter is called the life table method, and it works well with discrete time data. A very similar method, called Kaplan-Meier, is used for continuous time data. The two techniques produce almost exactly the same results when events occur ...

... customer relationship, to filter customers for analysis. The right approach is to break this into two problems. What are the hazards for voluntary attrition? What are the hazards for forced attrition?
Each of these uses all the customers, censoring the customers who leave due to other factors. When calculating the hazards for voluntary attrition, whenever a customer is forced to leave, the customer is included ...

... are censored because they are still active. Let’s walk through the hazard calculation for these customers, paying particular attention to the role of censoring. When looking at customer data for hazard calculations, both the tenure and the censoring flag are needed. For the customers in Figure 12.7, Table 12.2 shows this data. It is instructive to see what is happening during each time period. At any point ...

... For our purposes, the simpler hazard probabilities are not only easier to explain, but they also solve the problems that arise when working with customer data.

Other Types of Censoring

The previous section introduced censoring in two cases: hazards for customers after they have stopped and hazards for customers who are still active. There ...

... to find the best functional form for the hazards. This is an alternative approach, calculating the hazards directly from the data. The parameterized approach has the important advantage that it can more easily include covariates in the process. Later in this chapter, there is an example based on such a parameterized model. Unfortunately, the hazard function rarely follows a form that would be familiar ...

... to be very valuable for understanding customers and quantifying marketing efforts in terms of customer retention. It provides a way of estimating how long it will be until something occurs. This section gives some particular examples of survival analysis.

Handling Different Types of Attrition

Businesses that deal with customers have to deal with customers leaving for a variety of reasons. Earlier, this ...

... relationships due to unpaid bills?
If such a customer were forced to stop on day 100, then that customer did not stop voluntarily on days 1–99. This information can be used to generate hazards for voluntary stops. However, starting on day 100, the customer is censored, as shown in Figure 12.8. Censoring customers, even when they have stopped for other reasons, makes it possible to understand different types ...

... hazards have significant consequences. To provide an example of hazards, let’s step outside the world of business for a moment and consider life tables, which describe the probability of someone dying at a particular age. Table 12.1 shows this data, for the U.S. population in 2000:

Table 12.1 Hazards for Mortality in the United States in 2000, Shown as a Life Table

AGE    PERCENT OF POPULATION THAT DIES IN EACH ...
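
The excerpts above mention that a hazard calculation needs both the tenure and the censoring flag for each customer. A minimal Python sketch of that idea follows. It rests on assumptions the excerpts do not spell out: the hazard at tenure t is taken to be the number of uncensored stops at exactly t divided by the number of customers whose tenure is at least t, and survival is built up by multiplying (1 - hazard) over earlier tenures. The records and the function names (`hazard_probabilities`, `survival_curve`) are hypothetical.

```python
from typing import List, Tuple


def hazard_probabilities(records: List[Tuple[int, bool]], max_tenure: int) -> List[float]:
    """Empirical hazard for each tenure 0..max_tenure.

    records holds (tenure, censored) pairs. censored=True means the customer
    was still active, or left for a reason other than the one being modeled.
    """
    hazards = []
    for t in range(max_tenure + 1):
        # Everyone who reached tenure t is at risk, censored or not (assumption).
        at_risk = sum(1 for tenure, _ in records if tenure >= t)
        # Only uncensored customers count as having stopped at tenure t.
        stopped = sum(1 for tenure, censored in records if tenure == t and not censored)
        hazards.append(stopped / at_risk if at_risk else 0.0)
    return hazards


def survival_curve(hazards: List[float]) -> List[float]:
    """Survival at tenure t: product of (1 - hazard) over all earlier tenures."""
    survival = []
    alive = 1.0
    for h in hazards:
        survival.append(alive)
        alive *= 1.0 - h
    return survival


# Hypothetical customers: (tenure in periods, censored flag).
records = [(1, False), (2, False), (2, True), (3, False), (3, True), (4, True)]
hazards = hazard_probabilities(records, max_tenure=4)
survival = survival_curve(hazards)

print("hazards:", hazards)
print("survival:", survival)
```

To model forced and voluntary attrition separately, the same function could be run twice on the same customers, flipping the censoring flag according to which kind of stop is being studied.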

Date posted: 21/06/2014, 04:20
