... of the data mining task,
the nature of the available data, and the skills and preferences of the data
miner.
Data mining comes in two flavors—directed and undirected. Directed data
mining ... that, on a technical level, the data mining effort is working and
the data is reasonably accurate. This can be quite comforting. If the data and
the data mining techniques applied to it are powerful ...
corporation to improve its marketing, sales, and customer support operations
through a better understanding of its customers. Keep in mind, however, that
the data mining techniques and tools described...
... level
data, 96
publications
Building the Data Warehouse (Bill
Inmon), 474
Business Modeling and Data Mining
(Dorian Pyle), 60
Data Preparation for Data Mining
(Dorian Pyle), 75
The Data ... 89–90
metadata repository, 484, 491
methodologies
data correction, 72–74
data exploration, 64–68
data mining process, 54–55
data selection, 60–64
data transformation, 74–76
data translation, ...
Business Modeling and Data Mining, 60
Data Preparation for Data Mining, 75
C
calculations, probabilities, 133–135
call detail databases, 37...
corporation to improve its marketing, sales, and customer support operations
through a better understanding of its customers. Keep in mind, however, that
the data mining techniques and tools described ... cards, and
banking, for example. Adding to the deluge of internal data are external sources
of demographic, lifestyle, and credit information on retail customers, and credit,
financial, and marketing...
...
before. The newly discovered relationships suggest new hypotheses to test
and the data mining process begins all over again.
Lessons Learned
Data mining comes in two forms. Directed data mining ... California based on data that excludes calls to Los Angeles.
Step Six: Transform Data to Bring
Information to the Surface
Once the data has been assembled and major data problems fixed, the data ...
mining techniques used to generate the scores. It is worth noting, however,
that many of the data mining techniques in this book can and have been...
... value of their customer
data by beginning to track customers from their first response, even before they
become customers, and gathering and storing additional information when
customers are ...
Start Tracking Customers before
They Become Customers
It is a good idea to start recording information about prospects even before
they become customers. ...
Segmenting the Customer Base
Customer segmentation is a popular application of data mining with established
customers. The purpose of segmentation is to tailor products, services,
and marketing...
... Use customers in California for the challenger and everyone else for the
champion.
■■ Use the 5 percent lowest and 5 percent highest value customers for the
challenger, and everyone else for ... percent most recent customers for the challenger, and every-
one else for the champion.
■■ Use the customers with telephone numbers for the telemarketing cam-
paign; everyone else for the direct ... in several areas:
■■ Data miners tend to ignore measurement error in raw data.
■■ Data miners assume that there is more than enough data and processing
power.
■■ Data mining assumes dependency...
... common for neural networks are the logistic and the hyperbolic tangent.
The major difference between them is the range of their outputs, between 0 and
1 for the logistic and between –1 and 1 for ... generalize and learn from data
mimics, in some sense, our own ability to learn from experience. This ability is
useful for data mining, and it also makes neural networks an exciting area for
research, ... test set to see how well it performs.
7. Apply the model generated by the network to predict outcomes for
unknown inputs.
Fortunately, data mining software now performs most of these steps
automatically....
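The two activation functions named in this excerpt, the logistic and the hyperbolic tangent, can be sketched in a few lines. This is only an illustration of their output ranges, not code from the book; the function names `logistic` and `hyperbolic_tangent` are chosen here for the example.

```python
import math


def logistic(x: float) -> float:
    """Logistic function; outputs lie between 0 and 1."""
    # Branch on the sign of x so math.exp never receives a large positive
    # argument, which would overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hyperbolic_tangent(x: float) -> float:
    """Hyperbolic tangent; outputs lie between -1 and 1."""
    return math.tanh(x)


if __name__ == "__main__":
    # Illustrative inputs only.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, logistic(x), hyperbolic_tangent(x))
```

The two curves have the same S shape and differ only by scaling: tanh(x) equals 2 * logistic(2x) - 1, which is why the difference between them comes down to the output range.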
... detection is
used to evaluate editorial zones for a major daily newspaper.
Searching for Islands of Simplicity
In Chapter 1, where data mining techniques are classified as directed or
undirected, ... the databases encountered in marketing,
sales, and customer support are not about points in space. They are about
purchases, phone calls, airplane trips, car registrations, and a thousand other
things ... applied to data. These patterns can be turned into new features of the data,
for use in conjunction with other directed data mining techniques.
... censoring. When looking at customer data for
hazard calculations, both the tenure and the censoring flag are needed. For the
customers in Figure 12.7, Table 12.2 shows this data.
It is instructive ... Practice
Survival analysis has proven to be very valuable for understanding customers
and quantifying marketing efforts in terms of customer retention. It provides a
way of estimating how long ... cus-
tomer databases often contain data on millions of customers and former
customers. Much of the statistical background of survival analysis is focused
on extracting every last bit of information...
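As a rough sketch of why both the tenure and the censoring flag are needed, the following computes an empirical hazard for each tenure. It assumes that the hazard at tenure t is the number of customers who stopped at exactly t divided by the number whose tenure is at least t; the toy data is hypothetical and is not the data in Table 12.2, and the function names are chosen here for illustration.

```python
from collections import Counter


def empirical_hazards(tenures, censored):
    """Return {tenure: hazard} from parallel lists.

    tenures  -- observed tenure of each customer (any time unit)
    censored -- True if the customer was still active when observed,
                False if the customer stopped at that tenure
    """
    if len(tenures) != len(censored):
        raise ValueError("tenures and censored must have the same length")
    # Only uncensored customers count as stops.
    stops = Counter(t for t, c in zip(tenures, censored) if not c)
    hazards = {}
    for t in sorted(set(tenures)):
        # Assumption: everyone with tenure >= t is at risk at tenure t,
        # whether or not they are censored.
        at_risk = sum(1 for x in tenures if x >= t)
        hazards[t] = stops.get(t, 0) / at_risk
    return hazards


def survival_curve(hazards):
    """Cumulative product of (1 - hazard), in order of tenure."""
    survival, s = {}, 1.0
    for t in sorted(hazards):
        s *= 1.0 - hazards[t]
        survival[t] = s
    return survival


if __name__ == "__main__":
    # Hypothetical toy data: five customers, two still active (censored).
    tenures = [1, 2, 2, 3, 3]
    censored = [False, True, False, False, True]
    hazards = empirical_hazards(tenures, censored)
    print(hazards)  # {1: 0.2, 2: 0.25, 3: 0.5}
    print(survival_curve(hazards))
```

Without the censoring flag, the two active customers would be counted as stops and the hazards at tenures 2 and 3 would be overstated.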
... which data mining technique or techniques to apply depends on
the particular data mining task to be accomplished and on the data available
for analysis. Before deciding on a data mining technique, ... into a series of data mining tasks and understand
the nature of the available data in terms of the content and types of the
data fields.
Formulate the Business Goal as a Data Mining Task
The ... models,
one based on the marketing data and one based on call detail data.
The marketing data was already summarized at the customer level and
stored in an easily accessible database system. Getting...
... preparing, and loading data. These are important and must be a standard and
repeatable process, but what is the role of meta data?
■■ Central control repository for all databases
■■ Repository for data ...
discuss customer data. This data is used to cross-sell, up-sell, and retain existing customers. And finally, I discuss
several types of risk data. This is appropriate for both prospects and customers.
Data ...
■■ Extracting and staging data from sources
■■ Cleaning and aligning data/exception handling
■■ Transporting and loading data
■■ Summarizing data
■■ Refreshing process and procedures
■■ Employing meta data...
... various forms of estimated income (inc_est3). I have created three forms for each
model: inc_miss, inc_est3, and inc_low. These represent the original form after data clean-up (inc_est3) and two ... I have 22 forms of the variable estimated income. I have 20 continuous forms and 2 categorical forms. I will use
logistic regression to find the best form or forms of the variable for the final ... sensitivity level entering, and
sls=, which stands for sensitivity level staying. These are the sensitivity levels for variables entering and remaining in the model.
proc logistic data= acqmod.model2(keep=
active...