... that, on a technical level, the data mining effort is working and
the data is reasonably accurate. This can be quite comforting. If the data and
the data mining techniques applied to it are powerful ... of Data Mining 33
Table 2.1 Data Mining Differs from Typical Operational Business Processes
TYPICAL OPERATIONAL SYSTEM | DATA MINING SYSTEM
Operations and reports on | Analysis on historical data
... of techniques to
apply in a particular situation depends on the nature of the data mining task,
the nature of the available data, and the skills and preferences of the data
miner.
Data mining...
... level
data, 96
publications
Building the Data Warehouse (Bill
Inmon), 474
Business Modeling and Data Mining
(Dorian Pyle), 60
Data Preparation for Data Mining
(Dorian Pyle), 75
The Data ...
Business Modeling and Data Mining, 60
Data Preparation for Data Mining, 75
470643 bindex.qxd 3/8/04 11:08 AM Page 619
C
Index 619
calculations, probabilities, 133–135
call detail databases, 37 ... Preparation forDataMining
(Dorian Pyle), 75
The Data Warehouse Toolkit (Ralph
Kimball), 474
data warehousing
customer patterns, 5
for decision support, 13
discussed, 4
database administrators...
[Figure: stage labels, partly recoverable: "Identify ... analyzing data can provide value." "Transform data into actionable information using data mining techniques." "... on the information." "Measure the results of the efforts ..."]
... of data mining in practice. Figure 2.1
shows the four stages:
1. Identifying the business problem.
2. Mining data to transform the data into actionable information.
3. Acting on the information. ... them.
How Data Mining Is Being Used Today
This whirlwind tour of a few interesting applications of data mining is
intended to demonstrate the wide applicability of the data mining techniques...
...
before. The newly discovered relationships suggest new hypotheses to test
and the data mining process begins all over again.
Lessons Learned
Data mining comes in two forms. Directed data mining ...
Data Mining Applications 97
mining techniques used to generate the scores. It is worth noting, however,
that many of the data mining techniques in this book can ... California based on data that excludes calls to Los Angeles.
Step Six: Transform Data to Bring
Information to the Surface
Once the data has been assembled and major data problems fixed, the data...
... Statistics: Data Mining Using Familiar Tools 127
Looking at Discrete Values
Much of the data used in data mining is discrete by nature, rather than
continuous. Discrete data shows up in the form ... can be used to purchase household-level
information about prospects from providers of marketing data. This sort of data is
useful for targeting broad, general segments such as “young mothers” ...
Start Tracking Customers before
They Become Customers
It is a good idea to start recording information about prospects even before
they become...
... 159
The Lure of Statistics: Data Mining Using Familiar Tools 159
statisticians use similar techniques to solve similar problems, the data mining
approach differs from the standard statistical ... in several areas:
■■ Data miners tend to ignore measurement error in raw data.
■■ Data miners assume that there is more than enough data and
processing power.
■■ Data mining assumes dependency ...
censored data.
Table 5.6 Calculating the Expected Values and Deviations from Expected for the Data...
... to generalize and learn from data
mimics, in some sense, our own ability to learn from experience. This ability is
useful for data mining, and it also makes neural networks an exciting area for ... test set to see how well it performs.
7. Apply the model generated by the network to predict outcomes for
unknown inputs.
Fortunately, data mining software now performs most of these steps
automatically. ... 0 (for 0 children), 0.5 (for one
child), 0.75 (for two children), 0.875 (for three children), and so on. For
categorical variables, it is often easier to keep mapped values in the range from...
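
The values listed for the number of children (0, 0.5, 0.75, 0.875) are consistent with the mapping 1 - 0.5^n for n children. The excerpt does not state a formula, so that formula is an assumption, as is the function name used in this minimal Python sketch:

```python
def map_count_to_unit_interval(count: int) -> float:
    """Map a non-negative count into [0, 1).

    Assumed formula: 1 - 0.5 ** count. It reproduces the values quoted
    in the text, but the text itself does not give the formula.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return 1.0 - 0.5 ** count


# Number of children 0, 1, 2, 3 mapped into the unit interval.
mapped = [map_count_to_unit_interval(n) for n in range(4)]
print(mapped)  # [0.0, 0.5, 0.75, 0.875]
```

Under this mapping each additional child moves the value half as far as the previous one, so large counts stay below 1 instead of dominating the input range.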
... applied to data. These patterns can be turned into new features of the data,
for use in conjunction with other directed data mining techniques.
Automatic Cluster ... be
desirable to filter outliers from the data; more often, the solution is to massage
the data values. Later in this chapter there is a section on data preparation for
clustering which describes ... structure in data. However, there is no one right
description of that structure. For instance, someone not from New York City
may think that the whole city is “downtown.” Someone from Brooklyn...
...
These two customers were forced to
leave, so they are censored at the
point of attrition instead of being
All the data from before they left is
hazard functions for voluntary
attrition — ... periods ago. The survival at
any given point in time t uses information from all customers. The hazard at
time t uses information from all customers whose tenure is greater than or
equal to ... Survival,
though, is calculated by combining all the information for hazards from
smaller values of t.
Because survival calculations use all the data, the values are more stable
than retention calculations....
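
The relationship between hazards and survival described above can be sketched in Python. This is an illustration under stated assumptions, not the book's own code: the hazard at tenure t is taken to be the number of customers who stopped at t divided by the number whose tenure is at least t, and survival at t is taken to be the product of (1 - hazard) over all smaller tenures. The function name, the (tenure, stopped) record layout, and the sample data are hypothetical.

```python
from typing import List, Tuple


def hazards_and_survival(
    customers: List[Tuple[int, bool]], max_tenure: int
) -> Tuple[List[float], List[float]]:
    """Return (hazards, survival), each indexed by tenure 0..max_tenure.

    Each customer is (tenure, stopped). stopped=False means the customer
    is censored, for example still active or forced to leave.
    """
    hazards: List[float] = []
    survival: List[float] = []
    surviving = 1.0
    for t in range(max_tenure + 1):
        # Population at risk: everyone whose tenure is at least t.
        at_risk = sum(1 for tenure, _ in customers if tenure >= t)
        stops = sum(1 for tenure, stopped in customers if stopped and tenure == t)
        hazard = stops / at_risk if at_risk else 0.0
        # Survival at t combines the hazards for smaller values of t only.
        survival.append(surviving)
        hazards.append(hazard)
        surviving *= 1.0 - hazard
    return hazards, survival


# Hypothetical sample: two stops, two censored customers.
sample = [(1, True), (2, True), (2, False), (3, False)]
hazards, survival = hazards_and_survival(sample, max_tenure=3)
print([round(h, 3) for h in hazards])   # [0.0, 0.25, 0.333, 0.0]
print([round(s, 3) for s in survival])  # [1.0, 1.0, 0.75, 0.5]
```

Censored customers still count in the population at risk up to their tenure, which is why every survival value draws on all of the data.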
...
Choosing a Data Mining Technique
The choice of which data mining technique or techniques to apply depends on
the particular data mining task to be accomplished and on the data available
for analysis. ... to build the data mining team and secure sponsorship for a data
mining pilot.
The successful efforts crossed corporate boundaries to involve people from
both marketing and information technology. ...
data fields.
Formulate the Business Goal as a Data Mining Task
The first step is to take a business goal such as “improve retention” and turn it
into one or more of the data mining tasks from...
... of real data to use for training sets. Consequently, they spent
much time and effort trying to coax the last few drops of information from
their impoverished datasets—a problem that data miners ... Trees as a Data Exploration Tool
During the data exploration phase of a data mining project, decision trees are a
useful tool for picking the variables that are likely to be important for
predicting ... value of an attribute from one time period
to the next are likely to be important. Therefore, for each numeric attribute, the
software automatically generates interpretations for the difference...
... shared lines from dedicated or data lines, we
assumed that any number that calls information—411 or 555-1212 (directory
assistance services)—is used for voice communications, and is therefore a
voice ... significant improvement.
The Data
Cellular telephone data is similar to the call detail data seen in the previous case
study for finding fax machines. There is a record for each call that includes ... these operations are suitable for data stored in a relational database.
SELECT originating_number
FROM call_detail
WHERE terminating_number IN (SELECT number FROM dedicated_fax)
AND duration...
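
The visible part of the query above can be tried against a toy in-memory database. The sketch below uses Python's standard sqlite3 module. The table and column names come from the query itself, while the column types, the sample phone numbers, and the variable names are made up for illustration. The truncated duration condition is left out rather than guessed.

```python
import sqlite3

# Toy in-memory database; the schema is inferred from the query, the types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE call_detail (
        originating_number TEXT,
        terminating_number TEXT,
        duration INTEGER
    );
    CREATE TABLE dedicated_fax (number TEXT);
    """
)

# Hypothetical sample calls: two of the three terminate at a known fax number.
conn.executemany(
    "INSERT INTO call_detail VALUES (?, ?, ?)",
    [
        ("555-0111", "555-0100", 45),
        ("555-0112", "555-0199", 120),
        ("555-0113", "555-0100", 30),
    ],
)
conn.execute("INSERT INTO dedicated_fax VALUES (?)", ("555-0100",))

# Only the part of the query that is visible in the text.
rows = conn.execute(
    """
    SELECT originating_number
    FROM call_detail
    WHERE terminating_number IN (SELECT number FROM dedicated_fax)
    """
).fetchall()

callers = sorted(row[0] for row in rows)
print(callers)  # ['555-0111', '555-0113']
conn.close()
```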