... that, on a technical level, the data mining effort is working and
the data is reasonably accurate. This can be quite comforting. If the data and
the data mining techniques applied to it are powerful ... resolve these issues. Data mining can help make more informed
decisions. It can suggest tests to make. Ultimately, though, the business needs
What Is Data Mining?
Data mining, as we use the ... of techniques to
apply in a particular situation depends on the nature of the data mining task,
the nature of the available data, and the skills and preferences of the data
miner.
Data mining...
... level
data, 96
publications
Building the Data Warehouse (Bill
Inmon), 474
Business Modeling and Data Mining
(Dorian Pyle), 60
Data Preparation for Data Mining
(Dorian Pyle), 75
The Data ...
Business Modeling and Data Mining, 60
Data Preparation for Data Mining, 75
C
calculations, probabilities, 133–135
call detail databases, 37 ... Preparation for Data Mining
(Dorian Pyle), 75
The Data Warehouse Toolkit (Ralph
Kimball), 474
data warehousing
customer patterns, 5
for decision support, 13
discussed, 4
database administrators...
[Figure 2.1 labels: "Identify ... analyzing data ... can provide value"; "Transform data into actionable information using data mining techniques"; "... on the information"; "Measure the results of the efforts"]
... of data mining in practice. Figure 2.1
shows the four stages:
1. Identifying the business problem.
2. Mining data to transform the data into actionable information.
3. Acting on the information. ... hours
for reports
System of record for data Copy of data
Descriptive and repetitive Creative
First, problems being addressed by data mining differ from operational
problems—a data mining...
...
before. The newly discovered relationships suggest new hypotheses to test
and the data mining process begins all over again.
Lessons Learned
Data mining comes in two forms. Directed data mining ... California based on data that excludes calls to Los Angeles.
Step Six: Transform Data to Bring
Information to the Surface
Once the data has been assembled and major data problems fixed, the data ...
Data Mining Applications 97
mining techniques used to generate the scores. It is worth noting, however,
that many of the data mining techniques in this book can...
... which messages are most appropriate for each one.
Even a customer with low scores for every offer has higher scores for some
than others. In Mastering Data Mining (Wiley, 1999), we describe how ...
Start Tracking Customers before
They Become Customers
It is a good idea to start recording information about prospects even before
they become ... together can be used to purchase household-level informa-
tion about prospects from providers of marketing data. This sort of data is use-
ful for targeting broad, general segments such as “young...
... in several areas:
■■ Data miners tend to ignore measurement error in raw data.
■■ Data miners assume that there is more than enough data and process-
ing power.
■■ Data mining assumes dependency ... most recent customers for the challenger, and
everyone else for the champion.
■■ Use the customers with telephone numbers for the telemarketing campaign;
everyone else for the direct mail ...
The Lure of Statistics: Data Mining Using Familiar Tools 159
statisticians use similar techniques to solve similar problems, the data mining
approach differs from the standard...
... test set to see how well it performs.
7. Apply the model generated by the network to predict outcomes for
unknown inputs.
Fortunately, data mining software now performs most of these steps
automatically. ... and learn from data
mimics, in some sense, our own ability to learn from experience. This ability is
useful for data mining, and it also makes neural networks an exciting area for
research, ...
children variable might be mapped as follows: 0 (for 0 children), 0.5 (for one
child), 0.75 (for two children), 0.875 (for three children), and so on. For cate-
gorical variables, it is often easier...
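The mapping quoted above for the number of children (0, 0.5, 0.75, 0.875, "and so on") can be sketched in a few lines. The closed form 1 - 0.5^n used here is an assumption: it reproduces the four values given in the text, but the text does not state the formula, and the function name map_children is hypothetical.

```python
def map_children(count: int) -> float:
    """Map a non-negative count into the range [0, 1).

    Assumption: the sequence 0, 0.5, 0.75, 0.875 continues as 1 - 0.5**count,
    so each additional child closes half of the remaining gap to 1.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return 1.0 - 0.5 ** count


if __name__ == "__main__":
    for n in range(5):
        print(n, map_children(n))
```

A mapping of this shape keeps the input bounded no matter how large the count gets, while still separating small counts clearly from one another.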
... applied to data. These patterns can be turned into new features of the data,
for use in conjunction with other directed data mining techniques.
Automatic Cluster ... Islands of Simplicity
In Chapter 1, where data mining techniques are classified as directed or
undirected, automatic cluster detection is described as a tool for undirected
knowledge discovery. ... clustering can be a directed activity because
clusters are sought for some business purpose. In marketing, clusters formed
for a business purpose are usually called “segments,” and customer...
... calculation for these customers, paying particular
attention to the role of censoring. When looking at customer data for
hazard calculations, both the tenure and the censoring flag are needed. For ... the data speak instead of
finding a special function to speak for it. Empirical hazard probabilities simply
let the historical data determine what is likely to happen, without trying to fit
data ...
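As a rough illustration of empirical hazard probabilities computed from tenure and a censoring flag, the sketch below divides the customers who stopped at each tenure by the customers at risk at that tenure. The record layout (tenure, censored), the function name empirical_hazards, and the convention that a censored customer still counts as at risk at their own tenure are all assumptions made for this example, not details taken from the text.

```python
from collections import Counter


def empirical_hazards(customers):
    """Return {tenure: hazard} from (tenure, censored) pairs.

    censored=True means the customer was still active when observed,
    so the tenure is only a lower bound (assumed convention).
    """
    customers = list(customers)
    # Count only real stops; censored customers never add to the numerator.
    stops = Counter(t for t, censored in customers if not censored)
    hazards = {}
    for t in sorted({tenure for tenure, _ in customers}):
        # At risk at tenure t: everyone observed for at least t periods.
        at_risk = sum(1 for tenure, _ in customers if tenure >= t)
        hazards[t] = stops[t] / at_risk
    return hazards


if __name__ == "__main__":
    sample = [(1, False), (2, False), (2, True), (3, False), (3, True)]
    print(empirical_hazards(sample))
```

No functional form is fitted here: the estimate at each tenure comes directly from the historical counts.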
measurement of customer retention.
Any reasonable database that purports to be about customers should have
this data readily accessible. Of course, marketing databases are rarely simple.
There are two...
...
Choosing a Data Mining Technique
The choice of which data mining technique or techniques to apply depends on
the particular data mining task to be accomplished and on the data available
for analysis. ... to build the data mining team and secure sponsorship for a data
mining pilot.
The successful efforts crossed corporate boundaries to involve people from
both marketing and information technology. ... analysis of the combined data
mining and telemarketing action plan. Armed with this data, Comcast was
able to make an informed decision to invest in future data mining efforts. Of
course, the...
... of real data to use for training sets. Consequently, they spent
much time and effort trying to coax the last few drops of information from
their impoverished datasets—a problem that data miners ... Trees as a Data Exploration Tool
During the data exploration phase of a data mining project, decision trees are a
useful tool for picking the variables that are likely to be important for predict-
ing ... binomial formula was posthu-
mously published. So there are well-known formulas for determining what it
means to have observed E occurrences of some event in N trials.
In particular, there is a formula...
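For orientation, the standard binomial probability of observing exactly E occurrences in N independent trials can be computed as below. This is the textbook binomial formula and not necessarily the specific formula the passage goes on to use, which is elided in this excerpt. The function name binomial_probability and the per-trial probability parameter p are assumptions for the example.

```python
from math import comb


def binomial_probability(e: int, n: int, p: float) -> float:
    """Probability of exactly e occurrences in n independent trials,
    where each trial succeeds with probability p."""
    if not 0 <= e <= n:
        raise ValueError("e must be between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # C(n, e) ways to place the occurrences, times the probability of each way.
    return comb(n, e) * p ** e * (1.0 - p) ** (n - e)


if __name__ == "__main__":
    print(binomial_probability(2, 4, 0.5))
```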
... significant improvement.
The Data
Cellular telephone data is similar to the call detail data seen in the previous case
study for finding fax machines. There is a record for each call that includes ... from dedicated or data lines, we
assumed that any number that calls information—411 or 555-1212 (directory
assistance services)—is used for voice communications, and is therefore a
voice line ... machines is
possible on a relational database. They are probably not the most efficient SQL
statements for this purpose, depending on the layout of the data, the database
engine, and the hardware...
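The directory-assistance rule described above (any number that calls 411 or 555-1212 is treated as a voice line) can also be sketched outside SQL. The record layout of (origin, destination) pairs and the helper names are hypothetical, and matching any destination that ends in 555-1212 regardless of area code is an assumption of this sketch.

```python
def digits(number: str) -> str:
    """Keep only the digits of a phone number string."""
    return "".join(ch for ch in number if ch.isdigit())


def is_directory_assistance(destination: str) -> bool:
    """True for 411, or for 555-1212 with or without an area code (assumed)."""
    d = digits(destination)
    return d == "411" or d.endswith("5551212")


def voice_lines(calls):
    """Return the set of originating numbers that ever called information."""
    return {origin for origin, dest in calls if is_directory_assistance(dest)}


if __name__ == "__main__":
    sample_calls = [
        ("2025550101", "411"),
        ("2025550102", "2025550199"),
        ("2025550103", "(212) 555-1212"),
    ]
    print(sorted(voice_lines(sample_calls)))
```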