Rule Extraction from Artificial Neural Networks
Rudy Setiono
School of Computing, National University of Singapore

Outline
1. The train problem
2. Motivations
3. Feedforward neural networks for classification
4. Rule extraction from neural networks
5. Examples
6. Different types of classification rules
7. Regression rules
8. Hierarchical rules: the Re-RX algorithm
9. Conclusions

1. The train problem

[Figure: westbound trains vs. eastbound trains]

Which attributes describe whether a train is east- or westbound? Attributes of a train:
- long cars can only be rectangular, and if closed then their roofs are either jagged or flat
- if a short car is rectangular then it is also double-sided
- a short closed rectangular car can have either a flat or peaked roof
- a long car can have either two or three axles
- a car can be either open or closed
- a train has 2, 3 or 4 cars, each of which can be either short or long
- …

Answers:
- if a train has a short closed car, then it is westbound, otherwise eastbound
- if a train has two cars, or has a car with a jagged roof, then it is eastbound, otherwise westbound
- and many others

All the above rules can be obtained by neural networks!

2. Motivations
- Neural networks have been applied to solve many application problems involving
  - pattern classification
  - function approximation / data fitting
  - data clustering
- They often give better predictive accuracy than other methods such as regression or decision trees.
- Data mining using neural networks: if we can extract rules from a trained network, a better understanding of the data and the problem can be gained.
- How to extract such rules?

3. Feedforward neural networks for pattern classification
- Data is fed into the network input units.
- The predicted class is determined by the output unit with the largest output value.
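The classification scheme just described — a hidden layer of nonlinear units followed by output units, with the predicted class taken from the largest output — can be sketched in a few lines. The weight values below are arbitrary illustrations, not taken from the slides.

```python
import math

def forward(x, w_hidden, w_output):
    """One forward pass: tanh hidden units, linear output units.
    w_hidden: one weight vector per hidden unit (over the inputs).
    w_output: one weight vector per output unit (over the hidden activations)."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    return [sum(vi * hi for vi, hi in zip(v, h)) for v in w_output]

def classify(x, w_hidden, w_output):
    """Predicted class = index of the output unit with the largest value."""
    out = forward(x, w_hidden, w_output)
    return out.index(max(out))

# Illustrative 2-input, 2-hidden-unit, 2-class network:
w_h = [[1.0, -1.0], [0.5, 0.5]]
w_o = [[1.0, 0.0], [0.0, 1.0]]
print(classify([2.0, -1.0], w_h, w_o))   # picks the unit with the larger output
```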
- Units in the hidden layer allow the network to separate any number of disjoint sets.

Network hidden units

For each unit:
- One input, I_N, is usually fixed: I_N = 1.
- The sum of the weighted inputs is computed: net = Iᵀw.
- A nonlinear function is applied to obtain the unit's activation value: o = f(net).
- This activation function is usually the logistic sigmoid function (unipolar) or the hyperbolic tangent function (bipolar).

Hyperbolic tangent function
- The function is used to approximate the on-off function.
- Sum of the weighted inputs large ⇒ output is close to 1 (on).
- Sum of the weighted inputs small ⇒ output is close to −1 (off).
- Differentiable: f′(net) = (1 − o²)/2, where o = f(net).
- The derivative is largest when o = 0, that is, when net = 0, and approaches 0 as |net| becomes large.

Neural network training
- Given a set of data, minimise the total error: Σᵢ (targetᵢ − predictedᵢ)².
- Supervised learning.
- Nonlinear optimisation problem: find a set of neural network weights that minimises the total error.
- Optimisation methods used: backpropagation/gradient descent, quasi-Newton methods, the conjugate gradient method, etc.
- A penalty term is usually added to the error function so that redundant connections have small/zero weights.
- An example of an augmented error function: Σᵢ₌₁…N (targetᵢ − predictedᵢ)² + C Σⱼ₌₁…K wⱼ², where N is the number of samples, K is the number of weights, and C is a penalty parameter.

[...]

Rule extraction from a pruned network

REFANN (Rule Extraction from Function Approximating Neural Networks):
- For each hidden unit, approximate the hidden unit activation function by a 3-piece linear function.
- Replace the predicted output of the network by the linear combination of these piecewise linear functions. These are the rule consequences.

6. Various types of classification rules
1. Propositional: IF … THEN …, ELSE …
2. M-of-N rules: IF M of the given N conditions are satisfied, THEN …
3. Fuzzy rules: IF X is large, THEN …, ELSE IF X is medium, THEN …
4. Oblique rules: IF X1 + 2X2 + 5X3 ≥ 100, THEN …
5. Hierarchical rule set (more on this later)

Variants of the network rule extraction algorithms have been developed to extract different types of rules!

A network pruning algorithm:
1. Start with a trained, fully connected network.
2. Identify a potential connection for pruning (for example, one with small magnitude).
3. Set the weight of this connection to 0.
4. Retrain the network (if necessary).
5. If the network still meets the required accuracy, go to step 2.
6. Otherwise, restore the removed connection and its corresponding weight. Stop.

4. Rule extraction from neural networks
…
An algorithm is needed to ensure the network does not lose its accuracy.
3. Generate classification rules in terms of the clustered activation values.
4. Generate rules which explain the clustered activation values in terms of the input data attributes.
5. Merge the two sets of rules.

This is a decompositional approach!

Rule extraction by the decompositional approach

[Figure: feedforward network with output layer, hidden layer, input layer]
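The magnitude-based pruning loop above can be sketched as follows. This is a minimal sketch: the retraining step is omitted, and the `accuracy` function is assumed to be supplied by the caller (it would normally re-evaluate the network on the data).

```python
def prune(weights, accuracy, min_accuracy):
    """Magnitude-based pruning loop (sketch of the slides' steps 1-6).
    weights: dict mapping connection name -> weight value.
    accuracy: caller-supplied function evaluating the network given its weights.
    Repeatedly zeroes the smallest-magnitude live connection; restores the
    last one and stops once accuracy falls below the required level."""
    while True:
        live = [c for c, w in weights.items() if w != 0.0]
        if not live:
            return weights
        smallest = min(live, key=lambda c: abs(weights[c]))
        saved = weights[smallest]
        weights[smallest] = 0.0            # step 3: remove the connection
        # (step 4, retraining, omitted in this sketch)
        if accuracy(weights) < min_accuracy:
            weights[smallest] = saved      # step 6: restore and stop
            return weights
```

In practice the stopping test is what keeps the skeletal network accurate; retraining after each removal lets the remaining weights absorb the pruned connection's role.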
- Data is split into 350 training samples and 349 test samples.
- 100 neural networks were trained:
  - original number of hidden units: 5
  - original number of connections: 460
- After pruning:
  - average number of connections: 10.70
  - average predictive accuracy: 92.70%

Breast cancer diagnosis: Example 1
- Extracted rules:
  If uniformity of cell size ≤ 4 and bare nuclei ≤ 5, then benign;
  else malignant.

Neural network pruning
- After a network has been trained, redundant connections and units are removed by pruning.
- Pruned networks generalise better: they can predict new patterns better than fully connected networks.
- Simple classification rules can be extracted from skeletal pruned networks.
- Various methods for network pruning can be found in the literature.

7. Regression rules

[Figure: data points with the least-squares regression line, a piecewise regression line, and the neural network output]

Piecewise linear approximation!

Regression rules (continued)
- The neural network fits the data better than the linear regression line.
- The knowledge embedded in a neural network is …

- The network has only 2 hidden units and 10 connections after pruning.
- It correctly classifies all but one training pattern.
- 2-dimensional plot of the activation values:

[Scatter plot of hidden unit activations: axes H1 and H2; classes setosa, versicolor, virginica]

Rule in terms of the hidden unit activations:
- If H1 ≤ −0.7: Iris setosa
- Else if H2 ≤ −0.55: Iris versicolor
- Else: Iris virginica

Iris classification rules
If petal length …
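The "piecewise linear approximation" idea — replacing each hidden unit's smooth activation by a 3-piece linear function, as REFANN does — can be sketched like this. The breakpoint `t` and the choice of a linear middle segment through the origin are illustrative assumptions here, not the exact values the algorithm computes from the data.

```python
import math

def tanh_3piece(net, t=1.0):
    """3-piece linear approximation of tanh (illustrative sketch).
    Outside [-t, t] the unit is treated as saturated at +/- tanh(t);
    inside, a linear segment through the origin joins the two endpoints."""
    if net <= -t:
        return -math.tanh(t)
    if net >= t:
        return math.tanh(t)
    return net * math.tanh(t) / t
```

Once every hidden unit is replaced by such a function, the network output becomes a linear combination of linear pieces, i.e. piecewise linear in the inputs — which is what makes the rule consequences linear expressions.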
… 142, then Y2.
Default rule: Y1.

Y1 = 4.96 + 0.0036 MMIN + 0.0032 MMAX + 0.3086 CACH + 0.3366 CHMAX
Y2 = −453.02 + 0.0159 MMIN + 0.0143 MMAX + 1.3662 CACH + 1.4903 CHMAX

Example 1 (continued)

Predictive error rates:
- RMSE: root mean squared error
- RRMSE: relative RMSE
- MAE: mean absolute error
- RMAE: relative MAE

[Bar chart comparing the error rates of the neural network, the NN rules, and linear regression]

Input attributes:
- MYCT: machine cycle time (nanoseconds)
- MMIN: minimum main memory (KB)
- MMAX: maximum main memory (KB)
- CACH: cache memory (KB)
- CHMIN: minimum channels
- CHMAX: maximum channels

Output: the CPU's relative performance.
Samples: 167 training, 21 cross-validation and 21 testing.
Neural network: 8 hidden units; 1 hidden unit remained after pruning.
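The two linear rule consequences above can be evaluated directly. The rule's condition is only partially visible in this excerpt ("… 142, then Y2"), so in this sketch the caller supplies the condition's truth value; only the Y1/Y2 coefficients come from the slides.

```python
def y1(mmin, mmax, cach, chmax):
    """First linear rule consequence (the default rule), coefficients from the slides."""
    return 4.96 + 0.0036 * mmin + 0.0032 * mmax + 0.3086 * cach + 0.3366 * chmax

def y2(mmin, mmax, cach, chmax):
    """Second linear rule consequence, coefficients from the slides."""
    return -453.02 + 0.0159 * mmin + 0.0143 * mmax + 1.3662 * cach + 1.4903 * chmax

def predict(mmin, mmax, cach, chmax, use_y2):
    """Piecewise-linear prediction of CPU relative performance.
    use_y2 stands in for the slide's (partially elided) rule condition;
    the default rule is Y1."""
    if use_y2:
        return y2(mmin, mmax, cach, chmax)
    return y1(mmin, mmax, cach, chmax)
```

Each rule is linear in the pruned network's surviving inputs (MMIN, MMAX, CACH, CHMAX); MYCT and CHMIN do not appear, consistent with their connections having been pruned away.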
Date posted: 28/04/2014, 10:17