RuleExtractionfromArtificial NeuralNetworks RudySetiono SchoolofComputing NationalUniversityofSingapore 1 Outline 1. Thetrainproblem 2. Motivations 2. Motivations 3. Feedforward neuralnetworksforclassification 4 Rl ttif l tk 4 . R u l eex t rac ti on f romneura l ne t wor k s 5. Examples 6. Differenttypesofclassificationrules 7. Regression rules 7. Regression  rules 8. Hierarchicalrules:TheReRX algorithm 9 Cli 9 . C onc l us i ons 2 1.Thetrainproblem Westbound trains Eastbound trains scribe which trains are east/westbound? Attributes of a train:  lon g cars can onl y be rectan g ular , and if closed then their roofs are gyg, either jagged or flat  if a short car is rectangular then it is also double side d  a short closed rectan g ular car can have either a flat or p eaked roof 3 gp Thetrainproblem Westbound trains Eastbound trains Att ib t f t i Att r ib u t es o f a t ra i n:  a long car can have either two or three axels  a car can be either open or closed  a train has 2,3 or 4 cars, each can be either short or long  4  ……. Thetrainproblem Westbound trains Eastbound trains Answers : Answers :  if a train has short closed car, then it is westbound, otherwise eastbound  if a train has two cars, or has a car with a jagged roof, then it is eastbound, otherwise westbound.  and many others 5 All the above rules can be obtained by neural networks! 2.Motivations • Neuralnetworkshavebeenappliedtosolvemanyapplicationproblems involving ‐ patternclassification ‐ functionapproximation/datafitting ‐ dataclustering • Theyoftengivebetterpredictiveaccuracythanothermethodssuchas regressionordecisiontrees. • Dataminin g usin g neuralnetworks:: ifwecanextractrulesfromatrained g g network,abetterunderstandingaboutthedataandtheproblemcanbe gained gained . • Howtoextractsuchrules? 
3. Feedforward neural networks for pattern classification
• Data is fed into the network's input units.
• The predicted class is determined by the output unit with the largest output value.
• Units in the hidden layer allow the network to separate any number of disjoint sets.

Network hidden units. For each unit:
• One input I_N is usually fixed, I_N = 1.
• The sum of the weighted inputs is computed: net = I^T w.
• A nonlinear function is applied to obtain the unit's activation value: o = f(net).
• This activation function is usually the logistic sigmoid function (unipolar) or the hyperbolic tangent function (bipolar).

Hyperbolic tangent function
• The function is used to approximate the on-off function.
• When the sum of the weighted inputs is large, the output is close to 1 (on).
• When the sum of the weighted inputs is small, the output is close to -1 (off).
• Differentiable: f'(net) = (1 - o^2)/2, where o = f(net).
• The derivative is largest when o = 0, that is, when net = 0, and approaches 0 as |net| becomes large.

Neural network training
• Given a set of data, minimise the total error: Σ_i (target_i - predicted_i)^2.
• This is supervised learning.
• It is a nonlinear optimisation problem: find a set of neural network weights that minimises the total error.
• Optimisation methods used: backpropagation/gradient descent, quasi-Newton methods, conjugate gradient methods, etc.
• A penalty term is usually added to the error function so that redundant connections end up with small or zero weights.
• An example of an augmented error function: Σ_i (target_i - predicted_i)^2 + C Σ_j w_j^2, where N is the number of samples, K is the number of weights, and C is a penalty parameter. [...]
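The activation and the augmented error function above can be sketched as follows. Note that the stated derivative (1 - o^2)/2 is consistent with taking the bipolar activation to be tanh(net/2); that scaling is an inference from the formula, not something the slides state explicitly:

```python
import numpy as np

def activation(net):
    """Bipolar activation, taken here as tanh(net / 2) so that its
    derivative matches the slide's formula (1 - o**2) / 2."""
    return np.tanh(net / 2.0)

def activation_deriv(net):
    """f'(net) = (1 - o**2) / 2, largest at net = 0 and -> 0 as |net| grows."""
    o = activation(net)
    return (1.0 - o ** 2) / 2.0

def augmented_error(targets, predictions, weights, C=0.1):
    """Sum-of-squares error plus the weight-decay penalty C * sum(w_j**2).
    The penalty drives redundant connections toward zero weight."""
    err = np.sum((targets - predictions) ** 2)
    return err + C * np.sum(weights ** 2)
```

The penalty parameter C trades accuracy on the training data against weight magnitudes; small weights are exactly what makes the later pruning step effective.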
[...]
Rule extraction from a pruned network
REFANN (Rule Extraction from Function Approximating Neural Networks):
• For each hidden unit, approximate the hidden-unit activation function by a 3-piece linear function.
• Replace the predicted output of the network by the linear combination of these piecewise-linear functions. These form the rule consequences. [...]

6. Various types of classification rules
1. Propositional rules: IF ... THEN ..., ELSE ...
2. M-of-N rules: IF M of the given N conditions are satisfied, THEN ...
3. Fuzzy rules: IF X is large, THEN ...; ELSE IF X is medium, THEN ...
4. Oblique rules: IF X1 + 2 X2 + 5 X3 ≥ 100, THEN ...
5. Hierarchical rule sets (more on this later)
Variants of the network rule extraction algorithms have been developed to extract different types of rules!
[...]
Network pruning algorithm:
1. Start with a trained, fully connected network.
2. Identify a potential connection for pruning (for example, one with small magnitude).
3. Set the weight of this connection to 0.
4. Retrain the network (if necessary).
5. If the network still meets the required accuracy, go to step 2.
6. Otherwise, restore the removed connection and its corresponding weight. Stop.

4. Rule extraction from neural networks
[...] An algorithm is needed to ensure the network does not lose its accuracy.
3. Generate classification rules in terms of the clustered activation values.
4. Generate rules which explain the clustered activation values in terms of the input data attributes.
5. Merge the two sets of rules.
This is the decompositional approach!

Rule extraction by the decompositional approach (figure: output layer, hidden layer, input layer) [...]
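The six-step pruning algorithm above can be sketched as a loop over a flat weight vector. The callbacks `accuracy_fn` and `retrain_fn` are assumptions standing in for whatever training machinery is in use; the slides do not prescribe an interface:

```python
import numpy as np

def prune_network(weights, accuracy_fn, retrain_fn, min_accuracy):
    """Magnitude-based pruning loop (a sketch of the slide's algorithm).
    `weights`: 1-D array of connection weights (0 = already removed).
    `accuracy_fn(weights)` and `retrain_fn(weights)` are hypothetical,
    user-supplied callbacks; `min_accuracy` is the required accuracy."""
    weights = weights.copy()
    while True:
        active = np.flatnonzero(weights)            # step 2: remaining connections
        if active.size == 0:
            return weights
        j = active[np.argmin(np.abs(weights[active]))]  # smallest magnitude
        saved = weights[j]
        weights[j] = 0.0                            # step 3: remove connection
        weights = retrain_fn(weights)               # step 4: retrain if necessary
        if accuracy_fn(weights) < min_accuracy:     # step 6: restore and stop
            weights[j] = saved
            return weights
        # step 5: accuracy still acceptable -> loop back to step 2
```

The result is the skeletal network from which the simple rules are extracted.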
[...]
Breast cancer diagnosis: Example 1
• Data is split into 350 training samples and 349 test samples.
• 100 neural networks were trained:
  - original number of hidden units: 5
  - original number of connections: 460
• After pruning:
  - average number of connections: 10.70
  - average predictive accuracy: 92.70%
• Extracted rules:
  If uniformity of cell size ≤ 4 and bare nuclei ≤ 5, then benign;
  else malignant.
[...]
Neural network pruning
• After a network has been trained, redundant connections and units are removed by pruning.
• Pruned networks generalise better: they can predict new patterns better than fully connected networks.
• Simple classification rules can be extracted from skeletal pruned networks.
• Various methods for network pruning can be found in the literature. [...]

7. Regression rules
(figure: data points, least-squares regression line, piecewise regression line, neural network output)
Piecewise linear approximation!

Regression rules (continued)
• The neural network fits the data better than the linear regression line.
• The knowledge embedded in a neural network is [...]

Iris classification example
• The network has only 2 hidden units and 10 connections after pruning.
• It correctly classifies all but one training pattern.
• 2-dimensional scatter plot of the hidden-unit activation values (H1, H2), with clusters for setosa, versicolor and virginica.
• Rule in terms of the hidden unit activations:
  If H1 … -0.7: Iris setosa
  Else if H2 … -0.55: Iris versicolor
  Else: Iris virginica

Iris classification rules: If petal length [...]
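The two-condition breast-cancer rule extracted from the pruned networks above is small enough to state as a function; the argument names are descriptive stand-ins for the two Wisconsin breast-cancer attributes the rule tests:

```python
def diagnose(uniformity_of_cell_size, bare_nuclei):
    """Extracted rule from the slides: benign if and only if
    uniformity of cell size <= 4 and bare nuclei <= 5."""
    if uniformity_of_cell_size <= 4 and bare_nuclei <= 5:
        return 'benign'
    return 'malignant'
```

That a 460-connection network reduces, on average, to about 10 connections and then to a two-condition rule is the whole point of the pruning-then-extraction pipeline.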
[...] 142, then Y2.
Default rule: Y1.
Y1 = 4.96 + 0.0036 MMIN + 0.0032 MMAX + 0.3086 CACH + 0.3366 CHMAX
Y2 = -453.02 + 0.0159 MMIN + 0.0143 MMAX + 1.3662 CACH + 1.4903 CHMAX

Example 1 (continued)
Predictive error rates (chart comparing the neural network, the NN rules and linear regression):
• RMSE: root mean squared error
• RRMSE: relative RMSE
• MAE: mean absolute error
• RMAE: relative MAE
[...]
Input attributes:
• MYCT: machine cycle time (nanoseconds)
• MMIN: minimum main memory (KB)
• MMAX: maximum main memory (KB)
• CACH: cache memory (KB)
• CHMIN: minimum channels
• CHMAX: maximum channels
Output: the CPU's relative performance.
Samples: 167 training, 21 cross-validation and 21 testing.
Neural network: 8 hidden units; 1 hidden unit remained after [...]
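The two linear rule consequents above, and the RMSE/MAE metrics used to compare them, can be sketched as follows. The condition that selects Y2 over the default Y1 is truncated in this preview, so only the consequents and the metric definitions are encoded here:

```python
import math

def y1(mmin, mmax, cach, chmax):
    """Default rule consequent from the slides."""
    return 4.96 + 0.0036 * mmin + 0.0032 * mmax + 0.3086 * cach + 0.3366 * chmax

def y2(mmin, mmax, cach, chmax):
    """Consequent of the (truncated) conditional rule."""
    return -453.02 + 0.0159 * mmin + 0.0143 * mmax + 1.3662 * cach + 1.4903 * chmax

def rmse(targets, preds):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets))

def mae(targets, preds):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)
```

The relative variants (RRMSE, RMAE) divide these quantities by the corresponding error of a baseline predictor, which is why they are useful for comparing the network, the extracted rules and linear regression on one chart.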

Posted: 28/04/2014, 10:17
