Fundamentals of Database Systems, 3rd edition, Part 9

[...] constants, or if a variable is repeated in the rule head, this can easily be rectified: a constant c is replaced by a variable X, and a predicate equal(X, c) is added to the rule body. Similarly, if a variable Y appears twice in a rule head, one of those occurrences is replaced by another variable Z, and a predicate equal(Y, Z) is added to the rule body.

The evaluation of a nonrecursive query can be expressed as a tree whose leaves are the base relations. What is needed is the appropriate application of the relational operations SELECT, PROJECT, and JOIN, together with the set operations UNION and SET DIFFERENCE, until the predicate in the query is evaluated. An inference algorithm GET_EXPR(Q) that generates a relational expression for computing the result of a Datalog query \(Q = p(arg_1, arg_2, \ldots, arg_n)\) can be outlined informally as follows:

1. Locate all rules S whose head involves the predicate p. If there are no such rules, then p is a fact-defined predicate corresponding to some database relation \(R_p\); in this case, one of the following expressions is returned and the algorithm terminates (we use the notation $i to refer to the name of the ith attribute of relation \(R_p\)):
   a. If all arguments are distinct variables, the relational expression returned is \(R_p\).
   b. If some arguments are constants, or if the same variable appears in more than one argument position, the expression returned is SELECT<condition>(\(R_p\)), where the selection <condition> is a conjunction of simple conditions connected by AND, constructed as follows:
      i. If a constant c appears as argument i, include a simple condition ($i = c) in the conjunction.
      ii. If the same variable appears in both argument positions j and k, include a condition ($j = $k) in the conjunction.
   c. For an argument that is not present in any predicate, a unary relation containing values that satisfy all conditions is constructed.
Since the rule is assumed to be safe, this unary relation must be finite.

2. At this point, one or more rules \(S_i\), \(i = 1, 2, \ldots, n\), \(n > 0\), exist with predicate p as their head. For each such rule \(S_i\), generate a relational expression as follows:
   a. Apply selection operations on the predicates in the RHS of the rule, as discussed in Step 1.
   b. Construct a natural join among the relations that correspond to the predicates in the body of rule \(S_i\), over the common variables. For arguments that gave rise to unary relations in Step 1(c), the corresponding relations are included as operands of the natural join. Let the relation resulting from this join be \(R_s\).
   c. If a built-in predicate \(X \,\theta\, Y\) is defined over the arguments X and Y, the result of the join is subjected to an additional selection: SELECT<\(X \,\theta\, Y\)>(\(R_s\)).
   d. Repeat Step 2(c) until no more built-in predicates apply.
3. Take the UNION of the expressions generated in Step 2 (if more than one rule exists with predicate p as its head).

25.5.4 Concepts for Recursive Query Processing in Datalog

   Naive Strategy
   Seminaive Strategy
   The Magic Set Rule Rewriting Technique

Query processing can be separated into two approaches:

• Pure evaluation approach: Creating a query evaluation plan that produces an answer to the query.
• Rule rewriting approach: Optimizing the plan into a more efficient strategy.

Many approaches have been presented for both recursive and nonrecursive queries. We discussed an approach to nonrecursive query evaluation earlier. Here we first define some terminology for recursive queries, then discuss the naive and seminaive approaches to query evaluation, which generate simple plans, and then present the magic set approach, which is an optimization based on rule rewriting.

We have already seen examples involving recursive rules where the same predicate occurs in the head and in the body of a rule.
Another example is

    ancestor(X,Y) :- ancestor(X,Z), parent(Z,Y)

which states that Y is an ancestor of X if Z is an ancestor of X and Y is a parent of Z. It is used in conjunction with the rule

    ancestor(X,Y) :- parent(X,Y)

which states that if Y is a parent of X, then Y is an ancestor of X.

A rule is said to be linearly recursive if the recursive predicate appears once and only once in the RHS of the rule. For example,

    sg(X,Y) :- parent(X,XP), parent(Y,YP), sg(XP,YP)

is a linear rule in which the predicate sg (same-generation cousins) is used only once in the RHS. The rule states that X and Y are same-generation cousins if their parents are same-generation cousins. The rule

    ancestor(X,Y) :- ancestor(X,Z), parent(Z,Y)

is called left linearly recursive, while the rule

    ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y)

is called right linearly recursive. Notice that the rule

    ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y)

is not linearly recursive, because ancestor appears twice in its RHS. It is believed that most "real-life" rules can be described as linear recursive rules; algorithms have been defined to execute linear sets of rules efficiently. The preceding definitions become more involved when a set of rules with predicates that occur on both the LHS and the RHS of rules is considered.

A predicate whose relation is stored in the database is called an extensional database (EDB) predicate, while a predicate for which the corresponding relation is defined by logical rules is called an intensional database (IDB) predicate. Given a Datalog program with relations corresponding to the predicates, the "if" symbol, :-, may be replaced by an equality to form Datalog equations, without any loss of meaning. The resulting set of Datalog equations could potentially have many solutions. Given a set of relations for the EDB predicates, say \(R_1, R_2, \ldots, R_n\), a fixed point of the Datalog equations is a solution for the relations corresponding to the IDB predicates of those equations.
The fixed point with respect to the given EDB relations, along with those relations, forms a model of the rules from which the Datalog equations were derived. However, not every model of a set of Datalog rules is a fixed point of the corresponding Datalog equations, because the model may have "too many" facts. It turns out that every Datalog program has a unique minimal model containing any given EDB relations, and this model also corresponds to the unique minimal fixed point with respect to those EDB relations.

Formally, given a family of solutions \(S_i = (P_1^{(i)}, \ldots, P_m^{(i)})\) to a given set of equations, the least fixed point of the set of equations is the solution whose relations are subsets of the corresponding relations in every other solution. For example, we say \(S_1 \le S_2\) if relation \(P_k^{(1)}\) is a subset of relation \(P_k^{(2)}\) for all k, \(1 \le k \le m\).

Fixpoint theory was first developed in the field of recursion theory as a tool for explaining recursive functions. Since Datalog can express recursion, fixpoint theory is well suited for describing the semantics of recursive rules. For example, if we represent a directed graph by the predicate edge(X,Y) such that edge(X,Y) is true if and only if there is an edge from node X to node Y in the graph, the paths in the graph may be expressed by the following rules:

    path(X,Y) :- edge(X,Y)
    path(X,Y) :- path(X,Z), path(Z,Y)

Notice that there are other ways of defining paths recursively. Let us assume that relations P and A correspond to the predicates path and edge in the preceding rules. The transitive closure of relation A contains all pairs of nodes that have a path between them, and it is the least fixed-point solution of the equations that result from the preceding rules (Note 6). These rules can be turned into a single equation for the relation P corresponding to the predicate path.
\[ P(X,Y) = A(X,Y) \cup \pi_{X,Y}\bigl(P(X,Z) \bowtie P(Z,Y)\bigr) \]

Suppose that the nodes are 3, 4, 5 and \(A = \{(3,4), (4,5)\}\). From the first and second rules we can infer that (3,4), (4,5), and (3,5) are in P. We need not look for any other paths, because \(P = \{(3,4), (4,5), (3,5)\}\) is a solution of the above equation:

\[ \{(3,4),(4,5),(3,5)\} = \{(3,4),(4,5)\} \cup \pi_{X,Y}\bigl(\{(3,4),(4,5),(3,5)\} \bowtie \{(3,4),(4,5),(3,5)\}\bigr) \]

This solution constitutes a proof-theoretic meaning of the rules, as it was derived from the EDB relation A using just the rules. It is also the minimal model of the rules, or the least fixed point of the equation.

For evaluating a set of Datalog rules (equations) that may contain recursive rules, a large number of strategies have been proposed, details of which are beyond our scope. Here we illustrate three important techniques: the naive strategy, the seminaive strategy, and the use of magic sets.

Naive Strategy

The naive evaluation method is a pure evaluation, bottom-up strategy that computes the least model of a Datalog program. It is an iterative strategy: at each iteration, all rules are applied to the set of tuples produced thus far to generate all implicit tuples. This iterative process continues until no new tuples can be generated. The naive evaluation process does not take query patterns into account. As a result, a considerable amount of redundant computation is done.

We present two versions of the naive method, called the Jacobi and Gauss-Seidel solution methods; these methods get their names from well-known algorithms for the iterative solution of systems of equations in numerical analysis. Assume the following system of relational equations, formed by replacing the :- symbol by an equality sign in a Datalog program:

\[ R_i = E_i(R_1, R_2, \ldots, R_n) \]

The Jacobi method proceeds as follows. Initially, the variable relations \(R_i\) are set equal to the empty set.
Then, the computation

\[ R_i = E_i(R_1, R_2, \ldots, R_n), \quad i = 1, \ldots, n \]

is iterated until none of the \(R_i\) changes between two consecutive iterations (i.e., until the \(R_i\) reach a fixpoint).

Algorithm 25.1 Jacobi naive strategy.
Input: A system of algebraic equations and an EDB.
Output: The values of the variable relations \(R_1, R_2, \ldots, R_n\).

    for i = 1 to n do R_i = ∅;
    repeat
        condition = true;
        for i = 1 to n do S_i = R_i;
        for i = 1 to n do
        begin
            R_i = E_i(S_1, ..., S_n);
            if R_i ≠ S_i then condition = false
        end
    until condition;

The convergence of the Jacobi method can be slightly improved if, at each step k, in order to compute the new value \(R_i^{(k)}\), we substitute in \(E_i\) the values of \(R_j^{(k)}\) that have just been computed in the same iteration instead of the old values \(R_j^{(k-1)}\). This variant of the Jacobi method is called the Gauss-Seidel method; it produces the same result as the Jacobi algorithm.

Consider the following example, where ancestor(X, Y) means X is an ancestor of Y and parent(X, Y) means X is a parent of Y.

    ancestor(X,Y) :- parent(X,Y).
    ancestor(X,Y) :- ancestor(X,Z), parent(Z,Y).

If we define a relation A for the predicate ancestor and P for the predicate parent, the Datalog equation for the above rules can be written in the form:

\[ A(X,Y) = \pi_{X,Y}\bigl(A(X,Z) \bowtie P(Z,Y)\bigr) \cup P(X,Y) \]

Suppose the EDB is given as P = {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank)}. Let us follow the Jacobi algorithm. The parent tree looks as in Figure 25.09.

Initially, we set \(A^{(0)} = \emptyset\), enter the repeat loop, and set condition = true. We then initialize \(S_1 = A = \emptyset\) and compute the first value of A. Since the first join involves an empty relation, we get \(A^{(1)} = P =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank)}. \(A^{(1)}\) includes parents as ancestors. Since \(A^{(1)} \ne S_1\), condition = false. We therefore enter the second iteration with \(S_1\) set to \(A^{(1)}\).
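The Jacobi iteration of Algorithm 25.1 can be sketched in Python to make the passes of this example easy to reproduce. This is only an illustrative sketch under stated assumptions: relations are modelled as Python sets of tuples, and the names compose, jacobi_naive, and equations are hypothetical helpers that do not come from the text.

```python
# Minimal sketch of Algorithm 25.1 (Jacobi naive evaluation).
# Assumptions: relations are Python sets of tuples; compose, jacobi_naive,
# and equations are hypothetical names introduced for illustration only.


def compose(left, right):
    """Return pi_{X,Y}(left(X,Z) JOIN right(Z,Y)) for binary relations."""
    return {(x, y) for (x, z) in left for (z2, y) in right if z == z2}


def jacobi_naive(equations):
    """Iterate R_i = E_i(S_1, ..., S_n) until no relation changes.

    equations maps a relation name to a function that takes the dict of
    values from the previous pass (the S_i) and returns the new value.
    Returns the final relations and the list of values after each pass.
    """
    current = {name: set() for name in equations}  # R_i = empty set
    history = []
    while True:
        previous = current  # S_i = R_i
        # Every E_i reads only the values from the previous pass.
        current = {name: expr(previous) for name, expr in equations.items()}
        history.append(current)
        if current == previous:  # no R_i differs from its S_i
            return current, history


P = {("bert", "alice"), ("bert", "george"), ("alice", "derek"),
     ("alice", "pat"), ("derek", "frank")}

# A(X,Y) = pi_{X,Y}(A(X,Z) JOIN P(Z,Y)) UNION P(X,Y)
equations = {"A": lambda s: compose(s["A"], P) | P}

result, history = jacobi_naive(equations)
for k, values in enumerate(history, start=1):
    print(k, len(values["A"]))
```

The first pass yields exactly P, as derived above. Because each pass rebuilds every relation from the previous values only, this is the Jacobi variant; a Gauss-Seidel variant would instead read values already updated within the same pass.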
Computing the value of A again, we get

\(A^{(2)} = \pi_{X,Y}(A^{(1)} \bowtie P) \cup P =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank)}.

It can be seen that \(A^{(2)} = A^{(1)} \cup\) {(bert, derek), (bert, pat), (alice, frank)}. Note that \(A^{(2)}\) now includes grandparents as ancestors besides parents. Since \(A^{(2)} \ne S_1\), we iterate again, setting \(S_1\) to \(A^{(2)}\):

\(A^{(3)} = \pi_{X,Y}(A^{(2)} \bowtie P) \cup P =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank), (bert, frank)}.

Now, \(A^{(3)} = A^{(2)} \cup\) {(bert, frank)}. \(A^{(3)}\) now has great-grandparents included among ancestors. Since \(A^{(3)}\) is different from \(S_1\), we enter the next iteration, setting \(S_1 = A^{(3)}\). We now get

\(A^{(4)} = \pi_{X,Y}(A^{(3)} \bowtie P) \cup P =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank), (bert, frank)}.

Finally, since \(A^{(4)} = A^{(3)} = S_1\), the evaluation is finished. Intuitively, from the above parental hierarchy, it is obvious that all ancestors have been computed.

Seminaive Strategy

Seminaive evaluation is a bottom-up technique designed to eliminate redundancy in the evaluation of tuples at different iterations. This method does not use any information about the structure of the program. There are two possible settings of the seminaive algorithm: the (pure) seminaive and the pseudo rewriting seminaive.

Consider the Jacobi algorithm. Let \(R_i^{(k)}\) be the temporary value of relation \(R_i\) at iteration step k. The differential of \(R_i\) at step k of the iteration is defined as

\[ D_i^{(k)} = R_i^{(k)} - R_i^{(k-1)} \]

When the whole system is linear, \(D_i\) can be substituted for \(R_i\) in the Jacobi or Gauss-Seidel algorithms: the result is obtained by the union of the newly obtained term and the old one.

Algorithm 25.2 Seminaive strategy.
Input: A system of algebraic equations and an EDB.
Output: The values of the variable relations \(R_1, R_2, \ldots, R_n\).
    for i = 1 to n do R_i = ∅;
    for i = 1 to n do D_i = ∅;
    repeat
        for i = 1 to n do S_i = D_i;
        condition = true;
        for i = 1 to n do
        begin
            D_i = E_i[S_1, ..., S_n] - R_i;
            R_i = D_i ∪ R_i;
            if D_i ≠ ∅ then condition = false
        end
    until condition

The advantage of this method is that, at each iteration step, a differential term \(D_i\) is used in each equation instead of the whole \(R_i\).

Let us now look at the improvement due to the seminaive evaluation. Consider the EDB to be the same as in the previous example. We have \(D^{(0)} = \emptyset\) and \(A^{(0)} = \emptyset\).

\(D^{(1)} = P =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank)}. Hence,

\(A^{(1)} = D^{(1)} \cup A^{(0)} =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank)}.

\(D^{(2)} =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank)} \(- A^{(1)} =\) {(bert, derek), (bert, pat), (alice, frank)}.

\(A^{(2)} = D^{(2)} \cup A^{(1)} =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank)}.

\(D^{(3)} =\) {(bert, frank)}.

\(A^{(3)} = D^{(3)} \cup A^{(2)} =\) {(bert, frank)} \(\cup A^{(2)} =\) {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank), (bert, frank)}.

\(D^{(4)} = \emptyset\), and hence we have come to the end of our evaluation.

Although the two methods compute the same result, the seminaive evaluation is more efficient. Only the \(D^{(i)}\)'s are involved in the join, whereas in the naive evaluation we had to compute joins for each of the temporary values \(A^{(i)}\), which always have at least as many tuples as \(D^{(i)}\).

The Magic Set Rule Rewriting Technique

The problem addressed by the magic sets rule rewriting technique is that frequently a query asks not for the entire relation corresponding to an intensional predicate but for a small subset of this relation. Consider the following program:

[...]
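The seminaive trace worked through above (the differentials \(D^{(1)}\) to \(D^{(4)}\) for the ancestor example) can be reproduced with a short Python sketch of Algorithm 25.2. It is a sketch under assumptions, not the book's code: relations are Python sets of tuples, the names compose and seminaive_ancestor are hypothetical, and the loop is specialised to the single linear equation for A, which is the case in which substituting the differential for the full relation is valid.

```python
# Minimal sketch of Algorithm 25.2 (seminaive evaluation) for the single
# linear equation A = pi_{X,Y}(A JOIN P) UNION P.
# Assumptions: relations are Python sets of tuples; compose and
# seminaive_ancestor are hypothetical names introduced for illustration.


def compose(left, right):
    """Return pi_{X,Y}(left(X,Z) JOIN right(Z,Y)) for binary relations."""
    return {(x, y) for (x, z) in left for (z2, y) in right if z == z2}


def seminaive_ancestor(parent):
    """Return the ancestor relation and the differential of each pass."""
    ancestor = set()  # R = empty set
    delta = set()     # D = empty set
    deltas = []
    while True:
        source = delta  # S = D: only the newest tuples feed the join
        # D = E[S] - R, where E[S] = pi(S JOIN P) UNION P
        delta = (compose(source, parent) | parent) - ancestor
        ancestor = delta | ancestor  # R = D UNION R
        deltas.append(delta)
        if not delta:  # empty differential: nothing new, stop
            return ancestor, deltas


P = {("bert", "alice"), ("bert", "george"), ("alice", "derek"),
     ("alice", "pat"), ("derek", "frank")}

A, deltas = seminaive_ancestor(P)
for k, d in enumerate(deltas, start=1):
    print(k, sorted(d))
```

Each pass joins only the differential with P, so the join inputs shrink from five tuples to three to one, while a naive evaluation would join the whole accumulated relation on every pass.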
... relations: 99 status(X,Y), 98 status(X,Y), 98 Miles(X,Y). The status data refers to passenger X having a status Y for the year, where Y can be regular, silver, gold, or platinum. Let the requirements for achieving gold status be expressed by:

    99 status(X,'gold') :- 98 status(X,'gold') AND 98 Miles(X,Y) AND Y>45000
    99 status(X,'gold') :- 98 status(X,'platinum') AND 98 Miles(X,Y) AND Y>40000
    99 status(X,'gold')...

... Srivastava et al. (1993). Ullman (1985) provides the basis for the NAIL! system, which is described in Morris et al. (1987). Phipps et al. (1991) describe the GLUE-NAIL! deductive database system. Zaniolo (1990) reviews the theoretical background and the practical importance of deductive databases. Nicolas (1997) gives an excellent history of the developments leading up to DOODs. Falcone et al. (1997) survey the...

... Gallaire and Minker (1978) provide an early book on this topic. A detailed treatment of logic and databases appears in Ullman (1989, vol. 2), and there is a related chapter in Volume 1 (1988). Ceri, Gottlob, and Tanca (1990) present a comprehensive yet concise treatment of logic and databases. Das (1992) is a comprehensive book on deductive databases and logic programming. The early history of Datalog is covered...

... include Friesen et al. (1995), Vieille (1997), and Dietrich et al. (1999).

Footnotes

Note 1: A historical perspective of these developments appears in Nicolas (1997).
Note 2: A Prolog system typically has a number of different equality predicates that have different interpretations.
Note 3: Named after the mathematician Alfred Horn.
Note 4: The most commonly...
... deductive rules with relational databases. The LDL system prototype is described in Chimenti et al. (1990). Krishnamurthy and Naqvi (1989) introduce the "choice" notion in LDL. Zaniolo (1988) discusses the language issues for the LDL system. A language overview of CORAL is provided in Ramakrishnan et al. (1992), and the implementation is described in Ramakrishnan et al. (1993). An extension to support object-oriented...

... the database, assuming John is one of the players, how do you compute "superior(john, X)?" using the naive and seminaive algorithms?

Selected Bibliography

The early developments of the logic and database approach are surveyed by Gallaire et al. (1984). Reiter (1984) provides a reconstruction of relational database theory, while Levesque (1984) provides a discussion of incomplete knowledge in light of logic...

... Rao (1987) discuss an extension of the seminaive differential approach for multiple predicates. The original paper on magic sets is by Bancilhon et al. (1986). Beeri and Ramakrishnan (1987) extends it. Mumick et al. (1990) show the applicability of magic sets to nonrecursive nested SQL queries. Other approaches to optimizing rules without rewriting them appear in Vieille (1986, 1987). Kifer...

... deductive databases and recursive query processing include Warren (1992) and Ramakrishnan and Ullman (1993). A complete description of the seminaive approach based on relational algebra is given in Bancilhon (1985). Other approaches to recursive query processing include the recursive query/subquery strategy of Vieille (1986), which is a top-down interpreted strategy, and the Henschen-Naqvi (1984) top-down...
... the notion of object identifier in OO databases. Further, external mappings can be defined for a predicate; they enable the retrieval of facts (through their factIDs) based on the value of some of their unique attributes. Basis predicates may also have methods in the OO sense, that is, functions can be invoked in the context of a specific fact.

25.8 Applications of Commercial Deductive Database Systems

   25.8.1 LDL Applications
   25.8.2 VALIDITY Applications

We discussed two commercial deductive database systems: ...

... integrity of data. In modern organizations, users of data are often completely removed from the data sources. Many people only need read-access to data, but still need very rapid access to a larger volume of data than can conveniently be downloaded to the desktop. Often such data comes from multiple databases. Because many of the analyses performed are recurrent and predictable, software vendors and systems ...

... whereas in RDBMSs the user formulates a correct query and leaves the optimization of query execution to the system. The navigational nature of Prolog is manifested in the ordering of ...

... the class of stratified programs. The bill-of-materials problem, in which the cost of a composite part is defined as being the sum of the costs of all atomic parts, is an example of a problem ...
