Joe Celko's SQL for Smarties - Advanced SQL Programming P33 pdf

CHAPTER 14: THE [NOT] IN() PREDICATE

    SELECT *
      FROM JohnsBook AS J1
     WHERE NOT EXISTS
           (SELECT *
              FROM QualityGuide AS Q1
             WHERE Q1.restaurant_name = J1.restaurant_name);

The reason the second version will probably run faster is that it can test for existence using the indexes on both tables. The NOT IN() version has to test all the values in the subquery table for inequality. Many SQL implementations will construct a temporary table from the IN() predicate subquery if it has a WHERE clause, but the temporary table will not have any indexes. The temporary table can also have duplicates and a random ordering of its rows, so the SQL engine has to do a full-table scan.

14.2 Replacing ORs with the IN() Predicate

A simple trick that beginning SQL programmers often miss is that an IN() predicate can often replace a set of ORed predicates. For example:

    SELECT *
      FROM QualityControlReport
     WHERE test_1 = 'passed'
        OR test_2 = 'passed'
        OR test_3 = 'passed'
        OR test_4 = 'passed';

can be rewritten as:

    SELECT *
      FROM QualityControlReport
     WHERE 'passed' IN (test_1, test_2, test_3, test_4);

The reason this is difficult to see is that programmers get used to thinking of the IN() list as either a subquery or a simple list of constants. They miss the fact that the IN() predicate list can be a list of expressions. The optimizer would have handled each of the original predicates separately in the WHERE clause, but it has to handle the IN() predicate as a single item, which can change the order of evaluation. This might or might not be faster than the list of ORed predicates for a particular query. This formulation might also cause the predicate to become nonindexable; you should check the indexability rules of your particular DBMS.

14.3 NULLs and the IN() Predicate

NULLs cause some special problems in a NOT IN() predicate with a subquery.
Consider these two tables:

    CREATE TABLE Table1 (x INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3), (4);

    CREATE TABLE Table2 (x INTEGER);
    INSERT INTO Table2 VALUES (1), (NULL), (2);

Now execute the query:

    SELECT *
      FROM Table1
     WHERE x NOT IN (SELECT x FROM Table2);

Let's work it out step by painful step:

1. Do the subquery:

    SELECT *
      FROM Table1
     WHERE x NOT IN (1, NULL, 2);

2. Convert the NOT IN() to its definition:

    SELECT *
      FROM Table1
     WHERE NOT (x IN (1, NULL, 2));

3. Expand the IN() predicate:

    SELECT *
      FROM Table1
     WHERE NOT ((x = 1) OR (x = NULL) OR (x = 2));

4. Apply DeMorgan's law:

    SELECT *
      FROM Table1
     WHERE ((x <> 1) AND (x <> NULL) AND (x <> 2));

5. Perform the constant logical expression:

    SELECT *
      FROM Table1
     WHERE ((x <> 1) AND UNKNOWN AND (x <> 2));

6. Reduce the AND to a constant. With an UNKNOWN term in it, the conjunction can only be UNKNOWN or FALSE, never TRUE, so no row can pass:

    SELECT *
      FROM Table1
     WHERE UNKNOWN;

7. The results are always empty.

Now try this with another set of tables:

    CREATE TABLE Table3 (x INTEGER);
    INSERT INTO Table3 VALUES (1), (2), (NULL), (4);

    CREATE TABLE Table4 (x INTEGER);
    INSERT INTO Table4 VALUES (1), (3), (2);

Let's work out the same query step by painful step again.

1. Do the subquery:

    SELECT *
      FROM Table3
     WHERE x NOT IN (1, 3, 2);

2. Convert the NOT IN() to a Boolean expression:

    SELECT *
      FROM Table3
     WHERE NOT (x IN (1, 3, 2));

3. Expand the IN() predicate:

    SELECT *
      FROM Table3
     WHERE NOT ((x = 1) OR (x = 3) OR (x = 2));

4. Apply DeMorgan's law:

    SELECT *
      FROM Table3
     WHERE ((x <> 1) AND (x <> 3) AND (x <> 2));

5. Compute the result set; I will show it as a UNION with substitutions, with the value of each WHERE clause noted as a comment:

    SELECT *
      FROM Table3
     WHERE ((1 <> 1) AND (1 <> 3) AND (1 <> 2))   -- FALSE
    UNION ALL
    SELECT *
      FROM Table3
     WHERE ((2 <> 1) AND (2 <> 3) AND (2 <> 2))   -- FALSE
    UNION ALL
    SELECT *
      FROM Table3
     WHERE ((CAST(NULL AS INTEGER) <> 1)
            AND (CAST(NULL AS INTEGER) <> 3)
            AND (CAST(NULL AS INTEGER) <> 2))     -- UNKNOWN
    UNION ALL
    SELECT *
      FROM Table3
     WHERE ((4 <> 1) AND (4 <> 3) AND (4 <> 2));  -- TRUE

6. The result is one row = (4).

14.4 IN() Predicate and Referential Constraints

One of the most popular uses for the IN() predicate is in a CHECK() clause on a table. The usual form is a list of values that are legal for a column, such as:

    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     street_loc CHAR(25) NOT NULL,
     city_name CHAR(20) NOT NULL,
     state_code CHAR(2) NOT NULL
       CONSTRAINT valid_state_code
       CHECK (state_code IN ('AL', 'AK', ...)),
     ...);

This method works fine with a small list of values, but it has problems with a longer list. With a long list, it is very important to arrange the values in the order in which they are most likely to match the two-letter state_code, to speed up the search.

In Standard SQL a constraint can reference other tables, so you could write the same constraint as:

    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     street_loc CHAR(25) NOT NULL,
     city_name CHAR(20) NOT NULL,
     state_code CHAR(2) NOT NULL,
     CONSTRAINT valid_state_code
       CHECK (state_code
              IN (SELECT state_code
                    FROM ZipCodes AS Z1
                   WHERE Z1.state_code = Addresses.state_code)),
     ...);

The advantage of this is that you can change the ZipCodes table and thereby change the effect of the constraint on the Addresses table. This is fine for adding more data in the outer reference (e.g., Quebec joins the United States and gets the code 'QB'), but it has a bad effect when you try to delete data in the outer reference (e.g., California secedes from the United States and every row with 'CA' for a state code is now invalid).

As a rule of thumb, use the IN() predicate in a CHECK() constraint when the list is short, static, and unique to one table. When the list is short and static but not unique to one table, use a CREATE DOMAIN statement, and put the IN() predicate in a CHECK() constraint on the domain.
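The first rule of thumb, a short static list in a CHECK() constraint, can be tried out with a minimal sketch. It uses Python's standard sqlite3 module rather than any particular product the text has in mind; the three-code list and the sample rows are hypothetical, and the Addresses table is trimmed to two columns. Note as an aside that SQLite does not accept a subquery inside CHECK(), so the ZipCodes form of the constraint cannot be demonstrated this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down Addresses table; the list of legal codes is a hypothetical
# stand-in for the full list of state codes.
conn.execute("""
    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     state_code CHAR(2) NOT NULL
       CONSTRAINT valid_state_code
       CHECK (state_code IN ('AL', 'AK', 'CA')))
""")

# A code on the list is accepted.
conn.execute("INSERT INTO Addresses VALUES ('Alice', 'AK')")

# A code that is not on the list violates the CHECK() constraint.
try:
    conn.execute("INSERT INTO Addresses VALUES ('Bob', 'ZZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

accepted_rows = conn.execute(
    "SELECT addressee_name, state_code FROM Addresses"
).fetchall()
print(rejected, accepted_rows)
```

Changing the list of legal values in this design means altering the table definition, which is why the rule of thumb reserves it for lists that are short and static.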
Use a REFERENCES clause to a lookup table when the list is long and dynamic, or when several other schema objects (VIEWs, stored procedures, etc.) reference the values. A separate table can have an index, and that makes a big difference in searching and doing joins.

14.5 IN() Predicate and Scalar Queries

As mentioned before, the list of an IN() predicate can be any scalar expression. This includes scalar subqueries, but most people do not seem to know that this is possible. For example, given tables that model warehouses, trucking centers, and so forth, we can find out whether we have a product, identified by its UPC code, somewhere in the enterprise.

    SELECT P.upc
      FROM Picklist AS P
     WHERE P.upc
           IN ((SELECT upc FROM Warehouse AS W WHERE W.upc = P.upc),
               (SELECT upc FROM TruckCenter AS T WHERE T.upc = P.upc),
               (SELECT upc FROM Garbage AS G WHERE G.upc = P.upc));

The empty result sets will become NULLs in the list. The alternative to this is usually a chain of OUTER JOINs or an ORed list of EXISTS() predicates.

CHAPTER 15: EXISTS() PREDICATE

The EXISTS predicate is very natural. It is a test for a nonempty set. If there are any rows in its subquery, it is TRUE; otherwise, it is FALSE. This predicate does not give an UNKNOWN result. The syntax is:

    <exists predicate> ::= EXISTS <table subquery>

It is worth mentioning that a <table subquery> is always inside parentheses to avoid problems in the grammar during parsing.

In SQL-89, the rules stated that the subquery had to have a SELECT clause with one column or a *. If the SELECT * option was used, the database engine would (in theory) pick one column and use it. This fiction was needed because SQL-89 defined subqueries as having only one column.

Some early SQL implementations would work better with the EXISTS(SELECT <column> ...), EXISTS(SELECT <constant> ...), or EXISTS(SELECT * ...) versions of the predicate.
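Whatever their performance history, the three spellings return the same rows. The sketch below checks that with Python's standard sqlite3 module; it borrows the Picklist and Warehouse table names from Section 14.5, but the single-column layout and the UPC values are hypothetical. It also checks that EXISTS yields FALSE, not UNKNOWN, when the subquery's own search condition is UNKNOWN for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Picklist (upc INTEGER NOT NULL);
    CREATE TABLE Warehouse (upc INTEGER NOT NULL);
    INSERT INTO Picklist VALUES (1), (2), (3);
    INSERT INTO Warehouse VALUES (2), (3), (3);
""")

def picked_and_stocked(select_list):
    """Run the same EXISTS query with a different subquery SELECT list."""
    sql = f"""
        SELECT P.upc
          FROM Picklist AS P
         WHERE EXISTS
               (SELECT {select_list}
                  FROM Warehouse AS W
                 WHERE W.upc = P.upc)
         ORDER BY P.upc
    """
    return [row[0] for row in conn.execute(sql)]

# SELECT *, SELECT <column>, and SELECT <constant> inside EXISTS().
results = {form: picked_and_stocked(form) for form in ("*", "W.upc", "1")}
print(results)

# The comparison with NULL is UNKNOWN for every row, so the subquery is
# empty and EXISTS is FALSE (SQLite reports it as 0), never UNKNOWN.
never_unknown = conn.execute(
    "SELECT EXISTS (SELECT * FROM Warehouse WHERE upc = NULL)"
).fetchone()[0]
print(never_unknown)
```

The duplicate row for UPC 3 in Warehouse makes no difference to the answer, since EXISTS only asks whether at least one row qualifies.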
Today, there is no difference among the three forms in the major products, so EXISTS(SELECT * ...) is the preferred form.

Indexes are very useful for EXISTS() predicates because they can be searched while the base table is left alone completely. For example, we want to find all employees who were born on the same day as any famous person. The query could be:

    SELECT P1.emp_name, ' has the same birthday as a famous person!'
      FROM Personnel AS P1
     WHERE EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE P1.birthday = C1.birthday);

If the table Celebrities has an index on its birthday column, the optimizer will get the current employee's birthday, P1.birthday, and look up that value in the index. If the value is in the index, the predicate is TRUE and we do not need to look at the Celebrities table at all. If it is not in the index, the predicate is FALSE and there is still no need to look at the Celebrities table. This should be fast, since indexes are smaller than their tables and are structured for very fast searching.

However, if Celebrities has no index on its birthday column, the query may have to look at every row to see if there is a birthday that matches the current employee's birthday. There are some tricks that a good optimizer can use to speed things up in this situation.

15.1 EXISTS and NULLs

A NULL might not be a value, but it does exist in SQL. This is often a problem for a new SQL programmer who is having trouble with NULLs and how they behave. Think of them as being like a brown paper bag: you know that something is inside because you lifted it, but you do not know exactly what that something is.

For example, we want to find all the employees who were not born on the same day as a famous person. This can be answered with the negation of the original query, like this:

    SELECT P1.emp_name, ' was born on a day without a famous person!'
      FROM Personnel AS P1
     WHERE NOT EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE P1.birthday = C1.birthday);

But assume that among the celebrities we have a movie star who will not admit her age, shown in the row ('Gloria Glamour', NULL). A new SQL programmer might expect that Ms. Glamour would not match anyone, since we do not know her birthday yet. It is tempting to argue the opposite, that she should match everyone, since there is a chance that they may match when some tabloid newspaper finally gets a copy of her birth certificate. But work out the subquery in the usual way, looking only at her row, to convince yourself that she matches no one:

    WHERE NOT EXISTS
          (SELECT *
             FROM Celebrities
            WHERE P1.birthday = NULL);

becomes:

    WHERE NOT EXISTS
          (SELECT *
             FROM Celebrities
            WHERE UNKNOWN);

becomes:

    WHERE TRUE;

And you see that the search condition in the subquery tests to UNKNOWN because of the NULL comparison, and therefore fails whenever we look at Ms. Glamour. Her row never gets into the subquery result, so she cannot keep any employee out of the answer.

Another problem with NULLs is found when you attempt to convert IN predicates to EXISTS predicates. Using our example of matching our employees to famous people, the query can be rewritten as:

    SELECT P1.emp_name, ' was born on a day without a famous person!'
      FROM Personnel AS P1
     WHERE P1.birthday
           NOT IN (SELECT C1.birthday
                     FROM Celebrities AS C1);

However, consider a more complex version of the same query, where the celebrity has to have been born in New York City. The IN predicate would be: ...
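Returning to the two simple negated birthday queries in this section, the NOT EXISTS version and the NOT IN version, the effect of Ms. Glamour's NULL birthday on each can be checked with a short sketch in Python's standard sqlite3 module. The employee names, the second celebrity, the celeb_name column, and all dates are hypothetical; only the ('Gloria Glamour', NULL) row comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel (emp_name TEXT NOT NULL, birthday TEXT);
    CREATE TABLE Celebrities (celeb_name TEXT NOT NULL, birthday TEXT);
    INSERT INTO Personnel VALUES ('Alice', '1970-01-01'),
                                 ('Bob', '1980-02-02');
    INSERT INTO Celebrities VALUES ('Fred Famous', '1970-01-01'),
                                   ('Gloria Glamour', NULL);
""")

# NOT EXISTS: the NULL row never satisfies the subquery's WHERE clause,
# so it has no effect on the answer.
not_exists_rows = [row[0] for row in conn.execute("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE NOT EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE P1.birthday = C1.birthday)
""")]

# NOT IN: the NULL in the subquery result turns every non-matching
# comparison into UNKNOWN, so no employee qualifies.
not_in_rows = [row[0] for row in conn.execute("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE P1.birthday
           NOT IN (SELECT C1.birthday
                     FROM Celebrities AS C1)
""")]

print(not_exists_rows)
print(not_in_rows)
```

With this data the NOT EXISTS query reports Bob, while the NOT IN query reports nobody, which is the same behavior worked out step by step for Table1 and Table2 in Section 14.3.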
