Tài liệu Managing time in relational databases- P5 pptx

20 420 1
Tài liệu Managing time in relational databases- P5 pptx

Đang tải... (xem toàn văn)

Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống

Thông tin tài liệu

the row representing that assertion will cease to be asserted on that date even if no correcting assertion is supplied to replace it. The last reason an assertion end date may be changed is to lock an assertion which has been updated or deleted by a deferred transaction, until the resulting deferred assertion becomes current. We will have more to say about deferred trans- actions, deferred assertions and locking in Chapter 13. Now() and UTC Keeping our notation DBMS agnostic, and keeping the clock tick granularity generic, we will refer to the current moment, to right now, as Now(). 7 SQL Server may use getdate(), and DB2 may use Current Timestamp or Current Date. Depending on our clock tick duration, we might need to use a date formatting func- tion to set the current granularity. In our examples, we generally use one month as our clock tick granularity. However for our purposes, Now() can take on values at whatever level of granular- ity we choose to use, including day, second or microsecond. Now() is usually assumed to represent the current moment by us ing local time. But local time may change because of entering or leaving Daylight Savings Time. And another issue is time zone. At any one time, data about to update a database may exist in a different time zone than the database itself. Users about to retrieve data from a database may exist in a different time zone than the database itself. And, of course, federated queries may attempt to join data from databases located in different time zones. So the data values returned by Now() can change for reasons other than the passage of time. Daylight Savings Time can change those values. At any one point in time, those values can differ because of time zones. Clearly, we need a reference frame- work, and a set of values, that will not change for any reason other than the passage of time, and that will be the same value, at any point in time, the world over and year around. This reference framework is Universal Coordinated Time (UTC). 8 To make use of UTC, our Asserted Versioning Framework will convert local time to UTC on maintenance and queries, and 7 Now() is a function that returns the current date. It is not a value. However, we will often use it to designate a specific point in time. For example, we may say that a time period starts at Now() and continues on until 12/31/9999. This is a shorthand way of emphasizing that, whenever that time period was created, it was given as its begin date the value returned by Now() at that moment. 8 However, even in UTC, some variations in time values do not reflect the passage of time. We are referring here to the periodic adjustments in UTC made by adding or removing leap seconds, as we described in an earlier section of this chapter. 62 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH will store Asserted Versioning temporal parameters, such as begin and end dates, in UTC. For example, with Policy_AV being an asserted version table of insurance policies, we would insert a policy like this: INSERT INTO Policy_AV (oid, asr_beg_dt .) VALUES (55, CURRENT TIMESTAMP - CURRENT TIMEZONE .) For queries, they will perform better if we do the time conver- sion before using the value as a selection predicate in the SQL itself. This is because most optimizers treat functions that appear in predicates as non-indexable. For example, in DB2, we should write: SET :my-cut ¼ TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE SELECT .FROM . WHERE oid ¼ 55 AND asr_beg_dt <¼ :my-cut AND asr_end_dt > :my-cut rather than SELECT .FROM . WHERE oid ¼ 55 AND asr_beg_dt <¼ TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE AND . However, if these functions are used for display purposes, then there is no reason to exclude them from the queries. For example: SELECT asr_beg_dt þ CURRENT TIMEZONE AS my_local_asr_beg_dt . . .FROM . It would also be useful to add alternate columns for the tem- poral dates in our views that have the translation to local time performed already. The Very Concept of Bi-Temporality Business IT professionals were using tables with both an effective date and a physical row create date by the early 90s. 9 But they were doing so with apparently no knowledge of 9 Or timestamps, or other datatypes. We remind the reader that, throughout this book, we use the date datatype for all temporal columns, and a first of the month value for all our dates. This simplifies the presentation without affecting any of the semantics. In real business applications, of course, these columns would often be timestamps. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 63 academic work on bi-temporality. At that time, these version tables which also contained a row create date were state of the art in best practice methods for managing temporal data. We will discuss them in the next chapter. With a row creation date, of course, any query can be re stric ted to the rows present inatableasofanyspecificdate by including a WHERE clause predicate that qualifies only thoserowswhosecreatedateislessthanorequaltothespeci- fied date. With two effective dates, tables like these are also able to specify one of the two temporal dimensions that make up full bi-temporality. The standard temporal model uses the term “valid time” where we use the term “effective time”. But the difference is purely verbal. We have found no differences between how valid time works in the standard model, and how effective time works in Asserted Versioning. We use “effective time” because it is the preferred term among business IT professionals, and also because it readily adapts itself to other grammatical forms such as “becoming effective” and “in effect”. The standard model states that “(v)alid time . captur(es) the history of a changing reality, and transaction time . . . . . captur(es) the sequence of states of a changing table . . . . . A table supporting both is termed a “bi-temporal table” [2000, Snodgrass, p. 20]. But as we will see later, Asserted Versioning does not define bi-tempo- rality in exactly the same way. The difference lies primarily in the second of the two temporal dimensions, what computer scientists call “transaction time” and what we call “assertion time”. While a transaction begin date always indicates when a row is physically inserted into a table, an assertion begin date indicates when we are first willing to assert, or claim, that a row is a true statement about the object it represents, during that row’s effective (valid) time period, and also that the quality of the row’s data is good enough to base business decisions on. In the standard temporal model, the beginning of a transaction time period is the date on which the row is created. Obviously, once the row is created, that date cannot be changed. But in the Asserted Versioning temporal model, an assertion time period begins either on the date a row is created, or on a later date. Because an assertion begin date is not necessarily the same as the date on which its row is physically created, Asserted Versioning needs, in addition to the pair of dates that define this time period, an additional date which is the physical creation date of each row. That date serves as an audit trail, and as a means of reconstructing a table as it physically existed at any past point in time. 64 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH What are these rows with future assertion begin dates? To take a single example, they might be rows for which we have some of the business data, but not all of it, rows which are in the process of being made ready “for prime time”. These rows—which may be assertions about past, present or future versions—are not yet ready, we will say, to become part of the production data in the table, not yet ready to become rows that we are willing to present to the world and of which we are will- ing to say “We stand behind the statements these rows make. We claim that the statements they make are (or are likely to become) true, and that the information these rows provide meetsthestandardsofreliabilityunderstood(orexplicitly stated) to apply to all rows in this table”. So the semantics of the standard temporal model are fully supported by Asserted Versioning. But Asserted Versioning adds the semantics of what we call deferred assertions, and which we have just briefly described. As we will see in later chapters, deferred assertions are just one kind of internalized pipeline dataset, and the internalization of pipeline datasets can eliminate a large part of the IT maintenance budget by eliminating the need to manage pipeline datasets as distinct physical objects. Allen Relationships Allen relationships describe all possible positional relationships between two time periods along a common time- line. This includes the special case of one or both time periods being a point in time, i.e. being exactly one clock tick in length. There are 13 Allen relationships in total. Six have a corresponding inverse relationship, and one does not. Standard treatments of the Allen relationships may be found in both [2000, Snodgrass] and [2002, Date, Darwen, Lorentzos]. We have found it useful to reconstruct the Allen relationships as a binary taxonomy. Our taxonomy is shown in Figure 3.4. In this diagram, the leaf nodes include a graphic in which there are two timelines, each represented by a dashed line. All the leaf nodes but one have an inverse, and that one is italicized; when two time periods are identical, they do not have a distinct inverse. Thus, this taxonomy specifies 13 leaf- node relationships which are, in fact, precisely the 13 Allen relationships. The names of the Allen relationships are standard, and have been since Allen wrote his seminal article in 1983. But those names, and the names of the higher-level nodes in our own tax- onomy of the Allen relationships, are also expressions in Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 65 ordinary language. In order to distinguish between the ordinary language and the technical uses of these terms, we will include the names of Allen relationships and our other taxonomy nodes in brackets when we are discussing them. We will also underline the non-leaf node relationships in the taxonomy, to emphasize that they are relationships we have defined, and are not one of the Allen relationships. In the following discussion, the first time period in a pair of them is the one that is either earlier than the other, or not longer than the other. Given two time periods on a common timeline, either they have at least one clock tick in common or they do not. If they do, we will say that they [ intersect] one another. If they do not, we will say that they [ exclude] one another. If there is an [ intersects] relationship between two time per- iods, then either one [ fills] the other or each [overlaps] the other. If one time period [ fills] another, then all its clock ticks are also in the time period it [ fills], but not necessarily vice versa. If one time period [overlaps] another, then the latter also overlaps the former; but, being the later of the two time periods, we say that the latter time period has the inverse relationship, [overlaps À1 ]. In the overlaps cases, each has at least one clock tick that the Time Periods Relationships Along a Common Timeline Fills Overlaps Intersects Excludes Before Meets |----------| |----------| |-----| |-----| |-----|-----| Equals |-----| |-----| Occupies Starts Finishes During |-----| |------------| Aligns |-----| |------------| |-----| |-----------| Figure 3.4 The Asserted Versioning Allen Relationship Taxonomy. 66 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH other does not have, as well as having at least one clock tick that the other does have. If two time periods [ exclude] one another, then they do not share any clock ticks, and they are either non-contiguous or con- tiguous. If there is at least one clock tick between them, they are non-contiguous and we say that one is [before] the other. Other- wise they are contiguous and we say that one [meets] the other. If one time period [ fills] the other, then either they are [equal], or one [ occupies] the other. If they are [equal], then neither has a clock tick that the other does not have. If one [ occupies] the other, then all the clock ticks in the occupying time period are also in the occupied time period, but not vice versa. If one time period [ occupies] the other, then either they share an [ aligns] relationship, or one occurs [during] the other. If they are aligned, then they either start on the same clock tick or end on the same clock tick, and we say that one either [starts] or [finishes] the other. Otherwise, one occurs [during] the other, beginning after the other and ending before it. Note that if two time periods are aligned, one cannot both [start] and [finish] the other because if it did, it would be [equal] to the other. If one time period [starts] another, they both begin on the same clock tick. If one [finishes] the other, they both end on the same clock tick. If one time period [ occupies] another, but they are not aligned, then one occurs [during] the other. Now let’s consider the special case in which one of the two time periods is a point in time, i.e. is exactly one clock tick in length, and the other one contains two or more clock ticks. This point in time may either [ intersect] or [exclude] the time period. If the point in time [ intersects] the time period, it also [fills] and [ occupies] that time period. If it [aligns] with the time period, then it either [starts] the time period or [finishes] it. Otherwise, the point in time occurs [during] the time period. If the point in time [ excludes] the time period, then either may be [before] the other, or they may [meet]. Finally, let’s consider one more special case, that in which both the time periods are points in time. Those two points in time may be [equal], or one may be [before] the other, or they may [meet]. There are no other Allen relationships possible for them. As we will see later, four of these Allen relationship categories are especially important. They will be discussed in later chapters, but we choose to mention them here. (i) The [ intersects] relationship is important because for a tem- poral insert transaction to be valid, its effective time period cannot intersect that of any episode for the same object which is already in the target table. By the same token, for Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 67 a temporal update or delete transaction to be valid, the tar- get table must already contain at least one episode for the same object whose effective time period does [ intersect] the time period designated by the transaction. (ii) The [ fills] relationship is important because violations of the temporal analog of referential integrity always involve the failure of a child time period to [ fill] a parent time period. We will be frequently discussing this relationship from the parent side, and we would like to avoid having to say things like “. . . . . failure of a parent time period to be filled by a child time period”. So we will use the term “includes” as a synonym for “is filled by”, i.e. as a synonym for [ fills À1 ]. Now we can say “. . . . . failure of a parent time period to include a child time period”. (iii) The [before] relationship is important because it distinguishes episodes from one another. Every episode of an object is non-contiguous with every other episode of the same object, and so for each pair of them, one of them must be [before] the other. (iv) The [meets] relationship is important because it groups versions for the same object into episodes. A series of vers- ions for the same object that are all contiguous, i.e. that all [meet], fall within the same episode of that object. Advanced Indexing Strategies Indexes are one way to improve performance. And it should be clear that it would be a serious performance handicap if we could not define indexes over either or both of the two time per- iods of a bi-temporal table. But this proves to be more complex than it might at first sight appear to be. The issue is that traditional indexes contain pointers to rows, pointers which are based on discrete values, while the two time periods of rows in bi-temporal tables are not discrete values, but rather an unbroken and non-overlapping sequence of such values. Such rows occupy points in effective (valid) time or in assertion (transaction) time only as a limit case. What they really occupy are intervals along those two timelines. That’s the reason we need two dates to describe each of them. Traditional balanced-tree indexes work well with discrete values, including such discrete values as dates. But they don’t work well with intervals of time, i.e. with time periods. But indexing methods which manage intervals are being developed. Specifically, some bi-temporal indexing methods manage the two intervals for a bi-temporal object as a single 68 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH unit, which would appear as a rectangle on a Cartesian graph in which one temporal dimension is represented by the X-axis and the other by the Y-axis. Another approach is to manage each of the two temporal dimensions separately. One reason for taking this approach is that, for the standard temporal model, the two temporal dimensions behave differently. Specifically, for the standard model, transac- tion time always moves forwards, whereas valid time can move forwards or backwards. This means that a bi-temporal row can be inserted into a table proactively in valid time, but can never be inserted into a table proactively in transaction time. Asserted Versioning, as we have already pointed out, supports both forwards and backwards movement in both temporal dimensions. So for Asserted Versioning, there is no difference in behavior which would justify separating the two temporal dimensions for indexing purposes. Specifically, Asserted Versioning supports both proactive (future-dated) versions and proactive assertions (i.e. deferred assertions) and also both retro- active versions and an approval transaction which can move deferred assertions backwards in time, but not prior to Now(). In Chapter 15, we will describe certain indexing strategies that will improve performance using today’s DBMS index designs. Temporal Extensions to SQL Following [2000, Snodgrass], we will refer to a future release of the SQL language that will contain temporal extensions as SQL3. A more detailed discussion may be found in that book, although we should note that the book is, at the time of publica- tion of this book, 10 years old. Temporal Upward Compatibility One issue related to changes in the SQL standard is temporal upward compatibility. In describing SQL3, Snodgrass states that “(t)emporal upward compatibility at its core says this: ‘Take an application that doesn’t involve time, that concerns only the cur- rent reality . . . . . Alter one or more of the tables so that they now have temporal support . . . . . The application should run as before, without changing a single line of code’” [2000, Snodgrass, p. 449]. This cannot be an objective for Asserted Versioning, because we are limited to current SQL, not to a future release of SQL that builds temporal extensions into the language itself. But we can come close. We can achieve this objective for queries by using a view which filters out all but current data, and by redirecting Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 69 existing queries to that view. We can achieve this objective for temporal inserts, updates and deletes by defining default effec- tive and assertion dates for these transactions. These will be default dates that cause these transactions, as written by their authors, and as parsed and submitted to the DBMS by the AVF, to physically insert and update current assertions of the current versions of the objects they reference. The PERIOD Datatype A second issue related to changes in the SQL standard is the need for a PERIOD datatype. This new datatype will not change the semantics of temporal data management, but it will simplify the expression of those semantics. For one thing, a single col- umn will replace a pair of dates. This will simplify the way that Allen relationships are specified in SQL statements. For example, it will change the expression with which we ask whether a point in time is or is not contained in a period of time. Where P and T are, respectively, a point in time and a period of time, and T 1 and T 2 are dates delimiting T using the closed-open representation, we currently must ask whether P is a clock tick within T like this: WHERE T 1 <¼ P AND P < T 2 With the PERIOD datatype, and the new Allen relationship operators that will accompany it (including such derivative operators as those used in our taxonomy), we will be able to ask the same question like this: WHERE T OCCUPIES P A PERIOD datatype will also make it easier to enforce con- straints on time periods, such as insuring that two time periods do not intersect. When representing time periods by means of begin and end dates, this is impossible to do with only an index. Here’s why. Consider the time period represented by the closed-open pair [4/23/2012 – 8/04/2014]. Suppose that we want to define an exclusive index on time periods. The problem is that there is no way, by means of any standards-compliant indexing method available with today’s technology, to exclude [3/12/2011 – 4/24/ 2012], or [10/15/2012 – 9/30/2013], or [6/1/2014 – 12/31/2014], or any other time period that in fact [ intersects] the time period designated by [4/23/2012 – 8/04/2014]. The index sees two columns of values, and knows that the combination of values must be unique in each instance. That’s all it sees, that’s all it knows, and that’s all it can enforce. 70 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH But if we had a PERIOD datatype, and SQL extensions and indexing methods that could recognize and manage that datatype, then all the Allen relationships among time periods could be easily expressed, and the very important [ excludes] relationship could be enforced by means of a unique index. Lacking that future technology, and the standards needed to insure interoperability across different vendor implementations, the AVF contains its own code that effectively turns pairs of dates into a user-defined PERIOD datatype. Temporal Primary Keys A third issue related to changes in the SQL standard is sup- port for temporal primary keys. With those temporal extensions, we will be able to declare a temporal primary key to the DBMS and, by the same token, declare temporal foreign keys as well. But what is it we will be declaring? Temporal tags added to phys- ically unique identifiers of rows of otherwise non-temporal tables? Or something more? If a SQL standard for bi-temporality, when we eventually have one, is a standard for adding two temporal tags to rows in other- wise non-temporal tables, and providing a PERIOD data type and Allen relationship operators to manage the data thus tagged, then most of the semantics of bi-temporality will have been left out of that standard, and left up to developers. The managed objects of temporal data management are not physical rows in physical tables. They are collections of one or more rows which represent temporally delimited claims about temporally delimited statements about what real-world persistent objects were like, are like, or will be like. As long as every database table contains one and only one row for each instance of the type indicated by the table, it is easy to forget about the semantics and concentrate on the mechan- ics. Primary key uniqueness is mechanics; its role in eliminating row-level synonyms—and its failure to address the problem of row-level homonyms—is the semantics that are easily and almost always overlooked. Foreign key referential integrity is mechanics; its role in expressing existence dependencies among the objects represented by those rows is the semantics that are easily and almost always overlooked. It has been this one-to-one correlation between rows and the objects they represent that has made it easy to give short shrift to semantics, and to then get right down to what really fascinates engineering types—the mechanics of making things work. But once we attempt to manage both the epistemological Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 71 [...]... Examples of work in temporal logic include first-order temporal logic (FOTL) Just as FOPL uses the quantifiers for all X and for some X, FOTL adds the quantifiers at some time in the past, at every time in the past, at some time in the future and at every time in the future, and also adds the binary predicates SINCE and UNTIL Glossary References Glossary entries whose definitions form strong interdependencies... in particular, that none of the nodes in our Allen relationships taxonomy are included in this list In general, we leave taxonomy nodes out of these lists, but recommend that the reader look them up in the Glossary Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH clock tick granularity PERIOD datatype point in time time period episode managed object object pipeline dataset inflow... represents what we said, during a specific period of time, that object is like And so, in an assertion table, there also can be any number of rows representing the same object, each describing what we said, at a different period of time, the object is like Managing Time in Relational Databases Doi: 10.1016/B978-0-12-375041-9.00004-2 Copyright # 2010 Elsevier Inc All rights of reproduction in any form reserved... THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES The upper components of the diagram read as follows: (i) The box in the upper left-hand corner of the diagram indicates what time it is now, in the example (ii) The row of vertical bars represents a timeline Each vertical bar on that timeline represents one month The month which is current, in the context of each example, is marked by filling in its... corresponding tables containing current data, which is often the case, then the job of query authors becomes even more difficult and error-prone They won’t be able to simply copy production queries and change names They will have to write new queries, perhaps joining data that the production queries did not have to join, perhaps assembling intermediate results and then combining those intermediate results in. .. is the SQL statement that inserted or most recently altered the data in that table In each table, primary key columns are indicated by underlining their column headings In each table, the italicized column is a foreign key to a table which will make its first appearance in Chapter 11 Type and copay are the two business data columns in both tables Create date and update date in the non-temporal table... Delete Versioning 83 Temporal Gap Versioning 86 Effective Time Versioning 87 Effective Time Versioning and Retroactive Updates 89 Effective Time Versioning and Retroactive Inserts and Deletions The Scope and Limits of Best Practice Versioning 92 Glossary References 93 90 Lots of things are important to us That’s why we keep data about them in our databases In a non-temporal table, each one of them,... is represented by one and only one row In a version table, however, each row represents a period of time in the life of an object, and is a description of what that object is like during that time And so, in a version table, there can be any number of rows representing the same object, each describing what the object is like during a different period of time In an assertion table, on the other hand,... review of them possible So in this chapter, we will distinguish four main types of versioning, and will discuss one representative method of each type Those four types are: (i) Basic versioning; (ii) Logical delete versioning; (iii) Temporal gap versioning; and (iv) Effective time versioning In this chapter, we will use several variations of a version table, and a corresponding non-temporal table, to... tables used in illustrations throughout the book In the version table, there is no update date because rows are not physically updated Instead, each logical update is carried out by copying the most current version, applying the update to the copy and inserting the result as the new most current version As for a physical create date, that is the same thing as the version date in basic versioning Chapter . along a common time- line. This includes the special case of one or both time periods being a point in time, i.e. being exactly one clock tick in length. There. more clock ticks. This point in time may either [ intersect] or [exclude] the time period. If the point in time [ intersects] the time period, it also [fills]

Ngày đăng: 24/12/2013, 02:16

Từ khóa liên quan

Tài liệu cùng người dùng

Tài liệu liên quan