Building management systems

A CM Domain White Paper
By Bob Boiko

This white paper is produced from the Content Management Domain, which features the full text of the book "Content Management Bible," by Bob Boiko. Owners of the book may access the CM Domain at www.metatorial.com.

CM Domain White Paper. Bob Boiko. Copyright 2002 Metatorial Services Inc. & HungryMinds Inc. Do not reproduce without permission.

Table of Contents

What's in a Management System?
Building a Repository
Essential and recommended repository functions
The content model
Storing Content
Relational database repositories
Relational database basics
Storing component classes and instances
Fully parsing structured text
Partially parsing structured text
Not parsing structured text
Breaking the spell of rows and columns
Storing access structures
Hierarchies in a relational database
Indexes in relational databases
Cross-references in relational databases
Sequences in relational databases
Storing the content model
XML-based repositories
Object databases vs. XML
Storing component classes and instances
Storing access structures
Hierarchies in XML
Indexes in XML
Cross-references in XML
Sequences in XML
Storing the content model
File-based repositories
Implementing Localization Strategies
Doing Management Physical Design
A repository-wide DTD
Link checking
Media checking
Search and replace
Management integrations
Summary

The management system within a content
management system holds and organizes all the content that you've collected. In addition to storing content, the management system can provide a full cataloging and administration system for your content and related data. In this white paper, I discuss the variety of databases and functions that you may encounter or need to create to store and administer your content.

What's in a Management System?

Many CMS companies describe their entire product as a management system. I take a different tack. For me, although it's of course true that a content management system is a management system, it's more instructive to focus the term management on the specific parts of the CMS that deal with the content that's in the system and differentiate them from the other parts of the CMS that enable you to get content in (collection) and get it out (publication).

The management system within a CMS has these parts:

• A repository: All the content and control files for the system are stored here. The repository houses databases and files that hold content. The repository can also store configuration and administrative files and databases that specify how the CMS runs and produces publications.
• A repository interface: This enables the collection, publishing, workflow, and administrative systems to access and work with the repository. The interface provides functions for input, access, and output of components as well as other files and data that you store in the repository.
• Connections to other systems: These enable you to send and receive information from the repository.
• A workflow module: This module embeds each component and publication in a managed life cycle.
• An administrative module: This module enables you to configure the CMS.

In this white paper, I focus most on the repository itself to give you a central place from which to understand management.

Building a Repository

The repository is the heart of the management system and of the CMS as a whole. Into the repository flow all the raw materials on which any publication is built. Within the repository, components are stored and can be continually fortified to increase the quality of their metadata or content. Out of the repository flow the components and other parts that a page of a publication needs (as shown in Figure 1).

Figure 1: A high-level view of a CMS repository that shows its different parts and the content storage options that you have

As a first approximation, you can think of the repository as a database. As does a database, a repository enables you to store and retrieve information. The repository, however, is much more. For one thing, the repository can house many databases. It can house files as well. It has an interface to other systems that goes beyond what a standalone database usually does. If you stand back from the repository and look at it as a single unit, however, most of what you may know about databases helps you understand the functions of the repository. In fact, most repositories have a database at their core. The database, however, is wrapped in so much custom code and user interface that end users aren't likely to ever see the database.

Note: My discussion implies that all the authoring, conversion, and aggregation are done on a component before it enters the repository. This is for clarity of presentation only. In fact, the only thing that must be done to a component before it enters the repository is that it be segmented. Until it's segmented, there's no component. You can, and often should, add a component to the repository
before it's fully authored, converted, edited, and has had metadata added to it. After it's in the repository, these processes can be brought under the control of your workflow module.

Essential and recommended repository functions

At the most basic level, a repository must provide the same functions as any database, as follows:

• It must hold your content. Whether you employ a vast distributed network of databases or a simple file structure on a computer under someone's desk, the central function of a management system is to contain your content in one "place." In addition, the system must have some way of segmenting content into individually locatable units (such as files or database records).
• It must enable you to input content. Whether you have tools for loading multiple components at a time (bulk processing), automatic inputs via syndication, or one-by-one entries via Web-based forms, the management system must give you some way to get content in.
• It must enable you to locate content. Whether it employs sophisticated natural language searches or a simple index, you must be able to find content in the system.
• It must enable you to output content. Whether it supports advanced transformations or only the simplest tab-delimited format, the management system must enable you to retrieve a copy of content that you've found in a format you can use.
• It must enable you to remove content. Whether it archives automatically or you must delete old content by hand, a management system without the capability to remove content is inadequate.

Although a repository that performs the preceding minimum functions would be sufficient to build a CMS on, it would be far from ideal. In fact, most repositories go far beyond these basic functions to enable you to do the following:
• Support the concept of components: Although all management systems must somehow segment information, a good system facilitates inputting, naming, cataloging, locating, and extracting content based on its type (or, in my language, its component class).
• Track your content: The management system ought to provide statistics and reports on your components that enable you to assess the status of individual components or groups of components.
• Support the notion of workflow: Although not part of the repository, the workflow module must be tightly integrated with it. As one example among many, events that occur within the repository, such as adding new components or deleting them, should be capable of triggering workflow processes.
• Support element and full-text search: You're likely to know one of two things about components that you want to find in the repository: the value of some piece of metadata that they contain or some piece of text that you remember that they contain. In the first case, you want what's called an element search. (In relational databases, this is usually called a fielded search.) To do an element search, what you want most is a list of the elements and a place where you can type or select the value that you want. To find components by author, for example, you want to see an Author box into which you can type a name. For a bonus, the system can help you type only valid possibilities. The Author box, for example, can be a list from which you simply choose an author rather than typing her name. In the second case, where you remember some piece of text that the component contains, a full-text search is what you want. Here, what you want is to type a word or phrase in a box and have the system find components that contain that word or phrase in any element. For spice, the repository can enable you to combine full-text and fielded search or to type Boolean operators such as AND, OR, and NOT to make more precise searches of either type.
• Support bulk processes: Managing components one at a time is far too slow for many situations. A good repository enables you to specify an operation and then do it over and over to all the components it applies to. Suppose, for example, that your lead metator is out of town and you want to extend the expiration date on any components that "turn off" while she's out. You could do an element search for all components with an expiration date between today and the day that she returns. Then you could open each of these components and change its Expire Date element to sometime next week.
• Support all field types: Any repository enables you to type metadata as text, but the one that you want can do much more. The best kind of repository supports all the types of fields that I describe in white paper 20, "Working with Metadata," in the section "Metadata fields." In any repository, for example, you can type the name of an author into each component's Author element. Spelling errors and variations on the same name (Christopher Scott vs. C. Scott), however, eventually cause problems. It would be better if you had one place where you could type all author names once. Then, whenever an author needs to be specified, you can choose the name rather than type it. The best would be a system that can be linked to the main sources of metadata in your organization. People log into an organization's network, for example, based on a user ID and password. This information - as well as the organizational groups to which they belong - is stored in a registry. Wouldn't you most like to work with a system that could connect to this registry and find all the people that are in the Authors group?
Then, to have access to all authors' names (not to mention any other information that the registry stores), you just need to make sure that the Authors group is correctly maintained by your organization's system administrators. Similarly, if your repository holds master copies of metadata lists, you want it to be openly accessible to your organization's other systems.
• Support organization standards: Your repository should access and work within whatever user security and other network standards you employ. If you aren't running a TCP/IP network protocol, for example, the CMS's Web-based forms and administrative tools can't work on your local area network.

The content model

Database developers create data models (or database schemas). These models establish how each table in the database is constructed and how it relates to the other tables in the database. XML developers create DTDs (or XML Schemas). DTDs establish how each element in the XML file is constructed and how it relates to the other elements in the file. CMS developers create content models that serve the same function - they establish how each component is constructed and how it relates to the other components in the system. In particular, the content model specifies the following:

• The name of each component class
• The allowed elements of each component class
• The element type and allowed values for each element
• The access structures in which each component class and instance participate

The content model puts bones and sinew on the content domain. Although the content domain is a simple statement, the content model is a fully detailed framework. On the other hand, all the components that you detail in the model ought to be specifically in support of the domain. If you can't determine quickly how a particular component serves the domain, you should reconsider the necessity of the component or the validity of the domain statement.

If your CMS is built on a relational database, your content model gives rise to a database schema. If your CMS is built on XML files or an XML database, your content model gives rise to a DTD. The content model, however, isn't simply reducible to either of these models. Suppose, for example, that you establish that you want an Author element that's an open list. This fact can't be coded in either a database schema or a DTD. Rather, it must be established in the authoring environment that you use. Still, the majority of the content model can be coded either explicitly or implicitly in the database or XML schema that you develop. The rest of the content model becomes part of the access structures in your repository and the rules that you institute in your collection system.

Storing Content

Most content management systems store components in databases. Some store metadata in databases and keep the component content in files. Although almost all content management systems use some sort of database, the exact database they employ and how the components are stored in the database vary widely. The two major classes of databases that a CMS may use to store content components are shown in Figure 2.

Figure 2: A CMS may store components in a relational database or an XML database

To date, content management systems have stored content in the
following general ways:

• In relational databases, which are the computer industry's standard place to store large amounts of information
• In an object (or XML) database, which stores information as XML

Sometimes the component body elements are stored in files. In these cases, management elements are generally stored apart from the body elements in a relational or XML database.

As I write, many CMS companies are experimenting with new technologies that seek to make the best of both the database world and the world of files. In addition, database product companies themselves are breaking the established boundaries by creating hybrid object-relational databases that overlay XML Schema onto the basic relational database infrastructure.

Regardless of the type of storage system that you use, it must be capable of storing components, relationships between components, and the content model, as follows:

• Storing component instances: The primary function of a CMS repository is to store the content components that you intend to manage. Suppose, for example, that you want to manage a type of information called an HR benefit that includes a name and some text. If your system has 50 HR benefits, there must be 50 separately stored entities, each following the HRBenefits class structure, which can be retrieved one at a time or in groups.
• Storing component classes: To store component instances, the repository needs some way of representing component classes. Somewhere in your storage system, for example, there must be a template for an HRBenefit component. After you create a new HRBenefit component, the system uses this template to decide what the new HRBenefit includes.
• Storing relationships between components: The repository must have some way of representing and storing the access structures that you create. Any indexes that you decide that you need, for example, must be capable of being represented somewhere in the repository and must be capable of linking to the components that are indexed.
• Storing the content model: Your repository system must somehow account for all the rules in your content model. Most are covered by storing the components and their relationships, but some aren't. If certain component elements are required (meaning that, in every component instance in which that element is present, that element must not be blank), for example, that fact must be somehow stored so that it can be upheld. Similarly, if the content of a particular element must be a date, or can't be longer than 100 characters, these facts must also be stored somewhere so that you can enforce these rules.

Relational database repositories

The relational database was invented as a way to store large amounts of related information efficiently. At this task, it has excelled. The vast majority of computer systems that work with more than a small amount of information have relational databases behind them. Today, there are a handful of database product companies (Oracle, Microsoft, IBM, and the like) that supply database systems to most of the programmers around the world. Programmers use these commercial database systems to quicken their own time-to-market and increase their capability to integrate with the databases currently in use by their customers.

The majority of CMS product companies also base their repositories on these commercial database products. In fact, many require that you buy your database directly from the manufacturer. (This fact, by the way, puts a convenient-for-them and inconvenient-for-you firewall between the CMS
product support staff and that of the database company.) Buying a database (or, more accurately, a license) from a commercial company is no big problem; database vendors are happy to sell to you directly. What's much more of an issue is whether the CMS requires that you administer the database separately. You may give preference to CMS products that have integrated database administration into their own user interface and don't require you to administer the databases separately.

Relational database basics

To help readers with less background in data storage, I provide some database basics before going into the more technical aspects of representing content in a relational database. Whatever you store in a relational database must fit into the database's predefined structures, as follows:

• Databases have tables: Tables contain all the database's content. Loosely, one table represents one type of significant entity. You may create a table, for example, to hold your HRBenefit components. The structure of that table represents the structure of the component it stores. Tables can be related to each other. (This is where the relations in relational databases are.) Rather than typing in the name of each author, for example, your HRBenefits table may be linked (via a unique ID) to a separate author table that has the name and e-mail address of each author.
• Tables have rows (also called records): Loosely, each row represents one instance of its table's entity. Each HRBenefit component, for example, can occupy one row of the HRBenefits table.
• Rows have columns (also called fields): Strictly, each field contains a particular piece of uniquely named information that can be individually accessed. An HRBenefit component, for example, may have an element called Benefit Name. In a relational database, that element may be stored in a field called Benefit Name. Using the database's access functions, you can extract individual Benefit Name elements from the component (or row) that contains them.
• Columns have data types: As you create the column, you assign it one of a limited number of types. The Benefit Name column, for example, would likely be of the type "text" (generally with a maximum length of 255 characters). Other relevant column data types include integer, date, binary large object, or BLOB (for large chunks of binary data such as images), and large text or memo (for text that's longer than 255 characters).

As you see, even given these exacting constraints, there are many ways to represent content in a relational database. I don't present the following examples to give you a guide to building a CMS database. (You'd need much more than I provide.)
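As a rough illustration of the vocabulary above (tables, rows, columns, data types, and a link between tables via a unique ID), here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (HRBenefits, Authors, BenefitName, and so on), the sample rows, and the e-mail address are assumptions made for the example; a real CMS product defines its own schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one component table and one linked table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Authors (
            AuthorID INTEGER PRIMARY KEY,   -- unique ID, assigned by the database
            Name     TEXT NOT NULL,
            Email    TEXT
        );
        CREATE TABLE HRBenefits (
            BenefitID   INTEGER PRIMARY KEY,    -- unique ID, assigned by the database
            BenefitName VARCHAR(255) NOT NULL,  -- short text (SQLite does not enforce the length)
            BenefitText TEXT,                   -- long text ("memo") column
            AuthorID    INTEGER REFERENCES Authors(AuthorID)  -- link to the author table
        );
        """
    )
    # One row in Authors; the database fills in the ID.
    cur = conn.execute(
        "INSERT INTO Authors (Name, Email) VALUES (?, ?)",
        ("Christopher Scott", "cscott@example.com"),  # hypothetical sample data
    )
    author_id = cur.lastrowid
    # Each component instance occupies one row of the component table.
    conn.executemany(
        "INSERT INTO HRBenefits (BenefitName, BenefitText, AuthorID) VALUES (?, ?, ?)",
        [
            ("Dental Plan", "Covers routine checkups.", author_id),
            ("Tuition Assistance", "Reimburses approved courses.", author_id),
        ],
    )
    conn.commit()
    return conn


def benefit_names(conn):
    """Extract a single element (column) from every component (row)."""
    rows = conn.execute(
        "SELECT BenefitName FROM HRBenefits ORDER BY BenefitID"
    ).fetchall()
    return [row[0] for row in rows]


def benefits_with_authors(conn):
    """Follow the unique-ID link to show the author's name instead of the stored ID."""
    return conn.execute(
        """
        SELECT b.BenefitName, a.Name
        FROM HRBenefits AS b
        JOIN Authors AS a ON a.AuthorID = b.AuthorID
        ORDER BY b.BenefitID
        """
    ).fetchall()
```

Calling build_demo_db() and then benefits_with_authors() on the returned connection lists each benefit beside its author's name, even though the HRBenefits rows store only the author's ID.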
Moreover, if you purchase a CMS product, you work with a database that the product company has already designed. What I intend is to give you insight into how the needs of a CMS mesh with the constraints of a relational database so that you can understand and evaluate the databases that you encounter.

Storing component classes and instances

The simplest way to represent components in a relational database is one component class per table, one component instance per row, and one element per column. An example of an HRBenefits component class in Microsoft Access is shown in Figure 3.

Figure 3: A simple table representing the HRBenefits component class

Note: Even if you know nothing about databases, you can likely see that this is very well structured. Everything is named and organized into a nice table. It's not hard to imagine how database programs could help you manage, validate, and access content stored in tables. In fact, database programs are quite mature and can handle tremendous amounts of data in tables. They offer advanced access and connect easily to other programs. It's no wonder that relational databases are the dominant players in component storage.

The component class is called HRBenefits. There are three HRBenefit component instances, one in each row of the table. As shown in the figure, HRBenefit components have six elements, one per column. Interestingly, you'd likely only ever type two of the elements - Name and Text. The ID element can be filled in automatically by the database, which has a unique ID feature.

Even this most simple representation of a component in a relational database isn't so simple. There are really four tables involved in storing component information. The Type, Author, and EmployeeType columns contain references to other tables (lookup tables in database parlance). Behind the scenes, what's actually stored in the column isn't the words shown but rather the unique ID of a row in some other table that contains the words. From a more CMS-focused
vocabulary set, you can say that Type, Author, and EmployeeType are closed list elements. The lists are stored in other tables and can be made available at author time to ensure that you enter correct values in the fields for these elements. There may, for example, be three drop-down lists on the form that you use to create HRBenefit components. In the first is a list of Types, in the second a list of Authors, and in the third a list of Employee Types. The words in the lists are filled in from the values in three database tables.

I continue to complicate the example to show some of the other issues that come into play whenever you store components in a relational database. Suppose that there's an image that goes along with each benefit component (an image of a happy employee, perhaps). To represent the image, you have the following two choices:

• You can actually store the image in the database.
• You can store a reference to the file that contains the image.

The second technique is the usual choice because, historically, databases have been lousy at storing binary large objects (BLOBs). They became bloated and lost performance. This is often

[...]

This one is simple; match="/" means that this template should be applied to the highest element in the XML file. The following template will match any Concept component:

[...]

The attribute match="//CONCEPT" means that the template should be applied to content surrounded by the <CONCEPT> element, regardless of its location in the repository.

In XSLT, templates are little programs that run when the XSLT processor hits specified element and attribute values. In effect, the processor finds the element and attribute combinations you specify and applies (or more aptly, runs) the code in one of your templates.

There is not much more you need to know to begin to understand XSLT files. Try to see past the clumsy syntax and the
strangeness of the hierarchies that are applied to other hierarchies to produce still other hierarchies. Keep your eye on the structure of the source in the repository, the process that the XSLT file applies to that source, and the structure of the publication files that are produced.

The book

The printed book that is produced from the CM Domain is delivered as a set of Microsoft Word files, one per white paper of the book. The content for the book is drawn entirely from the primary hierarchy. To help you understand how the book publication is built, I describe the following templates:

• The section template, which produces whole white paper files. It builds the hierarchy of the white paper and calls in the component and navigation templates as needed.
• A body element template that lays out the body of the Concept components in the print publication.
• A navigation template that formats cross-references in the book publication.

Note: Although the convention in this white paper is to display all XML element names in uppercase, the convention in the CM Domain is to use upper- and lowercase. So, in the XML and XSL examples that follow, I will bend the book convention a bit to allow upper- and lowercase XML tag names.

Concept components disappear into the hierarchy in the printed book. Thus there is no specific Concept component template.

Before digging through the code in the sections that follow, I need to give an overview of a few of the features of the markup you will see. To bridge between the text files that XSLT can create and the binary world of Word, I devised an intermediate markup language. This seemed much easier to me than trying to write some sort of converter that created binary Word markup automatically. So the XSLT templates create the intermediate format, and then another processing program within Word translates the intermediate format into Word's binary markup. To help you unpack this template, keep the following special markup in mind:
• Paragraph marks: These are not very important in XML. In fact, an XML file may be riddled with paragraph marks that never seem to appear in XML viewers. Word, on the other hand, cares deeply about paragraph marks. Each one must be placed correctly, and (provided that you use Word styles correctly) there never needs to be more than one paragraph mark between paragraphs. So the template takes pains to add the @@p@@ mark wherever there will need to be a paragraph mark in the final Word files. All other paragraph marks are deleted by the print processor.
• Paragraph formatting: Many of the XML elements produce paragraphs in Word that need to have a particular Word paragraph style. In my system, the markup for this is StyleName## at the beginning of a paragraph. For example, the text Normal## at the beginning of a paragraph signifies that the paragraph should get the style Normal. Much of the work of this template is done by simply assigning styles to the text that will end up in Word.
• Character formatting: If a piece of text needs to have character formatting (or some other manipulation done on it), I surround it with percent signs (%%). I include a code for what kind of additional processing the text will need to receive in Word. For example, %%b:Make Me Bold%% will make the text bold; %%i:Make Me Italic%% will make the text italic; %%image:photo.jpg%% will find and load the image photo.jpg into Word.

Note: A prize to the first one who can tell me why my print processor produced incorrect results from the preceding bulleted list until I modified it!
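To make the special markup above concrete, the following Python sketch splits a piece of intermediate text into paragraphs, style names, and character-formatting runs. It is not the author's print processor (which runs inside Word). The function names, the default style, and the exact regular expressions are assumptions, and the @@p@@ and @@TAB@@ marks are taken from the template descriptions in this paper.

```python
import re

PARA_MARK = "@@p@@"      # explicit paragraph mark written by the templates
TAB_MARK = "@@TAB@@"     # explicit tab mark written by the templates
DEFAULT_STYLE = "Normal"  # assumption: style used when no StyleName## prefix is present

# StyleName## at the very beginning of a paragraph.
STYLE_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9 ]*?)\s*##")
# %%code:text%% anywhere inside a paragraph.
CHAR_RE = re.compile(r"%%([a-z]+):(.*?)%%")


def parse_paragraph(chunk):
    """Return (style, runs); each run is a (code, text) pair, code "" for plain text."""
    style = DEFAULT_STYLE
    match = STYLE_RE.match(chunk)
    if match:
        style = match.group(1)
        chunk = chunk[match.end():]
    chunk = chunk.replace(TAB_MARK, "\t")

    runs = []
    pos = 0
    for m in CHAR_RE.finditer(chunk):
        if m.start() > pos:
            runs.append(("", chunk[pos:m.start()]))   # plain text before the marked span
        runs.append((m.group(1), m.group(2)))          # e.g. ("b", "Make Me Bold")
        pos = m.end()
    if pos < len(chunk):
        runs.append(("", chunk[pos:]))                 # trailing plain text
    return style, runs


def parse_intermediate(text):
    """Drop ordinary line breaks, then split on the explicit paragraph marks."""
    flattened = text.replace("\r", "").replace("\n", "")
    return [
        parse_paragraph(chunk)
        for chunk in flattened.split(PARA_MARK)
        if chunk.strip()
    ]
```

For example, parse_intermediate("Normal##Plain and %%b:bold%% text@@p@@Bull##*@@TAB@@First point") yields one Normal paragraph with a bold run and one Bull paragraph whose text begins with an asterisk and a tab. A parser this simple treats any literal ## or %% in the content as markup, so text that discusses the markup itself needs some form of escaping.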
The section template

The section XSLT template for the print book publication looks as follows:

[XSLT listing not preserved; surviving output literals: ChHead##In This Chapter, @@p@@Bull##*@@TAB@@, @@p@@Heading ##, @@p@@Summary##Summary]

The first line of the file establishes it as an XSLT style sheet. It also states the location that should be used to uniquely identify the prefix xsl. W3C.org uniquely defines the tags that this XML file uses. Because they are all preceded by xsl:, they will not be mistaken for your own custom XML tags. In this system, all templates (that is, <xsl:template> tags) must be inside a stylesheet (that is, an <xsl:stylesheet> tag). The other marked sections of the stylesheet perform the following functions:

• Include child templates: These lines link this stylesheet file to others that it can call and use to process content. I will describe the BodyElements and Links stylesheets later in this white paper. The other files are included with this one when the template file is processed.
• The always runs template: The forward slash (/) value in the match attribute of the <xsl:template> tag in this block of code means "match on the very first element you come to in the XML file." If the XML file is well formed, it will always have a root element that contains all others, and this template will always run first. Thus it is a good place to put a sort of main routine for the template system. Before beginning the processing, the code creates a variable called "SectID" that holds the ID of the section that contains the white paper that you want to publish. Then, for each section in the repository that has that particular ID (and there will only ever be one because IDs are unique), the code begins to create a white paper.
• Start the Chapter: Each Chapter starts with a paragraph with a "ChHead" style (ChHead##) and the words "In This Chapter." Following that, the Chapter has a list of bullets and an introduction. The bulleted list is created by collecting all the SummaryBullet elements from the Chapter. Then, for each one of the elements, it creates a paragraph (@@p@@) with a style of "Bull" (Bull##). Each bullet has an asterisk (*) and a tab (@@TAB@@) preceding the text of the SummaryBullet. Finally, for each Intro element under the Chapter element (and according to the DTD I use, there will be only one Intro), it puts the introduction text into the output file.
• Process each section: For each section element that is a child of the current element, the template puts in a paragraph with a heading style and inserts the title of the section in that paragraph. The heading level (Heading 1, Heading 2, Heading 3, and so on) is calculated from the level the section has in the repository. To signify that the template processor should continue processing elements below the level of the current section, the template uses an <xsl:apply-templates> statement. This statement, which you will find throughout XSLT files, says, "Carry on and process elements at any lower levels." And the vast majority of the elements in the repository are at a level lower than a section. In fact, all of the Concept components are below sections. And even though Concept components are not marked in the white paper, they are still the source of the white paper's content.
Finish the Chapter: When the template processor finds a summary element, it puts a paragraph with a style of "Summary" in the output file. The paragraph has the word "Summary" on it. After the summary, the OutTransition template will match. The DTD specifies that an OutTransition tag must be the last tag in any Chapter tag. The OutTransition template includes the text of the Chapter closing transition in the output file.

A body element template

The body element template file (BodyElement.xsl) processes a large number of XML elements. Of those, I have chosen just a few to give you a feeling for how it works. Here is part of the listing for the body element template:

@@p@@Normal##
@@p@@Bull##*@@TAB@@
book
cchapter
%%b:%%
%%i:%%

The file contains the following marked blocks of text:

- Match normal paragraphs: XML

elements will trigger this template, which creates a paragraph with the style Normal.

- Bring in marked content chunks: Chunks of content within the repository can be marked for reuse. When the template processor hits an element that marks such a chunk, it will engage this template. The template, in turn, uses an XSLT statement to include the appropriate content chunk in the output file.

- Build bullet lists: When the template processor hits a

and ) into my home-grown markup style (%%i: and %%b:).

Tip: Actually, because of a nice HTML converter in Word, I could have chosen to leave in the character formatting XML tags. Word will recognize and convert them when you open the file in Word. I have used this feature to great advantage to get HTML tables into Word effortlessly.

A navigation template

To give you a feel for the work involved in publishing page navigation, I detail the way cross-references are formatted in the printed book publication. The following code provides a partial listing of the template that handles internal cross-references (ILink elements):

@@p@@Icon###Cross Reference
@@p@@XRef###
!!!!!BAD LINK TYPE HERE!!!!!!
Part , ""
Cchapter , ""

The file has the following code blocks marked with XML comment tags:

- Match internal cross-references: Two kinds of internal links are processed by this template. The template chooses between ILink elements that have a Type attribute of ForMore or eLink. The ForMore links are normal cross-references. The eLinks are cross-references that are activated only in electronic publications.

- Process ForMore links: If an ILink element has a Type attribute set to ForMore, this part of the template will engage. This code puts two paragraphs in the output file. The first paragraph has the words "Cross Reference" and has the style Icon. The second paragraph carries the XRef style and will receive the reference text when it is processed.

- 
Process eLinks: Because eLinks are not shown in a print publication, this template just passes the reference text in the eLink unformatted into the output file. If someone makes a mistake and creates an ILink tag that is neither of the ForMore nor of the eLink type, the template puts a noticeable message into the output file that, with luck, someone will read and act on.

- Process reference text: The LinkText element carries the text that the author intends to use to refer to the link. (On the Web, it might be a link.) The LinkText template uses an xsl:apply-templates tag to ensure that any tags within the element (such as character formatting) are processed.

- Process referent ID: In this code segment, the template figures out what text to put into the output file, given only the ID of the element that the link refers to. I leave it to you to parse all the way to the end of this code segment. Instead, I focus on the logic of what it accomplishes.

- Referent is a "Part" in the primary hierarchy: If it turns out that the ID referred to in the ILink is at the first level of the hierarchy, it will need to be referred to as a part in the printed book. The cross-reference will then be of the following form: Part N, "Part Title"

- Referent is a Chapter or lower level in the primary hierarchy: If the ID referred to in the link is below the first level of the hierarchy, the link will refer to the nearest chapter. Regardless of the level of the referent, the template formats the link as follows: Chapter N, "Chapter Title"

The result is a set of book-style cross-references delivered to the specification of the publisher. The Web representation of the same links uses HTML hyperlinks to link you to the exact section where the referent is.

Summary

A publication system, such as the one that builds these words I am typing into the white paper you are reading, has a lot to do. If it is a good system, it does the following:

- It produces a book that looks as if it was the only thing created from the content it contains.

- 
It also creates any number of other publications that also look as though they are the only publication.

- It creates all publications from the same content base.

- Each publication serves different audiences and purposes.

Each audience should be able to forget about the publications it does not see and take its publication as the one source of its information. Your content should be free enough to become any kind of publication you can dream of, but for most users, it will seem like the same stuff that they are used to looking at, and that is what you want.

The goal of all of this automation is to produce a system that can disappear behind the scenes, where it belongs. In front is the author-audience relationship. When a CMS does what it should, it ceases to be a point of focus and instead becomes a normal way to create content, a sophisticated control panel for managing content, and a set of publications that look as if they were not created by some dumb machine.

Table of Contents

Content Is Not Data
Content Is Information Put to Use
Content Is Information Plus Data
From Data to Content and Back
Summary

Computers were built to process data. Data consists of small snippets of information that have all the human meaning squeezed out of them. Today, people call on computers to process content. Like data, content is also information, but it retains its human meaning and context.

In this white paper I lay out one of the basic challenges of content management: Computers are designed to deal with data that's stripped of any context and independent meaning. Users want computers to deal with content, however, which is rich in context and meaning. How can you use the data technologies to manage
and deliver very nondatalike content?

This challenge isn't easy. If you err toward making your information too much like data, it looks mechanical and uninteresting to consumers. If you make your information too rich, varied, and context-laden, then you can't get a computer to automate its management. The compromise, as you see in this white paper, is to wrap your information in a data container (known as metadata). The computer manages the data, and the interesting, meaningful information goes along for the ride.

Content Is Not Data

Computers were first conceived as a way to perform computations that were too time-consuming or complex for humans. The model was (and to a large extent still is) as follows: If you can reduce a problem to a series of simple mechanical operations on numbers and logical entities (entities that are either true or false), it is amenable to solution by a computer.

Computer professionals were either programmers or data input clerks. Programmers reduced problems to a series of mechanical operations according to the following simple maxim: You input data; the computer processes it and then outputs it in a more useful form. Clerks took care of inputting the data. They sat in long rows and columns, typing long rows and columns of numbers as well as small phrases, such as first name/last name and street address. As time moved on, computer scientists invented databases (bases of data) to organize and hold vast quantities of these snippets.

As you may expect, some problems were better solved this way than others were. Thus, as computer technology developed, the use of computers moved naturally from science to manufacturing and finance, where numbers were still the main event.

Today, of course, computers are in everything. But the part of everything that computers are in is the reducible part. The reducible part is the part where a finite set of very specific rules operating on numbers and logical entities can yield a useful result. The idea of computers as
data-processing machines runs deep and continues to this day as the main thing that computers do.

On the other hand, everyone knows that most users want computers to do more than grind finely through mountains of snippets. Today, people want computers to sift through mountains of large, complete chunks (not snippets) of information and deliver the ones that they want most at that moment. In addition, people want computers to deliver information of the quality that they expect from more familiar sources of information, such as books, radio, TV, and film.

From manufacturing and finance, computers moved to the business desktop as the replacement for the typewriter and the paper-based spreadsheet. Then, three related developments occurred in the personal-computer industry. Together, the following breakthroughs set the stage for a major change in our expectations of what computers can, and are expected to, do:

- Digital media creation (images, sounds, and video) became possible.

- Digital media output (color displays, sound cards, and video accelerators) became available.

- 
Consumer-oriented mass removable storage (CD-ROMs, in other words) became available and cheap.

These developments led to the meteoric rise of the multimedia industry. For the first time, you could create and cheaply deliver actual information and not just data. Soon, multimedia CD-ROMs proliferated, with everything from encyclopedias to full-motion games. You can now consider your computer an actual replacement for familiar information channels such as books, TV, and radio. What these traditional channels deliver is content and not data.

What the multimedia industry began, the Web is in the process of completing. Today, getting your content online isn't only possible, but it's often preferable to obtaining it in any other way. I now listen to more music and talk on my laptop computer and Personal Digital Assistant (PDA) than I do on my radio or stereo. Still, I'm usually frustrated whenever I go to the Web because I expect to quickly locate the content that I want and see it presented at least as well as in the traditional channels. Unfortunately, that's not always what happens.

Note: If you've worked with digital content for a while, then you realize just how sticky a paradox this situation is. People expect access to prove easy and presentation to seem compelling. If they are, it's only because someone's put in an enormous amount of effort behind the scenes to make everything appear so easy and compelling. Making content natural is an unnaturally difficult endeavor.

Although users' needs and expectations changed, the guts of the computer didn't. Ten years ago, people came to computers to input, process, and output data. Today, most people come to find and consume content. At the base of all computer technology, however, is still the idea that you can reduce any problem to a set of simple instructions working on discrete and structured snippets.

Data and content are different, certainly, but that difference doesn't mean that they don't interact. In fact, innumerable transitions from one
to the other occur every day. Moreover, from the standpoint of the computer system, content doesn't exist; only data exists. Today, users have few tools for dealing with content as content. Instead, you must treat it as data so that the computer can store, retrieve, and display it.

Consider, for example, a typical Web interaction where you may go to a music site. You browse a page that features a music CD that you like; you add it to your shopping cart and then pay for it. What you experience is a series of composed Web pages with information about music as well as some buttons and other controls that you use to buy it. All in all, the experience feels like a content-rich interaction. What happens behind the scenes, however, is a set of data-oriented computer programs exchanging data with a database.

Some of the data behind the scenes is very contentlike. A database stores, for example, the feature article with the artist's picture. The artist's name is in one field, and the text of the article is in another. A third field contains the picture of the artist.

Some of the data is very datalike. It consists of numbers and other snippets that create an economic transaction between you and the record company. A database stores your credit card number, your order number, the quantity that you order, and the order price, for example, and uses them in calculations and other algorithms.

Some of the data is in between data and content. The song catalog contains song names, running time, price, and availability, which are all snippets of information that can look a whole lot like data as the site's database stores them, but they appear more like content as the site displays them (as shown in Figure 1).

Figure 1: As a site stores content, it can look a lot like any other data. In displaying that content, however, it can't look like data if you want to hold your
audience.

In the database, it's all just data. On the page, the transaction data still looks a little like data, while the feature-article data looks nothing like data, and the catalog data can retain or lose as much of its data appearance as you want. On a well-designed page, however, visitors perceive it all as content.

So, from the user's perspective, information is all content, while from the computer programmer's perspective, it's all data. The trick to content management, in an age when the technology is still data-driven, is to use the data technologies to store and display content.

Content Is Information Put to Use

People in the computer world more or less agree on the definition of data. Data are the small snippets of information that people collect, join together in data records, and store in databases. The word information, on the other hand, contains all meanings, and no meaning, at the same time. You can rightly call anything, including data, information.

The word information holds a lot of meanings. For the purpose of this white paper, I use a pretty mundane definition. I take information to mean all the common forms of recorded communication, including the following:

- Text, such as articles, books, and news

- Sound, such as music, conversations, and reading

- Images, such as photographs and illustrations

- Motion, such as video and animations

- 
Computer files, such as spreadsheets, slide shows, and other proprietary files that you may want to find and use

Before you ever see a piece of information, someone else has done a lot of work. That someone else has formed a mental image of a concept to communicate, and used creativity and intellect to craft words, sounds, or images to suit the concept (thus crystallizing the concept). The person has then recorded the information in some presentable way. The author of the information pours a lot of personality and context into the information before anyone else ever sees it.

So, unlike data, information doesn't naturally come in distinct little buckets, all displaying the same structure and behaving the same way. Information tends to flow continuously, as a conversation does, with no standard start, end, or attributes. You disrupt this continuity at your own peril. If you break up information, then you run the risk of changing or losing the original intelligence and creativity that the author meant the information to express. If you break up information, then you run the risk, too, of losing track of, or disregarding, the assumptions the author made about the audience and the purpose of the information.

The now defunct ContentWatch organization (http://www.contentwatch.com/what.html) gave the following definition of content:

"What is 'content'? Raw information becomes content when it is given a usable form intended for one or more purposes. Increasingly, the value of content is based upon the combination of its primary usable form, along with its application, accessibility, usage, usefulness, brand recognition, and uniqueness."
Information that passes casually around in the world isn't content. It becomes content after someone grabs it and tries to make some use of it. You grab and make use of information by adding a layer of data around it.

The step of adding data may seem like a small step, from information to its use, but it's not. By refocusing from the nature of information to its practical application, you open up a world of possibilities for applying the data perspective to information. The crux is that, although you can't treat information itself as data, you can treat information use that way. As you begin to use information, you wrap it in a set of simplifying assumptions (metadata) that enable you to use computers to manage that use. Humans are mandatory in creating the information and figuring out the simplifying assumptions that wrap the information, but after that, the computer is fine by itself in doling information out in a way that usually makes sense.

By wrapping information in data, a small action by a person can trigger a lot of work by the computer. Suppose, for example, that to simplify information management, you decide that you need to consider a piece of information as 1) new; 2) ready to publish; or 3) ready for deletion. By itself, no computer can decide which of these statuses to apply to a piece of information. By wrapping your information with a piece of metadata known as status, however, and by having a person set the status metadata, you can use a computer to perform a lot of work based on the status. The computer doesn't need to know anything about the information itself; it just needs to know what status a human is applying to the information, as the following list outlines:

- If you give a piece of content the status "new," the computer sends a standard e-mail message to a designated reviewer.

- If you give a piece of content the status "ready-to-publish," the computer outputs it to a designated Web page.

- 
If you give a piece of content the status "ready-for-deletion," the computer removes it from the Web page and deletes it from the system.

In this way, the computer can accomplish a lot of work as the result of a small amount of work by people.

The Web version of the Merriam-Webster dictionary (www.m-w.com) defines content in part as follows:

"1 a: something contained - usually used in plural [the jar's contents] [the drawer's contents] b: the topics or matter treated in a written work [table of contents]"

This definition provides a nice angle on content: something that something else contains. By switching from information to content, you're switching from a consideration of a thing to its container. You're shifting the focus from the information itself to the metadata that surrounds it. The container for information is a set of categories and metadata that contain the information. This additional data corrals and confines that information and packages it for use, reuse, repurposing, and redistribution.

If content is information that you put to use, the first question to ask is, "What use?" What is the purpose behind marshalling all this information? For such a simple question, an astounding number of content management projects go forward without an answer. Or, to state the situation more precisely, all projects have some purpose, but the purpose may have little to do with the content that the project involves. A question such as "Why are we creating this Web site?" all too often receives one of the following answers:

- Because we need to

- Because our competitors have one

- Because our CEO thinks that we need one

- 
Because everyone's clamoring for one

These answers initiate a project, but they don't define it. To define the project, you need to answer the following question: "What is the purpose of the content we're about to put together?" If you provide a solid and well-stated answer, it then leads naturally to how you need to organize the content to meet your goal.

The key to a good purpose is that it's specific and measurable. Following are some examples of good purposes for an intranet:

- To provide a 24-hour turn-around in getting any new product data from the product groups to the field sales representatives

- To provide a site with all articles from the identified sources that mention the identified competitive products

- To show all the data on a pay-stub with a complete history for every employee

Notice that these goals are pretty specific, so quite a few of them may prove necessary to motivate a large intranet. As you organize your goals, you organize the content behind them. Your process is complete after all your individual goals fit together into a coherent whole and you can ultimately summarize them under a simple, single statement such as "Full support for the field" or "Zero unanswered employee questions."

Content Is Information Plus Data

There is something human and intuitive about information that makes treating it simply as data quite impossible. With data, what you see is what you get. With information, much of what it conveys isn't actually in the information but, rather, in the mind of the person creating or consuming it. Information lives within a wider world of connotation, context, and interpretation that make it fundamentally not amenable to the data-processing model. In fact, the concept of data was created specifically to remove these subjective qualities from information so that computers could manage it with strict, logical precision.

So, can computers ever really manage information?
I believe that they can (albeit poorly by human standards). Until a new computer processing model appears, you can put methodologies, processes, and procedures in place to "handle" information by using data-age tools. Rather than reducing the information to mere data, you can capture whole, meaningful chunks of information and wrap them in descriptive data that computers can then read and act on. Metadata makes the context and meaning of information explicit enough that a computer can handle it.

Adding data to information (metadata, that is) helps split the difference between keeping the information whole and enabling data techniques to effectively manage it. The data that you add to information is a way of making the context, connotation, and interpretation explicit. More important, metadata can make explicit the kind of mind that you expect to interpret the information. By adding a piece of metadata known as audience type to each chunk of information that you produce, for example, you can make explicit any implicit assumption about whom you're directing the information toward. Then, a computer can perform the simple task of finding information based on who's on the other side of the terminal. Of course, not all tasks are this simple, but the concept is always the same. You tag a large chunk of information with the data that the computer needs to access so that it can figure out what to do with that information.

Content, therefore, is information that you tag with data so that a computer can organize and systematize its collection, management, and publishing. Such a system, a content management system, is successful if it can apply the data methodologies without squashing the interest and meaning of the information along the way. Until computers (or some newer technology) can handle content directly, you must figure out how to use the data
technologies to collect and deliver content. Using data technologies to handle content is a central theme of this white paper.

From Data to Content and Back

From the perspective of this white paper, which attempts to reconcile the data and content perspectives, what is data and what is content depends mainly on how you create, manage, and bring each type of information onto a page, as the following list describes:

- Transaction information consists of datalike snippets that you typically use to track the processes of buying and selling goods. For transaction information and other very datalike sources, you don't generally use a content management system (CMS). Usually, they are managed as part of a traditional data-processing system. The templating system of a CMS, however, often does manage the displaying of such information on the publication page.

- Article information is the sort of information that is text-heavy and has some sort of editorial process behind it. You create and manage article information, and other very contentlike information, most often using a CMS. In fact, without this sort of information, the CMS has little reason to exist. The CMS is also generally responsible for displaying this sort of information.

- 
Catalog information, which is information that might appear in a directory or product listing, can go either way. You sometimes create and manage it by using a CMS, and sometimes a separate data-processing application handles it. Obviously, if an organization already has a full-featured catalog system, tying your CMS to it is the easiest way to go. On the other hand, if you need rich media and lots of text, along with the part numbers and prices in your catalog, then including the catalog as part of the overall CMS may make the most sense.

The purpose of content management isn't to turn all data into content. The purpose of content management is to oversee the creation and management of rich, editorially intensive information and to manage the integration of this information with existing data systems. The CMS must, in all cases, carry the responsibility for the creation of the final publication. If this publication includes both data and content, then the CMS is there to ensure that the right data and content appear in the right places and that the publication appears unified and coherent to the end consumer.

Summary

Data was invented because it was much easier to deal with than information. It's small, simple, and all its relationships are clearly known (or else ignored). Data makes writing computer programs possible. Information is large, complex, and rife with relationships that are important to its meaning but impossible for a computer to decipher. Here are some points to keep in mind as you hear and discuss the terms data, information, and content:

- 
Content (at least for the purposes of this white paper) is a compromise between the usefulness of data and the richness of information. Content is rich information that you wrap in simple data. The data that surrounds the information (metadata) is a simplified version of the context and meaning of the information.

- In a content management system, the computer manages information indirectly through data. The content compromise says, "I can't get the computer to understand and manage information, so I must simplify the problem. I must create a set of data that represents my best guess of the important aspects of this information. Then, I can use the computer's data capabilities to manage my information via the simplified data."

- Someday, someone may invent computers that can deal directly with information. I'm not holding my breath, however, because to do so requires cracking some of computer science's hardest problems, such as artificial intelligence and natural language processing. In the meantime, everyone must make do with the current dumb but very strong beasts. (If someone does finally invent these new machines, they may need to change their name from computers to something less mechanical.)
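
The status and audience type examples used in this white paper can be sketched in a few lines of code. What follows is a minimal Python sketch under stated assumptions, not part of any particular CMS: the Content class, the apply_status and for_audience functions, the plain lists standing in for the repository, the Web page, and the e-mail outbox, and the sample audience values are all hypothetical names chosen for illustration. The point it tries to show is that the code reads only the metadata and never interprets the information itself.

```python
from dataclasses import dataclass


@dataclass
class Content:
    """Information wrapped in metadata; the code below reads only the metadata."""
    body: str      # the information itself, opaque to the computer
    status: str    # set by a person: "new", "ready-to-publish", "ready-for-deletion"
    audience: str  # hypothetical audience type, e.g. "field-sales" or "employee"


def apply_status(item, repository, web_page, outbox):
    """Carry out the routine work implied by the status a person assigned."""
    if item.status == "new":
        # Stand-in for sending a standard e-mail to a designated reviewer.
        outbox.append(f"Please review: {item.body[:30]}")
    elif item.status == "ready-to-publish":
        if item not in web_page:
            web_page.append(item)
    elif item.status == "ready-for-deletion":
        if item in web_page:
            web_page.remove(item)
        if item in repository:
            repository.remove(item)
    else:
        raise ValueError(f"unknown status: {item.status!r}")


def for_audience(items, audience):
    """Select content by its audience metadata, without reading the body."""
    return [item for item in items if item.audience == audience]


article = Content("New pricing for the spring catalog", "new", "field-sales")
repository, web_page, outbox = [article], [], []

apply_status(article, repository, web_page, outbox)  # queues a review message
article.status = "ready-to-publish"                  # the small human action
apply_status(article, repository, web_page, outbox)  # the page now lists the article
```

In this sketch the only human decisions are the values placed in status and audience; everything else is mechanical. A real system would replace the lists with a database and a mail service, but the division of labor would stay the same.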

Date posted: 25/08/2018, 10:52
