XProc is a language

Chapter 1 Introduction

XProc is a language for expressing pipelines of XML operations. An XML pipeline specifies a sequence of operations to be performed on zero or more XML documents. A pipeline generally takes zero or more XML documents as input and produces zero or more XML documents as output. Pipelines are made up of simple steps, which perform atomic operations on XML documents, and of constructs similar to iteration, conditional statements and exception handlers, which control which steps are executed.

There are three kinds of steps in an XML pipeline: atomic steps, compound steps and multi-container steps. An atomic step carries out a single operation and has no substructure as far as the pipeline is concerned. Compound steps and multi-container steps control the execution of other steps, which they contain in the form of one or more subpipelines.

The main purpose of this paper is to analyze the XProc update facilities. Chapter 2 introduces background information about XML, XPath and XProc. Chapter 3 gives an overview of related work that supports the analysis, covering c/d query classification, itself query classification, sibling query classification and p/a query classification. Chapter 4 discusses the problems we encountered and possible approaches to an effective analysis of the XProc update facilities. Chapter 5 describes the query performance experiments used in the analysis. Chapter 6 concludes and outlines further work.

Chapter 2 Background Information

2.1 XML

The Extensible Markup Language (XML) describes a class of data objects called XML documents and partially describes the behavior of the computer programs that process them. XML documents contain structured information, and markup is used to identify the structures in a document. The XML specification defines a standard way of adding markup to documents, and it allows authors to define their own tags to describe their data. XML is an application profile, or restricted form, of the Standard Generalized Markup Language (SGML). An XML document is made up of storage units called entities, which hold either parsed or unparsed data. Parsed data consists of characters, some of which form character data and some of which form markup. The markup encodes a description of the document's storage layout and logical structure, and XML provides a mechanism for imposing constraints on both. XML documents are read by a software module called an XML processor, which provides access to the content and structure of the document. The XML processor is assumed to do its work on behalf of another module, called the application. The XML specification describes how an XML processor must read XML data and what information it must provide to the application. The design goals for XML included the following:

  1. XML should be straightforwardly usable over the Internet
  2. It should support a wide variety of applications
  3. It should be compatible with SGML
  4. It should be easy to write programs that process XML documents
  5. The number of optional features should be kept to a minimum
  6. XML documents should be human-legible and reasonably clear
  7. XML documents should be easy to create

Compared with other markup languages such as HTML, XML is more flexible. In HTML the set of tags is fixed, and errors are likely if the rules are not followed. For instance, an HTML page must contain the <html> start tag and the </html> end tag, spelled exactly and placed in the required order.

Example 1: HTML document that contains some data

<html>
<head>
<title>Account details</title>
</head>
<body>
<p><b>Account name</b> Steven Richards</p>
<p><b>Account number</b> 0125478521</p>
</body>
</html>

Example 2: the same data represented as an XML document

<?xml version="1.0" standalone="yes"?>
<Account>
  <AccountName>
    <Surname>Steven</Surname>
    <Firstname>Richards</Firstname>
  </AccountName>
  <AccountNumber>
    <Number>0125478521</Number>
    <BranchCode>021</BranchCode>
  </AccountNumber>
</Account>

A comparison of the two examples shows that HTML relies on a fixed vocabulary: every page uses predefined tags such as <body> and </body>, and each start tag must be paired with the corresponding end tag. XML has no fixed set of tags. The author of the XML document created their own tags, so XML is more flexible in that custom tags can be added to reflect the context of the data. XML still requires every start tag to be closed by an end tag with exactly the same name.
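The point about author-defined tags can be illustrated with a short Python sketch that uses the standard-library xml.etree.ElementTree parser. The element names (Account, AccountName and so on) and the helper functions are assumptions made for this illustration; they are one possible way of naming the account data, since XML names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical account document with author-defined tags (names are assumptions).
ACCOUNT_XML = """<?xml version="1.0" standalone="yes"?>
<Account>
  <AccountName>
    <Surname>Steven</Surname>
    <Firstname>Richards</Firstname>
  </AccountName>
  <AccountNumber>
    <Number>0125478521</Number>
    <BranchCode>021</BranchCode>
  </AccountNumber>
</Account>"""


def is_well_formed(xml_text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def read_account(xml_text):
    """Read the custom fields of the account document into a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "surname": root.findtext("AccountName/Surname"),
        "firstname": root.findtext("AccountName/Firstname"),
        "number": root.findtext("AccountNumber/Number"),
        "branch_code": root.findtext("AccountNumber/BranchCode"),
    }


print(read_account(ACCOUNT_XML))
# Tag names are case-sensitive, so a mismatched end tag is rejected.
print(is_well_formed("<Greeting>Hello world</greeting>"))
```

The parser knows nothing about the meaning of the tags; it only checks that they are properly nested, which is what lets each author choose a vocabulary that fits the data.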

XML text consists of character data and markup. Markup takes the form of start tags, end tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, processing instructions and XML declarations. All text that is not markup constitutes the character data of the document. In general, an XML document should begin with an XML declaration that specifies the version of XML being used. The following example (Example 3) shows a document whose first line is an XML declaration:

<?xml version="1.0"?>
<Greeting>Hello world</Greeting>

Elements and tags are also integral components of an XML document. Every XML document must have exactly one root element. An element may carry attributes, each of which is a name-value pair. Elements are related to one another through nesting, and the set of elements is extensible. Elements are delimited by start tags, end tags and empty-element tags. A start tag has the form <name>, an end tag has the form </name>, and an empty-element tag consists of <, the element name and any attributes, followed by />. Example 4 shows what an empty-element tag looks like.

<IMG align="left" src="http://www.w3.org/icons/WWW/w3c_home"/>

Comments are also part of an XML document. They may appear anywhere in the document outside other markup, but they are not part of the document's character data, and an XML processor may ignore them. A comment begins with <!-- and ends with -->.

An important characteristic of XML documents is that elements contain child elements, which in turn contain their own children, so that the document as a whole forms a tree. This is one of the most useful properties of XML, since it describes the relationships between the data elements in the document. These relationships are represented by the XML tree. Consider Example 5, which is part of an XML document:

<Inventory>
  <Beverage>
    <Lemonade>
      <Cost>$3</Cost>
      <Amount>30</Amount>
    </Lemonade>
    <Soda>
      <Cost>$5</Cost>
      <Amount>10</Amount>
    </Soda>
  </Beverage>
  <Snacks>
    <Potatochips>
      <Cost>$4.50</Cost>
      <Amount>60</Amount>
    </Potatochips>
  </Snacks>
</Inventory>

The XML tree of the above document is shown in Figure 1.

[Figure 1: XML tree of the inventory document in Example 5]

In the above XML tree, the Inventory element serves as the root element, and all other elements are its descendants.
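The tree structure can also be made visible programmatically. The following Python sketch, which assumes the inventory document of Example 5 with consistently cased tags, walks the tree with the standard-library xml.etree.ElementTree module and prints each element indented by its depth. The function name outline is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Inventory document of Example 5, with matching start and end tags.
INVENTORY_XML = """<Inventory>
  <Beverage>
    <Lemonade><Cost>$3</Cost><Amount>30</Amount></Lemonade>
    <Soda><Cost>$5</Cost><Amount>10</Amount></Soda>
  </Beverage>
  <Snacks>
    <Potatochips><Cost>$4.50</Cost><Amount>60</Amount></Potatochips>
  </Snacks>
</Inventory>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(INVENTORY_XML)
for depth, tag in outline(root):
    # Indentation reflects how far the element is from the root.
    print("  " * depth + tag)
```

The root element appears at depth 0 and every other element is reached by descending from it, which is the parent/child relationship that XPath later navigates.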

2.2 XPath

XPath is an expression language for processing values that conform to the XQuery and XPath Data Model. Its primary purpose is to address the parts of an XML document: it navigates the elements and attributes of the document in order to extract data from it. In support of this purpose, XPath also provides basic facilities for manipulating strings, numbers and Booleans. XPath uses a compact, non-XML syntax so that expressions can be used within Uniform Resource Identifiers (URIs) and XML attribute values. XPath operates on the underlying logical structure of an XML document rather than on its surface syntax. It uses a path notation to navigate the hierarchical structure of the document. In addition to addressing, XPath has a natural subset that can be used for matching, that is, for testing whether a node matches a pattern.

XPath models an XML document as a tree of nodes. There are seven types of node: root, element, attribute, text, namespace, processing instruction and comment. The functionality of XPath therefore depends on the tree representation of the XML document. Some types of node have a name, which is modeled as a pair consisting of a local part and a namespace URI; this pair is called the expanded-name. The basic syntactic construct of XPath is the expression, which matches the production Expr. Evaluating an expression yields an object of one of the following types:

  1. Node-set, which is an unordered collection of nodes without duplicates;
  2. Boolean, which is either true or false;
  3. Number, which is a floating-point number;
  4. String, which is a sequence of UCS characters.

The expression is the fundamental building block of XPath, and everything that can affect the result of an expression is described by the expression context. A path expression locates nodes within the XML tree by means of a sequence of one or more steps separated by "/" or "//". Expressions are evaluated with respect to the expression context, and specifications that use XPath, such as XSLT and XPointer, specify how that context is determined. The context consists of a node, called the context node; a pair of non-zero positive integers, the context position and the context size; a set of variable bindings; a function library; and the set of namespace declarations in scope for the expression. The context position is always less than or equal to the context size. One of the most important kinds of expression is the location path, which selects a set of nodes relative to the context node. The result of evaluating a location path is the node-set containing the nodes it selects. Location paths can also contain expressions that are used to filter sets of nodes. The following table (Table 1) shows some location paths in unabbreviated syntax.

   
Location path      Selected nodes
child::para        The para element children of the context node
child::*           All element children of the context node
child::text()      All text node children of the context node
child::node()      All children of the context node, whatever their node type
attribute::name    The name attribute of the context node
/                  The document root, which is always the parent of the document element

 

There are two kinds of location path: relative location paths and absolute location paths. A relative location path consists of a sequence of one or more location steps separated by "/". The steps are composed from left to right. Each step selects a set of nodes relative to a context node, and each node in that set serves in turn as the context node for the following step; the sets of nodes identified in this way are unioned together. An absolute location path consists of "/" optionally followed by a relative location path. A "/" by itself selects the root node of the document containing the context node. If it is followed by a relative location path, the location path selects the set of nodes that the relative location path would select relative to the root node of the document containing the context node.
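The left-to-right composition of location steps can be sketched in a few lines of Python. The sketch below is a simplification that handles only the child axis with a name test or "*", and it assumes the inventory document of Example 5; the helper names child_step and relative_path are hypothetical and do not belong to any XPath library.

```python
import xml.etree.ElementTree as ET

# Inventory document of Example 5 (assumed here with consistently cased tags).
INVENTORY_XML = (
    "<Inventory>"
    "<Beverage>"
    "<Lemonade><Cost>$3</Cost><Amount>30</Amount></Lemonade>"
    "<Soda><Cost>$5</Cost><Amount>10</Amount></Soda>"
    "</Beverage>"
    "<Snacks>"
    "<Potatochips><Cost>$4.50</Cost><Amount>60</Amount></Potatochips>"
    "</Snacks>"
    "</Inventory>"
)


def child_step(context_nodes, name):
    """Apply one child-axis step: every input node acts as a context node."""
    selected = []
    for node in context_nodes:
        for child in node:
            if name == "*" or child.tag == name:
                selected.append(child)
    return selected


def relative_path(context_node, path):
    """Evaluate steps left to right; each step's result feeds the next step."""
    nodes = [context_node]
    for step in path.split("/"):
        nodes = child_step(nodes, step)
    return nodes


root = ET.fromstring(INVENTORY_XML)  # the Inventory element is the context node
soda_cost = relative_path(root, "Beverage/Soda/Cost")
all_amounts = relative_path(root, "*/*/Amount")
print([n.text for n in soda_cost])
print([n.text for n in all_amounts])
```

Because the child axis of a tree never reaches the same node twice, the union of the per-node results needs no duplicate removal here; a general implementation covering other axes would have to remove duplicates.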

A location step has three parts: an axis, which specifies the tree relationship between the nodes selected by the step and the context node; a node test, which specifies the node type and expanded-name of the nodes selected by the step; and zero or more predicates, which use XPath expressions to further refine the set of selected nodes. A location step is written as the axis name and the node test separated by a double colon, followed by zero or more expressions, each in square brackets. For instance, in child::para[position()=1], child is the axis name, para is the node test and [position()=1] is a predicate. Examples of axes include the child axis, which contains the children of the context node; the descendant axis, which contains the descendants of the context node and never contains attribute or namespace nodes; and the ancestor axis, which contains the ancestors of the context node and always includes the root node, unless the context node is itself the root node. Other axes include the preceding-sibling axis, the following axis and the namespace axis. Common abbreviated forms and their expected outcomes are listed in Table 2.

Syntax                          Expected outcome
para                            Selects the para element children of the context node
*                               Selects all element children of the context node
text()                          Selects all text node children of the context node
@name                           Selects the name attribute of the context node
@*                              Selects all the attributes of the context node
para[1]                         Selects the first para child of the context node
*/para                          Selects all para grandchildren of the context node
para[last()]                    Selects the last para child of the context node
chapter//para                   Selects the para element descendants of the chapter element children of the context node
.                               Selects the context node
..                              Selects the parent of the context node
../@lang                        Selects the lang attribute of the parent of the context node
//olist/item                    Selects all the item elements in the same document as the context node that have an olist parent
chapter[title]                  Selects the chapter children of the context node that have one or more title children
chapter[title="Introduction"]   Selects the chapter children of the context node that have one or more title children with string-value equal to Introduction
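Several of the abbreviated forms can be tried out with Python's standard-library xml.etree.ElementTree module, which implements a limited subset of XPath. The document below is a made-up sample, and two restrictions of that module should be kept in mind: descendant searches on an element are written ".//para" rather than "//para", and attributes are read with get() instead of an "@name" path.

```python
import xml.etree.ElementTree as ET

# Made-up sample document used only for this illustration.
DOC = """<doc lang="en">
  <chapter><title>Introduction</title><para>a</para><para>b</para></chapter>
  <chapter><para>c</para></chapter>
  <chapter><title>Background</title><para>d</para></chapter>
</doc>"""

root = ET.fromstring(DOC)      # context node: the doc element
first = root.find("chapter")   # context node for the positional examples

chapters = root.findall("chapter")                       # chapter
grandchildren = root.findall("*/para")                   # */para
titled = root.findall("chapter[title]")                  # chapter[title]
intro = root.findall("chapter[title='Introduction']")    # chapter[title="Introduction"]
first_para = first.findall("para[1]")                    # para[1]
last_para = first.findall("para[last()]")                # para[last()]
descendants = root.findall(".//para")                    # para descendants of the context node
lang = root.get("lang")                                  # value of @lang on the context node

print(len(chapters), len(titled), len(intro))
print([p.text for p in grandchildren])
print(first_para[0].text, last_para[0].text, lang)
```

Note that chapter[title] keeps a chapter as soon as it has at least one title child; the second chapter, which has none, is the only one filtered out.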

A significant aspect of XPath is conformance. XPath is intended primarily as a component to be used by other specifications. It therefore relies on the specifications that use it, such as XPointer and XSLT, to specify the conformance criteria for implementations of XPath, and it does not define conformance criteria for independent implementations.

2.3 XProc

XProc is a language for specifying operations to be carried out on XML documents. It is designed to solve the problem of composing XML processes. XProc is an effective XML processing language because its declarative format makes it simple to develop XML pipelines; compared with other XML processing frameworks, even non-technical users can write and maintain processing workflows in XProc. Other characteristics that make XProc effective compared with XML processing technologies such as XSLT are outlined below:

  1. XProc pipelines can be configured in various ways, whereas other XML processors offer only a single configuration;
  2. Each XProc step concentrates on one specific operation, which makes the steps easy to optimize;
  3. XProc is extensible, and its standard step library makes it more effective in processing XML documents;
  4. The structured arrangement of an XProc pipeline makes it easier to reuse than the structured code written for other XML processors.

2.3.1 Pipeline concepts

A pipeline is a set of connected steps, with the outputs of one step flowing into the inputs of another. A pipeline is itself a step and must therefore satisfy the constraints on steps. Two steps are connected when the input of one is bound to the output of the other. The result of evaluating a pipeline, or a subpipeline, is the result of evaluating the steps it contains, in an order consistent with the connections between them. The behavior of a pipeline therefore follows from the behavior of the individual connected steps that make it up.
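The idea that a pipeline is a chain of steps, and is itself a step, can be sketched in Python. This is a deliberately simplified model under stated assumptions: each step has exactly one input and one output, both sequences of documents, whereas real XProc steps may have several named ports. The names pipeline, keep_chapters and count_documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Document = ET.Element
Step = Callable[[List[Document]], List[Document]]


def pipeline(*steps: Step) -> Step:
    """Connect steps so that each step's output becomes the next step's input."""
    def run(documents: List[Document]) -> List[Document]:
        for step in steps:
            documents = step(documents)
        return documents
    return run  # the pipeline has the same shape as a step, so it can be nested


def keep_chapters(documents: List[Document]) -> List[Document]:
    """Atomic step (assumed): keep documents whose root has role='chapter'."""
    return [d for d in documents if d.get("role") == "chapter"]


def count_documents(documents: List[Document]) -> List[Document]:
    """Atomic step (assumed): emit one document holding the input count."""
    result = ET.Element("result")
    result.text = str(len(documents))
    return [result]


docs = [
    ET.fromstring('<section role="chapter"/>'),
    ET.fromstring("<section/>"),
    ET.fromstring('<section role="chapter"/>'),
]
count_chapters = pipeline(keep_chapters, count_documents)
nested = pipeline(pipeline(keep_chapters), count_documents)
print(count_chapters(docs)[0].text, nested(docs)[0].text)
```

Both the input and the output are sequences of zero or more documents, so an empty input is valid and simply flows through the chain.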

2.3.2 XPath in XProc

XProc uses XPath as its expression language. XPath expressions are evaluated by the XProc processor in several places: on compound steps, to compute the default values of options and the values of variables; and on atomic steps, to compute the actual values of options and the values of parameters. XPath expressions can also be passed on to steps, in which case they are evaluated by the implementations of the individual steps. Example 6 illustrates this distinction.

 

<p:variable name="home" select="'http://academics.com/docs'"/>

<p:load name="read-from-home">
  <p:with-option name="href" select="concat($home,'/about.xml')"/>
</p:load>

<p:split-sequence name="select-chapters" test="@role='chapter'">
  <p:input port="source" select="//section"/>
</p:split-sequence>

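In the above example, the select expression on the variable "home" is evaluated by the XProc processor; the value of the variable is http://academics.com/docs. The select expression for the href option of the p:load step is also evaluated by the XProc processor, as is the select expression on the source input of the p:split-sequence step. The test expression of the p:split-sequence step, by contrast, is passed to the step and evaluated by the step implementation.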

2.3.3 Syntax overview in XProc

XProc is designed to work equally well with all versions of XML. The elements of a pipeline document represent the pipeline, the steps it contains and the connections between those steps. Each step is represented by an element. A combination of elements and attributes specifies how the inputs and outputs of each step are connected, and how options and parameters are passed. Conceptually, steps can be viewed as objects with inputs and outputs that are connected to one another and that may contain further steps. The XProc syntax therefore needs a way of specifying these relationships. An important aspect of the syntax is containment, which is represented naturally by nesting: if a particular element identifies a compound step, then the step elements that are its immediate children form its subpipeline. The following table lists some of the standard steps available in XProc and what they do.

Step                   Description
p:add-attribute        Adds a single attribute to every element of the incoming document that matches the given pattern
p:add-xml-base         Adds xml:base attributes, which are used to resolve the relative references of other entities linked from the document
p:compare              Compares two XML documents for equality
p:count                Returns the number of documents in the input sequence
p:delete               Deletes the items in the incoming document that match the given pattern
p:directory-list       Produces an XML document listing the files and directories at the specified IRI
p:error                Generates an error based on the incoming document
p:escape-markup        Converts the markup of the incoming document into its serialized form
p:filter               Returns the portions of the document selected by the specified XPath expression
p:label-elements       Generates a label for every element that matches the specified pattern
p:load                 Loads an XML document from an external resource
p:make-absolute-uris   Converts relative URIs into absolute URIs, resolved against the base URI (xml:base)
p:namespace-rename     Renames a namespace URI to a different URI
p:pack                 Merges two sequences of documents pair-wise
p:parameters           Exposes the parameters available to the step as a document
p:rename               Renames the elements or attributes that match the given pattern
p:replace              Replaces the matching nodes with the document supplied on the replacement port
p:set-attributes       Sets attributes on the matching elements
p:sink                 Accepts a sequence of documents and discards it, which ends that branch of processing
p:split-sequence       Splits a sequence of documents into two sequences
p:store                Stores a serialized version of the document at the specified URI
p:string-replace       Replaces the nodes matched by the given pattern with a target string
p:unescape-markup      Parses escaped markup text into XML elements
p:unwrap               Removes the matching elements and attaches their children to their parent
p:wrap                 Wraps the matching nodes in a new element

 

The W3C specification defines three namespaces for XProc. The first is the namespace of the XProc XML vocabulary, which is conventionally bound to the prefix "p:". The second is the XProc-step namespace, conventionally bound to the prefix "c:", which is used for the documents that serve as inputs to and outputs from several of the standard and optional steps described in the specification. Steps such as p:http-request and p:store have defined input or output vocabularies, and this namespace is used for all of those documents. The third is the XProc-error namespace, conventionally bound to the prefix "err:", which is used for error codes. The specification additionally uses the prefix "xs:" to refer to the XML Schema namespace.

The names used in an XML pipeline are subject to scoping rules. The names of step types must be distinct: it is a static error if a step type is declared more than once in the same scope, or if a declaration reuses the name of a built-in step type. The scope of a step's name is determined by the environment of that step. As a rule of thumb, the name of a step, the names of its sibling steps, the names of the steps it directly contains, the names of its ancestors and the names of its ancestors' siblings are all in a common scope. XProc reports a static error if two steps in the same scope have the same name. The input and output ports of a step are likewise in a single scope, so an input port and an output port on the same step must not share a name; all port names on a step must be unique. Taken together, these constraints guarantee that the combination of a step name and a port name identifies exactly one port on exactly one in-scope step. Parameter names are not scoped; they are distinct on each step. The following list summarizes the conventions used in the XProc syntax overview.

  1. A name represents exactly one occurrence of an element with that name
  2. Parentheses are used for grouping
  3. Elements or groups separated by a comma (",") form an ordered sequence
  4. Elements or groups separated by a vertical bar ("|") represent a choice
  5. Elements or groups separated by an ampersand ("&") form an unordered sequence
  6. An element or group followed by a question mark ("?") is optional: it may or may not occur, and if it occurs it does so only once
  7. An element or group followed by an asterisk ("*") is optional and may occur any number of times
  8. An element or group followed by a plus sign ("+") is required: it must occur at least once and may occur several times
  9. The xml:id attribute can be used on any element and has the semantics of xml:id
  10. The xml:base attribute can be used on any element and has the semantics of XML Base
  11. The p:documentation and p:pipeinfo elements can be used anywhere within a pipeline document
  12. The p:log element can be used on any step within a pipeline document

The XProc specification defines the following static and dynamic errors, among others:

  1. It is a static error (err:XS0059) if the pipeline element is not p:pipeline, p:declare-step or p:library.
  2. It is a static error (err:XS0008) if any element in the XProc namespace has attributes that are not defined by the W3C specification, unless they are extension attributes.
  3. It is a static error (err:XS0038) if any required attribute is not provided.
  4. It is a dynamic error (err:XD0028) if any attribute value does not satisfy the type required for that attribute.
  5. It is a static error (err:XS0044) if any element in the XProc namespace, or any step, has element children other than those specified for it by the W3C specification. In particular, the presence of atomic steps for which no declaration is visible may raise this error.
  6. It is a static error (err:XS0037) if any step directly contains text nodes that do not consist entirely of whitespace.
  7. It is a dynamic error (err:XD0019) if any option value does not satisfy the type required for that option.
  8. It is a static error (err:XS0015) if a compound step has no contained steps.
  9. It is a dynamic error (err:XD0012) if an attempt is made to dereference a URI whose scheme is not supported. Implementations are encouraged to support as many schemes as is practical and should support both the file: and http(s): schemes. The set of URI schemes actually supported is implementation-defined.
  10. It is a dynamic error (err:XD0030) if a step is unable or incapable of performing its function.
  11. In most steps that use a select expression or match pattern, any kind of node can be identified by the expression or pattern. However, some expressions and patterns on some steps apply only to certain kinds of node. It is a dynamic error (err:XC0023) if a select expression or match pattern returns a node type that is not allowed by the step.

If an XProc processor can determine statically that a dynamic error will always occur, it may report that error statically, provided that the error does not occur among the descendants of a p:try. Dynamic errors inside a p:try must not be reported statically; they must be raised dynamically so that p:catch processing can be performed on them.

2.3.4 Update Operations in XProc

Update operations are an integral part of XML processing with XProc, and they are among the many operations that XProc supports. The XProc update facilities include the insert, delete, replace, rename and transform operations. This section gives an overview of the insert, delete, replace and rename operations.

Insertion

Insertion in XProc is performed by the p:insert step, which inserts the document on its insertion port into the document on its source port, relative to the matching elements in the source document. The declaration of p:insert is shown in the following example.

<p:declare-step type="p:insert">
  <p:input port="source" primary="true"/>
  <p:input port="insertion" sequence="true"/>
  <p:output port="result"/>
  <p:option name="match" select="'/*'"/>        <!-- XSLT match pattern -->
  <p:option name="position" required="true"/>   <!-- "first-child" | "last-child" | "before" | "after" -->
</p:declare-step>

The value of the match option must be an XSLT match pattern. A dynamic error (err:XC0023) is reported if the pattern matches anything other than element, text, processing-instruction or comment nodes. Multiple matches are allowed, in which case multiple copies of the insertion documents are inserted. If no elements match, the document is left unchanged, whatever the insertion. This is one of the key constraints associated with insertion in XProc. The value of the position option must be an NMTOKEN taken from the following table:

Value of position    Expected outcome
first-child          The insertion is made the first child of the matched node
last-child           The insertion is made the last child of the matched node
before               The insertion is made the immediately preceding sibling of the matched node
after                The insertion is made the immediately following sibling of the matched node
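
To make the effect of the match and position options concrete, the following pipeline is a minimal sketch of how p:insert might be invoked, assuming an XProc 1.0 processor. The catalog, book and price elements are hypothetical and serve only as illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical source document, supplied inline as the default input -->
  <p:input port="source">
    <p:inline><catalog><book><title>XProc</title></book></catalog></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Insert a copy of the price element as the last child of every matched book -->
  <p:insert match="/catalog/book" position="last-child">
    <p:input port="insertion">
      <p:inline><price>30</price></p:inline>
    </p:input>
  </p:insert>
</p:declare-step>
```

Under these assumptions, each matched book element should receive a price element after its title. With position set to "before" or "after", the price element would instead become a sibling of the matched book.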

 

Deletion

p:delete step

Deletion in XProc is implemented by the p:delete step, which deletes the items in the source document that are matched by a given pattern and produces, on the result port, the document with those items removed. The declaration of the p:delete step is as follows:

<p:declare-step type="p:delete">
  <p:input port="source"/>                      <!-- the source input port -->
  <p:output port="result"/>
  <p:option name="match" required="true"/>      <!-- XSLT match pattern -->
</p:declare-step>

The value of the match option must be an XSLT match pattern. A single match pattern may delete multiple items in the document. If the pattern matches an element, the entire subtree rooted at that element is deleted. A significant constraint of the deletion step is that it cannot be used to delete namespaces: a dynamic error (err:XC0062) is reported if the match option matches a namespace node. It is also worth noting that deleting an attribute named xml:base has no effect on the base URI of the element on which it occurred.
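
As an illustration of the subtree behaviour described above, the following is a minimal sketch of a p:delete invocation, assuming an XProc 1.0 processor. The document and the element and attribute names (catalog, book, note, draft) are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical source document -->
  <p:input port="source">
    <p:inline><catalog><book draft="yes"><title>XProc</title><note>check <b>this</b></note></book></catalog></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Deletes every note element together with its whole subtree -->
  <p:delete match="note"/>

  <!-- Deletes the draft attribute from every book element -->
  <p:delete match="book/@draft"/>
</p:declare-step>
```

The two steps are chained through their primary ports, so the second deletion operates on the output of the first. A pattern such as `namespace::*` would not be a usable way to strip namespaces here, given the err:XC0062 constraint.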

Replace

p:replace step

The replace update in XProc is implemented by the p:replace step, which replaces the matching nodes of its primary input with the document element of the document that appears on the replacement port. The declaration of the p:replace step is as follows:

<p:declare-step type="p:replace">               <!-- step declaration -->
  <p:input port="source" primary="true"/>       <!-- the primary input port -->
  <p:input port="replacement"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>      <!-- XSLT match pattern -->
</p:declare-step>

The value of the match option of the p:replace step must be an XSLT match pattern. A dynamic error (err:XC0023) is reported if the pattern matches anything other than element, text, processing-instruction or comment nodes. Multiple matches are allowed, in which case several replacements take place within the document. Every node in the primary input that matches the specified pattern is replaced in the output by the document element of the replacement document. A significant constraint of the p:replace step is that only non-nested matches are replaced: once a node has been replaced, its descendants can no longer be matched.
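
The non-nested matching rule can be seen in the following minimal sketch, which assumes an XProc 1.0 processor. The doc, section, para and placeholder elements are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical source with one section nested inside another -->
  <p:input port="source">
    <p:inline><doc><section><section><para/></section></section><para/></doc></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Each outermost section is replaced by the document element of the replacement -->
  <p:replace match="section">
    <p:input port="replacement">
      <p:inline><placeholder/></p:inline>
    </p:input>
  </p:replace>
</p:declare-step>
```

Under these assumptions only the outer section is replaced. The inner section disappears together with the subtree of the outer one and is not replaced separately, so the result should contain a single placeholder element followed by the trailing para.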

p:string-replace step

A second form of the replace update in XProc is the p:string-replace step. It matches nodes in the document on the source port and replaces each of them with the string that results from evaluating an XPath expression. The declaration of the p:string-replace step is as follows.

<p:declare-step type="p:string-replace">        <!-- step declaration -->
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>      <!-- XSLT match pattern -->
  <p:option name="replace" required="true"/>    <!-- XPath expression -->
</p:declare-step>

The value of the match option must be an XSLT match pattern, while the value of the replace option must be an XPath expression. The match pattern specifies the nodes to be matched. For every matching node in the document, the replace expression is evaluated with the matching node as the XPath context node, and the string value of the result is produced on the output in place of the matched node. Nodes that do not match are not altered by the operation and are copied without change. If the pattern given in the match option matches an attribute, the string value of the replace expression becomes the new value of that attribute in the output. If the matched attribute is named xml:base, the base URI of the element is also changed accordingly.
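
Because the replace option holds an XPath expression rather than a literal string, a constant replacement has to be written as a quoted XPath string. The following minimal sketch, assuming an XProc 1.0 processor with XPath 1.0 functions, shows both forms. The catalog, book and title names and the lang attribute are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical source document -->
  <p:input port="source">
    <p:inline><catalog><book lang="en"><title>XProc</title></book></catalog></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- The context node is the matched text node, so "." is the old title text -->
  <p:string-replace match="title/text()" replace="concat('Title: ', .)"/>

  <!-- Matching an attribute: the result becomes the new attribute value.
       The inner quotes make 'de' an XPath string literal. -->
  <p:string-replace match="book/@lang" replace="'de'"/>
</p:declare-step>
```

Under these assumptions the title text should become "Title: XProc" and the lang attribute should take the value "de", while all other nodes are copied unchanged.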

Rename

p:rename step

The p:rename step renames elements, attributes and the targets of processing instructions in a document. The declaration of the p:rename step is as follows.

<p:declare-step type="p:rename">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>      <!-- XSLT match pattern -->
  <p:option name="new-name" required="true"/>   <!-- QName -->
  <p:option name="new-prefix"/>                 <!-- NCName -->
  <p:option name="new-namespace"/>              <!-- anyURI -->
</p:declare-step>

The value of the match option must be an XSLT match pattern. A dynamic error (err:XC0023) is reported if the pattern matches anything other than element, attribute or processing-instruction nodes. The value of the new-name option must be a QName. If its lexical value does not contain a colon, the new-namespace option may be used to specify the namespace of the new name, and in that case the new-prefix option may be used to suggest a prefix for the new name. A dynamic error (err:XD0034) is reported if a new namespace or prefix is specified when the lexical value of the specified name contains a colon.

Every element, attribute and processing instruction in the input that matches the pattern specified by the match option is renamed in the output to the name given by the new-name option. If the match option matches an attribute, and the element on which it occurs already has an attribute whose expanded name is the same as the expanded name given by the new-name option, then the result is as if that existing attribute had been deleted before the matched attribute was renamed. For attributes named xml:base, the following semantics apply: renaming an attribute from xml:base to any other name has no effect on the base URI of the element; however, if an attribute is renamed from another name to xml:base, the base URI of the element is changed accordingly. If the pattern matches a processing instruction, it is the target of the processing instruction that is renamed. A dynamic error (err:XC0013) is reported if the pattern matches a processing instruction and the new name has a non-null namespace.
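
The interplay of new-name, new-namespace and new-prefix can be sketched as follows, assuming an XProc 1.0 processor. The element and attribute names, the prefix ex and the namespace URI http://example.org/ns are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical source document -->
  <p:input port="source">
    <p:inline><catalog><book id="b1"><note>draft</note></book></catalog></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Rename an attribute: id becomes key on every book element -->
  <p:rename match="book/@id" new-name="key"/>

  <!-- Rename an element into a namespace. new-name has no colon, so
       new-namespace and new-prefix are permitted here. -->
  <p:rename match="note" new-name="remark"
            new-namespace="http://example.org/ns" new-prefix="ex"/>
</p:declare-step>
```

Writing new-name="ex:remark" together with new-namespace would be expected to raise the colon-related dynamic error described above, since the namespace is then already fixed by the prefix of the QName.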

p:namespace-rename step

The p:namespace-rename step renames any namespace declaration, or any use of a namespace, in a document to a new IRI value. The declaration of the p:namespace-rename step is as follows.

<p:declare-step type="p:namespace-rename">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="from"/>                       <!-- anyURI -->
  <p:option name="to"/>                         <!-- anyURI -->
  <p:option name="apply-to" select="'all'"/>    <!-- "all" | "elements" | "attributes" -->
</p:declare-step>

The value of the from option must be an anyURI; it should be either empty or absolute, and it is not resolved in either case. The same holds for the value of the to option. The value of the apply-to option must be one of “all”, “elements” or “attributes”. If the value is “elements”, only elements are renamed; if it is “attributes”, only attributes are renamed; if it is “all”, both elements and attributes are renamed. A dynamic error (err:XC0014) is reported if the XML namespace or the XMLNS namespace is used as the value of either the from or the to option. If the values of the from and to options are the same, the input is reproduced unchanged on the output. Otherwise, namespace bindings, namespace attributes, and element and attribute names are changed as follows.

  1. Namespace bindings: if the from option is present and its value is not the empty string, then every binding of a prefix (or of the default namespace) in the input document whose value is the same as the value of the from option is replaced in the output with a binding to the value of the to option, provided that the to option is present and its value is not the empty string; otherwise, the binding is absent from the output. If the from option is absent, or its value is the empty string, the bindings are not changed.
  2. Elements and attributes: if the from option is present and its value is not the empty string, then for every element and attribute in the input whose namespace name is the same as the value of the from option, the namespace name is replaced in the output with the value of the to option, provided that the to option is present and its value is not the empty string. Otherwise, the namespace name is changed so that it has no value.
  3. Namespace attributes: if the from option is present and its value is not the empty string, then for every namespace attribute in the input whose value is the same as the value of the from option, the value of the namespace attribute is replaced in the output with the value of the to option, provided that the to option is present and its value is not the empty string. Otherwise, the namespace attribute is absent from the output.
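
The rules above can be illustrated with a minimal sketch of a p:namespace-rename invocation, assuming an XProc 1.0 processor. The namespace URIs http://example.org/old and http://example.org/new and the element names are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical source document whose elements are in the old namespace -->
  <p:input port="source">
    <p:inline><catalog xmlns="http://example.org/old"><book id="b1"/></catalog></p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Move every element from the old namespace to the new one.
       apply-to="elements" leaves attribute names untouched. -->
  <p:namespace-rename from="http://example.org/old"
                      to="http://example.org/new"
                      apply-to="elements"/>
</p:declare-step>
```

Under these assumptions the catalog and book elements should end up in http://example.org/new. Omitting the to option would instead be expected to leave the matched elements in no namespace, in line with the second rule.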