Structure and Global Ambiguity
First, we will present some well-known results, which are important for our further
Theorem 1 RTGs are weakly equivalent to
CFGs (that is, the set of strings of leaves generated by CFGs is equivalent to the set of
strings of leaves generated by RTGs).
RTGs are strongly equivalent
to graphs generated by a CFGs, modulo a homomorphism of node labels (i.e., a homomorphism
which maps various node labels in a given tree onto a single one), provided the RTG has
LTGs are strongly equivalent
to graphs generated by CFGs, provided finitary branching.
Proof is trivial: as in LTGs every node label is generated by exactly one nonterminal; and
in CFGs, nodes which are not leaves are labeled by nonterminals, there is a one-to-one
correspondence. This has some importance for the relation of LTGs and RTGs. It follows, that
LTGs and RTGs are equivalent up to homomorphism, as well as all grammar types in between.
Every discussion we have about generative capacity concerns only non-equivalence in
isomorphism. Since also the strings of leaves for all grammar types are identical, we can only
be interested in the sets of trees. In the sequel, if we speak of the weak generative capacity
of a tree grammar, we refer to the sets of trees it generates, not the strings of
Since to our knowledge, only the strong generative capacity of the grammar types of Murata et al., 2005 has been in the focus of research, we will now scrutinize their weak
Theorem 4 The
sets of trees generated by STGs form a proper subset of the sets of trees generated by
For proof, consider the trees generated by the following grammar:
S → a(AB)
A → b(C)
B → b(D)
C → c(D)
D → c(ε)
This is an RCG, since A and B do not occur in similar contexts. In order to see that there
is no way to generate this tree with an STG, consider the following fact: a governs two identical labels, which however govern different
subtrees. It is therefore impossible to introduce them with identical rules, and (by
definition of STGs) forbidden to have two competing rules in the content model of the first
rule. This is sufficient for the proof of weak inclusion, since all STG rules are also RCG
rules, and therefore the languages generated form a proper subset.
Theorem 5 The
sets of trees generated by LTGs form a proper subset of the sets of trees generated by
Consider the trees generated by the following grammar:
A → a(B)
B → b(C)
C → a(D)
D → c(ε)
This is a single type tree grammar, and no LTG is able to generate such a tree (remember
that LTGs are strongly equivalent to graphs generated by CFG, provided finitary
Restrained Competition Grammars and Variants
In this section we will scrutinize formal properties of the different types of
restrained competition grammars. We will show which kind of languages cannot be generated by
RCGs; we will prove that there are GRCGs which are ambiguous, and that for every language
which can be generated by an RTG, there is a GRCG which generates the same language.
Finally, we will show that URCGs do not have these properties, are properly included within
GRCGs and properly include RCGs.
The type of languages we cannot generate with RCGs is quickly described as follows: all
grammars, where a single content model contains competing nonterminals, which can be
uniquely distinguished only from their left (right, respectively) context, are not RCGs.
Consequently, we cannot generate sets of trees, where a certain node has different subtrees
depending on its right siblings. If we want to get rid if this asymmetry, and allow for
GRCGs, where competing nonterminals in a single content model are uniquely determined by
their left or right context, we run into problems:
Theorem 6 There
are GRCGs which are ambiguous.
For proof, consider the following rule: S →
a(AB|BA). Suppose, A and B are competing nonterminals; suppose furthermore, that there is some
overlapping between A and B; i.e., the nonterminals generate overlapping sets of trees. In particular,
we may assume that the trees generated by A form a subset
of the trees generated by B. For example, A and B generate identical trees
up to depth n; B in
addition generates a tree of depth n+1. In this case, the
trees of the language have the root a, with two symmetrical
sets of subtrees up to depth n, and possibly one subtree
with depth n+1. It is easily seen that now it is impossible
to merge A and B, for then
we would be incapable of expressing the condition that at most one subtree has depth
n+1. However, for the trees, where the subtrees
introduced by A and B have
depth at most n, there is necessarily more than one
analysis. The grammar we have described so far is, however, a GRCG, because neither
A nor B occur in
identical contexts (though in similar contexts, remember the preceding section).
Theorem 7 For
every language which can be generated by an RTG, there is a GRCG which generates the same
To proof this theorem, we describe a simple procedure to convert any RTG into a GRCG,
which generates the same language. We define competing sequences of length n of nonterminals as follows: two sequences of nonterminals
compete, if for all n, the nth nonterminal of one sequence competes with the nth nonterminal of the other sequence. We have to assume a content model
r which is not GRCG conform. Therefore, there have to be
two competing nonterminals or competing sequences of nonterminals A and B in r, such that for possibly empty sequences of nonterminals Γ and Δ, (Γ A Δ) and (Γ B Δ) match
Given this, we can be sure, that in the instantiations of r, which violate the GRCG condition, A and
B occur in exactly the same global tree contexts. By
global tree context we here mean that a tree with a governing the subtrees generated by
A is part of the language iff a also governs the set of
subtrees generated by B. Since this is the case, we can
simply merge the two nonterminals to a new one, C, which is
the union of the former two. This new nonterminal substitutes all instantiations of
A and B, which occur in
the same global tree context. This, by definition, are the instantiations which violate the
GRCG condition. This we can apply to all nonterminals which violate the GRCG condition. The
only thing we have to take care of is that we apply this only to those instantiations of the
content models where two competing nonterminals match equally (this might force us to change
some regular expressions). We do not show an exact algorithm at this point, since it is
clear that an equivalent GRCG exist, and the details of the construction are of no practical
interest at this point.
We now show that there is a hierarchy of proper inclusion RCG ⊂ URCG ⊂ GRCG.
To show that RCG ⊂ URCG, consider the following: every rule which is admitted by an
RCG is also admitted by a URCG, because if competing nonterminals in the same content model
have a unique prefix, a fortiori they also have a unique context (we have already shown that
a unique prefix of nonterminals is paramount to a unique prefix of labels/siblings, by
induction). Above, we have already shown that for an RCG it is impossible to generate
languages as the following, which is a URCG.
S → a(AC|BD)
A → b(C)
B → b(D)
C → c(ε)
D → d(ε)
compete, but are
determined uniquely by their context.
This concludes the first part; the second part will be a corollary of the next section:
We will show that some languages are inherently ambiguous, that is, there is no unambiguous
grammar for them. By Theorem 7 we know that we can generate these languages with GRCGs, but
Proposition 1 A URCG cannot be ambiguous.
This is easy to see: an ambiguous grammar assigns two different sequences of
nonterminals to the daughters of one node (since root nodes are unambiguous): Then however,
there must be at least two competing nonterminals which occur in the same content model in
similar contexts, which, by definition, is impossible.
Inherently Ambiguous Languages
As a corollary, we can show that there are regular tree languages, for which there is no
unambiguous grammar. There are sets of trees, which are generated by an ambiguous GRCG, but
by no URCG. We will call these languages inherently
Theorem 8 Some
regular tree languages are inherently ambiguous.
This can be seen easily, if we spell out a grammar which we described in the above
subsection. We will then show that there is now way to write an unambiguous grammar which
generates the same language.
S → r(AB|BA)
A → a(C)
B → a(D)
C → b(ε)
D → b(ε|E)
E → c(ε)
There is no way to merge A and B, since they generate different sets of subtrees (we can write L(A)≠L(B)); but since they overlap (L(A)∩ L(B)≠ ∅), there is no way to have a unique
interpretation in the cases where the subtrees generated by the nonterminals are identical.
There will always be two ways to generate trees in this case.
We can, furthermore, precisely state the conditions, under which a regular tree language
is ambiguous. To this end, however, we need to introduce some notation. We now for
simplicity write trees as terms: a tree with root a and
daughters b and c is
denoted as a(b,c), etc. As a next step, we define a context
as the position, where certain subtrees occur within trees of a language.
A context C is a tree-term with exactly one variable. We say that a
set of subtrees α occurs in a context C in a language L, if the following holds: We
can instantiate the variable of C with any tree from α, and the resulting tree is in
Note that sets of subtrees correspond to nonterminals, when we speak of languages rather
than grammars. In the sequel, for simplicity we will use lower case Greek variables equally
for sets of subtrees as for ordered sequences of sets of subtrees. The definition of a
context is easily accommodated to sequences. A set of sequences of trees of length n consists of ordered tuples of trees of length n, of the form (t0,...,tn-1). Sets of subtrees
are then simply sets of one-tuples. Importantly, we will not provide a proof for the
following proposition, and leave it open as a conjecture. However, we will sketch the
argument. We now make the following conjecture:
A tree-language is inherently ambiguous iff at least one node fulfills
all of the following conditions:
We need to have one node with an arbitrary label a, with at least
two (sequences of) sets of subtrees α and β, such that
α ∩ β ≠ ∅;
α ≠ β;
There is at least one context C in L, such that both a(Γ,
(u1,...,un), Θ) and
(t1,...,tn), Θ) occur
in C, for all (t1,...,tn) ∈ α
and all (u1,...,un)
∈ β, where uppercase Greek letters designate possibly empty
sequences of daughter sub-trees; note that the sequences need to have equal length in
order to meet condition 1.
Due to space restrictions, we leave the prove for this conjecture open here; this
reminds however of a theorem in Odgen, 1968 for string languages. But we
will give some rather informal discussion of the points in the next section. It is not hard
to see that this is merely a generalization of the cases we have been described above. As we
will see, we can derive some useful facts from these properties of ambiguous languages, even
without a general proof: we can show that we can construe grammars for languages which do
not fulfill one of the conditions, and, moreover, which type of grammars we can
From the grammar types sketched so far, there is no type which
generates all and only the RTLs that are not inherently ambiguous.
We will demonstrate this going through the three conditions mentioned in the preceding
section, and look which unambiguous grammar we can construe if one condition is not met.
This is to be read as follows: if one condition is not met, then it means, that from all
nodes of the tree language, there might be any one which meets the ones not in question, but
none which meets the one currently under consideration.
If there is no intersection between the subtrees of a given node, the grammar is of
course not ambiguous. We can, however not necessarily construe a URCG for this grammar,
since in the content model of the mother node there are competing nonterminals in
similar contexts (recall the example given above).
We can, however, construe a USG for such a language, since subtrees are uniquely
This means that there are no two sets of subtrees governed by the same node which
are not identical. We can thus introduce them by the same nonterminals, and have a local
tree grammar (having no different sets of subtrees governed by the same node amounts to
say we need no competing rules in the grammar, as nonterminals correspond to sets of
If the third condition is not met, then we can construe nonterminals (corresponding
to the sets of subtrees) such that for all of them the following holds: assuming they
compete (introduce identical labels), they either occur in different contexts, in which
case they are distinguishable thereby and no ambiguity arises; or they occur in
identical contexts, in which case we can use a unique nonterminal which is the merge of
both (this also holds for root nodes). The critical case, where the content of one
(sequence of) set(s) of subtrees depends on the other one, which makes them occur in
similar contexts, while making it impossible to merge them, however, we have excluded by
Since this argument holds inductively from the root to all subtrees, we can construe
a URCG for the language were condition 3 is not met, but we cannot use any strictly
weaker type. The only thing we have provided is that if two sets of subtrees occur in
similar contexts (for the grammar we construe), then they actually occur in identical
contexts. It follows that we do not need competing nonterminals in similar
This shows that we still have not solved the problem to define a canonical grammar type
which generates all and only the unambiguous languages, since there are languages which are
generated by USGs, but no other canonical class which does not allow any ambiguity type, and
languages which are generated by URCGs and no other such type. So far, we are still lacking
a characterization of the unambiguous languages in terms of grammar rules.
[Abiteboul et al., 2000] Abiteboul, S., P.
Buneman, and D. Suciu (2000). Data on the Web: From Relations to Semistructured Data and XML.
Morgan Kaufmann Publishers, San Francisco, California.
[Ansari et al., 2009] Ansari, M. S., Zahid, N., and
K.-G. Doh. A Comparative Analysis of XML Schema Languages. In Slezak, D., Kim, T., Zhang, Y.,
Ma, J., and K. Chung, eds., Database Theory and Application. International Conference, DTA
2009, Held as Part of the Future Generation Information Technology Conference, FGIT 2009, Jeju
Island, Korea, December 10-12, 2009. Proceedings, volume 64, pages 41– 48. Springer, Berlin,
Heidelberg, 2009. doi: 10.1007/978-3-642-10583-8_6.
[Balmin et al., 2004] Balmin, A., Papakonstantinou,
Y., and V. Vianu (2004). Incremental validation of XML documents. ACM Transactions on Database
Systems (TODS), 29(4):710–751. doi: 10.1145/1042046.1042050.
[Bauman, 2008] Bauman, S., (2008). Freedom to
Constrain: where does attribute constraint come from, mommy? In Proceedings of Balisage: The
Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). doi: 10.4242/BalisageVol1.Bauman01.
[Bex et al., 2009] Bex, G. J., Gelade, W., Martens, W.
and F. Neven (2009). Simplifying XML Schema: Effortless Handling of Nondeterministic Regular
Expressions. In SIGMOD ’09: Proceedings of the 35th SIGMOD international conference on
Management of data, pages 731–744, New York, NY, USA, ACM. doi: 10.1145/1559845.1559922.
[Brüggemann-Klein and Wood, 1992] Brüggemann-Klein, A., and D. Wood (1992). Deterministic Regular Languages. In Finkel, A. and
M. Jantzen, eds., STACS 92. 9th Annual Symposium on Theoretical Aspects of Computer Science
Cachan, France, February 13–15, 1992 Proceedings, volume 577 of Lecture Notes in Computer
Science, pages 173–184. Springer, Berlin, Heidelberg. doi: 10.1007/3-540-55210-3_182.
[Brüggemann-Klein, 1993] Brüggemann-Klein,
A. (1993). Formal Models in Document Processing. Habilitation, Albert-Ludwig-Universität zu
Freiburg i. Br.
[Brüggemann-Klein and Wood, 1997] Brüggemann-Klein, A., and D. Wood (1997). One-unambiguous regular languages. Information and
computation, 142:182–206. doi: 10.1006/inco.1997.2695.
[Brüggemann-Klein and Wood, 2002] Brüggemann-Klein, A., and D. Wood (2002). The parsing of extended context-free grammars.
HKUST Theoretical Computer Science Center Research Report HKUST-TCSC-2002-08, The Hong Kong
University of Science and Technology Library.
[Brüggemann-Klein and Wood, 2004] Brüggemann-Klein, A., and D. Wood (2004). Balanced context-free grammars, hedge grammars and
pushdown caterpillar automata. In Proceedings of Extreme Markup Languages, Montréal,
[Buck et al., 2000] Buck, L., Goldfarb, C. F., and P.
Prescod (2000). Datatypes for DTDs (DT4DTD) 1.0. W3C Note 13 January 2000, World Wide Web
[Carey, 2009] Carey, B. M. (2009). Meet CAM: A new XML
validation technology. Take semantic and structural validation to the next level. IBM
developerworks, IBM Corporation. http://www.ibm.com/developerworks/xml/library/x-cam/?S_TACT=105AGX54&S_CMP=C0924&ca=dnw-1036&ca=dth-x&open&cm_mmc=6015-_-n-_-vrm_newsletter-_-10731_131528&cmibm_em=dm:0:13962324.
[Chomsky, 1955] Chomsky, N. (1955). Logical Syntax
and Semantics: Their Linguistic Relevance. Language, 31(1):36–45, 1955. doi: 10.2307/410891.
[Chomsky, 1956] Chomsky, N. (1956). Three Models for
the Description of Language. IRE Transactions on Information Theory, 2:113–124,
1956. doi: 10.1109/TIT.1956.1056813.
[Clark, 2001] Clark, J. (2001). TREX – Tree Regular
Expressions for XML Language Specification. Technical report, Thai Open Source Software Center
[Clark et al., 2003] Clark, J., J. Cowan, and M.
Murata, (2003). Relax NG Compact Syntax Tutorial. Working Draft 26 March 2003, OASIS –-
Organization for the Advancement of Structured Information Standards. http://relaxng.org/compact-tutorial-20030326.html.
[Comon et al., 2008] Comon, H., M. Dauchet, R.
Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi (2007). Tree Automata
Techniques and Applications. Release November, 18th 2008. http://www.grappa.univ-lille3.fr/tata.
[Costello and Simmons, 2008] Costello, R. L., and
R. A. Simmons (2008). Tutorials on Schematron: Two Types of XML Schema Language. http://www.xfront.com/schematron/Two-types-of-XML-Schema-Language.html.
[DSD2] Møller, A. (2005). Document Structure
Description 2.0. Technical report, BRICS (Basic Research in Computer Science, Aarhus
University), 2005. http://www.brics.dk/DSD/dsd2.html.
[Fiorello et al., 2004] Fiorello, D., Gessa, N.,
Marinelli, P., and F. Vitali. DTD++ 2.0: Adding support for co-constraints. In Proceedings of
Extreme Markup Languages, Montréal, Québec.
[Gelade et al., 2009] Gelade, W, Martens, W., and F.
Neven (2009). Optimizing Schema Languages for XML: Numerical Constraints and Interleaving.
SIAM Journal on Computing, 38(5):2021–2043. doi: 10.1137/070697367.
[Goldfarb, 1978] Goldfarb, C. F. (1978). DCF GML
User’s Guide (IBM SH20-9160). IBM, 1978.
[Goldfarb, 1991] Goldfarb, C. F. (1991). The SGML
Handbook. Oxford University Press, Oxford.
[Gécseg and Steinby, 1997] Gécseg, F., and M. Steinby
(1997). Tree languages. In Handbook of Formal Languages, volume 3, pages 1-68. Springer, New
[Hopcroft et al., 2000] Hopcroft, J., R. Motwani,
and J. Ullman (2000). Introduction to Automata Theory, Languages, and Computation. 2nd
edition. Addison Wesley Longman, Amsterdam.
[Jeliffe, 2009] Jeliffe, R. (2009). Is Schematron a
rules language? Online: http://broadcast.oreilly.com/2009/01/is-schematron-a-rules-language.html.
[Kilpeläinen and Tuhkanen, 2007] Kilpeläinen,
P., and R. Tuhkanen (2007). One-unambiguity of regular expressions with numeric occurrence
indicators. Information and Computation, 205(6):890–916. doi: 10.1016/j.ic.2006.12.003.
[Klarlund et al., 2003] Klarlund, N., T.
Schwentick, and D. Suciu (2003). XML: Model, Schemas, Types, Logics and Queries. In Chomicki,
J., R. van der Meyden, and G. Saake, eds., Logics for Emerging Applications of Databases,
pages 1-41. Springer, Berlin, Heidelberg.
[Kracht, 2010] Kracht, M. (to appear). Modal Logic
Foundations of Markup Structures in Annotation Systems. In Mehler, A., Kühnberger, K.-U.,
Lobin, H., Lüngen, H., Storrer, A., and A. Witt, eds., Modeling, Learning and Processing of
Text Technological Data Structures, Studies in Computational Intelligence. Springer,
[Lee and Chu, 2000] Lee, D. and W. Chu. Comparative
Analysis of Six XML Schema Languages. ACM SIGMOD Record, 29(3):76–87, September
2000. doi: 10.1145/362084.362140.
[Maler and Andaloussi, 1995] Maler, E., and J. E.
Andaloussi (1995). Developing SGML DTDs: From Text to Model to Markup. Prentice Hall, Upper
Saddle River, New Jersey
[Mani, 2001] M. Mani (2001). Keeping chess alive: Do we
need 1-unambiguous content models? In Proceedings of Extreme Markup Languages, Montréal,
[Mani and Lee, 2002] Mani, M., and D. Lee (2002). XML
to Relational Conversion using Theory of Regular Tree Grammars. In Proceedings of the 28th
VLDB Conference, Hong Kong, China.
[Marcoux, 2008] Marcoux, Y. (2008). Graph
characterization of overlap-only TexMECS and other overlapping markup formalisms. In
Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies,
vol. 1. Montréal, Québec. doi: 10.4242/BalisageVol1.Marcoux01.
[Martens et al., 2005] Martens, W., Neven, F., and
T. Schwentick (2005). Which XML Schemas Admit 1-Pass Preorder Typing? In Eiter, T., and L.
Libkin, eds., Database Theory – ICDT 2005, volume 3363 of Lecture Notes in Computer Science,
pages 68–82. Springer, Berlin, Heidelberg, 2005. doi: 10.1007/978-3-540-30570-5_5.
[Martens et al., 2006] Martens, W., Neven, F.,
Schwentick, T., and G. Bex (2006). Expressiveness and Complexity of XML Schema. ACM
Transactions on Database Systems (TODS), 31(3):770–813. doi: 10.1145/1166074.1166076.
[Martens et al., 2007] Martens, W., Neven, F. and T.
Schwentick (2007). Simple off the shelf abstractions for XML schema. SIGMOD Rec.,
36(3):15–22. doi: 10.1145/1324185.1324188.
[Martens et al., 2009] Martens, W., Neven, F. and T.
Schwentick (2009). Complexity of Decision Problems for XML Schemas and Chain Regular
Expressions. SIAM Journal on Computing, 39(4):1486–1530. doi: 10.1137/080743457.
[Møller and Schwartzbach, 2006] Møller, A., and M.
Schwartzbach (2006). An Introduction to XML and Web Technologies, chapter Schema Languages,
pages 92–187. Addison-Wesley, Harlow, England.
[Murata et al., 2001] Murata, M., D. Lee, and M.
Mani (2001). Taxonomy of XML Schema Languages using Formal Language Theory. In Proceedings of
Extreme Markup Languages, Montréal, Québec.
[Murata et al., 2005] Murata, M., D. Lee, M. Mani,
and K. Kawaguchi (2005). Taxonomy of XML Schema Languages Using Formal Language Theory. ACM
Transactions on Internet Technology, 5(4):660–704. doi: 10.1145/1111627.1111631.
[Nentwich, 2005] Nentwich, C. (2005). CLiX – A
Validation Rule Language for XML. Presented by Anthony Finkelstein at W3C Workshop on Rule
Languages for Interoperability, 27-28 April 2005, Washington D.C. http://www.w3.org/2004/12/rules-ws/paper/24/.
[NVDL] ISO/IEC 19757-4:2006. Information technology —
Document Schema Definition Languages (DSDL) — Part 4: Namespace-based Validation Dispatching
Language (NVDL), International Standard, International Organization for Standardization,
[Odgen, 1968] Odgen, W. (1968). A Helpful Result for
Proving Inherent Ambiguity. In Mathematical Systems Theory, 2(3):191–194. doi: 10.1007/BF01694004.
[Pawson, 2007] Pawson, D. (2007). ISO Schematron
[Papakonstantinou and Vianu, 2000] Papakonstantinou, Y., and V. Vianu (2000). DTD inference for views of XML data. In PODS ’00:
Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database
systems, pages 35–46, New York, NY, USA, ACM. doi: 10.1145/335168.335173.
[Piez, 2001] Piez, W. (2001). Beyond the “descriptive
vs. procedural” distinction. In Markup Languages – Theory & Practice,
3(2):141–172. doi: 10.1162/109966201317356380.
[RELAX Core] ISO/IEC TR 22250-1:2002. Information
technology – Document description and processing languages – Regular Language Description for
XML – part 1: RELAX Core. International Standard, International Organization for
[RELAX NG] ISO/IEC 19757-2:2008. Information technology –
Document Schema Definition Language (DSDL) – Part 2: Regular-grammar-based validation – RELAX
NG (ISO/IEC 19757-2). International Standard, International Organization for Standardization,
[RELAX NG (2nd Ed.)] ISO/IEC 19757-2:2008. Information
technology – Document Schema Definition Language (DSDL) – Part 2: Regular-grammar-based
validation – RELAX NG (ISO/IEC 19757-2). Second Edition. International Standard, International
Organization for Standardization, Geneva.
[Rizzi, 2001] Rizzi, R. (2001). Complexity of
context-free grammars with exceptions and the inadequacy of grammars as models for XML and
SGML. Markup Languages – Theory & Practice, 3(1):107–116. doi: 10.1162/109966201753537222.
[Rogers, 2003] Rogers, J. (2003). Syntactic
Structures as Multi-dimensional Trees. In Research on Language and Computation,
1(3-4):265–305. doi: 10.1023/A:1024695608419.
[Sasaki, 2010] Sasaki, F. (2010). How to avoid
suffering from markup: A project report about the virtue of hiding xml. In XML Prague 2010
Conference Proceedings, pages 105–123, Prague, Czech Republic, March 13–14 2010. Institute for
Theoretical Computer Science.
[Schematron] ISO/IEC 19757-3:2006 Information
technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation —
Schematron. International Standard, International Organization for Standardization,
[SGML] ISO 8879:1986. Information Processing — Text and
Office Information Systems — Standard Generalized Markup Language. International Standard,
International Organization for Standardization, Geneva.
[Sperberg-McQueen, 2003] Sperberg-McQueen,
C. M. (2003). Logic grammars and XML Schema. In Proceedings of Extreme Markup Languages,
[Sperberg-McQueen and Huitfeldt, 2004] Sperberg-McQueen, C. M. and C.
Huitfeldt (2004). GODDAG: A Data Structure for Overlapping Hierarchies. In King, P. and E. V.
Munson, eds. Proceedings of the 5th International Workshop on the Principles of Digital
Document Processing (PODDP 2000), volume 2023 of Lecture Notes in Computer Science, pages
139–160. Springer, 2004
[Stührenberg and Jettka, 2009] Stührenberg, M.
and D. Jettka (2009). A toolkit for multi-dimensional markup: The development of SGF to
XStandoff. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup
Technologies, vol. 3 (2009). Montréal, Québec. doi: 10.4242/BalisageVol3.Stuhrenberg01.
[Vitali et al., 2003] Vitali, F., Amorosi, N., and N.
Gessa. Datatype- and namespace-aware DTDs: A minimal extension. In Proceedings of Extreme
Markup Languages, Montré́al, Québec.
[van der Vlist, 2001] van der Vlist, E. (2001).
Comparing XML Schema Languages, 12 December 2001. http://www.xml.com/pub/a/2001/12/12/schemacompare.html.
[van der Vlist, 2003] van der Vlist, E. (2003).
RELAX NG. O’Reilly, Sebastopol.
[XML 1.0] Extensible Markup Language (XML) 1.0. W3C
Recommendation, World Wide Web Consortium, 10 February 1998. http://www.w3.org/TR/1998/REC-xml-19980210.
[XML 1.0 (Fifth Edition)] Extensible Markup Language (XML)
1.0 (Fifth Edition). W3C Recommendation, World Wide Web Consortium, 26 November 2008. http://www.w3.org/TR/2008/REC-xml-20081126/.
[XML Namespaces (Third Edition)] Namespaces in XML 1.0
(Third Edition). W3C Recommendation, World Wide Web Consortium, 8 December 2009. http://www.w3.org/TR/2009/REC-xml-names-20091208/.
[XML Schema 1.0 Part 0] XML Schema Part 0: Primer
Second Edition. W3C Recommendation, World Wide Web Consortium, 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/.
[XML Schema 1.0 Part 1] XML Schema Part 1:
Structures Second Edition. W3C Recommendation, World Wide Web Consortium, 28 October 2004.
[XML Schema 1.0 Part 2] XML Schema Part 2:
Datatypes Second Edition. W3C Recommendation, World Wide Web Consortium, 28 October 2004.
[XML Schema 1.1 Part 1] W3C XML Schema Definition
Language (XSD) 1.1 Part 1: Structures. W3C Working Draft, World Wide Web Consortium, 3
December 2009. http://www.w3.org/TR/2009/WD-xmlschema11-1-20091203/.
[XML Schema 1.1 Part 2] W3C XML Schema
Definition Language (XSD) 1.1 Part 2: Datatypes. W3C Working Draft, World Wide Web Consortium,
3 December 2009. http://www.w3.org/TR/2009/WD-xmlschema11-2-20091203/.
[XSLT 2.0] XSL Transformations (XSLT) Version 2.0. W3C
Recommendation, World Wide Web Consortium, 23 January 2007. http://www.w3.org/TR/2007/REC-xslt20-20070123/.
[XQuery 1.0] XQuery 1.0: An XML Query Language. W3C
Recommendation, World Wide Web Consortium, 23 January 2007. http://www.w3.org/TR/2007/REC-xquery-20070123/.