ZBNF-Main Description



Written by Hartmut Schorrig, www.vishia.org. Latest edition 2011-01-15

BNF, the Backus-Naur Form (see Wikipedia), was created during the development of the programming language Algol at the beginning of the 1960s. It was an important milestone in software technology: for the first time, BNF made it possible to describe the syntax of programming languages exactly.

BNF was developed further afterwards. For example, Niklaus Wirth of ETH Zurich created and used EBNF for his programming language Pascal (see Wikipedia: EBNF). BNF-like expressions are also well known elsewhere, for example for the syntax of command-line arguments, where the option brackets [...] are typical. This is state of the art.

BNF is not fully standardized, and several variants are in use. Mostly, BNF-like notations are used for documentation, where italic text often marks keywords. Such documentation may be easy for humans to read, but it is not suitable for computer-driven evaluation. For automatic processing, the semantics of the parts of a syntactical construct is important: using the semantics of an expression, its content can be recognized and processed.

ZBNF enhances BNF with semantic aspects and some additional syntax constructs.

1 Basics of syntax-definition in ZBNF


A ZBNF-syntax-script may be given either in a text file or as a String in a Java program. The script may have a head, which also defines the encoding used in the file. Then some control settings, each starting with a $, may follow; see Chapter: 10 All control variables. After them the syntax-definitions follow. The first syntax-definition is used to parse the input text; the other syntax-definitions are sub-syntax-definitions. Explanations of semantic parts may be placed between the syntax-definitions; see Chapter: 7.5 explanation of the semantic in the ZBNF-script as help. The syntax-script may contain comment lines.

The following wording is used:

A syntax-definition is written as follows:


Here syntaxident is the identifier of the syntax-definition, used as the symbol for calling that syntax as a component in other syntax-prescripts. syntaxPrescript is the definition of the syntax itself. The dot marks the end. In ZBNF itself, the syntax would be defined as follows:


It means: A syntax-definition consists of the identifier, semantically named syntaxident, followed by ::= without any white-spaces. Then an expression syntaxPrescript follows, which is not defined here. At the end a dot must be written. The terminal dot is coded as \. and the dot after the \. marks the end of the definition.

A ZBNF-syntax-script may import another script using the $import control variable. Then syntax-definitions from the imported script can be used. This is an important form of reusing syntax definitions.

A simple example of a ZBNF-script is given as follows:

 <?ZBNF-www.vishia.org version="1.0" encoding="iso-8859-1"?>

 shopping-list::=shopping \n
 <![=]*?> \n
 { <position> \n }.

 position::=<#?@amount> [<?@unit>peaces|x|] <$?text()>.

In this file the encoding is defined at the beginning. Then the use of the line mode is defined. The first ZBNF-syntax-definition, shopping-list, is the main definition, where parsing begins. The syntax requires a line shopping, then a line with ==========, then some <position>, each on one line. The second syntax-definition is for <position>; it is a syntax-component.
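To make the accepted input concrete, the following is a minimal hand-coded Java sketch of what the shopping-list syntax describes. It is not the ZBNF parser and does not use its API; the class name ShoppingListSketch, the method parse and the regular expression are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not the ZBNF parser: hand-coded check of the shopping-list syntax. */
final class ShoppingListSketch {
    // Assumed regex equivalent of: position::=<#?@amount> [<?@unit>peaces|x|] <$?text()>.
    private static final Pattern POSITION = Pattern.compile(
        "\\s*(?<amount>[0-9]+)\\s*(?<unit>peaces|x)?\\s*(?<text>[A-Za-z_][A-Za-z0-9_]*)\\s*");

    /** Returns one {amount, unit, text} triple per position line, or null if the input does not match. */
    static List<String[]> parse(String input) {
        String[] lines = input.split("\\r?\\n");
        // A line "shopping", then a line of '=' characters, then at least one position line.
        if (lines.length < 3 || !lines[0].trim().equals("shopping") || !lines[1].trim().matches("=*")) {
            return null;
        }
        List<String[]> positions = new ArrayList<>();
        for (int i = 2; i < lines.length; i++) {
            Matcher m = POSITION.matcher(lines[i]);
            if (!m.matches()) {
                return null;
            }
            String unit = m.group("unit");            // null if the empty choice was taken
            positions.add(new String[] { m.group("amount"), unit == null ? "" : unit, m.group("text") });
        }
        return positions;
    }
}
```

The named groups amount, unit and text play the role that the semantic identifiers play in the ZBNF script.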

Writing and printing rules in the following explanation: In the description below, some syntax-prescripts are shown in ZBNF itself (pure ASCII), but a more readable form is used in the printed text:

Some patterns of ZBNF usage are shown as examples. For these, the mono-space font is used without special character fonts.

2 The semantic aspect


The "Z" in "ZBNF" is a reversed "S" for semantic. The semantic aspect isn't sufficiently respected in the original BNF and its variants. If you write:

 variableDefinition::= <identifier> <identifier> ; .
 identifier ::= alphaChar [ digit | identifier].
 alphaChar = A|B|C|...

then the syntax is defined exactly. But the meaning of the first identifier (is it a type?) and of the second one (a variable name?) is still unknown. A verbal explanation is needed in addition. The same situation arises in the presentation of command line calls like:

 XCOPY Quelle [Ziel] [/A | /M] [/D[:Datum]] [/P] [/S [/E]] [/V] [/W]
                     [/C] [/I] [/Q] [/F] [/L] [/G] [/H] [/R] [/T] [/U]
                     [/K] [/N] [/O] [/X] [/Y] [/-Y] [/Z]

   Quelle    Die zu kopierenden Dateien.
   Ziel      Position und/oder Name der neuen Dateien.
   /A        Kopiert nur Dateien mit gesetztem Archivattribut,
             ändert das Attribut nicht.
   /M        Kopiert nur Dateien mit gesetztem Archivattribut,
             setzt das Attribut nach dem Kopieren zurück.
             (Sorry, it's German; I have installed Windows with the German language.)

This example is the beginning of the output shown when typing help xcopy in Windows XP (Microsoft). The meaning of the options is explained verbally. Nevertheless, with the help of this BNF-like presentation it can be recognized that, for example, the options /A and /M are mutually exclusive.

For computer-aided information processing, verbal explanations aren't useful; complex programming is then necessary to process the result of a parser.

In ZBNF, the syntax above can be written in the form:

 variableDefinition::= <identifier?type> <identifier?name> ; .

So the first identifier is formally declared as type, and the second as name.
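The effect of attaching a semantic to each identifier can be imitated in plain Java with named capture groups. This is only an illustrative sketch under stated assumptions: the class VariableDefinitionSketch and its regular expression are hypothetical and are not part of ZBNF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: names the two identifiers of a variable definition. */
final class VariableDefinitionSketch {
    // Two identifiers followed by a semicolon; the group names play the role of the semantic.
    private static final Pattern VARIABLE_DEFINITION = Pattern.compile(
        "\\s*(?<type>[A-Za-z_][A-Za-z0-9_]*)\\s+(?<name>[A-Za-z_][A-Za-z0-9_]*)\\s*;\\s*");

    /** Returns {type, name}, or null if the input does not match. */
    static String[] parse(String input) {
        Matcher m = VARIABLE_DEFINITION.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("type"), m.group("name") };
    }
}
```

Without the group names, a caller would only know that two identifiers were found; with them, the result itself says which one is the type and which one is the name.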

The association between pure information and its meaning is a basic idea of XML. In XML a <tagname> is the semantic description, while the content of the element or an attribute is the pure information: <meaning>information<subtag>...subInfo</subtag></meaning> or <tag meaning="information">. With this idea, computer-aided information processing can work even if information comes from older versions of sources, from other providers with altered definitions and so on. The compatibility of information interchange can be controlled better.

Binding a syntax to its semantic is a core idea of ZBNF. It enables the conversion of any syntactically interpretable text to XML without additional programming effort, see Topic:.ZBNF2Xml.. The subsequent information processing can then use the well-known XML tools. It is possible to write:

  Text x ZBNF =: XML

The reverse conversion:

  XML  x XSLT =: Text

is the known XSLT technique. The x means a processing step, or cross product.

For details see Semantic definitions.

3 Terminal symbols


Terminal characters are characters which have to appear in the input text in the given form. They are the keywords of text recognition. In BNF, terminal text often has to be written in quotation marks like "terminal", but not so in ZBNF: terminal characters are written directly. But there is a conflict with the special characters which control the syntax flow: [ ] | { } < > ? . To use one of these characters as a terminal character, it has to be written with a preceding backslash \. For example, if a [ is necessary, it is written as \[.

Special chars

The backslash \ can be used as in string literals in Java and C/C++ as the escape character for control characters: \n \r \b \t \f with the meaning of newline (0x0a), carriage return (0x0d), backspace (0x08), tabulator (0x09), form feed (0x0c). Such terminal characters are often necessary for format separation. The backslash itself is written as \\. There are some special escape sequences too:

Any Unicode character can be coded with \uxxxx where xxxx is the 4-digit hexadecimal code from the UTF-16 table. This kind of representation of terminal symbols makes it possible to require characters outside the encoding of the script.
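The escape rules named in this section can be sketched as a small decoder. This is an assumption-laden illustration, not the parser's implementation: the class EscapeSketch is hypothetical, it covers only the sequences mentioned above, and it expects well-formed input (four hexadecimal digits after a backslash-u).

```java
/** Hypothetical sketch: decodes the escape sequences named in this section. */
final class EscapeSketch {
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\' || i >= s.length()) {
                out.append(c);                 // ordinary character
                continue;
            }
            char e = s.charAt(i++);            // character after the backslash
            switch (e) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // Four hexadecimal digits of a UTF-16 code unit follow.
                    out.append((char) Integer.parseInt(s.substring(i, i + 4), 16));
                    i += 4;
                    break;
                default:
                    // An escaped control character such as [ or . or the backslash itself
                    // yields that character; further ZBNF escapes are not covered here.
                    out.append(e);
            }
        }
        return out.toString();
    }
}
```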

The encoding of the terminal characters used in a ZBNF script file can be defined, see Character encoding.

4 white-spaces and comments in the syntax script and input text


A white-space is a part of the text which produces space without visible characters in a printed output. Spaces, tabulators, line feeds and sequences of them are white-spaces.

4.1 White-spaces in the syntax prescript allow white-spaces in the input text


The syntax definition itself can be written in free format with white-spaces. Normally, where a white-space is written in the syntax script, white-space is also allowed in the input text.

The control setting $white-spaces=white-spaces. in the ZBNF-script, outside of a syntax definition (it should be placed at the start of the syntax script), defines which characters are white-spaces. By default they are \ \t\r\n. For example, a \t can be excluded from the white-spaces because it has a special meaning in the text. Then it should be written:

$white-spaces=\ \r\n.

4.2 Switch off white-spaces in input text with <$Nowhite-spaces>


With an additional <$Nowhite-spaces> at the beginning of a syntax definition it is possible to define that white-spaces in the syntax prescript don't allow white-spaces in the input text. Such a syntax-definition is written in the form:

Syntaxdefinition::=syntaxident::=[<$semanticOftheDefinition>][<$Nowhite-spaces>] syntaxprescript..

The <$Nowhite-spaces> should be written after an optional <?semantic>, without white-spaces in between.

If spaces or white-spaces are required by the syntax, they should be written as terminal characters like \s or \ , \t and so on, or matched using regular expressions.

An example is the syntax prescript to parse a #define NAME xxx... in C or C++. In zbnfjax/zbnf/Cheader.zbnf there is the following part:

 defineDefinition::=<$Nowhite-spaces> <$?@name> [ ( { <$?parameter/@name> ? , } ) ]
                    <![ \t]*?> [ <#-?intvalue>
                               | 0x<#x?hexvalue>
                               | <""?stringvalue>
                    <![ \t]*?>
                    { <*|\n|\\|\r\n?value>
                    ? \\[\r]\n

The problem is: the define may be wrapped at the end of a line using a \, and the text on the next line has the same meaning as without the line wrapping. The syntax-prescript first parses the <$?@name> as identifier. The optional following parameter names have to be written without spaces. Then white-space is admissible, written in the syntax with the regular expression <![ \t]*?>. Then either an integer value or a string in some variants is accepted; this is the mainstream use case of define. After that all other characters are captured until the end of line or a \ and stored as <...?value>. A \ followed immediately by \n, optionally with a \r, means that the next line is a further <...?value> of this define. The user of the parser result may concatenate these <...?value> parts to get the whole expression.
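The last step, concatenating the value parts of a wrapped define, could look like the following Java sketch. The class DefineValueSketch and both method names are hypothetical; the sketch works on the raw text after the define name instead of on a real parse result, which is an assumption made to keep it self-contained.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: value parts of a #define that is wrapped with a trailing backslash. */
final class DefineValueSketch {
    /** Splits the text at each backslash that is followed by an optional CR and an LF. */
    static List<String> valueParts(String body) {
        return Arrays.asList(body.split("\\\\\\r?\\n"));
    }

    /** Concatenates the parts, as a user of the parse result might do. */
    static String wholeValue(List<String> parts) {
        return String.join("", parts);
    }
}
```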

4.3 ##Comments in syntax prescript


The syntax prescript text allows end-of-line comments, started with ## outside of a < ... > and not escaped with \#. A single # is a normal terminal symbol. If two ## are necessary as terminal symbols, write \#\#.

White-spaces between the syntax prescripts inside the ZBNF-script are ignored, and so are comments. Such comments and white-spaces have no meaning, in contrast to white-spaces inside a syntax prescript.

4.4 Comment-Processing in the input text while parsing


Generally, a comment is treated like a white-space. However, it is possible to parse comment constructs at some positions while they are skipped at other positions. The rules are:

  • Unless defined otherwise, text in /* ... */ and text after // until the end of the line is a comment. Using the control variable $comment=<*|\.\.\.?startCommentString>\.\.\.<*\.?endCommentString>. other characters can be set instead. The default setting would be written as

     $comment=/*...*/.

    Another variant would be

     $comment=(?...?).

  • The control variable $endlineComment=<*.?$endlineComment>. determines the start characters of an end-of-line comment. The default setting would be written as

     $endlineComment=//.

    Another variant would be

     $endlineComment=#.

  • The control variable

     $setLineMode.

    defines that \n and \r are not treated as white-space. This setting is meant for input text that is not free-format but line-oriented. White-spaces within a line are still recognized, but may be suppressed using <$Nowhite-spaces> in the syntax prescript.

  • In general, the parser first tests whether the input text matches the current terminal symbols; only after that are comments skipped. This means that if the start characters of a comment match the current terminal symbol, the parser accepts them as parsed input and continues parsing the following text. For example, comments are parsed in Cheader.zbnf with the sequence [ /**<description>*/], where it is defined:

     description::= <*{ * }|*/?!test_description>.

    This means all characters until */. So a description starting with /** is processed as a parse result.

    The explicit terminal symbols \ , \n or [\r]\n (a combination of 0x0d 0x0a, or only 0x0a) and \s can be used too.

    5 Syntax Control



    5.1 Alternatives ...|...


    The alternative was the only control construct of the original BNF of the 1960s; BNF uses only alternatives and recursion. Nowadays, alternatives are not needed to define things like

     Digit::=0 | <NonZeroDigit>.
     Digitsequence::= <Digit> | <Digit><Digitsequence>.
     positiveNumber::= <NonZeroDigit><Digitsequence>.

    as shown in some textbook examples. Processing a number is better done with a hard-coded algorithm. Therefore the special syntax constructs <#?Number> and <$?identifier> are available in ZBNF.

    Alternatives are useful for more substantial things like:


    where Consumer or BusinessClient may be complex syntactical constructs. Terminal symbols are also useful in alternatives:

     title::=  Mr\. | Miss | Mrs\. .

    Any syntax prescript (the right side of a syntax definition after ::=) can be written as an alternative.

     componentIdentifer::= alternative1 | alternative2 | alternative3.

    If alternatives are necessary inside a syntax prescript, they should be written in square-brackets as options.

     ...[ alternative1 | alternative2 | alternative3 ]...

    or it may be assigned in the forward or backward part of a repetition:

     ...{ alternative1 | alternative2 | alternative3  ? backAlternative1 | backAlternative2 } ...

    5.2 Options [...]



    5.2.1 Simple Option [...]


    The simple option is written with square brackets [ ], as in the old BNF. The general meaning, also in ZBNF, is: this part is optional; it may match or not. This construct is known from nearly all syntax descriptions.

    Such an option can also be used inside terminal symbols. For example, in a report file the word telegram was written without the second e: telgram. This was a mistake, and later versions write telegram. Now the parsing of older and newer reports should detect both variants. Therefore the terminal syntax is written as tel[e]gram.

    Inside the square brackets of the option, any syntax prescript is possible. This part of the syntax may get its own semantic designation, written as [<?semantic> syntaxprescript ]. If the option matches during parsing, a parse result with the given semantic is produced.

    5.2.2 Choice-options [ ... | ... | ... ]


    This combines the option notation with alternatives. In this notation one of the alternatives must match. The square brackets here do not mean that the whole construct is optional; rather, it is obligatory to use one of the alternatives.

    5.2.3 Choice-options with empty choice [ ... | ... |]


    If the construct ends with |] (without spaces), it contains an empty choice. This means that it is also acceptable if no alternative matches. In that case no parse result is produced for the alternatives or for the whole option. If it is written

     [<?SomeChoices> Alternative3a | Alternative3b |]

    no parse result named SomeChoices is produced if no alternative matches.

    5.2.4 Choice-Option with first test after the option [|... | ... ]


    If [|... is written, the parser first tests whether the syntax following the option matches. Only if it doesn't match is the option tested. That may be advisable in a construct like

     [|-<?negative>] <value>

    Here <value> may be a number, possibly negative, defined as value::=<#-?number>|<$?ident>. If the input text contains a negative number like -123, it is matched as a number by itself; the semantic negative is not necessary and not produced. But if the value is an <$?ident>, the negative sign has to be parsed separately. Another example is:


    The parser should detect a middlePart and a rightPart if the number contains only one dot, rather than a leftPart and a middlePart. If, for example, the input text contains 123.456, then 123 is at first parsed as the right part. But because the dot after it doesn't match the following syntax, the parser starts again at the middle part and matches.

    This construct makes the principle of right-aligned parsing possible.

    5.2.5 The exclusive option [? ... ] (negative test)


    If it is written

     [? syntax ]

    then the syntax does not match if the syntax inside the option brackets matches. Typically it can be used for repetitions and their termination:

     Example::={ [?;] <*;?text> ; } ;.

    A text ending with a semicolon always matches. But two semicolons one after another should be the termination of this sequence. Without the negative test, the second semicolon would be accepted as the end of a further (empty) text, and the following input would be misinterpreted. With this notation the second semicolon is not accepted as a further repetition; the continuation detects the second semicolon after the repetition, and the following input can be matched.
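The behaviour of the negative test in this example can be imitated by a hand-written loop. The following Java sketch is only an illustration: the class NegativeTestSketch is hypothetical, white-space handling is left out, and it is not how the ZBNF parser is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of: Example::={ [?;] <*;?text> ; } ;. */
final class NegativeTestSketch {
    /** Returns the parsed texts, or null if the input does not match. */
    static List<String> parse(String input) {
        List<String> texts = new ArrayList<>();
        int pos = 0;
        // [?;] : a further pass of the repetition is refused if a ';' follows immediately.
        while (pos < input.length() && input.charAt(pos) != ';') {
            int end = input.indexOf(';', pos);   // <*;?text> : all characters up to ';'
            if (end < 0) {
                return null;                     // no terminating ';' found
            }
            texts.add(input.substring(pos, end));
            pos = end + 1;                       // the ';' inside the repetition
        }
        // At least one pass is obligatory, and the closing ';' must follow the repetition.
        if (texts.isEmpty() || pos >= input.length() || input.charAt(pos) != ';') {
            return null;
        }
        return texts;
    }
}
```

For the input a;b;; the loop stops in front of the second of the two adjacent semicolons, which is then consumed as the closing semicolon of the definition.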

    5.2.6 The check option [! ... ] (positive test)


    If it is written

     [! syntax ]

    then the syntax is tested but not consumed. Such tests may be required if the input should be pre-tested but processed by the following syntax constructs. For example:

     Example::= [!;|:|+] <nextPart> | <somewhatElse>.

    The problem may be that a next part starts with one of the shown characters. The characters have to be parsed as part of the <nextPart>, but the decision that it is a <nextPart> is made by this test, before the part is entered.

    5.2.7 The expected test [> ... ]


    If it is written

     [> syntax ]

    it means that the syntax inside the square brackets must match. Otherwise not only the parsing of that branch fails, but the whole parsing process is aborted. This construct should be used only in a syntactical environment where previous checks determine that the following syntax has to match. For example:


    The /** ... */ construct as a whole is optional. But if a /** is detected, the next part of the syntax branch, <description>, must match unconditionally.

    Another example is:

     [ condition: [><?which>A|B|C] ]

    The keyword condition: is optional. But if it is detected, A or B or C has to match. In some cases the parsing process would terminate with a syntax error anyway, but before the failure is detected, all other variants are tested, which costs calculation time. And if the tested part lies in a part of the text which could also be recognized as a comment, this is an important feature.

    5.3 Repetition { ... ? ... }


    In the original BNF of the 1960s, repetition was not defined; recursion was used for repetition constructs. EBNF (http://www.en.wikipedia.org/wiki/EBNF) established repetition, written with braces.

    In ZBNF at least one match is obligatory. If zero passes should also be acceptable, it should be written as:


    The second difference from EBNF is that a backward branch may be defined if necessary. It starts with a ? and contains the back syntax until the }. If the back syntax matches, a further repetition is obligatory. This is a frequent situation in practice: for example, between the items of an enumeration or other cyclic parts a special character like a comma is written, but no comma is written at the end of the sequence. If a comma is detected, a next cycle has to start. An example was already shown above; the parameters of a define in C/C++ are comma-separated:

     defineDefinition::= ... <$?@name> [ ( { <$?parameter/@name> ? , } ) ]

    In this case the whole prescript in (...) is optional. But if a (...) is used, at least one parameter name must be present between the parentheses. If a comma is written, a next identifier has to follow.
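The rule "at least one pass, and a matching back syntax makes the next pass obligatory" can be written as a loop. The Java sketch below imitates { <$?name> ? , } for a comma-separated list of identifiers; the class RepetitionSketch and its helper are hypothetical and do not reflect the parser's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a repetition with a backward branch: { identifier ? , } */
final class RepetitionSketch {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns the identifiers, or null if the input does not match. */
    static List<String> parseNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(input);
        int pos = 0;
        while (true) {
            pos = skipSpaces(input, pos);
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                return null;                 // the forward branch is obligatory in every pass
            }
            names.add(m.group());
            pos = skipSpaces(input, m.end());
            if (pos < input.length() && input.charAt(pos) == ',') {
                pos++;                       // backward branch matched: another pass must follow
                continue;
            }
            break;                           // no comma: the repetition ends
        }
        return pos == input.length() ? names : null;
    }

    private static int skipSpaces(String s, int pos) {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
            pos++;
        }
        return pos;
    }
}
```

A trailing comma as in "a," is rejected, because the comma demands a further identifier.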

    Because the forward and backward branches of a repetition are syntax prescripts too, alternatives can be used in them, just like all other syntax control constructs. It may be typical to write, for example:

     { <variantA> | <varianteB> ? [<?delimiter> + | , ] | : <?specialDelimiter> }

    or to construct nested repetitions.

    6 ZBNF-Components <syntax?semantic>


    Syntax-components are complex syntax prescripts in general. They may be written as separate syntax definitions; at the least, this increases clarity. Syntax components could also be defined in the old BNF, where they were called meta-morphemes or non-terminal symbols, as opposed to terminal symbols. That is the view of the syntactical definition. From the semantic view, it is better to call them components: a part of the syntax uses a <component>, which is defined in its own syntax prescript. The parse result gets a node at this point, which contains the whole component with its own parse result branch. The result is a tree.

    The notation with angle brackets was also used in the old BNF. Another frequently used notation is italic script. But italics cannot be used in technical (ASCII) text formats; they are only suitable for explanations in printed text. The third form is the notation as a simple identifier, which is only possible if terminal characters are written in quotation marks. That is the choice made in EBNF.

    In ZBNF the simplest form of requiring a syntax component is written as


    The syntax of the component then has to be defined anywhere in the syntax script in the form


    In ZBNF not only the syntax of a component is relevant, but also its meaning for post-processing: the semantic. If the notation above is used, the semantic identifier is identical to the syntax identifier. But an alternative semantic can also be declared. For example:

     {bill-postal-address: <Address?BillAddress> | Supply-postal-address: <Address?SupplyAdress>}

    In both cases, for the postal address of the bill and for the supply, the same syntax is used. But the meaning and the post-processing of the two addresses are different; therefore the semantic is different. The semantic is used to identify the parser's result:


    It is also possible to suppress the semantic of the component. In this case no separate component is produced in the result, but the syntax is still written in a separate definition. See Chapter: 7 Semantic-specification variants. It should be written as


    The semantic may be more than a simple identifier. Especially for producing XML, some special forms are possible too. The technical implementation accepts the whole character sequence up to the > as the semantic expression. But there are writing rules, see Chapter: 7.1 Semantic writing rules.

    Additionally there are some special characters after the ?, see Transformation of result and Inner syntax parsing.

    6.1 Special syntax-components for fixed constructs like numbers, identifiers: <#...?>, <*...?...>


    The following notations are possible in ZBNF (examples):

     <#?number>  <$?identifier>  <*|*/?string>  <![=]*?RegularExpression>

    Numbers and identifiers can be processed better in hard-coded software than in a syntax description. Therefore these standard expressions are written in the forms shown and are parsed hard-coded. Their syntax is well known and defined; it need not be part of the user's syntax script.

    The general notation is <syntax-symbol?semantic>, analogous to syntax components. Additionally, a number of characters may be specified. It is written as a positive number directly after the <, without a space, for example:


    This example means that exactly 16 characters, whatever they are, are parsed and assigned to the semantic cell. It may be used to parse text in tables with fixed column width.

    The following notations can be used as <syntax-symbol?...>:




    An identifier is expected, written in the form known from programming languages such as Java and C: it consists of the letters A to Z or a to z, the digits 0 to 9 and the underscore _, and does not start with a digit.

    Some identifiers can be excluded from recognition here: the syntax script may contain a keyword $keywords::= followed by some identifiers, for example:


    The example shows that the keywords of a programming language are excluded here.

    $ addChars

    The identifier may contain, but not start with, any of the addChars. For example, addChars may be $- if the $ and the - can be part of an identifier. The - can be used inside tag names in XML. To parse a German first name it is possible to write <$äöü-?firstName>, so that a name like Jörg-Dieter is accepted.


    This is the syntax of a positive integer number, consisting of the digits 0..9 but not starting with 0. A single 0 is admissible. The number is converted to a long value if the Java ZBNF parser (javadoc: org/vishia/zbnf/ZbnfParser) is used.


    A negative sign before the digits is admissible, so the number may optionally be negative.


    This is a floating-point number, possibly with an exponent, in the standard notation of Java and C. Internally a double value is stored.

    #f* Factor

    It is possible to parse a floating-point number and store it multiplied by Factor. Factor should be a floating-point number. Using this construct, a value can be stored in a fixed unit depending on its position in the syntax. For example, [<#f?length> mm| <#f*10?length> cm| <#f*25.4?length> inch] always stores the value in millimeters, so the post-processing need not regard the unit. Another example is processing a fractional number as an integer: the input 56.34 parsed with <#f*100?value> stores the value 5634.0, which can be processed as the integer 5634 without special conversion. This may be useful in combination with storing the parser result directly in a Java application, see Storing of parser result in Java objects.
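The unit example can be mirrored in plain Java to show what "storing a multiplied representation" means. This is a minimal sketch under assumptions: the class LengthFactorSketch, its method and the simplified number format (no sign, no exponent) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of: [<#f?length> mm| <#f*10?length> cm| <#f*25.4?length> inch] */
final class LengthFactorSketch {
    // Simplified number syntax, assumed for this sketch only.
    private static final Pattern LENGTH =
        Pattern.compile("\\s*([0-9]+(?:\\.[0-9]+)?)\\s*(mm|cm|inch)\\s*");

    /** Returns the length in millimeters, whatever unit was written in the input. */
    static double lengthInMillimeters(String input) {
        Matcher m = LENGTH.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a length: " + input);
        }
        double value = Double.parseDouble(m.group(1));
        switch (m.group(2)) {
            case "cm":   return value * 10.0;   // corresponds to <#f*10?length>
            case "inch": return value * 25.4;   // corresponds to <#f*25.4?length>
            default:     return value;          // mm: <#f?length>
        }
    }
}
```

Because the factor is applied at parse time, code that consumes the result never needs to look at the unit again.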


    A hexadecimal number is expected. The hex digits a..f or A..F may be used, and leading zeros are accepted. A prefix like 0x is not part of this ZBNF-component. To accept either a hexadecimal or a decimal number, write: [0x<#x?number>|<#?number>].
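The choice between the two number forms corresponds to the following small Java sketch. The class HexOrDecimalSketch is hypothetical; it assumes the input consists of the number only.

```java
/** Hypothetical sketch of: [0x<#x?number>|<#?number>] */
final class HexOrDecimalSketch {
    static long parseNumber(String text) {
        if (text.startsWith("0x")) {
            // The prefix is a terminal symbol of the surrounding syntax, not part of the number.
            return Long.parseLong(text.substring(2), 16);
        }
        return Long.parseLong(text);
    }
}
```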

    * endChars

    All characters up to one of the given endChars are accepted. The endChars may be, for example, \n\r for the end of line, to parse the whole text of the current line including white-spaces. If none of the endChars is found, the input doesn't match this syntax. If one of the endChars is found immediately, the match succeeds and the text is empty.

    Example: <*!\??CharsUntilExclamationOrQuestionMark>: The ? or the ! is the end character of this text sequence. Because the ? is also used as separator in the syntax prescript itself, it has to be escaped as \?.

    Note: This construct may run counter to the key principles of the parsing process. It doesn't test for a match; instead it searches for a non-matching character. It is possible that a lot of text is accepted unchecked until the end character is found. For example, if a space is used as in <*\ ?textToSpace>, but the text part is followed by a line feed rather than a space, then the line feed is also accepted as part of the text, which continues on the next line until a space is finally found. The result may be error-prone in a way that is not noticed at first. It is better to write <*\s?...> or <*\t\n\r\ ?...> instead.
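The pitfall described in the note can be reproduced with a few lines of Java. The class EndCharsSketch is a hypothetical stand-in for the <*endChars?...> behaviour as described in the text, not the parser's code.

```java
/** Hypothetical sketch of the behaviour described for <*endChars?...>. */
final class EndCharsSketch {
    /** Returns the text from start up to the first of the endChars, or null if none is found. */
    static String textUntil(String input, int start, String endChars) {
        for (int i = start; i < input.length(); i++) {
            if (endChars.indexOf(input.charAt(i)) >= 0) {
                return input.substring(start, i);   // may be empty if found immediately
            }
        }
        return null;                                // no end character: the syntax does not match
    }
}
```

With only a space as end character, the input "word\nnext line" yields "word\nnext", that is, the text runs on into the following line. With space, tab, carriage return and line feed as end characters, the result is "word".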

    Note: The sequences *| *"" *'' have a special meaning, see below. Therefore these characters can't be written as the first of the endChars, only later in the string of endChars; alternatively, start with \|.

    *"" endChars

    All characters up to one of the given endChars outside of a quotation are accepted. An escaped quotation mark \" inside the quotation is recognized correctly. A quoted string is not required; but if a part of the text is quoted, the endChars aren't tested there.

    *| str | str

    All characters up to one of the given character sequences str are accepted. Each terminating character sequence is preceded by a | in the syntax script. The basic behaviour is like <*endChars?...>.

    Example: <*|===|@?description>: Any string until either === or @ is read from the input text and stored as description.

    * | str | str

    All characters up to one of the given character sequences str are accepted. The resulting string is trimmed, that is, leading and trailing white-spaces are removed. Note that a space is written after the asterisk!

    Example: <* |===|@?description>: Any string until either === or @ is read from the input text. The trimmed string is stored as description.
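Both string-terminated forms, with and without trimming, can be sketched in Java as follows. The class EndStringsSketch and the parameter order are hypothetical; the sketch starts at the beginning of the given input and picks the terminator that occurs first.

```java
/** Hypothetical sketch of <*|str|str?...> and, with trim set, of <* |str|str?...>. */
final class EndStringsSketch {
    /** Returns the text up to the earliest terminator, or null if no terminator is found. */
    static String textUntilAny(boolean trim, String input, String... terminators) {
        int end = -1;
        for (String terminator : terminators) {
            int index = input.indexOf(terminator);
            if (index >= 0 && (end < 0 || index < end)) {
                end = index;                 // keep the earliest position of any terminator
            }
        }
        if (end < 0) {
            return null;
        }
        String text = input.substring(0, end);
        return trim ? text.trim() : text;
    }
}
```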

    toLastChar: endChars

    All characters up to the last occurrence of one of the given endChars are accepted. The search starts from the end of the whole text being parsed.

    This special form is especially useful if an inner syntax of short length is parsed, or a short input such as the answer of a command line call. For example, a file path is tested: the last / or \ is the end of the directory path, and the part after it is the file name. Note: The backslash has to be written twice, because a single backslash is the escape character in the syntax script.


    toLastCharIncl: endChars

    Analogous to toLastChar:endChars, but the terminating character is also part of the parse result and need not be parsed separately. The example above would be written as shown below, where the path then contains the / or \ too:
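Independently of the ZBNF notation, the difference between toLastChar and toLastCharIncl can be sketched in Java with a search from the end of the text. The class LastCharSketch and the boolean parameter are hypothetical and serve only to illustrate the described behaviour.

```java
/** Hypothetical sketch of the behaviour described for toLastChar and toLastCharIncl. */
final class LastCharSketch {
    /** Returns the text up to the last occurrence of any of the endChars, or null if none occurs. */
    static String toLastChar(String input, String endChars, boolean inclusive) {
        int last = -1;
        for (int i = 0; i < endChars.length(); i++) {
            last = Math.max(last, input.lastIndexOf(endChars.charAt(i)));
        }
        if (last < 0) {
            return null;
        }
        // The inclusive form keeps the terminating character in the result.
        return input.substring(0, inclusive ? last + 1 : last);
    }
}
```

For the input D:/dir/sub/file.txt with the end characters / and \, the exclusive form yields D:/dir/sub and the inclusive form yields D:/dir/sub/.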



    A string in quotation marks is expected. Inside the quotation an escaped \" is recognized as a " character, not as the end of the quotation; likewise \n, \t etc. are accepted for the ASCII control characters. All spaces, line feeds etc. are accepted as written.

    The input doesn't match if it doesn't start with a " or if the closing " isn't found. The " has to be the ASCII character 0x22. Special codes from extended code tables, for example from Winword output, are not accepted.


    Analogous to "", but single quotation marks (ASCII 0x27) are expected as quotation characters. Escaping with \ is accepted too.

    *{ indentChars }| str | str

    This is a special form of *|str|str that takes indentation into account. It is needed especially to parse the conventional style of block comments in sources (header files of C/C++, Java):

     /**Example Comment of a method
      * It does this and that
      * @param x value.

    For example, the syntax is /** <*{ * }|*/?comment>*/. The parsing of this indentation-accepting construct starts at the position after the /**. This position in the line is the first indent position. The parsing ends at */. Every new line is stored as a new line, but without the indentation. So the parse result for comment is stored without the indentation, in the form:

     Example Comment of a method
     It does this and that
     @param x value.

    This is the simple case, where no additional characters occur. But programmers often do not write so exactly. For example, given:

      /** Comment to any method
        * * List bullet
        *   this line is written right-shifted
         this line is written left-shifted without the asterisk

    or tabulators are used. The parser stores the proper result in this case too:

      Comment to any method
      * List bullet
      this line is written right-shifted
      this line is written left-shifted without the asterisk

    The characters space and * are accepted in the indentation area left of the first indentation position. Any other character left of the first indentation position is accepted as the beginning of the active line. If the last character in the { ... } is a space, tabs and spaces after the indentation position are also interpreted as indentation in this line. The second asterisk in the example is to the right of the first indentation position, therefore it is part of the result string.
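
    The indentation rule can be sketched in Java as follows. This is a simplified illustration under assumptions, not the parser's implementation: the class and method names are hypothetical, the first indentation position is passed in as a column index, and only the continuation lines are handled.

```java
import java.util.ArrayList;
import java.util.List;

final class IndentStripSketch {
    /**
     * Removes the indentation from each line.
     * indentPos is the column of the first indentation position,
     * indentChars are the characters written between { and }.
     */
    static List<String> stripIndent(List<String> lines, int indentPos, String indentChars) {
        // A trailing space in indentChars also allows white-space right of indentPos.
        boolean skipTrailing = indentChars.endsWith(" ");
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            int i = 0;
            // Left of the indentation position only the indent characters are skipped;
            // any other character starts the active line.
            while (i < line.length() && i < indentPos
                    && indentChars.indexOf(line.charAt(i)) >= 0) {
                i++;
            }
            if (skipTrailing) {
                while (i < line.length()
                        && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
                    i++;
                }
            }
            result.add(line.substring(i));
        }
        return result;
    }
}
```

    Called with the indentation position 5 and the indent characters "* ", a line with an asterisk in column 3 and a second asterisk in column 5 keeps the second asterisk, because that one is no longer left of the indentation position.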

    . method : arg )

    This is a universal form to parse the current text with a given user method. The method should be found in a special Java class, which is assigned to the parser by calling ZbnfParser.setUserParsingClass(Class). Methods there should be defined as static boolean method(String). This feature is not available yet, but it is planned for release 1.1.

    ! regularExpression

    A regular expression is used to describe the syntax. The regular expression follows the definition of the Java class java.util.regex.Pattern. Examples are shown in the table below.

    6.2 Using Regular Expressions<!...?>


    Regular expressions are a powerful method to describe the syntax of any text. But handling complex regular expressions with a well readable assignment to results isn't practical in every case. Using regular expressions additionally inside ZBNF may be a good idea. The input text of a whole syntax part described by the given regular expression is stored as the result.

    The notation form of regular-expression-using in ZBNF is:


    Be aware that a backslash, often used in regular expressions, should be written twice: \\, because the syntax script also uses it as circumscription character. See the examples below. The examples are simple and explain the principles; the whole usability is not explained here, see the usual documents. It is advisable to use only simple constructs of regular expressions in ZBNF. Complex constructs are not necessary because ZBNF has its own implementations of equivalent features.




    A simple dot is a place holder for any character.


    The asterisk doesn't mean any character as in ZBNF or as a wildcard in file paths. It means any number of repetitions of the character left of it. In this combination of dot and asterisk, any character is accepted any number of times. But there should be a limit. In ZBNF a maximal number can be written outside of the regex, e.g. <32!.*?Exact32Chars>. With this, 32 characters are parsed.


    Like .*, but at least one character is necessary.


    One of the characters between [] is accepted.


    That is a useful combination: Any desired number of the characters between [] is accepted, including none.


    Like [abc]*, but at least one character should be found.


    Character range: One of the characters from a to c in ASCII order is accepted.


    It is possible to combine more ranges of characters. This example means: all alphabetic characters.


    That means a word, consisting of upper and lower alphabetic characters.


    That means a word beginning with an upper case alphabetic character, then lower case characters.


    That is the fixed terminal string as written. In ZBNF a simple terminal should be used instead.


    It means e.g. Meyer or Meier (a typical German family name). But Mexer or MeAer match too. The dot is a wildcard for any character.


    Now Meyer or Meier is expected.


    Any character which isn't a white-space. Note: The backslash should be written twice in a ZBNF script.


    All characters until a white-space.


    Any white-space, including line feed (hex 0a) and carriage return (hex 0d).


    All white-spaces, also no white-space.


    All white-spaces, at least one white-space.


    A word character: [a-zA-Z_0-9]


    A word consisting of at least one char.


    any characters outside a word.


    all characters outside a word until start of next word.
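
    Because the regular expressions follow java.util.regex.Pattern, the descriptions above can be tried out directly in Java. The following sketch uses only the standard regex classes. The helper names are hypothetical, and cutting the input to a maximal length before matching is only an assumption about how a limit like <32!...> could be imitated. Note that a Java string literal needs the doubled backslash too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSketch {
    /** True if the whole input matches the regular expression. */
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /**
     * Returns the text matched at the start of the input, regarding at most
     * maxChars characters, or null if the regular expression doesn't match there.
     */
    static String matchPrefix(String regex, String input, int maxChars) {
        String limited = input.length() > maxChars ? input.substring(0, maxChars) : input;
        Matcher m = Pattern.compile(regex).matcher(limited);
        return m.lookingAt() ? m.group() : null;
    }
}
```

    For example matchesWhole("Me.er", "Mexer") is true, whereas matchesWhole("Me[yi]er", "Mexer") is false, and matchPrefix("\\S*", "abc def", 32) stops before the white-space.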

    7 Semantic-specification variants


    In general, the semantic is written in syntax component expressions after a question mark:


    where the syntax may be a ZBNF-defined component:


    or a standard syntax, presented in the chapter above, for example <#?semanticOfNumber>.

    There are some specials:

     structDefinition::=struct [<$?typetagIdent>] \\{ { <structContent?> } \\} <$?name>;.
     classDefinition::=class [<$?typetagIdent>] \\{ { [ <classContent?> | <structContent?> ] } \\} <$?name>;.
        [ <unionDefinition>
        | <structDefinition>
        | <attribute>
        | <defineDefinition>
        | <structContentInsideCondition>

    7.1 Semantic writing rules


    A main area of application is the conversion of free but syntactically structured texts to XML. Therefore the semantic writing rules are oriented primarily toward the conversion to XML. But the parser's result can also be evaluated in free Java programming, or with a reflection-based writer of Zbnf-results to Java-instances, described in javadoc:_org/vishia/zbnf/ZbnfJavaOutput or Topic:.ZbnfJava.ZbnfJavaOutput. The ZbnfJavaOutput requires fixed rules for writing semantic, as the XML-output does. The rules for both use cases are coordinated, so the same syntax script can be used for both post-processing variants. Free Java programming may accept all writing forms of semantic, but it should regard these same rules. A test output of a parse result in XML may be necessary or useful in some cases.

    For XML-output the semantic in ZBNF determines the names of tags and attributes, and possibly of children in the XML-tree. The structure of the ZBNF-syntax-components determines the possible structure of the generated XML-tree. But the data actually present in the parsed input determines whether or not an XML-element or a bough in the tree is created.

    For example, the following syntax:

     syntax::= {<?set> <head> { <data> } } -end-.
     head::= idx = <#?@index>.
     data::= value = <#?value>.

    with the following data:

     idx=1 value=5 value=6 idx=123 value=7 value=23 -end-

    creates the following XML-tree:

         <head index="1" />
         <head index="2" />

    This example demonstrates the basic idea of XML by the way. The source data is shorter, but its structure may not be clearly recognizable. The XML tree contains a structure: a set with head and data. The semantic of the data is contained in the XML-text.

    The semantic of any syntax element is written inside the <...?...> after the question mark or after the special designations ?!, ?+ or ?-, up to the closing >. In ZBNF it is:


    or better able to read with italic characters in the printing variant:

    syntax_component_call ::= < syntax >? [ ! | + | - ] [ semantic ] > .

    The following writing rules for semantic regarding the XML necessities are permitted:




    When a simple ident is written, a new element with this ident as tag name is created in the XML-output and added to the current XML-element. The ident should be written like an identifier in Java: starting with an alphabetic character, containing alphabetic characters, numbers and the underscore. A - is also conventional for tag names in XML, so it is admissible.

    For Java-output using javadoc:_org/vishia/zbnf/ZbnfJavaOutput the - is replaced by an underscore _ before a Java element is searched. So using a - is also compatible with Java-output. But note that e.g. Ident-part and Ident_part are not distinguished in Java-output.

    The associated Zbnf-parse-result is written as textual value of the element, if it is a simple parse result. If the Zbnf-parse-result is a component, its content is expanded as children of the created element.

    For Java-output the found field is set, or the found method is called, with the value of the Zbnf-parse-result. Thereby the type of the parse result is regarded. If the parse result is a component, the field or the return value of the called new_-method should be a reference to a container for the component's content. For details see javadoc:_org/vishia/zbnf/ZbnfJavaOutput or Topic:.ZbnfJava.ZbnfJavaOutput.


    When a @ is written as the first character of the semantic, an attribute with the given name is created for XML-output. The associated Zbnf-parse-result is written as the value of this attribute. If the parse result is a component, the parser tries to get an immediate result of the whole component. An immediate result is stored either if the input text is stored, or if it is an option written like [<?@semantic> inner syntax...]. If no text is available, the value of the attribute will be empty. But the results of the elements of the Zbnf-component are written as children of the current node, to which the attribute is also added.

    For Java-output it is the same as ident; the @ is ignored. But note that e.g. @ident and ident are not distinguished in Java-output.


    A colon : inside the ident separates a namespace from a tag name, if it is used for XML-output. The name-space symbol should be defined using a $xmlns:key=url.-control setting, see Chapter: 10 All control variables. For Java output the namespace part is ignored. A namespace is a typical XML requirement to produce unique element identifications. For Java output the uniqueness is given already by writing in the special given user instance.
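
    The three notes on Java-output (the - becomes _, the @ is ignored, the namespace part is ignored) amount to a normalization of the semantic identifier. A minimal sketch of such a normalization follows. It is not the code of ZbnfJavaOutput: the class and method names are hypothetical, and the order of the three steps is an assumption.

```java
final class JavaNameSketch {
    /** Derives the Java element name to search for from a semantic identifier. */
    static String javaName(String semantic) {
        String name = semantic;
        if (name.startsWith("@")) {
            name = name.substring(1);            // the @ is ignored for Java output
        }
        int colon = name.indexOf(':');
        if (colon >= 0) {
            name = name.substring(colon + 1);    // the namespace part is ignored
        }
        return name.replace('-', '_');           // - is not allowed in Java identifiers
    }
}
```

    The sketch makes the two notes above visible: javaName("Ident-part") and javaName("Ident_part") give the same name, and so do javaName("@ident") and javaName("ident"). Paths written with / are not handled here.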


    For XML-output an element with the tag name tagParent is searched or created as child of the current node. Then the rest of the semantic string after the / is processed with the found or created element. This construct may be used especially if an element with the value in an attribute should be created, writing tag/@attribute. But note that if more than one parse result has that same semantic, all of them write into the same element tag, and the @attribute is replaced each time. For example, the following construct is a good idea:

     val1=<#?result/@val1>; val2=<#?result/@val2>;

    Both attributes val1 and val2 are written in the same element: <result val1="..." val2="..." />. But a repetition construct overwrites the same attribute each time:

     { val= <#?result/@value> }

    That case is better written in the following form. In that form the repetition creates a new element <result> in each pass, and the attribute is set in that extra element:

     {<?result> val=<#?@value> }

    For Java-output an equivalent behaviour is supposed: A field named tagParent is searched or created, or respectively a method new_tagParent() is called, which may return a newly created object or the reference to an existing object, depending on the user programming. Then an attempt is made to add the ident right of the / to it.
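
    The difference between searching an existing element and creating a new one in each pass can be made concrete with a tiny tree model. The following Java sketch only illustrates the described effect. The class XmlNodeSketch and its methods are hypothetical and have nothing to do with the XML classes really used by Zbnf2Xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class XmlNodeSketch {
    final String tag;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<XmlNodeSketch> children = new ArrayList<>();

    XmlNodeSketch(String tag) { this.tag = tag; }

    /** Behaviour of tagParent/...: reuse the child if it exists already. */
    XmlNodeSketch findOrCreateChild(String childTag) {
        for (XmlNodeSketch child : children) {
            if (child.tag.equals(childTag)) { return child; }
        }
        return createChild(childTag);
    }

    /** Behaviour of a simple ident: always add a new child. */
    XmlNodeSketch createChild(String childTag) {
        XmlNodeSketch child = new XmlNodeSketch(childTag);
        children.add(child);
        return child;
    }

    /** Like { val= <#?result/@value> }: one element, the attribute is overwritten. */
    static XmlNodeSketch sharedElement(int... values) {
        XmlNodeSketch root = new XmlNodeSketch("root");
        for (int value : values) {
            root.findOrCreateChild("result").attributes.put("value", String.valueOf(value));
        }
        return root;
    }

    /** Like {<?result> val=<#?@value> }: one new element per pass. */
    static XmlNodeSketch elementPerPass(int... values) {
        XmlNodeSketch root = new XmlNodeSketch("root");
        for (int value : values) {
            root.createChild("result").attributes.put("value", String.valueOf(value));
        }
        return root;
    }
}
```

    With three values the first variant ends with a single result child holding only the last value, the second variant ends with three result children.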


    Writing a slash at the end is a special form of tagParent/ident where the ident is empty. That is why an XML-element with the tag name is searched, and created only if it isn't found. In the example in the cell above, writing

     {<?result/> val=<#?@value> }

    produces only one element, like { val= <#?result/@value> }. In that case it isn't convenient. A useful example is:

     <component?tag> ..other syntax...[ <componentPart?tag/> ]

    Here the input text matching two separate parts of the syntax prescript is written into the same XML-element, or respectively into the same Java-object.

    The behaviour for Java-output is equivalent.

    semantic =value

    This is used especially in the form <...?@attribute=value>: The attribute gets the given value. Note: Writing the value in "..." is not supported yet; the "" would be taken as part of the value. But this style of writing should be accepted, because it is similar to XML. It will be regarded in future.

    semantic &

    This is a special form. First it is a signal to the Zbnf-parser, it should store also the parsed input text of a component. Second it is a signal to store this input text as text of the element. Use the form @ident& to store the input text of a component in an attribute of the parent element of the component.

    semantic +

    This is a special form for Zbnf2Xml-Conversion. It means that an expansion of formatting in the node's result text using Wikistyle will be done. The Zbnf2Xml-Converter calls the method prepareXmlNode(...) defined in the interface:_org/vishia/zbnf/Zbnf2Xml.PrepareXmlNode. The implementing instance calls setWikistyleFormat(...) of the class:_org/vishia/xmlSimple/WikistyleTextToSimpleXml. If this feature is used, an input text may be formatted in the Wiki-Style, and the translated XML contains the equivalent XHTML-Formatting tags. In general, it is a simple opportunity to work with formatted texts.

    semantic . prepare ()

    It is the universal method to post-process the parser's result in order to expand it into XML children. Matching to prepare, a class that implements the interface:_org/vishia/zbnf/Zbnf2Xml.PrepareXmlNode should be named in the header of the ZBNF-script. This feature is not available yet; it is planned for version 1.1 of ZBNF.

    7.2 Semantic definition [<?semantic>...] for an option- or repetition-part


    The square brackets of an option designate a sub-syntax, possibly with alternatives. Writing


    without a space after the left square bracket, the option as a whole unit produces its own parse result marked with that semantic. If the option is not used, this semantic isn't produced: [<?semantic> really-option], or if the empty bough is possible: [<?semantic> A | B |].

    Repetitions behave equivalently:

    .{<?semantic> ...}

    Each entry of the repetition produces a parse result with the given semantic. The content of the repetition is stored as child of this ZBNF-component. For example,

     testRepetition::={ <head> : {<?dataBlock> <data?> | <info?> ? , } ; }.
     head::= idx = <#?@index>.
     info::= <""?@info>.
     data::= <#f?@value>.

    produces with the following data:

     idx=5 : 7.34,  23, "text", 0.01;
     idx=0 : 34;

    the following XML-tree (parse result):

     <testRepetition >
       <head index="5" />
       <dataBlock value="7.34" />
       <dataBlock value="23.0" />
       <dataBlock info="text" />
       <dataBlock value="0.01" />
       <head index="0" />
       <dataBlock value="34.0" />

    7.3 Content of alternatives [<?whichOption> a | b |]


    A special case is writing

    .[<?whichOption> a | b |]

    If the alternatives don't have an own semantic, the parsed text without leading and trailing white-spaces is stored as the result with that given semantic. It is an important feature. For example, some assign operators are given in the form:

     assignOperator::=<?> [<?@assignOperator> = | += | -= | *= | /= | &= | \|= | \<\<= | \>\>= ] .

    Thereby the parse result of testing the variants of operators is stored with the semantic assignOperator; the parse result is the operator itself, for example -=. The <?> means that no semantic is stored for the ZBNF-component. That is reasonable because the semantic is produced in the option bracket (see next chapter).

    7.4 Differently semantic to a ZBNF-component


    If a ZBNF-component is written like

    .component::=<?semantic> ...

    the following behaviour is present:

    A special case of this is the form

    .component::=<?> ...

    It means that the component hasn't an own semantic by default. But the same rules as shown above are valid. For example, if <component?specialSemantic> is written, this component has the given semantic. Only if a simple call is done: <component>, no semantic is produced. This can also be stated twice: The call can be written as <component?>. Thereby it is shown both at the calling position and at the definition that no semantic should be used.

    7.5 explanation of the semantic in the ZBNF-script as help


    In the syntax script it is possible to explain a semantic with plain language. The explanation should be placed outside of syntax definitions, above or below. It should be written like:

    ?en:mySemantic::= "Explanation text. It may be more detailed.
      It is a helpness.".
    ?de:mySemantic::= "Erklärungstext, etwas umfangreicher als Hilfestellung".

    It can be specified in several languages. The explanation should be written in quotation marks. A dot at the end is necessary. The related semantic need not be only an identifier of a ZBNF-component; it may be a semantic identifier inside a syntax definition too. But it should be written as a path of semantic. For example:

     definition::= <$?type> <$?name> [ = <#f?value> ]
     ?en:definition/value::="This value means this or that.".

    This information isn't used in the parsing process. The parser accepts but ignores it while reading the syntax script. It is documentation in the script only, to be read by humans. But for future extensions, especially for ZBNF-based editors, for example as Eclipse plugin, this information may be used as context-sensitive help while typing a part of input text that matches the syntax.

    8 Inner syntax of parsed text <syntax?!innerSyntax>


    Especially if an input text is parsed like <*endChars?...> or <*|endstr?...> or <""?...>, the string result may be evaluated additionally with an inner syntax. The string is recognized first because it matches the outer syntax. But then the inner syntax is tested with this provisional result. The result of that syntax test supplies the conclusive parse result.

    The notation form is: .<...?!syntax>

    There isn't a semantic after the exclamation mark, but the identifier of the inner syntax as a ZBNF-component. If the inner syntax doesn't match, the outer syntax isn't recognized as matching either. Another bough in the outer syntax is tested then.
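
    The two-stage test can be sketched in Java. This is only an illustration of the principle under assumptions: the names are hypothetical, the outer syntax is reduced to "all characters up to one of the end characters", the inner syntax is represented by a regular expression instead of a ZBNF-component, and the matched text is returned instead of a parse result tree.

```java
import java.util.regex.Pattern;

final class InnerSyntaxSketch {
    /** Outer step: the text before the first end character, or null if none is found. */
    static String outer(String input, String endChars) {
        for (int i = 0; i < input.length(); i++) {
            if (endChars.indexOf(input.charAt(i)) >= 0) {
                return input.substring(0, i);
            }
        }
        return null;
    }

    /** The outer match only counts if the provisional result matches the inner syntax. */
    static String parse(String input, String endChars, Pattern inner) {
        String provisional = outer(input, endChars);
        if (provisional == null) {
            return null;                       // the outer syntax doesn't match
        }
        return inner.matcher(provisional).matches() ? provisional : null;
    }
}
```

    A caller that gets null would continue with the next bough of the outer syntax.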

    The special construct [>...] for required positive test may be used to prescribe the matching of the inner syntax, see Topic:ZBNF_syntaxDescription.syntaxControl.options.requiredPositive.

    9 Transformation of a semantic (parse result) to another ZBNF-component


    In some cases information is placed outside of a syntactical construct, but this information should be assigned to some inner syntactical content. The information outside may be written only once, but it is duplicated in meaning. The style of writing a definition for several variables of the same type in C/C++ or Java is an example:

     /**Description valid for all Variables. */
     int a,b,c;

    The variables a, b, and c are all of type int. The description should be assigned to all three variables too. This style of writing is several times shorter than:

     /**Description valid for all Variables. */
     int a;
     /**Description valid for all Variables. */
     int b;
     /**Description valid for all Variables. */
     int c;

    The parse result of ZBNF should duplicate the common information, so that the short and the long variants don't differ in the result, because there is no difference in meaning. ZBNF knows a construct, written like:

     attributeSyntax::= [/**<description?-?>*/] <type?-?> {<attributedef?+attribute>  ?,};.

    The example is part of Cheader.zbnf. The rule of writing is: If a ?- is written instead of a simple ?, then the content of that ZBNF-component isn't added to the parse result, but it is temporarily stored in a buffer associated to the syntax prescript. If more than one such construct is found, the information is accumulated in this temporary store. A writing style like ?-? means that the semantic for storing is identical with the syntax identifier. It is a short way to write <ident?-ident>. Otherwise, ?-> means that the content of the ZBNF-component is stored but it hasn't an own semantic element.

    The opposite is the construct with the characters ?+. If this ZBNF-component is parsed successfully, the currently temporarily stored parse result of the syntax prescript is written inside the component at its beginning. The temporary result is not cleared yet. If a second ?+ is found, the same content is stored there too. In the example the ?-?> is outside of the repetition brace, but the ?+...> may be repeated several times; each repetition's result gets the same common content from ?-...>.

    Another, more general example may be the following:
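
    The interplay of ?- and ?+ can be imitated with a buffer that is copied into each following component. The Java sketch below models it for the declaration int a,b,c; with a common description. It is an illustration only: the class name, the method name and the representation of a component as a Map are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransferSketch {
    /** Builds one result per variable name, each with the buffered common content. */
    static List<Map<String, String>> expand(String description, String type, List<String> names) {
        // The ?- parts: not added to the result, but accumulated in a temporary buffer.
        Map<String, String> buffer = new LinkedHashMap<>();
        buffer.put("description", description);
        buffer.put("type", type);

        List<Map<String, String>> attributes = new ArrayList<>();
        for (String name : names) {
            // The ?+ part: the buffered content is written at the beginning of the
            // component. The buffer itself is not cleared, so the next pass gets it too.
            Map<String, String> attribute = new LinkedHashMap<>(buffer);
            attribute.put("name", name);
            attributes.add(attribute);
        }
        return attributes;
    }
}
```

    The result for the names a, b and c has three entries, and each of them contains the same description and the type int, as if the long form had been written.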

     syntax::=<HeadInfo?-?> <AnotherInfo> [ <Variante1?+?> | <Variante2?+?> | <Variante3?+?> ]

    In the XML presentation the HeadInfo should be part of the Variante... but the <AnotherInfo> should be outside. The textual content is written in the given form, simple and easy, without consideration of an XML representation. The XML representation converted by Zbnf2Xml will be produced for example as:

     <Variante2><HeadInfo>...</HeadInfo> ... </Variante2>

    Note: This variant of temporary storing of parse results was first developed to improve the calculation time while parsing. Instead of writing

     syntax::= [ <Variante1> | <Variante2> | <Variante3> ]
     Variante1::= <HeadInfo> <RestOfVariante1>.
     Variante2::= <HeadInfo> <RestOfVariante2>.
     Variante3::= <HeadInfo> <RestOfVariante3>.

    the <HeadInfo> is parsed and recognized only once; after that the variants are tested. The long form repeats the detection of <HeadInfo> if the first <Variante1> doesn't match, and so on. That is suboptimal for the calculation time of the parsing process. But such constructs to save calculation time aren't good for the structure description in the source of the syntax. Now it is planned (not ready yet, release 1.1) that the result of a matching input isn't purged if a syntax bough doesn't match. If a second bough starts with the same syntax, it is recognized because the syntax ident is stored in the result items, and it isn't tested twice.

    10 All control variables


    This chapter shows all control variables. A control variable is placed outside of any syntax definition, typically at the beginning of the syntax script before the first syntax definition. If an imported syntax script also contains control variables, they are accepted too. If a control variable that should be written only once is written more than once, the last setting is valid. All control variables are read before the parsing process starts.

    $import " path ".

    The named syntax script is imported at this position. path is the absolute path, or the path relative to the current script file, of a file containing a ZBNF-script to import. The imported script may contain some used syntax-definitions. This control variable can be written more than once to import more than one script.

    $keywords= { keyword ? | } .

    Some identifiers are stored as keywords. If an identifier is parsed, writing <$..?...>, and the parse result is equal to any of the keywords, the test is declared as non-matching. Thereby a false match can be prevented, for example if the input text contains public final int x and the syntax script contains

     declaration::=<type> <$?name>.

    the input doesn't match. final is an identifier and would match <$?name>, but it is a keyword and therefore doesn't match.
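
    The keyword test can be sketched in Java as an identifier match followed by a look-up in the keyword set. The class and method names are hypothetical, and the identifier pattern used here is an assumption, not the definition used by the ZBNF parser.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordSketch {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");

    /** Returns the identifier at the start of the input, or null if there is none or it is a keyword. */
    static String parseIdentifier(String input, Set<String> keywords) {
        Matcher m = IDENT.matcher(input);
        if (!m.lookingAt()) {
            return null;                         // no identifier at this position
        }
        String ident = m.group();
        return keywords.contains(ident) ? null : ident;   // a keyword counts as non-matching
    }
}
```

    With the keywords public, final and int the input final int x yields null at the position of final, whereas the input x is accepted.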


    Writes the attributes srcline="99" srccol="99" in any node of an XML output file if that information is available. When parsing simple, usually short strings that information is not available. But if a file is parsed that information should be given.


    Writes an attribute srctext="parsed text" in any node of an XML output file if that information is available. With it, the parser's result can be assigned to the source text.


    The line-mode is activated for all syntax-prescripts. It means, that a line feed character \n resp. 0x0a isn't accepted as white-space. The line-mode should be used if the input text is line-oriented. The linefeed itself should be tested as terminal character writing \n. A character \r or 0x0d is accepted as white-space.

    If the line-mode isn't activated, the \n is a white-space by default. But see $.

    $endlineComment= startsequence .

    Another start sequence for endline comments in an input text is set. The default start sequence is // like in C and Java. Note that the start sequence for endline comments in the syntax script is ##, independent of this setting. The start sequence should not be longer than 5 characters. All characters between = and the terminating . are accepted as sequence. Don't write additional white-spaces! But the circumscription with \ can be used. For example, \. means a dot as start-sequence character.

    For example: $endlineComment=???. or $endlineComment=\.\.\... In the second example three dots are the start sequence of endline comments.
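
    How such a setting is read can be sketched in Java: all characters up to the first dot that is not circumscribed are taken, and a \ takes the next character literally. The names are hypothetical, and throwing an exception for a too long or unterminated sequence is an assumption about the error handling.

```java
final class CommentSequenceSketch {
    /** Reads the text written after the = up to the terminating dot. */
    static String readSequence(String setting) {
        StringBuilder sequence = new StringBuilder();
        int i = 0;
        while (i < setting.length()) {
            char c = setting.charAt(i);
            if (c == '\\' && i + 1 < setting.length()) {
                sequence.append(setting.charAt(i + 1));   // circumscribed character
                i += 2;
            } else if (c == '.') {
                if (sequence.length() > 5) {
                    throw new IllegalArgumentException("sequence longer than 5 characters");
                }
                return sequence.toString();               // terminating dot found
            } else {
                sequence.append(c);
                i++;
            }
        }
        throw new IllegalArgumentException("terminating dot is missing");
    }
}
```

    For the two examples above the sketch returns ??? and three dots respectively.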

    $comment= startsequence ... endsequence .

    Another start and end sequence for comments within a line or across several lines of an input text is set. The default start and end sequences are /* and */ like in C and Java. The sequences should not be longer than 5 characters. All characters between = and ..., and between ... and the terminating ., are accepted as sequences. Don't write additional white-spaces! But the circumscription with \ can be used. For example, \. means a dot as start-sequence character.

    For example: $comment=[?...?]. or $comment=[\....\.].. In the second example comments are written [.between that.].

    $inputEncodingKeyword=" encoding-detect-string ".

    The key string to detect an encoding in the first line of an input text is defined. The input text should contain the char sequence

    encoding-detect-string =" encoding "

    where encoding is one of the known encoding identifiers such as ISO-8859-1 or UTF-8. See Chapter: 11.2 Definition of encoding of the input file to parse.

    $inputEncoding=" encoding ".

    The encoding of the input text is defined here. This definition may be used together with a $inputEncodingKeyword=".... If no input encoding keyword is found in the input text, this given encoding is valid then.
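
    The cooperation of both settings can be sketched in Java: the first line of the input is searched for the key string followed by = and a quoted encoding, and the value of $inputEncoding is the fallback. The names are hypothetical, and accepting white-space around the = is an assumption. The sketch is not the detection code of Zbnf2Xml.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodingDetectSketch {
    /**
     * Returns the encoding named in the first line after the keyword,
     * or the default encoding if the keyword isn't found.
     */
    static String detectEncoding(String firstLine, String keyword, String defaultEncoding) {
        Pattern pattern = Pattern.compile(Pattern.quote(keyword) + "\\s*=\\s*\"([^\"]+)\"");
        Matcher m = pattern.matcher(firstLine);
        return m.find() ? m.group(1).trim() : defaultEncoding;
    }
}
```

    The returned name could then be passed to java.nio.charset.Charset.forName(...) to open the input file with the right decoder.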

    $xmlns:namespacekey=" namespace-url ".

    An XML-namespace is defined. The namespace-key can be used in semantic identifiers while a conversion to XML is done: Zbnf2Xml. The namespace declaration is also written in the XML output text. This control variable can be written more than once to have several namespace declarations.

    $main= syntax-definition

    The syntax-definition designated in this way is used as the main script valid for the top of the input text (the whole input text). If such a control variable isn't used, the first syntax-definition is the main-definition.

    11 Character encoding



    11.1 Definition of encoding of a ZBNF-script-file


    A text given as content of a file is written in a specific encoding. Some common encodings are used, at least UTF-8 and some 8-bit extensions of ASCII like ISO-8859-1. See http://en.wikipedia.org/wiki/ASCII. The UTF-16 format isn't used frequently, but it may be regarded also.

    In a ZBNF script file the encoding can be defined analogously to XML: The first line starts with a head information, which also contains the encoding. This line contains only characters from 7-bit US-ASCII, which are identical in all encoding tables. The use of UTF-16 may be detectable too, because every second byte is 0.

    The first line should be written in form:

    .<?ZBNF-www.vishia.org version="1.0" encoding="ISO-8859-1" ?>

    If the ZBNF-script is given in a String inside Java, this line isn't necessary because inside Java UTF-16 is used. It is necessary only for file input of the ZBNF-script.

    11.2 Definition of encoding of the input file to parse


    The ZBNF-Parser used as class:_org/vishia/zbnf/ZbnfParser gets the input text in an internal String form. Java uses UTF-16 encoding internally. From this view the encoding is a problem of the environment of the parser call.

    In the Zbnf2Xml-Application, called from the command line, a file is read. At this level it should be possible to define the encoding in a proper way. Maybe the encoding of a file is not in question, because its content format is given already. But it may be that an encoding can be defined in the input file, if the format is free at first. Therefore there are some possibilities or variants:

  • The encoding of the input is defined directly in the ZBNF-script, writing for example

    $inputEncoding="ISO-8859-1".

    Then the Zbnf2Xml converter reads the required encoding from the ZBNF-script file.

  • The encoding of the input is given in the input text itself. Then a keyword can be defined in the ZBNF-script, for example:


    The rule therefore is:

  • The encoding may be given as command line argument of Zbnf2Xml.

  • The same possibilities can be used in a user's application of ZBNF parsing. The possibility to define the encoding or the encoding keyword in the ZBNF-script is a part of the definition of ZBNF.