Topic:.ZBNF_syntaxDescription.
Written by Hartmut Schorrig, www.vishia.org. Latest edition 2019-12
BNF, the Backus-Naur Form (see Wikipedia), was created during the development of the computer language Algol at the beginning of the 1960s. It was an important milestone of software technology: for the first time, BNF made it possible to describe the syntax of programming languages exactly.
BNF was developed further later on. For example, Niklaus Wirth of ETH Zurich created and used EBNF for his programming language Pascal (see Wikipedia, EBNF). Syntactical expressions similar to BNF are also well known, for example for the syntax of command line arguments. The option brackets [...] are typical. This is state of the art.
BNF is not fully standardized; several variants are in use. BNF-like notations are mostly used for documentation, where italic text marks keywords. Such documentation may be well readable for humans, but it is not suitable for computer-driven evaluation. For automatic processing, the semantic of the parts of a syntactical construct is important: using the semantic of an expression, its content can be recognized and processed.
ZBNF enhances BNF with semantic aspects and some additional syntax constructs.
Topic:.ZBNF_syntaxDescription..
A ZBNF-syntax-script may be given either in a text file or as a String in a Java program. A ZBNF-syntax-script may have a head, which also defines the encoding used in the file. Then some control settings, each starting with a $, may follow, see Chapter: 10 All control variables. After them the syntax-definitions follow. The first syntax-definition is used to parse the input text; the other syntax-definitions are sub-syntax-definitions. Explanations of semantic parts may be placed between the syntax-definitions, see Chapter: 7.4 explanation of the semantic in the ZBNF-script as help. The syntax-script may contain comment lines.
The following wording is used:
syntax-script is the whole script, which may contain the title line with the encoding, some variables and some syntax-definitions.
syntax-definition means the definition of a syntax, including the identifier of the definition and the prescript.
syntax-prescript is a String sequence which defines a syntax in ZBNF. A syntax-prescript is the right part of a syntax-definition after the ::= or a part of it.
A syntax-definition is written as follows:
syntaxident::= syntaxPrescript .
Here syntaxident is the identifier of the syntax-definition; it is used as the symbol for calling that syntax as a component in other syntax-prescripts. syntaxPrescript is the definition of this syntax itself. The dot marks the end. In ZBNF itself, this syntax would be defined as follows:
Syntaxdefinition::=<$?syntaxident>::=<syntaxPrescript>\..
It means: a syntax-definition consists of the identifier, semantically named syntaxident, followed by ::= without any white-spaces. Then an expression syntaxPrescript follows, which is not defined here. At the end a dot has to be written. This dot is coded as \. and the dot after the \. marks the end of the definition.
A ZBNF-syntax-script may import another script using the $import control variable. Then syntax-definitions from the imported script can be used. This is an important way of reusing syntax definitions.
A simple example of a ZBNF-script is the following:
<?ZBNF-www.vishia.org version="1.0" encoding="iso-8859-1"?>
$setLinemode.
shopping-list::=shopping \\n <![=]*?> \\n { <position> \\n }.
position::=<#?@amount> [<?@unit>peaces|x|] <$?text()>.
In this file the encoding is defined at the beginning. Then the use of the line mode is defined. The first ZBNF-syntax-definition, shopping-list, is the main definition; parsing begins there. The syntax requires a line shopping, then a line with ==========, then some <position>, each in one line. The second syntax-definition is for <position>; it is a syntax-component.
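A hand-written equivalent can make the structure of this script more concrete. The following Python sketch is not the ZBNF parser; it only mimics the two syntax-definitions of the example with regular expressions. The function name parse_shopping_list, the sample input and the result layout (a list of dictionaries) are assumptions made for illustration only.

```python
import re

# One <position> line: <#?@amount> [<?@unit>peaces|x|] <$?text()>
POSITION = re.compile(
    r"\s*(?P<amount>\d+)\s*(?P<unit>peaces|x)?\s*(?P<text>[A-Za-z_]\w*)\s*$"
)


def parse_shopping_list(source):
    """Mimic shopping-list::=shopping \\n <![=]*?> \\n { <position> \\n }."""
    lines = source.splitlines()
    # The repetition { ... } needs at least one position, hence three lines.
    if len(lines) < 3 or lines[0].strip() != "shopping":
        raise ValueError("expected a first line 'shopping' and at least one position")
    if not re.fullmatch(r"[=]*", lines[1].strip()):
        raise ValueError("expected a line consisting of '=' characters")
    positions = []
    for line in lines[2:]:
        match = POSITION.match(line)
        if match is None:
            raise ValueError("not a position: " + repr(line))
        positions.append({
            "amount": int(match.group("amount")),
            "unit": match.group("unit"),   # None if the empty choice was taken
            "text": match.group("text"),
        })
    return positions


# Hypothetical input text, invented for this sketch.
sample = "shopping\n==========\n3 x apples\n2 peaces bread\n5 eggs\n"
result = parse_shopping_list(sample)
```

Each dictionary corresponds to one <position> component; the keys are taken from the semantic names in the script.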
Writing and printing rules in the following explanation: in the description below some syntax-prescripts are shown in ZBNF itself (pure ASCII), but a more readable form is used in the printed text:
terminal symbol [ ] < > = : Terminal symbols are written in a mono-space font. The special syntax control characters []{}?. used as terminal symbols are also written directly in this form (without the circumscription with \ that is needed in a ZBNF-script).
monospacedItalic : At that position any identifier should be written in a syntax-prescript. In ZBNF it is defined as <$?semantic>.
italic: It means <component> in ZBNF, or any partial syntax-prescript. At some positions ... is written as a wild-card for a partial syntax which is not significant in the given context. A special semantic aspect <syntax?semantic> cannot be shown in this form, but the semantic is stated in the explanation text or should be self-explanatory. Likewise, a special syntax aspect may not be shown in a formal way.
[ option ] { repetition }. : The syntax control symbols are written in the standard paragraph font.
Some patterns of ZBNF usage are shown as examples. Then the mono-space font is used without special character fonts.
Topic:.ZBNF_syntaxDescription.semanticAspect.
The "Z" in "ZBNF" is a reversed "S" for semantic. The semantic aspect is not sufficiently respected in the original BNF and its variants. If you write:
variableDefinition::= <identifier> <identifier> ; .
identifier ::= alphaChar [ digit | identifier].
alphaChar = A|B|C|...
then the syntax is defined exactly. But the meaning of the first identifier (is it a type?) and of the second one (a variable name?) is still unknown. A verbal explanation is needed in addition. The same situation occurs in the presentation of command line calls like:
XCOPY Quelle [Ziel] [/A | /M] [/D[:Datum]] [/P] [/S [/E]] [/V] [/W] [/C] [/I] [/Q] [/F] [/L] [/G] [/H] [/R] [/T] [/U] [/K] [/N] [/O] [/X] [/Y] [/-Y] [/Z] [/EXCLUDE:Datei1[+Datei2][+Datei3]...]
Quelle  Die zu kopierenden Dateien.
Ziel    Position und/oder Name der neuen Dateien.
/A      Kopiert nur Dateien mit gesetztem Archivattribut, ändert das Attribut nicht.
/M      Kopiert nur Dateien mit gesetztem Archivattribut, setzt das Attribut nach dem Kopieren zurück.
(Sorry, it is German; I have installed Windows with the German language.)
This example is the beginning of the output obtained by typing help xcopy in Windows XP (Microsoft). The meaning of the options is explained verbally. Nevertheless, this BNF-like presentation makes it possible to recognize, for example, that the options /A and /M exclude each other.
For computer-aided information processing, verbal explanations are not useful; complex programming is then necessary to process the result of a parser.
In ZBNF, the syntax above can be written in the form:
variableDefinition::= <identifier?type> <identifier?name> ; .
So the first identifier is declared as type and the second as name in a formal way.
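The effect of the semantic names can be illustrated with a small Python sketch. It is not the ZBNF implementation; the identifier pattern, the function name parse_variable_definition and the representation of the parse result as (semantic, value) pairs are assumptions chosen for this illustration.

```python
import re

# Assumed identifier syntax: a letter or underscore, then word characters.
IDENTIFIER = r"[A-Za-z_]\w*"

# variableDefinition::= <identifier?type> <identifier?name> ; .
VARIABLE_DEFINITION = re.compile(
    r"\s*(?P<type>" + IDENTIFIER + r")\s+(?P<name>" + IDENTIFIER + r")\s*;"
)


def parse_variable_definition(text):
    """Return the parse result as (semantic, value) pairs, or None on mismatch."""
    match = VARIABLE_DEFINITION.match(text)
    if match is None:
        return None
    # Both parts have the same syntax; only the semantic tells them apart.
    return [("type", match.group("type")), ("name", match.group("name"))]
```

A call such as parse_variable_definition("int counter;") yields the pairs ("type", "int") and ("name", "counter"), so a post-processing step no longer needs a verbal explanation to tell the two identifiers apart.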
The association of pure information data with its meaning is a basic idea of XML. In XML a <tagname> is the semantic description, whereas the content of the element or of an attribute is the pure information: <meaning>information<subtag>...subInfo</subtag></meaning> or <tag meaning="information">. Using this idea, computer-aided information processing can work even if the information comes from older versions of sources, from other providers with altered definitions and so on. The compatibility of information interchange can be controlled better.
Binding a syntax to its semantic is a core idea of ZBNF. It enables the conversion of any syntactically interpretable text to XML without additional programming effort, see Topic:.ZBNF2Xml.. The subsequent information processing can use the well-known XML tool support. It is possible to write:
Text x ZBNF =: XML
The reverse conversion
XML x XSLT =: Text
is the known XSLT technique. The x means a processing step or cross product.
For details see Semantic definitions.
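The step from a parse result to XML can be sketched in Python with the standard module xml.etree.ElementTree. This is only an illustration of the idea, not the Zbnf2Xml converter; the nested (semantic, content) representation of the parse result and the function name result_to_xml are assumptions.

```python
import xml.etree.ElementTree as ET


def result_to_xml(semantic, content):
    """Build an XML element from a (semantic, content) parse-result node.

    content is either a string (leaf) or a list of child (semantic, content)
    nodes. The semantic becomes the tag name of the element.
    """
    element = ET.Element(semantic)
    if isinstance(content, str):
        element.text = content
    else:
        for child_semantic, child_content in content:
            element.append(result_to_xml(child_semantic, child_content))
    return element


# Hypothetical parse result of a variable definition "int counter;".
tree = result_to_xml("variableDefinition", [("type", "int"), ("name", "counter")])
xml_text = ET.tostring(tree, encoding="unicode")
```

Because every node of the parse result already carries its semantic, the conversion needs no knowledge about the parsed language.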
Topic:.ZBNF_syntaxDescription..
Terminal characters are those characters which have to appear in the input text in the given form. They are the keywords of text recognition. In BNF the terminal text often has to be written in quotation marks like "terminal", but in ZBNF this is not the case: terminal characters are written directly. However, there is a conflict with the special characters which control the syntax flow: [ ] | { } < > ? .
To mark such a character as a terminal character, it has to be written with a preceding backslash \. For example, if a [ is necessary, it has to be written as \[.
Special chars
The backslash \ can be used, as in string literals in Java and C/C++, as escape character for control characters: \n \r \b \t \f with the meaning of Newline (0x0a), Carriage Return (0x0d), Backspace (0x08), Tabulator (0x09), Formfeed (0x0c). Such terminal characters are often necessary for format separation. The backslash itself is written as \\
There are some special escape sequences too:
\s is a white-space symbol within the line, but not a line feed.
\  (a backslash followed by a simple space) means a simple space as terminal symbol.
\e means end of text. If such a terminal symbol is required, the input text has to end at this position.
Any Unicode character can be coded with \uxxxx where xxxx is the 4-digit hexadecimal code of the UTF-16 table. This kind of representation of terminal symbols makes it possible to require characters in addition to and outside of the encoding of the script.
The encoding of the terminal characters used in a ZBNF-script file can be defined, see Character encoding.
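The escape rules listed above can be summarized in a short Python sketch. It is an illustration only and not taken from the ZBNF sources; the function name decode_terminal and the two marker strings for \s and \e are hypothetical.

```python
CONTROL = {"n": "\n", "r": "\r", "b": "\b", "t": "\t", "f": "\f"}


def decode_terminal(script_text):
    """Decode the escape sequences of a terminal text into a list of tokens.

    Plain characters are returned as one-character strings. The two symbols
    which are not characters are returned as the markers "<white-space>"
    and "<end-of-text>" (hypothetical names).
    """
    tokens = []
    i = 0
    while i < len(script_text):
        ch = script_text[i]
        if ch != "\\":
            tokens.append(ch)
            i += 1
            continue
        if i + 1 >= len(script_text):
            raise ValueError("backslash at end of terminal text")
        code = script_text[i + 1]
        if code in CONTROL:
            tokens.append(CONTROL[code])
        elif code == "s":
            tokens.append("<white-space>")
        elif code == "e":
            tokens.append("<end-of-text>")
        elif code == "u":
            hex_digits = script_text[i + 2:i + 6]
            if len(hex_digits) != 4:
                raise ValueError("\\u needs four hexadecimal digits")
            tokens.append(chr(int(hex_digits, 16)))
            i += 4
        else:
            # \\, \[, \?, backslash-space and so on: the character itself.
            tokens.append(code)
        i += 2
    return tokens
```

For example, the script text a\[b is decoded to the three terminal characters a, [ and b.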
Topic:.ZBNF_syntaxDescription..
A white-space is a part of the text which produces space without characters in a printed output. Spaces, tabulators, line feeds and sequences of them are white-spaces.
Topic:.ZBNF_syntaxDescription...
The syntax definition itself can be written in a free format with white-spaces. Normally, where a white-space is written in the syntax script, a white-space is also allowed in the input text.
Using the control setting $white-spaces=white-spaces. in the ZBNF-script outside a syntax definition (it should be written at the start of the syntax script), it can be defined which characters are white-spaces. By default they are \ \t\r\n. For example, a \t can be excluded as white-space because it has a special meaning in the text. Then it should be written:
$white-spaces=\ \r\n.
Topic:.ZBNF_syntaxDescription...
With an additional <$Nowhite-spaces> at the beginning of a syntax definition it is possible to define that white-spaces in the syntax prescript do not allow white-spaces in the input text. A syntax-definition should then be written in the form:
Syntaxdefinition::=syntaxident::= [<$semanticOftheDefinition>][<$Nowhite-spaces>] syntaxprescript. .
The <$Nowhite-spaces> should be written after an optional <?semantic>, without white-spaces between them.
If spaces or white-spaces are necessary in the syntax, they should be written as terminal characters like \s or \ , \t and so on, or by using Regular Expressions.
An example is the syntax prescript to parse a #define NAME xxx... in C or C++. In zbnfjax/zbnf/Cheader.zbnf there is the following part:
defineDefinition::=<$Nowhite-spaces> <$?@name> [ ( { <$?parameter/@name> ? , } ) ] <![ \t]*?> [ <#-?intvalue> | 0x<#x?hexvalue> | <""?stringvalue> |] <![ \t]*?> { <*|\n|\\|\r\n?value> ? \\[\r]\n }.
The problem is: the define may be wrapped at the end of a line using a \, but the text in the next line has the same meaning as without the \ and the line wrap. The syntax-prescript first parses the <$?name> as identifier. The optionally following parameter names should be written without spaces. Then white-space is admissible, written in the syntax using the Regular Expression <![ \t]*?>. Then either an integer value or a String in some variants is accepted; this is the mainstream use case of define. After that, all other characters are captured until the end of the line or a \. They are stored as <...?value>. A \ followed immediately by \n, optionally with a \r, means that the next line is a further <...?value> of this define. The user of the parser result may concatenate these <...?value> to get the whole expression.
Topic:.ZBNF_syntaxDescription...
The syntax prescript text allows an end-of-line comment, started with two ## outside of a < ... > and not circumscribed as \#. A single # is a normal terminal symbol. If two ## are necessary as terminal symbol, you should write \#\#.
White-spaces between the syntax prescripts inside the ZBNF-script are ignored, and comments are ignored as well. Such comments or white-spaces have no meaning, in contrast to white-spaces inside a syntax prescript.
Topic:.ZBNF_syntaxDescription...
In general, a comment is treated like a white-space. But it is possible to test comment constructs at some positions, although at other positions they are skipped. There are some rules:
Without any other setting, a text in /* ... */ and a text after // until the end of the line are comments. But using the control variable $comment=<*|\.\.\.?startCommentString>\.\.\.<*\.?endCommentString>. it is possible to set other characters instead. The default setting would be written as
$CommentString=/*...*/.
Another variant would be
$comment=(?...?).
With the control variable $endlineComment=<*.?$endlineComment>. the start characters of an end-of-line comment are determined. The default setting would be written as
$endlineComment=//.
Another variant would be
$endlineComment=#.
With the control variable
$setLineMode.
it is set that \n and \r are not recognized as white-space. This setting is appropriate if the input text is not in free format but line-oriented. White-spaces within the line are still recognized, but may be suppressed using <$Nowhite-spaces> in the syntax prescript.
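How configurable white-spaces, comments and the line mode interact can be sketched in Python. This is an illustration under assumptions, not the ZBNF implementation: the function name skip_white_and_comments and its parameter names are hypothetical, and only the default strings are taken from the description above.

```python
def skip_white_and_comments(text, pos, white_spaces=" \t\r\n",
                            comment=("/*", "*/"), endline_comment="//"):
    """Return the position of the next character that is neither a
    white-space nor part of a comment. All parameter names are hypothetical."""
    while pos < len(text):
        if text[pos] in white_spaces:
            pos += 1
        elif text.startswith(comment[0], pos):
            end = text.find(comment[1], pos + len(comment[0]))
            if end < 0:
                raise ValueError("unterminated comment")
            pos = end + len(comment[1])
        elif text.startswith(endline_comment, pos):
            end = text.find("\n", pos)
            # The line feed itself is left in place so that line mode can see it.
            pos = len(text) if end < 0 else end
        else:
            break
    return pos
```

With the default settings the line feed after an end-of-line comment is skipped like any other white-space. Passing white_spaces=" \t" imitates the line mode: the function then stops at the line feed, so the syntax has to match it explicitly.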
In general, while parsing it is tested first whether the input text matches the current terminal symbols. Only after that are comments skipped. This means: if the start characters of a comment match the current terminal symbol, the parse process recognizes them as parsed input, and the following text is parsed after them. For example, comments are parsed in Cheader.zbnf with the sequence [ /**<description>*/], where it is defined:
description::= <*{ * }|*/?!test_description>.
It means all characters until */. So a description started with /** is processed as a parsing result.
The explicit terminal symbols \ (backslash and space), \n or [\r]\n (as combination of 0x0d 0x0a or only 0x0a) and \s can be used too.
Topic:.ZBNF_syntaxDescription.syntaxControl.
Topic:.ZBNF_syntaxDescription.syntaxControl..
The alternative was the only control construct of the old BNF in the 1960s: BNF uses only alternatives and recursion. In ZBNF the alternative is no longer needed to define
DigitNotZero::=1|2|3|4|5|6|7|8|9.
Digit::=0 | <DigitNotZero>.
Digitsequence::= <Digit> | <Digit><Digitsequence>.
positiveNumber::= <DigitNotZero><Digitsequence>.
which is shown in some examples in education. Processing digits is better done with a hard-coded algorithm. Therefore the special syntax constructs <#?Number> or <$?identifier> are available in ZBNF.
The alternative is useful for better things like:
Customer::=<Consumer>|<BusinessClient>.
where Consumer or BusinessClient may be a complex syntactical construct. Terminal symbols are also useful in alternatives:
title::= Mr\\. | Miss | Mrs\\. .
Any syntax prescript (the right side of a syntax definition after ::=) can be written as an alternative.
componentIdentifer::= alternative1 | alternative2 | alternative3.
If alternatives are necessary inside a syntax prescript, they should be written in square brackets as options:
...[ alternative1 | alternative2 | alternative3 ]...
or they may be placed in the forward or backward part of a repetition:
...{ alternative1 | alternative2 | alternative3 ? backAlternative1 | backAlternative2 } ...
Topic:.ZBNF_syntaxDescription.syntaxControl.options.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
The simple option is designated with square brackets, like in the old BNF: [ ]
The general meaning, also in ZBNF, is: this part is optional; it may match or not. This construct is known in all syntax descriptions.
Such an option is also possible between terminal symbols. For example, in a report file the word telegram was written without the second e: telgram. It was a mistake; later versions write telegram. Now the parsing of older and newer reports should detect both variants. Therefore the terminal syntax is written as tel[e]gram.
Inside the square brackets of the option any syntax prescript is possible. This part of the syntax may get a special semantic designation. It should be written: [<?semantic> syntaxprescript ]. If the option matches in syntax processing, a parse result with the given semantic is produced.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
This is a combination of the option notation with some alternatives. In this notation one of the alternatives has to match. The square brackets of the option do not mean that the whole construct is optional; rather, it is obligatory to use one of the alternatives.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
If |] is written at the end (without spaces), it is an empty choice. It means that it is also okay if no alternative matches. If no alternative matches, no parse result is produced for the alternatives or for the whole option. If it is written
[<?SomeChoices> Alternative3a | Alternative3b |]
no parse result named SomeChoices is produced if no alternative matches.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
If [|... is written, the parser tests first whether the syntax matches in the sequence after the option. Only if it does not match is the option tested. That may be advisable in a construct like
[|-<?negative>] <value>
The <value> may mean a number, also a negative one, defined as value::=<#-?number>|<$?ident>. If the input text contains a negative number like -123, it will be matched as a number itself. The semantic negative is not necessary and is not produced. But if it is an <$?ident>, the negative sign has to be parsed independently. Another example is:
coplexlyNumberString::=[|<#?leftPart>\\.][|<#?middlePart>\\.]<#?rightPart>.
The parser should detect a middlePart and a rightPart if the number contains only one dot, but not a leftPart and a middlePart. If, for example, the input text contains 123.456, at first 123 would be parsed as the right part. But because the dot after it does not match the following syntax, the parser starts again at the middle part and matches.
With this construct the principle of right-aligned parsing can be used.
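The order of tests for [|...] can be traced with a small Python sketch of the number example. It is not the ZBNF algorithm; it assumes that the rightPart has to be followed by the end of the input, and the names number_dot, right_part, rest_first and parse_number are hypothetical.

```python
import re

DIGITS = re.compile(r"\d+")


def number_dot(name):
    """Parser for <#?name>\\. : digits followed by a dot."""
    def parse(text, pos, result):
        match = DIGITS.match(text, pos)
        if match is None or not text.startswith(".", match.end()):
            return None
        return match.end() + 1, {**result, name: match.group()}
    return parse


def right_part(text, pos, result):
    """Parser for <#?rightPart>, assumed to be followed by the end of input."""
    match = DIGITS.match(text, pos)
    if match is None or match.end() != len(text):
        return None
    return match.end(), {**result, "rightPart": match.group()}


def rest_first(option, rest):
    """[|option] rest: try rest alone first, the option only if that fails."""
    def parse(text, pos, result):
        outcome = rest(text, pos, result)
        if outcome is not None:
            return outcome
        with_option = option(text, pos, result)
        if with_option is None:
            return None
        return rest(text, with_option[0], with_option[1])
    return parse


# [|<#?leftPart>\.][|<#?middlePart>\.]<#?rightPart>
complex_number = rest_first(number_dot("leftPart"),
                            rest_first(number_dot("middlePart"), right_part))


def parse_number(text):
    outcome = complex_number(text, 0, {})
    return None if outcome is None else outcome[1]
```

For the input 123.456 the sketch first fails to take 123 as rightPart, then succeeds with 123 as middlePart and 456 as rightPart; a leftPart is only tried when an input such as 1.2.3 cannot be matched otherwise.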
Topic:.ZBNF_syntaxDescription.syntaxControl.options.requiredNegative.
If it is written
[? syntax ]
then the syntax does not match if the syntax in the option brackets matches. Typically this can be used for repetitions and their termination:
Example::={ [?;] <*;?text> ; } ;.
A text ending with a semicolon matches any time. But two semicolons one after another should be the termination of this sequence. Without this possibility the second semicolon would be recognized as a text, and the following input would be confused. With this notation the second semicolon is detected as not being a repetition, the continuation detects the second semicolon after the repetition, and the following input may match.
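The behaviour of this example can be imitated by a hand-written loop. The following Python sketch is only an illustration of the negative test; the function name parse_texts and the return value (list of texts and end position) are assumptions.

```python
def parse_texts(text, pos=0):
    """Mimic Example::={ [?;] <*;?text> ; } ;.

    Returns the list of texts and the position after the closing semicolon.
    """
    texts = []
    while True:
        # [?;] : the repetition cycle does not match if a semicolon comes next.
        if text.startswith(";", pos):
            break
        end = text.find(";", pos)       # <*;?text> : all characters until ';'
        if end < 0:
            break
        texts.append(text[pos:end])
        pos = end + 1                   # the terminal ';' after the text
    if not texts:
        raise ValueError("at least one text is required")
    if not text.startswith(";", pos):
        raise ValueError("terminating ';' expected at position %d" % pos)
    return texts, pos + 1
```

For the input a;b;;rest the loop collects a and b, stops at the second semicolon of the pair and consumes it as the terminating semicolon, so that rest remains for the following syntax.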
Topic:.ZBNF_syntaxDescription.syntaxControl.options.requiredNegative.
If it is written
[! syntax ]
then the syntax is tested but not processed. Such tests may be required if an input should be pre-tested but processed in following syntax constructs. For example:
Example::= [!;|:|+] <nextPart> | <somewhatElse>.
The problem may be that a next part starts with the shown characters. The characters have to be parsed as part of the <nextPart>. But the decision that it is a <nextPart> is made in this test, before <nextPart> is entered.
Topic:.ZBNF_syntaxDescription.syntaxControl.options.expectedVariant.
If it is written
[> syntax ]
it means that the syntax inside the square brackets has to match. Otherwise not only the parsing of that branch fails, but the whole parsing process is aborted. This construct should be used only in a syntactical environment where previous checks determine that the following syntax has to match. For example:
[/**[><description>*/]]
The /** ... */ as a whole construct is optional. But if a /** is detected, the next part of the syntax branch, <description>, has to match unconditionally.
Another example is:
[ condition: [><?which>A|B|C] ]
The keyword condition: is optional. But if it is detected, A or B or C has to match. In some cases the parsing process would terminate with a syntax error anyway, but before the failure is detected all other variants are tested, which costs calculation time. And in the case that the tested part lies in a part of the text which could also be detected as a comment, it is an important feature.
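The difference between an ordinary mismatch and an aborted parse can be sketched in Python with an exception. This is an illustration only; the class ParseAbort and the function parse_condition are hypothetical names and not part of ZBNF.

```python
class ParseAbort(Exception):
    """Raised when syntax marked as expected ([> ... ]) does not match."""


def parse_condition(text, pos=0):
    """Mimic an optional keyword 'condition:' with an expected A, B or C.

    Returns (which, new_pos); which is None if the optional part is absent.
    """
    keyword = "condition:"
    if not text.startswith(keyword, pos):
        return None, pos                 # the outer option simply does not match
    pos += len(keyword)
    while pos < len(text) and text[pos] == " ":
        pos += 1
    for alternative in ("A", "B", "C"):
        if text.startswith(alternative, pos):
            return alternative, pos + len(alternative)
    # The keyword was seen, so a mismatch here aborts the whole parse
    # instead of letting the caller try other variants.
    raise ParseAbort("A, B or C expected at position %d" % pos)
```

A caller that tries several variants would treat the returned None as "try the next variant", whereas ParseAbort passes through all callers and ends the parsing process at once.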
Topic:.ZBNF_syntaxDescription.syntaxControl..
In the original BNF of the 1960s, repetition was not defined; recursion was used for repetition constructs. In EBNF (http://www.en.wikipedia.org/wiki/EBNF) repetition was then established with braces.
In ZBNF at least one match is obligatory. If no pass should also be okay, it should be written as:
[{...}]
The second difference from EBNF is: a backward branch may be defined if necessary. It starts with a ? and contains the back syntax until the }. If the back syntax matches, the repetition is obligatory. This is a frequent situation in practice. For example, between the elements of an enumeration or of some other cycle of parts a special character like a comma is written, but the comma is not written at the end of the sequence. If a comma is detected, a next cycle has to start. An example was already shown above; the parameters of a define in C/C++ are comma-separated:
defineDefinition::= ... <$?@name> [ ( { <$?parameter/@name> ? , } ) ]
In this case the whole prescript in (...) is optional. But if a (...) is used, at least one parameter name has to be present between the parentheses. If a comma is written, a next identifier has to follow.
Because the forward and the backward branch of a repetition are syntax prescripts too, alternatives can be used in them just like all other syntax control constructs. It may be typical to write, for example:
{ <variantA> | <varianteB> ? [<?delimiter> + | , ] | : <?specialDelimiter> }
or to construct nested repetitions.
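A repetition with a backward branch corresponds to a loop whose continuation is decided by the separator. The following Python sketch illustrates this for comma-separated identifiers; it is not the ZBNF implementation, and the function name parse_repetition, the identifier pattern and the tolerance for spaces around identifiers are assumptions.

```python
import re

# Assumed identifier syntax, with optional spaces around it.
IDENT = re.compile(r"\s*([A-Za-z_]\w*)\s*")


def parse_repetition(text, pos=0, separator=","):
    """Mimic { <$?name> ? , } : identifiers separated by commas.

    Returns (names, new_pos). At least one identifier is required, and
    after each separator another identifier has to follow.
    """
    names = []
    while True:
        match = IDENT.match(text, pos)           # forward branch
        if match is None:
            raise ValueError("identifier expected at position %d" % pos)
        names.append(match.group(1))
        pos = match.end()
        if not text.startswith(separator, pos):  # backward branch
            return names, pos
        pos += len(separator)
```

The loop ends as soon as the backward branch does not match, so a closing parenthesis after the last identifier is left for the surrounding syntax, while a comma without a following identifier is an error.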
Topic:.ZBNF_syntaxDescription.ZbnfComponent.
Syntax components are, in general, complex syntax prescripts which are written as separate syntax definitions. At least the clarity is increased by that. Syntax components could also be defined in the old BNF. There they were denoted as meta-morphemes, or as non-terminals in contrast to terminal symbols. That is the view of the syntactical definition. But from the view of the semantic it is better to speak of a component: it is a part of the syntax which is used as <component> and is defined in its own syntax prescript. The parse result builds a node at this point, which contains the whole component with its own parse result branch. This builds a tree.
The notation with angle brackets was also used in the old BNF. Another frequently used notation is writing in italic script. But this cannot be used in technical (ASCII) text formats; it is only suitable for manual explanations in printed texts. The third form in use is the notation as a simple identifier. But this is only possible if terminal characters are written in quotation marks. It is the choice made in EBNF.
In ZBNF the simple expression requiring a syntax component is written as
<SyntaxIdentifier>
The syntax of the component then has to be defined anywhere in the syntax script in the form
SyntaxIdentifier::= SyntaxPrescript .
In ZBNF not only the syntax of a component is relevant, but also its meaning for post-processing. This is the semantic. If the notation above is used, the semantic identification is identical to the syntax identifier. But an alternative semantic can also be declared. For example:
{bill-postal-address: <Address?BillAddress> | Supply-postal-address: <Address?SupplyAdress>}
In both cases, for the postal address of the bill and for the supply, the same syntax is used. But the meaning and the post-processing of the two addresses are different. Therefore the semantic is different. The semantic is used to identify the parser's result:
<SyntaxIdentifer?SemanticIdentifier>
It is also possible to suppress the semantic for the component. In this case no extra component is produced as result, although the syntax is written in an extra definition. See Chapter: 7 Semantic-specification variants. It should be written as
<SyntaxIdentifier?>
The semantic may be more than a simple identifier. Especially for producing an XML expression some special cases are possible too. The technical implementation accepts the whole character sequence until the > as semantic expression. But there are writing rules, see Chapter: 7.1 Semantic writing rules
<SyntaxIdentifer?.attribute=value>
It is possible to store one or more attributes with a given value in the syntax item. The values of the attributes can be given in "" or without. The evaluation of the attributes can be done while using the parse result. This feature is new since 2017-03.
The syntax of a syntax item which refers to a syntax component:
<SyntaxIdentifer [{? [+ |- |% |]semantic |?.attribute=value }]>
whereby semantic can contain all characters except ? and >. attribute should be an identifier.
The syntax formally written in ZBNF is:
syntaxItem::=\< [ #<#?number> | !<regex> ...TODO exactly syntax | <$?SyntaxIdentifier ] [{\?[+<?addOuterItems>|-<?storeAsOuterItem>|%<?debugBreak>|]<*\>\??semantic> |\.[<?attribute><$?name>=[<""?value>|<*\>\??value>] }].
The formal notation is harder to read but exact in its meaning.
addOuterItems and storeAsOuterItem: see 9 Transformation of a semantic (parse result) to another ZBNF-component
debugBreak: The % sets a debug flag in the syntax item. It is helpful if a detail of the parser should be debugged, or if the evaluation of the parser's result should be debugged in an Integrated Development Environment such as Eclipse.
Additionally there are some special characters after the ?, see Transformation of result and Inner syntax parsing.
Topic:.ZBNF_syntaxDescription.ZbnfComponent..
The following notations are possible in ZBNF (examples):
<#?number> <$?identifier> <*|*/?string> <![=]*?RegularExpression>
Numbers and identifiers can be processed better in hard-coded software than in a syntax description. Therefore these standard expressions are written in the forms shown. They are parsed hard-coded. Their syntax is well known and defined; it should not be part of the user's syntax script.
The general notation is <syntax-symbol?semantic>, analogous to syntax components. Additionally a number of chars may be defined. It should be written without space after the < as a positive number, for example:
<16*?cell>
This example means that any characters, but exactly 16 of them, are parsed and assigned to the semantic cell. It may be used to parse text in tables with fixed column width.
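Parsing cells of a fixed width amounts to slicing the line at known positions. The following Python sketch illustrates this for a table row; the function name parse_fixed_cells and the list of widths are assumptions for this illustration, not part of ZBNF.

```python
def parse_fixed_cells(line, widths):
    """Split a line into cells of fixed width, like a sequence of <n*?cell>.

    Returns (cells, new_pos). Raises ValueError if the line is too short
    for the requested widths.
    """
    cells = []
    pos = 0
    for width in widths:
        cell = line[pos:pos + width]
        if len(cell) != width:
            raise ValueError("line too short for a cell of width %d" % width)
        cells.append(cell)
        pos += width
    return cells, pos
```

The cells are returned unchanged, including padding spaces, because the count of characters and not their content decides where a cell ends.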
The following notations can be used as <syntax-symbol?...>:
syntax-symbol | explanation |
|
An identifier is expected, written like the known form in the programming languages Java, C: it should consisting of alphabetic
There is a possibility to exclude some identifiers from recognitions here: The syntax prescript may contain a keyword $keywords::=class|interface|super|new|return|if|else. The example shows, that the keyword of a programming language are excluded here. |
|
The identifier may contain but not start with any of the addChars. for example addChars may be |
|
It is the syntax of a positive integer number, consisting of the digits |
|
A negative sign before the digits is admissible. It is an optionally negative number. |
|
It is a float number, possibly with exponent, in the Java and C standard notation. Internally a double value is stored. |
|
It is possible to parse a float number and store a multiplied representation with the |
|
A hexadecimal number is expected. The hexa-digits |
|
All characters until ones of the given endChars will be accepted. The endChars may be for example Example:
Note: This construct may be in opposite to the key notes of the parsing process. It doesn't test the matching, but a non-matching
will be searched. It is possible, that a lot of text will be accepted unchecked, until the end character is found. for example
if a space is used:
Note: The sequences |
|
All characters until ones of the given endChars outside of a quotation will be accepted. The circumscription of a quotation mark with |
|
All characters until ones of the given char sequence str will be accepted. Before the terminating char sequences a character Example: |
|
All characters until ones of the given char sequence str will be accepted. The resulting string is trimmed without leading and trailing whitespaces. Note that a space is written after the asterisk! Example: |
|
All characters until ones of the given last char in endChars will be accepted. The searching process starts from the end of the whole parsing text. This special form is able to use especially, if an inner syntax with a short length is parsed, or a short-length input for
example an answer of a command line call. for example a file path is tested. The last testFilePath::=[<toLastChar:/\\?path>[/|\\]]<*?name>. |
|
Adequate to testFilePath::=[<toLastCharIncl:/\\?path>]<*?name>. |
|
A string in quotation marks will be expected. Inside the quotation a circumscription with The input doesn't match, if the input doesn't start with a |
|
Adequat variant of |
|
This is a special form of /**Example Comment of a method * It does this and that * @param x value. */ for example the syntax is Example Comment of a method It does this and that @param x value. This may be simple, no additional characters should be given. But programmers writes not exactly often. for example it is given: /** Comment to any method * * List bullet * this line is written right-shifted this line is written left-shifted without the asterisk */ or tabulators are used. The parser stores the result proper too: Comment to any method * List bullet this line is written right-shifted this line is written left-shifted without the asterisk The characters space and |
|
This is a universal form to parse the current text with a given user method. The method should be found in a special Java-class assigned to the parser calling |
|
A regular expression are used to describe the syntax. The regular expression follows the definition of the java class |
Topic:.ZBNF_syntaxDescription.ZbnfComponent..
Regular Expressions are a powerful method to describe the syntax of any text. But handling complex regexes with a well readable assignment to results is not suitable in every case. The additional use of regex in ZBNF may be a good idea. The input text of a whole syntax part described by the given regular expression is stored as result.
The notation for using regular expressions in ZBNF is:
<!Regex?Semantic>
Be aware that a backslash, often used in regex, has to be written twice: \\ , because the syntax script also uses it as circumscription character. See the examples below. The examples are simple and explain the principles; the full capabilities are not explained here, see the usual documents. It is advisable to use only simple constructs of regular expressions in ZBNF. Complex constructs are not necessary because ZBNF has its own implementations of equivalent features.
Regex |
Meaning |
|
A simple dot is a place holder for any character. |
|
The asterisk doesn't mean any character like in ZBNF or wildcard in file paths, it means any number of repetition of the characters left from it. In this case, combination
of dot and asterisk, any character is accepted any time. But there should be a limit. In ZBNF it is able to write a maximal
number outside of the Regex, at ex. |
|
Like .*, but at least one character is necessary. |
|
One of the characters between [] are accepted. |
|
That is a useful combination: Any desired number of the characters between [] are accepted, also nothing. |
|
Adequate |
|
Character range: One of the characters from ..a.. to |
|
It is possible to combine more ranges of characters. This example means: all alphabetic characters. |
|
That means a word, consisting of upper and lower alphabetic characters. |
|
That means a word beginning with an upper case alphabetic character, then lower case characters. |
|
That is the fixed terminal string |
|
It means, for example, |
|
Now |
|
Any character which isn't a white-space. Note: The backslash should be written twice in a ZBNF script. |
|
All characters until a white-space. |
|
Any white-space, including line feed (hex 0a) and carriage return (hex 0d). |
|
All white-spaces, also no white-space. |
|
All white-spaces, at least one white-space. |
|
A word character: [a-zA-Z_0-9] |
|
A word consisting of at least one char. |
|
any characters outside a word. |
|
all characters outside a word until start of next word. |
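Regex constructs of the kind described in the table can be tried out directly in Java. The following is a minimal sketch, assuming that the regular expressions behave like those of java.util.regex.Pattern (the class name is cut off in the text above, so this is an assumption). The class RegexSketch and the helper matchAtStart are hypothetical names for illustration only. Note that a Java string literal needs the doubled backslash too, similar to the ZBNF script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
  /** Returns the text which the regex matches at the start of input, or null if it does not match. */
  static String matchAtStart(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.lookingAt() ? m.group() : null;
  }

  public static void main(String[] args) {
    // A word beginning with an upper case character, then lower case characters.
    System.out.println(matchAtStart("[A-Z][a-z]*", "Hello world"));
    // All characters until a white-space; the backslash is doubled in the literal.
    System.out.println(matchAtStart("\\S*", "key=value rest"));
    // A word consisting of at least one word character.
    System.out.println(matchAtStart("\\w+", "name_1 = 5"));
    // At least one character is necessary, so an empty input does not match.
    System.out.println(matchAtStart(".+", ""));
  }
}
```

The helper uses lookingAt() because a parser tests the syntax at the current position of the input and not anywhere in the text.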
Topic:.ZBNF_syntaxDescription.semantic.
In general, the semantic is written in syntax component expressions after a question mark:
<syntax?semantic>
where the syntax may be a ZBNF-defined component:
syntax::=
....
or a standard syntax, presented in the chapter above, for example <#?semanticOfNumber>
.
There are some specials:
<syntax>
means a ZBNF-syntax component with the same semantic identifier as syntax. It is a typical case if syntax components are only singletons. Writing <syntax?syntax> produces the same result.
<syntax?>
In this case there is no semantic assigned. If Zbnf2Xml is used, no own element is created for that ZBNF-syntax component. But the parse result of the component is assigned as child to the current result of the calling environment. It is an interesting special case. In some cases a complex sub-syntax may be defined in an extra syntax prescript, but this sub-syntax does not have the meaning of a ZBNF-component; it is only a sub-syntax. For example:
structDefinition::=struct [<$?typetagIdent>] \\{ { <structContent?> } \\} <$?name>;.
classDefinition::=class [<$?typetagIdent>] \\{ { [ <classContent?> | <structContent?> ] } \\} <$?name>;.
structContent::=<?> [ <unionDefinition> | <structDefinition> | <attribute> | <defineDefinition> | <structContentInsideCondition> ].
The inner content of structContent supplies ZBNF-components, but not the wrapper of this definition itself.
<?Semantic>
Thereby no ZBNF-component is required, but a semantic entry is created if this bough of syntax is passed.
<syntax?!subSyntax>
In this case no semantic is given, but the result of the ZBNF-Syntax-Component is evaluated with the given subSyntax. See Chapter: 8 Inner syntax of parsed text <syntax?!innerSyntax>.
<syntax?-semantic>
The - means that the semantic, respectively the parse result of this component, isn't assigned to the current result, but it is stored (see the next form). The semantic may be the same as the syntax identifier; then <syntax?-?> can be written.
<syntax?+semantic>
A stored result of a component is assigned additionally to this component. The semantic can be empty here, writing <syntax?+>, or the semantic may be the same as the syntax identifier, writing <syntax?+?>. The second question mark stands for the same semantic identifier. See Chapter: 9 Transformation of a semantic (parse result) to another ZBNF-component.
<syntax?%semantic>
The % sets a debug flag in the syntax item. It is helpful if a detail of the parser should be debugged, or if the evaluation of the parser's result should be debugged in an Integrated Development Environment such as Eclipse.
<syntax?@semantic>
The @ is a part of the semantic identifier. It is used especially for XML output, where it creates an attribute in the XML file.
<syntax?semantic=value>
The whole string from ? to the > or to the next ? is the semantic identifier. It can be evaluated for special cases.
<syntax?semantic?.attribute=value?.attr2="value2">
Stores one or more attributes with the given values in the syntax item. The values of the attributes can be given in "" or without quotation marks. The attributes can be evaluated while using the parse result. This feature is new since 2017-03.
<syntax?"!"semantic>
parses the component but stores the parsed input as a simple String with the given semantic. This feature is new since 2017-06. The form <syntax?"!"@> uses the same identifier for the semantic as for the syntax. The parsing process of the sub component is done correctly, but the detailed result tree is not used. This feature can be used if the text should be checked with the parser but used as a whole in the post-processing. Hint: Do not confuse this with <syntax?!subsyntax>. The latter usually uses a simple syntax like <*\n?!subsyntax> and parses the detail in the subsyntax.
<syntax?""semantic>
Stores both the parsed String and the semantic result of the component. It is similar to [<?semantic> ...syntax ...] but for a named sub component instead of an inlined component.
Topic:.ZBNF_syntaxDescription.semantic.semanticRules.
A main area of application is the conversion of free but syntactical texts to XML. Therefore the writing rules for semantic are oriented first toward the conversion to XML. But the evaluation of the parser's result in free Java programming, or using a reflection based writer from Zbnf-results to Java-instances described in javadoc:_org/vishia/zbnf/ZbnfJavaOutput or Topic:.ZbnfJava.ZbnfJavaOutput., is possible too. The ZbnfJavaOutput requires fixed rules for writing semantic, similar to the XML-output. The rules for both use cases are coordinated, so the same syntax script can be used for both post-processing variants. Free Java programming may accept all writing forms of semantic, but it should observe these same rules. A test output of any parse result as XML may be necessary or useful in some cases.
For XML-output the semantic in ZBNF determines the names of tags and attributes and, where applicable, children of the XML-tree. The structure of ZBNF-syntax-components determines the possible structure of the generated XML-tree. But the existence of actual data in the parsed input determines whether or not an XML-element or a bough in the tree is created.
For example, the following syntax:
syntax::= {<?set> <head> { <data> } } -end-.
head::= idx = <#?@index>.
data::= value = <#?value>.
with the following data:
idx=1 value=5 value=6 idx=123 value=7 value=23 -end-
creates the following XML-tree:
<syntax>
  <set>
    <head index="1" />
    <data><value>5</value></data>
    <data><value>6</value></data>
  </set>
  <set>
    <head index="123" />
    <data><value>7</value></data>
    <data><value>23</value></data>
  </set>
</syntax>
This example demonstrates, by the way, the basic idea of XML. The source data is shorter, but possibly no clear structure is recognizable. The XML tree contains a structure: a set with head and data. The semantic of the data is contained in the XML-text.
The semantic of any syntax element is written inside the <...?...> after the question mark or after the special designations ?!, ?+ or ?-, up to the closing >. In ZBNF it is:
syntax_component_call::=\<<syntax>?[!|+|-][<semantic>]\>.
or, better readable with italic characters in the printed variant:
syntax_component_call ::= < syntax [ ? [ ! | + | - ] [ semantic ] ] > .
The following writing rules for semantic regarding the XML necessities are permitted:
semantic |
Rule |
|
Writing a simple ident, a new element in XML-Output with this given ident as tag name is created and added to the current
XML-element. The For Java-output using javadoc:_org/vishia/zbnf/ZbnfJavaOutput the The associated Zbnf-parse-result is written as textual value of the element, if it is a simple parse result. If the Zbnf-parse-result is a component, its content will expand as child of the created element. For Java-output the found field or method is set respectively called with the Zbnf-parse-results value. Thereby the type of
parse result is regarded. If the parse result is a component, the field or return value of called |
|
Writing a For Java-output it is the same like |
|
A colon |
|
For XML-output an element with the tagname val1=<#?result/@val1>; val1=<#?result/@val2>; Both attributes { val= <#?result/@value> } That case is better to write in the following form. In that form the repetition creates an new element {<?result> val=<#?@value> } For Java-output an adequate behaviour is supposed: A field named with the name |
|
The kind of writing with slash on end is a special form of {<?result/> val=<#?@value> } produces only one element like <component?tag> ..other syntax...[ <componentPart?tag/> ] In the input text matching to two divided parts in syntax prescript should be written in the same XML-Element respectively in the same Java-object. The behaviour on JavaOutput is adequate. |
semantic
|
Especially for the form |
semantic
|
This is a special form. First it is a signal to the Zbnf-parser, it should store also the parsed input text of a component.
Second it is a signal to store this input text as text of the element. Use the form |
semantic
|
This is a special form for Zbnf2Xml-Conversion. It means, that an expansion of formatting in the node's result text using
Wikistyle will be done. The Zbnf2Xml-Converter calls the method |
semantic
|
It is the universal method to post-process the parser's result and expand it into XML children. A matching class, which implements the interface:_org/vishia/zbnf/Zbnf2Xml.PrepareXmlNode, should be named in the header of the ZBNF-script. This feature is not available yet; it is planned for version 1.1 of ZBNF. |
Topic:.ZBNF_syntaxDescription.semantic..
Immediately after the opening square bracket of an option it is possible to write a semantic specification:
[<?semantic> ...]
Then the option or repetition as a whole unit produces its own parse result with the specified semantic. If the option is not used, this semantic isn't produced: [<?semantic> really-option], or if the empty bough is possible: [<?semantic> A | B |].
An adequate behavior is given for repetitions:
{<?semantic> ...}
Each entry of the repetition produces a parse result with the given semantic. The content of the repetition is stored as child of this ZBNF-component. For example:
testRepetition::={ <head> : {<?dataBlock> <data?> | <info?> ? , } ; }.
head::= idx = <#?@index>.
info::= <""?@info>.
data::= <#f?@value>.
produces with the following data:
idx=5 : 7.34, 23, "text", 0.01; idx=0 : 34;
the following XML-tree (parse result):
<testRepetition>
  <head index="5" />
  <dataBlock value="7.34" />
  <dataBlock value="23.0" />
  <dataBlock info="text" />
  <dataBlock value="0.01" />
  <head index="0" />
  <dataBlock value="34.0" />
</testRepetition>
Topic:.ZBNF_syntaxDescription.semantic...
In addition to the creation of a component for the option or repetition, the parsed source for this option, alternative or repetition is stored as a String in the corresponding parse result item. The parser's result is evaluated either with srcJava_vishia_Zbnf/org/vishia/zbnf/Zbnf2Xml or srcJava_vishia_Zbnf/org/vishia/zbnf/ZbnfJavaOutput or another tool. Depending on the tool, the parsed String is placed in a proper way:
ZbnfJavaOutput searches a routine set_semantic(String text) or a field String semantic to store the String. If the String output is not necessary, but only the output of the sub components, the routine void set_semantic(String text) {} should exist, but it does nothing.
Additionally ZbnfJavaOutput searches a routine Syntax new_semantic() and set... or void add_semantic(Syntax obj) to create and store the sub result of the option, alternative or repetition. If there is no sub result, these routines are not necessary. That is the case, for example, for [<?semantic> TEXT1 | TEXT2].
If the semantic starts with @, as in @semantic, it is an attribute in XML which can only store a String, not a sub result. Accordingly no sub result object is created; only the String is stored in a variable semantic or via the operation set_semantic(String).
If the semantic is given as <?obj/@attr>, a new_obj() routine is searched for storing both the String, via set_attr(String), and the sub results.
Zbnf2Xml writes the String as text in an element, or as text in an attribute.
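The naming rules above can be illustrated with a small reflection sketch. It is not the implementation of ZbnfJavaOutput; it only assumes the lookup order described above (first a set_ routine, then a field of the same name). The classes DstSketch, Root and Item and the helpers storeString, newSub and addSub are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DstSketch {
  /** Hypothetical destination for the sub result of a construct like {<?item> ...}. */
  public static class Item {
    public String name;                                  // String stored directly in a field
  }

  /** Hypothetical destination class which follows the naming rules described above. */
  public static class Root {
    public String whichOption;                           // no set_ routine: the field is used
    public String text;
    public final List<Item> items = new ArrayList<>();
    public void set_text(String s) { text = s.trim(); }  // set_ routine is preferred
    public Item new_item() { return new Item(); }        // creates the sub result
    public void add_item(Item it) { items.add(it); }     // stores the filled sub result
  }

  /** Stores a String: tries set_SEMANTIC(String) first, then a public field named SEMANTIC. */
  static void storeString(Object dst, String semantic, String value) {
    try {
      try {
        dst.getClass().getMethod("set_" + semantic, String.class).invoke(dst, value);
      } catch (NoSuchMethodException e) {
        dst.getClass().getField(semantic).set(dst, value);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot store " + semantic, e);
    }
  }

  /** Calls new_SEMANTIC() to create the destination object for a sub result. */
  static Object newSub(Object dst, String semantic) {
    try {
      return dst.getClass().getMethod("new_" + semantic).invoke(dst);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot create " + semantic, e);
    }
  }

  /** Calls add_SEMANTIC(obj) to store the filled sub result. */
  static void addSub(Object dst, String semantic, Object sub) {
    try {
      dst.getClass().getMethod("add_" + semantic, sub.getClass()).invoke(dst, sub);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot add " + semantic, e);
    }
  }

  public static void main(String[] args) {
    Root root = new Root();
    storeString(root, "whichOption", "a");     // uses the field whichOption
    storeString(root, "text", "  some text "); // uses set_text(String)
    Object item = newSub(root, "item");
    storeString(item, "name", "x");
    addSub(root, "item", item);
    System.out.println(root.whichOption + " | " + root.text + " | " + root.items.get(0).name);
  }
}
```

A destination class written in this style needs no parser-specific base class; only the names of the routines and fields have to match the semantic identifiers.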
Topic:.ZBNF_syntaxDescription.semantic...
A special case is writing
[<?whichOption> a | b |]
If the alternatives have no own semantic, the parsed text without leading and trailing white-spaces is stored as the result with that given semantic. It is an important feature. For example, some assign operators are given in the form:
assignOperator::=<?> [<?@assignOperator> = | += | -= | *= | /= | &= | \|= | \<\<= | \>\>= ] .
Thereby the parse result of testing the operator variants is stored with the semantic assignOperator; the parse result is the operator itself, for example -=. The <?> means that no semantic is stored for the ZBNF-component itself. That is reasonable, because the semantic is produced in the option bracket (next chapter).
Topic:.ZBNF_syntaxDescription.semantic..
In the definition of a ZBNF-component it is possible to write:
component::=<?semantic> ...
Then the component is stored with the given semantic identifier instead of the component identifier in an XML output. For generated data (Chapter: 12.3 Generation of destination data classes for the parsing result) the class identifier is semantic instead of component. Hence it is possible for several components to have the same type. Of course the internal data should match; otherwise compile errors occur.
For storing data on usage of the component, the following behaviour applies:
If the ZBNF-component is called writing <component> or <component?@>, the semantic of the component definition is used.
If the ZBNF-component is called writing <component?specialSemantic>, the specialSemantic given here is used.
If the ZBNF-component is called writing <component?>, no semantic is produced for the whole component.
A special case of this is the form
component::=<?> ...
It means that the component has no own semantic by default. But the same rules as shown above are valid. For example, if <component?specialSemantic> is written, this component is stored with the given semantic. Only if a simple call is done, <component>, no semantic is produced. The marking may also be written at both places: the call can be written as <component?>. Thereby it is shown both at the calling position and at the definition that no semantic should be used.
Topic:.ZBNF_syntaxDescription.semantic.semanticHelp.
In the syntax script it is possible to explain a semantic in plain language. The explanation should be placed outside of syntax definitions, above or below. It should be written like:
?en:mySemantic::= "Explanation text. It may be more detailed. It is a help.".
?de:mySemantic::= "Erklärungstext, etwas umfangreicher als Hilfestellung".
It can be specified in several languages. The explanation should be written in quotation marks. A dot at the end is necessary. The related semantic need not be only an identifier of a ZBNF-component; it may be a semantic identifier inside a syntax definition too. But then it should be written as a path of semantic. For example:
definition::= <$?type> <$?name> [ = <#f?value> ]
?en:definition/value::="This value means this or that.".
This information isn't used in the parsing process. The parser accepts but ignores it while reading the syntax script. It is only documentation in the script, to be read manually. But for future extensions, especially for ZBNF-based editors, for example as an Eclipse plugin, this information may be used as context sensitive help while typing a part of the input text which matches the syntax.
Topic:.ZBNF_syntaxDescription.innerSyntax.
Especially if an input text is parsed like <*endChars?...> or <*|endstr?...> or <""?...>, the string result may additionally be evaluated with an inner syntax. The string is recognized first because it matches the outer syntax. After that, the inner syntax is tested with this provisional result. The result of that syntax test supplies the final parse result.
The notation form is:
<...?!syntax>
There isn't a semantic after the exclamation mark, but the identifier of the inner syntax as a ZBNF-component. If the inner syntax doesn't match, the outer syntax isn't recognized as matching either. Another bough in the outer syntax is tested then.
The special construct [> ...] for a required positive test may be used to prescribe the matching of the inner syntax, see Topic:ZBNF_syntaxDescription.syntaxControl.options.requiredPositive.
Topic:.ZBNF_syntaxDescription.transformationOfResult.
In some cases information is placed outside of a syntactical construct, but this information should be assigned to some inner syntactical content. The information outside may be written only once, but it applies multiple times in meaning. The style of writing a definition for several variables of equal type in C/C++ or Java is an example:
/**Description valid for all Variables. */
int a,b,c;
The variables a, b, and c are all of type int. The description should be assigned to all three variables too. This style of writing is sometimes shorter than:
/**Description valid for all Variables. */
int a;
/**Description valid for all Variables. */
int b;
/**Description valid for all Variables. */
int c;
The parse result of ZBNF should duplicate the common information, so that the short and the long variants do not differ in the result, because there is no difference in meaning. ZBNF knows a construct for this, written like:
attributeSyntax::= [/**<description?-?>*/] <type?-?> {<attributedef?+attribute> ?,};.
The example is part of Cheader.zbnf. The rule of writing is: If a ?- is written instead of a simple ?, then the content of that ZBNF-component isn't added to the parse result, but it is temporarily stored in a buffer associated with the syntax prescript. If more than one such construct is found, the information is accumulated in this temporary store. A writing style like ?-? means that the semantic for storing is identical with the syntax identifier. It is a short way to write <ident?-ident>. In contrast, ?-> means that the content of the ZBNF-component is stored but has no own semantic element.
The opposite is the construct with the chars ?+. If this ZBNF-component is parsed successfully, the currently stored temporary parse result of the syntax prescript is written into the component at its beginning. The temporary result is not cleared at that point. If a second ?+ is found, the same content is stored there as well. In the example the ?-?> is outside of the repetition brace, but the ?+...> may be repeated several times; each repetition's result gets the same common content from ?-...>.
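The effect of the temporary store can be modelled in a few lines of Java. This is only a sketch of the described behaviour and not the parser's implementation: the class TransferSketch, the method expand and the key name are hypothetical, and the parsing itself is left out. The entries for description and type stand for the items stored with ?-, each map in the result stands for one attribute component written with ?+.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TransferSketch {
  /** Builds one result per variable name; each one gets the commonly stored entries first. */
  static List<Map<String, String>> expand(String description, String type, List<String> names) {
    // Temporary store, filled by the "?-" items. It is not added to the result directly.
    Map<String, String> stored = new LinkedHashMap<>();
    stored.put("description", description);
    stored.put("type", type);

    List<Map<String, String>> result = new ArrayList<>();
    for (String name : names) {
      // "?+": the stored content is copied to the beginning of the component.
      Map<String, String> attribute = new LinkedHashMap<>(stored);
      attribute.put("name", name);
      result.add(attribute);
      // The store is not cleared here, so the next repetition gets the same content.
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map<String, String>> attributes =
        expand("Description valid for all Variables.", "int", Arrays.asList("a", "b", "c"));
    for (Map<String, String> attribute : attributes) {
      System.out.println(attribute);
    }
  }
}
```

With this model the short form int a,b,c; yields the same three results as the long form with three separate definitions.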
Another commonly presented example may be the following:
syntax::=<HeadInfo?-?> <AnotherInfo> [ <Variante1?+?> | <Variante2?+?> | <Variante3?+?> ]
In the XML presentation the HeadInfo should be part of the Variante... but the <AnotherInfo> should be outside. The textual content is written in the given form, simple and easy, without consideration of an XML representation. The XML representation converted by Zbnf2Xml will be produced for example as:
<AnotherInfo>...</AnotherInfo>
<Variante2><HeadInfo>...</HeadInfo> ... </Variante2>
Note: At first this variant of temporarily storing parse results was developed to improve the calculation time while parsing. Instead of writing
syntax::= [ <Variante1> | <Variante2> | <Variante3> ]
Variante1::= <HeadInfo> <RestOfVariante1>.
Variante2::= <HeadInfo> <RestOfVariante2>.
Variante3::= <HeadInfo> <RestOfVariante3>.
the <HeadInfo> is parsed and recognized once, after which the variants are tested. The long form repeats the detection of <HeadInfo> if the first <Variante1> doesn't match, and so on. That is suboptimal for the calculation time of the parsing process. But such constructs to save calculation time aren't good for the structure description in the syntax source. Now it is planned (not ready yet, release 1.1) that the result of a matching input isn't purged if a syntax bough doesn't match. If a second bough starts with the same syntax, it is recognized because the syntax ident is stored in the result items, and it isn't tested twice.
Topic:.ZBNF_syntaxDescription.controlVariables.
This chapter shows all control variables. A control variable is placed outside of any syntax definition, typically at the beginning of the syntax script before the first syntax definition. If an imported syntax script also contains control variables, they are accepted too. If a control variable which should be written only once is written more than once, the last setting is valid. All control variables are read before the parsing process starts.
$import "path".
The named syntax script is imported at this position. path is the absolute path, or the path relative to the current script file, of a file containing a ZBNF-script to import. The imported script may contain some used syntax-definitions. This control variable can be written more than once to import more than one script.
$keywords={ keyword ? | } .
Some identifiers are stored as keywords. If an identifier is parsed, writing <$..?...>, and the parse result is equal to any of the keywords, the test is declared as non-matching. Thereby a false match can be prevented. For example, if the input text contains public final int x and the syntax script contains
$keywords=private|public|protected|final|static.
declaration::=<type> <$?name>.
the input doesn't match. final is an identifier and would match <$?name>, but it is a keyword and therefore doesn't match.
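The keyword test can be sketched as follows. This is an illustration of the rule only, not the parser's code: the class KeywordSketch and the method parseIdentifier are hypothetical, and the identifier characters are approximated with the rules for Java identifiers, which is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class KeywordSketch {
  /** The keywords of the example above. */
  static final Set<String> KEYWORDS =
      new HashSet<>(Arrays.asList("private", "public", "protected", "final", "static"));

  /** Returns the identifier starting at pos, or null if there is none or if it is a keyword. */
  static String parseIdentifier(String input, int pos) {
    int end = pos;
    if (end < input.length() && Character.isJavaIdentifierStart(input.charAt(end))) {
      end++;
      while (end < input.length() && Character.isJavaIdentifierPart(input.charAt(end))) {
        end++;
      }
    }
    if (end == pos) {
      return null;                                  // no identifier at this position
    }
    String ident = input.substring(pos, end);
    return KEYWORDS.contains(ident) ? null : ident; // a keyword counts as non-matching
  }

  public static void main(String[] args) {
    System.out.println(parseIdentifier("final int x", 0)); // keyword, so no match
    System.out.println(parseIdentifier("int x", 0));       // a normal identifier
  }
}
```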
$setXmlsrcline.
Writes the attributes srcline="99" srccol="99" in any node of an XML output file if that information is available. When parsing simple, usually short Strings that information is not available. But if a file is parsed, that information should be given.
$setXmlsrctext.
Writes an attribute srctext="parsed text" in any node of an XML output file if that information is available. With it, the parser's result can be related to the source text.
$setLinemode.
The line-mode is activated for all syntax-prescripts. It means that a line feed character \n resp. 0x0a isn't accepted as white-space. The line-mode should be used if the input text is line-oriented. The line feed itself should be tested as terminal character, writing \n. A character \r or 0x0d is accepted as white-space.
If the line-mode isn't activated, the \n is a white-space by default. But see $.
$endlineComment=startsequence.
Another start-sequence for endline-comments in an input text is set. The default start-sequence is // like in C and Java. Note that the start sequence for endline comments in the syntax-script is ##, independent of this setting. The start sequence should not be longer than 5 characters. All characters between = and the terminating . are accepted as sequence. Don't write additional white-spaces! But escaping with \ can be used. For example \. means a dot as start-sequence-character.
For example: $endlineComment=???. or $endlineComment=\.\.\.. In the second example three dots are the start sequence of endline comments.
$comment=startsequence...endsequence.
Another start- and end-sequence for comments within a line or over several lines in an input text is set. The default start- and end-sequence is /* and */ like in C and Java. The sequences should not be longer than 5 characters. All characters between = and ... respectively between ... and the terminating . are accepted as sequence. Don't write additional white-spaces! But escaping with \ can be used. For example \. means a dot as start-sequence-character.
For example: $comment=[?...?]. or $comment=[\....\.]. In the second example comments are written [.between that.].
$inputEncodingKeyword="encoding-detect-string".
The key-string to detect an encoding in the first line of an input text is defined. The input text should contain the char sequence
encoding-detect-string="encoding"
where encoding is one of the known encoding identifiers such as ISO-8859-1 or UTF-8. See Chapter: 11.2 Definition of encoding of the input file to parse.
$inputEncoding="encoding".
The encoding of the input text is defined here. This definition may be used together with $inputEncodingKeyword="...". If no input encoding keyword is found in the input text, this given encoding is valid.
$xmlns:namespacekey="namespace-url".
An XML-namespace is defined. The namespace-key can be used in semantic identifiers while a conversion to XML is done with Zbnf2Xml. The namespace declaration is also written to the XML output text. This control variable can be written more than once to have several namespace declarations.
$main=syntax-definition
The syntax-definition designated in this way is used as the main syntax, valid for the top of the input text (the whole input text). If such a control variable isn't used, the first syntax-definition is the main definition.
Topic:.ZBNF_syntaxDescription.encoding.
Topic:.ZBNF_syntaxDescription.encoding.syntaxScript.
A text given as content of a file is written in a specific encoding. Some common encodings are used, at least UTF-8 and some 8-bit extensions of ASCII like ISO-8859-1. See http://en.wikipedia.org/wiki/ASCII. The UTF-16 format isn't used frequently but it may be considered as well.
In a ZBNF script file the encoding can be defined similar to XML: The first line starts with a head information, which also contains the encoding. This line contains only characters from 7-bit US-ASCII, which are identical in all these encoding tables. UTF-16 may be detectable too, because every second byte is 0.
The first line should be written in the form:
<?ZBNF-www.vishia.org version="1.0" encoding="ISO-8859-1" ?>
If the ZBNF-script is given as a String inside Java, this line isn't necessary because inside Java UTF-16 is used. It is necessary only for file input of the ZBNF-script.
Topic:.ZBNF_syntaxDescription.encoding.input.
The ZBNF-Parser used as class:_org/vishia/zbnf/ZbnfParser reads the text in an internal String form. Java uses UTF-16 encoding internally. From this view the encoding is a matter of the environment of the parser call.
In the Zbnf2Xml-Application, called from command line, a file is read. At this level it should be possible to define the encoding in a proper way. Possibly the encoding of a file is not in question, because its content format is already given. But it may be that the encoding can be defined in the input file, if the format is free at first. Therefore there are some possibilities or variants:
The encoding of the input is defined in the ZBNF-Script directly, writing for example
$inputEncoding="ISO-8859-1".
Then the Zbnf2Xml converter reads the required encoding from the ZBNF-script file.
The encoding of the input is given in the input text itself. Then a keyword can be defined in the ZBNF-script, for example:
$inputEncodingKeyword="My-Encoding".
The rule for this is: The first line of an input file may contain that keyword, read in 7-bit-US-ASCII, followed by an = and the encoding string either in "" or not. This given encoding is used. Thereby only the first 250 chars of the first line are tested. For example:
First line of the file... with some head information, My-Encoding="UTF-8", some others...
If the first line doesn't contain such a String, it is also accepted. Then one of the other possibilities is used.
The encoding may be given as command line argument of Zbnf2Xml.
The same possibilities can be used for a user's application of ZBNF parsing. The possibility to define the encoding or the encoding keyword in the ZBNF-script is a part of the definition of ZBNF.
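The rule for the encoding keyword in the first line can be sketched in Java as follows. It is a minimal sketch and not the code of Zbnf2Xml: the class EncodingSketch and the method detectEncoding are hypothetical, and the tolerance for spaces around the = as well as the set of characters accepted in an unquoted encoding name are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {
  /** Searches keyword=encoding in the first 250 chars of the first line; returns null if not found. */
  static String detectEncoding(String firstLine, String keyword) {
    String head = firstLine.length() > 250 ? firstLine.substring(0, 250) : firstLine;
    int pos = head.indexOf(keyword);
    if (pos < 0) {
      return null;
    }
    pos += keyword.length();
    while (pos < head.length() && head.charAt(pos) == ' ') pos++;   // assumption: spaces allowed
    if (pos >= head.length() || head.charAt(pos) != '=') {
      return null;
    }
    pos++;
    while (pos < head.length() && head.charAt(pos) == ' ') pos++;
    if (pos >= head.length()) {
      return null;
    }
    int end;
    if (head.charAt(pos) == '"') {
      pos++;                                   // encoding written in quotation marks
      end = head.indexOf('"', pos);
      if (end < 0) {
        return null;
      }
    } else {
      end = pos;                               // encoding written without quotation marks
      while (end < head.length()
          && (Character.isLetterOrDigit(head.charAt(end)) || head.charAt(end) == '-' || head.charAt(end) == '_')) {
        end++;
      }
    }
    return end > pos ? head.substring(pos, end) : null;
  }

  public static void main(String[] args) {
    String line = "First line of the file... with some head information, My-Encoding=\"UTF-8\", some others...";
    String name = detectEncoding(line, "My-Encoding");
    // Fallback if no keyword is found, comparable to a fixed encoding setting in the script.
    Charset charset = name != null ? Charset.forName(name) : StandardCharsets.ISO_8859_1;
    System.out.println(charset);
  }
}
```

Since the first line contains only 7-bit US-ASCII characters in the relevant part, it can be read with any 8-bit encoding before the real encoding is known.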
Topic:.ZBNF_syntaxDescription.parseResult.
Primarily the result is stored in a simple ArrayList. Each result item contains its index, its length and the reference to its parent. Hence a tree of result items can be built. The first result item of the ArrayList can be obtained from the srcJava_Zbnf/org/vishia/zbnf/ZbnfParser by invoking getFirstParseResult(), which returns the first item of type srcJava_Zbnf/org/vishia/zbnf/ZbnfParseResultItem.
The parser's result can be evaluated immediately with this result tree. But usually it is more effective to store the data in dedicated Java class instances (Chapter: 12.2 Writing the parsing result in Java data for further processing). This raw data form is nevertheless the basis both for Java output and XML output.
In the past it was considered to change the primary data storing mechanism and to primarily produce a tree of data which follows the syntax component structure. This consideration was oriented to either an XML node tree or a tree of common nodes. But in the end the original solution from 2005 was retained. It is a plain data structure (ArrayList) for storage while parsing, and a tree structure because of the specific elements parent and offsetAfterEnd, which are used to build a special iterator iteratorChildren(). The view of data in debugging is simple and direct, better than a tree view.
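The idea of a flat list which is nevertheless a tree can be shown with a small model. It is a sketch only and not the code of ZbnfParseResultItem: the names parent and offsetAfterEnd are taken from the text above, but their exact meaning here (index of the parent, distance to the first item after the own subtree) is an assumption, and the class FlatTreeSketch with its methods add and children is hypothetical. The sketch also assumes that items are appended in the order in which the parser finds them, so that a parent always precedes its children.

```java
import java.util.ArrayList;
import java.util.List;

class FlatTreeSketch {
  /** One result item in the flat list. */
  static class Item {
    final String semantic;
    final int parent;        // index of the parent item, -1 for the top level item
    int offsetAfterEnd = 1;  // distance from the own index to the first item after the subtree
    Item(String semantic, int parent) {
      this.semantic = semantic;
      this.parent = parent;
    }
  }

  final List<Item> items = new ArrayList<>();

  /** Appends an item as child of the item at index parent and enlarges the subtrees of all ancestors. */
  int add(String semantic, int parent) {
    int index = items.size();
    items.add(new Item(semantic, parent));
    for (int p = parent; p >= 0; p = items.get(p).parent) {
      items.get(p).offsetAfterEnd = index + 1 - p;
    }
    return index;
  }

  /** Returns the semantic of all direct children; the subtree of each child is skipped. */
  List<String> children(int index) {
    List<String> result = new ArrayList<>();
    int end = index + items.get(index).offsetAfterEnd;
    for (int i = index + 1; i < end; i += items.get(i).offsetAfterEnd) {
      result.add(items.get(i).semantic);
    }
    return result;
  }

  public static void main(String[] args) {
    FlatTreeSketch tree = new FlatTreeSketch();
    int root = tree.add("syntax", -1);
    int set = tree.add("set", root);
    tree.add("head", set);
    int data = tree.add("data", set);
    tree.add("value", data);
    tree.add("set", root);
    System.out.println(tree.children(root));
    System.out.println(tree.children(set));
  }
}
```

All items stay visible one after another in the list, which corresponds to the simple and direct view in the debugger mentioned above, while the iteration over children still follows the tree.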
Topic:.ZBNF_syntaxDescription.parseResult..
For parsing and XML output the srcJava_Zbnf/org/vishia/zbnf/Zbnf2Xml can be used. It was the first intention in 2005 to use the parser's result in this way. Post processing can be done with XSLT; that was the goal in 2005. Meanwhile the XML output is often used first to check the syntax, because the conversion to XML does not depend on other resources such as a proper destination data class (using ZbnfJavaOutput), and the result can be viewed with a standard XML viewer with folding mechanism or as plain text with some search possibilities. Some text editors, for example Notepad++, support search and folding.
Topic:.ZBNF_syntaxDescription.parseResult.ZbnfJavaOutput.
Last changed: 2019-12
This is a common application of the parser: The data should be processed with any Java algorithm. The srcJava_Zbnf/org/vishia/zbnf/ZbnfJavaOutput does this work. There are some operations which use a given ZbnfParser with an already set syntax (for reuse with several files) and some operations which instantiate the parser.
Typically it may be used as follows:
ZbnfJavaOutput parserJava = new org.vishia.zbnf.ZbnfJavaOutput(log);
parserJava.parser.setSyntax(sSyntax);
The parsing process can be done by invocation:
MyDstData dataRaw = new MyDstData_Zbnf();
String sError = this.parserJava.parseFileAndFillJavaObject(dataRaw, srcfile);
if(sError !=null) {
  System.err.println(sError);
} else {
  ... evaluate dataRaw ...
}
The class MyDstData_Zbnf should follow the syntax definition. It can be written manually (with some special primary handling of the parse result) or it can be generated using Chapter: 12.3 Generation of destination data classes for the parsing result (next chapter). The parser's result is stored using the reflection mechanism of Java. It is possible to write (or, in future, to generate) an algorithm which writes the data immediately. But the reflection access is usually fast enough for common purposes.
The class ZbnfJavaOutput has been written and used since 2006 and has handled many requirements.
Topic:.ZBNF_syntaxDescription.parseResult.GenZbnfJavaData.
Last changed: 2019-06
It is possible to generate Java destination classes from a given syntax using srcJava_Zbnf/org/vishia/zbnf/GenZbnfJavaData. It should be called:
java -jar .../zbnf.jar org.vishia.zbnf.GenZbnfJavaData -s:SYNTAX.zbnf -dirJava:../../srcJava_XXX -pkg:my.package -class:MyDstClass
It produces two classes: MyDstClass.java
for the data itself and MyDstClass_Zbnf.java
to write the data using the .
The class GenZbnfJavaData has been written only since May 2019, but it is already used in some projects, for example FBcl.