+ ? * in bnf specification
category: code [glöplog]
Dear lazyweb, could someone tell me what the * + ? means in this BNF? I don't think they're tokens
http://www.verilog.com/VerilogBNF.html
http://www.verilog.com/VerilogBNF.html
It says in the definition table what they mean. * is 0 or more. + is 1 or more, and ? means it may or may not exist.
thanks
Verilog .. NOOOOOOOOOOOooooooooooooo.............
Ok, another thing. How do I deal with operator matching when the operator tokens overlap? Look at BINARY_OPERATOR and UNARY_OPERATOR. They both have '+' and '-', for instance. My tokenizer isn't aware of the parser so I have to match it with zero-knowledge.
You can introduce a third token kind.
UNARY_OR_BINARY is one of the following tokens: + -
<unary_operator>
::= UNARY_OPERATOR
||= UNARY_OR_BINARY
<binary_operator>
::= BINARY_OPERATOR
||= UNARY_OR_BINARY
But I usually have a different token for each operator (PLUS, MINUS, TIMES...) and let the parser deal with it. It's useful in case all the binary operators don't have all the same precedence. (which happens almost all the time).
UNARY_OR_BINARY is one of the following tokens: + -
<unary_operator>
::= UNARY_OPERATOR
||= UNARY_OR_BINARY
<binary_operator>
::= BINARY_OPERATOR
||= UNARY_OR_BINARY
But I usually have a different token for each operator (PLUS, MINUS, TIMES...) and let the parser deal with it. It's useful in case all the binary operators don't have all the same precedence. (which happens almost all the time).
Binary and unary operators won't necessarily occur in the same context, so if your parser is LL(x) the fact that they overlap may not be blocking.
I'm very curious what you are making, sigflup. Please tell us more.
LLB> yeah that's what I was thinking about doing.
ponce> but my tokenizer (lex) is not aware of context. I could write it into the parser grammar (yacc), if it's not consumed by a different token rule, which I don't think it is. Thank you
trc_wm> slowly a schematic editor for verilog. Did I mention slowly?
ponce> but my tokenizer (lex) is not aware of context. I could write it into the parser grammar (yacc), if it's not consumed by a different token rule, which I don't think it is. Thank you
trc_wm> slowly a schematic editor for verilog. Did I mention slowly?
Hmm.. OUTPUT_SYMBOL and LEVEL_SYMBOL overlap too. This can't be written into the parser grammar because they could very well be tokenized and IDENTIFIERS.
I think there is something fundamental that I'm not getting here.
I think there is something fundamental that I'm not getting here.
well.. it's on stack-overflow right now http://stackoverflow.com/questions/16613587/to-tokenize-terminals-or-write-them-into-parser-grammar
Yes, it's the same: Handle it in the grammar.
<level_symbol> ::= <OUTPUT_SYMBOL> | ? | b | B
Lexers are dumb, do the clever thing in the parser (btw, I never liked the distinction between lexers and parsers - I try to get rid of the lexer when I can).
<level_symbol> ::= <OUTPUT_SYMBOL> | ? | b | B
Lexers are dumb, do the clever thing in the parser (btw, I never liked the distinction between lexers and parsers - I try to get rid of the lexer when I can).
LLB> you'd like dparser then. It very much blurs the lexer and the parser. I finally ended up settling on it because there was an already written grammar.
This is about one of my favourite subjects of computer science: formal languages.
There is a hierarchy of formal languages named after the famous linguist Noam Chomsky. The top of the hierarchy is the class of regular languages. Such languages can be specified by regular expressions, that is expressions that consist of literals as well as the operators + (allows the previous literal to be appended once or not at all) and * (allows the previous literal to be appended as many times as desired or not at all). Moreover, it is possible to use brackets in order to signify that several literals together form one kind of "super-literal" to which the operators + and * can be applied as well.
BNF is for context-free languages, which is the next class of languages below regular languages. Since any regular language is also context-free, regular expressions may be used in BNF as well. So this explains why these symbols appear in your code.
The next class of languages would be context-sensitive languages, followed by general languages. Moreover, theoretical computer science knows recursive languages, which are a proper superset of context-sensitive languages, and recursively enumerable languages, which are a proper superset of recursive languages. Furthermore, there are co-recursively enumerable languages, which are a proper superset of recursive languages as well. The class of recursive languages is the intersection of the classes of recursively enumerable and co-recursively enumerable languages. For recursive languages it is possible to specify abstract machines known as Turing deciders which accept a word if it is an element of the language and reject it if it is not. For recursively enumerable languages it is possible to specify Turing machines which accept a word if it is an element of the language and either reject it or enter an endless loop if it is not. For co-recursively enumerable languages it is possible to specify Turing machines which either accept a word or enter an endless loop if it is an element of the language and reject it if it is not.
A decision problem is a problem that is about deciding whether a given word is part of a given language. If the language is recursive, the problem is called decidable. If the language is recursively enumerable, the problem is called semi-decidable. This is the link between formal languages and computability theory.
There is a hierarchy of formal languages named after the famous linguist Noam Chomsky. The top of the hierarchy is the class of regular languages. Such languages can be specified by regular expressions, that is expressions that consist of literals as well as the operators + (allows the previous literal to be appended once or not at all) and * (allows the previous literal to be appended as many times as desired or not at all). Moreover, it is possible to use brackets in order to signify that several literals together form one kind of "super-literal" to which the operators + and * can be applied as well.
BNF is for context-free languages, which is the next class of languages below regular languages. Since any regular language is also context-free, regular expressions may be used in BNF as well. So this explains why these symbols appear in your code.
The next class of languages would be context-sensitive languages, followed by general languages. Moreover, theoretical computer science knows recursive languages, which are a proper superset of context-sensitive languages, and recursively enumerable languages, which are a proper superset of recursive languages. Furthermore, there are co-recursively enumerable languages, which are a proper superset of recursive languages as well. The class of recursive languages is the intersection of the classes of recursively enumerable and co-recursively enumerable languages. For recursive languages it is possible to specify abstract machines known as Turing deciders which accept a word if it is an element of the language and reject it if it is not. For recursively enumerable languages it is possible to specify Turing machines which accept a word if it is an element of the language and either reject it or enter an endless loop if it is not. For co-recursively enumerable languages it is possible to specify Turing machines which either accept a word or enter an endless loop if it is an element of the language and reject it if it is not.
A decision problem is a problem that is about deciding whether a given word is part of a given language. If the language is recursive, the problem is called decidable. If the language is recursively enumerable, the problem is called semi-decidable. This is the link between formal languages and computability theory.