Godpatterns: The Wonderful World Of Grammars Part II: A Lua Grammar Parser

This article follows on from my previous introduction into the wonderful world of grammars. In this section we'll be mainly writing in Lua code, only using C# to polish our Lua / Grammar editor. By the end we'll have muddled through and have something that parses a very basic grammar. This can then be extended upon allowing us to investigate a range of different grammars.

Let us begin

Now we wish to write some LUA code to parse our grammar and then later do a run through it and present the output (this will be covered in part 3). We'll start with small steps and work our way up. It's also handy at this point to have some idea of what our grammar will look like and how it will be formatted. I imagine rules will basically fall like this:


[non-terminal] -> [mix] of [non-terminals] [and] terminals

There's a problem with this formatting though - how do we recognise where one rule ends and another rule begins? We could have each rule on a new line. This is pretty elegant but what if your rule is quite complicated and you'd like to space it out? Then the one line rule might get in the users way.

The best way would be to have the computer handle it. As you write - it labels the rules. A small sidebar marks the start of every rule with a number 1..n. This still leaves the problem of how to seperate the rules by computer though.

A simple way, the way I'm going to use is to have all rules of the format:


([non-terminal]) -> ([mix] of [non-terminals] [and] terminals)

That way it's easier to know when a rule ends. As I said we're starting simple, we can refine later. The first thing we're going to do is have the entire Grammar read into a lua variable. Then lua will read out the rules in a numbered manner. If you know nothing about LUA programming, this is a really good time to learn. Though, it's quite possible that you can pick it up on the way. Okay - let's do our simple task.

Detecting A Rule

Usually we'd have Grammar to define our Grammar and the work off this. This grammar would say things like "non-terminals are enclosed in [,]'s, terminals aren't" but I'd rather wing it as I'm lazy and it shouldn't be that complicated. Let's have a look at some pseudo code for reading a rule.


ReadARule

First non-space character should be '('
If it's not report Then an error and a line number

Eat(
ReadANonTerminal
Eat)

ReadAnArrow

Eat(
ReadASetofNonTerminalsAndTerminals
Eat)

So yes this might be familiar because yes we're kinda writing a parser for a language. But our language is very simple and specialized so not to worry. They key is to be very strict and be quite willing to throw out error messages left, right and center. Of course this makes it very easy to be the programmer perhaps slightly harder to be the user. We could have the computer intercept these errors and then show the user exactly where his error seems to be on the form. Maybe some time in the distant future.

Enough babling though let's see if we can't turn the above into Lua code. We don't have to exactly follow the suggested functions in the pseudo but it does seem to provide some clarity. Here's some code to start us on our path:


--CFG  Version 0.01

--Globals
noOfRules = 0; --the number of the rule we're currently parsing


--Functions
function ReadARule(rules)

Output(noOfRules .. ": " .. rules);
noOfRules = noOfRules + 1;
end

--Main Code


ReadARule(GRAMMAR);

From this you can probably pick up on a few things. "--" indicate comments much like "//" in C#, C and various other languages. The function doesn't use braces, instead it has the keyword end. And Lua is dynamically typed so we don't have to explicitly say what our variables are actually going to be.

What does the code do? Well it reads the entire grammar and then put's the number 0 in front of it and outputs it. So we've solved the problem for the one-rule grammar - yay us! ".." is string concatination. We'll read through the grammar and extra rules. Each time we extract a rule we'll number it. Then we'll output all this information. Eventually we're going to be filling in a data structure with these rules so we have to seperate them into their constiuent parts. With the above code we're well on our way.

Polishing Small Flaws

Writing this though has brought one or two potential problems to light. One, which I know about, is that writing in Lua, if you make a mistake then the entire program will crash. We'll deal with this but not now. Another problem is that Ouput erases the entire contents of the output box - I think this is more of a feature and while it can be changed doesn't need to be. Especially now we know how to use the "..". The last one is when we press the GO button - it doesn't save the LUA content to disk. So if it crashes we lose our changes. This is a bit unforgivable so we'll rectify it now.

It's simple enough - so to the GO button function and before anything is done call the Save Lua File button pressing handler. The code goes like so:


private void buttonGo_Click(object sender, System.EventArgs e)
{
 //Save Lua file
 menuItem3_Click(sender, e);
 lua["GRAMMAR"] = textBoxGrammar.Text;
 lua.DoString(this.textBoxLua.Text);
}

An improvement on this would be to disable the GO button until there is both some LUA code and some Grammar stuff too. Let's do that then. First we'll add some code to the editing the Lua box. Like so:


private void textBoxLua_TextChanged(object sender, System.EventArgs e)
{
 if(textBoxLua.Text != "")
     buttonGo.Enabled = true;
 else
     buttonGo.Enabled = false;

Then to make everything work well we should also add some enable / disable code when loading a LUA file.


public void ReadLuaFile(string fileName)
{
 if(fileName.EndsWith(".gin"))
 {
     try
     {
         textBoxLua.Clear();
         IFormatter formatter = new BinaryFormatter();
         Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read);
         textBoxLua.Text = (string) formatter.Deserialize(stream);
         stream.Close();

     }
     catch (Exception e)
     {
         // Let the user know what went wrong.
         MessageBox.Show(this, "The file could not be read.",e.Message);
     }
 }
 else // save as plain text
 {
     //the user might want to call it .txt or .lua or whatever
     //no restrictions will be placed on the extension but only plain
     //text will be written.

     using (StreamReader sr = new StreamReader(fileName))
     {
         textBoxLua.Text = sr.ReadToEnd();
     }
 }
 
 if(textBoxLua.Text != "")
     buttonGo.Enabled = true;
 else
     buttonGo.Enabled = false;
 
}

Select buttonGo in the form designer, go to properties and set enabled to false. That should cover all our bases. There's one last thing that's worth polishing - stay in the form designer and select the Grammar textBox. Set AcceptsTabs to true. Go the luaTextBox and do the same. This means when we press tab we'll move along a tab in the text box. Instead of moving to another control on the form. This will help when we're formating our code.

Back To Writing Our Parser

To break up our rules we need to do some pretty intensive string processing. Therefore we'll want to use LUAs string library. This is something we have to say we're going to use in our C# code. So here we go:


public Form1()
{

 InitializeComponent();

 //Do Lua stuff
 lua.OpenStringLib();
 lua.RegisterFunction("Output" , this, this.GetType().GetMethod("luaWriteToOutput" ));

That was easy wasn't it? :D

Writing A Parser

Isn't really a beginner topic but whatcha gonna do? Ours is pretty simple and we have lua's groovy string functions to help us. In fact if I was feeling really clever, which I'm not, I'm a sure a program could be written that would strip out our rules - really simply. The only thing that worries me is the error states. What if it's given a malformed grammar can it report where it had the problem? I can't think of an easy way to do this so I'm going to handle it the old fashioned way (kind of, I'm new to lua and don't know much about it's gubbins)

First thing we want to do is check that's there a leading bracket. This is the only possible thing we can have because we can't actually make a run on an empty grammar! So if it's not there then we'll report an error. If it is there then we'll remove it, thus preventing us from detecting it again. Our first program then should be able to perform the following:

It kills off that first bracket. So we need a function that will do this and handle any possible error.

Read A Bracket Function

We're going to add three globals. Two are for error reporting. The third is the current state of the grammar we reading. As we read the grammar we're going to remove the bits we've read. For now we're storing this in a seperate global. Let's have a look at these globals then:

--Globals
noOfRules = 0;

readingGrammar = GRAMMAR;

error = false;
errorString = "no error";

That's not too bad. We're going to have code, that if an error is detected, then the error message is displayed to the Output window. You'll see how this works shortly. Let's look at the function that will eat a '('. It demands that the next non-space character is a '('. If this is not the case it returns an error. If it is the case then it updates the readingGrammar so that the paranthesis has been eaten. Here it is in code form:


function EatOBracket(s)
_, bracketstart = string.find(s, "%(");

--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find an opening (";
return "Error"
end

--Check all characters before this are empty
--If not then error

prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
error = true;
errorString = "Expected ( got: " .. prefix;
return;
end

--Remove all read characters
readingGrammar = string.sub(s, bracketstart+1);

end

If you're as new to Lua programming as I am, this may take some explaining. I've called it EatOBracket for eating an opening bracket. Let's take this apart piece by piece.


_, bracketstart = string.find(s, "%(");

On the assigment side are two variables "_" and "bracketstart". "_" is a throw away variable I don't care what goes in there. This is a standard lua naming convention. string.find looks through the string and returns the first instance of (. It then returns the start position and the end position. As it's only one character long we only need to record one of these values. I chose to record the end position in a a variable I confusingly called bracketstart. That's all straight forward the thing that may seem a little out of place is the "%" mark. This is because ( is a special character and to say we don't want to make use of it's special use, rather we just want to search for it, then we must prefix it with an escape character - namely %. Mystery solved.


--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find an opening (";
return "Error"
end

If ( can't be found then it returns nil (think null) for both the start and end value. We check if nil has been returned and if it has then we have an error to deal with. So we set up our Global error variables and return the string; "Error". If nil hasn't been returned then we know where the first '(' is, so let's assume that's the case and get on with the code.


prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
error = true;
errorString = "Expected ( got: " .. prefix;
return;
end

This code says "Does the next ( appear cleanly in the code or is there something before it?". For instance the following would be an error.


([Dog]) -> ("Frodo") "Spot"
([Cat]) -> ("Cuddles")

But unless we added the above code we wouldn't have been able to pick up on it. So lets check out that last piece of error checking code.

We assign prefix to be the sub string of all characters preceeding the (. The next line says if there are any non-space characters present then .... "%S+" means non-space characters.


%s means space characers
%S means the complement of s. (that is all non-space characters)
+  means one or more repetitions.

So we're asking can we find any non-space characters, if the answer is "No, we can't" then we can proceed. If we can find other characters then we inform the user of the error. Provided we get past both these error tests then we can remove the '('.


--Remove all read characters
readingGrammar = string.sub(s, bracketstart+1);

This says that everything starting after the ( character we to the end we cut out and assign to reading grammar. If you don't put a third argument here it defaults to -1 which is the last character of a string. (All this useful information is avaliable in Chapter 20 of Programming in Lua.)

Now we're ready to test this out. Though to do this we need a little bit more framework. Last time we had an early ReadARule function so let's modify that a bit.


function ReadARule(rules)

--Eat The First Bracket
if(EatOBracket(rules) == "Error") then return end;

Output(noOfRules .. ": " .. readingGrammar);
noOfRules = noOfRules + 1;
end

Notice the rather crude error testing we've added in :D This will serve. Next we have our "main" function.


--Main Code


ReadARule(GRAMMAR);

if(error == true) then
Output(errorString)
end

We attempt to read one rule. If during the course of our reading an error flag is raised then we output that error (output clears the text box before writing so we'll only ever see the error). Now you can play about with different grammar inputs and make sure only the correct code is accepted. The output window will display an error message or it will display the string you entered with the first ( removed.

Reading a non terminal

We expect a non-terminal to be enclosed in square braces. Within the square braches can be anything apart from a closing square brace. You'd actually think we'd be able to share code with previous bracket detection thing - I'd tried this and struggled to get it to work. Passing in %[ just didn't seem to want to go, maybe I'm missing something obvious. I didn't bother for too long before I soldiered on!

We want the meaty goodness of the non-terminal. But where do we store this meaty goodness? Well with LUA this isn't a very hard choice it having the only one major data structure. A TABLE! (notice how this all links back to the first tutorial - it's not planning, it's magic). Our table isn't too hard to set up. Another global and away we go.


--Globals
ruleTable = {};

At this point you might be thinking, Dan, this seems to be sloppy coding whats with all the globals. Luas a prototypingy, scriptingy language and I'm experimenting not laying this code down forever. That's why! It can be done of course in lua, it's tables are really objects! :o Suprise! But this isn't something I'm going to get into here as I wouldn't know what I'm talking about (even moreso than usual).

Anyway CODE!


function ReadNonTerminal(s)
--must begin and end with [,]'s

_, bracketstart = string.find(s, "%[");

--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find an opening [";
return "Error"
end

_, bracketend = string.find(s, "%]");

--Error Checking
if(bracketend == nil) then
error = true;
errorString = "Expected to find an closing ]";
return "Error"
end

--more error checking nothing occurs before the non-terminal

prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
error = true;
errorString = "Expected [ got: " .. prefix;
return "Error";
end

--remove this section from the global grammar, yet add to the
--current rule.
ruleTable[1] = string.sub(s, bracketstart, bracketend);
readingGrammar = string.sub(s, bracketend+1);


end

So we find a starting [ and check that one exists, then we find the ending ] and check that exists too. Once we're satisfied of their existance we check nothing comes before the non-terminal that shouldn't, errant characters or numbers, that kind of thing. If we're still okay then we add the non-terminal as the start of the rule. We're merely storing the rules as strings for now. Then we cut off what we've read. Wonderbar.

Let's see the modifications to the main code that will give us some feedback on what we've done.


function ReadARule(rules)

--Eat The First Bracket
if(EatOBracket(rules) == "Error") then return end;

if(ReadNonTerminal(readingGrammar) == "Error") then return  end;

Output(noOfRules .. ": " .. ruleTable[1] ..
"\n| What's left: ".. readingGrammar);
noOfRules = noOfRules + 1;
end

It reads out what we've currently got of the rule and what's left of the grammar. For some reason the newline does not register with the text box. I'm willing to leave this as an enduring mystery for now. Everything is looking pretty spiffy I'm quite happy. Once again feel free to experiment.

Polishing

Once again a few annoyances have crept in, oh for some kind of line-number tool. I've tried to make one but I failed. I feel the sneaking supiscion that a RichText box would solve all my woes. But not yet! Lua crashing the window is still a pain but I need to read some documentation to solve that! So psstf save that for later. The main ease of use improvement I can think of right now is making sure the code is saved if we shut down. So let's do that!

Hook up a nice closing event. We want to do things before the form closes - and this is were we'll do them!


private void Form1_Closing(object sender, System.ComponentModel.CancelEventArgs e)
{
 if(textBoxGrammar.Text.Length != 0)
     buttonSaveGrammar_Click(this, null);

 if(textBoxLua.Text.Length != 0)
     menuItem3_Click(this,null);
}

If we close with out saving we prompt for save or autosave. Because I'm messing about with the grammar so much at the moment I'm going to comment the first one out. It makes editing that much easier.

Read The Closing Brace

Now we wish to read the closing brace. Maybe here we can do some code sharing.

Let's generalize the bracket reading function:


function EatBracket(s, bType)
_, bracketstart = string.find(s, bType);

--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find: "
     .. string.sub(bType, 2); --remove the %
return "Error";
end

--Check all characters before this are empty
--If not then error

prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
error = true;
errorString = "Expected " .. string.sub(bType, 2)
           .. " got: " .. prefix;
return "Error";
end

--Remove all read characters
readingGrammar = string.sub(s, bracketstart+1);

end

First let's get the above working for the opening bracket again - shouldn't be too hard. All we have to do is pass in the extra argument of %(


if(EatBracket(readingGrammar, "%(") == "Error") then return end;

No problems. Now let's add a closing brace. Should be easy as pie.


if(EatBracket(readingGrammar, "%(") == "Error") then return end;

if(ReadNonTerminal(readingGrammar) == "Error") then
return
end;

if(EatBracket(readingGrammar, "%)") == "Error") then return end;

That works very nicely. Feel free to experiment. Next we have to eat the arrow this should also be pretty easy to code.

Reading An Arrow

This should be relatively simple. All we do is match "->". A lot of the code is also similar to before so we're going to add it all in one big chunk.


function EatArrow(s)
--eat the arrow -> and remove it from the code

arrowtail, arrowhead = string.find(readingGrammar, "->");

--Check we can find it
if(arrowhead == nil) then
error = true;
errorString = "Expecting to find arrow ->." ..
      "Instead have the following " ..readingGrammar;
return "Error";
end

--Check there are no errant characters behind it
prefix = string.sub(s,1,arrowtail-1);
if(string.find(prefix,"%S+")) then
error = true;
errorString = "Expected -> got: " .. prefix;
return "Error";
end

--Add Arrow to rule string
ruleTable[1] =
ruleTable[1] .. string.sub(s, arrowtail, arrowhead);
readingGrammar = string.sub(s, arrowhead+1);

end --END OF ARROW EATER

So we search for the "->" arrow and we record it's start and it's end. If it's not there then we return an error. If there's any non-space characters before the arrow we also return an error. If everything is fine then we add arrow to the rule string and remove the arrow from the grammar we're reading. Easy!

Let's add it to the ReadARuleFunction

function ReadARule(rules)

--Eat The First Bracket
if(EatBracket(readingGrammar, "%(") == "Error") then return end;

if(ReadNonTerminal(readingGrammar) == "Error") then
return
end;

if(EatBracket(readingGrammar, "%)") == "Error") then return end;


if(EatArrow(readingGrammar) == "Error") then
return
end;

Simple!

After this an opening bracket

Now another opening bracket, it is the easiest thing in the world. We just add another line to our ReadARule function.


if(EatBracket(readingGrammar, "%(") == "Error") then return end;

if(ReadNonTerminal(readingGrammar) == "Error") then
return
end;

if(EatBracket(readingGrammar, "%)") == "Error") then return end;


if(EatArrow(readingGrammar) == "Error") then
return
end;

if(EatBracket(readingGrammar, "%(") == "Error") then return end;

Reading a terminal or a non-terminal

Now this is a bit tricky. we have to look adhead and check what we'll be reading - a terminal or a non-terminal or a end bracket and then deal with it.

Let's think this through with some psuedo code. Of course the C way would be reading character by character but I don't know how to do that easily in Lua this is the first thing I've ever programmed in it! And if I may reiterate once again - it's not speed we're after.


can we detect an [ if
YES: IS THE PREFIX CLEAR
YES: READ A NON-TERMINAL
NO: can we detect a "
 YES: IS THE PREFIX CLEAR
 NO: can we detect a (
 ...

Messy. I can think of maybe a better way.



While (detect) is false and error is false)
{
if( detect[ is false
and detect" is false)
error = true;
errorString = "Expected ')' got " .. readingGrammar;
end;
}

If detect is a success it auto reads and eats the terminal or non-terminal. We'll create a main function called EatRuleBody. This needs three detect functions. We also need to write a eat function for a non-terminal, by the by, I decided a non-terminal would be put in quotes "".

First let's read an empty rule.


([nothing]) -> ()

This shouldn't be too hard. But until we finish the code we run the problem of a possible infinite loop. We'd rather avoid this if at all possible.

function EatRuleBody(s)

result = DetectCloseBrace(s);
while(result == "False" and not error) do
result = DetectCloseBrace(readingGrammar);
end

if(result == error) then return "Error" end

end --END OF EAT RULE BODY

With the above code if we have

([nothing]) -> ("nothing")

We'll enter an infinite loop and the program will crash - so don't do that! We can however report a correct error from:

([nothing]) -> (

Here we expect the closing brace but can't find it. With the EatRuleBody code our error messages may not be in the same order as the errors present in the code. Not to worry though. Let's write our detect function which might be more aptly named -"detect and if possible eat".


function DetectCloseBrace(s)

_, bracketstart = string.find(s, "%)");

--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find ) instead got: "
     .. s;
return "Error";
end

--Check all characters before this are empty
--If then return false

prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
return "False";
end

--if it's the last character then eat the brace
if(EatBracket(readingGrammar, "%)") == "Error") then return "False" end;

return "True"; --success don't really have to return this

end

It's very similar to the eat code. But finding characters in the prefix doesn't mean an error this time, instead it means that there's earlier stuff to read.

Detecting Non-Terminals

This is also pretty easy but we want to add the non-terminal to the rule in our ruleTable. The current non-terminal function replaces the entire rule with the non-terminal - as before we had just assumed this was alpha bit of the production rule. Therefore let's make a little edit to the non-terminal code. Before we begin a rule we initiate the rule string to the empty string. This is am important thing to do in a dynamically typed language.


function ReadARule(rules)

ruleTable[1] = "";

Then instead of replacing the entire rule with the discovered non-terminal we instead concatinate the non-terminal to the current rule we already have.


function ReadNonTerminal(s)
--must begin and end with [,]'s

_, bracketstart = string.find(s, "%[");

--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find an opening [";
return "Error"
end

_, bracketend = string.find(s, "%]");

--Error Checking
if(bracketend == nil) then
error = true;
errorString = "Expected to find an closing ]";
return "Error"
end

--more error checking nothing occurs before the non-terminal

prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
error = true;
errorString = "Expected [ got: " .. prefix;
return "Error";
end


--remove this section from the global grammar, yet add to   --the current rule.
 ruleTable[1] =
ruleTable[1] .. string.sub(s, bracketstart, bracketend);
readingGrammar = string.sub(s, bracketend+1);

end --END OF NT EATER

After adding this we remove our infinite loop problem. Let's flesh out the EatRuleBody function.


function EatRuleBody(s)

result = DetectCloseBrace(s);
while(result == "False" and not error) do

  if(DetectNonTerminal(readingGrammar) == "False") then
error = true;
errorString = "Expected end brace ) got "
            .. readingGrammar;
return;
end;

result = DetectCloseBrace(readingGrammar);
end

if(result == error) then return "Error" end

end --END OF EAT RULE BODY

If we can't detect an end-brace and we can't detect a non-terminal then there's an error - we should at the least expect an end-brace so that's what we tell the user. If a non-terminal is detected it's eaten.

The following errors are detected and a suitable error message is generated.


([d]) -> ([g]
([d]) -> ((
([d]) -> (error)
([d]) -> ([tt)

It works pretty well now. Once we add support for non-terminals then we've pretty much got a fully functional parser. (If you ever want to write a programming language - winging it, like this, probably isn't the way to do it. Look into Backus Naur Form, Bison, Yacc and Lex).

Terminal Detector and Eater

Well this is going to be pretty much exactly the same as the non-terminal stuff but using "'s. Easy! Just copy paste with some minor modification. Actually there's a few hiccups with using the same symobol '"' for start and end. We must cut off the first one before searching for the second. Also we must use the correct character codes.


function DetectTerminal(s)
_, bracketstart = string.find(s, "\"");

--We don't have to have a terminal
if(bracketstart == nil) then
  return "False";
else
 prefix = string.sub(s,1,bracketstart-1);
 if(string.find(prefix,"%S+")) then
  return "False";
 end

 if(ReadTerminal(readingGrammar) == "Error") then
    return "False";
 end;

 return "True";
end

end --END DECTECT TERMINAL

It's pretty much the same as earlier. If we detect a ' " ' and there's nothing before it - well then read in a terminal. Otherwise return False. Now let's look at the ReadTerminal statement.


function ReadTerminal(s)
--must begin and end with "'s

_, bracketstart = string.find(s, "\"");

--Error Checking
if(bracketstart == nil) then
error = true;
errorString = "Expected to find an opening \"";
return "Error"
end


--more error checking nothing occurs before the terminal

prefix = string.sub(s,1,bracketstart-1);
if(string.find(prefix,"%S+")) then
 error = true;
 errorString = "Expected \" got: " .. prefix;
 return;
end

s2 = string.sub(s, bracketstart+1);

_, bracketend = string.find(s2, "\"");

--Error Checking
if(bracketend == nil) then
error = true;
errorString = "Expected to find an closing \"";
return "Error"
end

--remove this section from the global grammar, yet add to   --the current rule. plus one cos we cut off the start
ruleTable[1] =
ruleTable[1] .. string.sub(s, bracketstart, bracketend+1);
readingGrammar = string.sub(s2, bracketend+1);

end --END OF TERMINAL EATER

We have to find two "'s then we have to make sure there's nothing before the first ". Then we need to cut out what's between the "'s including the "'s. It's that easy ... kinda. I think the code is reasonably followable. Once this has been written then we can update EatRuleBody. It's a simple add on that will allow our code to accept pretty much any rule body - even the empty one.

function EatRuleBody(s)

result = DetectCloseBrace(s);
while(result == "False" and not error) do

if(DetectNonTerminal(readingGrammar) == "False"
      and DetectTerminal(readingGrammar) == "False") then
 error = true;
 errorString = "Expected end brace ) got "
              .. readingGrammar;
 return "Error";
end;


result = DetectCloseBrace(readingGrammar);
end

if(result == error) then return "Error" end

end --END OF EAT RULE BODY

Once we make some simple changes to ReadARule we're practically done. So let's make those changes now.


function ReadARule(rules)

ruleTable[1] = "";

--Eat The First Bracket
if(EatBracket(readingGrammar, "%(") == "Error") then return end;

if(ReadNonTerminal(readingGrammar) == "Error") then
return
end;

if(EatBracket(readingGrammar, "%)") == "Error") then return end;


if(EatArrow(readingGrammar) == "Error") then
return
end;

if(EatBracket(readingGrammar, "%(") == "Error") then return end;

 if(EatRuleBody(readingGrammar) == "Error") then return end; 

ruleTable[noOfRules] = noOfRules .. ": " .. ruleTable[noOfRules] ..
    "\n| What's left: ".. readingGrammar;
noOfRules = noOfRules + 1;
end --END EAT A RULE

So that's a simple addition. More of a problem is that at various places in the code we've used the magic number 1, as in the first rule. What should have been there is the noOfRules variable. This means all the functions are working with the current rule, rather than just rule number 1. If you look at the start of ReadARule there's an example of this.


function ReadARule(rules)

ruleTable[1] = "";

Must be changed to read:

function ReadARule(rules)

ruleTable[noOfRules] = "";

Talking of which we should change the global too so it's starts from 1, which is standard in Lua. So let's do that now:


noOfRules = 1; --index in Lua tends to be from 1

Now checking through the code for more '1' offenders. There's one at the end of the non-terminal rule eater:

--remove this section from the global grammar, yet add to   --the current rule.
ruleTable[noOfRules] =
ruleTable[noOfRules] .. string.sub(s, bracketstart, bracketend);
readingGrammar = string.sub(s, bracketend+1);

end --END OF NT EATER

And at the end of the terminal eater! An epidemic.


--remove this section from the global grammar, yet add to   --the current rule. plus one cos we cut off the start
ruleTable[noOfRules] =
  ruleTable[noOfRules] .. string.sub(s, bracketstart, bracketend+1);
readingGrammar = string.sub(s2, bracketend+1);

end --END OF TERMINAL EATER

And at the end of the arrow eater - I'm not showing the code, as I'm positive you have the idea :D. You should confirm everything works by running the parser on a test grammar.

There are other problems with arrow eater too. It doesn't have proper error checking - oh no! Before that though we need to setup our read a rule so it returns errors correctly. I know Lua does exceptions but I haven't read about them yet, so this is the way I'm doing it. I might go to exceptions in a future revision.


function ReadARule(rules)

ruleTable[noOfRules] = "";

--Eat The First Bracket
if(EatBracket(readingGrammar, "%(") == "Error") then
return "Error";
end;

if(ReadNonTerminal(readingGrammar) == "Error") then
return "Error";
end;

if(EatBracket(readingGrammar, "%)") == "Error") then
return "Error";
end;

if(EatArrow(readingGrammar) == "Error") then
return "Error";
end;

if(EatBracket(readingGrammar, "%(") == "Error") then
return "Error";
end;

if(EatRuleBody(readingGrammar) == "Error") then
return "Error";
end;

ruleTable[noOfRules] = noOfRules .. ": " ..
                    ruleTable[noOfRules] ..
                    "\n| What's left: "..
                    readingGrammar;
noOfRules = noOfRules + 1;
end --END EAT A RULE

This should help things go a little smoother when we're attempting to detect errors over the entire grammar. Well I've messed about with the code now and I've got the read out working. The formatting is unpleasent but nothing something i'm going to worry over.

function ReadAllRules(s)

while( string.find(readingGrammar,"%S+") ) do

if(ReadARule(readingGrammar) == "Error") then
 error = true;
 return "Error";
end;

end

This reads all the rules. It says "keeping reading rules until there's only empty space stuff left". It also handles any errors perfectly (though error messages supplied at this point may be a little off!). So this records all the rules in our table. Subarashi!

Next up is to write out the contents of our table. For some reason this took me a while though it's very straight forward.


function OutputResults()

all = "";
stop =  noOfRules-1;

for i = 1, stop do
all = all .. ruleTable[i];
end

return all;

end

It's noOfRules minus one because we always increment the value everytime a rule is parsed. So when parsing rule ten we up the amount to 11 because that's the next rule we'll be looking at. So that's the minus one, everything else is pretty normal. Let's look at the main code.

ReadAllRules(GRAMMAR);


if(error == true) then
Output(errorString)
else
Output(OutputResults());
end

And that seems to work pretty well. Of course, as I said, the formatting is quite nasty but at least it's doing what we want :D.

That'll do it. For the parser that's pretty much all there is to it! Yes some of the error reporting is wonky and so is the output but hey it's definitely working! It's something that can be refined and used.

[Grammar Parser in GIN format]
[Grammaratron Program with slight polishing]

Godpatterns

Pages

Sunday, April 24, 2005

The Wonderful World Of Grammars Part II: A Lua Grammar Parser

No comments: