Musing on #ctran.... - Random

thelastpsion, 4 months ago

Musing on #ctran.

I'm starting to wonder if there's any point in having the lexer and parser as two separate classes.

Other than testing, the lexer is only ever going to be called by the parser, and only once during the process.

It might be better to just have a lexer-parser class that grabs a file, tokenises it, then (if it's happy with the file it's tokenised) immediately turns it into a tree.

Is there a really good reason why they should be separate classes?

#compiler #objectpascal #oop

M0CUV, 4 months ago

@thelastpsion without looking at the code (yet!) I’d keep them separate to reduce complexity. In a test-driven approach, your code is always being reused at least twice: by tests and the application. Forcing it to be reusable leads to it also being decoupled by design.

thelastpsion, 4 months ago

Had a crazy idea for #ctran.

What if the "parser" part, rather than grabbing a token at a time, grabbed a line of tokens as an array?

New lines are significant in #Psion OO category files, but a newline doesn't have to be tokenised. I'm already recording which line I'm lexing, and the parser is only ever going to work directly with the lexer.

This way I don't have to deal with newline tokens, or worry about skipping too many tokens.

Processing the file gets easier, the code gets neater.
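
A rough sketch of the line-of-tokens idea, not ctran's actual code. The `TToken` record, the `TokeniseLine` name and the sample input line are all made up for illustration, and tokens are assumed to be separated by spaces or tabs:

```pascal
program LineTokens;
{$mode objfpc}{$H+}

type
  // Hypothetical token record; the real one may carry more fields.
  TToken = record
    Text: string;
    LineNum: Integer;
  end;
  TTokenArray = array of TToken;

// Splits one source line into whitespace-separated tokens.
// The newline itself never becomes a token: one call is one line.
function TokeniseLine(const Line: string; LineNum: Integer): TTokenArray;
var
  I, Start: Integer;
begin
  Result := nil;  // start from an empty array on every call
  I := 1;
  while I <= Length(Line) do
  begin
    // Skip whitespace between tokens.
    while (I <= Length(Line)) and (Line[I] in [' ', #9]) do
      Inc(I);
    if I > Length(Line) then
      Break;
    // Collect characters up to the next whitespace.
    Start := I;
    while (I <= Length(Line)) and not (Line[I] in [' ', #9]) do
      Inc(I);
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)].Text := Copy(Line, Start, I - Start);
    Result[High(Result)].LineNum := LineNum;
  end;
end;

var
  Tokens: TTokenArray;
  T: TToken;
begin
  // Illustrative input only.
  Tokens := TokeniseLine('CLASS  myclass root', 3);
  for T in Tokens do
    WriteLn(T.LineNum, ': ', T.Text);
end.
```

The parser side then asks for the next line's array and never has to skip or count newline tokens.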

thelastpsion, 4 months ago

I could even have the lexer part build an array of lines, with each line being an array of tokens. I don't know if that would use up more RAM, but it would mean that I wouldn't have to have an EOF token.

Checking for EOF with the make-a-line-of-tokens method would be much easier, too. No need to worry about an EOF appearing randomly - just issue one after everything else, with a weird line number.

I already know exactly how many tokens there should be for each line type. If it's too few, throw an error.
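
Something like this, as a sketch only. The `MinTokens` rule table and its keywords and counts are invented for the example, not taken from the real category file grammar. The point is that with an array of lines, the end of the file is just the end of the loop, and the token-count check is a single comparison:

```pascal
program CheckLines;
{$mode objfpc}{$H+}

type
  TTokenLine = array of string;
  TTokenLines = array of TTokenLine;

// Hypothetical rule table: minimum token count for a line
// that starts with the given keyword.
function MinTokens(const Keyword: string): Integer;
begin
  if Keyword = 'CLASS' then
    Result := 2
  else if Keyword = 'ADD' then
    Result := 2
  else
    Result := 1;
end;

// Returns the index of the first line with too few tokens,
// or -1 if every line passes. No EOF token is needed: the loop
// simply stops at the last line.
function FirstShortLine(const Lines: TTokenLines): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := 0 to High(Lines) do
    if (Length(Lines[I]) > 0) and
       (Length(Lines[I]) < MinTokens(Lines[I][0])) then
    begin
      Result := I;
      Break;
    end;
end;

var
  Lines: TTokenLines;
begin
  SetLength(Lines, 2);
  SetLength(Lines[0], 3);
  Lines[0][0] := 'CLASS';
  Lines[0][1] := 'myclass';
  Lines[0][2] := 'root';
  SetLength(Lines[1], 1);
  Lines[1][0] := 'ADD';   // too short under the made-up rules above
  WriteLn('First bad line index: ', FirstShortLine(Lines));
end.
```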

thelastpsion, 4 months ago

There is, of course, an argument for completely merging the two together. Don't bother tokenising, just build the tree. I'm kind of halfway there already, as the lexer needs to know what part of the file it's in to know how to make tokens.

But lexing/tokenising this file first does make it easier to see whether the file that's being processed is somehow malformed. If the lexer says all's well, then we can do more complex stuff.

It's belt-and-braces.

thelastpsion, 4 months ago

About 30 lines of code later, and I can now pull lines of tokens, one by one. I've also been able to remove the generation of newline tokens, as they're no longer needed.

I didn't realise in #FreePascal that Result is preserved in a function! I'm having to nil one of the dynamic arrays every time, otherwise it remembers what it generated last time.
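
A minimal sketch of the pattern, with made-up names (`MakeRun`, `TIntArray`). As far as I understand it, Result for a managed type such as a dynamic array isn't guaranteed to start out empty, so a function that appends to it should clear it first:

```pascal
program ResultReset;
{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

// Appends Count values to Result. Without the reset on the first
// line, Result may still hold whatever an earlier call left in it.
function MakeRun(Start, Count: Integer): TIntArray;
var
  I: Integer;
begin
  Result := nil;  // same effect as SetLength(Result, 0)
  for I := 0 to Count - 1 do
  begin
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := Start + I;
  end;
end;

var
  A: TIntArray;
  Pass: Integer;
begin
  // Calling repeatedly into the same variable is the case to watch.
  for Pass := 1 to 3 do
  begin
    A := MakeRun(Pass * 10, 2);
    WriteLn('Pass ', Pass, ': ', Length(A), ' items');
  end;
end.
```

With the reset in place, every pass reports 2 items.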

#ctran's parser has now been disabled because it relied on newlines, but that's fine. It's about to be torn apart.

ALT text explains the output.

kroc, 4 months ago

@thelastpsion The function name is an implicit variable you can assign the return value to; Result is a convenience name for this. This means that you can read the current return value back, append to it, etc.

thelastpsion, 4 months ago

@kroc Ah! So it works quite differently to return in C. Makes sense.

bread80, 4 months ago

@thelastpsion @kroc Result is your friend. You can build a return value with it - such as appending substrings. Or you can assign a default return value but overwrite that with a different value if needed.

For something more akin to C’s return look at Exit() which is useful if you want to abort on a guard clause or save on nested conditionals.

And while we’re here, look up Continue and Break for loops.
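
A small illustration that puts all four together. The `JoinWords` function and the `'END'` stop word are invented for the example:

```pascal
program ResultExit;
{$mode objfpc}{$H+}

// Joins the non-empty words that come before the first 'END',
// separated by single spaces.
function JoinWords(const Words: array of string): string;
var
  I: Integer;
begin
  Result := '';           // default return value
  if Length(Words) = 0 then
    Exit;                 // guard clause: leave with the default
  for I := 0 to High(Words) do
  begin
    if Words[I] = '' then
      Continue;           // skip this item, carry on with the next
    if Words[I] = 'END' then
      Break;              // stop the loop entirely
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + Words[I];  // build the value up piece by piece
  end;
end;

begin
  WriteLn(JoinWords(['ADD', '', 'name', 'END', 'ignored']));  // prints "ADD name"
end.
```

Exit can also take the return value directly, e.g. `Exit('fallback')`, which is the closest match to C's `return x;`.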
