alcinnz, How about a tool where you take some sample input, annotate it with an approximation of the desired Abstract Syntax Tree, & it generates a parser? It'd build each rule by merging patterns until the rule matches every sample string annotated for it.
A postprocessing pass would then generalize the rules to undo any overfitting, auto-incorporating a Unicode library to steer devs in the right direction. And that Unicode library would itself be code-generated, indirectly, from Unicode's datafiles via this very tool!
2/4?
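A rough sketch of how the merge-then-generalize idea might look, in Python. Everything here is an assumption for illustration: the names `shape`, `infer_rules` and `matches` are hypothetical, the annotation is simplified to flat (rule, sample) pairs rather than a real tree, and Python's built-in `unicodedata` module stands in for the code-generated Unicode library.

```python
import unicodedata
from itertools import groupby


def shape(sample):
    """Generalize a string into its runs of Unicode general categories.

    Consecutive characters of the same category collapse into one entry,
    so '42' and '2024' both become ('Nd',).
    """
    return tuple(cat for cat, _ in groupby(sample, key=unicodedata.category))


def infer_rules(annotated):
    """Merge all samples annotated with the same rule name.

    Each rule ends up as a set of alternative shapes (hypothetical format).
    """
    rules = {}
    for rule, sample in annotated:
        rules.setdefault(rule, set()).add(shape(sample))
    return rules


def matches(rules, rule, text):
    """Check whether text fits one of the alternatives inferred for rule."""
    return shape(text) in rules.get(rule, set())


# Samples annotated with the rule each should parse as (illustrative data).
annotated = [
    ("number", "42"),
    ("number", "7"),
    ("ident", "foo"),
    ("ident", "x1"),
]
rules = infer_rules(annotated)
```

Since the generalization works on Unicode categories instead of literal characters, `matches(rules, "number", "١٢٣")` accepts Arabic-Indic digits even though the samples only used ASCII ones, which is the kind of nudge away from overfitting the post describes.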