Believe it or not with what we know we’re almost ready to start writing a simple parser
1) We want to define a predicate that recognizes the following chars: “!$%&|*+-/:<=>?@\^_~#”
2) the predicate should be named symbol and its type and mode
signatures should look like:
:- pred symbol(char, list(char), list(char)).
:- mode symbol(out, in, list(char)::out) is semidet. symbols parser
unified).3) in main take the first command line argument and checks if the
symbol parser matches it * if the first command line parameter
doesn’t exist print out an error and exit (do nothing).
4) if the parser (symbol predicate) matches (unifies) the input
string we should print out “Found value: “ the value we found, if
not it should print out “No match.”
Note: while it may seem arbitrary structuring the arguments this way has a purpose - this will be explored further in the series.
strings and charsPreviously we’ve discussed ints and lists - now to actually
implement parts of the scheme interpreter we need to introduce two new
data types char and string.
While both char and string specifics are implementation dependent
(they’re backed by char and a char* in the C grade and Char and
String in java) we can think of char as a UTF code
point and a string as a
container of chars (whether this is a list, an array or something else
entirely is implementation specific, and uninteresting).
To use chars or strings in your code (in type declarations, or using
predicates from them you must first import the module).
:- import_module char, string.If you don’t export any predicate (and you shouldn’t!) that has char or string you should put the above imports into the implementation section (and not to the interface section)
What I’ve been glossing over so far is that list in mercury is
actually list(T) - lists in mercury are defined for any (the
parametric type T is meant to be similar to a variable name) type i.e
we can have an list(int), list(char), list(string) etc.
Lists in mercury are exactly like the list you may know from other languages - each cell or element of the list points to the rest of the list (or is an empty list).
An empty list in mercury is just [] and we can deconstruct a list to
its head and tail like so:
[Head | Tail] = List.This will unify Head with the first element of the list or fail if
the list is empty. If Head will be unified then Tail will always
unify to the rest of the list (the whole list sans the first element).
If we’re not interested in some variable (anywhere in our predicate not
only in lists) we can name it either _ or any name that starts with an
underscore. That variable is considered “anonymous”. The mercury
compiler will warn us if we have unused variables - so in the case we
really don’t care about them it’s good form to make them anonymous.
We’ll need the above concepts to get the first argument given to the
program (and don’t forget to :- import_module list!)
The io module provides an useful predicate for that:
:- pred io.command_line_arguments(list(string)::out, io::di, io::uo) is det.Mercury has some syntax sugar for predicates with only one mode - the
variable modes and determinism can be added directly to the type
declaration as long as every variable has a mode (separated by ::).
The above predicate will (always - hence det… just a reminder) unify
to the argument list “consuming” the di io instance and providing a
new one.
To get a list of chars from a string use:
:- pred string.to_char_list(string::in, list(char)::out) is det.There’s also one predicate that may come in in handy - checking if a
string contains a char:
:- pred string.contains_char(string::in, char::in) is semidet.This predicate will unify if the input ‘string’ contain the input ‘char’.
Mercury like most languages has a if statement built-in, the syntax is:
( condition ->
consequent
; alternative)Or using a more “literal” style:
if condition then
consequent else
alternativeWhile I prefer the second style I’ll use the first throughout they tutorial because:
The if then else expression will try to unify consequent if
condition unifies or alternative if it does not.
If you’re coming out of a functional programming background you may think that ifs in mercury are expressions returning a value - but that isn’t the case - the whole model of logic programming doesn’t understand what “returning a value” is - there’s only unification.
One caveat of using if expressions is that you can’t instantiate something in the condition for example introduce a variable - it has to be done in the consequent and alternative instead (and this is how we can “conditionally” pass some values). Consider:
( 1 = 1 ->
X = "Hello world",
; X = "Goodbye world"),
io.write_string(X, IO_In, IO_out).This small snippet would print out “Hello world” or, if we change the
condition from 1 = 1 to 1 = 2 it would print “Goodbye world”.
This on the other hand would not compile:
( 1 = 1, X = "Hello world" ->
io.write_string("1 = 1", IO_1, IO_2)
; io.write_string("1 != 1", IO_1, IO_2),
io.write_string(X, IO_2, IO_3)).Though referencing X is completely valid inside the if statement.
This is how our program should behave:
$ ./main
No arguments given!
$ ./main a
No match!
$ ./main '>'
Found match: >
$ ./main '$'
Found match: $If you’re trying to implement this by yourself the things that tricky might be tricky are:
chars, lists, or strings (and you
should!) - import them in the implementation section.. after a predicate
declaration) most of them will pinpoint your error.io - remember the main predicate should unify some “clean” input
io instance and the io instance that has gathered all effects,
this may mean that you should construct your main predicate in the
following way:main(IO_1, IO_Last) :-
io.write_string("Hello", IO_1, IO_2),
Some_Variable = "world",
io.write_string(Some_Variable, IO_2, IO_3),
% we can print chars too!
io.write_char('C', IO_3, IO_Last).Note the new io.write_char predicate.
Code for this part is located here
Let’s get the obvious imports end exports out of the way:
:- module main.
:- interface.
:- import_module io.
:- pred main(io, io).
:- mode main(di, uo) is det.
:- implementation.
:- import_module char, string, list.Now for the symbol predicate:
:- pred symbol(char::out, list(char)::in, list(char)::out) is semidet.
symbol(C, ListIn, ListOut) :-
ListIn = [C | ListOut],
string.contains_char("!$%&|*+-/:<=>?@^_~#", C).There’s really nothing tricky behind it. You might wonder about
ListIn = [C | ListOut]. but that’s equivalent to
ListIn = [C | X], X = ListOut. We consume one character (the first
one), and unify ListOut with what’s unconsumed.
The predicate may obviously fail on trying to unify C with the head
(or the fist element) of ListIn - but that’s ok - well handle that in
main. Next we just check if C is any of the chars that we wanted to
match by using contains_char (and that may also fail to unify).
Lastly let’s look at the main predicate:
main(IO_1, IO_Last) :-
io.command_line_arguments(Arguments, IO_1, IO_2),
( Arguments = [First | _Rest] ->
string.to_char_list(First, CharList),
( symbol(C, CharList, _Other_Chars) ->
io.write_string("Found match: ", IO_2, IO_3),
io.write_char(C, IO_3, IO_4),
io.write_string("\n", IO_4, IO_Last)
; io.write_string("No match!\n", IO_2, IO_Last))
; io.write_string("No arguments given!\n", IO_2, IO_Last)).Again nothing tricky here - first we use command_line_arguments from
the io module then we check using the if-then-else statement if the
Arguments have a first element - if not we print “No arguments
given!\n”)
If so we convert the First element of the list into CharList a list
of characters and then again check if
symbol(C, CharList, _Other_Chars) succeeded - if so we print out what
we’ve matched if not we print out “No match!\n”.
You might think that programming in mercury is a little cumbersome with
all the io variables being passed around - don’t fret mercury has
special mechanisms that alleviate this - we will introduce them in due
time.
That’s all for this part - we’ll introduce recursion and extend our simple parser to handle whitespace. See you!