pStringWalk creates string parsing objects.
local W = pStringWalk.new("foobar")
local first = W:litReq("foo", "Missing crucial 'foo'")
local second = W:litReq("bar", "Missing irreplaceable 'bar'")
print(first, second) --> foo bar
- API: pStringWalk (The Module)
- API: Walker (The Object)
- W:_status
- W:assert
- W:bytes
- W:bytesReq
- W:error
- W:find
- W:findReq
- W:getByteMode
- W:getIndex
- W:getLineCharDisplay
- W:getLineCharNumbers
- W:getName
- W:getTerseMode
- W:goEOS
- W:isEOS
- W:lit
- W:litReq
- W:match
- W:matchReq
- W:peek
- W:plain
- W:plainReq
- W:pop
- W:popAll
- W:push
- W:req
- W:reset
- W:seek
- W:setByteMode
- W:setLineCharDisplay
- W:setName
- W:setString
- W:setTerseMode
- W:step
- W:warn
- W:ws
- W:wsNext
- W:wsReq
- Module Notes
API: pStringWalk (The Module)
pStringWalk.countLineChar
Gets the line and character numbers for a byte position in a UTF-8 string.
local ln, cn = pStringWalk.countLineChar(s, i, j, ln, cn)
- s: The string to scan.
- i: The byte position in the string.
- j: Where to start scanning in the string (1 on the first call).
- ln: The initial line number to use (1 on the first call).
- cn: The initial character number to use (1 on the first call).
Returns: The line and character numbers.
Notes
This function can be used instead of W:getLineCharNumbers for collecting sequential line and character numbers in a loop. While the walker method begins counting from the first byte every time, this function can start from any valid UTF-8 start byte.
The results are unreliable if the UTF-8 encoding is bad, if i is out of bounds or greater than j, or if any of the numeric arguments are not integers.
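The incremental counting described above can be illustrated without the library. In this sketch, `countFrom` is a hypothetical stand-in written for this example. Its semantics are assumptions and may differ from pStringWalk.countLineChar: it scans from byte j up to byte i, starts a new line at each "\n", and counts one character per UTF-8 start byte.

```lua
-- Hypothetical stand-in, NOT the library's implementation.
-- Given that byte j sits at line ln, character cn, returns the line and
-- character numbers of byte i (assumes j <= i and valid UTF-8).
local function countFrom(s, i, j, ln, cn)
    for pos = j, i - 1 do
        local b = s:byte(pos)
        if b == 10 then
            -- "\n": the next byte starts a new line
            ln, cn = ln + 1, 1
        elseif b < 0x80 or b >= 0xC0 then
            -- a start byte (not a continuation byte): one character passed
            cn = cn + 1
        end
    end
    return ln, cn
end

local s = "ab\ncd\195\169f"  -- "ab", newline, "cd", U+00E9 (2 bytes), "f"
local marks = {4, 8}         -- ascending byte positions, each on a start byte

-- Carry j, ln and cn forward so that each call only scans the new bytes.
local j, ln, cn = 1, 1, 1
local out = {}
for _, i in ipairs(marks) do
    ln, cn = countFrom(s, i, j, ln, cn)
    j = i
    out[#out + 1] = ln .. ":" .. cn
end
```

Each pass resumes where the previous one stopped, so the total work stays proportional to the string length rather than to the number of lookups times the string length.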
pStringWalk.new
Creates a new walker.
local W = pStringWalk.new([s], [name])
- [s]: (empty string) An optional string to assign.
- [name]: (nil) An optional name to use when generating warnings and error messages.
Returns: The walker.
API: Walker (The Object)
W:_status
Prints details about the walker’s internal state to the terminal. Intended for debugging.
W:_status()
W:assert
Raises an error if exp evaluates to false.
local retval = W:assert(exp, [err])
- exp: The expression to evaluate.
- [err]: The error message.
Returns: the result of exp, for the convenience of variable assignment.
W:bytes
Gets a substring, from the walker’s position to an offset in bytes. If the offset goes past end-of-string (for example, attempting to get 10 bytes from the string "foo"), then nil is returned.
local sub_str = W:bytes([n])
- [n]: (1) How many bytes to read from the walker’s position.
Returns: The substring, or nil if the offset goes beyond the end of the string.
Notes
It’s an error to provide an offset of zero or less, or a fractional value.
W:bytesReq
Like W:bytes, but raises an error when the request is unsuccessful.
local sub_str = W:bytesReq([n], [err])
- [n]: (1) How many bytes to read from the walker’s position.
- [err]: The error message.
Returns: The substring.
W:error
Raises an error. Depending on the walker’s configuration, the output may include a name, a line number, and a character number. If the walker is in Terse Mode, then a generic error message will be displayed instead.
W:error([str], [level])
W:find
Calls string.find at the current position. If a match is found, returns the i and j indices and captures, and advances the walker’s position past j.
local i, j --[[, captures...]] = W:find(ptn)
- ptn: The search pattern string.
Returns: The boundaries of the result, and up to 16 captures, or nil if there wasn’t a match.
W:findReq
Like W:find, but raises an error when the search is unsuccessful.
local i, j --[[, captures...]] = W:findReq(ptn, [err])
- ptn: The search pattern string.
- [err]: The error message.
Returns: The boundaries of the result, and up to 16 captures.
W:getIndex
Gets the walker’s position, in bytes. If there are stack frames, then the lowest stack position is used.
local i = W:getIndex()
Returns: Byte index of the walker in the string.
Notes
The walker’s current byte index can be read directly at W.I. Use this when you want the current position rather than the position of the lowest stack frame.
W:getLineCharDisplay
Gets the display settings for line and character numbers.
local line, char = W:getLineCharDisplay()
Returns: Two booleans: the first for line number display, the second for character number display.
W:getLineCharNumbers
Gets a line and character number for the walker’s position. If there are stack frames, then the lowest stack position is used.
local ln, cn = W:getLineCharNumbers()
Returns: The walker’s line number and character number.
Notes
This method is only valid for correctly encoded UTF-8 strings.
The count always starts from index 1 of the string. To count line and character numbers incrementally, see the function pStringWalk.countLineChar.
W:getName
Gets the current Walker Name, if any.
local name = W:getName()
Returns: The Walker Name, or nil.
W:getTerseMode
Gets the Terse Mode setting.
local enabled = W:getTerseMode()
Returns: true or false.
W:goEOS
Moves the walker position to end-of-string.
W:goEOS()
Returns: W, for method chaining.
W:isEOS
Tells if the walker position is end-of-string.
local eos = W:isEOS()
Returns: true if the walker position is end-of-string, false if not.
W:lit
Compares a substring at the walker’s position against a string literal. If a match is found, returns the substring and advances the walker’s position. If not, returns nil.
local match = W:lit(s)
- s: The string literal to compare.
Returns: The matching substring, or nil.
Notes
The search is anchored to the walker’s position. To search the remainder of the string for a literal substring, use W:plain or W:plainReq.
The successful return value is always the same as the s argument. This allows for short circuit evaluations, like:
local facing_dir = W:lit("left") or W:litReq("right", "bad direction")
W:litReq
Like W:lit, but raises an error when the search is unsuccessful.
local match = W:litReq(s, [err])
- s: The string literal to compare.
- [err]: The error message.
Returns: The match.
W:match
Behaves like string.match. If a match is found at the current position, returns the captures (or the whole result, if the pattern contained no captures) and advances the position.
local match --[[or captures...]] = W:match(ptn)
- ptn: The search pattern string.
Returns: The match, or up to 16 captures, or nil.
Notes
This method uses string.find under the hood, but it modifies the output to be like that of string.match.
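One way to reshape string.find results into string.match-style results is sketched below. The helper names `finish` and `matchLike` are hypothetical, and this is not taken from the library's source. It only shows the general idea: drop the two indices, then return the captures if there are any, or the matched substring otherwise.

```lua
-- Hypothetical helpers, NOT the library's implementation.
-- finish() receives the string followed by everything string.find returned.
local function finish(s, i, j, ...)
    if not i then
        return nil            -- no match
    end
    if select("#", ...) == 0 then
        return s:sub(i, j)    -- no captures: return the whole match
    end
    return ...                -- captures only, like string.match
end

local function matchLike(s, ptn, init)
    return finish(s, s:find(ptn, init))
end

local key, val = matchLike("key=val", "(%w+)=(%w+)")  -- two captures
local whole = matchLike("key=val", "%w+")             -- whole match
local none = matchLike("abc", "%d")                   -- nil
```

Because string.find also reports the indices i and j, a wrapper built this way knows where the match ended and could advance a stored position past it.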
W:matchReq
Like W:match, but raises an error if the search was unsuccessful.
local match --[[or captures...]] = W:matchReq(ptn, [err])
- ptn: The search pattern string.
- [err]: The error message.
Returns: The match, or up to 16 captures.
W:peek
Gets a substring, from the walker’s current position to an offset in bytes. Does not advance the walker.
local sub_str = W:peek([n])
- [n]: (1) How many bytes to read from the walker’s position. A value of 1 will return the current byte.
Returns: The substring.
Notes
If the walker position is end-of-string, then an empty string is returned.
It’s an error to provide an offset of zero or less, or a fractional value.
W:plain
Calls string.find at the current position, in plain mode. If a match is found, returns the i and j indices, and advances the position past j.
local i, j = W:plain(ptn)
- ptn: The search pattern string.
Returns: The boundaries of the result, or nil if there wasn’t a match.
Notes
All pattern-matching symbols are treated as ordinary characters. As such, this method does not support captures.
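The difference between pattern mode and plain mode can be seen with Lua's own string.find, which takes a fourth argument to turn pattern matching off. This example uses only the standard library; the sample string is made up for illustration.

```lua
local s = "price: 3.50 (approx.)"

-- Pattern mode: "." matches any character, so the very first byte matches.
local i1, j1 = s:find(".")

-- Plain mode: "." is an ordinary character, so the decimal point is found.
local i2, j2 = s:find(".", 1, true)

-- Plain mode also treats "(" and ")" literally instead of as a capture.
local i3, j3 = s:find("(approx.)", 1, true)
```

Here i1 and j1 are both 1, i2 and j2 are both 9, and the last search spans bytes 13 to 21.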
W:plainReq
Like W:plain, but raises an error if the search was unsuccessful.
local i, j = W:plainReq(ptn, [err])
- ptn: The search pattern string.
- [err]: The error message.
Returns: The boundaries of the result.
W:pop
Pops the last string and position from the stack.
W:pop()
Returns: W, for method chaining.
Notes
Stack frames are not automatically popped when reaching end-of-string.
It’s an error to call this on an empty stack.
W:popAll
Pops all strings from the stack.
W:popAll()
Returns: W, for method chaining.
Notes
Unlike W:pop, this method does not raise an error when the stack is empty.
W:push
Pushes a new string, moving the existing string and position to the stack. The walker’s new position is 1.
W:push(str)
- str: The new string to push.
Returns: W, for method chaining.
W:req
The assertion method used by the *Req method variations. Raises an error if the first return value is nil/false.
local a, b, c --[[, etc.]] = W:req(fn, [err], ...)
- fn: The function to call. It takes a walker as its first argument, and ... as its remaining arguments.
- [err]: The error message.
- ...: Additional arguments for fn.
Returns: Up to 18 values returned by fn.
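The general shape of such an assertion wrapper might look like the sketch below. It is an illustration under stated assumptions, not the library's code: `check`, `req`, `fake`, and `findIn` are all hypothetical names, the default error text is invented, and the sketch passes through every return value instead of enforcing the 18-value limit.

```lua
-- Hypothetical sketch of a req-style wrapper; NOT the library's code.
-- check() raises when the first result is nil or false; otherwise it
-- passes every result through unchanged.
local function check(err, first, ...)
    if not first then
        error(err or "required call failed", 3)
    end
    return first, ...
end

local function req(self, fn, err, ...)
    return check(err, fn(self, ...))
end

-- A stand-in object holding a string in a made-up field.
local fake = {text = "abc 123"}
local function findIn(self, ptn)
    return self.text:find(ptn)
end

-- Success: the results of fn come back unchanged.
local i, j = req(fake, findIn, "no digits found", "%d+")

-- Failure: the supplied message is raised as an error.
local ok, msg = pcall(req, fake, findIn, "no letter Z found", "Z")
```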
W:reset
Resets the position to 1 and empties the string stack.
W:reset()
Returns: W, for method chaining.
W:seek
Sets the walker’s byte position, clamped between 1 and #str + 1.
local i = W:seek(n)
- n: The desired byte position.
Returns: The new (clamped) byte position.
Notes
This method can move the walker to a UTF-8 continuation byte.
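The clamping rule and the continuation-byte caveat can both be checked with plain Lua. The helpers `clamp` and `isContinuation` below are hypothetical and written only for this example; the byte range 0x80 to 0xBF for continuation bytes comes from the UTF-8 encoding itself.

```lua
-- Hypothetical helpers for illustration; not part of pStringWalk.
local function clamp(n, str)
    -- Valid positions run from 1 to #str + 1 (end-of-string).
    return math.max(1, math.min(n, #str + 1))
end

local function isContinuation(str, i)
    local b = str:byte(i)
    return b ~= nil and b >= 0x80 and b <= 0xBF
end

local str = "a\195\169b"  -- "a", U+00E9 (bytes 2 and 3), "b"

local low = clamp(-5, str)   -- clamped up to 1
local high = clamp(99, str)  -- clamped down to #str + 1, which is 5
local mid = clamp(3, str)    -- in range, unchanged

-- Byte 3 is the second byte of the two-byte character, so a position of 3
-- sits in the middle of a code point.
local on_tail = isContinuation(str, mid)
```

A check like `isContinuation` is one way to confirm that a position lands on a character boundary before running UTF-8-aware code from it.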
W:setByteMode
Turns Byte Mode on or off.
W:setByteMode(enabled)
- enabled: true to enable Byte Mode, false/nil to disable it.
Returns: W, for method chaining.
Notes
Terse Mode overrides this setting.
W:setLineCharDisplay
Turns on or off the printing of line and character numbers.
W:setLineCharDisplay(line, char)
- line: true to enable the printing of line numbers, false/nil to disable it.
- char: true to enable the printing of character numbers, false/nil to disable it.
Returns: W, for method chaining.
Notes
Byte Mode and Terse Mode both override these settings.
W:setName
Sets or clears the Walker Name.
W:setName([name])
- [name]: (nil) The name, or nil to unset any current name.
Returns: W, for method chaining.
Notes
The Walker Name is not cleared by W:setString or W:reset.
Terse Mode overrides this setting.
W:setString
Assigns a string to the walker, resets the position to 1, and empties the string stack.
W:setString(s)
- s: The string to assign.
Returns: W, for method chaining.
W:setTerseMode
Turns Terse Mode on or off.
W:setTerseMode(enabled)
- enabled: true to enable Terse Mode, false/nil to disable it.
Returns: W, for method chaining.
W:step
Moves the walker’s byte position forward or backward. The final position is clamped between 1 and #str + 1.
local i = W:step(n)
- n: How many bytes to advance (positive) or rewind (negative).
Returns: The new byte position.
Notes
This method can move the walker to a UTF-8 continuation byte.
W:warn
Prints a warning message to the console. Depending on the walker’s configuration, the output may include a name, a line number, and a character number.
W:warn(...)
- ...: Arguments for print.
Returns: W, for method chaining.
Notes
Warnings are not printed when Terse Mode is active.
W:ws
Advances the walker position past ASCII whitespace until it either rests on a non-whitespace byte or reaches end-of-string.
local advanced = W:ws()
Returns: true if the walker position advanced, false if not.
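A whitespace skip of this kind can be written with an anchored pattern in plain Lua. `skipWs` is a hypothetical helper, not the library's method. It relies on two standard behaviors: a leading "^" anchors string.find at the init position, and "%s" matches whitespace (ASCII whitespace in the default C locale).

```lua
-- Hypothetical helper for illustration; not part of pStringWalk.
-- Returns the new position and whether it moved.
local function skipWs(s, pos)
    local _, j = s:find("^%s+", pos)
    if j then
        return j + 1, true
    end
    return pos, false
end

local p1, moved1 = skipWs("  \tfoo", 1)  -- rests on "f" at byte 4
local p2, moved2 = skipWs("foo", 1)      -- already on non-whitespace
local p3, moved3 = skipWs("a   ", 2)     -- runs to end-of-string (byte 5)
```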
W:wsNext
If the walker is currently on a non-whitespace character, advances to the next whitespace character, or to the end of the string if none is found. Does not advance if the walker is already on whitespace.
W:wsNext()
W:wsReq
Like W:ws, but raises an error if the position did not advance.
W:wsReq([err])
- [err]: The error message.
Module Notes
Walkers
A walker object ties Lua search functions to an internal position. Generally, when a search is successful, the position advances past the match region, and when unsuccessful, it stays put (or throws an error). Some knowledge of Lua’s string module is necessary to understand how walkers work.
By convention, the walker is assigned to the variable W.
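To make the idea concrete, here is a toy object that pairs a string with a position and advances only on success. It is a minimal sketch and not pStringWalk's implementation: the `Toy` table, `newToy`, and the default error text are all invented for this example, and only an anchored literal comparison is modeled.

```lua
-- Toy illustration only; NOT pStringWalk's implementation.
local Toy = {}
Toy.__index = Toy

local function newToy(s)
    return setmetatable({S = s, I = 1}, Toy)
end

-- Anchored literal comparison: advance on success, stay put on failure.
function Toy:lit(s)
    if self.S:sub(self.I, self.I + #s - 1) == s then
        self.I = self.I + #s
        return s
    end
    return nil
end

-- The raising variant: same search, but failure becomes an error.
function Toy:litReq(s, err)
    local result = self:lit(s)
    if not result then
        error(err or "literal not found", 2)
    end
    return result
end

local T = newToy("foobar")
local first = T:lit("foo")                       -- "foo"; position is now 4
local miss = T:lit("foo")                        -- nil; position is still 4
local second = T:litReq("bar", "Missing 'bar'")  -- "bar"; position is now 7
```

Keeping the position inside the object is what lets a parser be written as a sequence of calls without threading an index through each one.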
Walker Options
Byte Mode
When active, the walker position is reported in warnings and errors as a byte index. Use this when the walker’s string contains arbitrary data.
Line Char Display
When active, line and character numbers are included in warnings and errors.
Terse Mode
When active, errors display only a generic message, with no name or positional information (byte index, line + character number). Warnings are completely silenced.
Field Names
It’s convenient to attach state to a walker while parsing. Besides method names, the following field names are reserved for internal use:
- Any field that is a single, upper-case ASCII letter (A-Z).
- Any field that begins with an underscore.
16 Captures
The limit of 16 returned captures is not connected to Lua’s actual maximum returnable values; it is a limitation of the library. Captures #17 and up may be correctly processed by string.find and string.match, but they will not be returned by the walker methods.
'Req' methods
Methods ending in Req will raise an error when unsuccessful. These methods take an optional argument, err, for the error string. If err is not provided, a generic error message will be used instead. As with Lua’s assert function, avoid constructing the error string directly within the method call, because the arguments are evaluated even if the method is successful:
local chunk = W:matchReq("foobar", "missing foobar for " .. some_upvalue)
-- ^
-- This concatenates every time!
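The cost can be demonstrated in plain Lua with a counter. In this sketch, `need` is a hypothetical stand-in for a Req-style method and `buildMessage` is an invented helper. The second form defers the message by using `or`, which only evaluates its right-hand side when the left-hand side is nil or false. With a walker, the same idea might be written with the non-raising method followed by `or W:error(...)`, though that combination is an assumption based on the method descriptions in this document.

```lua
local builds = 0

-- Invented helper that records how often a message gets constructed.
local function buildMessage(what)
    builds = builds + 1
    return "missing " .. what
end

-- Hypothetical stand-in for a Req-style method.
local function need(value, err)
    if not value then
        error(err or "request failed", 2)
    end
    return value
end

-- Eager: the message is built before need() runs, even though it succeeds.
local a = need("ok", buildMessage("foobar"))
local after_eager = builds

-- Deferred: the right-hand side of "or" is skipped when the left is truthy.
local b = "ok" or error(buildMessage("foobar"))
local after_deferred = builds
```

After both calls the counter is still 1: only the eager form paid for the concatenation.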
Errors Within Protected Calls
When a walker raises an error within a pcall, its state is not cleaned up. Any such walker should either be discarded or fully reset.
Unicode Code Points
Lua’s string search library treats all characters as single bytes. In UTF-8, the first 128 code points (0-127) are one byte, while the rest are 2, 3 or 4 bytes in length. It’s easy to match code points 0-127, but multi-byte characters do not work with sets ([a-zA-Z]) or pattern items (?, *, etc.).
That said, it’s possible to match single code points with a pattern defined in Lua 5.3: utf8.charpattern. pStringWalk contains a modified version of this pattern, stored in pStringWalk.ptn_code.
-- Matches one UTF-8 character at the walker's position
local u8_str = W:match(pStringWalk.ptn_code)
The pattern assumes that the UTF-8 encoding is valid, so it can return invalid code points if the string is corrupt. There are multiple ways to check a string’s UTF-8 encoding from Lua, including Lua 5.3’s utf8.len (which is included in LÖVE), utf8_validator.lua, and PILE Base’s own pUtf8.
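The byte-versus-code-point difference can be seen with the stock utf8 library. This sketch assumes Lua 5.3 or later, or an environment that provides a compatible utf8 module. It uses the standard utf8.charpattern rather than pStringWalk.ptn_code, whose exact contents are not shown in this document.

```lua
-- Assumes Lua 5.3+ (global utf8) or a compatible module loadable by require.
local utf8 = utf8 or require("utf8")

local s = "a\195\169b"  -- "a", U+00E9 (two bytes), "b"

-- A set works on bytes, so it matches only the first byte of the character.
local by_set = s:match("[\195\169]", 2)

-- utf8.charpattern matches one whole code point; "^" anchors it at init.
local by_ptn = s:match("^" .. utf8.charpattern, 2)

-- utf8.len counts code points, and returns nil for invalid encoding.
local count = utf8.len(s)
local bad = utf8.len("a\195")
```

Here by_set is one byte long, by_ptn is two bytes long, count is 3, and bad is nil because the string ends in the middle of a character.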
Terms
- Continuation byte: The second, third or fourth byte of a code point that is encoded in UTF-8.
- EOS: End of String, signified by the byte index being greater than the string length.
VERSION: 2.106