PARSE-WORD

Status

Due to the name conflict, I have shelved the RfD with this name. However, there has been enough support for the new name PARSE-NAME for this functionality, so this RfD is replaced by the PARSE-NAME RfD.

Change history

2005-04-22 A section on white space was added. A name conflict was discovered, a strawpoll for a new name was held, and this RfD was replaced by the one for PARSE-NAME.

Problem

How do we parse a word from the input stream?

PARSE does not skip leading delimiters, and you cannot specify that you want to parse for white space.

WORD skips leading delimiters, but it has several drawbacks: you cannot specify parsing for white space; it creates a counted string (not the preferred representation), so the length of the string is limited; it requires a separate buffer (typically limiting the string size even more), and copying to that buffer consumes time; and it requires passing a delimiter, although skipping leading delimiters only makes sense for white-space delimiters. In addition, ANS Forth does not specify the lifetime of the resulting string precisely.
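
The difference between the two words can be seen in a small experiment. The following is only a sketch using standard words; the helper names skip-line, try-parse and try-word are made up for this illustration, and it assumes that the text interpreter consumes exactly one delimiter after the word name, as standard systems do.

```forth
\ Hypothetical helpers, not part of any proposal.

: skip-line ( -- )
    source nip >in ! ;            \ discard the rest of the input line

: try-parse ( "ccc" -- )
    bl parse  ." [" type ." ] "   \ PARSE does not skip leading blanks
    skip-line ;

: try-word ( "ccc" -- )
    bl word count                 \ WORD skips blanks, returns a counted string
    ." [" type ." ] "
    skip-line ;

try-parse   abc
try-word    abc
```

With several blanks before abc, try-parse should show an empty string (PARSE stops at the first blank it sees), while try-word should show abc, at the price of a copy into WORD's transient buffer.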

Proposal

PARSE-WORD  ( "name" -- c-addr u ) CORE-EXT
Skip leading white space and parse name delimited by a white space character.

c-addr is the address within the input buffer and u is the length of the selected string. If the parse area is empty or contains only white space, the resulting string has length zero.

Typical Use

PARSE-WORD some-name TYPE
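
A slightly larger usage sketch follows. It assumes a system that provides the word under its replacement name PARSE-NAME (see Status) with the semantics proposed here; greet and count-names are hypothetical names chosen for the example.

```forth
\ Assumes PARSE-NAME behaves as specified in this proposal.

: greet ( "name" -- )
    parse-name  ." Hello, " type ." !" ;

: count-names ( "name1 ... namen" -- n )
    \ Count the names on the rest of the line; the loop ends when the
    \ parse area is exhausted and a zero-length string comes back.
    0 begin  parse-name nip  while  1+  repeat ;

greet world
count-names a bb ccc
.
```

count-names relies on the zero-length result for an empty parse area, so no separate end-of-line test is needed. Note that a trailing comment on the count-names line would be counted as names too.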

Remarks

Lifetime
The lifetime of the resulting string is specified implicitly through "within the input buffer", as is done in PARSE; i.e., the string will be usable until the next input buffer is read, for whatever reason (REFILL, INCLUDED, etc.). Should the lifetime be made more explicit?
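
A program that needs the string beyond the current input buffer has to copy it. The following is a minimal sketch of that, assuming the word is available as PARSE-NAME; name-buf, name-len, save-name and saved-name are hypothetical names, and the buffer size of 64 characters is arbitrary.

```forth
create name-buf 64 chars allot    \ private copy, survives REFILL
variable name-len

: save-name ( "name" -- )
    parse-name 64 min             \ truncate to the buffer size
    dup name-len !
    name-buf swap chars move ;    \ copy out of the input buffer

: saved-name ( -- c-addr u )
    name-buf name-len @ ;

save-name example
saved-name type
```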
Existing practice
ANS Forth mentions a PARSE-WORD with essentially the same definition in A.6.2.2008. Open Firmware also defines PARSE-WORD with the same definition. The only difference between these definitions and the current definition is that the current definition makes it explicit what happens when there is only white space in the input buffer.

Several systems have implemented a PARSE-WORD compatible with this specification, e.g., Gforth and Quartus.

A number of systems have been named that define a PARSE-WORD incompatible with this specification (e.g., they often pass a delimiter on the stack). The systems mentioned are MPE's VFX Forth and all MPE v6+ embedded targets, MinForth, CHForth, F83, JForth, and 4th. Of these systems, VFX, MinForth and CHForth are ANS Forth implementations; F83, 4th and JForth are not (although 4th partially stays close to ANS Forth). Coos Haak (CHForth) indicated that the next version of CHForth will have a PARSE-WORD compatible with this specification.

PARSE-WORD in Mops works like the one proposed here, but it refills the input buffer if the parse area is empty or contains only white space.
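
The refilling behaviour can be layered on top of the proposed word. The sketch below is not Mops code; it only illustrates the behaviour described above, assuming the proposed word is available as PARSE-NAME. The name parse-name-refill is hypothetical, and returning 0 0 when REFILL fails is an arbitrary choice made for this example.

```forth
: parse-name-refill ( "name" -- c-addr u )
    begin
        parse-name dup 0=         \ nothing but white space left?
    while
        2drop
        refill 0= if              \ no more input available:
            0 0 exit              \ give up with an empty string
        then
    repeat ;

parse-name-refill
    hello  type
```

Since REFILL fails when the input source is a string, this variant only makes a difference for file or terminal input.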

Names
Given the differences in behaviour of existing words with that name, we need a different name for this word. People have proposed the following names:
NextWord (conflict: exists with different meaning in Win32Forth)
         (existing practice: SP-Forth/4)
TOKEN
  Supported by: Alex McDonald
  Could live with it: Bernd Paysan
  Strongly opposed: Michael L Gassanenko
NAME (existing practice: exists with this meaning in Gforth)
  Supported by: Coos Haak
  Supported by: Albert van der Horst
  Could live with it: Bernd Paysan
  (Michael Gassanenko points out that it does not fit with >NAME NAME>)
EXTRACTWORD
EXTRACT-WORD
  Supported by: Ward McFarland
GET-WORD
GET-NAME
GET-TOKEN
PARSE-NAME
  Supported by: Stephen Pelc
  Second choice: Albert van der Horst
  Could live with it: Ward McFarland
  Could also support: Alex McDonald
  Could live with it: Marcel Hendrix
  Can live with it: David N. Williams
  Supported by: Anton Ertl
  Is happy with it: Mike Hore
PARSE-STREAM
Some people support PARSE-WORD despite the conflict: Charles Melice, Ward McFarland, Bernd Paysan.

What is white space?
I believe that the only white space allowed in ANS Forth programs consists of BL (section 3.4.1.1, treatment of control characters is implementation-defined), but this proposal was written such that it would not need rewriting if the definition of white space was extended in the future.

One other probably widely used and accepted white space character is TAB, but extending the definition of white space may be better left to another RfD.

Systems that support TAB: Gforth, VFX Forth and MPE v6+ embedded systems.

Implementation and Tests
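
One possible implementation in standard Forth is sketched below. It is given under a hypothetical name, my-parse-word, to stay clear of the name conflict discussed above; white?, non-white? and skip-while are likewise made-up helper names. As an assumption of this sketch, every character with a code up to BL is treated as white space, which goes beyond the BL-only reading given under "What is white space?".

```forth
: white? ( c -- f )
    bl 1+ u< ;                    \ BL and control characters (assumption)

: non-white? ( c -- f )
    white? 0= ;

: skip-while ( c-addr1 u1 xt -- c-addr2 u2 )
    \ Drop leading characters for which xt ( c -- f ) returns true.
    >r
    begin
        dup
    while
        over c@ r@ execute
    while
        1 /string
    repeat then
    r> drop ;

: my-parse-word ( "name" -- c-addr u )
    source >in @ /string          \ the parse area
    ['] white? skip-while         \ skip leading white space
    over >r                       \ remember where the name starts
    ['] non-white? skip-while     \ scan to the end of the name
    2dup 1 min +                  \ step over one delimiter, if present
    source drop - >in !           \ update >IN
    drop r> tuck - ;              \ start address and length

my-parse-word   some-name type
```

The result points into the input buffer, so no copying takes place, and an empty or all-white-space parse area yields a string of length zero.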

Comments

Andrey Cherezov writes: "In the SP-Forth/4 (spf.sourceforge.net) there is 'NextWord' word compatible with this specification."
Anton Ertl