1 chars = 1

[ RfDs/CfVs | Other proposals ]

Poll standings

See below for voting instructions.
Systems
[ ] conforms to ANS Forth.
  Gforth
  CForth (Mitch Bradley)
  Open Firmware (Mitch Bradley)
  SwiftForth (Leon Wagner)
  SwiftX (Leon Wagner)
  bigForth (Bernd Paysan)
  JavaForth (Peter Knaggs)
  IronForth (Peter Knaggs)
[ ] already implements the proposal in full since release [ ].
  Gforth 0.1alpha
  amForth 0.1 (Matthias Trute)
  CForth (Mitch Bradley)
  Open Firmware (Mitch Bradley)
  SwiftForth all (Leon Wagner)
  SwiftX all (Leon Wagner)
  TurboForth (Mark Wills, for TI-99/4A)
  bigForth 1.0 (Bernd Paysan)
  4th 3.62.4 (Hans Bezemer)
[ ] implements the proposal in full in a development version.
[ ] will implement the proposal in full in release [ ].
[ ] will implement the proposal in full in some future release.
[ ] There are no plans to implement the proposal in full in [ ].
  JavaForth (Peter Knaggs)
  IronForth (Peter Knaggs)
[ ] will never implement the proposal in full.
Programmers
[ ] I have used (parts of) this proposal in my programs.
 Bernd Paysan
 Leon Wagner
 Mitch Bradley
 Anton Ertl
 Matthias Trute
 Mark Wills
 Hans Bezemer
[ ] I would use (parts of) this proposal in my programs if the systems
     I am interested in implemented it.
 Anton Ertl
 Matthias Trute
 Mark Wills
[ ] I would use (parts of) this proposal in my programs if this
     proposal was in the Forth standard.
 Mark Wills
[ ] I would not use (parts of) this proposal in my programs.
 Tim Partridge

Informal results
Peter Knaggs writes:
This proposal is insufficient on its own, it should be paired with a byte  
access library/wordset.

Proposal etc.

Problem

While Forth-94 and Forth-2012 provide the option to systems that
1 CHARS > 1, to my knowledge no mainstream Forth system has
made use of this option.  Consequently, many Forth programmers write
programs that (according to the standard documentation requirements)
should declare an environmental dependency on 1 CHARS = 1.

Even those programmers that want to avoid this dependency cannot
easily test that their program does not have it, because so few Forth
systems are available that do not satisfy this dependency.  And in any
case, spending an effort to avoid an environmental dependency that all
interesting systems satisfy seems wasteful; alternatively, declaring
the environmental dependency does not beneit anyone, either.



Solution

Characters take one address unit (au)in memory.

CHARS can be implemented as noop
CHAR+ is equivalent to 1+

These words are not removed.  So programmers who like to program for
systems (whether existing or not) where 1 CHARS>1 still can do so.

As a consequence, on byte-addressed machines, characters normally take
one byte; on word-addressed machines, characters and cells take one
machine word.


Remarks

What about word-addressed machines?

Implementing standard Forth on word-addressed machines already
necessitates that char=au: a character in standard Forth must be at
least on address unit wide, and be at most as large as a cell; on
word-addressed machines, cell=au, so char=au follows from that.

There have been occasionaly ideas about supporting a packed character
representation (with more than one character per au) for
word-addressed machines, but that is beyond the scope of this
proposal.

What about nibble-addressed/bit-addressed machines?

No implementations of standard Forth on such machines are known to me;
probably the benefits of standard systems do not outweigh the costs on
such restricted machines.  Such machines are becoming rarer over time,
so if nobody has created such a system yet, it is unlikely to happen
in the future.  

Still, if anybody wants to create such a system, they would have two
options: Declare an environmental restriction of having a non-standard
character size, or implement an address unit size of (e.g.) 8 bits.
Which option is preferable depends on the use of the system.

What about other unusual hardware setups?

I heard of an embedded system with 16-bit address units and 32-bit
chars.  If we standardize on 1 chars = 1, the simplest way forward for
such a system would be to declare an environmental restriction of
having a non-standard char size; alternatively, one could try to
change the char size to 16 bits, or the address unit size to 32 bits.
I don't know enough about the system to make any recommendation among
these options.

What about Unicode?

An important motivation for the introduction of CHARS and CHAR+ in
Forth-94 was probably the seeming move to 16-bit characters with
Unicode in the early 1990s, which also led to 16-bit characters in
Windows NT (released 1993) and in Java (released 1995).

However, 16 bits turned out to be too little for storing Unicode code
points (Unicode 2.0 1996), so UTF-16 (Unicode based on 16-bit basic
units) is a variable-width encoding; at around the same time UTF-8
was introduced for encoding Unicode with 8-bit basic units, which
requires less conversion effort for software based on 8-bit chars, and
has the same disadvantage of being variable-width as UTF-16 (which
turns out to play little role in most software).

As a result, with a few exceptions (see below) Forth systems have
stayed with 8-bit characters, and they are not alone in this: UTF-8 is
the dominant file representation of Unicode, and most Unix software
handles Unicode also internally as UTF-8 (can anyone fill in here for
Windows).

Jax4th and JavaForth

Jax4th was written by Jack Woehr (Jax) as a proof of concept
implementation of Forth-94, and implements 16-bit characters on a
byte-addressed platform.  It is not maintained (current platform:
Windows NT 3.1).

JavaForth was written in 2014 by Peter Knaggs with primitives in Java.
It uses 32-bit cells, 16-bit chars and 8-bit address units.  Because
Java does not allow access to raw memory, JavaForth has to map Forth
memory to a Java array of some Java type anyway, and is pretty free to
choose address unit, character and cell size; it seems to me that it
would be simplest if Java Forth actually implemented
cell=char=au=32bits.

My impression is that Peter Knaggs chose the address unit and
character size to allow testing whether programs have an environmental
dependency on char=au (but that is still difficult for most programs
because JavaForth implements only the core words).  While one might
feel that the effort that went into getting this to work would be
wasted if this proposal is accepted, this effort has been expended
already, and is no good reason not to standardize the common practice
that char=au.  JavaForth could adopt the proposal by having 8-bit
chars or 16-bit aus, or in any other way; or it could keep the current
choices (and declare an environmental restriction on char>au) to serve
as testing platform for programs that should be able to cope with that
environmental restriction.



Typical use

s" Some string" dup       buffer: s s swap move

instead of Forth-2012:

s" Some string" dup chars buffer: s s swap Cmove



Proposal

3.1.2 Replace "be at least one address unit wide" with "be exactly one
address unit wide".

Keep the rest as-is.  While some wordings could be simplified (in
particular wrt character-aligned addresses, and the definitions of
CHARS and CHAR+), keeping them as-is gives guidance to programmers who
want to write char>au-capable code, and to system implementors who
implement a Forth system with the environmental restriction char>au.


Reference implementation

Nearly all standard Forth systems, e.g., SwiftForth, VFX, iForth,
Gforth, ...


Test cases

T{ 0 char+ -> 1 }T
T{ 1 chars -> 1 }T


Experience

Almost universally implemented and widely used.


Comments

None yet.

Author

M. Anton Ertl

Voting instructions

Fill out the appropriate ballot(s) below and mail it/them to me <anton@mips.complang.tuwien.ac.at>. Your vote will be published (including your name (without email address) and/or the name of your system) here. You can vote (or change your vote) at any time by mailing to me, and the results will be published here.

Note that you can be both a system implementor and a programmer, so you can submit both kinds of ballots.

Ballot for systems
If you maintain several systems, please mention the systems separately in the ballot. Insert the system name or version between the brackets or in the line below the statement. Multiple hits for the same system are possible (if they do not conflict).
[ ] conforms to ANS Forth.
[ ] already implements the proposal in full since release [ ].
[ ] implements the proposal in full in a development version.
[ ] will implement the proposal in full in release [ ].
[ ] will implement the proposal in full in some future release.
[ ] There are no plans to implement the proposal in full in [ ].
[ ] will never implement the proposal in full.
If you want to provide information on partial implementation, please do so informally, and I will aggregate this information in some way.
Ballot for programmers
Just mark the statements that are correct for you (e.g., by putting your name on the line below the statement you agree with). If some statements are true for some of your programs, but not others, please mark the statements for the dominating class of programs you write.
[ ] I have used (parts of) this proposal in my programs.
[ ] I would use (parts of) this proposal in my programs if the systems
     I am interested in implemented it.
[ ] I would use (parts of) this proposal in my programs if this
     proposal was in the Forth standard.
[ ] I would not use (parts of) this proposal in my programs.
If you feel that there is closely related functionality missing from the proposal (especially if you have used that in your programs), make an informal comment, and I will collect these, too. Note that the best time to voice such issues is the RfD stage.