String[] is assigned to the set of recognized characters.
If "str" is a string, then a character stream is created when Stream["str"] is invoked.
If this character stream is assigned the Name, strStreamObject, then Reckon[strStreamObject] returns the current character.
String[String[str1], String[str2]] -> String[str1][str2] -> "str1str2"
If "string" uses more than one character set (that is, it is not simple), Slot["string"] returns the ordered sequence of simple strings composing "string".
If "string" is a simple string, Slot["string"] returns:
"s", "t", "r", "i", "n", "g"
Cardinal["string"] returns the number of characters (in this case: 6).
Character-set-distinguished substring fields are accessed directly with substring-specifying prefixes. Specifically,
Slot["string"]( [m] | [m, n] | [Cardinal] | [Cardinal, m] )
returns, respectively, the (mth | nth in the mth substring | number of substrings | length of the mth substring).
String[Name]["whatever"] returns the character set name of "whatever" if it is a simple string.
If "whatever" is a concatenation of subStrings from different character sets, an ordered sequence of character set names from each subString is returned.
String[Name["charSetNam"]][RuleList]
where
RuleList = {Noops, Digits, Letters, ParaPuncs, Operators}
is the elicitation used to define a character set.
String[Cardinal]["whatever"] returns the character set name and character codes for the characters in "whatever".
Cardinal[String]["charSetName"[cc1, cc2,…],…] turns character codes, cc1, cc2,…, into a string using character set "charSetName".
See English definition of "string".
When this was written (around Valentine's Day, 2008), the Grok32` language specification generally used technical names specifying the functions or programs they elicit. These names are hyperlinked to their specifications. Normally, these technical names CAPITALIZE the first letter (of lowercase lettered words joined without spaces) to conform to the lexical requirements on Grok32` names.
This standard makes it clear when a specific program is being referenced, but it does not produce "friendly" text. I found that this paper reads much better when the words are as plainly descriptive as possible. Consequently, the hyperlinks leading to named functions (or other named objects) are written as plain text (wherever possible) here. This document is a deceptively simple central reference for the interpreted computer string.
As far as the Grok32` kernel is concerned,
String[This example]
is equivalent to
"This example".
This equivalence is a consequence of the rules regarding quotes in the standard ASCII rule list, whose rules are used throughout this language specification.
A string is a sequence of characters from one or more character sets.
A string is simple if it uses just one character set.
The "string name" of this character set is returned by the following:
(1) String[Name]["string"]
If "string" is simple, then
(2a) Slot["string"]
will decompose "string" into its characters:
"s", "t", "r", "i", "n", "g"
Strings with more than one character set are not simple.
If, for example, "stringgg" uses one character set for its first 3 characters and another for the rest, then
(2b) Slot["stringgg"] -> Sequence["str", "inggg"]
Slotting a (non-simple) compound string returns the sequence of substrings, each with a uniform character set.
Similarly, if "string" in (1) is compound, then a sequence of character set string names is returned.
Regardless of character sets, Cardinal["string"] returns the number of characters (in this case: 6).
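The decomposition described above can be sketched in Python. This is only an illustration, not Grok32` code: the classifier charset_of and its three set names are assumptions made for the example, and the function names slot, string_names, and cardinal are hypothetical stand-ins for Slot, String[Name], and Cardinal.

```python
from itertools import groupby


def charset_of(ch: str) -> str:
    """Hypothetical classifier: name the character set of one character."""
    code = ord(ch)
    if code < 128:
        return "ASCII"
    if code < 256:
        return "Latin-1"
    return "Unicode"


def slot(s: str) -> list[str]:
    """Characters of a simple string, or the simple substrings of a compound one."""
    # Consecutive characters from the same character set form one simple substring.
    runs = ["".join(group) for _, group in groupby(s, key=charset_of)]
    if len(runs) <= 1:
        return list(s)
    return runs


def string_names(s: str) -> list[str]:
    """Ordered character set names, one per simple substring."""
    return [name for name, _ in groupby(s, key=charset_of)]


def cardinal(s: str) -> int:
    """Number of characters, regardless of character sets."""
    return len(s)
```

Under these assumptions, slot("string") gives the six characters, while a string that switches character sets part-way through is split at the switch.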
(* This function decomposes any string into a sequence of elements with the form:
   {"stringName", "str"}
   where "stringName" is the character set name, and "str" is the string.
*)
Name[StringDecompose[str_String], Function[
  With[{strNam = String[Name][str]},
    If[Sequence === Slot[0][strNam],
      (* This will only be true if "str" is a compound (non-simple) String. *)
      Tally[{String[Name][Slot], Slot}, Slot[str]],
      {strNam[str], str}
    ]
  ]
]]
Slot["str"][strDatum]
returns a field in the string "str".
Specifically,
(3) Slot["str"][m]
returns the mth character in "str", provided "str" is simple and the character exists. If "str" is not simple, the mth simple substring is returned if it exists.
(4) Slot["str"][m, n]
returns the nth character in the mth substring of "str".
Substrings are distinguished by character set.
(5) Slot["str"][Cardinal]
returns the number of characters in "str" if it is simple. It returns the number of character sets in "str" if it is not a simple string.
(6) Slot["str"][Cardinal, m]
returns the length of the mth substring. See also SlotCardinal.
When any of the above, (3), (4), (5), or (6), fails to have an assigned value, the Slot["str"][strDatum] construct returns unchanged.
This means that for each String Slot field alternative,
strDatum:( [m] | [m, n] | [Cardinal] | [Cardinal, m] ),
True[ Slot["str"][strDatum] ]
evaluates True for each undefined string field value chosen by strDatum, and evaluates False for each assigned string field value chosen by strDatum.
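The four field alternatives can be sketched as one Python function. This is a minimal sketch under stated assumptions: positions are taken to be 1-based, None stands in for "returns unchanged", and the names charset_of, simple_runs, slot_field, and CARDINAL are hypothetical and not part of Grok32`.

```python
from itertools import groupby

CARDINAL = "Cardinal"  # hypothetical stand-in for the Cardinal keyword


def charset_of(ch):
    """Hypothetical classifier: name the character set of one character."""
    code = ord(ch)
    return "ASCII" if code < 128 else "Latin-1" if code < 256 else "Unicode"


def simple_runs(s):
    """Split s into substrings that each use a single character set."""
    return ["".join(group) for _, group in groupby(s, key=charset_of)]


def slot_field(s, *datum):
    """Return the field chosen by datum, or None when the field is undefined."""
    runs = simple_runs(s)
    is_simple = len(runs) <= 1
    if len(datum) == 1 and datum[0] == CARDINAL:
        # [Cardinal]: character count if simple, else number of substrings.
        return len(s) if is_simple else len(runs)
    if len(datum) == 2 and datum[0] == CARDINAL:
        # [Cardinal, m]: length of the mth substring.
        m = datum[1]
        return len(runs[m - 1]) if 1 <= m <= len(runs) else None
    if len(datum) == 1:
        # [m]: mth character if simple, else mth simple substring.
        m = datum[0]
        items = list(s) if is_simple else runs
        return items[m - 1] if 1 <= m <= len(items) else None
    if len(datum) == 2:
        # [m, n]: nth character in the mth substring.
        m, n = datum
        if 1 <= m <= len(runs) and 1 <= n <= len(runs[m - 1]):
            return runs[m - 1][n - 1]
    return None
```

Returning None for an undefined field is a simplification; the specification instead leaves the construct unevaluated.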
(7) String[String[str1], String[str2]]
is equivalent to
String[str1str2],
which is equivalent to
"str1str2".
In other words, (7) is an elicitation that joins strings together:
String["str1", "str2"] -> "str1str2"
When a character set is defined, characters are segregated into different sets defined by the set's rule list.
Character sets are not obligated to conform to any rules. However, every character set must have a designated EndString and an escape sequence character.
But unless a character is recognized in its character set rule list, it will not be interpreted as part of an expression. The set of all recognized characters defines the complete scope of character parsing when a string is reckoned.
Character subsets are used to parse input strings into expressions.
The rule list implements the rules on characters used for names, Numbers, and other basic types.
The set of recognized characters is the union of rule list subsets. This superset is assigned the following name:
(8) String[]
If a character is in "String[]", then it can be used in an expression's literal representation. By contrast, if a character is not in "String[]", then it will be unrecognized (by the kernel) in any expression except a literal string.
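The idea that the recognized set is the union of the rule list subsets can be sketched in Python. The subset names follow the RuleList given in this document, but the characters placed in each subset here are invented for illustration and are not the standard ASCII rule list; the function names are hypothetical.

```python
import string

# Illustrative rule list: subset names come from the text, memberships are assumed.
RULE_LIST = {
    "Noops": set(" \t\n"),
    "Digits": set(string.digits),
    "Letters": set(string.ascii_letters),
    "ParaPuncs": set('"()[]{}'),
    "Operators": set("+-*/=<>"),
}

# The recognized set plays the role of String[]: the union of all subsets.
RECOGNIZED = set().union(*RULE_LIST.values())


def is_recognized(ch):
    """True if ch could appear in an expression's literal representation."""
    return ch in RECOGNIZED


def subset_of(ch):
    """Name of the rule list subset containing ch, or None if unrecognized."""
    for name, members in RULE_LIST.items():
        if ch in members:
            return name
    return None
```

With these assumed memberships, "7" falls in Digits, while "é" belongs to no subset and would only be usable inside a literal string.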
If strm is a StreamObject, and Stream[Type][strm] returns String, then its elements are characters and strm is a character stream. "String" is used as a synonym for character. A character stream element is a character whether it is a member of String[] or not.
A string, "charS1charS2", can be parsed as a character stream using the following elicitation:
(9) Stream["charS1charS2"]
The above returns a StreamObject whose elements are characters from "charS1charS2".
Since "charS1charS2" may contain characters with varying character sets and byte-sizes, the character stream element size can vary. Some character sets have elements with varying size. In particular, Unicode (specifically, UTF-8) has characters requiring one to four bytes.
Where a character set has elements with varying size, a bit is reserved in each byte (or whatever data-chunk is used to number elements in the character set).
If this character stream is assigned to the name, strStreamObject, then Reckon[strStreamObject] returns the current character.
(See also Tally[func, String[…]].)
If Stream[n] is a StreamObject, the byte-size of the current character is given by
Stream[n][Type[Cardinal]]
(See also Expression[Cardinal][expr].) Furthermore,
Stream[n][Type]
returns
String
which means that Stream[n]'s element is a character.
Each character in a character set has its own unique Cardinal character code.
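A character stream whose current element has a varying byte-size can be sketched in Python. This is an illustrative sketch, assuming UTF-8 as the encoding; the class CharStream and its methods reckon, byte_size, and advance are hypothetical names, and the specification does not say how a stream moves to its next character.

```python
class CharStream:
    """Hypothetical character stream over a Python string."""

    def __init__(self, text):
        self._chars = list(text)
        self._pos = 0

    def reckon(self):
        """Return the current character, or None at the end of the stream."""
        if self._pos < len(self._chars):
            return self._chars[self._pos]
        return None

    def byte_size(self):
        """Byte-size of the current character, assuming UTF-8 encoding."""
        current = self.reckon()
        return 0 if current is None else len(current.encode("utf-8"))

    def advance(self):
        """Move to the next character (an assumed operation)."""
        if self._pos < len(self._chars):
            self._pos += 1
```

For example, in a stream over "a€" the first element occupies one byte and the second occupies three bytes in UTF-8.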
Every string ends with a distinct "EndString" character.
All of the named characters used to reckon an expression from a string are found in the Context "`Construct`String`Expression`Character`Names`".
The NullString is the string with just the EndString.
(EndString is defined as the first element after the escape sequence in the Noops sublist.)
Generally, the EndString is handled automatically; it is rarely explicitly written since a string's end is interpreted from the syntax. For example,
(10) String[This is an example.]
does not require an EndString because the expression syntax clearly demarcates the string.
For most purposes, the EndString need only be considered when a character set is defined. See escape sequence.
This language specification rarely represents a string in the form used above. Instead, (10) is normally written as:
"This is an example."
The standard ASCII rule list defines the double quote as a ParaPunc that evokes the string keyword.
A string's character set is called its string name or character set name.
If "whatever" is a string composed from one character set, then
String[Name]["whatever"]
returns the character set name of "whatever".
If "whatever" is a concatenation of subStrings from different character sets, an ordered sequence of character set names for each subString is returned.
A string's characters are transformed into StringName-labeled numbers using String[Cardinal][…] constructs. (This and the result returned by (10) are examples of the One-or-More-Result-Programming-Standard.)
The character set name is an identifying ASCII string assigned when a character set is compiled together with its rule list. A character set's rule list assigns character semantics. These compilations are kept in "Construct`String`Name`" and are ASCII string named. The default ASCII character set arithmetic operators and escape sequences implement linguistics in conformance with C's strings, and this Language Specification.
The string object includes the semantics which govern the way a string is transformed into an expression. Any linguistic or mathematical system may be modeled with character set rule lists for transforming a language string into an expression.
As a construct, the string is by far the most complex. The procedure for transforming a string into an expression uses sequences, lists, and sets to parse strings as members of character sets customized to any language.
See the following related sub-Contexts supporting `Construct`String`.
`Construct`String`Expression`Characters` describes the character-set abstraction and why ASCII is used in this American English LanguageSpecification.
`Construct`String`Expression`Character`Names` is a complete account of the CharacterNames used by the Grok32` kernel to transform a string into an interpreted expression.
`Construct`String`Expression`RuleList` explains how RuleList subdivides characters into one of the following subsets:
{Noops, Digits, Letters, ParaPuncs, Operators}
These subsets segregate characters according to their syntactic effect in parsing an expression from a string.
See rule list for specific character assignments and the standards expected of a character set.
`Construct`String`Expression`Character`Set` describes the linguistic string abstraction as a decomposable character set. The fields and RuleList (see above) used to specify a character set are described here.
The Quasi Arbitrary Byte String is a compact means to represent Strings, Cardinals, or ProtoCode of arbitrary size.
`Compile`String`ProtoCode` contains the specifications for protocode. Protocode is an efficient means for coding Grok32` expressions as a byte sequence of known length.
String Implementation Notes details the string's character code representation and the string's implementing data type in C.
Grok32` uses 128 character (7-bit) ASCII text by default. Other character sets are related through the ASCII character set. Any character set may be used to represent Grok32` code provided the characters have been assigned by evaluating an expression with the following form:
(11) String[Name["charSetNam"]][RuleList]
where
RuleList = {Noops, Digits, Letters, ParaPuncs, Operators}.
If "whatever" is a string composed from one character set, then
(12) String[Cardinal]["whatever"]
returns
(13) "charSetName"[n1, n2,…]
where "charSetName" is the character set name of "whatever", and [n1, n2,…] is a sequence of Cardinal character codes for each character in "whatever".
If "whatever" is a string composed from many character sets, then (12) will return a sequence of expressions like (13). Written using the standard ASCII rule list, the result returned by (12) matches the following PatternSequence:
(14) __[_String[__Cardinal]]
The above is short for:
Pattern[Pattern[Type[String]][Pattern[Type[Cardinal]][Sequence]]][Sequence]
The ElicitationForm in (12) has the elegant property of having a Head whose form matches elements in its result (14).
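The shape of the result of (12), one or more name-labeled groups of character codes, can be sketched in Python. This is an illustration only: the classifier charset_of and its set names are assumptions, code points from ord stand in for Cardinal character codes, and string_cardinal is a hypothetical name.

```python
from itertools import groupby


def charset_of(ch):
    """Hypothetical classifier: name the character set of one character."""
    code = ord(ch)
    return "ASCII" if code < 128 else "Latin-1" if code < 256 else "Unicode"


def string_cardinal(s):
    """Return a list of (character set name, character codes) groups for s."""
    return [
        (name, [ord(ch) for ch in group])
        for name, group in groupby(s, key=charset_of)
    ]
```

A simple string yields a single labeled group, and a compound string yields one group per simple substring, which mirrors the pattern in (14).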
The following constructs a character string from the operant character set.
(15) Cardinal[String][cc1, cc2,…]
In the above, "cc1, cc2,…" is a sequence of Cardinals interpreted as character codes.
The default character set is ASCII.
This form will only work reliably where the operant character set is clear.
For example,
String["str", Cardinal[String][32, 53, 56]]
joins the two strings such that the CardinalString uses the same character set as the last character of the preceding "str". In other words, a CardinalString like (15) takes the character set from any preceding string, and presumes ASCII otherwise.
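The behavior of the unlabeled form, ASCII by default and otherwise the character set of the preceding string, can be sketched in Python. This sketch assumes single-byte character sets and uses Python codec names ("ascii", "latin-1") as stand-ins for character set names; cardinal_string and join_with_codes are hypothetical names, and the preceding string's character set is passed explicitly because Python strings do not carry one.

```python
def cardinal_string(codes, charset="ascii"):
    """Decode single-byte character codes using a named Python codec."""
    return bytes(codes).decode(charset)


def join_with_codes(preceding, preceding_charset, codes):
    """Append codes to a string, reusing the preceding string's character set."""
    return preceding + cardinal_string(codes, preceding_charset)
```

With the codes from the example above, the default ASCII reading of 32, 53, 56 is a space followed by "5" and "8".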
(16) Cardinal[String]["charSetName"[cc1, cc2,…],…]
returns a string using the indicated character set, "charSetName", whose characters are specified with the character codes "cc1, cc2,…".
The character-glyph model strives to separate the units of textual content (characters) from the units of textual display (glyphs). The Grok32` String abstraction naturally conforms to this model* for the simple reason that string characters do not have anything to do with their glyphs aside from being well-labeled, and therefore easily referenced by glyphs in any conceivable textual display software. This String object regards print forms (glyphs) as a software application, and not the venue of this Language Specification.
See String Implementation Notes.
*See "The Unicode Character-Glyph Model: Case Studies" by John H. Jenkins.
string n.
1. A small cord or slender strip of leather, or the like, used esp. for binding, fastening, or tying things; a cord larger than a thread and smaller than a rope; as a shoestring.
2. A thread or cord strung with a number of objects or parts in close and orderly succession; hence, a line or series of things arranged on or as if on a thread; a series; succession; chain; as, a string of shells or beads; a string of arguments; a string of fish, or sausages, of logs; a string of Indians filing through the woods.
3. Hence, a designated group of players or contestants as ranked according to rated skill or proficiency; as, players on the third string were used toward the end of the game; -- often used attributively; as, a third-string player.
4. a The cord of a musical instrument, commonly of gut or wire, as of a piano, harp or violin. See PIANO, n., 1. The greater the number of vibrations per second, the higher is the tone produced.
b pl. Stringed instruments, esp. of an orchestra…
5. The line or cord of a bow. Ps. xi.2.
6. A fiber, as of a plant, esp. a fine root, the vein of a leaf, or the tough fiber connecting the halves of a string-bean pod.
…
19. Print. Under the piecework system, the proofs of matter set by one compositor, usually pasted in a strip or strips to facilitate measurement of his work.
[The last definition, (19.) above, is a good metaphor for the String object developed here in this document. The printer's "string" has been abstracted into the field of computer graphics as a character glyph sequence in the now totally digital string.
The personal computer revolution has made everyone into a printer. The print industry has been automated such that anyone with a personal computer prints character Strings with a sophistication and beauty equal to or beyond the best printers from times past.
"string" has more than 22 definitions in the 1949 Websters.]
operant adj.
that which operates; operative.
[From Websters1949Unabridged.]
(c) 2004-2008 by John Van Wie Bergamini.