String


String[]

…is assigned to the set of recognized characters.


If "str" is a string, then a character stream is created when Stream["str"] is invoked.
If this character stream is assigned the Name, strStreamObject,

Reckon[strStreamObject] returns the current character.

  

string join:

String[String[str1], String[str2]] -> String[str1][str2] -> "str1str2"

 

If "string" uses more than one character set (it is not simple),

Slot["string"]

…returns the ordered sequence of simple strings composing "string".
If "string" is a simple string, Slot["string"] returns:
"s", "t", "r", "i", "n", "g"
Cardinal["string"] returns the number of characters, (in this case: 6).
Character set-distinguished substring fields are accessed directly with substring-specifying field specifiers. Specifically…

Slot["string"](  [m] | [m, n] | [Cardinal] | [Cardinal, m] )
…returns, respectively, the…
(mth character or substring | nth character in the mth substring | number of substrings | length of the mth substring).

 

String[Name]["whatever"]
…returns the character set name of "whatever" if it is a simple string.

If "whatever" is a concatenation of subStrings from different character sets,
an ordered sequence of character set names from each subString is returned.

 

String[Name["charSetNam"]][RuleList]

…where…

RuleList = {Noops, Digits, Letters, ParaPuncs}

…is the elicitation used to define a character set.

 

String[Cardinal]["whatever"] returns the character set name and character codes for the characters in "whatever".

 

Cardinal[String]["charSetName"[cc1, cc2,…],…] turns character codes, cc1, cc2,…, into a string using the character set "charSetName".



 

See English definition of "string".


Note:

When this was written (around Valentine's Day, 2008), Grok32`s language specification generally used technical names specifying the functions or programs they elicit. These names are hyperlinked to their specifications. Normally, these technical names CAPITALIZE the first letter (of lowercase lettered words joined without spaces) to conform to the lexical requirements on Grok32` names. This standard makes it clear when a specific program is being referenced, but it does not produce "friendly" text. I found that this paper reads much better when the words are as plainly descriptive as possible. Consequently, the hyperlinks leading to named functions (or other named objects) are written as plain text (wherever possible) here. This document is a deceptively simple central reference for the interpreted computer string.



String Representation


As far as the Grok32` kernel is concerned…

 

                String[This example] is equivalent to "This example".


This equivalence is a consequence of the rules regarding quotes in the standard ASCII rule list, whose rules are used throughout this language specification.


 

A string is a sequence of characters from one or more character sets.


A string is simple if it uses just one character set.
The "string name" of this character set is returned by the following:

(1)                String[Name]["string"]

If "string" is simple, then…

(2a)              Slot["string"]


…will decompose "string" into its characters,


            "s", "t", "r", "i", "n", "g"


Strings with more than one character set are not simple.
If, for example, "stringgg" uses one character set for its first 3 characters and another for the rest, then…


(2b)              Slot["stringgg"] ->
                                                    Sequence["str", "inggg"]


Slotting a (non-simple) compound string returns the sequence of substrings each with uniform character sets.
Similarly, if "string" in (1) is compound, then a sequence of character set string names is returned.


Regardless of character sets, Cardinal["string"] returns the number of characters, (in this case: 6).
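
The behavior described in (1), (2a), and (2b) can be modeled outside Grok32` with a small Python sketch. It is only an illustration under stated assumptions: a string is represented as a list of (character set name, text) runs, and the function names slot, string_name, and cardinal are hypothetical stand-ins for Slot[…], String[Name][…], and Cardinal[…]. The set names "setA" and "setB" are invented for the example.

```python
# Hypothetical model: a string is a list of (charset_name, text) runs.
# slot, string_name, and cardinal are illustrative stand-ins, not Grok32` code.

def is_simple(runs):
    """A string is simple if it uses just one character set."""
    return len(runs) <= 1

def slot(runs):
    """Simple string: its characters. Compound string: its uniform substrings."""
    if is_simple(runs):
        return [ch for _, text in runs for ch in text]
    return [text for _, text in runs]

def string_name(runs):
    """Simple string: one character set name. Compound: a sequence of names."""
    names = [name for name, _ in runs]
    return names[0] if len(names) == 1 else names

def cardinal(runs):
    """Total number of characters, regardless of character sets."""
    return sum(len(text) for _, text in runs)

simple = [("ASCII", "string")]
compound = [("setA", "str"), ("setB", "inggg")]
```

Under this model, slot(simple) gives the six characters of "string", slot(compound) gives the two substrings "str" and "inggg", and cardinal counts characters in either case.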




Programming Example: StringDecompose

(*    This function decomposes any string into a sequence of elements with the form:
            {"stringName", "str"}

        …where "stringName" is the character set name, and "str" is the string.
*)


Name[StringDecompose[str_String], Function[

With[{strNam = String[Name][str]},
    If[Sequence === Slot[0][strNam],                         (* This will only be true if "str" is a compound (non-simple) String. *)

        Tally[{String[Name][Slot], Slot}, Slot[str]],

         {strNam[str], str}
    ]
]]]
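
For readers who want to experiment, the same decomposition can be sketched in Python. This is not a translation of the Grok32` definition above. The classifier charset_of is a hypothetical stand-in for a real character set lookup (it merely separates ASCII, the Unicode Greek block, and everything else), and the function always returns a list of pairs, even for a simple string.

```python
from itertools import groupby

def charset_of(ch):
    """Hypothetical classifier: a stand-in for a real character set lookup."""
    code = ord(ch)
    if code < 128:
        return "ASCII"
    if 0x0370 <= code <= 0x03FF:
        return "Greek"
    return "Other"

def string_decompose(s):
    """Return [name, substring] pairs, one per run of a single character set."""
    # groupby collects consecutive characters that share the same classifier key.
    return [[name, "".join(group)] for name, group in groupby(s, key=charset_of)]
```

A string drawn from one set yields a single pair. A string that switches sets yields one pair per uniform run, in order.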


String Sectioning


Cardinal["string"]
returns the length of the whole "string".

 As noted in (1) and (2), a string can be parsed into (uniform) substrings and characters using Slot["string"] constructions.

Non-simple strings are concatenations of substrings distinguished by character set.
Substring fields can be specifically accessed using a ("String Slot") field alternative specifier.
This field alternative specifier is given the arbitrary name strDatum.
Assuming m and n are integers, then strDatum stands for any value that matches the following Pattern:

               

                strDatum:(  [m] | [m, n] | [Cardinal] | [Cardinal, m] )
Then…

                Slot["str"][strDatum]


…returns a field in the string "str".


Specifically,

 

(3)                    Slot["str"][m]

 

…returns the mth character in "str", provided "str" is simple and has at least m characters. If "str" is not simple, the mth simple substring is returned, if it exists.


 

(4)                    Slot["str"][m, n]

 

…returns the nth character in the mth substring of "str".
Substrings are distinguished by character set.



(5)                    Slot["str"][Cardinal]

 

…returns the number of characters in "str" if it is simple.
It returns the number of character set-distinguished substrings in "str" if it is not a simple string.

 


(6)                    Slot["str"][Cardinal, m]

 

…returns the length of the mth substring. See also SlotCardinal.


When any of the above, (3), (4), (5), or (6), fails to have an assigned value, the Slot["str"][strDatum] construct returns unchanged.
This means that for each undefined String Slot field alternative,

                strDatum:(  [m] | [m, n] | [Cardinal] | [Cardinal, m] ),


                True[ Slot["str"][strDatum]  ]


…evaluates to True for each alternative chosen by strDatum whose string field value is undefined, and
…evaluates to False for each alternative chosen by strDatum whose string field value is assigned.
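
The four field alternatives (3) through (6) can be modeled with the following Python sketch. Several things here are assumptions rather than part of this specification: strings are lists of (character set name, text) runs, positions are 1-based, (5) on a compound string counts its uniform substrings, and the sentinel UNCHANGED stands in for "the construct returns unchanged". The names slot_field, CARDINAL, and UNCHANGED are hypothetical.

```python
CARDINAL = "Cardinal"   # stand-in for the Cardinal keyword
UNCHANGED = object()    # stand-in for "the construct returns unchanged"

def _index(m, size):
    """Convert a 1-based position to a 0-based index, rejecting out-of-range."""
    if not isinstance(m, int) or not 1 <= m <= size:
        raise IndexError(m)
    return m - 1

def slot_field(runs, *datum):
    """Model Slot["str"][strDatum] on a list of (charset_name, text) runs."""
    simple = len(runs) == 1
    # Units addressed by a single index: characters if simple, else substrings.
    units = list(runs[0][1]) if simple else [text for _, text in runs]
    try:
        if datum == (CARDINAL,):                      # form (5)
            return len(units)
        if len(datum) == 2 and datum[0] == CARDINAL:  # form (6)
            return len(runs[_index(datum[1], len(runs))][1])
        if len(datum) == 1:                           # form (3)
            return units[_index(datum[0], len(units))]
        if len(datum) == 2:                           # form (4)
            text = runs[_index(datum[0], len(runs))][1]
            return text[_index(datum[1], len(text))]
    except IndexError:
        pass
    return UNCHANGED
```

In this model, the test `slot_field(runs, 3) is UNCHANGED` plays the role of the True[…] check above: it holds exactly when the requested field has no value.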






String Join

 

(7)                                String[String[str1], String[str2]]

           …is equivalent to,

                                    String[str1str2],

            …which is equivalent to,

                                    "str1str2".

 

In other words, (7) is an elicitation that joins strings together:

 

                        String["str1", "str2"] -> "str1str2"
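
A Python sketch of the join follows. It assumes the run-list model of a string, a list of (character set name, text) pairs, and it assumes that adjacent substrings from the same character set merge into one, so that joining two simple strings from the same set gives a simple string. The function name join is hypothetical.

```python
def join(*strings):
    """Concatenate run lists, merging neighbours that share a character set."""
    joined = []
    for runs in strings:
        for name, text in runs:
            if not text:
                continue  # empty runs contribute nothing
            if joined and joined[-1][0] == name:
                # Same character set as the previous run: extend it.
                joined[-1] = (name, joined[-1][1] + text)
            else:
                joined.append((name, text))
    return joined
```

Joining [("ASCII", "str1")] with [("ASCII", "str2")] gives the single run ("ASCII", "str1str2"), while joining strings from different sets keeps the runs separate and in order.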



 


Recognized Character Set

String[]

 

When a character set is defined, its characters are segregated into subsets by the set's rule list.  Character sets are not otherwise obligated to conform to any rules, except that every character set must have a designated EndString and an escape sequence character.  Unless a character is recognized in its character set's rule list, it will not be interpreted as part of an expression.  The set of all recognized characters defines the complete scope of character parsing when a string is reckoned.


Character subsets are used to parse input strings into expressions.
The rule list implements the rules on characters used for names, Numbers, and other basic types.
The set of recognized characters is the union of rule list subsets.  This super-set is assigned the following name:

 

(8)                    String[]

 

If a character is in "String[]", then it could be used in an expression's literal representation.  By contrast, if a character is not in "String[]", then it will be unrecognized (by the kernel) in any expression except a literal string.
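
The idea that the recognized set is the union of the rule list subsets can be sketched in Python. The sublist names follow the five names listed in (11). Their contents below are invented and heavily abbreviated for illustration only, since the actual ASCII rule list is specified elsewhere. The names rule_list, recognized, is_recognized, and category_of are hypothetical.

```python
import string

# Hypothetical, heavily abbreviated rule list; the real sublists are not given here.
rule_list = {
    "Noops": set(" \t\n"),
    "Digits": set(string.digits),
    "Letters": set(string.ascii_letters),
    "ParaPuncs": set('"[]{}(),'),
    "Operators": set("+-*/="),
}

# The recognized set, String[], modeled as the union of the rule list subsets.
recognized = set().union(*rule_list.values())

def is_recognized(ch):
    """True if ch could appear in an expression's literal representation."""
    return ch in recognized

def category_of(ch):
    """Name of the first sublist containing ch, or None if unrecognized."""
    for name, chars in rule_list.items():
        if ch in chars:
            return name
    return None
```

A character outside every sublist is simply absent from the union, which mirrors the statement that it is unrecognized anywhere except inside a literal string.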

 

If strm is a StreamObject, and Stream[Type][strm] returns String, then its elements are characters and strm is a character stream. "String" is used as a synonym for character.  A  character stream element is a character whether it is a member of String[] or not.

 

 



Stream["str"] invokes a character stream.

 


A string, "charS1charS2", can be parsed as a character stream using the following elicitation:

 

(9)                 Stream["charS1charS2"]

 

The above returns a StreamObject whose elements are characters from "charS1charS2".

Since "charS1charS2" may contain characters that vary in character set and byte size, the size of a character stream element can vary. Some character sets have elements of varying size. In particular, Unicode text encoded as UTF-8 uses one to four bytes per character.


Where a character set has elements with varying size, a bit is reserved in each byte (or whatever data-chunk is used to number elements in the character set).


If this character stream is assigned to the name, strStreamObject, Reckon[strStreamObject] returns the current character.
(See also Tally[funcString[…]].)

 

If Stream[n] is a StreamObject, the byte-size of the current character is given by…

            Stream[n][Type[Cardinal]]

(See also Expression[Cardinal][expr]).  Furthermore,

            Stream[n][Type]

…returns…

            String

…which means that Stream[n]'s element is a character.
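
A character stream with variable element size can be sketched in Python as follows. The class CharStream and its methods are hypothetical stand-ins for a StreamObject, Reckon[…], and Stream[n][Type[Cardinal]]. Two assumptions are made that this specification does not state: reckoning advances the stream after returning the current character, and the stream is encoded as UTF-8.

```python
class CharStream:
    """Hypothetical stand-in for a StreamObject whose elements are characters."""

    def __init__(self, text, encoding="utf-8"):
        self._text = text
        self._pos = 0
        self._encoding = encoding

    def reckon(self):
        """Return the current character and advance; None at the end."""
        if self._pos >= len(self._text):
            return None
        ch = self._text[self._pos]
        self._pos += 1
        return ch

    def current_size(self):
        """Byte size of the current character in the stream's encoding."""
        if self._pos >= len(self._text):
            return 0
        return len(self._text[self._pos].encode(self._encoding))
```

Stepping through a stream that holds one ASCII letter, one accented Latin letter, the euro sign, and one emoji reports element sizes of one, two, three, and four bytes under UTF-8.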

 




Character(Set | Code | Name)



Each character in a character set has its own unique Cardinal character code.

Every string ends with a distinct "EndString" character.
All of the named characters used to reckon an expression from a string are found in the Context "`Construct`String`Expression`Character`Names`".

 

The NullString is the string with just the EndString.
(EndString is defined as the first element after the escape sequence in the Noops sublist.)

Generally, the EndString is handled automatically; it is rarely explicitly written since a string's end is interpreted from the syntax.  For example…

 

(10)                        String[This is an example.]

 

…does not require an EndString because the expression syntax clearly demarks the string.
For most purposes, the EndString need only be considered when a character set is defined.  See escape sequence.

 

This language specification rarely represents a string in the form used above.  Instead, (10) is normally written as:

                        "This is an example."

 

The standard ASCII rule list defines the double quote as a ParaPunc that evokes the string keyword.


 



StringName

 

A string's character set is called its string name or character set name.

If "whatever" is a string composed from one character set, then…

 

(10)                    String[Name]["whatever"]

 

…returns the character set name of "whatever".


If "whatever" is a concatenation of subStrings from different character sets, an ordered sequence of character set names for each subString is returned.

 

A string's characters are transformed into StringName-labeled numbers using String[Cardinal][…] constructs.  (This and the result returned by String[Name]["whatever"] in (10) are examples of the One-or-More-Result-Programming-Standard.)

 

The character set name is an identifying ASCII string assigned when a character set is compiled together with its rule list.  A character set's rule list assigns character semantics.  These compilations are kept in "Construct`String`Name`" and are named with ASCII strings.  The arithmetic operators and escape sequences of the default ASCII character set implement linguistics in conformance with C's strings and this Language Specification.

 

The string object includes the semantics which govern the way a string is transformed into an expression.  Any linguistic or mathematical system may be modeled with character set rule lists for transforming a language string into an expression.

 

As a construct, the string is by far the most complex. The procedure for transforming a string into an expression uses sequences, lists, and sets to parse strings as members of character sets customized to any language.

See the following related sub-Contexts supporting `Construct`String`.

 

 

 


Defining Character Sets

 

Grok32`  uses 128-character (7-bit) ASCII text by default.  Other character sets are related through the ASCII character set. Any character set may be used to represent Grok32`  code provided the characters have been assigned by evaluating an expression with the following form:

 

(11)                    String[Name["charSetNam"]][RuleList]

        where

                        RuleList = {Noops, Digits, Letters, ParaPuncs, Operators}.

 

 

 


StringCardinals

 

If "whatever" is a string composed from one character set, then…

 

(12)                    String[Cardinal]["whatever"]

 

            returns

 

(13)                    "charSetName"[n1, n2,…]

 

…where "charSetName" is the character set name of "whatever", and [n1, n2,…] is a sequence of Cardinal character codes for each character in "whatever".

 

If "whatever" is a string composed from many character sets, then (12) will return a sequence of expressions like (13).  Written using the standard ASCII rule list, the result returned by (12) matches the following PatternSequence:

 

(14)                   __[_String[__Cardinal]]


The above is short for:


        Pattern[Pattern[Type[String]][Pattern[Type[Cardinal]][Sequence]]][Sequence]


The ElicitationForm in (12) has the elegant property of having a Head whose form matches elements in its result (14).
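
The mapping from (12) to (13) can be illustrated with a short Python sketch. It assumes the run-list model of a string, a list of (character set name, text) pairs, and it assumes that a character's code equals its Unicode code point, which holds for ASCII but is only a placeholder for other character sets. The function name string_cardinals is hypothetical.

```python
def string_cardinals(runs):
    """Model (12): map each (charset_name, text) run to (charset_name, codes)."""
    labeled = [(name, [ord(ch) for ch in text]) for name, text in runs]
    # One-or-more convention: a simple string yields a single labeled result,
    # a compound string yields a sequence of them.
    return labeled[0] if len(labeled) == 1 else labeled
```

For the simple ASCII string "Hi" this gives the name "ASCII" paired with the codes 72 and 105. For a compound string it gives one such pair per uniform substring.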





CardinalString


The following constructs a character string from the operant character set.


(15)                    Cardinal[String][cc1, cc2,…]

 

In the above, "cc1, cc2,…" is a sequence of Cardinals interpreted as character codes.
The default character set is ASCII.  This form will only work reliably where the operant character set is clear.
For example,

                        String["str", Cardinal[String][32, 53, 56]]

 

…joins the two strings such that the CardinalString uses the same character set as the last character of the preceding "str".   In other words, a CardinalString like (15) takes the character set from any preceding string, and presumes ASCII otherwise.

(16)                    Cardinal[String]["charSetName"[cc1, cc2,…],…]

…returns a string using the indicated character set, "charSetName", whose characters are specified with the character codes "cc1, cc2,…".
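
Forms (15) and (16) can be sketched in Python as the reverse of the character code mapping. As before, this is an illustration under assumptions: a string is a list of (character set name, text) runs, a character code is treated as a Unicode code point, and the function name cardinal_string and its parameters charset and preceding are hypothetical.

```python
def cardinal_string(codes, charset=None, preceding=None):
    """Model (15)/(16): build a (charset_name, text) run from character codes."""
    if charset is None:
        # Form (15): take the set of the last run of any preceding string,
        # and presume ASCII otherwise.
        charset = preceding[-1][0] if preceding else "ASCII"
    # Form (16) passes charset explicitly.
    return (charset, "".join(chr(code) for code in codes))
```

With no preceding string, the codes 32, 53, 56 produce the ASCII text " 58" (a space followed by the digits 5 and 8).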



 


The Character-Glyph Model

 

The character-glyph model strives to separate the units of textual content (characters) from the units of textual display (glyphs).  The Grok32` String abstraction naturally conforms to this model* for the simple reason that string characters do not have anything to do with their glyphs aside from being well-labeled, and therefore easily referenced by glyphs in any conceivable textual display software.  This String object regards print forms (glyphs) as a software application, and not the venue of this Language Specification.

 

See String Implementation Notes.

 

*See "The Unicode Character-Glyph Model: Case Studies" by John H. Jenkins.





 English Definition of "string"


string n.

1. A small cord or slender strip of leather, or the like, used esp. for binding, fastening, or tying things; a cord larger than a thread and smaller than a rope; as a shoestring.

2. A thread or cord strung with a number of objects or parts in close and orderly succession; hence, a line or series of things arranged on or as if on a thread; a series; succession; chain; as, a string of shells or beads; a string of arguments; a string of fish, or sausages, of logs; a string of Indians filing through the woods.

3. Hence, a designated group of players or contestants as ranked according to rated skill or proficiency; as, players on the third string were used toward the end of the game;  -- often used attributively; as, a third-string player.

4. a The cord of a musical instrument, commonly of gut or wire, as of a piano, harp or violin.  See PIANO, n., 1. The greater the number of vibrations per second, the higher is the tone produced. 

   b pl. Stringed instruments, esp. of an orchestra…

5.  The line or cord of a bow.            Ps. xi.2.

6. A fiber, as of a plant, esp. a fine root, the vein of a leaf, or the tough fiber connecting the halves of a string-bean pod.

19. Print. Under the piecework system, the proofs of matter set by one compositor, usually pasted in a strip or strips to facilitate measurement of his work.

[The last definition, (19.) above, is a good metaphor for the String object developed here in this document.  The printer's "string" has been abstracted into the field of computer graphics as a character glyph sequence in the now totally digital string.

 

The personal computer revolution has made everyone into a printer. The print industry has been automated such that anyone with a personal computer prints character strings with a sophistication and beauty equal to or beyond the best printers of times past.


"string" has more than 22 definitions in the 1949 Websters.]

operant adj.

that which operates; operative.

                                                            [From Websters1949Unabridged.]

 

 


Grok32`

(c) 2004-2008 by

John Van Wie Bergamini.


