SuperPick

A suggested data structure

The Pick data structure is capable of storing data in a compact and reasonably easily read form. It has some of the characteristics of just about every data model that has been used (hierarchical, relational, object); it "sort of" looks like them all. One expert even said it would look like "the next thing", and when XML came along it did! However, it has some deficiencies and curiosities that the data format I here call SuperPick will, I hope, correct.

The Pick Data Model

Pick organises data as a string of text characters using an eight bit extension of the ASCII character set. The first 32 characters (0-31), the ASCII control characters, are unused. Characters 32 to 126 are the printable letters, digits and punctuation of the seven bit ASCII set. The three characters 254, 253, and 252 are delimiters called the attribute mark, value mark, and subvalue mark respectively. The original R83 Pick implementation also used character 255 as a hidden delimiter. One implementation of Pick uses character 128 as a null and character 251 as a further delimiter. Because other eight bit character sets, such as ISO 8859-1, use these high-order codes for printable characters, Pick is incompatible with them. This is one of the areas that SuperPick addresses.
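A minimal Python sketch of this layering follows. It splits an item's data into attributes, then values, then subvalues, using only the three marks above. The function names are hypothetical, and the implementation-specific characters (255, 251 and 128) are ignored.

```python
from __future__ import annotations

# Pick delimiters as described above (8-bit code points).
AM = chr(254)   # attribute mark
VM = chr(253)   # value mark
SVM = chr(252)  # subvalue mark


def parse_pick_item(raw: str) -> list[list[list[str]]]:
    """Split item data into attributes -> values -> subvalues.

    Index 0 of the result holds attribute 1.
    """
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]


def build_pick_item(attributes: list[list[list[str]]]) -> str:
    """Inverse of parse_pick_item: join subvalues, then values, then attributes."""
    return AM.join(
        VM.join(SVM.join(subvalues) for subvalues in attribute)
        for attribute in attributes
    )


raw = "ACME Ltd" + AM + "10" + VM + "20" + AM + "a" + SVM + "b"
print(parse_pick_item(raw))
# [[['ACME Ltd']], [['10'], ['20']], [['a', 'b']]]
```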

Attributes

The most fundamental thing in Pick is the idea of a record (item) consisting of a string of data split by the attribute mark into a number of fields (attributes). Pick uses a dictionary definition to indicate the nature of these fields by reference to their position. The dictionary definitions are, however, indicative and not prescriptive. This is both a weakness and a strength of the Pick data model; even so, one can see how the model somewhat resembles the relational one. I should note that in at least one implementation it is possible to overlay an SQL model which enforces prescriptive definitions. Many Pick applications also enforce definitions by various means, most of which depend on maintaining site discipline.
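The point that dictionary definitions are indicative rather than prescriptive can be seen in a small Python sketch. The dictionary here is a hypothetical mapping from field names to attribute positions. Reading a field never checks that the item matches the definition, and positions beyond the end of the item simply read as empty, as Pick does when extracting a missing attribute.

```python
from __future__ import annotations

AM = chr(254)  # Pick attribute mark

# Hypothetical dictionary: field name -> attribute position (1-based).
CUSTOMER_DICT = {"NAME": 1, "PHONE": 2, "CITY": 3}


def read_field(raw_item: str, dictionary: dict[str, int], name: str) -> str:
    """Read a field by name using its dictionary position.

    Nothing checks the item against the dictionary: short items give ""
    for missing attributes, and extra attributes are simply ignored.
    """
    attributes = raw_item.split(AM)
    position = dictionary[name]
    return attributes[position - 1] if position <= len(attributes) else ""


short_item = "ACME Ltd" + AM + "555-0100"
long_item = short_item + AM + "Leeds" + AM + "unexpected extra data"
print(repr(read_field(short_item, CUSTOMER_DICT, "CITY")))  # ''
print(read_field(long_item, CUSTOMER_DICT, "CITY"))         # Leeds
```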

Values

Any attribute can hold a string of repeated values separated by value marks. These are called multivalues and are one of the most flexible, useful, and controversial aspects of the Pick data structure. Relational advocates often regard multivalues as something akin to heresy, because they appear to violate a 'rule' that data be atomic. If one accepts the basic concept that Pick is a data structure, and not a database, the difficulty disappears. One is then left with a very compact method of grouping data that is in the nature of a list. There is an inherent order in the list which may or may not be meaningful. My own view is that in practice it is unwise to assume the order of the list means anything.

A single-valued attribute is closely comparable to an attribute in XML. One with multiple values in it compares to repeated elements in XML.
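That correspondence can be sketched with Python's standard `xml.etree.ElementTree` module. The function name and sample data are hypothetical: a single value becomes an XML attribute, and a multivalued attribute becomes repeated child elements.

```python
import xml.etree.ElementTree as ET

VM = chr(253)  # Pick value mark


def attribute_to_xml(parent: ET.Element, name: str, attribute: str) -> None:
    """Attach one Pick attribute to `parent` using the mapping described above."""
    values = attribute.split(VM)
    if len(values) == 1:
        parent.set(name, values[0])  # single value -> XML attribute
    else:
        for value in values:  # multivalue -> repeated elements
            ET.SubElement(parent, name).text = value


customer = ET.Element("customer")
attribute_to_xml(customer, "name", "ACME Ltd")
attribute_to_xml(customer, "phone", "555-0100" + VM + "555-0199")
print(ET.tostring(customer, encoding="unicode"))
# <customer name="ACME Ltd"><phone>555-0100</phone><phone>555-0199</phone></customer>
```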

Associated Values

A further subtlety is to link a number of attributes that are multivalued. These are called associated multivalues and are a way to define data that is tabular in nature. In this case the position of each value has meaning: the 'nth' value in each attribute is linked with the 'nth' value in each of the others as part of a meaningful set of data. A good practical example is using associated multivalues to store the lines of an invoice in the same record as the invoice header information. This somewhat resembles the object model of data, but is more usually referred to as a 'nested relational' model. It compares with an XML element enclosing repeated sets of other elements.
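Reading associated multivalues as table rows amounts to pairing up the nth value of each attribute. The following Python sketch does this for the invoice example. The function name and sample data are hypothetical. Short attributes are padded with empty strings, since Pick treats a missing value as empty.

```python
from __future__ import annotations

VM = chr(253)  # Pick value mark


def associated_rows(*attributes: str) -> list[tuple[str, ...]]:
    """Turn associated multivalued attributes into rows: row n holds value n of each."""
    if not attributes:
        return []
    columns = [attribute.split(VM) for attribute in attributes]
    height = max(len(column) for column in columns)
    padded = [column + [""] * (height - len(column)) for column in columns]
    return list(zip(*padded))


# Invoice lines held as three associated attributes of the invoice item.
products = "P100" + VM + "P200"
quantities = "2" + VM + "5"
prices = "9.99" + VM + "1.50"
for line in associated_rows(products, quantities, prices):
    print(line)
# ('P100', '2', '9.99')
# ('P200', '5', '1.50')
```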

Subvalues

Pick allows each value to be further divided into subvalues. In my experience this is not at all common, mainly because the enquiry language has no way to use this data. I have seen it used to define complex structures that drive specific '4GL' tools. The only other time I have seen it used (in a way I felt made sense) was in a product called ALL, where an attribute could contain an entire table. The value marks separated table rows and the subvalue marks were analogous to attribute marks in the standard Pick data structure. I saw this as effectively the same thing as associated multivalues. It had the benefit of keeping related data physically close together (at the cost of no longer keeping similar data together). It might also allow data to be held more compactly, depending on the existence and nature of 'empty' fields in the data.
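The ALL-style layout, and why it is effectively the same thing as associated multivalues, can be shown with a short Python sketch. Decoding splits rows on value marks and columns on subvalue marks. Transposing the rows gives one value-mark-separated string per column, which is the associated multivalue form. The names are hypothetical, and the table is assumed to be rectangular.

```python
from __future__ import annotations

VM = chr(253)   # value mark: separates table rows here
SVM = chr(252)  # subvalue mark: separates columns within a row


def decode_embedded_table(attribute: str) -> list[list[str]]:
    """Read an ALL-style table held in a single attribute."""
    return [row.split(SVM) for row in attribute.split(VM)]


def to_associated(attribute: str) -> list[str]:
    """Re-express the same table as associated multivalues, one string per column."""
    rows = decode_embedded_table(attribute)
    return [VM.join(column) for column in zip(*rows)]


table = "P100" + SVM + "2" + VM + "P200" + SVM + "5"
print(decode_embedded_table(table))  # [['P100', '2'], ['P200', '5']]
```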

Foreign Key Fields

The enquiry language in Pick (this goes by various names, but I will here use the R83 name 'Access') allows a great deal of subtlety and power by means of complex dictionary definitions. Among the most useful mechanisms is the ability to use attributes or multivalues in one file as references to keys (or parts of keys) in other files. In Pick terminology these are referred to as 'T-correlatives'. Dictionary definitions used by Access can contain a number of other correlatives, and can also use things called 'conversions' (e.g. converting a day number to a date). These mechanisms provide very sophisticated means to extract and format data for reporting purposes. This is a large and somewhat specialised part of the Pick system, so I won't go into any more detail.
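The idea behind such a lookup can be sketched in Python, with files modelled as hypothetical in-memory dictionaries mapping each key to its list of attributes. This is only a rough analogue of what a T-correlative does, not its syntax. Returning an empty string for a missing item is an assumption of the sketch.

```python
from __future__ import annotations

VM = chr(253)  # Pick value mark


def translate(files: dict[str, dict[str, list[str]]], file_name: str,
              key: str, attribute_no: int) -> str:
    """Fetch attribute `attribute_no` (1-based) of item `key` in another file."""
    item = files.get(file_name, {}).get(key)
    if item is None or not 1 <= attribute_no <= len(item):
        return ""  # assumption: a missing item or attribute reads as empty
    return item[attribute_no - 1]


def translate_each(files: dict[str, dict[str, list[str]]], file_name: str,
                   keys: str, attribute_no: int) -> str:
    """Translate every value of a multivalued foreign-key attribute."""
    return VM.join(translate(files, file_name, key, attribute_no)
                   for key in keys.split(VM))


files = {"PRODUCTS": {"P100": ["Widget", "9.99"], "P200": ["Gadget", "1.50"]}}
names = translate_each(files, "PRODUCTS", "P100" + VM + "P200", 1)
print(names.split(VM))  # ['Widget', 'Gadget']
```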

Problems with Pick

The Character Set

While it has not stopped the Pick model being used with other languages, I feel that using the very high order characters as delimiters creates difficulties that could have been avoided: in ISO 8859-1, for example, 252, 253 and 254 are the letters ü, ý and þ. The low order ASCII characters (0-31) were originally used for communication control, but there is a parallel set of characters from 128 to 159 which, unlike Pick's delimiters, are not assigned printable characters in the ISO 8859 character sets. ISO 8859-1 was, for example, the default character set for text in HTTP/1.1. Note that the first 256 Unicode code points are identical to ISO 8859-1 (Latin-1).
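The clash is easy to check mechanically. This Python sketch (function name hypothetical) encodes text as ISO 8859-1 and reports any bytes that a Pick system would read as delimiters. It also confirms, using Python's Unicode tables, that none of the code points 128 to 159 is a printable character.

```python
from __future__ import annotations

PICK_MARKS = {254: "attribute mark", 253: "value mark", 252: "subvalue mark"}


def pick_mark_collisions(text: str) -> list[tuple[str, str]]:
    """Characters in `text` that become Pick delimiters when stored as Latin-1."""
    return [(chr(byte), PICK_MARKS[byte])
            for byte in text.encode("latin-1") if byte in PICK_MARKS]


# Code points 128-159 (the range SuperPick uses) have no printable characters.
C1_RANGE_IS_FREE = not any(chr(code).isprintable() for code in range(128, 160))

print(pick_mark_collisions("Müller"))  # [('ü', 'subvalue mark')]
print(C1_RANGE_IS_FREE)                # True
```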

Limited nesting

In the traditional Pick model, an item (record) could have one level of nesting. That is, it could contain one or more tables, but it could not contain tables nested within tables (ignoring for now the slight extension of using multi-subvalues within associated multivalues). Many would argue that this limitation is more apparent than real, and that a complex nested structure will bring only problems if it is used to store data. That is a valid point and one with which I have much sympathy. My reason for addressing this in SuperPick is to give the model the complete theoretical capability of storing any data that can exist in XML.

Storing Objects

Traditionally Pick stored textual data. There was no capability to store anything like multimedia data, or even data in somebody else's format. One of the latest Pick implementations has addressed this by storing two numbers at the beginning of every item: the first is the length of the key, the second is the length of the data. However, this mechanism imposes an overhead on the storage of every item. The mechanism I have used for SuperPick is optional and is applied at the field (attribute) level, so items that do not hold such data carry no overhead.

SuperPick Defined

Delimiters and Enclosures

SuperPick has four delimiters and two pairs of enclosing characters, taken from the low end of the high order range (128-159) as shown in the following table. Each SuperPick character is the corresponding low order control character plus 128, which makes them easier to remember. The low order characters could be used if necessary (they are undefined in ISO 8859-1 and HTML), but I think it's much clearer to use the ones shown.


ASCII  Symbol  Description         ASCII  Symbol  SuperPick Description
2      STX     Start of Text       130    SOD     Start of Object Description
3      ETX     End of Text         131    EOD     End of Object Description
14     SO      Shift Out           142    SSF     Start of Sub-File
15     SI      Shift In            143    ESF     End of Sub-File
28     FS      File Separator      156    IS      Item Separator
29     GS      Group Separator     157    ES      Element Separator
30     RS      Record Separator    158    RVS     Repeat Value Separator
31     US      Unit Separator      159    VS      Value Separator
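The correspondence in the table can be captured in a few lines of Python. The constant names below are illustrative, not part of any existing library: each SuperPick character is built by adding 128 to its low order counterpart.

```python
# Low order control characters that the SuperPick characters mirror.
LOW_ORDER = {"SOD": 2, "EOD": 3, "SSF": 14, "ESF": 15,
             "IS": 28, "ES": 29, "RVS": 30, "VS": 31}

# Each SuperPick character is its low order counterpart plus 128.
SUPERPICK = {name: chr(code + 128) for name, code in LOW_ORDER.items()}
NAME_OF = {char: name for name, char in SUPERPICK.items()}

print({name: ord(char) for name, char in SUPERPICK.items()})
# {'SOD': 130, 'EOD': 131, 'SSF': 142, 'ESF': 143, 'IS': 156, 'ES': 157, 'RVS': 158, 'VS': 159}
```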

The Element Separator (ES) is equivalent to the Pick Attribute Mark. The Repeat Value Separator (RVS) and Value Separator (VS) are analogous to the Value Mark and Subvalue Mark; they define table rows and columns respectively. A simple list of like values would be separated by RVS characters, much like an ordinary unassociated multivalue in Pick. SuperPick uses a mechanism like the ALL one described earlier to represent tables, but allows the table to be row-oriented or column-oriented.

A row-oriented table will have repeating sets of values in the form
R1C1(VS)R1C2(VS)R1C3(RVS)R2C1(VS)R2C2(VS)R2C3
just like the ALL mechanism.

A column-oriented table will have sets of repeating values in the form
R1C1(RVS)R2C1(VS)R1C2(RVS)R2C2(VS)R1C3(RVS)R2C3
like the associated multivalue mechanism used in Pick.
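Both layouts can be produced from the same list of rows. In this Python sketch (function names hypothetical) a table is a list of rows of strings and is assumed to be non-empty and rectangular, since `zip` silently truncates ragged input.

```python
from __future__ import annotations

RVS = chr(158)  # Repeat Value Separator: row boundary
VS = chr(159)   # Value Separator: column boundary


def encode_rows(table: list[list[str]]) -> str:
    """Row-oriented: R1C1 VS R1C2 ... RVS R2C1 VS R2C2 ..."""
    return RVS.join(VS.join(row) for row in table)


def decode_rows(text: str) -> list[list[str]]:
    return [row.split(VS) for row in text.split(RVS)]


def encode_columns(table: list[list[str]]) -> str:
    """Column-oriented: R1C1 RVS R2C1 ... VS R1C2 RVS R2C2 ..."""
    return VS.join(RVS.join(column) for column in zip(*table))


def decode_columns(text: str) -> list[list[str]]:
    columns = [column.split(RVS) for column in text.split(VS)]
    return [list(row) for row in zip(*columns)]


table = [["R1C1", "R1C2", "R1C3"], ["R2C1", "R2C2", "R2C3"]]
readable = {RVS: "(RVS)", VS: "(VS)"}
for encoded in (encode_rows(table), encode_columns(table)):
    print("".join(readable.get(ch, ch) for ch in encoded))
# R1C1(VS)R1C2(VS)R1C3(RVS)R2C1(VS)R2C2(VS)R2C3
# R1C1(RVS)R2C1(VS)R1C2(RVS)R2C2(VS)R1C3(RVS)R2C3
```

Note that a single-column row-oriented table encodes as a plain RVS-separated list, which matches the simple list described above.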

Elements and Values

What was known as an Attribute in Pick is called an Element in SuperPick. This corresponds better with the XML conventions. In SuperPick, a value can only be a simple string. An element can be just a single value, in which case it would correspond to an XML attribute, or it can be one of two more complex structures. One of the complex structures is a sub-file, the other is an object.

Another major difference between Elements and Values is that the first element in an item is the key. It may consist only of one or more non-repeating values, and it must be unique. The key is element zero, whereas the first value (column) in a table is numbered 1 and does not have to be unique.
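The key rules can be sketched in Python. The sketch assumes that the parts of a multi-part key are separated by VS, as the remark that a single-row column-oriented table is only used in a key suggests. It also assumes the item is simple enough that the first ES ends the key. The function names are hypothetical.

```python
from __future__ import annotations

ES = chr(157)   # Element Separator
RVS = chr(158)  # Repeat Value Separator
VS = chr(159)   # Value Separator (assumed to separate key parts)


def item_key(item: str) -> tuple[str, ...]:
    """Return the parts of element 0, the key; repeating values are rejected."""
    key = item.split(ES, 1)[0]
    if RVS in key:
        raise ValueError("a key may not contain repeating values")
    return tuple(key.split(VS))


def check_unique_keys(items: list[str]) -> None:
    """Raise ValueError if two items share a key."""
    seen: set[tuple[str, ...]] = set()
    for item in items:
        key = item_key(item)
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(key)


print(item_key("INV42" + VS + "2024" + ES + "ACME Ltd"))  # ('INV42', '2024')
```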

Sub-files

An element may start with the Start of Sub-File (SSF) character. In this case, everything until a matching End of Sub-File (ESF) character is a sub-file with items delimited by Item Separators (IS). While it may appear that the sub-file makes the RVS/VS table structure somewhat redundant, it is retained because my experience with Pick shows its value and efficiency. While the sub-file is useful for assembling data, it is awkward in the practical database situation to have data held in such large chunks. The limited nesting allowed in the RVS/VS table is just enough to be useful. The table does not have to have a unique key, either.
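Because sub-files can nest, an item cannot be split with a plain string split, because the separators inside a sub-file have to be skipped. The following Python sketch (hypothetical names) tracks SSF/ESF depth to do that. It returns values as strings and sub-files as lists of items. It deliberately ignores objects, whose binary data could contain any character.

```python
from __future__ import annotations

SSF, ESF = chr(142), chr(143)  # Start / End of Sub-File
IS = chr(156)                  # Item Separator
ES = chr(157)                  # Element Separator


def split_top_level(text: str, sep: str) -> list[str]:
    """Split `text` on `sep`, skipping separators inside nested sub-files."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == SSF:
            depth += 1
        elif ch == ESF:
            depth -= 1
            if depth < 0:
                raise ValueError("ESF without a matching SSF")
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError("SSF without a matching ESF")
    parts.append(text[start:])
    return parts


def parse_item(text: str) -> list:
    """Parse one item: values stay strings, sub-files become lists of items."""
    elements: list = []
    for element in split_top_level(text, ES):
        if element.startswith(SSF) and element.endswith(ESF):
            body = element[1:-1]
            items = split_top_level(body, IS) if body else []
            elements.append([parse_item(item) for item in items])
        else:
            elements.append(element)
    return elements


invoice = ("INV42" + ES + SSF + "P100" + ES + "2" + IS + "P200" + ES + "5"
           + ESF + ES + "ACME Ltd")
print(parse_item(invoice))
# ['INV42', [['P100', '2'], ['P200', '5']], 'ACME Ltd']
```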

Object Representation

An element may start with a Start of Object Description (SOD) character. A text description of the object lies between the SOD and a matching End of Object Description (EOD). The description can be anything required (a URL, for example), but there is a special case where the description begins with a number written in text. If the first character after the SOD is an ASCII digit 1-9 (characters 49 to 57), optionally followed by further digits 0-9 (characters 48 to 57), then the number gives the length of the binary data for the object, and that data follows the EOD.
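Reading an object can be sketched in Python over raw bytes, since the object data is binary. The text after the length in the sample description (here a MIME type) and the function name are assumptions. The stated length lets a reader skip the binary data even when it contains bytes that look like delimiters.

```python
from __future__ import annotations

import re

SOD = 0x82  # Start of Object Description (130)
EOD = 0x83  # End of Object Description (131)
LENGTH_PREFIX = re.compile(rb"[1-9][0-9]*")


def read_object(data: bytes, pos: int = 0) -> tuple[bytes, bytes, int]:
    """Read the object starting at data[pos].

    Returns (description, binary data, position just after the object).
    Binary data is present only when the description starts with a length.
    """
    if data[pos] != SOD:
        raise ValueError("an object must start with SOD")
    end = data.index(bytes([EOD]), pos + 1)  # first EOD closes the description
    description = data[pos + 1:end]
    match = LENGTH_PREFIX.match(description)
    if match is None:  # e.g. a URL: no binary data follows
        return description, b"", end + 1
    length = int(match.group())
    payload = data[end + 1:end + 1 + length]
    if len(payload) != length:
        raise ValueError("object data is shorter than its stated length")
    return description, payload, end + 1 + length


# Five bytes of binary data, two of which look like SuperPick characters.
blob = bytes([SOD]) + b"5 image/png" + bytes([EOD]) + b"\x00\x83\x9d\x01\x02" + b"rest"
description, payload, next_pos = read_object(blob)
print(description, payload, blob[next_pos:])
```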

Some notes on data presentation

In general the data is expected to be textual, and a set of items can be displayed in a spreadsheet-style format with a row for each item and a column for each element. Where an element is an object, an icon could be displayed, and clicking on it could start a process to show a picture, play a song, and so on. If an element is a sub-file or table, clicking on an icon could open a sub-screen displaying a further level of rows and columns.

Prescriptive Data Definition

The SuperPick model requires a data definition that, while capable of being changed, will disallow updating with non-conforming data. I propose that a single special item in a Pick-like data dictionary should contain all the enforced definitions. Other dictionary items can be used for display massaging, just like Pick. The special item should have a reserved name so it is easy to locate; I propose SPID (for SuperPick Item Definition).

The SPID should specify the layout and structure expected, but I'm not sure just how much detail it needs to have. The SPID needs to name each element and value in the structure, and this is easier to do if we have the rule that each name must be different. Note that XML does not have this requirement, but it makes for a much more easily understood system in my opinion. Using the SPID, a SuperPick data stream can be transformed into good XML. A generic SPID should exist to define the layout of all SPIDs.
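A rough Python sketch of the SPID-to-XML idea follows. It assumes a hypothetical parsed form of an item in which a value element is a string and a sub-file element is a list of items. The names passed in stand in for those a SPID would supply, and value elements become XML attributes, as described under Elements and Values. Unique names make the mapping unambiguous.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def item_to_xml(tag: str, elements: list, names: list[str],
                subfiles: dict[str, tuple[str, list[str]]]) -> ET.Element:
    """Convert one parsed item to XML using element names from a SPID.

    Value elements (strings) become XML attributes; a sub-file element
    (a list of items) becomes a child holding one element per item.
    `subfiles` maps a sub-file's name to (tag for its items, their names).
    """
    node = ET.Element(tag)
    for name, element in zip(names, elements):
        if isinstance(element, str):
            node.set(name, element)
        else:
            item_tag, item_names = subfiles[name]
            container = ET.SubElement(node, name)
            for sub_item in element:
                container.append(item_to_xml(item_tag, sub_item, item_names, subfiles))
    return node


invoice = ["INV42", [["P100", "2"], ["P200", "5"]]]
doc = item_to_xml("Invoice", invoice, ["InvoiceNo", "Lines"],
                  {"Lines": ("Line", ["Product", "Qty"])})
print(ET.tostring(doc, encoding="unicode"))
# <Invoice InvoiceNo="INV42"><Lines><Line Product="P100" Qty="2" /><Line Product="P200" Qty="5" /></Lines></Invoice>
```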

The SPID defines each element in an item. We require the element number (or position), its name, and its type. There are four types: value, table, object, and sub-file.

Let's look at the data from the bottom up. We have objects that are elements, values that are elements, and values that are in tables. We have elements that are sub-files, elements that are within sub-files, elements that are tables, and elements that are in the overall item. Tables can be row or column oriented. A single-column row-oriented table is a list. A single-row column-oriented table is only used in a key, the simplest key having only one column.


Thing                       Code  Position  Element Reference
Object Element              O     element   ?
Value Element               V     element   ?
Table Value                 T     table     yes
Sub-file Element            S     element   ?
Element in Sub-file         ?     element   ?
Row Table Element           R     element   ?
Key Table Element (type 3)  K     element   ?
Column Table Element        C     element   ?
List Element                L     element   ?
Element in Item             ?     element   no

A draft of the SPID definition item is described below. The structure is held in one sub-file and the limits on values in another. Sub-files are used so that unique names are enforced, and the two are kept separate so that elements without limitations do not leave a lot of empty entries.


SPID Item Layout

Number  Type  Name         Description
0       V     Key          Always "SPID"
1       V     Name         A name (or a short description)
2       V     Description  A full(er) description
3       S     Structure    A sub-file defining the structure of the whole thing
4       S     Limits       A sub-file defining the limitations set on Values
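Checking an item against this layout can be sketched in Python. The layout is copied from the table into a hypothetical in-memory form. A parsed item is assumed to hold value elements as strings and sub-file elements as lists. Only the V and S type codes are handled.

```python
from __future__ import annotations

# The SPID item layout as (position, type code, name) entries.
SPID_LAYOUT = [
    (0, "V", "Key"),
    (1, "V", "Name"),
    (2, "V", "Description"),
    (3, "S", "Structure"),
    (4, "S", "Limits"),
]


def check_item(elements: list, layout: list[tuple[int, str, str]] = SPID_LAYOUT) -> list[str]:
    """Return the problems found when checking a parsed item against a layout."""
    problems = []
    names = [name for _, _, name in layout]
    if len(names) != len(set(names)):
        problems.append("element names are not unique")
    for position, type_code, name in layout:
        if position >= len(elements):
            problems.append(f"{name}: missing element {position}")
            continue
        element = elements[position]
        if type_code == "V" and not isinstance(element, str):
            problems.append(f"{name}: expected a value")
        elif type_code == "S" and not isinstance(element, list):
            problems.append(f"{name}: expected a sub-file")
    if len(elements) > len(layout):
        problems.append("item has elements the layout does not define")
    return problems


print(check_item(["SPID", "Generic SPID", "Defines the layout of SPIDs", [], []]))  # []
print(check_item(["SPID", "x", "y", "not a sub-file", []]))
# ['Structure: expected a sub-file']
```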


Structure Sub-file Layout

Column Number  Type  Name        Description
0              V     UniqueName  The unique element or value name
1              V     Type        The type code
2              V     Position    The element or column number
3              V     Within      An element name (if it is inside one)


Limits Sub-file Layout

Column Number  Type  Name        Description
1              V     ValueName   The value name
2              L     Limitation  The limitations on the value

I'm presuming one will be able to enter statements in some form of logic to define the limitations. An example would be something like

"= 'SPID'" or "TYPE = DATE" or ">= DATE()"

Perhaps even more complex limitations, involving calculations and/or comparisons against other values, might be possible. A good basis for this might be I-types as used by UniVerse and UniData.
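As an illustration only, the three example limitations could be checked by a tiny evaluator like this Python sketch. It is not an I-type implementation. The function name is hypothetical, and the sketch assumes that dates are stored as ISO 8601 text (YYYY-MM-DD) and that a quoted operand is a literal string.

```python
from __future__ import annotations

import datetime as dt
import operator
import re

# Operator first, then the operand; two-character operators are tried first.
LIMITATION = re.compile(r"^\s*(<>|>=|<=|=|>|<)\s*(.+?)\s*$")
OPS = {"=": operator.eq, "<>": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}


def as_date(text: str) -> dt.date | None:
    """Parse an assumed ISO 8601 date, or return None."""
    try:
        return dt.date.fromisoformat(text)
    except ValueError:
        return None


def satisfies(value: str, limitation: str, today: dt.date | None = None) -> bool:
    """Check `value` against one limitation such as "= 'SPID'" or ">= DATE()"."""
    if limitation.strip() == "TYPE = DATE":
        return as_date(value) is not None
    match = LIMITATION.match(limitation)
    if match is None:
        raise ValueError(f"unsupported limitation: {limitation!r}")
    op, operand = OPS[match.group(1)], match.group(2)
    if operand == "DATE()":  # compare as dates against today's date
        value_date = as_date(value)
        return value_date is not None and op(value_date, today or dt.date.today())
    if len(operand) >= 2 and operand[0] == operand[-1] == "'":
        return op(value, operand[1:-1])  # quoted literal: compare as text
    raise ValueError(f"unsupported operand: {operand!r}")


print(satisfies("SPID", "= 'SPID'"))                                     # True
print(satisfies("2024-05-01", ">= DATE()", today=dt.date(2024, 1, 1)))   # True
```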
