The Pick data structure is capable of storing data in a compact and reasonably easily read form. It has some of the characteristics of just about any form that has been used - hierarchical, relational, object - it "sort of" looks like them all. One expert even said it would look like "the next thing", and when XML came along it did! However, it has some deficiencies and curiosities that the data format I here call SuperPick will, I hope, correct.
Pick organises data as a string of writable characters using the eight bit ASCII character set. The first 32 characters (0-31) are unused. Characters 32 to 127 are the letters and numbers of English in the seven bit character set. The three characters 254, 253, and 252 are delimiters called attribute mark, value mark, and subvalue mark respectively. The original R83 Pick implementation used character 255 as a hidden delimiter. One implementation of Pick uses character 128 as a null and character 251 as a further delimiter. The use of these specific high-order ASCII characters means that Pick is incompatible with other character sets. This is one of the areas that SuperPick addresses.
The most fundamental thing in Pick is the idea of a record (item) consisting of a string of data split by the attribute mark into a number of fields (attributes). Pick uses a dictionary definition to indicate the nature of these fields by reference to their position. The dictionary definitions are, however, indicative and not prescriptive. This is both a weakness and a strength of the Pick data model, but one can see how the model somewhat resembles the relational one. I should note that in at least one implementation it is possible to overlay an SQL model which enforces prescriptive definitions. It is also true to say that many Pick applications enforce this by various means - most of these depend on maintaining a site discipline.
Any attribute can be set up as a string consisting of repeated fields separated by value marks. These are called multivalues and are one of the most flexible, useful, and controversial aspects of the Pick data structure. The use of multivalues is often regarded by relational data advocates as something akin to a heresy because it appears to violate a 'rule' of atomicity of data. If one here establishes the basic concept that Pick is a data structure, and not a database, then the difficulty disappears. One is then left with a very compact method of grouping data that is in the nature of a list. There is an inherent order in the list which may or may not be meaningful. My own view is that it is unwise in the practical situation to assume the order of the list means anything.
A single-valued attribute is closely comparable to an attribute in XML. One with multiple values in it compares to repeated elements in XML.
A further subtlety is to link a number of attributes that are multivalued. These are called associated multivalues and are a way to define data that is tabular in nature. In this case the position of each value has meaning, in that the 'nth' value in each attribute is linked with the 'nth' value in the others as part of a meaningful set of data. A good practical example is using associated multivalues to store the lines of an invoice in the same record as the invoice header information. This somewhat resembles the object model of data, but is more usually referred to as a 'nested relational' model. It compares with an XML element enclosing repeated sets of other elements.
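A minimal sketch of associated multivalues may make the invoice example concrete. The field layout, product names, and prices here are invented for illustration; only the delimiter codes (254 and 253) come from the description above.

```python
AM = chr(254)  # attribute mark
VM = chr(253)  # value mark

# Hypothetical invoice item: field 0 is the customer name (header data),
# fields 1-3 are associated multivalues (product code, quantity, unit price).
item = AM.join([
    "ACME LTD",
    VM.join(["WIDGET", "GADGET", "SPROCKET"]),
    VM.join(["2", "1", "10"]),
    VM.join(["450", "1299", "75"]),
])

def invoice_lines(item):
    """Pair the nth value of each associated attribute into one invoice line."""
    attrs = item.split(AM)
    codes, qtys, prices = (attrs[i].split(VM) for i in (1, 2, 3))
    return [(code, int(qty), int(price))
            for code, qty, price in zip(codes, qtys, prices)]

lines = invoice_lines(item)
total = sum(qty * price for _, qty, price in lines)
```

Nothing in the data itself says that fields 1 to 3 belong together; that association lives only in the dictionary or in the application code, which is the indicative-not-prescriptive point made earlier.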
Pick allows each value to be further divided into subvalues. In my experience, this is not at all common - mainly because the enquiry language has no way to use this data. I have seen it used in defining complex structures used to drive specific '4GL' tools. The only other time I have seen this used (in a way I felt made sense) was in a product called ALL where an attribute could contain an entire table. The value marks were used to separate table rows and the subvalue marks were analogous to attribute marks in the standard Pick data structure. I saw this as being effectively the same thing as associated multivalues. It had the benefit of keeping related data close to each other in the physical sense (you lose the benefit of keeping similar data close to each other). It also might mean that data could be held more compactly, depending on the existence and nature of 'empty' fields that may occur in the data.
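As a rough sketch of the three delimiter levels, the following splits an item into attributes, values, and subvalues, and then reads one attribute as a table in the style described for ALL (value marks between rows, subvalue marks between columns). The sample data is invented, and this is not a description of how ALL itself was implemented.

```python
AM, VM, SVM = chr(254), chr(253), chr(252)  # attribute, value, subvalue marks

def parse_item(item):
    """Split a Pick item into attributes -> values -> subvalues."""
    return [[value.split(SVM) for value in attr.split(VM)]
            for attr in item.split(AM)]

# Hypothetical table held in a single attribute: one row per value,
# one column per subvalue.
rows = [("WIDGET", "2", "450"), ("GADGET", "1", "1299")]
table_attr = VM.join(SVM.join(row) for row in rows)

item = AM.join(["ACME LTD", table_attr])
parsed = parse_item(item)
table = parsed[1]  # the attribute that holds the whole table
```

A single-valued attribute comes back as a one-row, one-column structure, so the same parser handles every level of nesting that Pick allows.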
The enquiry language in Pick (this goes by various names, but I will here use the R83 name 'Access') allows for a great deal of subtlety and power by means of complex dictionary definitions. Among the most useful mechanisms is the ability to refer to the contents of other files by using attributes or multivalues from one file to refer to keys (or parts of keys) in other files. In Pick terminology, these are referred to as 'T-correlatives'. Dictionary definitions used by Access contain a number of other correlatives. They can also use other things called 'conversions' (e.g. converting a day number to a date). These mechanisms provide very sophisticated means to extract and format data for reporting purposes. This is a large and somewhat specialised part of the Pick system, so I won't go into any more detail.
While it has not stopped the Pick model being used with other languages, I feel that the use of the very high order ASCII characters as delimiters makes for difficulties that could have been avoided. The low order ASCII characters (0-31) were originally used for communication control, but there is a parallel set of characters from 128 to 159 which, unlike Pick's delimiters, are not used for printable characters in the ISO 8859 character sets. The ISO 8859 character set is the basis for most Internet protocols. Note that Unicode is identical to ISO 8859-1 (Latin-1) for the first 256 characters.
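The character-set point can be checked with a short script. This sketch relies on Python's `latin-1` codec and the Unicode database, which treat codes 128 to 159 as control characters; the ISO 8859-1 standard itself simply leaves that range without printable characters.

```python
import unicodedata

# Latin-1 bytes map one-to-one onto the first 256 Unicode code points.
all_bytes = bytes(range(256))
same_as_unicode = all_bytes.decode("latin-1") == "".join(chr(i) for i in range(256))

# Pick's delimiters collide with ordinary lower-case letters in ISO 8859-1.
pick_marks = {254: "attribute mark", 253: "value mark", 252: "subvalue mark"}
for code, name in pick_marks.items():
    print(code, name, repr(chr(code)), unicodedata.category(chr(code)))

# Codes 128-159 are control characters (category "Cc"), not printable text.
c1_categories = [unicodedata.category(chr(c)) for c in range(128, 160)]
```

The loop shows why text containing characters such as ü, ý, or þ cannot safely be stored in a traditional Pick item without some form of escaping.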
In the traditional Pick model, an item (record) could have one level of nesting. That is, it could have within itself one or more tables, but it could not contain tables nested within tables (ignoring for now the slight extension of using multi-subvalues within associated multivalues). There would be many who think that this limitation is more apparent than real, and that a complex nested structure will bring only problems if you use it to store data. That is a valid point and one with which I have much sympathy. My reason for addressing this in SuperPick is to allow the model to have the complete theoretical capability of storing all possible data that may exist in XML.
SuperPick has four delimiters and two pairs of enclosing characters using the low end of the high order ASCII characters as shown in the following table. The SuperPick characters sort of line up with the low order ones so that it's easier to remember them. The low order characters could be used if necessary (they are undefined in ISO 8859-1 and HTML), but I think it's much clearer to use the ones shown.
The Element Separator (ES) is equivalent to the Pick Attribute Mark. The Repeat Value Separator (RVS) and Value Separator (VS) are analogous to the Value Mark and SubValue Mark - they define table rows and columns respectively. A simple list of like values would be separated by RVS characters much like an ordinary unassociated multivalue in Pick. SuperPick uses a mechanism like the ALL one described earlier to present tables, but allows the table to be row-oriented or column-oriented.
A row-oriented table will have repeating sets of values in the form
R1C1(VS)R1C2(VS)R1C3(RVS)R2C1(VS)R2C2(VS)R2C3
just like the ALL mechanism.
A column-oriented table will have sets of repeating values in the form
R1C1(RVS)R2C1(VS)R1C2(RVS)R2C2(VS)R1C3(RVS)R2C3
like the associated multivalue mechanism used in Pick.
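The two layouts can be sketched as a pair of parsers that produce the same list of rows. The code points used for RVS and VS below are placeholders picked from the 128 to 159 range purely so the example runs; they are assumptions, not the assignments SuperPick actually makes.

```python
# Placeholder code points (assumptions for illustration only).
RVS = "\x9e"  # Repeat Value Separator
VS = "\x9f"   # Value Separator

def parse_row_oriented(text):
    """R1C1(VS)R1C2(RVS)R2C1(VS)R2C2 -> list of rows."""
    return [row.split(VS) for row in text.split(RVS)]

def parse_column_oriented(text):
    """R1C1(RVS)R2C1(VS)R1C2(RVS)R2C2 -> list of rows, by transposing columns."""
    columns = [col.split(RVS) for col in text.split(VS)]
    # zip() stops at the shortest column, so ragged columns lose trailing values.
    return [list(row) for row in zip(*columns)]

row_form = RVS.join(VS.join(r) for r in [["R1C1", "R1C2", "R1C3"],
                                         ["R2C1", "R2C2", "R2C3"]])
col_form = VS.join(RVS.join(c) for c in [["R1C1", "R2C1"],
                                         ["R1C2", "R2C2"],
                                         ["R1C3", "R2C3"]])
```

Because the same two delimiters appear in both forms, the data alone cannot tell a reader which orientation was intended; that has to come from the data definition.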
What was known as an Attribute in Pick is called an Element in SuperPick. This corresponds better with the XML conventions. In SuperPick, a value can only be a simple string. An element can be just a single value, in which case it would correspond to an XML attribute, or it can be one of two more complex structures. One of the complex structures is a sub-file, the other is an object.
Another major difference between Elements and Values is that the first element in an item is the key. This may only consist of one or more non-repeating values and must be unique. This is the zeroth element, as contrasted to the first value (column) in a table, which is number 1, and does not have to be unique.
An element may start with the Start of Sub-File (SSF) character. In this case, everything until a matching End of Sub-File (ESF) character is a sub-file with items delimited by Item Separators (IS). While it may appear that the sub-file makes the RVS/VS table structure somewhat redundant, it is retained because my experience with Pick shows its value and efficiency. While the sub-file is useful for assembling data, it is awkward in the practical database situation to have data held in such large chunks. The limited nesting allowed in the RVS/VS table is just enough to be useful. The table does not have to have a unique key, either.
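A sketch of how a sub-file element might be unpacked follows. The phrase "matching ESF" suggests that sub-files can nest, so the parser counts depth rather than stopping at the first ESF. All four code points are placeholders from the 128 to 159 range, and the sketch ignores objects whose binary data might happen to contain these characters.

```python
# Placeholder code points (assumptions for illustration only).
SSF = "\x82"  # Start of Sub-File
ESF = "\x83"  # End of Sub-File
IS = "\x9c"   # Item Separator
ES = "\x9d"   # Element Separator

def find_matching_esf(text, start):
    """Given text[start] == SSF, return the index of its matching ESF."""
    if text[start:start + 1] != SSF:
        raise ValueError("element does not start with SSF")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == SSF:
            depth += 1
        elif text[i] == ESF:
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced sub-file markers")

def split_top_level(text, sep):
    """Split text on sep, ignoring separators inside nested sub-files."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == SSF:
            depth += 1
        elif ch == ESF:
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

def parse_subfile(element):
    """Return the items of a sub-file element, each as a list of elements."""
    end = find_matching_esf(element, 0)
    body = element[1:end]
    return [split_top_level(item, ES) for item in split_top_level(body, IS)]

inner = SSF + "a" + ES + "1" + IS + "b" + ES + "2" + ESF
outer = SSF + "K1" + ES + inner + IS + "K2" + ES + "plain" + ESF
items = parse_subfile(outer)
```

A nested sub-file comes back as an unparsed string, so the caller can decide whether to descend into it by calling `parse_subfile` again.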
An element may start with a Start of Object Definition (SOD). A text definition of an object lies between the SOD and a matching End of Object Definition (EOD). The definition can be anything required (a URL for example), but there is a special case where the definition has a number (in text) at the beginning. If the first character after the SOD is an ASCII digit 1-9 (characters 49 to 57), followed by any further ASCII digits 0-9 (characters 48 to 57), then the number gives the length of the binary data for the object, and this data follows the EOD.
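The length-prefix rule can be sketched as a small parser working on bytes. The SOD and EOD code points are placeholders, the text that follows the number in the definition (`image/x-demo` here) is invented, and the sketch assumes object definitions do not nest.

```python
import re

# Placeholder code points (assumptions for illustration only).
SOD = b"\x84"  # Start of Object Definition
EOD = b"\x85"  # End of Object Definition

def parse_object(data, pos=0):
    """Parse an object element starting at data[pos].

    Returns (definition, payload, next_pos). payload is None when the
    definition does not begin with a length.
    """
    if data[pos:pos + 1] != SOD:
        raise ValueError("element does not start with SOD")
    end = data.index(EOD, pos + 1)  # assumes no nested SOD/EOD pairs
    definition = data[pos + 1:end]
    next_pos = end + 1
    match = re.match(rb"[1-9][0-9]*", definition)
    if match is None:
        return definition.decode("latin-1"), None, next_pos
    length = int(match.group())
    payload = data[next_pos:next_pos + length]
    if len(payload) != length:
        raise ValueError("binary data shorter than declared length")
    return definition.decode("latin-1"), payload, next_pos + length

url_obj = SOD + b"http://example.com/logo.png" + EOD
blob = bytes([0, 255, 132, 133, 7])  # includes bytes equal to SOD and EOD
bin_obj = SOD + b"5 image/x-demo" + EOD + blob + b"rest"

url_def, url_payload, _ = parse_object(url_obj)
bin_def, bin_payload, after = parse_object(bin_obj)
```

Because the payload is skipped by length rather than scanned, it can safely contain any byte values, including the delimiter characters themselves.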
In general the data is expected to be written, and a set of items can be displayed in the spreadsheet style format with a row for each item and a column for each element. Where an element is an object, then some icon could be displayed and clicking on it could activate a process to show a picture, play a song, whatever. If an element is a subfile or table, then clicking on an icon could open up a sub-screen displaying a further depth of rows and columns.
The SuperPick model requires a data definition that, while capable of being changed, will disallow updating with non-conforming data. I propose that a single special item in a Pick-like data dictionary should contain all the enforced data. Other dictionary items can be used for display massaging, just like Pick. The special item should have a reserved name so it is easy to locate - I propose SPID (for SuperPick Item Definition).
The SPID should specify the layout and structure expected, but I'm not sure just how much detail it needs to have. The SPID needs to name each element and value in the structure, and this is easier to do if we have the rule that each name must be different. Note that XML does not have this requirement, but it makes for a much more easily understood system in my opinion. Using the SPID, a SuperPick data stream can be transformed into good XML. A generic SPID should exist to define the layout of all SPIDs.
The SPID defines each element in an item. We require the element number (or position), its name, and its type. There are four types: value, table, object, and sub-file.
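One possible in-memory shape for these definitions is sketched below. The class and function names are hypothetical and the invoice layout is invented; the sketch only captures the three properties named above (position, name, type) together with the unique-name rule.

```python
from dataclasses import dataclass
from enum import Enum

class ElementType(Enum):
    VALUE = "value"
    TABLE = "table"
    OBJECT = "object"
    SUBFILE = "sub-file"

@dataclass(frozen=True)
class ElementDef:
    position: int      # element number; 0 is the key
    name: str
    type: ElementType

def check_spid(defs):
    """Reject duplicate names or positions; return defs ordered by position."""
    names = [d.name for d in defs]
    positions = [d.position for d in defs]
    if len(set(names)) != len(names):
        raise ValueError("element names must be unique")
    if len(set(positions)) != len(positions):
        raise ValueError("element positions must be unique")
    return sorted(defs, key=lambda d: d.position)

# Hypothetical definition for an invoice item.
invoice_spid = check_spid([
    ElementDef(1, "CUSTOMER", ElementType.VALUE),
    ElementDef(0, "INVOICE.NO", ElementType.VALUE),
    ElementDef(2, "LINES", ElementType.TABLE),
    ElementDef(3, "SIGNATURE", ElementType.OBJECT),
])
```

A fuller version would also carry the per-type detail discussed next, such as the value kind or the table orientation.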
A value is the written representation of a single atomic piece of data. It can be a number or date or time just like Pick represents these. So values will require some means of defining what they are - text, number, date, etc.
There are four types of table, each of which will have to be defined in the SPID.
Note that a key element can be a type 3 table. This structure is unlikely to be used otherwise as it is clearer to define a number of single-value elements.
The SPID will have to define the values for the table, so each one needs a position reference.
For an object, the SPID only has to have minimal information - what's in the object is not meaningful to SuperPick.
Sub-files will be defined in a similar way to the enclosing item. Exactly how this is done is the puzzle.
Let's look at the data from the bottom up. We have objects that are elements, values that are elements, and values that are in tables. We have elements that are sub-files, elements that are within sub-files, elements that are tables, and elements that are in the overall item. Tables can be row-oriented or column-oriented. A single-column row-oriented table is a list. A single-row column-oriented table is only used in a key - the simplest key having only one column.
A draft of the SPID definition item is as described below. The structure is in a sub-file. Another sub-file describes the limits on values. These are in sub-files to enforce unique names. The sub-files are kept separate so that there are not a lot of empty limitations for elements.
I'm presuming one will be able to enter statements in some form of logic to define the limitations. An example would be something like
"= 'SPID'" or "TYPE = DATE" or ">= DATE()"
Perhaps even more complex limitations, involving calculations and/or comparisons against other values, might be possible. A good basis for this might be I-types as used by UniVerse and UniData.