QParse
for
MS-DOS

Download QParse for DOS v.0.15 (67,496 bytes) 



What's in the zip
QParse.exe
The MS-DOS executable.
QParse.Ini
EQLog.Ini
SrchEngs.Ini
The example *.ini files for QParse for DOS
This html file.


Updates
Currently no updates to this version of the program.



Description
QParse is a programmable parser for text-based (ASCII) files. It is used to filter data from a text file and put that data in a text-based spreadsheet (in CSV format).

The main *.Ini file (QParse.Ini) contains two sections: [Options] and [Routines]. The [Options] section can contain settings to be used instead of the command line options. NOTE: Currently the settings in QParse.Ini take precedence over the command line switches. The [Routines] section lists all the routine names along with the source file in which each routine's code can be found.



Command Line Syntax
QPARSE src_file [dst_file] /F:func_name [/s:val] [/s:val]
 
src_file The text-file to be processed.
dst_file The optional output file [generally this will be a *.CSV file]
/F:func_name The routine, as listed in the *.ini file(s), to use to process the source file
/s:val Switches which may be one of the following:
/A:0|1 Overwrite [0] or Append [1] the specified output file
/C:0|1 Specifies whether or not the searches should be Case-Sensitive [0::Ignore Case, 1::Case-Sensitive]
/E:0-0 Specifies the Encoding function to use. Currently no Encoding functions are available, therefore the only option is 0
/H:0|1 Specifies whether or not to print a header line.
/I:0|1 Specifies whether or not to Ignore blank lines.
/L:num Specifies the size in characters of Fixed-width format output fields.
/O:0-3 Specifies the Output Format. [0::CSV, 1::Fixed, 2::Delimited, 3::Custom]
/Q:cstr Specifies the String to use instead of the Double Quote character. The string can use the C Escape sequence format.
/S:cstr Specifies the String to use instead of the Comma for the field separator character.
/T:cstr Specifies the String to use instead of "\r\n" for the Input Line Termination marker.
/W:cstr Specifies the String to be ignored (deleted before processing). The default is "\r".
/X0:cstr Specifies the String to be used for the False condition of the XIST command. The default is "0".
/X1:cstr Specifies the String to be used for the True condition of the XIST command. The default is "1".
src_file is any filespec, not including wildcards, that QParse will search through to extract information.
dst_file is optional; if it is not specified, the screen is used for output. It too cannot contain wildcards.


Example QParse.Ini file
[Options]

[Routines]
Functions=2
Func1=MerchantItemPrice,Test.Ini
Func2=HotBot,Test.Ini
Example Test.Ini file
[MerchantItemPrice]
LAST, "You have entered ", "You have entered ", "."
EACH, "tells you, 'That'll be", "] ", " tells you, 'That'll be "
FROM, "tells you, 'That'll be ", " platinum"
UPTO, " gold"
UPTO, " silver"
UPTO, " copper"
APND, 2
FROM, "per ", "'."
FROM, "for the ", "'."
EEND, ""
REND, ""

[HotBot]
SKIP, "<DIV", 1
LINC
UNTL, ">next</a>"
FROM, "&rsource=INK>", "</a></b><br>"
UPTO, "<br><font size=1><i>"
UPTO, "</i> "
UPTO, "<br>"
UEND
REND

Description of the above *.Ini files
The above files are used just as examples. The actual *.Ini files included in the *.zip file are much more complete, and even the MerchantItemPrice function is more complex.

In the example routine MerchantItemPrice, we first want to know what zone we are in, so we use the LAST command to find that out. The following is such a line in the eqlog.txt file:

[Sat Aug 12 00:13:33 2000] You have entered West Freeport.
Next, we want to know who the merchant is, so we use the EACH command, since each line with a merchant telling you how much an item costs contains the rest of the data we need (merchant name, platinum pieces, gold pieces, silver pieces, copper pieces, and the item name). The following is such a line in the eqlog.txt file:
[Sat Aug 12 01:07:18 2000] Innkeep Juna tells you, 'That'll be 3 silver 1 copper per Ration'.
We find the merchant name using the EACH command, looking for any line that contains "tells you, 'That'll be " and extracting the characters after the "] " string, up to but not including the " tells you, 'That'll be " string. Next, we need to find out the cost of the item. Since the cost begins immediately following the last part of the previous EACH command, we use an UPTO command to find the number of each coin we need. Finally, note that we use the APND command since items which are stackable are reported as price per item, whereas items which are not stackable are reported as price for the item. The following is a line from the eqlog.txt file showing the format used for a non-stackable item:
[Sat Aug 12 01:07:23 2000] Innkeep Juna tells you, 'That'll be 6 silver 3 copper for the Small Lantern'.
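The merchant-name extraction described above can be sketched in Python. This is only an illustration of the EACH rules as documented under Commands in Detail, not QParse's actual code: the function name `each_extract` is hypothetical, matching is assumed to be case-sensitive, and cursor handling is ignored.

```python
def each_extract(line, contains, after, upto):
    """Sketch of EACH on a single line (hypothetical helper, not QParse code).

    Returns None when the line is not a trigger line.
    """
    if contains not in line:
        return None  # line does not contain Parm1 (line contains)
    start = line.find(after)
    # Documented rule: a missing "after match" starts from the beginning of the line.
    start = 0 if start == -1 else start + len(after)
    end = line.find(upto, start)
    # Documented rule: a missing "up to match" yields a Null (empty) string.
    return "" if end == -1 else line[start:end]


line = ("[Sat Aug 12 01:07:18 2000] Innkeep Juna tells you, "
        "'That'll be 3 silver 1 copper per Ration'.")
merchant = each_extract(line, "tells you, 'That'll be", "] ",
                        " tells you, 'That'll be ")
print(merchant)  # Innkeep Juna
```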

The example routine HotBot converts an Html page generated from a search at http://www.hotbot.com into a spreadsheet of the relevant information for each record. Note that because the line holding the first web page reported also contains a ">next</a>" string, this routine does not include the first web page reported. Including it can easily be accomplished by using a FIND/FEND pair prior to the UNTL command. I may show how to do this in the next update of this document.

First, we start off by getting past all the header information. This is done by SKIPping to the "<DIV" string, which also contains the first reported web page. To get around the ">next</a>" string that is found on that line, we use the LINC command to advance to the next one.

Since every reported web page is on its own line, we use the UNTL command to process every one of the following lines. The rest you can pretty much figure out by looking at a web page generated by the hotbot engine.


QParse.Ini [Options]
Option                   Command Line Switch  OPTS Command switch
CASESENSITIVE=0|1        /C:0|1               "/C",0|1
IGNOREBLANKLINES=0|1     /I:0|1               "/I",0|1
OUTPUTFORMAT=0|1|2|3     /O:0|1|2|3           "/O",0|1|2|3
ENCODEFORMAT=0           /E:0                 "/E",0
MAXFIELDSIZE=num         /L:num               "/L",num
LINETERMINATOR=c-string  /T:c-string          "/T",c-string
FIELDSEPARATOR=c-string  /S:c-string          "/S",c-string
QUOTEMARK=c-string       /Q:c-string          "/Q",c-string
WHITESPACE=c-string      /W:c-string          "/W",c-string
APPEND=0|1               /A:0|1               n/a

Command Properties
Syntax
 
 Cmnd  Parm1           Parm2          Parm3          Parm4      Type
 AFTR  after match     up to match    [heading]                 Output
 APND  fields          [heading]                                Output Control
 BFOR  before match    back to match  [heading]                 Output
 DATE  format          [heading]                                Format
 EACH  line contains   after match    up to match    [heading]  Output Conditional
 EEND                                                           Control
 FALS                                                           Output
 FEND                                                           Control
 FIND  occurrences     match          [heading]                 Conditional
 FROM  after match     up to match    [heading]                 Output
 FRST  fields          [heading]                                Output Control
 HOME                                                           Cursor
 IGNR  fields                                                   Output Control
 LAST  contains        after match    up to match    [heading]  Output
 LDEC                                                           Cursor
 LEND                                                           Cursor
 LINC                                                           Cursor
 LINE  contains        [heading]                                Output
 LKUP  return field    lookup field   csv filespec   [heading]  Format
 MSTR  start position  end position   [heading]                 Output
 NEXT  lines ahead     after match    up to match    [heading]  Output
 OPTS  option switch   option value                             Control
 ONGO  new line                                                 Control
 ONEA  new line                                                 Control
 ONFI  new line                                                 Control
 ONST  [heading]                                                Output
 ONUN  new line                                                 Control
 NOGO  new line                                                 Control
 NOEA  new line                                                 Control
 NOFI  new line                                                 Control
 NOST  [heading]                                                Output
 NOUN  new line                                                 Control
 PREV  lines behind    after match    up to match    [heading]  Output
 REND                                                           Control
 SKIP  occurrences     match                                    Control
 SLCT  start           non-match      string array   [heading]  Format
 TRUE  [heading]                                                Output
 UEND                                                           Control
 UNTL  match           [heading]                                Conditional Loop
 UPTO  up to match     [heading]                                Output
 VALU  non-number      number type    [heading]                 Format
 XIST  exists          start line     end line       [heading]  Output



Commands in Detail
A F T R
Use the AFTR command to extract the string that starts after the first occurrence of Parm1 (after match) and runs up to, but not including, the following Parm2 (up to match). If the up to match does not exist, everything to the end of the line is returned. This command returns a Null (or empty string) if Parm1 (after match) is not found.
A P N D
Use APND to merge multiple routine fields into a single output field. This is useful when the source file has two or more different formats for the same data. See the Test.ini file routines, MerchantItemPrice and HotBot above. NOTE: For APND to work correctly, the number of commands to merge given in Parm1 (fields) must count even non-output fields.
B F O R
The BFOR command is used when the required data is presented before the data's specifier. This command returns a Null (or empty string) if Parm1 (before match) is not found. If Parm2 (back to match) is not found, then it goes back to the beginning of the line. This command searches the line from the end of the line backwards for the first occurrence of Parm1 (before match), and then from the beginning of the found Parm1 (before match) back to the end of the first occurrence of Parm2 (back to match).
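The backwards search can be sketched in Python as follows. This is a minimal sketch of the rules stated above, assuming case-sensitive matching; the function name `bfor` and the sample line are made up for illustration.

```python
def bfor(line, before, back_to):
    """Sketch of BFOR: text between the last back_to and the last before."""
    end = line.rfind(before)  # search backwards from the end of the line
    if end == -1:
        return ""  # Parm1 (before match) not found: Null string
    # Search backwards again, but only in the text left of the found Parm1.
    start = line.rfind(back_to, 0, end)
    # Parm2 (back to match) not found: go back to the beginning of the line.
    start = 0 if start == -1 else start + len(back_to)
    return line[start:end]


print(bfor("size: 1024 bytes", " bytes", ": "))  # 1024
print(bfor("1024 bytes", " bytes", ": "))        # 1024
```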
D A T E
Use the DATE command to reformat the following field into a standard date formatted field of the form "Mn/Dy/Yr Hr:MN:SC". The structure of the format field (Parm1) is a literal string which may contain the following escape sequences to specify where the various components of the date can be found:
D1 = day of month (no leading zero);
D2 = 2-digit day of month (leading zero);
D3 = 3-character day of week;
D4 = day of week (unabbreviated);
M1 = numeric month (no leading zero);
M2 = 2-digit numeric month (leading zero);
M3 = 3-character month abbreviation;
M4 = unabbreviated month name;
Y2 = 2-digit year (numbers below 80 are assumed to be 20##, while numbers 80 or above are assumed to be 19##);
Y4 = 4-digit year;
H1 = twelve-hour date format (no leading zero);
H2 = 2-digit 24-hour format;
N1 = minutes (no leading zero);
N2 = 2-digit minutes (leading zero);
S1 = seconds (no leading zero);
S2 = 2-digit seconds (leading zero);
S3 = floating point value of seconds (no leading zero);
S4 = 2-digit mantissa floating point value of seconds (leading zero);
AP = am/pm without periods (i.e. AM, PM, am, pm);
AM = a.m./p.m. with periods (i.e. a.m., p.m., A.M., P.M.)
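The Y2 rule above amounts to a fixed pivot at 80. A small Python sketch of that arithmetic (the function name is hypothetical, and rejecting values outside 0 to 99 is an assumption):

```python
def expand_two_digit_year(yy):
    """Apply the documented Y2 pivot: below 80 means 20##, 80 and above means 19##."""
    if not 0 <= yy <= 99:
        raise ValueError("a Y2 year must be between 0 and 99")
    return 2000 + yy if yy < 80 else 1900 + yy


print(expand_two_digit_year(0))   # 2000
print(expand_two_digit_year(79))  # 2079
print(expand_two_digit_year(80))  # 1980
```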
E A C H
The EACH command is an Output Conditional command. This means that it is used to extract data for output and also tests whether or not the record can be displayed. Whenever a line contains the same string as Parm1 (line contains), an internal flag is set, and the complete record is output whenever the next EEND, FEND, or REND is encountered. This function, along with FIND and UNTL, is one of the main trigger functions that allow a record to be output. If Parm2 (after match) is not found, then the extracted data starts from the beginning of the line, while if Parm3 (up to match) is not found, a Null, or empty string, is returned.
E E N D
EEND marks the end of an EACH loop. Actually, it marks the end of either an EACH or a FIND loop. If a previous EACH or FIND command had a match to Parm1 (line contains), then the complete record from the beginning command to the current command is displayed, the flags are reset and command execution continues if there are any commands that follow.
F A L S
FALS returns an Xist False String (see /X0: command line switch). Currently both the TRUE and FALS commands are fairly useless except for when used within an APND or FRST command to actually have something other than a Null string returned. Under all circumstances currently both the TRUE and FALS commands return values. Note that the ON?? and NO?? commands do not have any control during the output phase (refer to the ONGO and/or NOGO command).
F E N D
FEND marks the end of a FIND loop. Actually, it marks the end of either an EACH or a FIND loop. If a previous EACH or FIND command had a match to Parm1 (line contains), then the complete record from the beginning command to the current command is displayed, the flags are reset and command execution continues if there are any commands that follow.
F I N D
The FIND command is a Conditional command. It is used primarily as a flag to indicate whether a record should be output or not. Secondly, it is used to skip several occurrences of a string.
F R O M
Use the FROM command to scan an entire line and extract the data between the two matching parameter strings, Parm1 (after match) and Parm2 (up to match). This command is basically the same as the EACH command except it doesn't enable output and doesn't have a line contains parameter. The only other difference is that FROM returns a Null string if either Parm1 (after match) or Parm2 (up to match) is not found.
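The difference between FROM and the AFTR command described earlier is easiest to see side by side. The following Python sketch follows the documented rules only; the function names, the sample line, and the case-sensitive matching are assumptions, and the /END and /START options are not modelled.

```python
def aftr(line, after, upto):
    """Sketch of AFTR: a missing up to match extends to the end of the line."""
    start = line.find(after)
    if start == -1:
        return ""  # after match not found: Null string
    start += len(after)
    end = line.find(upto, start)
    return line[start:] if end == -1 else line[start:end]


def from_(line, after, upto):
    """Sketch of FROM: Null string if either match is missing."""
    start = line.find(after)
    if start == -1:
        return ""
    start += len(after)
    end = line.find(upto, start)
    return "" if end == -1 else line[start:end]


line = "zone=West Freeport; level=12"  # made-up sample line
print(from_(line, "zone=", ";"))   # West Freeport
print(from_(line, "level=", ";"))  # (empty: no ";" after "level=")
print(aftr(line, "level=", ";"))   # 12
```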
F R S T
Similar to APND, the FRST command converts several fields into a single field. However, instead of simply appending each field, FRST outputs only the first field that is not empty. The exception is that if all the following fields are empty, then FRST will return a Null string.
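The contrast between APND and FRST can be sketched in Python. This assumes that APND concatenates the field values directly with no separator, which the text suggests but does not state outright; the function names and field values are illustrative only.

```python
def apnd(fields):
    """Sketch of APND: append every field value into one output field."""
    return "".join(fields)


def frst(fields):
    """Sketch of FRST: the first non-empty field, or a Null string if all are empty."""
    for value in fields:
        if value:
            return value
    return ""


# Two alternative extractions of the same datum, only one of which matched.
print(apnd(["Ration", ""]))         # Ration
print(frst(["", "Small Lantern"]))  # Small Lantern
print(apnd(["3", "1"]))             # 31
print(frst(["3", "1"]))             # 3
```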
H O M E
Use the HOME command to move the cursor to the beginning of the line. This command modifies the character position only.
I G N R
You can use the IGNR command to block the output of the Parm1 (fields) commands that follow it. This is useful when those commands are used only as tests for the ON?? and NO?? commands.
L D E C
Use the LDEC command to go back to the previous line. This command modifies the line position only.
L E N D
Use the LEND command to move the cursor to the end of the line. This command modifies the character position only.
L I N C
Use the LINC command to go to the next line. This command modifies the line position only.
L I N E
The LINE command is used to output the entire line as a single record when the line contains Parm1 (line contains). This is useful in a routine as the only Output command to extract specific lines. This command does not modify either the line position or column position.
L K U P
Use the LKUP command to format the following field. This is done by finding the first match of the following command's return value in the Parm3 (csv filespec) specified *.csv file, under the Parm2 (lookup field) column, and returning the corresponding Parm1 (return field) value. If no match can be found, then the LKUP command returns a Null (empty) string.
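A lookup of this kind can be sketched with Python's standard csv module. This is not QParse's implementation: it assumes that the lookup and return fields are named by a header row in the *.csv file, which the text above does not specify, and the table contents are invented for the example.

```python
import csv
import io


def lkup(csv_text, return_field, lookup_field, value):
    """Sketch of LKUP: first row whose lookup column equals value."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get(lookup_field) == value:
            return row.get(return_field) or ""
    return ""  # no match: Null (empty) string


# Hypothetical lookup table; a real routine would name a *.csv file instead.
table = "Abbrev,Name\nEF,East Freeport\nWF,West Freeport\n"
print(lkup(table, "Name", "Abbrev", "WF"))  # West Freeport
```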
L N L T
Use the LNLT command to move the cursor Parm1 (spaces) character(s) to the left. This command modifies the character position only.
L N R T
Use the LNRT command to move the cursor Parm1 (spaces) character(s) to the right. This command modifies the character position only.
M S T R
When working with fixed or preformatted data, such as a DOS directory listing, use the MSTR command. Parm1 (start position) and Parm2 (end position) are numeric values that represent character positions from the beginning of the line when positive, or from the end of the line when negative. This command does not modify either the line position or column position.
N E X T
The NEXT command is just like the FROM command except that it doesn't test the current line, but rather the line Parm1 (lines ahead) lines ahead. If Parm1 (lines ahead) is negative, it will still test the line that many lines ahead, but in that case, if Parm3 (up to match) is not found, the command returns a Null string. This command does not modify either the line position or column position. If Parm2 (after match) is not found, then it starts from the beginning of the line. If Parm3 (up to match) is not found, then it extracts to the end of the line.
O P T S
Use the OPTS command to automatically set or change the program options while the function is running. The Parm1 (option switch) parameter takes the same form as the command line switches without the colon or value following the colon. For example, to set the option for case-sensitive searching, use OPTS, "/C", 1.
The format of the OPTS command differs from the command line switches in a couple of ways. First, the /H: print header switch is ignored as an OPTS option switch. Furthermore, the switches /T:, /S:, and /W:, which on the command line take c-encoded strings as parameters, take only numeric values under the OPTS command, representing the ASCII code(s) of the character(s). Parm2 (option value) must be an integer between -32,768 and 32,767. If the value is between 0 and 255, inclusive, it is interpreted as a single-character string. Any other value (below 0, or from 256 to 32,767) is interpreted as a 2-character string, where the first character's ASCII value is ((Parm2/256) AND 255), the high byte, and the second character's ASCII value is (Parm2 AND 255). Where a 2-character string is required, you will probably want to write the parameter value in the BASIC hexadecimal format &H????, where each ? is a single hexadecimal digit. In this format, a carriage return (&H0D) followed by a new line (&H0A) is written as &H0D0A.
A couple of switches are not available on the command line. The first is /END: when set, commands that normally return a Null string when the up to or back to parameter is not found will instead use the end of the line as the match. The commands affected by this switch are BFOR, EACH, FROM, and UPTO.
Similarly, there is the /START switch: when set, commands that normally return Null when the after parameter is not found will instead start from the beginning of the line. Commands affected by this switch include AFTR and FROM.
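The numeric encoding that OPTS uses for string options can be sketched in Python. The function name is hypothetical, and the sketch assumes that values outside 0 to 255 are treated as 16-bit two's complement words so that the high byte becomes the first character, as the text describes.

```python
def opts_value_to_string(value):
    """Sketch: decode an OPTS numeric Parm2 into a 1- or 2-character string."""
    if not -32768 <= value <= 32767:
        raise ValueError("Parm2 must fit in a signed 16-bit integer")
    if 0 <= value <= 255:
        return chr(value)  # single-character string
    word = value & 0xFFFF  # view negative values as an unsigned 16-bit word
    return chr((word >> 8) & 0xFF) + chr(word & 0xFF)  # high byte first


print(repr(opts_value_to_string(0x0D0A)))  # '\r\n'
print(repr(opts_value_to_string(44)))      # ','
```

Under this reading, a routine line such as OPTS, "/T", &H0D0A would set the line terminator to a carriage return followed by a new line.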
O N G O
Use the ONGO command to move the current line by Parm1 (new line) lines when the previous Non-Control command in a function is successful (has a Non-Null return string). New line is relative to the current line: a negative new line moves the current line back to a previous line, while a positive new line skips lines forward.
O N E A
Use the ONEA command just like the ONGO command. The only difference is that ONEA tests whether any previous EACH command was successful. That is, if any previous EACH command has a Non-Null return value, then ONEA will change the current line by the relatively specified new line.
O N F I
The ONFI command is just like the ONEA command. The only difference is that ONFI tests all previous FIND commands.
O N S T
The ONST command is similar to the ONGO command. The main difference is that instead of changing the current line, ONST sets its return value to Xist True when the previous Non-Control command was successful; otherwise it returns an Xist False value (refer to the /X0: and /X1: command line switches for Xist values).
O N U N
The ONUN command is just like the ONEA command. The only difference is that ONUN tests all previous UNTL commands.
N O G O
Use the NOGO command to move the current line by Parm1 (new line) lines when the previous Non-Control command in a function is Not successful (has a Null return string). New line is relative to the current line: a negative new line moves the current line back to a previous line, while a positive new line skips lines forward.
N O E A
Use the NOEA command just like the NOGO command. The only difference is that NOEA tests whether any previous EACH command was Not successful. That is, if any previous EACH command has a Null return value, then NOEA will change the current line by the relatively specified new line.
N O F I
The NOFI command is just like the NOEA command. The only difference is that NOFI tests all previous FIND commands.
N O S T
The NOST command is similar to the NOGO command. The main difference is that instead of changing the current line, NOST sets its return value to Xist True whenever the previous Non-Control command was Not successful (i.e. returns a Null value); otherwise it returns an Xist False value (refer to the /X0: and /X1: command line switches for Xist values).
N O U N
The NOUN command is just like the NOEA command. The only difference is that NOUN tests all previous UNTL commands.
P R E V
The PREV command is just like the NEXT command above, except that it tests the line Parm1 (lines behind) lines back.
R E N D
REND marks the end of the record and acts just like the EEND and/or FEND commands. Execution may actually continue after the REND command if there are any other commands which follow it. The reason behind REND's operation is that it allows a file to contain multiple records that have different but definable structures.
S K I P
SKIP behaves exactly like the FIND command except that it doesn't set an 'Ok to output the record' flag. In order for a record to be output, the routine must contain at least one of the following groups: ((EACH or FIND) and (EEND or FEND or REND)) or (UNTL and UEND)
S L C T
SLCT is used to translate the numeric value of the following command's return value into something else. Parm1 (start) is the starting value for Parm3 (string array). For most cases 0 or 1 should be the value used for Parm1 (start), depending upon whether the following command's return value begins at 0 or 1, respectively. If the following command's return value is a Null string, then Parm2 (non-match) is returned, otherwise a value from Parm3 (string array) is returned. Parm3 (string array) is a semicolon-separated array of strings. The index of the string that is returned is equal to the following command's return value minus Parm1 (start) plus one. Thus if the following command's return value is 0 and the SLCT command's Parm1 (start) value is 0, then the first sub-string of Parm3 (string array) is returned.
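The index arithmetic can be sketched in Python. The function name is hypothetical, and because the text does not say what SLCT does when the value falls outside the array, the sketch simply raises an error in that case.

```python
def slct(start, non_match, string_array, value):
    """Sketch of SLCT: map a numeric return value onto a semicolon-separated list."""
    if value == "":
        return non_match  # Null return value: Parm2 (non-match)
    items = string_array.split(";")
    # Zero-based position; the documented one-based index is this plus one.
    index = int(value) - start
    if not 0 <= index < len(items):
        raise ValueError("value outside the string array (behaviour undocumented)")
    return items[index]


print(slct(0, "?", "No;Yes", "0"))       # No
print(slct(1, "?", "Sun;Mon;Tue", "2"))  # Mon
print(slct(0, "?", "No;Yes", ""))        # ?
```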
T R U E
TRUE returns an Xist True String (see /X1: command line switch). Currently both the TRUE and FALS commands are fairly useless except for when used within an APND or FRST command to actually have something other than a Null string returned. Under all circumstances currently both the TRUE and FALS commands return values. Note that the ON?? and NO?? commands do not have any control during the output phase (refer to the ONGO and/or NOGO command).
U E N D
The UEND command checks to see if the previous UNTL command test has been satisfied. If not, then it outputs the record data between the first UNTL command and the current UEND command, then reads the next line and restarts the routine execution from the first UNTL command.
U N T L
The UNTL command is a Conditional Loop. Output is enabled for all the commands between the first UNTL and the first UEND, until a line is found containing Parm1 (match).
U P T O
The UPTO command extracts all the data from the current position to the start of a matching Parm1 (up to match). If a match is not found then a Null string is returned.
V A L U
The VALU command converts the return value of the following command to a standard csv text-based numeric value. This is useful when extracting data that is in one of the following formats: BASIC formats (&H? for hexadecimal values; &O? for octal values; #.####E## for floating point values); C formats (0x? for hexadecimal values; 0? for octal values). No other formats are currently supported, however, at the minimum, support for ASM formats (i.e. 0?h for hexadecimal values; 0?o for octal values; and 0?b for binary values) is planned. If you have ideas for other formats to support, email me at NookieMonster@MailAndNews.com.
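The currently supported input formats can be sketched in Python as below. The function name and the exact output text are assumptions (the document does not define the "standard csv text-based numeric value" precisely), and the non-number and number type parameters are not modelled; invalid input simply raises ValueError here.

```python
def valu(text):
    """Sketch: convert BASIC- or C-style numeric text to plain decimal text."""
    s = text.strip()
    upper = s.upper()
    if upper.startswith("&H"):  # BASIC hexadecimal
        return str(int(s[2:], 16))
    if upper.startswith("&O"):  # BASIC octal
        return str(int(s[2:], 8))
    if upper.startswith("0X"):  # C hexadecimal
        return str(int(s[2:], 16))
    if len(s) > 1 and s.startswith("0") and s.isdigit():  # C octal
        return str(int(s, 8))
    if "." in upper or "E" in upper:  # floating point, e.g. #.####E##
        return str(float(s))
    return str(int(s))


print(valu("&H1F"))  # 31
print(valu("017"))   # 15
print(valu("0x1F"))  # 31
```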
X I S T
For Yes/No or True/False types of data, use the XIST command. It searches, relative to the current line, all lines between Parm1 (start line) and Parm2 (end line) in ascending order, and returns the ExistTrue value (see /X1: switch) if any of the lines contains Parm3 (exists); otherwise it returns the ExistFalse value (see /X0: switch).
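The search window can be sketched in Python. The function name is hypothetical, and the sketch assumes that both offsets are inclusive, that the window is clamped to the file, and that matching is case-sensitive; none of these details is stated above.

```python
def xist(lines, current, start, end, needle, true_value="1", false_value="0"):
    """Sketch of XIST: does any line in the relative window contain needle?"""
    first = max(0, current + start)             # clamp to the first line
    last = min(len(lines) - 1, current + end)   # clamp to the last line
    for i in range(first, last + 1):            # ascending order
        if needle in lines[i]:
            return true_value   # default matches the /X1: default "1"
    return false_value          # default matches the /X0: default "0"


log = [
    "[Sat Aug 12 00:13:30 2000] Welcome to EverQuest!",  # made-up line
    "[Sat Aug 12 00:13:33 2000] You have entered West Freeport.",
    "[Sat Aug 12 01:07:18 2000] Innkeep Juna tells you, 'That'll be 3 silver 1 copper per Ration'.",
]
print(xist(log, 0, 0, 2, "You have entered"))  # 1
print(xist(log, 0, 0, 0, "You have entered"))  # 0
```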


Got comments, suggestions, bug reports, cool *.ini functions you have come up with, or just something to say, then send them to NookieMonster@MailAndNews.com

The original page is at http://thunder.prohosting.com/~nmonster/qparse/qparsed.html.