# Description:
#
#   A script to reformat a plain text document which is in no particular
#   format. The text file stores some kind of structured content. The idea
#   of the script is to reformat plain text files which contain 'information'
#   rather than 'display instructions', although this script also accepts some
#   display instruction indicators.
#
#   The script works in conjunction with the cgi script 'edit-wiki.cgi' to
#   allow the visitor to the web page to edit the page. It does this by
#   generating an HTML form which points to the cgi script.
#
# Automatically Formatted Text Structures:
#
#   The script recognises some 'cues' within the plain text document.
#   I refer to these cues or 'structures' as 'Invisible Markup Language' (IML)
#   or Mas o Menos Markup Language (MMML). The basic idea is to have as little
#   actual 'markup' in the text document as possible, and the markup which is
#   present should 'look good' in the plain text file. So, instead of using, say,
#      %^* Section Heading
#   which is valid markup but looks ugly in the text file, we use all capitals
#   which, in my opinion, look better in the text file.
#
#   The document title:
#   ---------------------------
#     A line beginning with a '=' character is the 'document title'. In the
#     resulting HTML it will be made big and put at the top of the web-page.
#     It will also be used as the HTML title.
#   Links to document formats:
#   ---------------------------
#     A line beginning with '*' will be hyperlinked, and formatted in a special
#     kind of way. These 'star' lines can be used to provide links to various
#     formats of one particular document such as pdf, text, html ...
#   Hyper-linking URLs:
#   ---------------------------
#     URLs get automatically hyperlinked if they begin with http:// or www.
#     The display text of the hyper-link will be the actual url itself. To
#     change this display text use the format
#       "Hyperlink display text" http://some.url.dom/
#     This format does not have to begin the line.
#     To create a link to a relative url use the format link://some/relative/url/path
#   Document Section Headings:
#   ---------------------------
#     Lines which contain only capital (upper-case) letters are interpreted as
#     'section headings'. These section headings will be used to form an HTML
#     hyper-linked table of contents if that is required (see the '[]' code below).
#
#   Unformatted blocks of text:
#   ---------------------------
#     If you want blocks of the text file to be 'untouched' by the transformation
#     script then you can surround those blocks with -->> and --<<
#     This is useful for displaying listings of computer code and for displaying
#     tables of data.
#   Document Title Codes:
#   ---------------------------
#     Certain codes can be placed after the document title (as defined above: a
#     line beginning with '='). These codes can have a large influence on the
#     appearance of the resulting HTML. All codes can be combined in whatever
#     combination.
#       {x}  means number all section headings (including in the Table of Contents)
#       {~}  means make the section headings into 'capital case'
#            These can be combined, for example: {~x}
#       []   means display the table of contents
#       [~]  means display the TOC with items in 'capital case'
#       (+)  means display automatic google translation links in the HTML
#       (XX) display 'user messages' in the language specified by the two letter
#            language code. A list of these codes can be found at
#            http://www.w3.org/WAI/ER/IG/ert/iso639.htm
#            At the moment only common european languages are supported, and
#            even those, incompletely
#
#     In some cases the function of these codes is duplicated by parameters which
#     may be passed to the shell script.
#
#   Specific Display Indicators:
#   ---------------------------
#     The following are designed to allow greater control over the appearance of
#     the resulting HTML.
#     This contradicts the general principle of this script which is to encourage
#     the creation of 'information' rather than display instructions, but these
#     indicators do not have to be used, and they may be grouped together at the
#     end of the text file to minimize their corrupting influence.
#
#     background-color: some-color
#        This changes the background color of the resulting HTML
#     background-image: some-image-url
#        This changes the background image of the resulting HTML
#     section-heading-css-class:
#        Determines the css class which will govern the appearance of the
#        section headings
#     css-style-sheet:
#        Determines the css style sheet which will be applied to the document
#     Title Image: some-title-image
#        This is designed to allow the placement of an image such as a logo at
#        the top of a web-page which will also act as a link to some home page.
#        This is not a particularly useful concept
#     Title Decoration:
#        This is a strange concept which I had of allowing Ascii style
#        decorations at the top of the HTML page underneath the heading for the
#        page. This was after noticing that some ascii characters, when enlarged
#        with a big font, can be quite aesthetically pleasing
#
#     [IMAGE-INDEX-BEGIN] / [IMAGE-INDEX-END]
#        These block tags are used to create an 'image index' which is a list
#        of hyperlinked images with descriptive text next to each of the images.
#        These tags are solely for presentation purposes
#     [BAND-DETAIL-LIST-BEGIN] / [BAND-DETAIL-LIST-END]
#        These tags are used to delineate a list of musical bands. These two sets
#        of block tags were really invented in order to fulfill the requirements
#        of the website which is at http://poseidonia.ella-associates.org
#        (august 2003). Since this site is for a music festival which will soon
#        be over, and since it is hosted on a rented server in the United States,
#        it is quite likely that this site will not exist for very long.
#
# Examples:
#   If the scripts are on the system 'path' then the leading './' characters
#   below are not necessary
#
#   ./plaintext2html-wiki.sh concert-details.txt notran > concert-details.html
#     This command line, executed in some kind of a bash shell, will transform
#     a plain text file which isn't in any particular format into an HTML file
#     (that is, it will create a new HTML file and leave the original text file
#     unchanged) and will not display the automatic translation links to
#     Google. Also an HTML table of contents (with one entry for each heading,
#     if there are headings) will be inserted in the HTML document.
#
#   ./plaintext2html-wiki.sh mjb-work.txt notran notoc > mjb-work.html
#     The text file will be transformed into HTML but no table of contents will
#     be inserted nor any translation links.
#
#   ./plaintext2html-wiki.sh mjb-work.txt tran notoc > mjb-work.html
#     If translation links are desired but no table of contents, use a command
#     line similar to the above. The string 'tran' could be anything as long as
#     it is not 'notran'. This slightly dodgy 'feature' is owing to the fact
#     that I am not using any 'getopt' style option parsing.
#
#   ./plaintext2html-wiki.sh stuff.txt notran toc "http://63.105.73.195/cgi-bin/some-weird-script"
#     This transforms the file stuff.txt omitting translation links, inserting
#     a hyperlinked table of contents, and setting the target for the 'edit
#     document' form to the URL specified in the last parameter.
#
#   For examples of using 'invisible markup language' you can try looking at the URLs
#     http://poseidonia.ella-associates.org/ and
#     http://poseidonia.ella-associates.org/band-details.html
#   however, the whole point about the invisible markup language is that you
#   don't really need to know how to use it, since it guesses how it should behave.
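#
#   To make the 'all capitals' heading cue a little more concrete, here is a
#   minimal sketch, not the code this script actually runs. The sample
#   document and the function names sample_document and list_headings are
#   made up for illustration, and the pattern is an ASCII-only simplification
#   of the heading pattern (no accented characters): a line counts as a
#   heading when it holds only capitals, digits, spaces and a little
#   punctuation, with at least three capitals in a row.

```shell
# Hypothetical sample document in 'invisible markup' style.
sample_document() {
  cat <<'EOF'
= Concert Details {x} []

VENUE
The concert is held at the town hall.

TICKET PRICES
-->>
  adults    10
  children   5
--<<
EOF
}

# Print only the lines which would be treated as section headings.
# LC_ALL=C keeps the A-Z range from matching lower-case letters.
list_headings() {
  LC_ALL=C grep -E '^[-A-Z 0-9.:_]*[A-Z]{3,}[-A-Z 0-9.:_]*$'
}
```

#   Running 'sample_document | list_headings' prints the two heading lines
#   and nothing else; the title line, the prose and the -->> block are skipped.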
#
# Parameters:
#   textFileName [required]
#     The name of the text file which is to be transformed from text into html
#   notran [optional]
#     If the second parameter is the string 'notran' then the javascript links
#     to the google automatic language translation engine will NOT be inserted
#     into the HTML page. This is useful, for example, when the HTML page is
#     going to be located within a 'password-protected' directory, because the
#     Google translation engine will not be able to access the page, and
#     therefore the translation links will not work.
#   notoc [optional]
#     If the third parameter is the string "notoc", then no HTML table of
#     contents will be generated.
#   forumProcessorUrl [optional]
#     This parameter indicates where the processing script is located. If it
#     is omitted, currently the url will default to
#       http://63.105.73.195/cgi-bin/edit-wiki.cgi
#   output-language [optional] {Not implemented}
#     This is the language in which the messages on the generated HTML page
#     will appear. For example messages next to the comment boxes and the
#     translation links.
#   noforum {Not implemented}
#     If this parameter is present no HTML form will be produced in the output
#     and therefore the web-visitor will not be able to add comments to the
#     pages.
#   path-to-style-sheet [optional] {Not implemented}
#     Still to implement
#     This is the full path (relative to the Web Server Document Root)
#     to the style sheet which is to be used by the generated HTML page
#
#
# Notes:
#   This script was derived directly from text2html-collab.sh which in turn
#   was derived from poseidontext2html-collab.sh which was derived from
#   plaintext2html-forum.sh and so on
#
#   When I say derived I mean that the script was created by modifying that
#   script. This script should also transform quotes into &quot;, & into &amp;
#   etc. The script appears to be working reasonably well in conjunction with
#   the 'edit-wiki' cgi script.
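#
#   The note above about transforming quotes and ampersands into HTML entities
#   can be sketched as a small filter. This is only an illustration; the
#   function name escape_html is an assumption and is not used anywhere in
#   this script.

```shell
# Replace the characters which are special in HTML with entities.
# The '&' must be handled first, otherwise the ampersands introduced by
# the later replacements would themselves be escaped a second time.
escape_html() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g'
}
```

#   The filter reads standard input, so it can be dropped into a pipeline,
#   for example: expand some-file.txt | escape_html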
#
#   The 'nl' number-line program, when used with the -bp option,
#   double spaces the text file with lines containing only spaces. Therefore
#   some extra 'sed' lines are necessary to remove these blank lines
#
# See Also:
#   There is a whole set of 'plaintext' transformation scripts which can be
#   found (august 2003) at http://www.ella-associates.org/utils/
#   This is probably not a very long-term URL
#
#   edit-wiki.cgi
#     This is the cgi script which can work in conjunction with the current script
#   diary2html.sh,
#     Turns a 'diary' style text file into HTML
#   linkdoc2html.sh,
#     Turns a text file which has a list of URL links and descriptions into HTML
#   linkdoc2html-index.sh
#     As above but also adds an HTML 'table of contents' for possible
#     'section headings'
#   linkdoc2html-forum.sh
#     Turns a text file with a URL list into an HTML file which has the
#     capability to be contributed to by a web-visitor (using cgi-scripts)
#   plaintext2pdf.sh,
#     Turns a text file into a pdf file with an optional table of contents.
#     Relies on the 'htmldoc' program
#   plaintext2html-forum.sh
#     Renders a text file as HTML and displays a form which allows the
#     web-visitor to add 'comments' to the page (text file)
#   plaintext2html.sh
#     Turns a text file with possible section headings and urls into an HTML file
#   glossary2xml.sh
#     Turns a text file which is a sort of 'glossary' into a dodgy xml file. This
#     needs to be improved to produce DocBook XML, which has the tag
#   alphabetize-glossary.sh
#     Re-arranges a text file which contains a series of definitions of 'items'
#     or 'terms' so that the items are ordered alphabetically.
#   add-comment
#     a cgi-script which can be used in conjunction with some of the
#     scripts above to add content specified by web-visitors to a web page
#   script-summary.txt
#     contains more short descriptions of scripts and what they do.
#   plaintext2docbook.sh
#     Turns text files into DocBook Xml, fairly well but not fail-proof
#   text2html-collab.sh
#     This is very similar to the current script but the HTML editing box is on
#     the same page as the text
# Author:
#   m.j.bishop
#
# Bugs and Ideas
#   In Internet Explorer the 'capital case' function does not work properly. The
#   entire text is lower-cased and the first letter is NOT upper-cased. This is
#   probably another 'text-area' line ending problem
#
#   When you hit 'refresh' in Netscape and IE the contents of the HTML textarea
#   are not 'refreshed'. That is, the contents do not reflect the true contents
#   as dictated by the HTML source code. Rather the edits of the user are
#   preserved. I presume this is customizable.
#
#   In IE when you hit refresh the page does not refresh at all, which means that
#   the user is unable to see the changes which she has made.
#
#   This script needs more internationalization
#
#   The script could also check if there are translations of the current
#   HTML or text file, using the standard naming convention of
#   name.file-type.language-code
#   An example of this naming convention is stuff.html.es which should
#   be an HTML file which contains Spanish language content. This present
#   script could check for files which have the same name as the source
#   file but which have a different language code extension, and could
#   therefore automatically add a link to the translated file (in addition,
#   perhaps, to the Google translation links). The script would only
#   check in the current directory for these 'translated' files.
#
#   When there are no numbers for section headings, capital case is not
#   working (6 june)
#
#   This is becoming a script which attempts to do absolutely everything. This
#   may well not be a good idea
#
# Dependencies:
#   iso2html.sed
#     A script which turns accented characters into HTML entities.
#     This is a bit tricky since the American Server uses UTF-8 rather than
#     ISO-8859. I could get around this by using a directive in the head, but
#     I am not sure if all browsers can handle the utf-8 character set.
#   capital-case.sed
#     A script which 'capital cases' words. That is the first letter is upper
#     case and all the rest lower case
#   capital-case-headings.sed
#     This does the same as above but only for lines which are 'section
#     headings', which means all capital letters
#   edit-wiki
#     This is not a total dependency, but the HTML form generated by the
#     current script will not do anything useful without this script
#   procgi
#     A bash shell cgi HTML form value extractor used by 'edit-collab'
#   The 'uncle-sam.gif' image
#     which is used next to the text box message
#   The images used on the 'poseidon site'
#   various Unix tools, a Bash shell
#   iconv
#     A unix program to convert between different text encodings, such as
#     iso-8859-1 to utf-8
#
# History:
#
#   june 3, 2003
#     Adapted this script from 'poseidon-text2html-forum.sh'
#   june 6, 2003
#     Improved the handling of accented characters. All section headings are
#     now governed by the variable 'sHeadingPattern'
#   june 30
#     adapted from text2html-collab.sh

if [ "$1" = "" ]
then
  (echo "usage: $0 textFileName [notran] [notoc] [forum-processor-url] [noforum]"; \
   echo "PRESS q TO EXIT THIS HELP. PRESS [space-bar] TO SCROLL DOWN, b to SCROLL UP"; \
   cat $0) | sed -n "/^[ ]*#/p" | less
  exit 1;
fi

#-- This is the pattern which determines what sort of lines will
#-- be interpreted as 'section headings'.
#-- I cannot use it for the 'awk' line
#-- because awk does not seem to accept the notation \{n,\}
sAccentString='A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛ·ÇÑ'
sHeadingPattern="[-$sAccentString 0-9.\/\\:_\&@#]*[$sAccentString][$sAccentString][$sAccentString][$sAccentString]*[-$sAccentString 0-9.\/\\:_\&@#]*"
sOutputLanguage="english"
sRawPageTitle=""
sPageTitle=""
bTableOfContents="true"
bTranslationLinks=""

sTitleDecoration=$(expand $1 | sed -n "/^[ ]*=[^=]/{N;s/^.*\n//g;s/\.\.[ ]*//g;s/[ ]*$//g;p;q;}")
sRawPageTitle=$(expand $1 | sed -n "/^[ ]*=[^=]/{s/^[ ]*=[ ]*//g;s/[ ]*$//g;p;q;}")
sPageTitle=$(\
  echo $sRawPageTitle | \
  sed -e "s/{.\{0,4\}}//g" -e "s/\[.\{0,4\}\]//g" -e "s/(.\{0,4\})//g" | \
  sed -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
  iconv --to-code=ISO-8859-1 --from-code=UTF-8 | \
  sed -f /var/www/utils/iso2html.sed)

#-- If a 'title image' is specified extract it and create the necessary HTML
#-- deal with title images which have the image size specified like this (50x60)
sTitleImageHtml=$( \
  expand $1 | \
  sed -n "/^[ ]*Title[ ]*Image:[ ]*([0-9]\+x[0-9]\+)/ {s/^[ ]*//g; s/[ ]*$//g; s/^Title[ ]*Image:[ ]*(\([0-9]\+\)x\([0-9]\+\))[ ]*\(.*\)/<\/a>/g;p;q;}")

if [ "$sTitleImageHtml" = "" ]
then
  #-- deal with the cases where there is no image size specified
  sTitleImageHtml=$( \
    expand $1 | \
    sed -n "/^[ ]*Title[ ]*Image:/ {s/^[ ]*//g; s/[ ]*$//g; s/^Title[ ]*Image:\(.*\)/<\/a>/g;p;q;}")
fi

#-- Get the URL of the background image
sBackgroundImage=$( \
  expand $1 | \
  sed -n "/^[ ]*[bB]ackground-[iI]mage:/ {s/^[ ]*//g; s/[ ]*$//g; s/^background-image:\(.*\)/\1/gi;p;q;}")

sBackgroundColor=$( \
  expand $1 | \
  sed -n "/^[ ]*[bB]ackground-[Cc]olor:/ {s/^[ ]*//g; s/[ ]*$//g; s/^background-color:\(.*\)/\1/gi;p;q;}")

sCssStyleSheet=$( \
  expand $1 | \
  sed -n "/^[ ]*css-style-sheet:/ {s/^[ ]*//g; s/[ ]*$//g; s/^css-style-sheet:\(.*\)/\1/gi;p;q;}")

sSectionHeadingCssClass=$( \
  expand $1 | \
  sed -n "/^[ ]*section-heading-css-class:/ {s/^[ ]*//g; s/[ ]*$//g; s/^section-heading-css-class:\(.*\)/\1/gi;p;q;}")
sHtmlHeaderFile=$( \
  expand $1 | \
  sed -n "/^[ ]*html-header-file:/ {s/^[ ]*//g; s/[ ]*$//g; s/^html-header-file:\(.*\)/\1/gi;p;q;}")

#-- In the flag tests below the title is quoted when it is echoed, so that
#-- repeated spaces or glob characters such as '[~]' in the title cannot
#-- make the 'before' and 'after' strings differ by accident.

#-- The code below allows a wiki user to specify whether a page should have
#-- the section headings numbered by inserting '{x}' after the page title
sSectionNumberFlag=$(echo "$sRawPageTitle" | sed "s/{.\?x.\?}//g")
if [ "$sRawPageTitle" = "$sSectionNumberFlag" ]
then
  bNumberSections="false"
else
  bNumberSections="true"
fi

#-- This determines if Section Headings in the body of the page should
#-- be made into capital case or not (using code {~} )
sCapitalCaseSectionFlag=$(echo "$sRawPageTitle" | sed "s/{.\?[~].\?}//g")
if [ "$sRawPageTitle" = "$sCapitalCaseSectionFlag" ]
then
  bCapitalCaseHeadings="false"
else
  bCapitalCaseHeadings="true"
fi

#-- Whether a Section Heading table-of-contents is generated depends on
#-- either a script parameter, or the '[]' in the page title
if [ "$3" = "notoc" ]
then
  bTableOfContents="false"
else
  bTableOfContents="true"
fi

sTableOfContentsFlag=$(echo "$sRawPageTitle" | sed "s/\[.\?\]//g")
if [ "$sRawPageTitle" = "$sTableOfContentsFlag" ]
then
  bTableOfContents="false"
else
  bTableOfContents="true"
fi

if [ "$2" = "notran" ]
then
  bTranslationLinks="false"
fi

sCapitalCaseTOCFlag=$(echo "$sRawPageTitle" | sed "s/\[[~]\]//g")
if [ "$sRawPageTitle" = "$sCapitalCaseTOCFlag" ]
then
  bCapitalCaseTOC="false"
else
  bCapitalCaseTOC="true"
fi

sTranslationLinksFlag=$(echo "$sRawPageTitle" | sed "s/(.\?.\?+.\?.\?)//g")
if [ "$sRawPageTitle" != "$sTranslationLinksFlag" ]
then
  bTranslationLinks="true"
else
  bTranslationLinks="false"
fi

sOutputLanguageFlag=$(echo "$sRawPageTitle" | sed "s/(.\?[A-Z][A-Z].\?)//g")
if [ "$sRawPageTitle" != "$sOutputLanguageFlag" ]
then
  sOutputLanguageCode=$(echo "$sRawPageTitle" | sed "s/.*(.\?\([A-Z][A-Z]\).\?).*/\1/g")
else
  sOutputLanguageCode="EN"
fi

case "$sOutputLanguageCode" in
  ES) sOutputLanguage="spanish";;
  IT) sOutputLanguage="italian";;
  EN) sOutputLanguage="english";;
  CA) sOutputLanguage="catalan";;
AL) sOutputLanguage="german";; FR) sOutputLanguage="french";; PO) sOutputLanguage="portuguese";; esac echo "" echo "" echo " " echo " " echo " " #-- The linees below are to stop browsers and servers cacheing these HTML pages #-- which is important since they are editable echo " " echo " " echo " " echo "" echo "" echo "" echo "" echo "" #-- for debugging if [ "a" = "a" ] then echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" echo "" #echo "" fi echo "$sPageTitle" echo "" if [ "$sCssStyleSheet" != "" ] then echo "" if [ "$sBackgroundColor" != "" ] then echo "" fi else echo ' ' fi #-- A css style sheet was not specified echo "" echo "" #-- Put in the contents of the HTML header file, if it was specified if [ "$sHtmlHeaderFile" != "" ] then if [ -r $sHtmlHeaderFile ] then cat $sHtmlHeaderFile else echo "" fi fi if [ "$bTranslationLinks" = "true" ] then echo "
" if [ "$sOutputLanguage" = "spanish" ] then echo "Vea este pagina en (aproximado):" echo "English" elif [ "$sOutputLanguage" = "french" ] then echo "Voir la cette page dedans (approximatif):" echo "English" elif [ "$sOutputLanguage" = "italian" ] then echo "Osservi questa pagina come (approssimativo):" echo "English" else echo "See this page in (approximate):" echo "Español|" echo "Français|" echo "Italiano|" echo "Deutsch|" echo "Português" fi echo "
" fi #-- Put the page heading before the table of contents echo "

" if [ "$sTitleImageHtml" != "" ] then echo "$sTitleImageHtml
" fi echo "$sPageTitle

" #--echo '
({*})
' #-- #echo '|g;}" | \ #-- Example of Format Below: * My Title|/my/path/to/file-no-extension|html|txt|xml|pdf| sed "/
/,/<\/pre>/!{/^[ ]*\*.*|.*|.*/ s/^[ ]*\*[ ]*\([^|]*\)|\([^|]*\)|\([a-zA-Z]\{1,8\}\)|\([a-zA-Z]\{1,8\}\)|\([a-zA-Z]\{1,8\}\)|\([a-zA-Z]\{1,8\}\)|/\1<\/b> (Formats:<\/em> \3<\/a> | \4<\/a> | \5<\/a> | \6<\/a>)/gi;}" | \
   #-- Example of Format Below: * My Title|/my/path/to/file-no-extension|html|txt|pdf| but not in 
s
   sed "/
/,/<\/pre>/!{/^[ ]*\*.*|.*|.*/ s/^[ ]*\*[ ]*\([^|]*\)|\([^|]*\)|\([a-zA-Z]\{1,8\}\)|\([a-zA-Z]\{1,8\}\)|\([a-zA-Z]\{1,8\}\)|/\1<\/b> (Formats:<\/em> \3<\/a> | \4<\/a> | \5<\/a>)/gi;}" | \
   #-- Example of Format Below: \[*\] My Title|/my/path/to/file-no-extension|pdf|html| but not in 
s
   sed "/
/,/<\/pre>/!{/^[ ]*\*.*|.*|.*/ s/^[ ]*\*[ ]*\([^|]*\)|\([^|]*\)|\([a-zA-Z]\{1,8\}\)|\([a-zA-Z]\{1,8\}\)|/\1<\/b> (Formats:<\/em> \3<\/a> | \4<\/a>)/gi;}" | \
   #-- Example of Format Below: * My Title|/full/path/to/htmlfile|/full/path/to/text/file|/full/path/to/pdffile|
   #-- but not in 
s
   sed "/
/,/<\/pre>/!{/^[ ]*\*.*|.*|.*/ s/^[ ]*\*[ ]*\([^|]*\)|\([^|]*\)|\([^|]*\)|\([^|]*\)|/\1<\/b> (Formats:<\/em> html<\/a> | text<\/a> | pdf<\/a>)/gi;}" | \
   #-- Example of Format Below: * My Title|/full/path/to/htmlfile|/full/path/to/text/file| but not in 
s
   sed "/
/,/<\/pre>/!{/^[ ]*\*.*|.*|.*/ s/^[ ]*\*[ ]*\([^|]*\)|\([^|]*\)|\([^|]*\)|/\1<\/b> (Formats:<\/em> html<\/a> | text<\/a>)/gi;}" | \
   #-- Trick to make 'txt' links into 'text' links for readability
   sed "s/>txt<\/a>/>text<\/a>/gi" | \
   #-- Example of Format Below: * My Title|/full/path/to/any-old-file|
   sed "/
/,/<\/pre>/!{/^[ ]*\*.*|.*|.*/ s/^[ ]*\*[ ]*\([^|]*\)|\([^|]*\)|/\1<\/b>(\2<\/a>)/gi;}" | \
   #-- Example of Format Below: [*] /full/path/to/any-old-file
   sed "s/^[ ]*\[\*\][ ]*\([^ ]\{2,\}\)/\1<\/a>/gi" | \
   #-- Example of Format Below: * http://domain.org/resource.html
   sed "s/^[ ]*\*[ ]*\(http:\/\/[^ ]\{2,\}\)/\1<\/a>/gi" | \
  #-- Hyperlink urls with different display text like: "Some Display" http://blah.com
  sed "/
/,/<\/pre>/!s/\"\([^\"]\{1,100\}\)\"[ ]\{0,4\}\(http:\/\/[-a-z:_\%0-9\~\\\/\"\'\.\&\@?=#]\{3,\}\)/\1<\/a>/gi" | \
  #-- Hyperlink relative URLs with different display text like: "Some Display Text" link://relative/link.html
  sed "/
/,/<\/pre>/!s/\"\([^\"]\{1,100\}\)\"[ ]\{0,4\}link:\/\/\([-a-z:_\%0-9\~\\\/\"\'\.\&\@?=#]\{3,\}\)/\1<\/a>/gi" | \
  #-- Hyperlink urls beginning with 'link://'
  sed "/
/,/<\/pre>/!s/link:\/\/\([-a-z:_\%0-9\~\\\/\"\'\.\&\@?=#]\{3,\}\)/\1<\/a>/gi" | \
  #-- Hyperlink URLs beginning with http, except between 
 tags
  #-- The style immediately below is more 'academic'
  #sed "/
/,/<\/pre>/!s/\([^\">]\)\(http:\/\/[-a-z:_\%0-9\~\\\/\"\'\.\@\&?=#]\{3,\}\)/\1[*]<\/a>\2<\/tt>/gi" | \
  sed "/
/,/<\/pre>/!s/\([^\">'=]\)\(http:\/\/[-a-z:_\%0-9\~\\\/\"\'\.\@=#]\{3,\}\)/\1\2<\/a>/gi" | \
  #-- Hyperlink URLs beginning with http at the beginning of lines, except between 
 tags
  #sed "/
/,/<\/pre>/!s/^\(http:\/\/[-a-z:_\%0-9\~\\\/\"\'\.\@#]\{3,\}\)/[*]<\/a>\1<\/tt>/gi" | \
  sed "/
/,/<\/pre>/!s/^\(http:\/\/[-a-z:_\%0-9\~\\\/\"\'\.\@#]\{3,\}\)/\1<\/a>/gi" | \
  #-- Hyperlink email addresses with a 'mailto:' link
  sed "/
/,/<\/pre>/!s/\([^ ]\{2,\}@[^ \"']\{2,\}\)/\1<\/a>/g" | \
  #-- Hyperlink URLs beginning with 'www.'
  #sed "/
/,/<\/pre>/!s/[^a-zA-Z\/\">]\(www\.[-a-z:_\%0-9\~\\\/\"\'\.\@#]\{2,\}\)/[*]<\/a>\1<\/tt>/gi" | \
  sed "/
/,/<\/pre>/!s/\([^a-zA-Z\/\">]\)\(www\.[-a-z:_\%0-9\~\\\/\"\'\.\@#]\{2,\}\)/\1\2<\/a>/gi" | \
  #-- Format comments added by web-users
   sed "s/^\([ ]*added[ ]\{0,4\}by:\)\([^,]\{1,\}\)\,[ ]*on[ ]*\(.*\)/\1<\/em> \2<\/tt> on \3<\/em><\/u>/gi" | \
  #-- Turn spaces into non-breaking-spaces unless they are between 'pre' tags
  sed "/
/,/<\/pre>/!{/\[IMAGE-INDEX-BEGIN\]/,/\[IMAGE-INDEX-END\]/! s/[ ]\{2\}/\ \ /g;}" | \
  #-- Make paragraphs where there are blank lines
  #sed "/
/,/<\/pre>/!s/^[ ]*$/

/g" | \ #-- Make 'fake' headings sed "/

/,/<\/pre>/!s/{{ //g" | \
  sed "/
/,/<\/pre>/!s/ }}/<\/em><\/strong>/g" | \
  sed "/
/,/<\/pre>/!s/[ ]*==[ ]*\(.*\)/\1<\/em><\/strong><\/font>/g" | \
  #-- Turn line breaks into 
tags unles they are between 'pre' tags. This isn't really #-- a good idea since you dont know the width of the target screen #sed "/
/,/<\/pre>/!{/\[IMAGE-INDEX-BEGIN\]/,/\[IMAGE-INDEX-END\]/! s/^/
/g;}" #sed "/
/,/<\/pre>/!{/

/!s/^/
/g;}" | \ sed "/
/,/<\/pre>/!{/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/!s/^/
/g;}" | \ sed -e "s/\[BAND-DETAIL-LIST-BEGIN\]/

' #-- Insert the table of contents if [ "$bTableOfContents" = "true" ] then #bNumberSections="false" if [ "$bCapitalCaseTOC" = "true" ] then sDoCapitalizeCommand="sed -f /var/www/utils/capital-case.sed" else sDoCapitalizeCommand="cat" fi #-- The section below creates the table of contents for the web-page. #-- This line is designed to only number lines which match a pattern #-- nl -bpPATTERN does this but it also double spaces the text file for some #-- reason. However this can be fixed #-- if [ "$bNumberSections" = "true" ] then echo "" expand $1 | \ sed "/^$sHeadingPattern$/!d" | \ sed -e "s/^[ ]*//g" -e "s/[ ]*$//g" | \ #-- capital case the table of contents if it has been requested, if not, do nothing eval "$sDoCapitalizeCommand" | \ nl -s" " | \ sed "s/^[ ]*\([1-9][0-9]*\) /\1/g" | \ sed "s/^\([0-9]\+\)\(.*\)$/
\1. \2<\/a>/g" | \ #-- line below because the RedHat server uses UTF-8 character set iconv --to-code=ISO-8859-1 --from-code=UTF-8 | \ #-- Try to 'entitize' the accented characters sed -f /var/www/utils/iso2html.sed else echo "" expand $1 | \ sed "/^$sHeadingPattern$/!d" | \ sed -e "s/^[ ]*//g" -e "s/[ ]*$//g" | \ #-- capital case the table of contents if it has been requested, if not, do nothing eval "$sDoCapitalizeCommand" | \ nl -s" " | \ sed "s/^[ ]*\([1-9][0-9]*\) /\1/g" | \ sed "s/^\([0-9]\+\)\(.*\)$/
\2<\/a>/g" | \ #-- line below because the RedHat server uses UTF-8 character set iconv --to-code=ISO-8859-1 --from-code=UTF-8 | \ #-- Try to 'entitize' the accented characters sed -f /var/www/utils/iso2html.sed fi fi #-- The version of SED on RedHat linux does not like the syntax "\{,4\}" but "\{0,4\}" #-- is ok. # # What follows below is quite tricky. The order of each of the sed transformation DOES matter # The tricky bits are allowing for accented european characters, and converting back and forth # between unicode and iso latin etc # # In the context of this 'wiki' script it is reasonably important to display the text 'prettily' # so I am going to change the presentation of links etc. This allows the user to have # more control over how the web page is displayed finally #-- This variable is not used anymore since all section heading code is governed by #-- the 'sHeadingPattern' variable sNumberingPattern='^[ A-ZÁÉÍÓÚÀÈÌÒÚÄËÏÖÜ·ÇÑ0-9.\/\\:]*[A-ZÁÉÍÓÚÀÈÌÒÚÄËÏÖÜÇÑ][A-ZÁÉÍÓÚÀÈÌÒÚÄËÏÖÜÇÑ][A-ZÁÉÍÓÚÀÈÌÒÚÄËÏÖÜÇÑ]+[ A-ZÁÉÍÓÚÀÈÌÒÚÄËÏÖÜ·ÇÑ0-9.\/\\:]*$' if [ "$bNumberSections" = "true" ] then if [ "$bTableOfContents" = "true" ] then sSectionHeadReplacement="

\1. \2<\/a> [toc]<\/a><\/h3>" else sSectionHeadReplacement="

\1. \2<\/a> <\/h3>" fi else if [ "$bTableOfContents" = "true" ] then sSectionHeadReplacement="

\2<\/a> [toc]<\/a><\/h3>" else sSectionHeadReplacement="

\2<\/a> <\/h3>" fi fi if [ "$bCapitalCaseHeadings" = "true" ] then sCapHeadingsCommand="sed -f /var/www/utils/capital-case-headings.sed" else sCapHeadingsCommand="cat" fi expand $1 | \ sed "s/^[ ]*$//g" | \ #-- Number all lines that are 'section headings', allow for european accented characters nl -s" " -bp"^$sHeadingPattern$" | \ #-- Get rid of the 'blank' lines which nl puts into the output sed "/^[ ]\+$/d" | \ #-- Reformat the numbered section headings sed "s/^[ ]*\([1-9][0-9]*\) /\1/g" | \ #-- Delete the page title because its already been output sed "/^[ ]*=[^=]/d" | \ #-- Delete background image and color indicators sed "/^[ ]*[bB]ackground-[iI]mage:/d" | \ sed "/^[ ]*[bB]ackground-[Cc]olor:/d" | \ sed "/^[ ]*css-style-sheet:/d" | \ sed "/^[ ]*section-heading-css-class:/d" | \ sed "/^[ ]*html-header-file:/d" | \ #-- Delete 'title image' lines because they have already served their purpose sed "/^[ ]*Title[ ]*Image:/d" | \ #-- Encode special characters '<>&' as HTML entities sed -e "s//\>/g" | \ #-- Do a trick to get the '-->>' and '--<<' blocks of text to work sed -e "s/^[ ]*\-\-\>\>/
/g" -e "s/^[ ]*\-\-\<\</<\/pre>/g" | \
  #-- Make each 'section heading' into an HTML anchor to work with the 'Table of Contents'
  sed "s/^\([0-9]\{0,5\}\)\($sHeadingPattern\)$/$sSectionHeadReplacement/g" | \
  #-- If the section headings need to be 'capital cased', do so
  eval "$sCapHeadingsCommand" | \
  #-- line below because the RedHat server uses UTF-8 character set
  iconv --to-code=ISO-8859-1 --from-code=UTF-8 | \
  #-- Try to 'entitize' the accented characters
  sed -f /var/www/utils/iso2html.sed | \
  #-- Allow for 'ascii decorations'
  sed "/^\.\.[ ]/{s/[ ]*$//g; s/^\.\.[ ]*\(.*\)/
\1<\/font><\/center>/g;}" | \ #-- Allow spanish section tags sed "s/IM[ÁA]GEN\-[ÍI]NDICE\-PRINCIPIO/IMAGE\-INDEX\-BEGIN/g" | \ sed "s/IM[ÁA]GEN\-[ÍI]NDICE\-FINAL/IMAGE\-INDEX\-END/g" | \ #-- Lets deal with image index things. We have to get a few lines into the pattern space #-- so that we can hyperlink the image and the first label line #-- First deal with lines that have a size specification for the image as in (50x50) sed "/\[IMAGE\-INDEX\-BEGIN\]/,/\[IMAGE\-INDEX\-END\]/ {/^[ ]*Image/{N;N;s/Image:[ ]*(\([0-9]\+\)x\([0-9]\+\))[ ]*\([^ \n]\{2,\}\)[ ]*\n[ ]*Link:[ ]*\([^ \n]*\)[ ]*\n\(.*\)/<\/a>\5<\/a>/g;};}" | \ #-- Now deal with sections with no image size specification sed "/\[IMAGE\-INDEX\-BEGIN\]/,/\[IMAGE\-INDEX\-END\]/ {/^[ ]*Image/{N;N;s/Image:[ ]*\([^ \n]\{2,\}\)[ ]*\n[ ]*Link:[ ]*\([^ \n]*\)[ ]*\n\(.*\)/<\/a>\3<\/a>/g;};}" | \ sed "/\[IMAGE\-INDEX\-BEGIN\]/,/\[IMAGE\-INDEX\-END\]/ {s/^[ ]*$/

/;}" | \ #-- Lets deal with BAND-INDEX TAGS #-- When the images have a size specification, eg: Image: (60x80) image-file.jpg sed "/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/ {/^[ ]*Image/{s/Image:[ ]*(\([0-9]\+\)x\([0-9]\+\))[ ]*\([^ ]\{2,\}\)/

<\/td><\tr>/g;};}" | \ #-- When the images dont have a size specification, eg: Image: image-file.jpg sed "/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/ {/^[ ]*Image/{s/Image:[ ]*\([^ ]\{2,\}\)/
<\/td><\/tr>/g;};}" | \ #-- We need to deal with the 'ejemplos de música' lines which are a little bit tricky. Firstly #-- there are actually links to the examples on the following lines after the 'ejemplos' line and presumably #-- can be as numerous as they like. From a SED perspective this involves doing a 'N' command and #-- then checking if the line that has just been 'N'd contains a line in the format #-- "Some some name" some/path/to/a/song/file.mp3 #-- If the latest line does contain roughly this format then we need to get another line until #-- we run out. This is going to require something like the 't' command which does a conditional #-- jump based on whether a substitution was made or not. #-- #-- In order to find the exact details of how to do all this we need to go to #-- http://sed.sourceforge.net as always to find the answers. #-- As usual the following gem was found in Eric Pements 'one-liners' #-- If a line begins with an '=' it is appended to the previous and transformed #-- #-- sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' #-- This piece of SED magic actually works though I am not entirely sure why # sed -e :a -e "/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/ {/^[ ]*Ejemplos de música \[+s\]/{N; s|\n[ ]*\"\([^\"]*\)\"[ ]*\([^ ]\+\)|\1 -|g;ta;};}" -e 'P;D' | \ sed -e :a -e "/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/ {/^[ ]*Ejemplos de música/{N; s|\n[ ]*\"\([^\"]*\)\"[ ]*\([^ ]\+\)|\1 -|g;ta;};}" -e 'P;D' | \ #-- Turn name: value into table cells sed "/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/ {/\1<\/strong><\/td>\2<\/td><\/tr>/g;}" | \ #-- Divide different groups using the blank lines in the source text sed "/\[BAND-DETAIL-LIST-BEGIN\]/,/\[BAND-DETAIL-LIST-END\]/ {s|^[ ]*$|
-
/g" -e "s/\[BAND-DETAIL-LIST-END\]/<\/table>/g" | \ #-- Allow some really silly HTML tags sed "/
/,/<\/pre>/!s/\[\[\([\/]\?\)strike\]\]/<\1strike>/gi" | \
  #-- Get rid of Image Index tags
  sed -e "s/\[IMAGE-INDEX-BEGIN\]//g" -e "s/\[IMAGE-INDEX-END\]//g"
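#-- Many of the sed lines in the pipeline above use an address range followed
#-- by '!' so that a substitution is applied everywhere EXCEPT between the
#-- opening and closing 'pre' tags. The function below is a cut-down sketch of
#-- that idiom, not one of the real transformations: the name link_outside_pre
#-- and the much simplified URL pattern are assumptions made for illustration.

```shell
# Hyperlink anything which looks like an http:// URL, but leave the lines
# from an opening <pre> to the closing </pre> exactly as they are.
#   /<pre>/,/<\/pre>/   selects the block of lines to protect
#   !                   inverts the selection, so 's' runs on all other lines
#   &                   in the replacement stands for the matched URL
link_outside_pre() {
  sed '/<pre>/,/<\/pre>/!s|http://[^ ]*|<a href="&">&</a>|g'
}
```

#-- Because the range is decided line by line, the opening and closing tags
#-- need to sit on lines of their own, which is how the -->> and --<< markers
#-- are converted earlier in the pipeline.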
#echo "
" echo "
" #-- Define the cgi program which will handle the updating of the document #-- according to the contents of the HTML textarea component if [ "$4" != "" ] then sProcessorUrl="$4" else #-- It would be possible to replace the Domain Name below with #-- an IP address, which would mean that the script would still #-- work even if the DNS configuration failed. I am not sure if this #-- is really a good idea or not. #sProcessorUrl="http://www.ella-associates.org/cgi-bin/edit-wiki.cgi" sProcessorUrl="http://63.105.73.195/cgi-bin/edit-wiki.cgi" fi #-- There is a problem in that I need to find the full path #-- name of the $1 variable, but I dont know how to do this. This #-- is necessary because the target processor is not in the same #-- directory as the source document (the text file) #-- For the time being I have used the remedy of seeing if the path #-- is relative or absolute. The slightly dodgy path generating code below #-- appears to be working. There is almost certainly a much easier way #-- of doing it sRelativePath=$(dirname $1) sFirstCharacter=$(echo $sRelativePath | sed "s/^\(.\).*$/\1/g") if [ "$sRelativePath" = "." ] then sFullPathName="$(pwd)/$1" elif [ "$sFirstCharacter" = "." ] then sFullPathName="$(pwd)/$1" elif [ "$sFirstCharacter" = "/" ] then sFullPathName="$1" else sFullPathName="$(pwd)/$1" fi # echo $sFullPathName echo "
" if [ "$bTranslationLinks" = "true" ] then echo "
" if [ "$sOutputLanguage" = "spanish" ] then echo "Vea este pagina en (aproximado):" echo "English" elif [ "$sOutputLanguage" = "french" ] then echo "Voir la cette page dedans (approximatif):" echo "English" elif [ "$sOutputLanguage" = "italian" ] then echo "Osservi questa pagina come (approssimativo):" echo "English" else echo "See this page in (approximate):" echo "Español|" echo "Français|" echo "Italiano|" echo "Deutsch|" echo "Português" fi echo "
" fi echo "" echo "" #rm -f $1.temp #rm -f plain-text-toc.temp