# Description:
#   A script to reformat a plain text file which has no
#   particular format. The text file stores the content of a web page which is
#   currently at http://poseidonia.ella-associates.org/ . This script is a
#   direct derivation of 'plaintext2html-forum.sh'. This current script is
#   designed specifically for a particular web page or text document.
#
#   The script will work in conjunction with the cgi script '?' to allow the
#   visitor to the web page to edit the page.
#
#   The script also generates an HTML form which allows the reader to edit the text
#   of the document. The script recognises some 'cues' within the plain text document.
#   I refer to these cues or 'structures' as 'Invisible Markup Language' (IML) or Mas
#   o Menos Markup Language (MMML). The basic idea is to have as little actual
#   'markup' in the text document as possible, and the markup which is present should
#   'look good' in the plain text file. So,
#   instead of using, say,
#     %^* Section Heading
#   which is valid markup but looks ugly in the text file, we use all capitals,
#   which look better in the text file.
#
#   A line beginning with = is a page title.
#   A line beginning with '*' will be hyperlinked.
#   URLs get automatically hyperlinked in some non-determinate way. All
#   Capitals lines are section headings. These section headings may then be
#   used as a table of contents and hyperlinked in various ways.
#
#   This script, like the linkdoc2html.sh script, also accepts the format
#     * Document Title|Html-Url-Or-Path|Text-Url-Or-Path|
#   The script will render this into an emphasised 'document title' with
#   hyperlinks to the different formats for the document.
#
#   Blocks of text surrounded by '-->>' and '--<<' are not 'formatted' in any way.
#
#   The script also formats lines starting with 'added by:' to make those lines
#   stand out from the rest of the text.
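#
#   A minimal sketch of a source text file using these cues. The file content
#   is invented for illustration; the title flags '{}', '[]', '[~]', '(l)' and
#   '(+)' are the ones tested further down in this script:
#
#     = Concert Details {} [] (+)
#
#     PROGRAMME
#     The concert starts at eight o'clock.
#     * Tickets|tickets.html|tickets.txt|
#
#     -->>
#       Text in here is    left exactly as it is.
#     --<<
#
#     added by: a visitor
#
#   With this title the page requests numbered section headings ('{}'), a
#   table of contents ('[]') and Google translation links ('(+)'); the flags
#   are removed from the displayed title. '(l)' sets bCapitalCaseHeadings
#   and '[~]' sets bCapitalCaseTOC.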
#   This is a 'courtesy' to the
#   '/cgi-bin/add-comment' script, which adds this line to a text file when it
#   inserts a user-provided comment in the text file.
#
# Examples:
#   If the scripts are on the system PATH then the leading './' characters
#   below are not necessary.
#
#   ./poseidontext2html-forum.sh concert-details.txt notran > concert-details.html
#     This command line, executed in a bash shell, will transform
#     a plain text file which isn't in any particular format into an HTML file
#     (that is, it will create a new HTML file and leave the original text file
#     unchanged) and will not display the automatic translation links to
#     Google. Also an HTML table of contents (with one entry for each heading,
#     if there are headings) will be inserted in the HTML document.
#
#   ./poseidontext2html-forum.sh mjb-work.txt notran notoc > mjb-work.html
#     The text file will be transformed into HTML, but neither a table of
#     contents nor any translation links will be inserted.
#
#   ./poseidontext2html-forum.sh mjb-work.txt tran notoc > mjb-work.html
#     If translation links are desired but no table of contents, use a command
#     line similar to the one above. The string 'tran' could be anything as long
#     as it's not 'notran'. This slightly dodgy 'feature' is owing to the fact
#     that I am not using any 'getopt' style option parsing.
#
#   ./poseidontext2html-forum.sh stuff.txt notran toc "http://63.105.73.195/cgi-bin/some-weird-script"
#     This transforms the file stuff.txt omitting translation links, inserting
#     a hyperlinked table of contents, and setting the target for the 'edit
#     document' form to the URL specified in the last parameter.
#
#
# Parameters:
#   textFileName [required]
#     The name of the text file which is to be transformed from text into HTML.
#   notran [optional]
#     If the second parameter is the string 'notran' then the javascript links
#     to the Google automatic language translation engine will NOT be inserted
#     into the HTML page.
#     This is useful, for example, when the HTML page is
#     going to be located within a 'password-protected' directory, because the
#     Google translation engine will not be able to access the page, and
#     therefore the translation links will not work.
#   notoc [optional]
#     If the third parameter is the string "notoc", then no HTML table of
#     contents will be generated.
#   forumProcessorUrl [optional]
#     This parameter indicates where the processing script is located. If it
#     is omitted, the URL currently defaults to
#     http://www.ella-associates.org/cgi-bin/add-comment
#   output-language [optional] {Not implemented}
#     This is the language in which the messages on the generated HTML page
#     will appear, for example the messages next to the comment boxes and the
#     translation links.
#   noforum {Not implemented}
#     If this parameter is present no HTML form will be produced in the output,
#     and therefore the web visitor will not be able to add comments to the
#     pages.
#   path-to-style-sheet [optional] {Not implemented}
#     Still to implement.
#     This is the full path (relative to the web server document root)
#     to the style sheet which is to be used by the generated HTML page.
#
#
# Notes:
#   The only difference between this script and 'poseidontext2html-wiki.sh'
#   is that that script does not have the 'editing form or box' on the same
#   HTML page as the rendered text.
#
#   Because of the table used to create a left margin for the table of contents
#   and for the body of the text, this HTML is NOT friendly to 'lynx', which
#   does not support HTML tables. CSS style rules should be used
#   instead of the tables.
#
#   This script should also transform quotes into &quot;, & into &amp;, etc. The
#   script appears to be working reasonably well in conjunction with the
#   'edit-poseidon-forum' cgi script.
#
#   It would be nice to make some kind of 'sub' table of contents for any
#   comments which are present in a document.
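#
#   For the quote and ampersand escaping mentioned above, a possible sed
#   sketch, not used anywhere in this script yet ('file.txt' is a placeholder).
#   '&' has to be replaced first, otherwise the '&' in the entities produced
#   by the later substitutions would be escaped a second time:
#
#     sed -e 's/&/\&amp;/g' -e 's/"/\&quot;/g' \
#         -e 's/</\&lt;/g' -e 's/>/\&gt;/g' file.txt
#
#   In a sed replacement '\&' is a literal ampersand, whereas a bare '&'
#   would insert the whole matched text.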
#
#   The translation links won't work from within the 'output' generated
#   by the 'add-comment' script.
#
#   This script has had problems with 'gawk' and different versions of awk. For
#   this reason the 'gawk' or 'awk' code has been removed and replaced with
#   code using the 'nl' program. This program, when used with the -bp option,
#   double-spaces the output with lines containing only spaces. Therefore
#   some extra 'sed' lines are necessary to remove these blank lines.
#
# See Also:
#   edit-poseidon-forum
#     This is the cgi script which can work in conjunction with the current script.
#   poseidontext2html-wiki.sh
#     A very similar script.
#   diary2html.sh
#     Turns a 'diary' style text file into HTML.
#   linkdoc2html.sh
#     Turns a text file which has a list of URL links and descriptions into HTML.
#   linkdoc2html-index.sh
#     As above, but also adds an HTML 'table of contents' for possible 'section headings'.
#   linkdoc2html-forum.sh
#     Turns a text file with a URL list into an HTML file which can be
#     contributed to by a web visitor (using cgi scripts).
#   plaintext2pdf.sh
#     Turns a text file into a PDF file with an optional table of contents.
#   plaintext2html-simple.sh
#     As below, but doesn't use certain 'bash' tricks.
#   plaintext2html.sh
#     Turns a text file with possible section headings and URLs into an HTML file.
#   glossary2xml.sh
#     Turns a text file which is a sort of 'glossary' into a dodgy XML file.
#   alphabetize-glossary.sh
#     Re-arranges a text file which contains a series of definitions of 'items' or 'terms'
#     so that the items are ordered alphabetically.
#   add-comment
#     A cgi script which can be used in conjunction with some of the
#     scripts above to add content specified by web visitors to a web page.
#   script-summary.txt
#     Contains more short descriptions of scripts and what they do.
# Author:
#   m.j.bishop
#
# Bugs and Ideas:
#   See the file linkdoc2html-forum.sh for the beginnings of an attempt to internationalize
#   the output of this script, in the sense that the messages which appear on the
#   HTML page should be capable of being in various languages, depending on what language
#   the source file is in.
#
#   Add an output-language parameter to this script.
#   Also, it would be good to add a 'style-sheet' parameter which would allow
#   this script to change the name or location of the style sheet which is used
#   by the generated HTML file.
#
#   In Netscape Navigator 4.61, if the style sheet does not exist at all
#   then the browser is unable to display anything at all.
#
#   The script could also check if there are translations of the current
#   HTML or text file, using the standard naming convention name.file-type.language-code.
#   An example of this naming convention is stuff.html.es, which should
#   be an HTML file which contains Spanish-language content. This
#   script could check for files which have the same name as the source
#   file but a different language-code extension, and could
#   therefore automatically add a link to the translated file (in addition,
#   perhaps, to the Google translation links). The script would only
#   check in the current directory for these 'translated' files.
#
# Dependencies:
#   iso2html.sed
#   The images used on the 'poseidon site'
#   Various Unix tools, a Bash shell

if [ "$1" = "" ]
then
  echo "usage: $0 textFileName [notran] [notoc] [forum-processor-url] [noforum]"
  echo "PRESS q TO EXIT THIS HELP. PRESS [space-bar] TO SCROLL DOWN, b to SCROLL UP"
  sed -n "/^[ ]*#/p" "$0" | less
  exit 1
fi

#-- The section below creates the table of contents for the document.
#-- This line is designed to only number lines which match a pattern.
#-- In theory 'nl -bpPATTERN' should also do this, but it insisted on
#-- 'double-spacing' the output.
#-- Also the expressions below try to get rid of things like "can't" and "won't"
#-- because I want to apply some formatting to the content of quotes, and these
#-- things will get in my way.

#-- This is the pattern which determines what sort of lines will
#-- be interpreted as 'section headings'. I cannot use the notation \{n,\}
#-- for the 'awk' line because awk does not seem to accept it.
sHeadingPattern='[ A-Z0-9.\/\\:]*[A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜ·ÇÑ][A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜ·ÇÑ][A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜ·ÇÑ][A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜ·ÇÑ]*[ A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜ·ÇÑ0-9.\/\\:]*'

sOutputLanguage="english"
sRawPageTitle=""
sPageTitle=""
bTableOfContents="true"
bTranslationLinks=""

#-- The page title is the first line starting with a single '='
sRawPageTitle=$(expand "$1" | sed -n "/^[ ]*=[^=]/{s/^[ ]*=[ ]*//g;s/[ ]*$//g;p;q;}")

#-- Remove the title flags, HTML-escape '<' and '>', convert the title to
#-- ISO-8859-1 and pass it through iso2html.sed
sPageTitle=$(\
  echo "$sRawPageTitle" | \
  sed -e "s/{.\?}//g" -e "s/\[.\?.\?\]//g" -e "s/(+)//g" -e "s/(l)//g" | \
  sed -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
  iconv --to-code=ISO-8859-1 --from-code=UTF-8 | \
  sed -f /var/www/utils/iso2html.sed)

#-- The code below allows a wiki user to specify whether a page should have
#-- the section headings numbered by inserting '{}' after the page title
sSectionNumberFlag=$(echo "$sRawPageTitle" | sed "s/{.\?}//g")
if [ "$sRawPageTitle" = "$sSectionNumberFlag" ]
then
  bNumberSections="false"
else
  bNumberSections="true"
fi

#-- '(l)' in the page title sets bCapitalCaseHeadings
sCapitalCaseSectionFlag=$(echo "$sRawPageTitle" | sed "s/(l)//g")
if [ "$sRawPageTitle" = "$sCapitalCaseSectionFlag" ]
then
  bCapitalCaseHeadings="false"
else
  bCapitalCaseHeadings="true"
fi

#-- Whether a section-heading table of contents is generated depends on
#-- the '[]' in the page title; the 'notoc' script parameter always turns it
#-- off (it is tested last so that the title flag cannot override it)
sTableOfContentsFlag=$(echo "$sRawPageTitle" | sed "s/\[.\?\]//g")
if [ "$sRawPageTitle" = "$sTableOfContentsFlag" ]
then
  bTableOfContents="false"
else
  bTableOfContents="true"
fi
if [ "$3" = "notoc" ]
then
  bTableOfContents="false"
fi

#-- '[~]' in the page title sets bCapitalCaseTOC
sCapitalCaseTOCFlag=$(echo "$sRawPageTitle" | sed "s/\[[~]\]//g")
if [ "$sRawPageTitle" = "$sCapitalCaseTOCFlag" ]
then
  bCapitalCaseTOC="false"
else
  bCapitalCaseTOC="true"
fi

#-- Translation links are requested by '(+)' in the page title; the 'notran'
#-- script parameter always turns them off (tested last for the same reason)
sTranslationLinksFlag=$(echo "$sRawPageTitle" | sed "s/(+)//g")
if [ "$sRawPageTitle" != "$sTranslationLinksFlag" ]
then
  bTranslationLinks="true"
else
  bTranslationLinks="false"
fi
if [ "$2" = "notran" ]
then
  bTranslationLinks="false"
fi

#-- Debugging output: change the test below to [ "a" = "a" ] to enable it
if [ "a" = "b" ]
then
  echo "sRawPageTitle=$sRawPageTitle"
  echo "bNumberSections=$bNumberSections"
  echo "bCapitalCaseHeadings=$bCapitalCaseHeadings"
  echo "bTableOfContents=$bTableOfContents"
  echo "bCapitalCaseTOC=$bCapitalCaseTOC"
  echo "bTranslationLinks=$bTranslationLinks"
fi

echo ""
echo ""
echo " "
echo " "
echo " "
echo " "
echo ""
echo ""
echo ""
echo ""
echo ""
#echo ""
echo "