# Description:
#   A script to reformat a plain text document which is in no
#   particular format. The script also generates an HTML form which
#   allows the reader to add a comment or other text to the document.
#   The script recognises some special structures
#   within the plain text document. For example:
#
#   Where the first non-whitespace character on a line is '=' then
#   all the following text on the line should be formatted as a
#   'heading'.
#   If the first non-whitespace character is '*' then
#   the following text should be hyperlinked.
#   Also, url style strings should be recognised and given
#   a hyperlink token in front of them, such as '[*]'. I prefer this to underlining
#   the entire url, because I find that the underlining tends to interfere with
#   the readability of the text. Some people would say, "use style-sheets" but to
#   them I would reply that the 'heraldic' visual pattern of the underlined hyperlink
#   is imprinted in many internet users' brains, and to change that 'iconography' can
#   lead to unnecessary confusion.
#
#   Lines which consist of only capital letters and numbers (with at least three
#   consecutive capital letters) are interpreted as headings, and constitute the
#   automatically generated table of contents.
#
#   This script, like the linkdoc2html.sh script, also accepts the format
#     * Document Title|Html-Url-Or-Path|Text-Url-Or-Path|
#   The script will render this into an emphasised 'document title' with
#   hyper-links to the different formats for the document.
#
#   The script will also format blocks of text between the strings -->> and --<<
#   (where they are the first string on the line) as an HTML
#   block
#
#   The script also formats lines starting with 'added by:' to make those
#   lines stand out from the rest of the text. This is a 'courtesy' to the
#   '/cgi-bin/add-comment' script, which adds such a line to a text file
#   when it inserts a user provided comment in the text file.
#
# Examples:
#   ./plaintext2html-forum.sh mjb-work.txt notran > mjb-work.html
#     This command line, executed in some kind of a bash shell, will
#     transform a plain text file which isn't in any particular format
#     into an HTML file (that is, it will create a new HTML file and
#     leave the original text file unchanged) and will not display the
#     automatic translation links to Google. Also an HTML table of
#     contents (with one entry for each heading) will be inserted in the
#     HTML document.
#
#   ./plaintext2html-forum.sh mjb-work.txt notran notoc > mjb-work.html
#     The text file will be transformed into HTML but no table of contents
#     will be inserted nor any translation links.
#
#   ./plaintext2html-forum.sh mjb-work.txt tran notoc > mjb-work.html
#     If translation links are desired but no table of contents, use a
#     command line similar to above. The string 'tran' could be anything
#     as long as it's not 'notran'. This slightly dodgy 'feature' is owing to the
#     fact that I am not using any 'getopt' style option parsing.
#
#   ./plaintext2html-forum.sh stuff.txt notran toc "http://63.105.73.195/cgi-bin/add-comment"
#     This transforms the file stuff.txt omitting translation links, inserting a
#     hyperlinked table of contents, and setting the target for the comment form
#     to the URL specified in the last parameter.
#
#
# Parameters:
#   textFileName [required]
#     The name of the text file which is to be transformed from text into html
#   notran [optional]
#     If the second parameter is the string 'notran' then the javascript links
#     to the google automatic language translation engine will NOT be inserted
#     into the HTML page.
#     This is useful, for example, when the HTML page is
#     going to be located within a 'password-protected' directory, because
#     the Google translation engine will not be able to access the page, and
#     therefore the translation links will not work.
#   notoc [optional]
#     If the third parameter is the string "notoc", then no HTML table of
#     contents will be generated.
#   forumProcessorUrl [optional]
#     This parameter indicates where the processing script is located.
#     If it is omitted, currently the url will default to
#     http://63.105.73.195/cgi-bin/add-comment
#   output-language
#     Still to be implemented
#     This is the language in which the messages on the generated HTML page
#     will appear. For example messages next to the comment boxes and the
#     translation links.
#   path-to-style-sheet
#     Still to be implemented
#     This is the full path (relative to the Web Server Document Root)
#     to the style sheet which is to be used by the generated HTML page
#
#
# Notes:
#   Because of the table used to create a left margin for the table of contents
#   and for the body of the text, this HTML is NOT friendly to 'lynx' which
#   does not support HTML tables. A CSS style-sheet command should be used
#   instead of the tables.
#
#   This script should also transform quotes into &quot;, & into &amp; etc
#
#   The script appears to be working reasonably well in conjunction with
#   the 'add-comment' cgi script.
#
#   It would be nice to make some kind of 'sub' table of contents for
#   any comments which are present in a document.
#
#   The translation links won't work from within the 'output' generated
#   by the 'add-comment' script
#
#   This script has had problems with 'gawk' and different versions of awk. For this reason
#   the 'gawk' or 'awk' code has been removed and replaced with code using the 'nl'
#   program. This program, when used with the -bp option, double spaces the object file
#   with lines containing only spaces.
#   Therefore some extra 'sed' lines are necessary
#   to remove these blank lines
#
# See Also:
#   diary2html.sh,
#     Turns a 'diary' style text file into HTML
#   linkdoc2html.sh,
#     Turns a text file which has a list of URL links and descriptions into HTML
#   linkdoc2html-index.sh
#     As above but also adds an HTML 'table of contents' for possible 'section headings'
#   linkdoc2html-forum.sh
#     Turns a text file with a URL list into an HTML file which has the capability
#     to be contributed to by a web-visitor (using cgi-scripts)
#   plaintext2pdf.sh,
#     Turns a text file into a pdf file with an optional table of contents
#   plaintext2html-simple.sh
#     As below, but doesn't use certain 'bash' tricks
#   plaintext2html.sh
#     Turns a text file with possible section headings and urls into an HTML file
#   glossary2xml.sh
#     Turns a text file which is a sort of 'glossary' into a dodgy xml file
#   alphabetize-glossary.sh
#     Re-arranges a text file which contains a series of definitions of 'items' or 'terms'
#     so that the items are ordered alphabetically.
#   add-comment
#     a cgi-script which can be used in conjunction with some of the
#     scripts above to add content specified by web-visitors to a web page
#   script-summary.txt
#     contains more short descriptions of scripts and what they do.
#
# Author:
#   m.j.bishop
#
# Bugs and Ideas
#   See the file linkdoc2html-forum.sh for the beginnings of an attempt to internationalize
#   the output of this script, in the sense that the messages which appear on the
#   HTML page should be capable of being in various languages, depending on what language
#   the source file is in.
#
#   Add an output-language parameter to this script
#   Also, it would be good to add a 'style-sheet' parameter which would allow
#   this script to change the name or location of the style-sheet which is used
#   by the generated HTML file.
#
#   At the moment the script uses special 'stylesheet classes' for particular
#   elements, such as the pre element, although this is probably not really
#   necessary; the style should be attached to the pre element itself rather
#   than to a CSS class of the pre element.
#   The second method is probably only necessary when there is more than one
#   type of style which you wish to apply to a particular HTML element in the
#   same document.
#
#   In Netscape Navigator 4.61, if the style-sheet does not exist at all
#   then the browser is unable to display anything at all.
#
#   The script could also check if there are translations of the current
#   HTML or text file, using the standard naming convention of name.file-type.language-code
#   An example of this naming convention is stuff.html.es which should
#   be an HTML file which contains Spanish language content. This present
#   script could check for files which have the same name as the source
#   file but which have a different language code extension, and could
#   therefore automatically add a link to the translated file (in addition,
#   perhaps, to the Google translation links). The script would only
#   check in the current directory for these 'translated' files.
#
# Dependencies:
#   iso2html.sed
#   various Unix tools, a Bash shell

if [ "$1" = "" ]
then
  echo "usage: $0 textFileName [notran] [notoc] [forum-processor-url]"
  #-- Print the comment lines of this script as the help text
  sed -n "/^[ ]*#/p" "$0"
  exit 1
fi

#-- The section below creates the table of contents for the diary.
#-- This line is designed to only number lines which match a pattern
#-- In theory 'nl -bpPATTERN' should also do this, but it insisted on
#-- 'double-spacing' the output
#-- Also the expressions below try to get rid of things like "can't" and "won't"
#-- because I want to apply some formatting to the content of quotes, and these
#-- things will get in my way.

#-- This is the pattern which determines what sort of lines will
#-- be interpreted as 'section headings'.
#-- I cannot use a repetition count in the 'awk' line
#-- because awk does not seem to accept the notation \{n,\}
sHeadingPattern='[ A-Z0-9.\/\\:]*[A-Z][A-Z][A-Z][A-Z]*[ A-Z0-9.\/\\:]*'

#-- I have disabled the code below because in a cgi environment, this script doesn't
#-- seem to have permission to create a file. It depends on who originally owns
#-- the $1.temp file.

#-- Gnu awk (gawk) seems to assume that pattern matching should be case insensitive
#-- by default. The BEGIN clause below attempts to correct that. Although the script
#-- says 'mawk' this is just a symbolic link to 'gawk'
#-->-->
#-- I am having all sorts of problems with GNU awk. For some reason it returns lower case
#-- lines, even when the regular expression dictates upper case lines.
#-- One solution to the problem is to use 'nl' instead. For example the lines below
#-- almost do the trick
# expand $1 | \
#  sed "s/^[ ]*$//g" | \
#  nl -s" " -bp'^[ A-Z0-9.\/\\:]*[A-Z][A-Z][A-Z]+[ A-Z0-9.\/\\:]*$' | \
#  sed "/^[ ]\+$/d" | \
#  sed "s/^[ ]*\([1-9][0-9]*\) /\1/g" | \
# The troublesome gawk line
#  gawk 'BEGIN{IGNORECASE=0}/^[ A-Z0-9.\/\\]*[A-Z]+[ A-Z0-9.\/\\]*$/{ii++; print ii $0}!/^[ A-Z0-9.\/\\]*[A-Z]+[ A-Z0-9.\/\\]*$/' | \

echo ""
echo ""
echo " "
echo " "
echo " "
echo " "
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""

if [ "$2" != "notran" ]
then
  echo ""
  echo "See this page in (approximate):"
  echo "Español|"
  echo "Français|"
  echo "Italiano|"
  echo "Deutsch|"
  echo "Português"
  echo " "
fi

#-- Put the page heading before the table of contents
#-- Only lines starting with a single '=' are kept; '<' and '>' are
#-- escaped before the heading markup is wrapped around the text.
cat "$1" | \
 sed "/^[ ]*=[ ]*[^=].*/!d" | \
 sed -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
 sed "s/^[ ]*=[ ]*\([^=].*\)/<center><h2>\1<\/h2><\/center>/gi"

#- The line below is not 'lynx friendly' as a style sheet
#- should be used instead.
echo "
"

#-- Define the cgi program which will handle the adding of
#-- comments to a particular text file.
if [ "$4" != "" ]
then
  sProcessorUrl=$4
else
  #-- It would be possible to replace the Domain Name below with
  #-- an IP address, which would mean that the script would still
  #-- work even if the DNS configuration failed. I am not sure if this
  #-- is really a good idea or not.
  #sProcessorUrl="http://www.ella-associates.org/cgi-bin/add-comment"
  sProcessorUrl="http://63.105.73.195/cgi-bin/add-comment"
fi

#-- There is a problem in that I need to find the full path
#-- name of the $1 variable, but I don't know how to do this. This
#-- is necessary because the target processor is not in the same
#-- directory as the source document (the text file)
#-- For the time being I have used the remedy of seeing if the path
#-- is relative or absolute. The slightly dodgy path generating code below
#-- appears to be working. There is almost certainly a much easier way
#-- of doing it
sRelativePath=$(dirname "$1")
sFirstCharacter=$(echo "$sRelativePath" | sed "s/^\(.\).*$/\1/g")
if [ "$sRelativePath" = "." ]
then
  sFullPathName="$(pwd)/$1"
elif [ "$sFirstCharacter" = "." ]
then
  sFullPathName="$(pwd)/$1"
elif [ "$sFirstCharacter" = "/" ]
then
  sFullPathName="$1"
else
  sFullPathName="$(pwd)/$1"
fi
# echo $sFullPathName

echo " "
if [ "$2" != "notran" ]
then
  echo ""
  echo "See this page in (approximate):"
  echo "Español|"
  echo "Français|"
  echo "Italiano|"
  echo "Deutsch|"
  echo "Português"
  echo " "
fi
echo ""
echo ""

#rm -f $1.temp
#rm -f plain-text-toc.temp