# Description:
#
# A script to reformat a plain text document that has no particular
# format. The text file stores the content of a web page which is
# currently at http://poseidonia.ella-associates.org/ . This script is a
# direct derivation of 'plaintext2html-forum.sh'. The current script is
# designed specifically for a particular web page or text document.
#
# The script will work in conjunction with the cgi script '?' to allow
# visitors to the web page to edit the page.
#
# Special Text Structures:
#
# The script also generates an HTML form which allows the reader to edit the
# text of the document. The script recognises some 'cues' within the plain
# text document. I refer to these cues or 'structures' as 'Invisible Markup
# Language' (IML) or Mas o Menos Markup Language (MMML). The basic idea is to
# have as little actual 'markup' in the text document as possible, and the
# markup which is present should 'look good' in the plain text file. So,
# instead of using, say,
# %^* Section Heading
# which is valid markup but looks ugly in the text file, we write the heading
# in all capitals, which looks better in the text file.
#
# A line beginning with '=' is a page title.
# A line beginning with '*' will be hyperlinked.
# URLs are hyperlinked automatically, in a way that is not entirely
# predictable. Lines in all capitals are section headings. These section
# headings may then be used as a table of contents and hyperlinked in
# various ways.
#
# This script, like the linkdoc2html.sh script, also accepts the format
# * Document Title|Html-Url-Or-Path|Text-Url-Or-Path|
# The script renders this as an emphasised 'document title' with
# hyperlinks to the different formats of the document.
#
# Blocks of text surrounded by '-->>' and '--<<' are not 'formatted' in any way.
#
# The script also formats lines starting with 'added by:' to make those lines
# stand out from the rest of the text. This is a 'courtesy' to the
# '/cgi-bin/add-comment' script, which adds such a line to a text file when it
# inserts a user-provided comment into the text file.
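#
# For example, a source text file using the cues above might look like the
# lines below. The content, file names and URL are hypothetical and are
# shown for illustration only:
#
#   = Concert Details
#   PROGRAMME
#   The concert begins at eight. Tickets: http://example.org/tickets
#   * Concert Poster|poster.html|poster.txt|
#   -->>
#     Seat   Price
#     A1     10
#   --<<
#   added by: a visitor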
#
# Certain codes can also be placed after the document title in the
# source text file to influence the appearance of the HTML:
# {x}  number all section headings (including in the table of contents)
# {~}  make the section headings 'capital case'
#      These can be combined, as in {~x}
# []   display the table of contents
# [~]  display the table of contents with items in 'capital case'
# (+)  display automatic Google translation links in the HTML
#
# At the moment these codes usually override the equivalent command-line
# parameters, but this may change.
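#
# For example, a title line like the following (hypothetical title) numbers
# and capital-cases the section headings, displays a capital-cased table of
# contents and adds the translation links:
#
#   = Concert Details {~x} [~] (+)
#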
# Examples:
# If the scripts are on the system 'path' then the leading './' characters
# below are not necessary.
#
# ./text2html-collab.sh concert-details.txt notran > concert-details.html
# This command line, executed in a bash shell, will transform a plain
# text file which isn't in any particular format into an HTML file (that
# is, it will create a new HTML file and leave the original text file
# unchanged) and will not display the automatic translation links to
# Google. An HTML table of contents (with one entry for each heading, if
# there are headings) will also be inserted in the HTML document.
#
# ./text2html-collab.sh mjb-work.txt notran notoc > mjb-work.html
# The text file will be transformed into HTML but no table of contents will
# be inserted nor any translation links.
#
# ./text2html-collab.sh mjb-work.txt tran notoc > mjb-work.html
# If translation links are desired but no table of contents, use a command
# line similar to the one above. The string 'tran' could be anything as long
# as it's not 'notran'. This slightly dodgy 'feature' is owing to the fact
# that I am not using any 'getopt' style option parsing.
#
# ./text2html-collab.sh stuff.txt notran toc "http://63.105.73.195/cgi-bin/some-weird-script"
# This transforms the file stuff.txt omitting translation links, inserting
# a hyperlinked table of contents, and setting the target for the 'edit
# document' form to the URL specified in the last parameter.
#
#
# Parameters:
# textFileName [required]
# The name of the text file which is to be transformed from text into html
# notran [optional]
# If the second parameter is the string 'notran' then the javascript links
# to the google automatic language translation engine will NOT be inserted
# into the HTML page. This is useful, for example, when the HTML page is
# going to be located within a 'password-protected' directory, because the
# Google translation engine will not be able to access the page, and
# therefore the translation links will not work.
# notoc [optional]
# If the third parameter is the string "notoc", then no HTML table of
# contents will be generated.
# forumProcessorUrl [optional]
# This parameter indicates where the processing script is located. If it
# is omitted, currently the url will default to
# http://www.ella-associates.org/cgi-bin/add-comment
# output-language [optional] {Not implemented}
# This is the language in which the message on the generated HTML page
# will appear. For example messages next to the comment boxes and the
# translation links.
# noforum {Not implemented}
# If this parameter is present, no HTML form will be produced in the output
# and therefore the web visitor will not be able to add comments to the
# pages.
# path-to-style-sheet [optional] {Not implemented}
# Still to implement
# This is the full path (relative to the Web Server Document Root)
# to the style sheet which is to be used by the generated HTML page
#
#
# Notes:
# The only difference between this script and 'poseidontext2html-wiki.sh'
# is that the latter does not have the 'editing form or box' on the same
# HTML page as the rendered text.
#
# This script should also transform quotes into &quot;, & into &amp;, etc.
# The script appears to be working reasonably well in conjunction with the
# 'edit-collab' cgi script.
#
# The translation links won't work from within the 'output' generated
# by the 'add-comment' script.
#
# This script has had problems with 'gawk' and different versions of awk. For
# this reason the 'gawk' or 'awk' code has been removed and replaced with
# code using the 'nl' program. This program, when used with the -bp option,
# double-spaces the object file with lines containing only spaces. Therefore
# some extra 'sed' lines are necessary to remove these blank lines.
#
# See Also:
# edit-collab
# This is the cgi script which can work in conjunction with the current script
# diary2html.sh,
# Turns a 'diary' style text file into HTML
# linkdoc2html.sh,
# Turns a text file which has a list of URL links and descriptions into HTML
# linkdoc2html-index.sh
# As above but also adds an HTML 'table of contents' for possible 'section headings'
# linkdoc2html-forum.sh
# Turns a text file with a URL list into an HTML file which has the capability
# to be contributed to by a web-visitor (using cgi-scripts)
# plaintext2pdf.sh,
# Turns a text file into a pdf file with an optional table of contents
# plaintext2html-forum.sh
# Renders a text file as HTML and displays a form which allows the web-visitor
# to add 'comments' to the page (text file)
# plaintext2html.sh
# Turns a text file with possible section headings and urls into an HTML file
# glossary2xml.sh
# Turn a text file which is a sort of 'glossary' into a dodgy xml file
# alphabetize-glossary.sh
# Re-arranges a text file which contains a series of definitions of 'items' or 'terms'
# so that the items are ordered alphabetically.
# add-comment
# a cgi-script which can be used in conjunction with some of the
# scripts above to add content specified by web-visitors to a web page
# script-summary.txt
# contains more short descriptions of scripts and what they do.
# Author:
# m.j.bishop
#
# Bugs and Ideas
# In Internet Explorer the 'capital case' function does not work properly. The
# entire text is lower-cased and the first letter is NOT upper-cased. This is
# probably another 'text-area' line ending problem.
#
# When you hit 'refresh' in Netscape and IE the contents of the HTML textarea
# are not 'refreshed'. That is, the contents do not reflect the true contents
# as dictated by the HTML source code. Rather the editings of the user are
# preserved. I presume this is customizable.
#
# In IE when you hit refresh the page does not refresh at all, which means that
# the user is unable to see the changes which she has made.
#
# This script needs more internationalization.
#
# The script could also check if there are translations of the current
# HTML or text file, using the standard naming convention of name.file-type.language-code
# An example of this naming convention is stuff.html.es which should
# be an HTML file which contains Spanish language content. This present
# script could check for files which have the same name as the source
# file but which have a different language code extension, and could
# therefore automatically add a link to the translated file (in addition,
# perhaps, to the Google translation links). The script would only
# check in the current directory for these 'translated' files.
#
# When section headings are not numbered, capital case does not work
# (noted 6 June).
#
# Dependencies:
# iso2html.sed
# A script which turns accented characters into HTML entities. This is
# a bit tricky since the American Server uses UTF-8 rather than ISO-8859
# capital-case.sed
# A script which 'capital cases' words. That is the first letter is upper
# case and all the rest lower case
# capital-case-headings.sed
# This does the same as above but only for lines which are 'section
# headings', that is, lines in all capital letters
# edit-collab
# This is not a total dependency, but the HTML form generated by the
# current script will not do anything useful without this script
# procgi
# A bash shell cgi HTML form value extractor used by 'edit-collab'
# The 'uncle-sam.gif' image
# which is used next to the text box message
# The images used on the 'poseidon site'
# various Unix tools, a Bash shell
#
# History:
#
# June 3, 2003
# Adapted this script from 'poseidon-text2html-forum.sh'
# June 6, 2003
# Improved the handling of accented characters. All section headings are
# now governed by the variable 'sHeadingPattern'
if [ "$1" = "" ]
then
(echo "usage: $0 textFileName [notran] [notoc] [forum-processor-url] [noforum]"; \
echo "PRESS q TO EXIT THIS HELP. PRESS [space-bar] TO SCROLL DOWN, b to SCROLL UP"; \
cat $0) | sed -n "/^[ ]*#/p" | less
exit 1;
fi
#-- This is the pattern which determines what sort of lines will
#-- be interpreted as 'section headings'. I cannot use this pattern for the
#-- 'awk' line because awk does not seem to accept the interval notation \{n,\}
sAccentString='A-ZÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛ·ÇÑ'
sHeadingPattern="[$sAccentString 0-9.\/\\:_\&@]*[$sAccentString][$sAccentString][$sAccentString][$sAccentString]*[$sAccentString 0-9.\/\\:_\&@]*"
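#-- Example (an illustration, assuming the pattern is anchored with ^...$
#-- wherever it is applied later in the script, which is not shown here):
#-- lines such as 'INTRODUCTION' or '2. THE CONCERT' count as headings.
#-- Each contains at least three consecutive capitals and otherwise only
#-- capitals, digits, spaces and the characters . / \ : _ & @.
#-- 'Introduction' and 'A B C' do not count as headings. A quick check from
#-- the shell, using the C locale so that A-Z matches only capitals:
#--   echo "2. THE CONCERT" | LC_ALL=C grep -x "$sHeadingPattern"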
sOutputLanguage="english"
sRawPageTitle=""
sPageTitle=""
bTableOfContents="true"
bTranslationLinks=""
sTitleDecoration=$(expand $1 | sed -n "/^[ ]*=[^=]/{N;s/^.*\n//g;s/\.\.[ ]*//g;s/[ ]*$//g;p;q;}")
sRawPageTitle=$(expand $1 | sed -n "/^[ ]*=[^=]/{s/^[ ]*=[ ]*//g;s/[ ]*$//g;p;q;}")
sPageTitle=$(\
echo $sRawPageTitle | \
sed -e "s/{.\{0,4\}}//g" -e "s/\[.\{0,4\}\]//g" -e "s/(.\{0,4\})//g" | \
sed -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
iconv --to-code=ISO-8859-1 --from-code=UTF-8 | \
sed -f /var/www/utils/iso2html.sed)
#-- If a 'title image' is specified extract it and create the necessary HTML
#-- deal with title images which have the image size specified like this (50x60)
sTitleImageHtml=$( \
expand $1 | \
sed -n "/^[ ]*Title[ ]*Image:[ ]*([0-9]\+x[0-9]\+)/ {s/^[ ]*//g; s/[ ]*$//g; s/^Title[ ]*Image:[ ]*(\([0-9]\+\)x\([0-9]\+\))[ ]*\(.*\)/<\/a>/g;p;q;}")
if [ "$sTitleImageHtml" = "" ]
then
#-- deal with the cases where there is no image size specified
sTitleImageHtml=$( \
expand $1 | \
sed -n "/^[ ]*Title[ ]*Image:/ {s/^[ ]*//g; s/[ ]*$//g; s/^Title[ ]*Image:\(.*\)/
<\/a>/g;p;q;}")
fi
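#-- Example source lines for the two cases above (the image file name is
#-- hypothetical). For the first line, the sed expression captures 50 and 60
#-- (the image size) as \1 and \2, and the image path as \3. The second line
#-- has no size and is handled by the fallback branch.
#--   Title Image: (50x60) images/poseidon.gif
#--   Title Image: images/poseidon.gif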
#-- The code below allows a wiki user to specify whether a page should have
#-- the section headings numbered by inserting '{x}' after the page title
sSectionNumberFlag=$(echo "$sRawPageTitle" | sed "s/{.\?x.\?}//g")
if [ "$sRawPageTitle" = "$sSectionNumberFlag" ]
then
bNumberSections="false"
else
bNumberSections="true"
fi
#-- This determines if Section Headings in the body of the page should
#-- be made into capital case or not (using code {~} )
sCapitalCaseSectionFlag=$(echo "$sRawPageTitle" | sed "s/{.\?[~].\?}//g")
if [ "$sRawPageTitle" = "$sCapitalCaseSectionFlag" ]
then
bCapitalCaseHeadings="false"
else
bCapitalCaseHeadings="true"
fi
#-- Whether a Section Heading table-of-contents is generated depends on
#-- either a script parameter, or the '[]' in the page title
if [ "$3" = "notoc" ]
then
bTableOfContents="false"
else
bTableOfContents="true"
fi
sTableOfContentsFlag=$(echo "$sRawPageTitle" | sed "s/\[.\?\]//g")
if [ "$sRawPageTitle" = "$sTableOfContentsFlag" ]
then
bTableOfContents="false"
else
bTableOfContents="true"
fi
if [ "$2" = "notran" ]
then
bTranslationLinks="false"
fi
sCapitalCaseTOCFlag=$(echo "$sRawPageTitle" | sed "s/\[[~]\]//g")
if [ "$sRawPageTitle" = "$sCapitalCaseTOCFlag" ]
then
bCapitalCaseTOC="false"
else
bCapitalCaseTOC="true"
fi
sTranslationLinksFlag=$(echo "$sRawPageTitle" | sed "s/(.\?.\?+.\?.\?)//g")
if [ "$sRawPageTitle" != "$sTranslationLinksFlag" ]
then
bTranslationLinks="true"
else
bTranslationLinks="false"
fi
sOutputLanguageFlag=$(echo "$sRawPageTitle" | sed "s/(.\?[A-Z][A-Z].\?)//g")
if [ "$sRawPageTitle" != "$sOutputLanguageFlag" ]
then
sOutputLanguageCode=$(echo $sRawPageTitle | sed "s/.*(.\?\([A-Z][A-Z]\).\?).*/\1/g")
else
sOutputLanguageCode="EN"
fi
case "$sOutputLanguageCode" in
ES) sOutputLanguage="spanish";;
IT) sOutputLanguage="italian";;
EN) sOutputLanguage="english";;
CA) sOutputLanguage="catalan";;
AL) sOutputLanguage="german";;
FR) sOutputLanguage="french";;
PO) sOutputLanguage="portuguese";;
esac
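#-- Worked example of the title-code tests above (hypothetical title). For
#-- the source line
#--   = Concierto {~x} [~] (+) (ES)
#-- the tests give bNumberSections=true, bCapitalCaseHeadings=true,
#-- bTableOfContents=true, bCapitalCaseTOC=true, bTranslationLinks=true and
#-- sOutputLanguageCode=ES, so sOutputLanguage=spanish. A two-letter code
#-- not listed in the case statement leaves sOutputLanguage at its default
#-- value, "english".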
#-- for debugging: change "b" to "a" in the test below to print the settings
if [ "b" = "a" ]
then
echo "
"
echo "sHeadingPattern=$sHeadingPattern"
echo "sRawPageTitle=$sRawPageTitle"
echo "bNumberSections=$bNumberSections"
echo "bCapitalCaseHeadings=$bCapitalCaseHeadings"
echo "bTableOfContents=$bTableOfContents"
echo "bCapitalCaseTOC=$bCapitalCaseTOC"
echo "bTranslationLinks=$bTranslationLinks"
echo "sTitleImageHtml=$sTitleImageHtml"
echo "sOutputLanguageCode=$sOutputLanguageCode"
echo "sOutputLanguage=$sOutputLanguage"
echo "
"
fi
echo ""
echo ""
echo " "
echo " "
echo " "
#-- The lines below are meant to stop browsers and servers caching these HTML
#-- pages, which is important since they are editable
echo " "
echo " "
echo " "
echo ""
echo ""
echo ""
echo ""
echo ""
#echo ""
echo ""
if [ "$sTitleImageHtml" != "" ]
then
echo "$sTitleImageHtml"
fi
echo "$sPageTitle