# Description:
# A script to reformat a plain text document based upon
# the Netbeans guides (xml/tomcat/users). This script was based
# upon the 'plaintext2html.sh' script. The Netbeans guides are
# in plain text format, after they have been produced with the
# 'reform-netbeans-usersguide.sh' etc. The Guides are indented
# according to the level of 'section' heading. The first level section
# headings have no spaces in front of them. The script recognises some
# special structures within the plain text document. For example:
#
# Where the first non-whitespace character on a line is '=' then all
# the following text on the line should be formatted as a 'document
# title'. Also, URL style strings should be recognised and given a
# hyperlink token in front of them, such as '[*]'. I prefer this to
# underlining the entire URL, because I find that the underlining
# tends to interfere with the readability of the text.
#
# The script will also format blocks of text between the strings -->>
# and --<< (where they are the first string on the line) as an HTML
# block. But this won't be used because the Netbeans guides don't
# contain these characters.
#
# Examples:
# ./netbeans-guide2html.sh netbeans-userguide.txt
# This command line, executed in some kind of a bash shell, will
# transform a plain text file which isn't in any particular format
# into an HTML file (that is it will create a new HTML file and
# leave the original text file unchanged) and will insert the
# automatic translation links to Google. Also an HTML table of
# contents (with one entry for each heading) will be inserted in the
# HTML document.
#
# ./netbeans-guide2html.sh netbeans-tomcat-guide.txt notran notoc
# The text file will be transformed into HTML but no table of contents
# will be inserted nor any translation links.
#
# ./netbeans-guide2html.sh mjb-work.txt blah notoc
# If translation links are desired but no table of contents, use a
# command line similar to the above. The string 'blah' could be anything
# as long as it's not 'notran'. This slightly dodgy 'feature' is owing to
# the fact that I am not using any 'getopt' style option parsing.
#
# Parameters:
# textFileName [required]
# The name of the text file which is to be transformed from text into html
# notran [optional]
# If the second parameter is the string 'notran' then the JavaScript links
# to the Google automatic language translation engine will NOT be inserted
# into the HTML page. This is useful, for example, when the HTML page is
# going to be located within a 'password-protected' directory, because
# the Google translation engine will not be able to access the page, and
# therefore the translation links will not work.
# notoc [optional]
# If the third parameter is the string "notoc", then no HTML table of
# contents will be generated.
#
#
# Notes:
# On a file of size 1 megabyte this script takes several minutes to
# complete.
#
# The Html generated is somewhat dodgy but attempts to avoid some of
# the more heinous html sins, such as tags The HTML is
# unfriendly to Lynx the text browser because of the center aligned
# table which creates a left margin.
#
# Interestingly, almost all the functionality which this script
# provides, that is, making an HTML table of contents, could equally
# be achieved using the 'htmldoc' program with a line similar to
#   htmldoc -f output.html --book --no-title theTextFile.txt
# This assumes that the HTML file contains heading tags in the correct
# order. The above is not quite accurate. There is also a Perl program
# called 'txt2html' or something similar which does a similar thing.
#
# The HTML really needs to be 'chunked', that is, written out into
# several different files so that the browser doesn't 'crap out' on the
# big 1 meg file.
#
# See Also:
# diary2html.sh, linkdoc2html.sh, plaintext2pdf.sh,
# reform-netbeans-userguide.txt, reform-netbeans-tomcatguide.txt
# Author:
# m.j.bishop
if [ -z "$1" ]
then
echo "usage: $0 textFileName [notran] [notoc]"
sed -n "/^[ ]*#/p" "$0"
exit 1
fi
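#-- A minimal sketch, not part of the original script: one way to accept
#-- the 'notran' and 'notoc' words in any order after the file name, so
#-- that no placeholder such as 'blah' is needed. The function name
#-- 'parse_flags' and the variables 'bTranslate' and 'bToc' are
#-- hypothetical; the rest of this script still tests $2 and $3 directly.
parse_flags() {
  bTranslate=yes
  bToc=yes
  for sArg in "$@"
  do
    case "$sArg" in
      notran) bTranslate=no ;;
      notoc)  bToc=no ;;
    esac
  done
}
#-- Possible use: parse_flags "$2" "$3"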
#-- The section below creates the table of contents for the document
#-- This line is designed to only number lines which match a pattern
#-- In theory 'nl -bpPATTERN' should also do this, but it insisted on
#-- 'double-spacing' the output
#-- Also the expressions below try and get rid of things like "can't" and "won't"
#-- because I want to apply some formatting to the content of quotes, and these
#-- things will get in my way. Currently no formatting is applied to quotes.
#-- This is the pattern which determines what sort of lines will
#-- be interpreted as 'section headings'. I cannot use this for the 'awk'
#-- line because awk does not seem to accept the notation \{n,\}
#--sHeadingPattern='[ A-Z0-9.\/\\]*[A-Z]\{3,\}[ A-Z0-9.\/\\]*'
sHeadingPattern='[^ ]'
expand "$1" | \
mawk '/^[a-zA-Z0-9"][a-zA-Z0-9" ]*$/ {ii++; print ii $0; next} {print $0}' | \
sed "s/\([a-zA-Z]\{2,\}\)n[\"']t/\1nt/g" > "$1.temp"
if [ "$3" != "notoc" ]
then
(echo "
"; \
cat "$1.temp" | \
sed "/^\([0-9]\{1,\}\)\([a-zA-Z].*\)$/!d" | \
sed "s/^\([0-9]\{1,\}\)\([a-zA-Z].*\)$/ \1. \2<\/a>/g"; \
echo "
") > plain-text-toc.temp
fi
#-- Put the page heading before the table of contents
#--
cat "$1.temp" | \
sed "/^[ ]*=[ ]*[^=].*/!d" | \
sed -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
sed "s/^[ ]*=[ ]*\([^=].*\)/<center><h2>\1<\/h2><\/center>/gi"
#--Unfriendly to Lynx
#--echo "
#--"
#-- Insert the table of contents
if [ "$3" != "notoc" ]
then
cat plain-text-toc.temp
fi
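#-- A minimal sketch, not part of the original script: the three
#-- characters that are special in HTML text can be escaped with one
#-- 'sed' call. The ampersand has to be handled first, otherwise the
#-- '&' of an already produced '&lt;' would be escaped again. The
#-- function name 'escape_html' is hypothetical.
escape_html() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
#-- Possible use: escape_html < "$1.temp"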
#-- Transform the text to HTML, insert anchors
#-- Also delete the heading line which has already been inserted in the HTML
#-- But, the line will also delete lines beginning in == or === etc, which
#-- may not be desirable.
#-- The line below was designed to make the contents of quotes look different
#-- but I think that it made the text less readable
#--
#-- sed "s/\(['\"]\)[^'\"]\{1,\}\1/&<\/tt>/g" | \
#--
#-- I have disabled the line which turns * beginning lines into hyperlinks
#-- since this was not desirable for the netbeans documentation
cat $1.temp | \
expand | \
sed "/^[ ]*=[ ]*\([^=].*\)$/d" | \
sed -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
sed -e "s/^[ ]*\-\-\>\>/