# Description:
# A script to reformat a plain text file document which contains a
# personal resume or 'curriculum vitae' into HTML.
# The script recognises some special structures within the plain text
# document. For example:
#
# The '=' character, when it is the first non-whitespace character on a
# line, indicates that all the following text on the line should be
# formatted as a 'document heading' or 'page title'.
#
# The '*' character indicates that the following whitespace-delimited
# text should be formatted as an HTML hyperlink, with the text content of
# the hyperlink being the URL itself. There are quite a few other special
# structures which get their own particular HTML formatting.
#
# This script also accepts the format:
#   [Beginning Of Line][spaces]*[spaces]The Document Title|Url-Or-Path/to/Html/File|Url-Or-Path/To/Text/File|
#
# This script also accepts the format (all on one line):
#   [Beginning Of Line][spaces]*[spaces]The Document Title|Url-Or-Path/to/Html/File|Url-Or-Path/To/Text/File|
#     |Url-Or-Path/To/Pdf/File|
#
# This script also accepts the format:
#   [Beginning Of Line][spaces]*[spaces]The Document Title|Url-Or-Path/to/Base/FileName||||
# An example of this format would be
#   * An Interesting Analysis|/alexis-info/docs/the-ramble||||
# This example assumes that there are files
#   /alexis-info/docs/the-ramble.html
#   /alexis-info/docs/the-ramble.txt
#   /alexis-info/docs/the-ramble.pdf
#
# This format is useful when all the different 'versions' (that is, document
# formats) have the same base name and directory location, but each has the
# appropriate file name extension for its document type. The script will
# automatically generate links to each of these document formats in the
# order: html, text, pdf.
#
# The script also accepts the format (all on one line):
#   [Beginning Of Line][spaces]*
#   [spaces]The Document Title|Url-Or-Path/to/Base/FileName|extension|extension|extension|
# where 'extension' is any file name extension.
# An example of this format would be
#   * An Interesting Analysis|/alexis-info/docs/the-ramble|txt|html|doc|
# This example assumes that there are files
#   /alexis-info/docs/the-ramble.html
#   /alexis-info/docs/the-ramble.txt
#   /alexis-info/docs/the-ramble.doc
#
# For the sake of the 'readability' of the text file, this format is preferred
# to the previous one. Both of these formats can also be used with two file
# name extensions instead of three.
#
# This script also accepts the format:
#   [Beginning Of Line][spaces]*[spaces]The Document Title|Url-Or-Path/to/Base/FileName|||
# This produces the same results as the '||||' format above, except that no
# link to an Adobe 'pdf' file is created.
#
# The script also accepts the format (all on one line):
#   [Beginning Of Line][spaces]*
#   [spaces]The Document Title|Url-Or-Path/to/Base/FileName|extension|extension|
# where 'extension' is any file name extension.
# An example of this format would be
#   * An Interesting Analysis|/alexis-info/docs/the-ramble|txt|doc|
# This example assumes that there are files
#   /alexis-info/docs/the-ramble.txt
#   /alexis-info/docs/the-ramble.doc
#
# This script also accepts the format (all on one line):
#   [Beginning Of Line][spaces]*[spaces]
#   The Document/Link Title|Url-Or-Path/to/File|
#
# The script also accepts the format:
#   [Beginning Of Line][spaces]http://blah
#
# The script will also format blocks of text between the strings -->> and --<<
# (where they are the first string on the line) as an HTML block.
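#-- Sketch (hypothetical helper; this is not the code the script itself uses):
#-- one way to expand a "* Title|/base/name|ext|ext|...|" line into one HTML
#-- link per extension, in the order the extensions are listed. The "Title: "
#-- prefix and the " | " separator are assumptions about the output layout,
#-- and the "||||" and full-URL forms described above are not handled.
links_from_extensions() {
    local line title base ext sep=""
    #-- Strip the leading spaces, the '*' marker and the spaces after it.
    line=$(printf '%s\n' "$1" | sed 's/^[ ]*\*[ ]*//')
    #-- First field is the title, second is the base path without extension.
    title=${line%%|*}
    line=${line#*|}
    base=${line%%|*}
    line=${line#*|}
    printf '%s: ' "$title"
    #-- Every remaining '|'-separated field is a file name extension.
    while [ -n "$line" ]; do
        case $line in
            *'|'*) ext=${line%%|*}; line=${line#*|} ;;
            *)     ext=$line; line="" ;;
        esac
        #-- Empty fields produce no link.
        [ -n "$ext" ] || continue
        printf '%s<a href="%s.%s">%s</a>' "$sep" "$base" "$ext" "$ext"
        sep=' | '
    done
    printf '\n'
}
#-- Usage: links_from_extensions '* An Interesting Analysis|/alexis-info/docs/the-ramble|txt|html|doc|'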
#
# This filter script also ignores lines starting with a '#' character; that
# is, those lines will not be rendered into HTML.
#
# Please see the file /var/www/alexis-info/docs/resources.txt for an
# example of a file which utilizes some of the formats described above.
#
# Example:
# ./resume2html.sh aRave.txt > aRave.html
#
# Parameters:
# textFileName
# The name of the text file which is to be transformed from text into html
# [notran]
# If the second parameter is the string 'notran' then the JavaScript links
# to the Google automatic language translation engine will NOT be inserted
# into the HTML page. This is useful, for example, when the HTML page is
# going to be located within a 'password-protected' directory, because
# the Google translation engine will not be able to access the page, and
# therefore the translation links will not work.
# [notoc]
# [forum-processor-url]
#
# Notes:
# The idea of this script is to allow the text file to be as free of 'mark-up'
# as possible. This keeps the text file simple to maintain, although the
# precision and utility of a system such as XML are not available.
# It should be possible to modify this script to produce XML instead of HTML.
#
# This script has been successfully run with bash on Debian Linux and on
# Red Hat Linux.
# It may also run under a bash shell on Microsoft Windows, such as the
# Cygwin bash shell.
#
# There is a GPL Perl program called text2html which performs a similar task
# to this script.
#
# The HTML produced by this script is NOT friendly to Lynx, the text-mode
# browser, because it uses an HTML table to create a 'left margin' for the
# document. A style sheet should be used instead.
#
# The code which used 'mawk', 'awk' or 'gawk' to number the lines that
# matched a regular expression has been removed and replaced with code
# which uses the 'nl' program. For some reason 'nl' places empty lines in
# between every line in the file when it uses a regular expression to number
# lines. These 'empty' lines actually contain a series of spaces and nothing
# else.
#
# For this reason, some extra 'sed' lines are necessary in order to get rid of
# these unwanted blank lines.
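#-- Sketch of an alternative (hypothetical helper, not used by this script):
#-- number only the heading lines with a plain awk counter, which avoids nl and
#-- therefore the blank-line clean-up. The repetition "three or more capital
#-- letters" is spelled out as [A-Z][A-Z][A-Z]+ so that awks without interval
#-- expressions such as {3,} still accept it; otherwise the pattern mirrors
#-- sHeadingPattern. Unlike the old mawk line, a space follows each number.
number_headings() {
    #-- LC_ALL=C makes A-Z mean the ASCII capitals only.
    LC_ALL=C awk '
        /^[ A-Z0-9.\/\\]*[A-Z][A-Z][A-Z]+[ A-Z0-9.\/\\]*$/ { n++; print n " " $0; next }
        { print }
    '
}
#-- Usage: expand "$1" | number_headings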
#
# This script should probably also use the sed script 'iso2html.sed', which
# is available from http://sed.sourceforge.net/. That script turns 'special'
# characters into HTML entities, so that they are rendered properly by a Web
# browser.
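#-- Sketch only (hypothetical helper, far smaller than iso2html.sed): escape
#-- just the three characters that are always special in HTML text, leaving
#-- accented and other characters untouched. '&' is replaced first; otherwise
#-- the '&' inside '&lt;' and '&gt;' would be escaped a second time. In a sed
#-- replacement, '\&' stands for a literal ampersand.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
#-- Usage: expand "$1" | html_escape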
#
# Dependencies:
# A bash shell, a good modern sed (say GNU sed), and the sed script
# 'iso2html.sed'.
# A CSS style sheet at the relative location /stylesheets/swish-style.css,
# which contains the various styles used in the resulting HTML.
#
# The style sheet is only really a 'dependency' for Netscape Navigator 4.61
# and possibly other older browsers, which are unable to display anything if
# a referenced style sheet is not present.
#
# See Also:
# txtdoc2html.sh, diary2html.sh, plaintext2html.sh
# plaintext2pdf.sh, plaintext2html-forum.sh, linkdoc2html.sh
# linkdoc2html-index.sh, lindoc2html-forum.sh
# Author:
# m.j.bishop
if [ "$1" = "" ]
then
echo "usage: $0 textFileName [notran] [notoc] [forum-processor-url]"
sed -n "/^[ ]*#[^\-]/p" "$0"
exit 1;
fi
#-- The section below creates the table of contents for the linkdoc.
#-- This line is designed to number only the lines which match a pattern.
#-- In theory 'nl -bpPATTERN' should also do this, but it insisted on
#-- 'double-spacing' the output.
#-- The expressions below also remove the apostrophe from contractions such
#-- as "can't" and "won't" (turning them into "cant" and "wont"), because I
#-- want to apply some formatting to the content of quotes, and these
#-- apostrophes would get in the way.
#-- This is the pattern which determines what sort of lines will be
#-- interpreted as 'section headings'. I cannot use this pattern for the
#-- 'awk' line because awk does not seem to accept the notation \{n,\}.
sHeadingPattern='[ A-Z0-9.\/\\]*[A-Z]\{3,\}[ A-Z0-9.\/\\]*'
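#-- Sketch (hypothetical helper, not called by this script): test whether a
#-- single line would count as a section heading by anchoring the pattern at
#-- both ends and matching it as a basic regular expression with grep, which
#-- does understand \{3,\}. The pattern defaults to $sHeadingPattern but can
#-- be passed as the second argument.
is_heading() {
    local pattern="${2:-$sHeadingPattern}"
    #-- LC_ALL=C makes A-Z mean the ASCII capitals only.
    printf '%s\n' "$1" | LC_ALL=C grep -q "^${pattern}\$"
}
#-- Usage: is_heading "WORK EXPERIENCE" && echo "heading"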
#-- I have disabled the code below because in a CGI environment this script
#-- doesn't seem to have permission to create a file.
#-- This is a real gotcha. If the file $1.temp already exists and is not
#-- writable by 'other', then the 'add-comment' script falls over because it
#-- can't successfully call this script. This problem only arises in a CGI
#-- environment where the Web server does not have root permissions. If the
#-- $1.temp file cannot be created then this script won't work. One solution
#-- is to manually give write permission to 'other'.
#-- This script (and the 'add-comment' script) will succeed the FIRST time in
#-- a CGI environment if the $1.temp file does not exist at all, because if
#-- the file does not exist then the Web server has sufficient permissions to
#-- create it. HOWEVER, the second time and afterwards this script and the
#-- 'add-comment' script will FAIL, because when the Web server creates the
#-- $1.temp file the first time it creates it without write permission for
#-- 'other'. That is to say, the Web server essentially is able to create a
#-- file which it is not allowed to subsequently modify (nor re-create).
#-- Actually this whole second part may not be true: the Web server creates
#-- the file as 'mbishop' and probably can't write to it.
#--
#-- There are, no doubt, various solutions to this problem, including giving
#-- the Web server sufficient permissions to recreate the file, etc. However,
#-- the simplest solution is just to not use $1.temp. It is/was only used in
#-- three places. Removing it may or may not slow the script down; I don't know.
# cat $1 | expand | \
# mawk '/^[ A-Z0-9.\/\\]*[A-Z]+[ A-Z0-9.\/\\]*$/{ii++; print ii $0}!/^[ A-Z0-9.\/\\]*[A-Z]+[ A-Z0-9.\/\\]*$/' | \
# sed "s/\([a-zA-Z]\{2,\}\)n[\"']t/\1nt/g" > $1.temp
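#-- Sketch of a different workaround (an assumption, not what this script
#-- does): instead of a fixed $1.temp next to the input file, create a fresh,
#-- uniquely named scratch file with mktemp under $TMPDIR (default /tmp).
#-- The file belongs to whichever user runs the script (for example the Web
#-- server), so permissions left over from an earlier run cannot get in the
#-- way. The helper name is hypothetical; a caller would typically pair it
#-- with  trap 'rm -f "$scratch"' EXIT  so the file is removed afterwards.
make_scratch_file() {
    #-- mktemp creates the file readable and writable by its owner only,
    #-- and prints its name on standard output.
    mktemp "${TMPDIR:-/tmp}/resume2html.XXXXXX"
}
#-- Usage: scratch=$(make_scratch_file) || exit 1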
echo ""
echo ""
echo " "
echo " "
echo " "
echo " "
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
echo ""
#-- The Google automatic translation links below are sometimes disabled,
#-- because they will not work from within a password-protected directory,
#-- since Google does not have permission to view that directory.
if [ "$2" != "notran" ]
then
echo ""
echo "See this page in (approximate):"
echo "Español|"
echo "Français|"
echo "Italiano|"
echo "Deutsch|"
echo "Português"
echo " "
fi
#---- The file below contains a colorized table of the links
#---- cat /var/www/utils/translator-bar.html
#-- Put the page heading before the table of contents
#--
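#-- Keep only the '=' heading lines, remove apostrophes from contractions,
#-- escape the HTML special characters '&', '<' and '>', and wrap each
#-- heading in centred <h2> tags.
expand "$1" | \
sed "/^[ ]*=[ ]*[^=].*/!d" | \
sed "s/\([a-zA-Z]\{2,\}\)n[\"']t/\1nt/g" | \
sed -e "s/&/\&amp;/g" -e "s/</\&lt;/g" -e "s/>/\&gt;/g" | \
sed "s/^[ ]*=[ ]*\([^=].*\)/<center><h2>\1<\/h2><\/center>/gi"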
#- The line below is not 'Lynx friendly'; a style sheet
#- should be used instead.
echo "
BACK TO THE TABLE OF CONTENTS
"
if [ "$2" != "notran" ]
then
echo ""
echo "See this page in (approximate):"
echo "Español|"
echo "Français|"
echo "Italiano|"
echo "Deutsch|"
echo "Português"
echo " "
fi
echo ""
echo ""