# Description:
#
#   The following command lines were used to reformat the Netbeans
#   (www.netbeans.org) standard documentation for the xml module into
#   a plain text version, with all the html files concatenated in the
#   same order that they are referred to in the table of contents. In
#   addition, the 'see also' section of each html page has been
#   removed. These command lines were run on an ms-windows 2000 computer
#   using the 'unxutils' unix shell and tools, which are hosted on
#   SourceForge.
#
#   The script also uses a Ms-Windows 'lynx' port, which is located at
#   http://www.jim.spath.com/lynx_win32/ Lynx (a text only browser)
#   is required in order to convert html into plain text with its
#   '-dump' command line option. The unix program 'html2text' will
#   also do this, but I have not been able to find a Ms-Windows port
#   of this program. The dos program 'htmStrip' also does this, but
#   does not seem to have good support for 'batch' processing. For
#   this version of lynx on ms-windows you will have to change your
#   'path' environment variable and add an environment variable called
#   'lynx_cfg' which points to the lynx.cfg file.
#
#   The result of these transformations is a 235 page (A4) plain text
#   manual using an 8 point font in Wordpad, or a 298 page manual with
#   a 10 point font.
#
# Notes:
#   This script should also write 'section, or subject' headings when it
#   concatenates the various index files. This would allow these 'subject'
#   headings to form part of the table of contents for an output format
#   such as Adobe pdf. Otherwise, the table of contents for the users guide
#   is too long (approximately 400 items).
#
#   This script was initially written and run on an MS Windows laptop using
#   the Cygwin shell, but I think it should run on unix too.
#
# See Also:
#   netbeans-guide2html.sh, netbeans-guide2pdf.sh
#   plaintext2html.sh, plaintex2html.sh
# Url:
#   http://www.ella-associates.org/alexis-info/utils/
#
# Author: m.j.bishop

#-- Some pseudo code, which depends on which module's documentation
#-- you would like to reformat.
#-- The jar documentation files are stored in
#--   [Netbeans Installation Dir]\modules\docs
# jar xf [module-name]
# cd org\netbeans\modules\xml\core\docs
# cd org\netbeans\modules\usersguide

#-- For the Tomcat documentation the html files are extracted
#-- to the following directory:
#--   [Nb Install Dir]\modules\docs\org\netbeans\modules\tomcat\tomcat40\docs\tomcat4
#-- For the Tomcat docs, the xml files are called
#-- 'tomcat-toc.xml' and 'tomcatMap.jhm'.
#-- For the users guide the map file is 'Map.jhm';
#-- there does not appear to be a complete table of contents file.
#--
cat ide-toc.xml | grep "target=" | sed -e 's/.*target="//g' -e 's/".*$//g' > tocmap.txt

#-- This uses the xml map file to find the corresponding html files
#-- in the 'html' directory.
#-- Don't get rid of the leading directory name (eg 'html') below.
#--
#-- The line below looks for double quotes (") in the sed part of
#-- the command line. The ms-windows command shell does not seem to
#-- be able to do this, since it doesn't recognise the single quote (')
#-- as a string delimiter.
#-- Probably need to get rid of references to 'pending.html', which
#-- indicates that no documentation is available. Also, a number of files
#-- occur twice in the output. 'uniq' won't solve this because it only
#-- removes adjacent duplicates.
#-- This command takes approximately 20 seconds to complete
#-- on my win2000 laptop.
#-- (The end of this pipeline was truncated; the final sed is assumed
#-- to delete blank lines before writing newtoc.txt.)
for f in $(cat tocmap.txt); do grep "target=\"$f\"" Map.jhm; done | \
  sed 's/.*url="\([^"]*\)".*/\1/g' | \
  expand | sed "/^[ ]*$/d" > newtoc.txt

#cd html
#-- Don't do a 'cd' but use the directory reference in the
#-- '-map.xml' file.
#-- Otherwise, for the 'usersguide' documentation, you
#-- would have to 'cd' into a large number of directories.
#--
#-- or cd tomcat4
#-- The command below took about 50 seconds on my win2000 laptop.
#-- The output is saved to 'all.txt', which the clean-up step below reads
#-- (pipe to 'less' instead to preview it).
for f in $(cat newtoc.txt); do lynx -dump -nolist "$f"; done > all.txt

#-- Microsoft Windows 2000 contains a program called 'expand' which
#-- interferes with the unix utility. To run this program on MS Windows you may
#-- have to rename the ms 'expand' program.
# E:\Program Files\NetBeans IDE 3.4\modules\docs\org\netbeans\modules\xml\core\docs\html>

#-- Maybe I should leave the see also section in??
cat all.txt | expand | \
  sed -e "/\[splash\]/d" -e "/Legal Notices/d" | \
  sed "/^[ ]*See also[ ]*$/,/^[ ]*[\-_]*[ ]*$/d" | \
  sed "s/^[ ]*$//g" | tr -s "\n" > all-clean.txt

# In order to convert this text output to HTML or PDF, see the
# scripts 'netbeans-guide2html.sh' and 'netbeans-guide2pdf.sh'.
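
#-- A possible way to handle the duplicate and 'pending.html' entries
#-- noted above. This is only a sketch and was not part of the original
#-- command lines; 'dedupe_toc' is a made-up name. It reads file names on
#-- stdin, drops any line mentioning 'pending.html', and prints each
#-- remaining name once, in first-seen order. 'uniq' alone cannot do
#-- this, and 'sort -u' would reorder the table of contents.
dedupe_toc() {
  awk '!/pending\.html/ && !seen[$0]++'
}
#-- Assumed usage, before the lynx loop:
#--   dedupe_toc < newtoc.txt > newtoc-unique.txt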