Introduction to XML
2. What is XML?
  


Processing HTML


To wrap up this discussion of the sample HTML document, consider the task of extracting the postal code from this address. Here's an (intentionally brittle) algorithm for finding the postal code in HTML markup:

If you find a paragraph with two <br> tags, the postal code is the second word after the first comma in the text that follows the second <br> tag.

Although this algorithm works with this example, there are many perfectly valid addresses worldwide for which it simply wouldn't work. Even if you could write an algorithm that found the postal code for any address written in HTML, there are any number of paragraphs with two break tags that don't contain addresses at all. Writing an algorithm that looks at any HTML paragraph and finds any postal codes inside it would be extremely difficult, if not impossible.
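
To make the brittleness concrete, here is a minimal Python sketch of the rule described above. The function name `find_postal_code` and both sample paragraphs are assumptions made for illustration; the address is only a stand-in for the sample document discussed earlier. The sketch returns a plausible postal code for the address, and a meaningless word for a paragraph that merely happens to contain two <br> tags.

```python
import re

def find_postal_code(paragraph_html):
    """Apply the brittle rule; return the guessed postal code or None."""
    # Split the paragraph on <br> tags (case-insensitive, also matches <br/>).
    parts = re.split(r"<br\s*/?>", paragraph_html, flags=re.IGNORECASE)
    if len(parts) != 3:
        # The rule only applies to paragraphs with exactly two <br> tags.
        return None
    # Text that follows the second <br>, with any remaining tags stripped.
    last_line = re.sub(r"<[^>]+>", "", parts[2])
    if "," not in last_line:
        return None
    # "The second word after the first comma."
    words_after_comma = last_line.split(",", 1)[1].split()
    return words_after_comma[1] if len(words_after_comma) >= 2 else None

# Hypothetical address paragraph (an assumption, used only for illustration).
sample = "<p><b>Mrs. Mary McGoon</b><br>1401 Main Street<br>Anytown, NC 34829</p>"
print(find_postal_code(sample))          # 34829

# A paragraph with two <br> tags that is not an address at all.
not_an_address = "<p>Roses are red,<br>violets are blue,<br>Sugar is sweet, and so are you.</p>"
print(find_postal_code(not_an_address))  # so
```

The second call shows the problem: the markup says nothing about what the text means, so the rule cannot tell an address from a poem.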

