Entry for August 26, 2007-supplemental

|
A.3. Beautiful Soup

Beautiful Soup is a Python parser for HTML and XML documents. It is designed to work with poorly written web pages. It is used in this book to create datasets from web sites that do not have APIs, and to find all the text on pages for indexing. The home page for this library is http://www.crummy.com/software/BeautifulSoup.

A.3.1. Installation on All Platforms

Beautiful Soup is available as a single-file source download. Near the bottom of the home page, there is a link to download BeautifulSoup.py. Simply download this and put it in either your working directory or your Python/Lib directory.

A.3.2. Simple Usage Example

This example parses the HTML of the Google home page, and shows how to extract elements from the DOM and search for links.

    >>> from BeautifulSoup import BeautifulSoup
    >>> from urllib import urlopen
    >>> soup=BeautifulSoup(urlopen('http://google.com'))
    >>> soup.head.title
    Google
    >>> links=soup('a')
    >>> len(links)
    21
    >>> links[0]
    iGoogle
    >>> links[0].contents[0]
    u'iGoogle'

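The session above targets Python 2 and the single-file BeautifulSoup.py release. As a rough sketch only, the same steps (read the title, collect the links, pull out the page text for indexing) might look like this on Python 3, assuming the later beautifulsoup4 package is installed and imported as bs4. The HTML string, its URLs, and the variable names are made up for illustration, and the page is parsed from a string rather than fetched, so the result does not depend on a live site.

```python
from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Made-up, deliberately sloppy HTML: the <p> and <li> tags are never closed.
HTML = """
<html><head><title>Example Page</title></head>
<body>
<p>First paragraph
<p>Second paragraph with a <a href="/about">link</a>
<ul>
<li><a href="http://example.com/one">One</a>
<li><a href="http://example.com/two">Two</a>
</ul>
</body></html>
"""

# "html.parser" selects Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(HTML, "html.parser")

# Navigate the tree by tag name, as with soup.head.title in the session above.
title = soup.head.title.string

# find_all('a') plays the role of soup('a'): it returns every <a> tag.
links = soup.find_all("a")
hrefs = [a.get("href") for a in links]

# The first child of the first link is its text node.
first_text = links[0].contents[0]

# All visible text on the page, whitespace-trimmed, e.g. for indexing.
page_text = soup.get_text(" ", strip=True)

print(title)
print(len(links), hrefs)
print(first_text)
print(page_text)
```

Parsing a fixed string keeps the example repeatable; to work on a live page, the string could be replaced with the bytes returned by urllib.request.urlopen.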
A more extensive set of examples is available at http://www.crummy.com/software/BeautifulSoup/documentation |