Python & XML

Book description

If you are a Python programmer who wants to incorporate XML into your skill set, this is the book for you. Python has attracted a wide variety of developers, who use it either as glue to connect critical programming tasks together, or as a complete cross-platform application development language. Yet, because it is object-oriented and has powerful text manipulation abilities, Python is an ideal language for manipulating XML.

Python & XML gives you a solid foundation for using these two languages together. Loaded with practical examples, this new volume highlights common application tasks, so that you can learn by doing. The book starts with the basics, then quickly progresses to complex topics, like transforming XML with XSLT, querying XML with XPath, and working with XML dialects and validation. It also explores the more advanced issues: using Python with SOAP and distributed web services, and using Python to create scalable streams between distributed applications (like databases and web servers).

The book provides effective practical applications, while referencing many of the tools involved in XML processing and Python, and highlights cross-platform issues along with tasks relevant to enterprise computing. You will find ample coverage of XML flow analysis and details on ways in which you can transport XML through your network.

Whether you are using Python as an application language, or as an administrative or middleware scripting language, you are sure to benefit from this book. If you want to use Python to manipulate XML, this is your guide.

Table of contents

  1. Python & XML
  2. Dedication
  3. A Note Regarding Supplemental Files
  4. Preface
    1. Audience
    2. Organization
    3. Conventions Used in This Book
    4. Using Code Examples
    5. How to Contact Us
    6. Acknowledgments
  5. 1. Python and XML
    1. Key Advantages of XML
      1. Application Neutrality
      2. Hierarchical Structure
      3. Platform Neutrality
      4. International Language Support
    2. The XML Specifications
      1. XML 1.0 Recommendation
      2. Namespaces in XML
      3. XML as a Foundation
    3. The Power of Python and XML
      1. Python Tools for XML
      2. The SAX and DOM APIs
      3. More Ways to Extract Information
    4. What Can We Do with It?
  6. 2. XML Fundamentals
    1. XML Structure in a Nutshell
    2. Document Types and Schemas
      1. Document Type Definitions
      2. Alternate Schema Languages
        1. XML Schema
        2. TREX
        3. RELAX-NG
        4. Schematron
    3. Types of Conformance
    4. Physical Structures
    5. Constructing XML Documents
      1. Characters in XML Documents
        1. The ASCII character set
        2. The ISO-8859-1 character set
        3. UTF-8 Encoding
      2. Text, Character Data, and Markup
        1. Names
      3. Whitespace in Character Data
      4. End-of-Line Handling
      5. Language Identification
      6. The Document Prolog
      7. Start, End, and Empty Element Tags
        1. Quotes around attribute values
      8. Comments
      9. Processing Instructions
      10. CDATA Sections
    6. Document Type Definitions
      1. Entity Declarations
      2. Element Type Declarations
        1. Content models
      3. Attribute Declarations
        1. Attribute data types
        2. Attribute values and constraints
    7. Canonical XML
      1. The Canonical XML Data Model
      2. Document Order
      3. Canonical XML Structure
    8. Going Beyond the XML Specification
      1. XML Namespaces
      2. Extracting Information Using XPath
      3. Using XLink to Link XML Documents
      4. Communicating with XML Protocols
      5. Replacing HTML with XHTML
      6. Transforming XML with XSLT
  7. 3. The Simple API for XML
    1. The Birth of SAX
    2. Understanding SAX
      1. Using SAX in an Application
      2. SAX Handler Objects
        1. ContentHandler
        2. ErrorHandler
        3. DTDHandler
        4. EntityResolver
        5. Other handler objects
      3. SAX Reader Objects
    3. Reading an Article
      1. Writing a Simple Handler
      2. Creating the Main Program
      3. Adding Intelligence
      4. Using the Additional Information
    4. Searching File Information
      1. Creating the Index Generator
        1. Creating the IndexFile class
        2. Running index.py
      2. Searching the Index
    5. Building an Image Index
      1. Creating Thumbnail Images
        1. Creating thumbnails on Windows
      2. Implementing the SAXThumbs Handler
      3. Viewing Your Thumbnails
    6. Converting XML to HTML
      1. The Generated Document
      2. The Conversion Handler
      3. Driving the Conversion Handler
    7. Advanced Parser Factory Usage
    8. Native Parser Interfaces
      1. Using PyExpat Directly
  8. 4. The Document Object Model
    1. The DOM Specifications
      1. Levels of the Specification
      2. Feature Specifications
    2. Understanding the DOM
    3. Python DOM Offerings
      1. Streamlining with Minidom
      2. Using Pulldom
      3. 4DOM: A Full Implementation
    4. Retrieving Information
      1. Getting a Document Object
        1. Loading a document using 4DOM
        2. Loading a document using minidom
      2. Determining a Node’s Type
      3. Getting a Node’s Children
      4. Getting a Node’s Siblings
      5. Extracting Elements by Name
      6. Examining NodeList Members
      7. Looking at Attributes
    5. Changing Documents
      1. Creating New Nodes
      2. Adding and Moving Nodes
      3. Removing Nodes
      4. Changing a Document’s Structure
    6. Building a Web Application
      1. Preparing the Web Server
        1. Ensuring the script’s execution
        2. Enabling write permission
      2. The Web Application Structure
        1. The Article class
        2. The Storage class
      3. Implementing Site Logic
        1. The ArticleManager class
      4. Controlling the Application
    7. Going Beyond SAX and DOM
  9. 5. Querying XML with XPath
    1. XPath at a Glance
    2. Where Is XPath Used?
    3. Location Paths
      1. An Example Document
      2. A Path Hosting Script
      3. Getting Character Data
      4. Specifying an Index
      5. Testing Descendent Nodes
      6. Testing Attributes
      7. Selecting Elements
      8. Additional Operators
    4. XPath Arithmetic Operators
    5. XPath Functions
      1. Working with Numbers
      2. Working with Strings
      3. Working with Nodes
    6. Compiling XPath Expressions
  10. 6. Transforming XML with XSLT
    1. The XSLT Specification
    2. XSLT Processors
    3. Defining Stylesheets
      1. Simplified Stylesheets
      2. Standalone Stylesheets
      3. Embedded Stylesheets
    4. Using XSLT from the Command Line
    5. XSLT Elements
      1. The Stylesheet Element
      2. Creating a Template Element
      3. Applying Templates
      4. Getting the Value of a Node
      5. Iterating over Elements
    6. A More Complex Example
      1. File Template
      2. Class Template
      3. Method Template
    7. Embedding XSLT Transformations in Python
      1. Creating the Source XML
      2. Creating a Simple Stylesheet
      3. Creating a Stylesheet with Edit Functions
      4. Creating the CGI Script
      5. Selecting a Mode
    8. Choosing a Technique
  11. 7. XML Validation and Dialects
    1. Working with DTDs
      1. Validating with the Internal DTD Subset
      2. Validating with an External DTD Subset
    2. Validation at Runtime
    3. The BillSummary Example
      1. The Flat File
      2. The Web Form
      3. Starting the CGI
      4. Conversion and Validation
        1. Converting text to XML
        2. Validating the XML
        3. Creating a validation handler
      5. Completing the CGI
        1. Defining success and error functions
        2. Converting the flat file to XML
        3. Validating the converted XML
        4. Displaying the XML
      6. Running the Application in a Browser
    4. Dialects, Frameworks, and Workflow
    5. What Does ebXML Offer?
      1. ebXML Document Structure
      2. Business Process and Modeling
      3. Phases of ebXML
  12. 8. Python Internet APIs
    1. Connecting Web Sites
      1. Continuing Improvement
      2. Python to the Rescue
    2. Working with URLs
      1. Encoding URLs
      2. Quoting URLs
      3. Unquoting URLs
    3. Opening URLs
      1. Using FTP
      2. Retrieving URLs
    4. Connecting with HTTP
      1. HTTP Conversations
      2. Request Types
      3. Getting a Document with Python
      4. Building a Query String with httplib
      5. Baking Cookies for the Server
      6. Performing a POST Operation
        1. Creating a POST catcher
        2. Ensuring proper URL encoding
        3. Performing a POST with httplib
        4. Illustrating a complete POST operation
    5. Using the Server Classes
      1. BaseHTTPServer Module Classes
      2. Server Core Concepts
        1. Instantiating a server class
        2. Serving a GET
        3. Serving a POST
      3. Building a Complete Server
        1. Running a GET request
        2. Running a POST request
  13. 9. Python, Web Services, and SOAP
    1. Python Web Services Support
    2. The Emerging SOAP Standard
      1. SOAP Messages
      2. Exchanging SOAP Messages
      3. Encoding SOAP Messages
      4. Constructing SOAP Envelopes
        1. SOAP packet requirements
        2. SOAP encoding style
      5. Using SOAP Headers
      6. SOAP Body Elements
      7. Error Message and SOAP Fault
        1. Fault element
        2. Fault codes
      8. SOAP Encoding Techniques
      9. SOAP Encoding Rules
      10. Simple Types
      11. Compound Types
      12. SOAP over HTTP
        1. The SOAPAction header
        2. SOAP HTTP responses
      13. SOAP for RPC
    3. Python SOAP Options
      1. Working with SOAPy
      2. Working with MSSOAP
      3. MSSOAP Serialization Basics
        1. Adding URIs and namespaces
        2. Creating the SOAP envelope
        3. Making the call
    4. Example SOAP Server and Client
      1. Requirements for Using MSSOAP
        1. Getting Microsoft SOAP Toolkit 2.0
        2. Making the samples web-visible
        3. Getting Python COM support
        4. Fixing MSSOAP with makepy.py
      2. Server Setup
      3. A Python SOAP Client
        1. Defining reusable basics
    5. What About XML-RPC?
  14. 10. Python and Distributed Systems Design
    1. Sample Application and Flow Analysis
      1. Decoupling Application Systems
      2. Routing Adds Flexibility
      3. Routing Adds Scalability
    2. Understanding the Scope
    3. Building the Database
      1. Creating a Profiles Database
      2. Creating a Customer Table
      3. Populating the Database
    4. Building the Profiles Access Class
      1. The Interfaces
      2. Getting Profiles
        1. Connecting with the database
        2. Building the XML document
        3. Returning a DOM instead of a string
      3. Inserting and Deleting Profiles
        1. Inserting a profile
        2. Deleting a profile
      4. Updating Profiles
      5. The Complete CustomerProfile Class
    5. Creating an XML Data Store
      1. A Large XML File
      2. Creating an XML Access Object
        1. The interfaces
        2. Using the XMLOffer class
        3. Creating the XMLOffer class
          1. Retrieval methods
          2. Modification methods
    6. The XML Switch
      1. XML Architecture
      2. Core XML Switch Classes
      3. The XMLMessage Class
        1. XMLMessage format
        2. XMLMessage class
        3. XML message code architecture
        4. XMLMessage code listing
      4. The XML Switch Service
      5. The XML Switch Client
        1. Using postMsg.html to send back XML
        2. Using the XSC client
        3. Using the XSC API
      6. The XMLSwitchHandler Server Class
        1. XMLSwitchHandler code architecture
        2. XMLSwitchHandler listing
    7. Running the XML Switch
    8. A Web Application
      1. Connecting to a Web Service
      2. The Components
      3. The Topology
      4. The Code Architecture
      5. The CGI Functionality
        1. Extracting profile information
        2. Updating profile information
        3. Displaying all offers
      6. The Complete sp.py Listing
      7. Running the Site as a User
  15. A. Installing Python and XML Tools
    1. Installing Python
      1. Windows
      2. Linux and Unix
    2. Installing PyXML
    3. Installing 4Suite
  16. B. XML Definitions
    1. XML Definitions
  17. C. Python SAX API
    1. Convenience Functions
    2. XMLReader
    3. ContentHandler
    4. DTDHandler
    5. EntityResolver
    6. InputSource
    7. ErrorHandler
    8. DeclHandler
    9. LexicalHandler
    10. Locator
    11. SAX Exceptions
  18. D. Python DOM API
    1. DOMException
      1. DOMException
    2. DOMImplementation
      1. DOMImplementation
    3. DocumentFragment
      1. DocumentFragment
    4. Document
      1. Document
    5. Node
      1. Node
    6. NodeList
      1. NodeList
    7. NamedNodeMap
      1. NamedNodeMap
    8. CharacterData
      1. CharacterData
    9. Attr
      1. Attr
    10. Element
      1. Element
    11. Text
      1. Text
    12. Comment
      1. Comment
    13. CDATASection
      1. CDATASection
    14. DocumentType
      1. DocumentType
    15. Notation
      1. Notation
    16. Entity
      1. Entity
    17. EntityReference
      1. EntityReference
    18. ProcessingInstruction
      1. ProcessingInstruction
    19. 4DOM Extensions
  19. E. Working with MSXML3.0
    1. Setting Up MSXML3.0
    2. Basic DOM Operations
      1. MSXML Nodes
      2. Using a NodeList
    3. MSXML3.0 Support for XSLT
      1. Source XML
      2. XSL Stylesheet
      3. Running an MSXML Transformation
    4. Handling Parsing Errors
    5. MSXML3.0 Reference
      1. MSXML3.0 Document Object
      2. MSXML3.0 Node Object
      3. MSXML3.0 NamedNodeMap Object
      4. MSXML3.0 NodeList Object
      5. MSXML3.0 ParseError Object
  20. F. Additional Python XML Tools
    1. Pyxie
    2. Python XML Tools
    3. XML Schema Validator
    4. Sab-pyth
    5. Redfoot
    6. XML Components for Zope
      1. Parsed XML
      2. Page Templates
    7. Online Resources
  21. Index
  22. Colophon
  23. Copyright

Product information

  • Title: Python & XML
  • Author(s): Christopher A. Jones, Fred L. Drake
  • Release date: December 2001
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491948859