path: root/doc/README.xml-output
author Gilbert Ramirez <gram@alumni.rice.edu> 2003-12-06 06:09:13 +0000
committer Gilbert Ramirez <gram@alumni.rice.edu> 2003-12-06 06:09:13 +0000
commit 058ef64db8ce40909a18c91ab4805804362f80cb
tree 767a7824daa712556971559e29e563658d643d51 /doc/README.xml-output
parent 33b25ac15eac2e2cb4269377c41eada622c81fc1
Add the ability to print packet dissections in PDML (an XML-based format)
to tethereal. It could be added to Ethereal, but the GUI changes to allow
the user to select PDML as a print format have not been added.

Provide a python module (EtherealXML.py) to help parse PDML.

Provide a sample app (msnchat) which uses tethereal and EtherealXML.py
to reconstruct MSN Chat sessions from packet capture files. It produces
a nice HTML report of the chat sessions.

Document tethereal's PDML and EtherealXML.py usage in doc/README.xml-output

Update tethereal's manpage to reflect the new [-T pdml|ps|text] option

svn path=/trunk/; revision=9180
Diffstat (limited to 'doc/README.xml-output')
1 files changed, 206 insertions, 0 deletions
diff --git a/doc/README.xml-output b/doc/README.xml-output
new file mode 100644
index 0000000000..df3d77e920
--- /dev/null
+++ b/doc/README.xml-output
@@ -0,0 +1,206 @@
+Protocol Dissection in XML Format
+$Id: README.xml-output,v 1.1 2003/12/06 06:09:12 gram Exp $
+Copyright (c) 2003 by Gilbert Ramirez <gram@alumni.rice.edu>
+Tethereal has the ability to print its protocol dissection in an
+XML format, by using the "-Tpdml -V" options. Similar functionality
+could be put into the "Print" dialog of Ethereal, but that work has
+not been done yet.
+The XML that tethereal produces follows the Packet Details Markup
+Language (PDML) specified by the group at the Politecnico Di Torino
+working on Analyzer. The specification can be found at:
+A related XML format, the Packet Summary Markup Language (PSML), is
+also defined by the Analyzer group to provide packet summary information.
+The PSML format is not documented in a publicly-available HTML document,
+but its format is simple. Some day it may be added to tethereal so
+that "-Tpsml" (without "-V") would produce PSML.
+One wonders if the "-T" option should read "-Txml" instead of "-Tpdml"
+(and in the future, "-Tpsml"), but if tethereal was required to produce
+another XML-based format of its protocol dissection, then "-Txml" would
+be ambiguous.
+The PDML that tethereal produces is known not to be loadable into Analyzer.
+It causes Analyzer to crash. As such, the PDML that tethereal produces
+is labeled with a version number of "0", which means that the PDML does
+not fully follow the PDML spec. Furthermore, a "creator" attribute in the
+"<pdml>" tag gives the version number of [t]ethereal that produced the PDML.
+In that way, as the PDML produced by tethereal matures, but still does not
+meet the PDML spec, scripts can make intelligent decisions about how to
+best parse the PDML, based on the "creator" attribute.
+A PDML file is delimited by a "<pdml>" tag.
+A PDML file contains multiple packets, denoted by the "<packet>" tag.
+A packet will contain multiple protocols, denoted by the "<proto>" tag.
+A protocol might contain one or more fields, denoted by the "<field>" tag.
+A pseudo-protocol named "geninfo" is produced, as is required by the PDML
+spec, and printed as the first protocol after the opening "<packet>" tag.
+Its information comes from ethereal's "frame" protocol, which serves
+a similar purpose of storing packet meta-data. Both "geninfo" and
+"frame" protocols are provided in the PDML output.
+The "<pdml>" tag
+ <pdml version="0" creator="ethereal/0.9.17">
+The creator is the "ethereal" engine, version 0.9.17. The attribute always
+says "ethereal", never "tethereal".
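A script can inspect the root tag before deciding how to parse the rest of the file. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module rather than EtherealXML.py; the helper name parse_creator and the sample string are illustrative assumptions, not part of tethereal.

```python
import xml.etree.ElementTree as ET

# Sample root tag copied from the example above, closed so it parses alone.
SAMPLE = '<pdml version="0" creator="ethereal/0.9.17"></pdml>'

def parse_creator(pdml_text):
    """Return (pdml_version, engine, engine_version) from the <pdml> tag."""
    root = ET.fromstring(pdml_text)
    # The creator attribute has the form "engine/version".
    engine, _, engine_version = root.get("creator", "").partition("/")
    return root.get("version"), engine, engine_version

print(parse_creator(SAMPLE))
```

A script could compare the returned engine version against the releases it knows about and fall back to a more defensive parse for anything else.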
+The "<proto>" tag
+"<proto>" tags can have the following attributes:
+ name - the display filter name for the protocol
+ showname - the label used to describe this protocol in the protocol
+ tree. This is usually the descriptive name of the protocol,
+ but it can be modified by dissectors to include more data
+ (tcp can do this)
+ pos - the starting offset within the packet data where this
+ protocol starts
+ size - the number of octets in the packet data that this protocol
+ covers.
+The "<field>" tag
+"<field>" tags can have the following attributes:
+ name - the display filter name for the field
+ showname - the label used to describe this field in the protocol
+ tree. This is usually the descriptive name of the field,
+ followed by some representation of the value.
+ pos - the starting offset within the packet data where this
+ field starts
+ size - the number of octets in the packet data that this field
+ covers.
+ value - the actual packet data, in hex, that this field covers
+ show - the representation of the packet data ('value') as it would
+ appear in a display filter.
+Some dissectors place text into the protocol tree without using
+a field with a field-name. Those appear in PDML as "<field>" tags with no
+'name' attribute, but with a 'show' attribute giving that text.
+Many dissectors label the undissected payload of a protocol as belonging
+to a "data" protocol, and the "data" protocol usually resides inside
+the last protocol dissected. In the PDML, the "data" protocol becomes
+a "data" field, placed exactly where the "data" protocol is in tethereal's
+protocol tree. So, if tethereal would normally show:
++-- Frame
++-- Ethernet
++-- IP
++-- TCP
++-- HTTP
+ |
+ +-- Data
+In PDML, the "Data" protocol would become another field under HTTP:
+ <proto name="frame">
+ ...
+ </proto>
+ <proto name="eth">
+ ...
+ </proto>
+ <proto name="ip">
+ ...
+ </proto>
+ <proto name="tcp">
+ ...
+ </proto>
+ <proto name="http">
+ ...
+ <field name="data" value="........."/>
+ </proto>
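The attributes described above can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module, not EtherealXML.py, to look up named fields, decode the hex "value" attribute, and collect the text of unnamed fields. The sample document, its offsets and values, and the helper name find_fields are made up for illustration and are not real tethereal output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, shaped like the PDML described above (not real output).
SAMPLE = """<pdml version="0" creator="ethereal/0.9.17">
  <packet>
    <proto name="tcp" showname="Transmission Control Protocol" pos="34" size="20">
      <field name="tcp.srcport" showname="Source port: 80" pos="34" size="2" value="0050" show="80"/>
    </proto>
    <proto name="http" pos="54" size="4">
      <field show="Some text without a field name"/>
      <field name="data" value="48545450"/>
    </proto>
  </packet>
</pdml>"""

def find_fields(packet, name):
    """Return every <field> element in the packet with the given name."""
    return [f for f in packet.iter("field") if f.get("name") == name]

root = ET.fromstring(SAMPLE)
packet = root.find("packet")

# A named field: 'show' is the display form, 'value' is the raw hex.
port = find_fields(packet, "tcp.srcport")[0]

# The undissected payload appears as a "data" field under the last protocol.
payload = bytes.fromhex(find_fields(packet, "data")[0].get("value"))

# Text items have no 'name' attribute, only 'show'.
unnamed = [f.get("show") for f in packet.iter("field") if f.get("name") is None]

print(port.get("show"), int(port.get("value"), 16), payload, unnamed)
```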
+EtherealXML.py is a Python module which provides some infrastructure for
+Python developers who wish to parse PDML. It is designed to read
+a PDML file and call a user's callback function each time it has
+assembled a complete packet from that packet's protocols and fields.
+The Python user should import the module, define a callback function
+which accepts one argument, and call the parse_fh function:
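+import EtherealXML
+
+def my_callback(packet):
+    # Called once for each packet; do something with it here.
+    pass
+
+# xml_filename is the path to a PDML file written by tethereal.
+fh = open(xml_filename)
+EtherealXML.parse_fh(fh, my_callback)
+fh.close()
+# Now that the script has the packet data, do something.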
+The object that is passed to the callback function is an
+EtherealXML.Packet object, which corresponds to a single packet.
+EtherealXML provides three classes, each of which corresponds to a PDML tag:
+ Packet - "<packet>" tag
+ Protocol - "<proto>" tag
+ Field - "<field>" tag
+Each of these classes has accessors which will return the defined attributes:
+ get_name()
+ get_showname()
+ get_pos()
+ get_size()
+ get_value()
+ get_show()
+Protocols and fields can contain other fields. Thus, the Protocol and
+Field classes have a "children" member, which is a simple list of the
+Field objects, if any, that are contained. The "children" list can be
+directly accessed by calling users. It will be empty if this Protocol
+or Field contains no Fields.
+Furthermore, the Packet class is a sub-class of the PacketList class.
+The PacketList class provides methods to look for protocols and fields.
+The term "item" is used when the item being looked for can be
+a protocol or a field:
+ item_exists(name) - checks if an item exists in the PacketList
+ get_items(name) - returns a PacketList of all matching items
+General Notes
+Generally, parsing XML is slow. If you're writing a script to parse
+the PDML output of tethereal, pass a read filter with "-R" to tethereal to
+reduce as much as possible the number of packets it emits.
+The less your script has to process, the faster it will be.
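As a rough illustration of the advice above, a script might assemble the tethereal command line as follows. This is a sketch under stated assumptions: the function names, the capture file name, and the "http" read filter are hypothetical, and the "-r" option for reading a capture file is tethereal's usual one but is not described in this document.

```python
import subprocess

def build_tethereal_command(capture_file, read_filter):
    """Build an argument list asking tethereal for filtered PDML output."""
    # -r: capture file to read (assumed here, not covered by this README)
    # -R: read filter, so fewer packets reach the XML parser
    # -Tpdml -V: PDML output, as described at the top of this document
    return ["tethereal", "-r", capture_file, "-R", read_filter, "-Tpdml", "-V"]

def run_tethereal(capture_file, read_filter):
    """Run tethereal and return its PDML output as text.

    Requires tethereal on the PATH; raises CalledProcessError on failure.
    """
    cmd = build_tethereal_command(capture_file, read_filter)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(build_tethereal_command("capture.pcap", "http"))
```

Passing the arguments as a list avoids shell quoting problems when the read filter contains spaces or quotes.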
+'tools/msnchat' is a sample Python program that uses EtherealXML to parse PDML.
+Given one or more capture files, it runs tethereal on each of them, providing
+a read filter to reduce tethereal's output. It finds MSN Chat conversations
+in the capture file and produces nice HTML showing the conversations. It has
+only been tested with capture files containing non-simultaneous chat sessions,
+but was written to more-or-less handle any number of simultaneous chat