Diffstat (limited to 'doc/README.xml-output')
-rw-r--r-- doc/README.xml-output 206
1 files changed, 206 insertions, 0 deletions
diff --git a/doc/README.xml-output b/doc/README.xml-output
new file mode 100644
index 0000000000..df3d77e920
--- /dev/null
+++ b/doc/README.xml-output
@@ -0,0 +1,206 @@
+Protocol Dissection in XML Format
+=================================
+$Id: README.xml-output,v 1.1 2003/12/06 06:09:12 gram Exp $
+Copyright (c) 2003 by Gilbert Ramirez <gram@alumni.rice.edu>
+
+
+Tethereal has the ability to print its protocol dissection in an
+XML format, by using the "-Tpdml -V" options. Similar functionality
+could be put into the "Print" dialog of Ethereal, but that work has
+not been done yet.
+
+The XML that tethereal produces follows the Packet Details Markup
+Language (PDML) specified by the group at the Politecnico Di Torino
+working on Analyzer. The specification can be found at:
+
+http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm
+
+A related XML format, the Packet Summary Markup Language (PSML), is
+also defined by the Analyzer group to provide packet summary information.
+The PSML format is not documented in a publicly-available HTML document,
+but its format is simple. Some day it may be added to tethereal so
+that "-Tpsml" (without "-V") would produce PSML.
+
+One wonders if the "-T" option should read "-Txml" instead of "-Tpdml"
+(and in the future, "-Tpsml"), but if tethereal were ever required to
+produce another XML-based format of its protocol dissection, then "-Txml"
+would be ambiguous.
+
+PDML
+====
+The PDML that tethereal produces is known not to be loadable into Analyzer.
+It causes Analyzer to crash. As such, the PDML that tethereal produces
+is labeled with a version number of "0", which means that the PDML does
+not fully follow the PDML spec. Furthermore, a "creator" attribute in the
+"<pdml>" tag gives the version number of [t]ethereal that produced the PDML.
+In that way, as the PDML produced by tethereal matures, but still does not
+meet the PDML spec, scripts can make intelligent decisions about how to
+best parse the PDML, based on the "creator" attribute.
+
+A PDML file is delimited by a "<pdml>" tag.
+A PDML file contains multiple packets, denoted by the "<packet>" tag.
+A packet will contain multiple protocols, denoted by the "<proto>" tag.
+A protocol might contain one or more fields, denoted by the "<field>" tag.
+
+A pseudo-protocol named "geninfo" is produced, as is required by the PDML
+spec, and printed as the first protocol after the opening "<packet>" tag.
+Its information comes from ethereal's "frame" protocol, which serves
+a similar purpose of storing packet meta-data. Both the "geninfo" and
+"frame" protocols are provided in the PDML output.
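
The nesting described above can be walked with any XML parser. The
following is a minimal sketch using Python's standard xml.etree.ElementTree
module rather than anything shipped with tethereal. The sample document is
hand-written for illustration, and the protocol_names() helper is a
hypothetical name, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration; real tethereal output carries
# many more attributes and fields.
SAMPLE = """\
<pdml version="0" creator="ethereal/0.9.17">
  <packet>
    <proto name="geninfo"/>
    <proto name="frame"/>
    <proto name="eth">
      <field name="eth.type"/>
    </proto>
  </packet>
  <packet>
    <proto name="geninfo"/>
    <proto name="frame"/>
  </packet>
</pdml>
"""

def protocol_names(packet):
    """Return the names of the <proto> children of one <packet>, in order."""
    return [proto.get("name") for proto in packet.findall("proto")]

root = ET.fromstring(SAMPLE)
for number, packet in enumerate(root.findall("packet"), start=1):
    print(number, protocol_names(packet))
```

In this sample "geninfo" comes first in every packet and "frame" follows
it, matching the ordering described above.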
+
+The "<pdml>" tag
+================
+Example:
+ <pdml version="0" creator="ethereal/0.9.17">
+
+The creator is "ethereal" version 0.9.17. The name refers to the "ethereal"
+engine, so it will always say "ethereal", not "tethereal".
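
A script that wants to base parsing decisions on these attributes could
split them apart as sketched below. This assumes the "name/version" layout
of the creator attribute shown in the example; pdml_origin() is a
hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<pdml version="0" creator="ethereal/0.9.17"></pdml>'

def pdml_origin(root):
    """Return (pdml_version, creator_name, creator_version) for a <pdml> element."""
    version = root.get("version", "")
    creator = root.get("creator", "")
    # Assumes the creator looks like "ethereal/0.9.17"; if there is no "/",
    # the whole string is taken as the name and the version is empty.
    name, _, release = creator.partition("/")
    return version, name, release

root = ET.fromstring(SAMPLE)
print(pdml_origin(root))
```

A PDML version of "0" tells the script that the output does not fully
follow the PDML spec, and the creator version says which release of the
engine produced it.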
+
+
+The "<proto>" tag
+=================
+"<proto>" tags can have the following attributes:
+
+    name     - the display filter name for the protocol
+    showname - the label used to describe this protocol in the protocol
+               tree. This is usually the descriptive name of the protocol,
+               but it can be modified by dissectors to include more data
+               (tcp can do this)
+    pos      - the starting offset within the packet data where this
+               protocol starts
+    size     - the number of octets in the packet data that this protocol
+               covers.
+
+The "<field>" tag
+=================
+"<field>" tags can have the following attributes:
+
+    name     - the display filter name for the field
+    showname - the label used to describe this field in the protocol
+               tree. This is usually the descriptive name of the field,
+               followed by some representation of the value.
+    pos      - the starting offset within the packet data where this
+               field starts
+    size     - the number of octets in the packet data that this field
+               covers.
+    value    - the actual packet data, in hex, that this field covers
+    show     - the representation of the packet data ('value') as it would
+               appear in a display filter.
+
+Dissectors sometimes place text into the protocol tree without using
+a field with a field-name. Those appear in PDML as "<field>" tags with no
+'name' attribute, but with a 'show' attribute giving that text.
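
As a rough sketch of how a script might read these attributes, the example
below decodes the hex 'value' into bytes and copes with text-only fields
that have no 'name'. It uses Python's standard xml.etree.ElementTree
module; the sample attribute values were written by hand, and describe()
is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative fields only; the attribute values were written by hand.
SAMPLE = """\
<proto name="ip" showname="Internet Protocol" pos="14" size="20">
  <field name="ip.ttl" showname="Time to live: 64"
         pos="22" size="1" value="40" show="64"/>
  <field show="free-form text added by a dissector"/>
</proto>
"""

def describe(field):
    """Return (name, raw_bytes, show) for one <field> element."""
    name = field.get("name")      # None for text-only fields
    value = field.get("value")    # hex string; may be absent
    raw = bytes.fromhex(value) if value else b""
    return name, raw, field.get("show")

proto = ET.fromstring(SAMPLE)
# iter() also reaches fields nested inside other fields.
described = [describe(f) for f in proto.iter("field")]
for entry in described:
    print(entry)
```

Testing for a missing 'name' is enough to tell dissector text apart from
real fields in this sketch.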
+
+Many dissectors label the undissected payload of a protocol as belonging
+to a "data" protocol, and the "data" protocol usually resides inside
+the last protocol dissected. In the PDML, the "data" protocol becomes
+a "data" field, placed exactly where the "data" protocol is in tethereal's
+protocol tree. So, if tethereal would normally show:
+
++-- Frame
+|
++-- Ethernet
+|
++-- IP
+|
++-- TCP
+|
++-- HTTP
+    |
+    +-- Data
+
+In PDML, the "Data" protocol would become another field under HTTP:
+
+<packet>
+  <proto name="frame">
+    ...
+  </proto>
+
+  <proto name="eth">
+    ...
+  </proto>
+
+  <proto name="ip">
+    ...
+  </proto>
+
+  <proto name="tcp">
+    ...
+  </proto>
+
+  <proto name="http">
+    ...
+    <field name="data" value="........."/>
+  </proto>
+</packet>
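
Because the payload is a field rather than a protocol, a script has to
look for it among the fields of each protocol. The following sketch, again
using Python's standard xml.etree.ElementTree module, collects every "data"
field together with the protocol that contains it. The sample packet and
the undissected_payloads() helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written packet; the hex string is the ASCII text "Hello".
SAMPLE = """\
<packet>
  <proto name="tcp"/>
  <proto name="http">
    <field name="http.request"/>
    <field name="data" value="48656c6c6f"/>
  </proto>
</packet>
"""

def undissected_payloads(packet):
    """Return (protocol name, payload bytes) for every "data" field."""
    found = []
    for proto in packet.findall("proto"):
        for field in proto.iter("field"):
            if field.get("name") == "data":
                payload = bytes.fromhex(field.get("value", ""))
                found.append((proto.get("name"), payload))
    return found

packet = ET.fromstring(SAMPLE)
payloads = undissected_payloads(packet)
print(payloads)
```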
+
+
+
+tools/EtherealXML.py
+====================
+This is a Python module which provides some infrastructure for
+Python developers who wish to parse PDML. It is designed to read
+a PDML file and call a user's callback function each time it has
+assembled a complete packet from that packet's protocols and fields.
+
+The Python user should import the module, define a callback function
+which accepts one argument, and call the parse_fh function:
+
+------------------------------------------------------------
+import EtherealXML
+
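+def my_callback(packet):
+    # Called once per packet; "packet" is an EtherealXML.Packet object.
+    pass
+
+# xml_filename is the path to a PDML file produced by "tethereal -Tpdml -V".
+fh = open(xml_filename)
+EtherealXML.parse_fh(fh, my_callback)
+fh.close()
+
+# Now that the script has the packet data, do something.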
+------------------------------------------------------------
+
+The object that is passed to the callback function is an
+EtherealXML.Packet object, which corresponds to a single packet.
+EtherealXML provides three classes, each of which corresponds to a PDML tag:
+
+ Packet - "<packet>" tag
+ Protocol - "<proto>" tag
+ Field - "<field>" tag
+
+Each of these classes has accessors which will return the defined attributes:
+
+ get_name()
+ get_showname()
+ get_pos()
+ get_size()
+ get_value()
+ get_show()
+
+Protocols and fields can contain other fields. Thus, the Protocol and
+Field classes each have a "children" member, which is a simple list of the
+Field objects, if any, that are contained. The "children" list can be
+directly accessed by calling users. It will be empty if this Protocol
+or Field contains no Fields.
+
+Furthermore, the Packet class is a sub-class of the PacketList class.
+The PacketList class provides methods to look for protocols and fields.
+The term "item" is used when the item being looked for can be
+a protocol or a field:
+
+ item_exists(name) - checks if an item exists in the PacketList
+ get_items(name) - returns a PacketList of all matching items
+
+
+General Notes
+=============
+Generally, parsing XML is slow. If you're writing a script to parse
+the PDML output of tethereal, pass a read filter with "-R" to tethereal to
+reduce, as much as possible, the number of packets coming out of tethereal.
+The less your script has to process, the faster it will be.
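
One way to apply this advice is sketched below. It is not how
tools/msnchat or EtherealXML work; it uses Python's standard
xml.etree.ElementTree.iterparse, and both helper names are hypothetical.
The command line combines the "-Tpdml -V" and "-R" options described in
this document with "-r" for reading a capture file, which should be checked
against your version of tethereal. So that the sketch runs on its own, a
small in-memory document stands in for tethereal's output.

```python
import io
import xml.etree.ElementTree as ET

def tethereal_command(capture_file, read_filter):
    """Build an argument list asking tethereal for filtered PDML output."""
    return ["tethereal", "-r", capture_file, "-R", read_filter, "-Tpdml", "-V"]

def iter_packets(stream):
    """Yield each <packet> element as soon as its closing tag is parsed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "packet":
            yield elem
            elem.clear()    # drop the packet's children to save memory

# Stand-in for tethereal's standard output.
fake_output = io.BytesIO(
    b'<pdml version="0" creator="ethereal/0.9.17">'
    b'<packet><proto name="frame"/></packet>'
    b'<packet><proto name="frame"/></packet>'
    b'</pdml>'
)
packet_count = sum(1 for _packet in iter_packets(fake_output))
print(packet_count)
```

With a real capture, the argument list from tethereal_command() could be
handed to subprocess.Popen with stdout=subprocess.PIPE, and the process's
stdout passed to iter_packets() in place of fake_output.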
+
+'tools/msnchat' is a sample Python program that uses EtherealXML to parse PDML.
+Given one or more capture files, it runs tethereal on each of them, providing
+a read filter to reduce tethereal's output. It finds MSN Chat conversations
+in the capture file and produces nice HTML showing the conversations. It has
+only been tested with capture files containing non-simultaneous chat sessions,
+but was written to more-or-less handle any number of simultaneous chat
+sessions.