$Id$

XXX - move this file to epan?

1. How the Display Filter Engine works.

Code:
epan/dfilter/* - the display filter engine, including the scanner,
                 parser, syntax-tree semantics checker, DFVM bytecode
                 generator, and DFVM engine
epan/ftypes/*  - the definitions of the various FT_* field types
epan/proto.c   - proto_tree-related routines

1.1 Parsing text.

The scanner/parser pair reads the string representing the display filter
and converts it into a very simple syntax tree.  The tree is simple in
that many of its nodes may still contain unparsed chunks of text from
the display filter.

1.1 Enhancing the syntax tree.

The semantics of the simple syntax tree are checked to make sure that
each field is compared to a value appropriate for its type.  For example,
an integer field can't be compared to a string unless a value_string has
been defined for that field.

During the process of checking the semantics, the simple syntax tree is
fleshed out and no longer contains nodes with unparsed information.  The
syntax tree is no longer in its simple form, but in its complete form.
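
The following is a minimal sketch of that check for an integer field,
not the engine's actual code.  Every name in it (sk_value_node_t,
sk_value_string_t, sk_check_integer) is hypothetical; the table of
names only imitates the idea of a value_string.  The node starts out
holding raw text and is marked as parsed once a value has been found
for it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a value_string table entry. */
typedef struct {
    long        value;
    const char *name;   /* NULL marks the end of the table */
} sk_value_string_t;

/* Hypothetical syntax-tree node for the value side of a comparison. */
typedef struct {
    const char *raw;        /* unparsed text from the display filter */
    int         is_parsed;  /* set once the semantic check succeeds */
    long        value;      /* valid only when is_parsed is non-zero */
} sk_value_node_t;

/* Check a value node against an integer field.  The text must either
 * parse as a number or match a name in the optional table.
 * Returns 1 if the node is now complete, 0 on a type mismatch. */
static int
sk_check_integer(sk_value_node_t *node, const sk_value_string_t *names)
{
    char *end;
    long  v;

    assert(node != NULL && node->raw != NULL);

    v = strtol(node->raw, &end, 0);
    if (end != node->raw && *end == '\0') {
        node->value = v;
        node->is_parsed = 1;
        return 1;
    }

    for (; names != NULL && names->name != NULL; names++) {
        if (strcmp(names->name, node->raw) == 0) {
            node->value = names->value;
            node->is_parsed = 1;
            return 1;
        }
    }
    return 0;
}
```

With a table that maps "TCP" to 6, the text "TCP" is accepted for the
integer field; without a table it is rejected.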

1.2 Converting to DFVM bytecode.

The syntax tree is analyzed to create a sequence of bytecodes in the
"DFVM" language.  "DFVM" stands for Display Filter Virtual Machine.  The
DFVM is similar in spirit, but not in definition, to the BPF VM that
libpcap uses to analyze packets.

Bytecode is generated so that the actual process of filtering packets
will be fast.  That is, it should be faster to process a list of VM
bytecodes than to attempt to filter packets directly from the syntax
tree.  (heh...  no measurement has been made to support this
supposition.)
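
To illustrate the general idea of running a flat list of instructions
instead of walking a tree, here is a toy interpreter.  It is only a
sketch: the opcodes, the instruction layout and the register count are
invented for this example and are not the real DFVM instruction set.
It also treats each field as a single value, whereas the real engine
works with lists of values.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NUM_REGS 4

/* Hypothetical opcodes; not the real DFVM ones. */
typedef enum {
    SK_LOAD_FIELD,  /* reg[a] = fields[b] */
    SK_LOAD_CONST,  /* reg[a] = b */
    SK_CMP_EQ,      /* result = (reg[a] == reg[b]) */
    SK_RETURN       /* stop and return result */
} sk_opcode_t;

typedef struct {
    sk_opcode_t op;
    int         a;  /* register number */
    long        b;  /* field index, constant, or second register */
} sk_insn_t;

/* Run the program against one packet's field values.
 * Returns 1 if the packet passes the filter, 0 otherwise. */
static int
sk_run(const sk_insn_t *code, size_t ncode,
       const long *fields, size_t nfields)
{
    long   reg[SK_NUM_REGS] = { 0 };
    int    result = 0;
    size_t pc;

    assert(code != NULL && fields != NULL);

    for (pc = 0; pc < ncode; pc++) {
        const sk_insn_t *insn = &code[pc];

        assert(insn->a >= 0 && insn->a < SK_NUM_REGS);
        switch (insn->op) {
        case SK_LOAD_FIELD:
            assert(insn->b >= 0 && (size_t)insn->b < nfields);
            reg[insn->a] = fields[insn->b];
            break;
        case SK_LOAD_CONST:
            reg[insn->a] = insn->b;
            break;
        case SK_CMP_EQ:
            assert(insn->b >= 0 && insn->b < SK_NUM_REGS);
            result = (reg[insn->a] == reg[insn->b]);
            break;
        case SK_RETURN:
            return result;
        }
    }
    return result;
}
```

A filter such as "field 0 equals 64" would compile to four of these
instructions: load the field, load the constant, compare, return.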

1.3 Filtering.

Once the DFVM bytecode has been produced, it's a simple matter of
running the DFVM engine against the proto_tree from the packet
dissection, using the DFVM bytecodes as instructions.  If the DFVM
bytecode is known before packet dissection occurs, the
proto_tree-related code can be "primed" to store away pointers to
field_info structures that are interesting to the display filter.  This
makes lookup of those field_info structures during the filtering process
faster.
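
The priming idea can be sketched as follows.  This is an illustration
under simplifying assumptions, not the proto_tree code: the types and
functions (sk_tree_t, sk_tree_prime, sk_tree_add, sk_tree_lookup) are
hypothetical, field IDs are small integers, and only the first
instance of each interesting field is remembered.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_FIELDS 8

/* Hypothetical, much-reduced stand-in for a field_info structure. */
typedef struct {
    int  field_id;
    long value;
} sk_field_info_t;

typedef struct {
    int                    interesting[SK_MAX_FIELDS]; /* set by priming */
    const sk_field_info_t *found[SK_MAX_FIELDS];       /* stored pointers */
} sk_tree_t;

/* Called before dissection for each field the filter refers to. */
static void
sk_tree_prime(sk_tree_t *tree, int field_id)
{
    assert(tree != NULL && field_id >= 0 && field_id < SK_MAX_FIELDS);
    tree->interesting[field_id] = 1;
}

/* Called during dissection for every field added to the tree.  Only
 * fields the filter cares about are stored away for fast lookup. */
static void
sk_tree_add(sk_tree_t *tree, const sk_field_info_t *fi)
{
    assert(tree != NULL && fi != NULL);
    assert(fi->field_id >= 0 && fi->field_id < SK_MAX_FIELDS);
    if (tree->interesting[fi->field_id] && tree->found[fi->field_id] == NULL)
        tree->found[fi->field_id] = fi;
}

/* Called while filtering; no tree walk is needed. */
static const sk_field_info_t *
sk_tree_lookup(const sk_tree_t *tree, int field_id)
{
    assert(tree != NULL && field_id >= 0 && field_id < SK_MAX_FIELDS);
    return tree->found[field_id];
}
```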

1.4 Display Filter Functions.

You define a display filter function by adding an entry to
the df_functions table in epan/dfilter/dfunctions.c. The record struct
is defined in dfunctions.h, and shown here:

typedef struct {
    char            *name;
    DFFuncType      function;
    ftenum_t        retval_ftype;
    guint           min_nargs;
    guint           max_nargs;
    DFSemCheckType  semcheck_param_function;
} df_func_def_t;

name - the name of the function; this is how the user will call your
    function in the display filter language

function - the routine that does the run-time processing of your
    function.

retval_ftype - the FT_* type that your function returns.

min_nargs - minimum number of arguments your function accepts
max_nargs - maximum number of arguments your function accepts

semcheck_param_function - called during the semantic check of the
    display filter string.
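
As an illustration, a table entry and a lookup by name might look like
the sketch below.  So that it compiles on its own, the GLib and epan
types (gboolean, guint, GList, stnode_t, ftenum_t) are replaced by
minimal stand-ins, and the sk_* names, including the "sk_upper"
function, are hypothetical.  The NULL entry that ends the table is
also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins so the sketch compiles without GLib and epan headers. */
typedef int gboolean;
typedef unsigned int guint;
typedef struct sk_list GList;        /* opaque in this sketch */
typedef struct sk_stnode stnode_t;   /* opaque in this sketch */
typedef enum { FT_NONE, FT_STRING } ftenum_t;  /* real enum is larger */

typedef gboolean (*DFFuncType)(GList *arg1list, GList *arg2list,
                               GList **retval);
typedef void (*DFSemCheckType)(int param_num, stnode_t *st_node);

typedef struct {
    char            *name;
    DFFuncType      function;
    ftenum_t        retval_ftype;
    guint           min_nargs;
    guint           max_nargs;
    DFSemCheckType  semcheck_param_function;
} df_func_def_t;

/* Hypothetical run-time routine; a stub that returns an empty list. */
static gboolean
sk_func_upper(GList *arg1list, GList *arg2list, GList **retval)
{
    (void)arg1list;
    (void)arg2list;
    *retval = NULL;
    return 1;
}

/* Hypothetical semantic check; a stub that accepts everything. */
static void
sk_semcheck_upper(int param_num, stnode_t *st_node)
{
    (void)param_num;
    (void)st_node;
}

/* One function taking exactly one argument and returning a string. */
static df_func_def_t sk_functions[] = {
    { "sk_upper", sk_func_upper, FT_STRING, 1, 1, sk_semcheck_upper },
    { NULL, NULL, FT_NONE, 0, 0, NULL }   /* assumed end-of-table mark */
};

/* Find a function definition by the name used in the filter text. */
static df_func_def_t *
sk_func_lookup(const char *name)
{
    df_func_def_t *def;

    assert(name != NULL);
    for (def = sk_functions; def->name != NULL; def++) {
        if (strcmp(def->name, name) == 0)
            return def;
    }
    return NULL;
}
```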

DFFuncType function
-------------------
typedef gboolean (*DFFuncType)(GList *arg1list, GList *arg2list, GList **retval);

The return value of your function is a gboolean; TRUE if processing went fine,
or FALSE if there was some sort of exception.

For now, display filter functions can accept a maximum of 2 arguments.
The "arg1list" parameter is the GList for the first argument. The
"arg2list" parameter is the GList for the second argument. All arguments
to display filter functions are lists. This is because in the display
filter language a protocol field may have multiple instances. For example,
a field like "ip.addr" will exist more than once in a single frame. So
when the user invokes this display filter:

    somefunc(ip.addr) == TRUE

even though "ip.addr" is a single argument, the "somefunc" function will
receive a GList of *all* the values of "ip.addr" in the frame.

Similarly, the return value of the function needs to be a GList, since all
values in the display filter language are lists. The GList** retval argument
is passed to your function so you can set the pointer to your return value.
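
A function body that follows this list-in, list-out convention could
be sketched like this.  To keep the example free of GLib, a small
singly linked list (sk_list_t) stands in for GList, and plain C
strings stand in for the values stored in the list; both are
assumptions of the sketch, as is the sk_func_upper function itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for GList: just a data pointer and a next pointer. */
typedef struct sk_list {
    void           *data;
    struct sk_list *next;
} sk_list_t;

/* Append data to the list and return the (possibly new) head. */
static sk_list_t *
sk_list_append(sk_list_t *head, void *data)
{
    sk_list_t *node = malloc(sizeof *node);
    sk_list_t *tail;

    assert(node != NULL);
    node->data = data;
    node->next = NULL;
    if (head == NULL)
        return node;
    for (tail = head; tail->next != NULL; tail = tail->next)
        ;
    tail->next = node;
    return head;
}

/* Upper-case every value of the first argument.  The result list has
 * one entry per input value.  Returns 1 (TRUE) on success, 0 (FALSE)
 * if memory for a result could not be allocated. */
static int
sk_func_upper(sk_list_t *arg1list, sk_list_t *arg2list, sk_list_t **retval)
{
    sk_list_t *it;

    (void)arg2list;   /* this function takes a single argument */
    *retval = NULL;

    for (it = arg1list; it != NULL; it = it->next) {
        const char *src = it->data;
        size_t      len = strlen(src);
        size_t      i;
        char       *dst = malloc(len + 1);

        if (dst == NULL)
            return 0;
        for (i = 0; i < len; i++)
            dst[i] = (char)toupper((unsigned char)src[i]);
        dst[len] = '\0';
        *retval = sk_list_append(*retval, dst);
    }
    return 1;
}
```

Note that the function loops over the whole input list even though the
user wrote a single argument in the filter.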

DFSemCheckType
--------------
typedef void (*DFSemCheckType)(int param_num, stnode_t *st_node);

For each parameter in the syntax tree, this function will be called.
"param_num" will indicate the number of the parameter, starting with 0.
The "stnode_t" is the syntax-tree node representing that parameter.
If everything is okay with the value of that stnode_t, your function
does nothing --- it merely returns. If something is wrong, however,
it should THROW a TypeError exception.
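
The shape of such a check can be sketched as below.  The node type
(sk_stnode_t) and its three kinds are hypothetical, and longjmp() is
used here only as a stand-in for THROWing a TypeError, so that the
example does not depend on the epan exception macros.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical, reduced syntax-tree node: only its kind matters here. */
typedef enum { SK_ST_FIELD, SK_ST_STRING, SK_ST_INTEGER } sk_sttype_t;

typedef struct {
    sk_sttype_t type;
} sk_stnode_t;

/* Stand-in for the exception mechanism: where a type error lands. */
static jmp_buf sk_type_error;

/* Semantic check for a function with one string-like parameter.
 * Returns normally if the parameter is acceptable; otherwise it jumps
 * out, which plays the role of THROWing a TypeError. */
static void
sk_semcheck_upper(int param_num, sk_stnode_t *st_node)
{
    assert(st_node != NULL);

    if (param_num == 0 &&
        (st_node->type == SK_ST_FIELD || st_node->type == SK_ST_STRING))
        return;

    longjmp(sk_type_error, 1);
}

/* Caller side: run the check and report whether it passed.
 * Returns 1 if the parameter was accepted, 0 on a type error. */
static int
sk_check_param(int param_num, sk_stnode_t *st_node)
{
    if (setjmp(sk_type_error) != 0)
        return 0;
    sk_semcheck_upper(param_num, st_node);
    return 1;
}
```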