aboutsummaryrefslogtreecommitdiffstats
path: root/doc/README.binarytrees
blob: 7e0fc3652a6dbfa4dab73c0cde29f425a073c004 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
$Id$

1. Introduction

Binary trees are a well known and popular device in computer science to handle
storage of object based on a search key or identity.
One particular class of binary trees are Red/Black trees which have nice 
properties such as being self-balanced.
Such trees are often very fast for accessing data, and may average O(log(n))
for lookups, compared to linked lists that are of order O(n).

Benefits of using binary trees are that they are incredibly fast for 
accessing data and they scale very well with good characteristics even to
very large numbers of objects.

Wireshark provides its own version of red black binary trees designed in 
particular to be easy to use and to eliminate most of the memory management
often associated with such trees.

The trees supported by wireshark are currently all created using SEasonal 
storage which means that when you load a new trace into wireshark, the SEasonal
memory management will automatically release every single byte of data
associated with the tree.

Please see README.malloc for a description of EPhemeral and SEasonal storage.


2. Basic Usage
For most users it will be sufficient to only know and use three functions
emem_tree_t *se_tree_create(int type, char *name);
void se_tree_insert32(emem_tree_t *se_tree, guint32 key, void *data);
void *se_tree_lookup32(emem_tree_t *se_tree, guint32 key);


2.1 se_tree_create(int type, char *name);
se_tree_create() is used to initialize a tree that will be automatically 
cleared and reset every time wireshark is resetting all SEasonal storage.
That is every time you load a new capture file into wireshark or when
you rescan the entire capture file from scratch.

Name is just a literal text string and serves no other purpose than making
debugging of the trees easier. Specify a name here that uniquely identifies
both the protocol you create the tree for and its purpose.


This function is most likely called once from your 
proto_register_<yourprotocol>() function.

Example (from packet-tcp.c):
#include <epan/emem.h>
...
static emem_tree_t *tcp_pdu_time_table = NULL;
...
void proto_register_...(void) {
   ...
   tcp_pdu_time_table=se_tree_create(EMEM_TREE_TYPE_RED_BLACK, "PROTO_mytree");
   ...
}

That is how easy it is to create a binary tree. You only need to create it
once when wireshark starts and the tree will remain there until you exit
wireshark.  Every time a new capture is loaded, all nodes allocated to the
tree are freed automatically and the tree is reset without you having to do
anything at all.


2.2 se_tree_insert32(emem_tree_t *se_tree, guint32 key, void *data);
This function is used to add a new node into the tree. 
This function is used for such trees where you always use a guint32 as the 
identifier/key for the node. Most trees you want to use are likely in this 
category.

As data you should specify a pointer to the data structure you want to be
able to retrieve later when you look for that same key.

NOTE: If you insert a node to a key that already exists in the tree
this function will allow you to do that. It will just drop the old pointer
to data and replace it with the new one you just provided.
This should not be a problem as long as the old and the new data blobs
are se_allocated() since you cant free() such memory explicitly anyway
and the old one will be release automatically anyway when the SEasonal
system reclaims all the SE data.

NOTE: It is a good idea to only provide data that point to blobs allocated
by se_alloc(). By doing that you will have a tree where the tree and all
the data pointed to will be automatically released by the system at the
same time.  This is very neat and makes it real difficult to have memory
leaks in your code.

NOTE: When you insert items in the tree, it is very likely that you only
want to add any data to the tree during the very first time you process
a particular packet.
Wireshark may reprocess the same packet multiple times afterward by the user
clicking on the packet or for other reasons. 
You probably DO want to protect the insert call within an if statement such
as

Example:
    /* only TRUE first time we process this packet*/
    if(!pinfo->fd->flags.visited){ 
       ...
       data=se_alloc(...);
       data->...
       ...
       se_tree_insert32(se_tree, key, data);
       ...
    }

Please do think about how and when you want to add items to the tree.
If you don't think you should use the if statement to protect the insert
please reconsider and think it through one extra time.


2.3 se_tree_lookup32(emem_tree_t *se_tree, guint32 key);
This function is used to read back from the tree the data value that is 
associated with this key.
If no such node was found the function will return NULL.

Example:
    ...
    data_structure = se_tree_lookup32(se_tree, key);
    if(data_structure){
      ...
    }

Don't forget to check that the returned pointer is non-NULL before you 
dereference it, please.

2.4 se_tree_lookup32_le(emem_tree_t *se_tree, guint32 key);
The function will return the node that has the largest key that is
equal to or smaller than the search key, or NULL if no such key was found.


Simple as that, can it be easier?


3. Advanced Usage
This will list some of the more unconventional and hopefully rarely used 
functions.

3.1 se_tree_create_non_persistent(int type, char *name);
Sometimes you don't want a tree that is automatically reset to become an empty
tree. You might want a tree that is completely destroyed once the next
capture file is read and even the pointer to the tree itself becomes invalid.

This would most likely be the case when you do NOT want a global tree
but instead a tree that is held inside some other SE allocated structure.
So that when that encapsulating structure is released the entire tree will
disappear completely as well.

Maybe you have a normal tree to track all conversations for your protocol
and for each conversation you se_alloc() a structure to maintain some 
data about that conversation. Assume you want to add to that structure
another binary tree   a binary tree that is private to that structure/
conversation to hold some other data.
In that case, this is the function you want.


3.2 se_tree_insert32_array / se_tree_lookup32_array
typedef struct _emem_tree_key_t {
	guint32 length;			/*length in guint32 words */
	guint32 *key;
} emem_tree_key_t;
void se_tree_insert32_array(emem_tree_t *se_tree, emem_tree_key_t *key, void *data);
void *se_tree_lookup32_array(emem_tree_t *se_tree, emem_tree_key_t *key);

NOTE: the *key parameter taken by these functions WILL be modified by these
functions so if you want to reuse the key for a second call you will have
to reinitialize key.


These functions are used to insert and lookup a tree where nodes are NOT 
indexed by a single guint32 but more like an array of guint32 elements.

These functions take as key an array of guint32 vectors : emem_tree_key_t.
The functions will use vector by vector to search further down the tree 
until an array element where length==0 is found indicating the end of the 
array.

NOTE: you MUST terminate the emem_tree_key_t array by {0, NULL}
If you forget to do this wireshark will immediately crash.

NOTE: length indicates the number of guint32 values in the vector, not the
number of bytes.

If your keys are always of exactly the same length, always, and you are willing
to bet that there can never ever be a key of a different length you can
get away with a simple :
  emem_tree_key_t key[2];
  key[0].length= LEN;
  key[0].key= KEY;
  key[1].length=0;
  key[1].key=NULL;
  se_tree_insert32_array(se_tree, &key[0], data);
But IF you would accidentally pass a key with a different number of guint32s
in its vectors to this same se_tree  you will crash.
Don't use key like this. Please.


Instead use this simple workaround to make it all safe :
Specify the first index as 1 guint32 holding the total number of guints in the
rest of the key.
See NFS that does this to handle file handles that may be of different lengths
in the same tree :
  emem_tree_key_t fhkey[3];
  guint32 tmplen;

  tmplen = <length of filehandle/4>;
  fhkey[0].length = 1;
  fhkey[0].key    = &tmplen;
  fhkey[1].length = tmplen;
  fhkey[1].key    = <filehandle>;
  fhkey[2].length = 0;
  fhkey[2].key    = NULL;

( /4 since we specify the length here in number of guint32 not number of bytes)

3.3 se_tree_lookup32_array_le(emem_tree_t *se_tree, emem_tree_key_t *key);
Much like the se_tree_lookup32_le, this function will return the node that has
the largest key that is equal to or smaller than the search key, or NULL if
no such key was found.

When using _array_ trees, the tree that is created is a "tree of trees" where the
last leaf has the indexed data.  Thus if you have 3 keys in the emme_tree_key_t
structure, the "1st" tree indexes key[0].  Each node in this tree points to a
tree indexed using key[1].  The nodes of the final tree will contain the data.

This construction must be taken into account when using se_tree_lookup32_array_le.
The program should verify that the node returned contains data that is expected.

3.4 se_tree_insert_string / se_tree_lookup_string
void emem_tree_insert_string(emem_tree_t* h, const gchar* k, void* v, guint32 flags);
void* emem_tree_lookup_string(emem_tree_t* h, const gchar* k, guint32 flags);
These functions are essentially wrappers for se_tree_insert32_array and
se_tree_lookup32_array, tailored to text strings. They extend the text string
into an array key and use that to key the se_tree_insert32_array and
se_tree_lookup32_array functions.
In order to support text string in a case insensitive way add the
EMEM_TREE_STRING_NOCASE flag. This will uppercase all string data before using
it as key data.