author Peter Wu <peter@lekensteyn.nl> 2019-02-08 17:20:37 +0100
committer Anders Broman <a.broman58@gmail.com> 2019-02-11 05:08:53 +0000
commit 0ca65a66f425c8beaa1af3deb3b84c2b16cffb55
tree 4015973a2b1cd2758be093183711f7fd9b733489 /epan/ftypes/ftype-bytes.c
parent f2dc64e9b8d6cbd3dd5f7cda03596abd1c0ceea7
Fix crash when using the "matches" operator on non-UTF-8 data
GRegex is a thin wrapper around PCRE. Inputs (patterns and subjects) are assumed to be UTF-8 by default (unless G_REGEX_RAW is set). If the subject is not valid UTF-8, pcre_exec() normally returns a failure immediately. However, as GLib sets PCRE_NO_UTF8_CHECK whenever G_REGEX_RAW is not given, pcre_exec() skips the safety check and crashes instead.

Fix this by always assuming raw byte patterns. Regression risk: patterns such as `ö.ï` will no longer match `öñï` since `ñ` is a multi-byte sequence in UTF-8 and `.` matches a single byte in raw mode. Patterns such as `(GET|POST) /` remain functional though.

Bug: 14905
Change-Id: I6450bb83f565d377f82a5dbb01690c5f49acd96f
Reviewed-on: https://code.wireshark.org/review/31935
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
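
The regression risk described in the commit message can be illustrated with a small byte-oriented sketch. This is not the Wireshark or GLib code path: it assumes POSIX `regcomp()`/`regexec()` running in the default "C" locale as a stand-in for raw (byte-wise) matching, and `matches_bytes` is a hypothetical helper name introduced only for this illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Wireshark): returns true if the POSIX
 * extended regex `pattern` matches anywhere in `subject`.
 *
 * Assumption: the program has not called setlocale(), so it runs in the
 * "C" locale, where every byte is treated as one character. That roughly
 * approximates a regex engine operating on raw bytes.
 */
bool
matches_bytes(const char *pattern, const char *subject)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        /* Treat a pattern that fails to compile as "no match". */
        return false;
    }
    matched = (regexec(&re, subject, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Under these assumptions, `matches_bytes("(GET|POST) /", "GET /index.html")` succeeds, and it still succeeds when the subject also contains bytes that are not valid UTF-8. The pattern `"\xc3\xb6.\xc3\xaf"` (the UTF-8 bytes of `ö.ï`) does not match `"\xc3\xb6\xc3\xb1\xc3\xaf"` (`öñï`), because `.` consumes only the first of the two bytes that encode `ñ`. Writing `..` in place of `.` does match.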
Diffstat (limited to 'epan/ftypes/ftype-bytes.c')
-rw-r--r-- epan/ftypes/ftype-bytes.c 24
1 file changed, 0 insertions, 24 deletions
diff --git a/epan/ftypes/ftype-bytes.c b/epan/ftypes/ftype-bytes.c
index c1d57f0bbd..9bfc37b637 100644
--- a/epan/ftypes/ftype-bytes.c
+++ b/epan/ftypes/ftype-bytes.c
@@ -665,30 +665,6 @@ cmp_matches(const fvalue_t *fv_a, const fvalue_t *fv_b)
 	if (! regex) {
 		return FALSE;
 	}
-	/*
-	 * XXX - do we want G_REGEX_RAW or not?
-	 *
-	 * If we're matching against a string, we don't want it (and
-	 * we want the string value encoded in UTF-8 - and, if it can't
-	 * be converted to UTF-8, because it's in a character encoding
-	 * that doesn't map every possible byte sequence to Unicode (and
-	 * that includes strings that are supposed to be in UTF-8 but
-	 * that contain invalid UTF-8 sequences!), treat the match as
-	 * failing.
-	 *
-	 * If we're matching against binary data, and matching a binary
-	 * pattern (e.g. "0xfa, 3 or more 0xff, and 0x37, in order"),
-	 * we'd want G_REGEX_RAW. If we're matching a text pattern,
-	 * it's not clear *what* the right thing to do is - if they're
-	 * matching against a pattern containing non-ASCII characters,
-	 * they might want it to match in whatever encoding the binary
-	 * data is, but Wireshark might not have a clue what that
-	 * encoding is. In addition, it's not clear how to tell
-	 * whether a pattern is "binary" or not, short of having
-	 * a different (non-PCRE) syntax for binary patterns.
-	 *
-	 * So we don't use G_REGEX_RAW for now.
-	 */
 	return g_regex_match_full(
 		regex, /* Compiled PCRE */
 		(char *)a->data, /* The data to check for the pattern... */