Age | Commit message | Author | Files | Lines |
|
|
|
This parameter was introduced as a safeguard for bugs
that generate an unbounded string but its utility for
that purpose is doubtful and the way it is being used
creates problems with invalid truncation of UTF-8
strings.
Rename wmem_strbuf_sized_new() with a better name.
|
|
|
|
The original error was identified in a comment on !8583
|
|
hex_str_to_bytes_encoding() consumes pairs of hex digits (and
optional separator) to turn into bytes. It can return a pointer
to the character after the last digit consumed. Don't advance
the end pointer after a single unpaired digit that is not consumed
as part of the hex string returned.
tvb_get_string_bytes() can pass back the end offset. If conversion
fails, return the initial offset instead of zero to make repeated
calls easier in cases where the full length is not decoded due to
errors.
Relatedly, no dissector currently uses this return value, because
it has not been useful until now.
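
A minimal sketch of the end-pointer rule described above, assuming a NUL-terminated input and a single optional separator character. The names hex_value() and hex_pairs_to_bytes() are hypothetical and are not the Wireshark functions; the point is only that the end pointer stops after the last complete pair and is not advanced past a trailing unpaired digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper (not the Wireshark API): decode pairs of hex digits
 * from the NUL-terminated string `in` into `out`, writing at most `out_size`
 * bytes and allowing one optional `sep` character between pairs.
 * Returns the number of bytes written. If `endptr` is not NULL, it is set
 * to the character after the last digit that was actually consumed.
 */
static size_t hex_pairs_to_bytes(const char *in, char sep,
                                 unsigned char *out, size_t out_size,
                                 const char **endptr)
{
    size_t n = 0;
    const char *p = in;
    const char *end = in;

    assert(in != NULL && out != NULL);

    while (n < out_size) {
        int hi = hex_value((unsigned char)p[0]);
        if (hi < 0)
            break;
        int lo = hex_value((unsigned char)p[1]);
        if (lo < 0)
            break;              /* unpaired digit: leave it unconsumed */
        out[n++] = (unsigned char)((hi << 4) | lo);
        p += 2;
        end = p;                /* end follows digits, not separators */
        if (sep != '\0' && *p == sep)
            p++;
    }
    if (endptr != NULL)
        *endptr = end;
    return n;
}
```

For the input "0a:1b:2" with ':' as the separator, this sketch produces two bytes and leaves the end pointer just after "1b", so the dangling "2" is not reported as consumed.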
|
|
This fixes Windows build errors reported in comments on !8541
|
|
This avoids having general-purpose decoding happening in
non-DLL-exported functions defined in a dissector for #18478,
and removes unused functions and avoids duplicate decoding.
This also removes unnecessary early exit conditions for #18145.
Unit test cases for varint decoding are added to verify this.
|
|
Rename tvb_get_nstringz0() to tvb_get_raw_bytes_as_stringz()
to reflect the fact that this function does not return
a string (UTF-8 internal text string).
Remove tvb_get_stringz() because it is unused and just seems
dangerous.
|
|
Ensure that FTP doesn't add invalid strings to the tree or columns.
Also allow UTF-8 pathnames to work.
According to RFC 2640, FTP supports UTF-8 for pathnames (and it
MUST be supported even if the other side does not advertise support
for UTF-8, unless a different character set has been explicitly
configured, which is out of scope of the RFCs, and we don't have
such a preference.) So in general interpret strings as UTF-8, not
ASCII.
Reduce the use of tvb_get_ptr by using functions directly on the
original tvb and offset. This also happens to be more compliant
with RFC 2640 when getting the token lengths. (RFC 2640 states
that implementations MUST assume that there is only one space between
a command and the pathname, and treat additional spaces as part of
the pathname instead of skipping them. tvb_get_token_len() does not
skip trailing spaces, but get_token_len() does.)
The only place that still uses tvb_get_ptr is when processing a PWD
command, because it has to deal with the double quote escaping as
a custom encoding.
Add a tvb_ascii_isdigit function.
Fix #18439.
|
|
Replace several instances in which a REPLACEMENT CHARACTER was being
appended to a wmem_strbuf with a call to
wmem_strbuf_append_unichar_repl().
This reduces the number of explicit 0x00fffd or 0xfffd or... in the
code.
|
|
A reported length less than a captured length is bogus, as you cannot
capture more data than there is in a packet.
Fixes #18313.
|
|
Fix tvb_find_guint16 when there is a partial match (first byte
matches but second byte does not) in the buffer before an
actual match.
The function claims that it takes negative offsets and a negative
maxlength value (for "to the end of the buffer.") Convert those to
absolute offsets and limits at the start of the function rather than
repeatedly having special checks for negatives.
Fix the "number of bytes searched so far" calculation, which was only
correct for negative offsets (but only used when there was a partial
match.)
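
The partial-match case can be sketched on a plain byte buffer as follows. This is an illustration only: find_u16_be() is a hypothetical name, it works on absolute offsets, and it assumes the 16-bit value is matched most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the Wireshark API): find the first occurrence of
 * the 16-bit value `needle`, matched high byte first, in buf[0..len).
 * Returns the index of the first byte of the match, or -1 if not found.
 */
static long find_u16_be(const unsigned char *buf, size_t len, unsigned needle)
{
    const unsigned char hi = (unsigned char)((needle >> 8) & 0xff);
    const unsigned char lo = (unsigned char)(needle & 0xff);
    size_t pos = 0;

    assert(buf != NULL || len == 0);

    while (len >= 2 && pos <= len - 2) {
        /* Search for the first byte, leaving room for the second byte. */
        const unsigned char *hit = memchr(buf + pos, hi, len - 1 - pos);
        if (hit == NULL)
            return -1;
        pos = (size_t)(hit - buf);
        if (buf[pos + 1] == lo)
            return (long)pos;
        pos++;  /* partial match: resume right after the first byte */
    }
    return -1;
}
```

Keeping a single absolute position, rather than a separate "bytes searched so far" counter, is what makes the resume step after a partial match straightforward in this sketch.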
|
|
Add BASE_SHOW_UTF_8_PRINTABLE and related function tvb_utf_8_isprint
for supporting fields of bytes that are "maybe UTF-8" (default or
SHOULD be UTF-8 but could be something else, with no encoding indicator),
such as SSID fields in IEEE 802.11 (See #16208), certain OctetString
fields in Diameter or PFCP, and other places where
BASE_SHOW_ASCII_PRINTABLE is currently used. Fix #5307
|
|
tvb_ascii_isprint, like other tvb_ functions, accepts -1 as a length,
meaning "to the end of the tvb". Get the real length for the loop.
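
A small sketch of that convention on a plain buffer, assuming a hypothetical buf_ascii_isprint() that returns false for out-of-range requests (the real tvb functions throw exceptions instead). The negative length is resolved to a real length once, before the loop.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the Wireshark API): return true if every byte in
 * buf[offset .. offset+length) is printable ASCII. A negative `length`
 * means "to the end of the buffer".
 */
static bool buf_ascii_isprint(const unsigned char *buf, size_t buf_len,
                              size_t offset, long length)
{
    if (offset > buf_len)
        return false;

    size_t remaining = buf_len - offset;
    /* Resolve the "to the end" convention before looping. */
    size_t real_len = (length < 0) ? remaining : (size_t)length;
    if (real_len > remaining)
        return false;

    for (size_t i = 0; i < real_len; i++) {
        unsigned char c = buf[offset + i];
        if (c < 0x20 || c > 0x7e)
            return false;
    }
    return true;
}
```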
|
|
This assert will notify the higher layers that the dissector needs
to be fixed. ieee1722 and zbee-zcl dissectors have been updated to
prevent such a call.
Ref: #17882.
|
|
|
|
|
|
|
|
Replace:
g_snprintf() -> snprintf()
g_vsnprintf() -> vsnprintf()
g_strdup_printf() -> ws_strdup_printf()
g_strdup_vprintf() -> ws_strdup_vprintf()
This is more portable, user-friendly and faster on platforms
where GLib does not like the native I/O.
Adjust the format string to use macros from inttypes.h.
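
As an illustration of the format-string adjustment, a standard snprintf() call can print a 64-bit value portably with the PRIu64 macro. format_counter() is a hypothetical name used only for this sketch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical example: format a 64-bit counter with the standard
 * snprintf() and a format macro from <inttypes.h>.
 * Returns what snprintf() returns (the length that would be written).
 */
static int format_counter(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "count=%" PRIu64, value);
}
```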
|
|
Add the ISO 8601 Basic date time format as another string time
option. This could be used for e.g. ASN.1 GeneralizedTime.
Add tests for it.
|
|
Add support in iso8601_to_nstime for the ISO 8601 Basic date/time
format that lacks the - and : separators.
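
To show what the Basic format looks like without the separators, here is a sketch that parses only the fixed form YYYYMMDDThhmmssZ into its fields. It is not iso8601_to_nstime(): the struct and function names are hypothetical, and fractional seconds, time zone offsets, and shortened forms are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the parsed fields. */
struct basic_datetime {
    int year, month, day, hour, minute, second;
};

/* Read exactly n decimal digits from *p; advance *p only on success. */
static bool read_digits(const char **p, int n, int *out)
{
    int v = 0;
    for (int i = 0; i < n; i++) {
        unsigned char c = (unsigned char)(*p)[i];
        if (!isdigit(c))
            return false;   /* also stops safely at the terminating NUL */
        v = v * 10 + (c - '0');
    }
    *p += n;
    *out = v;
    return true;
}

/* Parse "YYYYMMDDThhmmssZ" (UTC only in this sketch). */
static bool parse_iso8601_basic(const char *s, struct basic_datetime *dt)
{
    assert(s != NULL && dt != NULL);

    if (!read_digits(&s, 4, &dt->year) ||
        !read_digits(&s, 2, &dt->month) ||
        !read_digits(&s, 2, &dt->day))
        return false;
    if (*s != 'T')
        return false;
    s++;
    if (!read_digits(&s, 2, &dt->hour) ||
        !read_digits(&s, 2, &dt->minute) ||
        !read_digits(&s, 2, &dt->second))
        return false;
    if (s[0] != 'Z' || s[1] != '\0')
        return false;

    /* Coarse range checks; 60 allows for a leap second. */
    return dt->month >= 1 && dt->month <= 12 &&
           dt->day >= 1 && dt->day <= 31 &&
           dt->hour <= 23 && dt->minute <= 59 && dt->second <= 60;
}
```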
|
|
Move epan_memmem() and epan_strcasestr() to wsutil/str_util.
Rename to ws_memmem() and ws_strcasestr(). Add compile time
check for a system implementation and use that if available.
We invoke those functions using a wrapper to avoid exposing
_GNU_SOURCE outside of the implementation.
|
|
Have tvb_get_string_time use iso8601_to_nstime for
ENC_ISO_8601_DATE_TIME (which seems to be the only time in a string
encoding any built-in dissector actually uses, in syslog). It is
strictly superior; among other things it handles fractional seconds.
Also, tvbuff.c does not use strptime, so remove that include.
|
|
All fragment errors are bounds errors that go past the contained length,
but they do not necessarily involve going past the reported length,
so the checks for FragmentBoundsError should reflect that.
With some forms of reassembly, like IP fragmentation, we don't know how
big the PDU/reassembled packet is until reassembly is complete, so we
probably use tvb_new_subset_remaining() to create fragments and the tvb's
reported length is equal to its contained length. In these cases
ReportedBoundsError would be otherwise thrown, except when the existing
checks for FragmentBoundsError intervene.
However, with other forms of reassembly, like various PDUs carried over TCP,
we know the total PDU length, so we use tvb_new_subset_length[_caplen](),
setting the proper reported length, but not changing the contained
length when reassembly is not performed. In those cases, a bounds error
that occurs due to lack of reassembly is otherwise a ContainedBoundsError,
not a ReportedBoundsError.
In both cases, a bounds error caused by an unreassembled fragment should
be a FragmentBoundsError for the existing reasons. It is not necessarily
a malformed packet (to the extent reassembly is not performed because of a
malformed error elsewhere, that should be reported separately) and can
likely be avoided by changing preferences (e.g., turning reassembly
preferences on, turning off checksum verification, etc.) Otherwise it
is probably a dissector bug.
|
|
Implement little endian support for tvb_get_bits family of functions.
The big/little endian refers to bit numbering within an octet. In big
endian, the most significant bit is considered bit 0, while in little
endian the least significant bit is considered bit 0.
Add encoding parameters to proto tree bits format family functions.
Specify ENC_BIG_ENDIAN in all dissectors using these functions except in
USB HID that requires ENC_LITTLE_ENDIAN to work correctly.
When formatting bits values, always display most significant bit on the
leftmost position regardless of the encoding. This results in no gaps
between octets and makes the displayed value comprehensible.
Close #4478
Fix #17014
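
The two bit-numbering conventions can be illustrated for a single bit as follows. get_bit() is a hypothetical helper, not part of the tvb_get_bits family, and it does not attempt to reproduce how multi-bit fields spanning octets are assembled.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: fetch one bit from a byte buffer.
 * With lsb_first == false, bit 0 of each octet is its most significant bit
 * ("big endian" numbering); with lsb_first == true, bit 0 is the least
 * significant bit ("little endian" numbering).
 * The caller must ensure bit_offset / 8 is within the buffer.
 */
static unsigned get_bit(const unsigned char *buf, size_t bit_offset,
                        bool lsb_first)
{
    unsigned char octet = buf[bit_offset / 8];
    unsigned bit_in_octet = (unsigned)(bit_offset % 8);
    unsigned shift = lsb_first ? bit_in_octet : 7u - bit_in_octet;

    return (octet >> shift) & 1u;
}
```

For the octet 0x80, bit 0 is set under the big endian numbering and clear under the little endian numbering.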
|
|
A few of them just needed scratch memory, so allocate and free it
manually after doing any exception-raising checks.
A few others were returning memory, and needed conversion to accept a
wmem scope argument.
|
|
|
|
It has been added since its length is signed, while the underlying
bytes_to_str uses a size_t, causing an unwanted cast. Basically
passing a len < 0 is pointless.
|
|
|
|
It is to tvb_reported_length_remaining() as
tvb_ensure_captured_length_remaining() is to
tvb_captured_length_remaining() - it throws an exception if the offset
is out of range.
(Note that an offset that's just past the end of the {reported,
captured} data is *not* out of range, it just means that there is no
data remaining. Anything *past* that is out of range and thus invalid.)
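
The range rule in the note above can be sketched with a plain length and offset. remaining_or_error() is a hypothetical stand-in that returns -1 where the real function would throw an exception.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes remaining from `offset` in a buffer
 * of `total_len` bytes. An offset equal to total_len is valid and yields 0;
 * anything beyond that is out of range and yields -1 in this sketch.
 */
static long remaining_or_error(size_t total_len, size_t offset)
{
    if (offset > total_len)
        return -1;
    return (long)(total_len - offset);
}
```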
|
|
Process the characters entirely ourselves; that way, we don't have to
worry about tvb_get_string_enc(..., ENC_ASCII) mangling label length
values, can convert non-ASCII characters in labels to the Unicode
REPLACEMENT CHARACTER, and can do bounds checks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Add various other encodings that differ from ASCII in the ISO/IEC 646
invariant region to the reject list for validate_single_byte_ascii_encoding()
|
|
Add internal support for using iconv (always present with glib) to convert
strings from various encodings to UTF-8 (using REPLACEMENT CHARACTER as
recommended), and use that to support GB 18030 and EUC-KR. Replace the direct
call to iconv in ANSI 637 for EUC-KR with the new API. Update comments
and documentation around character encodings. It is possible to replace
the calls to iconv with an internal decoder later. Tested on Linux and
on Windows (including with illegal characters). Closes #16630.
|
|
Implement the Unicode Standard "best practices" for replacing ill-formed
sequences with the Unicode REPLACEMENT CHARACTER. Add wmem_strbuf_append_len
for appending strings with embedded null characters. Clarify why
wmem_strbuf_grow() doesn't always ensure that there's enough room for
a new string, and short-circuit some tests there. Related to #14948
|
|
Add an encoding for "unpacked" 3GPP TS 23.038 7-bit strings, in which
each code position is in a byte of its own, rather than with the code
positions packed into 7 bits. Rename the packed encoding to explicitly
indicate that it's packed.
Add an encoding for ETSI TS 102 221 Annex A strings.
Use the new encodings.
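
The difference between the packed and unpacked forms can be sketched by turning a packed buffer into one septet per byte. unpack_septets() is a hypothetical name; the output is raw 7-bit code positions, which would still need mapping through the 3GPP TS 23.038 alphabet tables to become text.

```c
#include <stddef.h>

/*
 * Hypothetical helper: unpack up to n_septets 7-bit code positions from the
 * packed buffer in[0..in_len) into out[], one septet per byte. Septets are
 * packed starting from the least significant bit of each octet.
 * Returns the number of septets written (fewer if the input runs out).
 */
static size_t unpack_septets(const unsigned char *in, size_t in_len,
                             unsigned char *out, size_t n_septets)
{
    size_t i;

    for (i = 0; i < n_septets; i++) {
        size_t bit = i * 7;
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit % 8);

        if (byte >= in_len)
            break;
        unsigned v = (unsigned)in[byte] >> shift;
        if (shift > 1) {
            /* The septet continues into the next octet. */
            if (byte + 1 >= in_len)
                break;
            v |= (unsigned)in[byte + 1] << (8 - shift);
        }
        out[i] = (unsigned char)(v & 0x7f);
    }
    return i;
}
```

The five octets E8 32 9B FD 06 unpack to the septets 68 65 6C 6C 6F, which the default alphabet maps to "hello".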
|
|
Change-Id: I2fad824ca417dcd089fabfdf06f28529c7ee9e87
Signed-off-by: Filipe Laíns <lains@archlinux.org>
Reviewed-on: https://code.wireshark.org/review/37949
Petri-Dish: Anders Broman <a.broman58@gmail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
|
|
Group them by the data types for which they're used, starting with the
byte-order definitions which (with the inclusion of ENC_NA) are used
with all types.
Put all the ones used for strings together, starting with the character
encodings, with the Zigbee flag and the flags for "this is a string but
we're going to interpret it as a byte array or time stamp".
Make ENC_CHARENCODING_MASK equal to ENC_STR_MASK; no, there's no reason
for ENC_STR_MASK to replace ENC_CHARENCODING_MASK - the opposite should
happen, as ENC_CHARENCODING_MASK at least specifies what the bits set in
it are used for, namely character encodings. If all #defines for
strings should have _STR_ in them, start with the character encodings.
Change-Id: I072420f313086153b4ea4034911fc293453dea00
Reviewed-on: https://code.wireshark.org/review/36962
Petri-Dish: Guy Harris <gharris@sonic.net>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <gharris@sonic.net>
|
|
Add some ENC_ values for various flavors of packed BCD, and use that
instead of explicitly calling tvb_bcd_dig_to_wmem_packet_str() and
adding the result.
Change-Id: I07511d9d09c9231b610c121cd6ffb3b16fb017a9
Reviewed-on: https://code.wireshark.org/review/36952
Reviewed-by: Guy Harris <gharris@sonic.net>
|
|
tvb_find_line_end(), unlike a tvb_find_guint8() looking for an LF,
returns a length that *doesn't* include the line ending, *regardless* of
whether the line ends with CR-LF or just LF, so the query string we
extract is just the query, without any of the line ending.
Update some comments while we're at it to note that the "next_offset"
pointer argument to tvb_find_line_end() and tvb_find_line_end_unquoted()
can be NULL, in which case the offset *past* the line ending isn't
returned. (We pass tvb_find_line_end() NULL in the aforementioned call,
because, in that particular case, we don't care about the next line.)
Change-Id: I1c9746e32c61a79f8cb636d577a2e14a07ecab17
Reviewed-on: https://code.wireshark.org/review/35566
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
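
The line-length behavior described in this commit can be sketched on a plain buffer. find_line_end() below is a hypothetical stand-in with a simplified signature, not tvb_find_line_end() itself; when no line ending is found it simply treats the rest of the buffer as the line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the line starting at `offset`,
 * excluding the line ending, whether that ending is CR-LF or a bare LF.
 * If `next_offset` is not NULL, it receives the offset just past the line
 * ending. Returns -1 if `offset` is beyond the buffer.
 */
static long find_line_end(const char *buf, size_t len, size_t offset,
                          size_t *next_offset)
{
    if (offset > len)
        return -1;

    const char *start = buf + offset;
    const char *lf = memchr(start, '\n', len - offset);
    size_t line_len, next;

    if (lf == NULL) {
        line_len = len - offset;    /* no line ending: rest of the buffer */
        next = len;
    } else {
        line_len = (size_t)(lf - start);
        next = offset + line_len + 1;
        if (line_len > 0 && start[line_len - 1] == '\r')
            line_len--;             /* drop the CR of a CR-LF pair */
    }
    if (next_offset != NULL)
        *next_offset = next;
    return (long)line_len;
}
```

Passing NULL for next_offset is enough when only the current line matters, as in the query-string extraction described above.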
|
|
Add "native" support for the "zig-zag" version of a varint in proto.[ch] and
tvbuff.[ch]. Convert the use of varint in the KAFKA dissector to use the (new)
"native" API.
Ping-Bug: 15988
Change-Id: Ia83569203877df8c780f4f182916ed6327d0ec6c
Reviewed-on: https://code.wireshark.org/review/34386
Petri-Dish: Alexis La Goutte <alexis.lagoutte@gmail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
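
For readers unfamiliar with the encoding, a zig-zag varint can be decoded in two steps: read a base-128 little-endian varint, then map the unsigned result back to a signed value. The sketch below uses hypothetical names and is not the API added to proto.[ch] or tvbuff.[ch]; it assumes the common 7-bits-per-byte layout with the high bit as a continuation flag.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode an unsigned base-128 varint from buf[0..len).
 * Returns the number of bytes consumed, or 0 if the varint is truncated or
 * longer than 10 bytes.
 */
static size_t decode_varint(const unsigned char *buf, size_t len,
                            uint64_t *value)
{
    uint64_t v = 0;

    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if ((buf[i] & 0x80) == 0) {     /* continuation bit clear: done */
            *value = v;
            return i + 1;
        }
    }
    return 0;
}

/* Map a zig-zag encoded value back to signed: 0, 1, 2, 3 -> 0, -1, 1, -2. */
static int64_t zigzag_decode(uint64_t n)
{
    return (int64_t)(n >> 1) ^ -(int64_t)(n & 1);
}
```

Zig-zag keeps small negative numbers short on the wire, since -1 encodes as 1 rather than as a ten-byte two's complement varint.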
|
|
Change all wireshark.org URLs to use https.
Fix some broken links while we're at it.
Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c
Reviewed-on: https://code.wireshark.org/review/34089
Reviewed-by: Guy Harris <guy@alum.mit.edu>
|
|
That's what the remaining calls to tvb_get_nstringz() and
tvb_get_nstringz0() are being used to do, even though those routines
were not intended for that purpose - the calls are extracting from a
text protocol, meaning that the strings are *not* null-terminated in the
packet.
Strings - even null-terminated ones - should, in almost all cases, be
extracted by tvb_get_string_enc() or routines that call it, so that an
encoding is specified. In the few cases where we're fetching strings
only to be compared to ASCII constants, or to parse as numbers, we can
get away with this.
Change-Id: I29f0532902c4ade2207de7f06db69c32eafd4132
Reviewed-on: https://code.wireshark.org/review/34072
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
|
|
The "Basic code table" in ISO 646 is mostly ASCII, but some code points
either 1) have more than one glyph that can be assigned to them or 2)
have no glyph assigned to them. National versions choose one of the two
glyphs for the code points in group 1) and assign specific glyphs to the
code points in group 2); the International Reference Version assigns the
same glyphs to those code points as does ASCII.
For the "Basic code table" encoding, we map the code points in groups 1)
and 2) to a REPLACEMENT CHARACTER; additional encodings can be added for
the national versions.
Add ENC_ISO_646_IRV (International Reference Version) as an alias for
ENC_ASCII.
Expand some comments, and add some comments, while we're at it.
Change-Id: I4f1b5e426ec193775e919731c5cae1224dc65115
Reviewed-on: https://code.wireshark.org/review/33941
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
|
|
Clean up some comments while we're at it.
Change-Id: I0cd014bf1d1e7dc740eac1721d5466377938655f
Reviewed-on: https://code.wireshark.org/review/33939
Reviewed-by: Guy Harris <guy@alum.mit.edu>
|