path: root/epan/tvbuff.c
Age | Commit message | Author | Files | Lines
2022-12-21 | DECT-NWK: Add basic support for DECT charsets | Bernhard Dick | 1 | -0/+49
2022-12-03 | wmem: Remove strbuf max size parameter | João Valverde | 1 | -1/+1
This parameter was introduced as a safeguard for bugs that generate an unbounded string but its utility for that purpose is doubtful and the way it is being used creates problems with invalid truncation of UTF-8 strings. Rename wmem_strbuf_sized_new() with a better name.
2022-11-22 | Fix some issues seen with cppcheck | Martin Mathieson | 1 | -3/+3
2022-10-24 | epan: Fix build errors about try/catch block on some compilers | Brian Sipos | 1 | -1/+1
The original error was identified in a comment on !8583
2022-10-21 | epan: Fix the end offsets for hex string items | John Thacker | 1 | -1/+1
hex_str_to_bytes_encoding() consumes pairs of hex digits (and optional separator) to turn into bytes. It can return a pointer to the character after the last digit consumed. Don't advance the end pointer after a single unpaired digit that is not consumed as part of the hex string returned. tvb_get_string_bytes() can pass back the end offset. If conversion fails, return the initial offset instead of zero to make repeated calls easier in cases where the full length is not decoded due to errors. Relatedly, no dissector currently uses this return value, because it's not useful currently.
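The end-pointer rule described above can be illustrated with a minimal, standalone sketch. This is not the Wireshark implementation: `hex_pairs_to_bytes()` and `hex_digit_value()` are hypothetical names, and the sketch assumes a single optional separator character between pairs. It shows the two behaviors the message describes: an unpaired trailing digit is not consumed, and on failure the end pointer stays at the start of the input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch: decode pairs of hex digits from the NUL-terminated
 * string s, optionally separated by sep (pass '\0' for "no separator").
 * Returns the number of bytes written to out. *endptr is set to the
 * character after the last digit that was consumed as part of a full pair.
 */
static size_t hex_pairs_to_bytes(const char *s, char sep, uint8_t *out,
                                 size_t out_cap, const char **endptr)
{
    const char *p = s;   /* end of the input consumed so far */
    size_t n = 0;

    while (n < out_cap) {
        const char *q = p;

        /* A separator is only accepted between two pairs. */
        if (n > 0 && sep != '\0' && *q == sep)
            q++;

        int hi = hex_digit_value(q[0]);
        if (hi < 0)
            break;
        int lo = hex_digit_value(q[1]);
        if (lo < 0)
            break;       /* unpaired digit: p is left untouched */

        out[n++] = (uint8_t)((hi << 4) | lo);
        p = q + 2;
    }

    if (endptr != NULL)
        *endptr = p;
    return n;
}
```

With the input `"12:34:5"` and separator `':'`, the sketch yields two bytes and an end pointer just after `"12:34"`, so a caller can resume from there on a later call.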
2022-10-20 | TCPCL: Clamp and indicate lengths too large for Wireshark to handle | Brian Sipos | 1 | -1/+1
This fixes Windows build errors reported in comments on !8541
2022-10-19 | epan: centralize SDNV processing along other similar varint types | Brian Sipos | 1 | -0/+23
This avoids having general-purpose decoding happening in non-DLL-exported functions defined in a dissector for #18478, and removes unused functions and avoids duplicate decoding. This also removes unnecessary early exit conditions for #18145. Unit test cases for varint decoding are added to verify this.
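For readers unfamiliar with SDNVs (Self-Delimiting Numeric Values, RFC 6256), a minimal decoder might look like the sketch below. It is an illustration only, not the code added by this commit: `sdnv_decode()` is a hypothetical name, and returning 0 for truncated or overflowing input is a choice made for the sketch. Each byte carries 7 value bits, most significant group first, and a set high bit means another byte follows.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of SDNV decoding.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the value does not fit in 64 bits.
 */
static size_t sdnv_decode(const uint8_t *buf, size_t len, uint64_t *value)
{
    uint64_t v = 0;

    for (size_t i = 0; i < len; i++) {
        if (v >> 57)
            return 0;                 /* shifting by 7 would drop bits */
        v = (v << 7) | (uint64_t)(buf[i] & 0x7F);
        if ((buf[i] & 0x80) == 0) {   /* high bit clear: last byte */
            *value = v;
            return i + 1;
        }
    }
    return 0;                         /* ran out of data mid-value */
}
```

For example, the two bytes `0x95 0x3C` decode to `0xABC`, which matches the worked example in RFC 6256.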
2022-10-18 | epan: Rename tvb_get_nstringz0() | João Valverde | 1 | -30/+3
Rename tvb_get_nstringz0() to tvb_get_raw_bytes_as_stringz() to reflect the fact that this function does not return a string (UTF-8 internal text string). Remove tvb_get_stringz() because it is unused and just seems dangerous.
2022-10-16 | ftp: deal with UTF-8 | John Thacker | 1 | -0/+16
Ensure that FTP doesn't add invalid strings to the tree or columns. Also allow UTF-8 pathnames to work. According to RFC 2640, FTP supports UTF-8 for pathnames (and it MUST be supported even if the other side does not advertise support for UTF-8, unless a different character set has been explicitly configured, which is out of scope of the RFCs, and we don't have such a preference.) So in general interpret strings as UTF-8, not ASCII. Reduce the use of tvb_get_ptr by using functions directly on the original tvb and offset. This also happens to be more compliant with RFC 2640 when getting the token lengths. (RFC 2640 states that implementations MUST assume that there is only one space between a command and the pathname, and treat additional spaces as part of the pathname instead of skipping them. tvb_get_token_len() does not skip trailing spaces, but get_token_len() does.) The only place that still uses tvb_get_ptr is when processing a PWD command, because it has to deal with the double quote escaping as a custom encoding. Add a tvb_ascii_isdigit function. Fix #18439.
2022-10-15 | Use wmem_strbuf_append_unichar_repl() to append a REPLACEMENT CHARACTER. | Guy Harris | 1 | -4/+1
Replace several instances in which a REPLACEMENT CHARACTER was being appended to a wmem_strbuf with a call to wmem_strbuf_append_unichar_repl(). This reduces the number of explicit 0x00fffd or 0xfffd or... in the code.
2022-09-03 | Fix bogus tvbuffs to make sure reported length >= captured length. | Guy Harris | 1 | -0/+15
A reported length less than a captured length is bogus, as you cannot capture more data than there is in a packet. Fixes #18313.
2022-09-03 | epan: Fix tvb_find_guint16 with previous partial matches | John Thacker | 1 | -7/+22
Fix tvb_find_guint16 when there is a partial match (first byte matches but second byte does not) in the buffer before an actual match. The function claims that it takes negative offsets and a negative maxlength value (for "to the end of the buffer.") Convert those to absolute offsets and limits at the start of the function rather than repeatedly having special checks for negatives. Fix the "number of bytes searched so far" calculation, which was only correct for negative offsets (but only used when there was a partial match.)
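The partial-match case can be sketched outside of the tvbuff API as follows. This is a hedged illustration, not the patched function: `find_u16_be()` is a hypothetical name, it works on a plain byte array with absolute offsets only, and it assumes the needle is compared high byte first. The point is that after a first-byte match whose second byte differs, the search resumes one byte later instead of giving up or miscounting.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: find the first occurrence of a 16-bit value
 * (high byte first) in buf[0..len). Returns its offset, or -1.
 */
static ptrdiff_t find_u16_be(const uint8_t *buf, size_t len, uint16_t needle)
{
    const uint8_t hi = (uint8_t)(needle >> 8);
    const uint8_t lo = (uint8_t)(needle & 0xFF);
    size_t pos = 0;

    /* The first byte can be at most at index len - 2. */
    while (pos + 1 < len) {
        const uint8_t *hit = memchr(buf + pos, hi, len - 1 - pos);
        if (hit == NULL)
            return -1;

        size_t at = (size_t)(hit - buf);
        if (buf[at + 1] == lo)
            return (ptrdiff_t)at;

        /* Partial match only: continue right after the matched first byte. */
        pos = at + 1;
    }
    return -1;
}
```

Searching `{0x12, 0x00, 0x12, 0x34}` for `0x1234` skips the partial match at offset 0 and reports offset 2.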
2022-02-06 | epan: Add BASE_SHOW_UTF_8_PRINTABLE | John Thacker | 1 | -0/+12
Add BASE_SHOW_UTF_8_PRINTABLE and related function tvb_utf_8_isprint for supporting fields of bytes that are "maybe UTF-8" (default or SHOULD be UTF-8 but could be something else, with no encoding indicator), such as SSID fields in IEEE 802.11 (See #16208), certain OctetString fields in Diameter or PFCP, and other places where BASE_SHOW_ASCII_PRINTABLE is currently used. Fix #5307
2022-02-03 | epan: Handle -1 length in tvb_ascii_isprint | John Thacker | 1 | -1/+6
tvb_ascii_isprint like other tvb_ functions accepts -1 as a parameter, meaning "to the end of the tvb". Get the real length for the loop.
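The "-1 means to the end" convention can be shown with a small sketch on a plain buffer. The function name `buf_ascii_isprint()` and its parameters are hypothetical, and where the real tvb_ functions would throw an exception for an out-of-range request, this sketch simply returns false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: check that `length` bytes starting at `offset`
 * are printable ASCII. A negative length means "to the end of the buffer",
 * so the real length is computed before the loop runs.
 */
static bool buf_ascii_isprint(const uint8_t *buf, size_t buf_len,
                              size_t offset, int length)
{
    if (offset > buf_len)
        return false;

    size_t remaining = buf_len - offset;
    size_t n = (length < 0) ? remaining : (size_t)length;
    if (n > remaining)
        return false;

    for (size_t i = 0; i < n; i++) {
        uint8_t c = buf[offset + i];
        if (c < 0x20 || c > 0x7E)   /* outside printable ASCII */
            return false;
    }
    return true;
}
```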
2022-01-19 | tvbuff: assert the called len is > 0. | Dario Lombardo | 1 | -0/+1
This assert will notify the higher layers that the dissector needs to be fixed. ieee1722 and zbee-zcl dissectors have been updated to prevent such a call. Ref: #17882.
2022-01-18 | tvbuff: add robustness to tvb search related functions | Jaap Keuter | 1 | -0/+4
2022-01-16 | tvbuff: add robustness to memory copy related functions | Jaap Keuter | 1 | -2/+5
2021-12-23 | Refactor VARINT handling | Jaap Keuter | 1 | -3/+15
2021-12-19 | epan: Convert to use stdio.h from GLib | João Valverde | 1 | -2/+2
Replace: g_snprintf() -> snprintf(), g_vsnprintf() -> vsnprintf(), g_strdup_printf() -> ws_strdup_printf(), g_strdup_vprintf() -> ws_strdup_vprintf(). This is more portable, user-friendly and faster on platforms where GLib does not like the native I/O. Adjust the format string to use macros from inttypes.h.
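As a small illustration of the format-string adjustment mentioned above, the sketch below prints a 64-bit value with standard `snprintf()` and the `PRIu64` macro from `<inttypes.h>`. The function `format_offset()` is a hypothetical example, not code from this commit.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical example: format a 64-bit offset with the C standard
 * library instead of a GLib wrapper. PRIu64 expands to the correct
 * conversion specifier for uint64_t on the current platform.
 */
static int format_offset(char *buf, size_t size, uint64_t offset)
{
    return snprintf(buf, size, "offset=%" PRIu64, offset);
}
```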
2021-12-02 | epan: Add ENC_ISO_8601_DATE_TIME_BASIC | John Thacker | 1 | -0/+5
Add the ISO 8601 Basic date time format as another string time option. This could be used for e.g. ASN.1 GeneralizedTime. Add tests for it.
2021-12-01 | nstime: Support ISO 8601 basic format | John Thacker | 1 | -1/+1
Add support in iso8601_to_nstime for the ISO 8601 Basic date/time format that lacks the - and : separators.
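To make the difference concrete, the basic format writes `2021-12-01T09:30:05` as `20211201T093005`. The sketch below parses only that fixed-width local-time form; it is not iso8601_to_nstime(), and `parse_iso8601_basic()`, `two_digits()`, and `struct basic_datetime` are hypothetical names. Time zones, fractional seconds, and range validation are deliberately left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the parsed fields. */
struct basic_datetime {
    int year, month, day, hour, minute, second;
};

/* Convert two ASCII digits (already validated) to an integer. */
static int two_digits(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/*
 * Hypothetical sketch: parse exactly "YYYYMMDDThhmmss".
 * Returns false for any other shape, including the extended format
 * with '-' and ':' separators.
 */
static bool parse_iso8601_basic(const char *s, struct basic_datetime *out)
{
    if (strlen(s) != 15 || s[8] != 'T')
        return false;
    for (int i = 0; i < 15; i++) {
        if (i != 8 && !isdigit((unsigned char)s[i]))
            return false;
    }

    out->year   = two_digits(s) * 100 + two_digits(s + 2);
    out->month  = two_digits(s + 4);
    out->day    = two_digits(s + 6);
    out->hour   = two_digits(s + 9);
    out->minute = two_digits(s + 11);
    out->second = two_digits(s + 13);
    return true;
}
```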
2021-11-28 | Move two functions from epan to wsutil/str_util | João Valverde | 1 | -1/+1
Move epan_memmem() and epan_strcasestr() to wsutil/str_util. Rename to ws_memmem() and ws_strcasestr(). Add compile time check for a system implementation and use that if available. We invoke those functions using a wrapper to avoid exposing _GNU_SOURCE outside of the implementation.
2021-11-27 | tvbuff: Use iso8601_to_nstime | John Thacker | 1 | -178/+109
Have tvb_get_string_time use iso8601_to_nstime for ENC_ISO_8601_DATE_TIME (which seems to be the only time in a string encoding any built in dissector actually uses, in syslog). It is strictly superior; among other things it handles fractional seconds. Also, tvbuff.c does not use strptime, so remove that include.
2021-10-23 | exceptions: set FragmentBoundsError priority above ContainedBoundsError | John Thacker | 1 | -34/+32
All fragment errors are bounds errors that go past the contained length, but they do not necessarily involve going past the reported length, so the checks for FragmentBoundsError should reflect that. With some forms of reassembly, like IP fragmentation, we don't know how big the PDU/reassembled packet is until reassembly is complete, so we probably use tvb_new_subset_remaining() to create fragments and the tvb's reported length is equal to its contained length. In these cases ReportedBoundsError would be otherwise thrown, except when the existing checks for FragmentBoundsError intervene. However, with other forms of reassembly, like various PDUs carried over TCP, we know the total PDU length, so we use tvb_new_subset_length[_caplen](), setting the proper reported length, but not changing the contained length when reassembly is not performed. In those cases, a bounds error that occurs due to lack of reassembly is otherwise a ContainedBoundsError, not a ReportedBoundsError. In both cases, a bounds error caused by an unreassembled fragment should be a FragmentBoundsError for the existing reasons. It is not necessarily a malformed packet (to the extent reassembly is not performed because of a malformed error elsewhere, that should be reported separately) and can likely be avoided by changing preferences (e.g., turning reassembly preferences on, turning off checksum verification, etc.) Otherwise it is probably a dissector bug.
2021-09-26 | USB HID: Parse bit fields with correct bit order | Tomasz Moń | 1 | -16/+169
Implement little endian support for tvb_get_bits family of functions. The big/little endian refers to bit numbering within an octet. In big endian, the most significant bit is considered bit 0, while in little endian the least significant bit is considered bit 0. Add encoding parameters to proto tree bits format family functions. Specify ENC_BIG_ENDIAN in all dissectors using these functions except in USB HID that requires ENC_LITTLE_ENDIAN to work correctly. When formatting bits values, always display most significant bit on the leftmost position regardless of the encoding. This results in no gaps between octets and makes the displayed value comprehensible. Close #4478 Fix #17014
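The two bit-numbering conventions described above can be sketched for a single bit. `get_bit()` is a hypothetical helper, not part of the tvb_get_bits family, and it only covers the one-bit case; how multi-bit fields that span octets are assembled is outside this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: fetch one bit from a byte array.
 * With big endian bit numbering, bit 0 of an octet is its most
 * significant bit; with little endian numbering, bit 0 is its least
 * significant bit.
 */
static unsigned get_bit(const uint8_t *buf, size_t bit_offset,
                        bool little_endian_bits)
{
    uint8_t octet = buf[bit_offset / 8];
    unsigned idx = (unsigned)(bit_offset % 8);
    unsigned shift = little_endian_bits ? idx : 7u - idx;

    return (octet >> shift) & 1u;
}
```

For the octet `0x80`, bit 0 is set under big endian numbering, while under little endian numbering the same set bit is bit 7.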
2021-09-01 | tvbuff: convert helper methods to pinfo->pool | Evan Huus | 1 | -12/+22
A few of them just needed scratch memory, so allocate and free it manually after doing any exception-raising checks. A few others were returning memory, and needed conversion to accept a wmem scope argument.
2021-07-29 | wsutil: rename bytestring_to_str() -> bytes_to_str_punct() | João Valverde | 1 | -1/+1
2021-07-01 | tvbuff: add a DISSECTOR_ASSERT to tvb_bytes_to_str. | Dario Lombardo | 1 | -0/+1
It has been added since its length is signed, while the underlying bytes_to_str uses a size_t, causing an unwanted cast. Basically passing a len < 0 is pointless.
2021-06-19 | Replace g_assert() with ws_assert() | João Valverde | 1 | -2/+3
2021-06-15 | tvbuff: add tvb_ensure_reported_length_remaining(). | Guy Harris | 1 | -3/+18
It is to tvb_reported_length_remaining() as tvb_ensure_captured_length_remaining() is to tvb_captured_length_remaining() - it throws an exception if the offset is out of range. (Note that an offset that's just past the end of the {reported, captured} data is *not* out of range, it just means that there is no data remaining. Anything *past* that is out of range and thus invalid.)
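The boundary rule in the parenthetical can be sketched in a few lines. `length_remaining_checked()` is a hypothetical stand-in: the real function throws an exception for an out-of-range offset, whereas this sketch returns -1 so it stays self-contained.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the range rule: an offset equal to the length
 * is valid and leaves zero bytes remaining; only an offset strictly
 * past the end is out of range (reported here as -1).
 */
static ptrdiff_t length_remaining_checked(size_t reported_length, size_t offset)
{
    if (offset > reported_length)
        return -1;
    return (ptrdiff_t)(reported_length - offset);
}
```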
2021-05-21 | epan: redo the processing of ENC_APN_STR. | Guy Harris | 1 | -17/+71
Process the characters entirely ourselves; that way, we don't have to worry about tvb_get_string_enc(..., ENC_ASCII) mangling label length values, can convert non-ASCII characters in labels to the Unicode REPLACEMENT CHARACTER, and can do bounds checks.
2021-05-20 | Use ENC_APN_STR in one more place. | Anders Broman | 1 | -4/+6
2021-05-20 | Add ENC_APN_STR to handle APN strings | Anders Broman | 1 | -0/+18
2021-02-22 | ZVT: Added dissection of amount, terminal ID, date and time. Registration fix. | Grzegorz Niemirowski | 1 | -9/+24
2020-12-10 | Introduce ENC_BCD_ODD_NUM_DIG in order to handle odd number of digits | Anders Broman | 1 | -7/+19
2020-11-18 | tvb_get_bcd_string: 0xf can both be filler and stop digit. | Anders Broman | 1 | -4/+2
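As background for the BCD entries above, a packed BCD decoder that treats the nibble 0xF as the end of the digits might look like the sketch below. It is not tvb_get_bcd_string(): `bcd_to_string()` is a hypothetical name, the low-nibble-first digit order is an assumption of this sketch, and nibbles 0xA to 0xE are shown as `'?'` for simplicity. Stopping at 0xF covers both roles named in the commit title, since a filler nibble after an odd number of digits also ends the string.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: decode packed BCD into a NUL-terminated string.
 * Assumes the low nibble of each octet holds the earlier digit.
 * Decoding stops at the first 0xF nibble or when out is full.
 * Returns the number of digits written.
 */
static size_t bcd_to_string(const uint8_t *buf, size_t len,
                            char *out, size_t out_cap)
{
    size_t n = 0;

    if (out_cap == 0)
        return 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t nibbles[2] = { buf[i] & 0x0F, buf[i] >> 4 };

        for (int k = 0; k < 2; k++) {
            if (nibbles[k] == 0x0F || n + 1 >= out_cap) {
                out[n] = '\0';
                return n;
            }
            out[n++] = (nibbles[k] <= 9) ? (char)('0' + nibbles[k]) : '?';
        }
    }
    out[n] = '\0';
    return n;
}
```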
2020-10-28 | Encodings: Add FT_STRINGZ support for GB18030, EUC-KR | John Thacker | 1 | -0/+36
2020-10-22 | Update validate_single_byte_ascii_encoding with new encodings | John Thacker | 1 | -0/+21
Add various other encodings that differ from ASCII in the ISO/IEC 646 invariant region to the reject list for validate_single_byte_ascii_encoding()
2020-10-21 | Use iconv to support GB 18030 and EUC-KR, allow future encodings | John Thacker | 1 | -6/+49
Add support internally to using iconv (always present with glib) to convert strings from various encodings to UTF-8 (using REPLACEMENT CHARACTER as recommended), and use that to support GB 18030 and EUC-KR. Replace call directly to iconv in ANSI 637 for EUC-KR to new API. Update comments and documentation around character encodings. It is possible to replace the calls to iconv with an internal decoder later. Tested on Linux and on Windows (including with illegal characters). Closes #16630.
2020-10-15 | Replace ill-formed UTF-8 byte sequences with replacement character | John Thacker | 1 | -15/+12
Implement the Unicode Standard "best practices" for replacing ill-formed sequences with the Unicode REPLACEMENT CHARACTER. Add wmem_strbuf_append_len for appending strings with embedded null characters. Clarify why wmem_strbuf_grow() doesn't always ensure that there's enough room for a new string, and short-circuit some tests there. Related to #14948
2020-09-28 | Add some more string encodings. | Guy Harris | 1 | -6/+41
Add an encoding for "unpacked" 3GPP TS 23.038 7-bit strings, in which each code position is in a byte of its own, rather than with the code positions packed into 7 bits. Rename the packed encoding to explicitly indicate that it's packed. Add an encoding for ETSI TS 102 221 Annex A strings. Use the new encodings.
2020-08-21 | tvb: add tvb_get_bits_array | Filipe Laíns | 1 | -0/+13
Change-Id: I2fad824ca417dcd089fabfdf06f28529c7ee9e87 Signed-off-by: Filipe Laíns <lains@archlinux.org> Reviewed-on: https://code.wireshark.org/review/37949 Petri-Dish: Anders Broman <a.broman58@gmail.com> Tested-by: Petri Dish Buildbot Reviewed-by: Anders Broman <a.broman58@gmail.com>
2020-04-28 | Clean up the encoding value definitions. | Guy Harris | 1 | -1/+1
Group them by the data types for which they're used, starting with the byte-order definitions which (with the inclusion of ENC_NA) are used with all types. Put all the ones used for strings together, starting with the character encodings, with the Zigbee flag and the flags for "this is a string but we're going to interpret it as a byte array or time stamp". Make ENC_CHARENCODING_MASK equal to ENC_STR_MASK; no, there's no reason for ENC_STR_MASK to replace ENC_CHARENCODING_MASK - the opposite should happen, as ENC_CHARENCODING_MASK at least specifies what the bits set in it are used for, namely character encodings. If all #defines for strings should have _STR_ in them, start with the character encodings. Change-Id: I072420f313086153b4ea4034911fc293453dea00 Reviewed-on: https://code.wireshark.org/review/36962 Petri-Dish: Guy Harris <gharris@sonic.net> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <gharris@sonic.net>
2020-04-27 | Add string encoding values for various BCD encodings, and use them. | Guy Harris | 1 | -36/+95
Add some ENC_ values for various flavors of packed BCD, and use that instead of explicitly calling tvb_bcd_dig_to_wmem_packet_str() and adding the result. Change-Id: I07511d9d09c9231b610c121cd6ffb3b16fb017a9 Reviewed-on: https://code.wireshark.org/review/36952 Reviewed-by: Guy Harris <gharris@sonic.net>
2019-12-26 | Find the line ending using tvb_find_line_end(). | Guy Harris | 1 | -6/+6
tvb_find_line_end(), unlike a tvb_find_guint8() looking for an LF, returns a length that *doesn't* include the line ending, *regardless* of whether the line ends with CR-LF or just LF, so the query string we extract is just the query, without any of the line ending. Update some comments while we're at it to note that the "next_offset" pointer argument to tvb_find_line_end() and tvb_find_line_end_unquoted() can be NULL, in which case the offset *past* the line ending isn't returned. (We pass tvb_find_line_end() NULL in the aforementioned call, because, in that particular case, we don't care about the next line.) Change-Id: I1c9746e32c61a79f8cb636d577a2e14a07ecab17 Reviewed-on: https://code.wireshark.org/review/35566 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-09-05 | kafka: Cleanup to use "native" APIs. | Michael Mann | 1 | -1/+16
Add "native" support for the "zig-zag" version of a varint in proto.[ch] and tvbuff.[ch]. Convert the use of varint in the KAFKA dissector to use the (new) "native" API. Ping-Bug: 15988 Change-Id: Ia83569203877df8c780f4f182916ed6327d0ec6c Reviewed-on: https://code.wireshark.org/review/34386 Petri-Dish: Alexis La Goutte <alexis.lagoutte@gmail.com> Tested-by: Petri Dish Buildbot Reviewed-by: Alexis La Goutte <alexis.lagoutte@gmail.com> Reviewed-by: Anders Broman <a.broman58@gmail.com>
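For context, "zig-zag" encoding maps signed integers to unsigned ones so that small magnitudes, positive or negative, stay small as varints: 0, -1, 1, -2, 2 become 0, 1, 2, 3, 4. A minimal sketch of the decoding step, applied after the raw varint has been read, is shown below. `zigzag_decode()` is a hypothetical name, not the API added by this commit.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: undo zig-zag encoding on an already-decoded
 * unsigned varint. The low bit selects the sign; the remaining bits
 * hold the magnitude. (~(n & 1) + 1) is all ones when the low bit is
 * set and zero otherwise, so the XOR flips every bit for negatives.
 */
static int64_t zigzag_decode(uint64_t n)
{
    return (int64_t)((n >> 1) ^ (~(n & 1) + 1));
}
```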
2019-07-26 | HTTPS (almost) everywhere. | Guy Harris | 1 | -1/+1
Change all wireshark.org URLs to use https. Fix some broken links while we're at it. Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c Reviewed-on: https://code.wireshark.org/review/34089 Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-07-24 | Add a routine to fetch raw bytes into a fixed-length buffer as a string. | Guy Harris | 1 | -0/+32
That's what the remaining calls to tvb_get_nstringz() and tvb_get_nstringz0() are being used to do, even though those routines were not intended for that purpose - the calls are extracting from a text protocol, meaning that the strings are *not* null-terminated in the packet. Strings - even null-terminated ones - should, in almost all cases, be extracted by tvb_get_string_enc() or routines that call it, so that an encoding is specified. In the few cases where we're fetching strings only to be compared to ASCII constants, or to parse as numbers, we can get away with this. Change-Id: I29f0532902c4ade2207de7f06db69c32eafd4132 Reviewed-on: https://code.wireshark.org/review/34072 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-07-15 | Add support for the ISO 646 "Basic code table" encoding. | Guy Harris | 1 | -0/+44
The "Basic code table" in ISO 646 is mostly ASCII, but some code points either 1) have more than one glyph that can be assigned to them or 2) have no glyph assigned to them. National versions choose one of the two glyphs for the code points in group 1) and assign specific glyphs to the code points in group 2); the International Reference Version assigns the same glyphs to those code points as does ASCII. For the "Basic code table" encoding, we map the code points in groups 1) and 2) to a REPLACEMENT CHARACTER; additional encodings can be added for the national versions. Add ENC_ISO_646_IRV (International Reference Version) as an alias for ENC_ASCII. Expand some comments, and add some comments, while we're at it. Change-Id: I4f1b5e426ec193775e919731c5cae1224dc65115 Reviewed-on: https://code.wireshark.org/review/33941 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-07-15 | Add support for code pages 855 and 856 for FT_STRINGZ strings. | Guy Harris | 1 | -2/+10
Clean up some comments while we're at it. Change-Id: I0cd014bf1d1e7dc740eac1721d5466377938655f Reviewed-on: https://code.wireshark.org/review/33939 Reviewed-by: Guy Harris <guy@alum.mit.edu>