`char` specification mismatch with implementation #789

`char` in the specification is described as being encoded as a `u32`:
https://github.com/bincode-org/bincode/blob/55fd02934cff567ce1b2ff9d007608818ea6481b/docs/spec.md?plain=1#L58

But it appears that the actual implementation instead encodes and decodes each `char` as its UTF-8 byte sequence (1 to 4 bytes):

* Encode:
https://github.com/bincode-org/bincode/blob/55fd02934cff567ce1b2ff9d007608818ea6481b/src/enc/impls.rs#L290-L294
which calls `encode_utf8`:
https://github.com/bincode-org/bincode/blob/55fd02934cff567ce1b2ff9d007608818ea6481b/src/enc/impls.rs#L325-L349

* Decode:
https://github.com/bincode-org/bincode/blob/55fd02934cff567ce1b2ff9d007608818ea6481b/src/de/impls.rs#L425-L452

I assume this is a bug in the specification, and if so, it would be helpful to have it rectified.
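
To make the difference concrete, here is a minimal sketch using only the Rust standard library, not bincode itself. The helper names `char_as_u32_le` and `char_as_utf8` are hypothetical, and the fixed-width little-endian layout for the `u32` is an assumption made for illustration; other integer encodings would produce different bytes, but still not the UTF-8 sequence for non-ASCII characters.

```rust
/// What the spec wording suggests: the code point written as a `u32`.
/// Fixed-width little-endian is assumed here purely for illustration.
fn char_as_u32_le(c: char) -> [u8; 4] {
    (c as u32).to_le_bytes()
}

/// What the linked implementation appears to write: the UTF-8 bytes
/// of the `char`, which take between 1 and 4 bytes.
fn char_as_utf8(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}
```

For `'é'` (U+00E9), `char_as_u32_le` gives `[0xE9, 0x00, 0x00, 0x00]` while `char_as_utf8` gives `[0xC3, 0xA9]`, so a reader written against the specification would not decode what the implementation emits.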

	fn encode_utf8(writer: &mut impl Writer, c: char) -> Result<(), EncodeError> {
	    let code = c as u32;

	    if code < MAX_ONE_B {
	        writer.write(&[c as u8])
	    } else if code < MAX_TWO_B {
	        let mut buf = [0u8; 2];
	        buf[0] = ((code >> 6) & 0x1F) as u8 | TAG_TWO_B;
	        buf[1] = (code & 0x3F) as u8 | TAG_CONT;
	        writer.write(&buf)
	    } else if code < MAX_THREE_B {
	        let mut buf = [0u8; 3];
	        buf[0] = ((code >> 12) & 0x0F) as u8 | TAG_THREE_B;
	        buf[1] = ((code >> 6) & 0x3F) as u8 | TAG_CONT;
	        buf[2] = (code & 0x3F) as u8 | TAG_CONT;
	        writer.write(&buf)
	    } else {
	        let mut buf = [0u8; 4];
	        buf[0] = ((code >> 18) & 0x07) as u8 | TAG_FOUR_B;
	        buf[1] = ((code >> 12) & 0x3F) as u8 | TAG_CONT;
	        buf[2] = ((code >> 6) & 0x3F) as u8 | TAG_CONT;
	        buf[3] = (code & 0x3F) as u8 | TAG_CONT;
	        writer.write(&buf)
	    }
	}

	impl<Context> Decode<Context> for char {
	    fn decode<D: Decoder<Context = Context>>(decoder: &mut D) -> Result<Self, DecodeError> {
	        let mut array = [0u8; 4];

	        // Look at the first byte to see how many bytes must be read
	        decoder.reader().read(&mut array[..1])?;

	        let width = utf8_char_width(array[0]);
	        if width == 0 {
	            return Err(DecodeError::InvalidCharEncoding(array));
	        }
	        // Normally we have to `.claim_bytes_read` before reading, however in this
	        // case the amount of bytes read from `char` can vary wildly, and it should
	        // only read up to 4 bytes too much.
	        decoder.claim_bytes_read(width)?;
	        if width == 1 {
	            return Ok(array[0] as char);
	        }

	        // read the remaining pain
	        decoder.reader().read(&mut array[1..width])?;
	        let res = core::str::from_utf8(&array[..width])
	            .ok()
	            .and_then(|s| s.chars().next())
	            .ok_or(DecodeError::InvalidCharEncoding(array))?;
	        Ok(res)
	    }
	}
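
The decoder above relies on `utf8_char_width`, whose body is not shown in the linked range. As a rough sketch of what such a helper has to compute, assuming it follows the standard UTF-8 lead-byte ranges, the hypothetical function `lead_byte_width` below maps a first byte to the total sequence length and returns 0 for bytes that cannot start a sequence. It is a stand-in for illustration, not bincode's actual code.

```rust
/// Hypothetical stand-in for a `utf8_char_width`-style helper.
/// Returns the total length in bytes of the UTF-8 sequence that starts
/// with `first`, or 0 if `first` is not a valid lead byte.
fn lead_byte_width(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1, // ASCII, single byte
        0xC2..=0xDF => 2, // two-byte lead (0xC0 and 0xC1 would be overlong)
        0xE0..=0xEF => 3, // three-byte lead
        0xF0..=0xF4 => 4, // four-byte lead, up to U+10FFFF
        _ => 0,           // continuation bytes and other invalid leads
    }
}
```

A width of 0 corresponds to the `InvalidCharEncoding` branch in the decoder, and a width of 1 to the early return for ASCII.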

	impl Encode for char {
	    fn encode<E: Encoder>(&self, encoder: &mut E) -> Result<(), EncodeError> {
	        encode_utf8(encoder.writer(), *self)
	    }
	}
