
int-returning varint readers silently corrupt overflowing values #2902

Description

@romka

Problem

clickhouse-java has several int-returning varint readers used for binary metadata such as string lengths, column counts, row counts, array sizes, and similar values.

These readers currently accept certain overflowing or overlong varint encodings silently. Depending on the implementation, this can produce negative int values, truncated values, or corrupted lower bits.

VarInt Encoding

A varint byte uses the high bit as a continuation flag and the lower 7 bits as payload:

bit:   7 6 5 4 3 2 1 0
       C p p p p p p p

C = 1  another byte follows
C = 0  this is the last byte
p = 7 payload bits

For a multi-byte value:

byte 0      byte 1      byte 2      byte 3      byte 4
C ppppppp   C ppppppp   C ppppppp   C ppppppp   0 0000ppp
bits 0..6   bits 7..13  bits 14..20 bits 21..27 bits 28..30

A Java int has 31 value bits for non-negative values (0..Integer.MAX_VALUE). Four varint bytes carry 4 × 7 = 28 bits, so the largest values need a fifth byte for the remaining 3 bits. For an int-returning reader, the fifth byte may therefore contain only 3 payload bits.
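
To make the layout above concrete, here is a minimal encoder sketch. It is not part of clickhouse-java; the class name VarIntEncodeSketch and the helpers encode and hex are hypothetical and exist only for illustration. It emits 7 payload bits per byte, least-significant group first, and sets the continuation flag on every byte except the last.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class, not part of clickhouse-java.
class VarIntEncodeSketch {
    // Encodes a non-negative int as a varint: 7 payload bits per byte, low bits first.
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bits remain: set continuation flag
            value >>>= 7;
        }
        out.write(value); // last byte: continuation flag clear
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex, matching the notation in this issue.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(127)));               // 7f
        System.out.println(hex(encode(128)));               // 80 01
        System.out.println(hex(encode(0x0FFFFFFF)));        // ff ff ff 7f
        System.out.println(hex(encode(Integer.MAX_VALUE))); // ff ff ff ff 07
    }
}
```

The last two lines show the boundary: 0x0FFFFFFF is the largest value that fits in four bytes, and Integer.MAX_VALUE uses exactly 3 payload bits of the fifth byte.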

Important Corner Cases

Four varint bytes carry only 28 payload bits, so values larger than 0x0FFFFFFF require a fifth byte.

Valid maximum non-negative int:

ff ff ff ff 07  ->  Integer.MAX_VALUE

Integer overflow / out of range

This value is invalid for an int-returning reader because the fifth-byte payload exceeds 3 bits:

ff ff ff ff 08  ->  2^31 (Integer.MAX_VALUE + 1); this and any larger value cannot be represented as a non-negative `int`

Overlong varint encoding

This encoding is overlong and invalid for an int reader, because the fifth byte still has the continuation flag set:

ff ff ff ff 87 00

If a server or stream contains a valid protocol varint whose value is larger than Integer.MAX_VALUE, that value is valid at the wire level but not valid for an API returning int.

The issue is unlikely to affect normal server responses, but public low-level readers should either decode the value correctly or reject it explicitly. Silently returning negative, truncated, or corrupted values makes malformed or out-of-range metadata harder to diagnose.

Affected Implementations

The following links point to upstream main commit e9a3e259551876d80d219ae6b3fde35f5ba52fe6 as of June 29, 2026.

client-v2 BinaryStreamReader.readVarInt

public static int readVarInt(InputStream input) throws IOException {
    int value = 0;
    for (int i = 0; i < 10; i++) {
        byte b = (byte) readByteOrEOF(input);
        value |= (b & 0x7F) << (7 * i);
        if ((b & 0x80) == 0) {
            break;
        }
    }
    return value;
}

Issues:

  • Loops up to 10 bytes even though the return type is int.
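  • Java masks int shift distances to their low 5 bits (modulo 32), so from the sixth byte on the shift 7 * i wraps around (35 becomes 3, 42 becomes 10, and so on) and the payload is ORed into lower bits that are already populated.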
  • Does not reject fifth-byte payloads greater than 0x07.
  • Does not reject continuation past the fifth byte.
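
The effect of these points can be reproduced with a small standalone sketch. The loop below is copied from the quoted readVarInt; everything else is an assumption made for illustration: the class name LenientVarIntRepro, the decode helper, and a stand-in readByteOrEOF that is assumed to throw EOFException at end of stream (the real helper is not shown in this issue).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical reproduction harness, not part of clickhouse-java.
class LenientVarIntRepro {
    // Stand-in for the client's helper; assumed to throw on end of stream.
    static int readByteOrEOF(InputStream input) throws IOException {
        int b = input.read();
        if (b < 0) {
            throw new EOFException();
        }
        return b;
    }

    // Same loop as the quoted client-v2 reader.
    static int readVarInt(InputStream input) throws IOException {
        int value = 0;
        for (int i = 0; i < 10; i++) {
            byte b = (byte) readByteOrEOF(input);
            value |= (b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }

    // Decodes the given byte values (0..255) with the lenient reader.
    static int decode(int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        try {
            return readVarInt(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Valid maximum: decodes correctly.
        System.out.println(decode(0xff, 0xff, 0xff, 0xff, 0x07)); // 2147483647
        // Fifth-byte payload 0x08 lands on the sign bit: silently negative.
        System.out.println(decode(0xff, 0xff, 0xff, 0xff, 0x08)); // -1879048193
        // Wire value 2^35: the shift distance 35 wraps to 3, so the result is 8.
        System.out.println(decode(0x80, 0x80, 0x80, 0x80, 0x80, 0x01)); // 8
    }
}
```

The third case is the most misleading one, since a too-large value comes back as a small, plausible-looking count.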

BinaryStreamUtils.readVarInt(InputStream)

public static int readVarInt(InputStream input) throws IOException {
    // https://github.com/ClickHouse/ClickHouse/blob/abe314feecd1647d7c2b952a25da7abf5c19f352/src/IO/VarInt.h#L126
    long result = 0L;
    int shift = 0;
    for (int i = 0; i < 9; i++) {
        // gets 7 bits from next byte
        int b = input.read();
        if (b == -1) {
            try {
                input.close();
            } catch (IOException e) {
                // ignore error
            }
            throw new EOFException();
        }
        result |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            break;
        }
        shift += 7;
    }
    return (int) result;
}

long result = 0L;
...
result |= (b & 0x7F) << shift;
...
return (int) result;

Issues:

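  • The accumulator is long, but (b & 0x7F) << shift is still evaluated as an int: shift distances of 32 or more wrap modulo 32, and a result with bit 31 set is sign-extended when it is widened to long.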
  • Loops up to 9 bytes and then casts to int, so values outside the non-negative int range can be silently truncated or converted to a negative value.
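
The difference between the int-width and long-width shift can be seen in isolation. The sketch below is illustrative only; ShiftWidthDemo and its two methods are hypothetical names. The first method uses the same expression shape as the quoted readers, and the second widens the payload to long before shifting.

```java
// Hypothetical illustration class, not part of clickhouse-java.
class ShiftWidthDemo {
    // Expression shape used by the quoted readers: the shift is evaluated as int,
    // and only the finished int result is widened to long.
    static long intWidthShift(int payload, int shift) {
        long result = 0L;
        result |= (payload & 0x7F) << shift;
        return result;
    }

    // Widening the payload first makes the shift a 64-bit operation.
    static long longWidthShift(int payload, int shift) {
        long result = 0L;
        result |= (long) (payload & 0x7F) << shift;
        return result;
    }

    public static void main(String[] args) {
        // Sixth byte, payload 1, shift 35: the int shift wraps to a shift by 3.
        System.out.println(intWidthShift(1, 35));     // 8
        System.out.println(longWidthShift(1, 35));    // 34359738368
        // Fifth byte, payload 0x08, shift 28: the int result has bit 31 set
        // and is sign-extended when widened to long.
        System.out.println(intWidthShift(0x08, 28));  // -2147483648
        System.out.println(longWidthShift(0x08, 28)); // 2147483648
    }
}
```

Widening the shift alone would not fix the readers, because the final cast to int still truncates; it only shows why the long accumulator does not protect against corruption as written.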

BinaryStreamUtils.readVarInt(ByteBuffer)

public static int readVarInt(ByteBuffer buffer) {
    long result = 0L;
    int shift = 0;
    for (int i = 0; i < 9; i++) {
        // gets 7 bits from next byte
        byte b = buffer.get();
        result |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            break;
        }
        shift += 7;
    }
    return (int) result;
}

Same issue pattern as the InputStream overload: long accumulator, int shift expression, up to 9 bytes, final cast to int.

ClickHouseByteUtils.getVarInt(ByteBuffer) and readVarInt(InputStream)

public static int getVarInt(ByteBuffer buffer) {
    long result = 0L;
    int shift = 0;
    for (int i = 0; i < 9; i++) {
        // gets 7 bits from next byte
        byte b = buffer.get();
        result |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            break;
        }
        shift += 7;
    }
    return (int) result;
}

public static int readVarInt(InputStream input) throws IOException {
    // https://github.com/ClickHouse/ClickHouse/blob/abe314feecd1647d7c2b952a25da7abf5c19f352/src/IO/VarInt.h#L126
    long result = 0L;
    int shift = 0;
    for (int i = 0; i < 9; i++) {
        // gets 7 bits from next byte
        int b = input.read();
        if (b == -1) {
            try {
                input.close();
            } catch (IOException e) {
                // ignore error
            }
            throw new EOFException();
        }
        result |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            break;
        }
        shift += 7;
    }
    return (int) result;
}

These have the same long result plus int shift/cast pattern as BinaryStreamUtils.

ClickHouseInputStream.readVarInt

public int readVarInt() throws IOException {
    // https://github.com/ClickHouse/ClickHouse/blob/abe314feecd1647d7c2b952a25da7abf5c19f352/src/IO/VarInt.h#L126
    int b = readByte();
    if (b >= 0) {
        return b;
    }
    int result = b & 0x7F;
    for (int shift = 7; shift <= 28; shift += 7) {
        if ((b = readByte()) >= 0) {
            result |= b << shift;
            break;
        } else {
            result |= (b & 0x7F) << shift;
        }
    }
    // consume a few more bytes - readVarLong() should be called instead
    if (b < 0) {
        for (int shift = 35; shift <= 63; shift += 7) {
            if (peek() < 0 || readByte() >= 0) {
                break;
            }
        }
    }
    return result;
}

Issues:

  • This implementation is capped at the fifth byte, but it still accepts an overflowing fifth-byte payload.
  • If more bytes follow, it consumes the remaining extra bytes and returns the truncated int result instead of rejecting the value.
  • That can hide malformed or too-large size/count metadata.

Expected Behavior

For methods returning int, valid non-negative int varints should decode unchanged.
The reader should reject:

  • Any fifth byte whose payload is greater than 0x07.
  • Any fifth byte with the continuation bit set.
  • Any varint value that cannot be represented as a non-negative int.

Rejecting these values with IOException is preferable to returning corrupted, truncated, or negative sizes/counts.
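
One possible shape for such a reader is sketched below. This is an illustration of the rules listed above, not a proposed patch: the class name StrictVarIntSketch, the helper decodeOrNull, and the exception messages are all hypothetical. It reads at most five bytes and checks the fifth byte before merging it into the result.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a strict int varint reader, not part of clickhouse-java.
class StrictVarIntSketch {
    static int readVarInt(InputStream input) throws IOException {
        int result = 0;
        for (int i = 0; i < 5; i++) {
            int b = input.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a varint");
            }
            int payload = b & 0x7F;
            if (i == 4) {
                // Fifth byte: no continuation allowed, and only bits 28..30 may be set.
                if ((b & 0x80) != 0) {
                    throw new IOException("varint continues past the fifth byte");
                }
                if (payload > 0x07) {
                    throw new IOException("varint exceeds Integer.MAX_VALUE");
                }
            }
            result |= payload << (7 * i);
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        // The fifth iteration always returns or throws.
        throw new AssertionError("unreachable");
    }

    // Test helper: returns the decoded value, or null if the reader rejected the input.
    static Integer decodeOrNull(int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        try {
            return readVarInt(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeOrNull(0xff, 0xff, 0xff, 0xff, 0x07));       // 2147483647
        System.out.println(decodeOrNull(0xff, 0xff, 0xff, 0xff, 0x08));       // null
        System.out.println(decodeOrNull(0xff, 0xff, 0xff, 0xff, 0x87, 0x00)); // null
    }
}
```

This sketch only enforces the range rules from this issue. It still accepts non-canonical encodings of small values (for example 80 00 for zero), and it leaves the rest of an oversized varint unread in the stream when it throws; whether either matters depends on how callers recover from the exception.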
