Problem
clickhouse-java has several int-returning varint readers used for binary metadata such as string lengths, column counts, row counts, array sizes, and similar values.
These readers currently accept certain overflowing or overlong varint encodings silently. Depending on the implementation, this can produce negative int values, truncated values, or corrupted lower bits.
VarInt Encoding
A varint byte uses the high bit as a continuation flag and the lower 7 bits as payload:
    bit:  7 6 5 4 3 2 1 0
          C p p p p p p p

    C = 1  another byte follows
    C = 0  this is the last byte
    p      7 payload bits
For a multi-byte value:
    byte 0       byte 1       byte 2       byte 3       byte 4
    C ppppppp    C ppppppp    C ppppppp    C ppppppp    0 0000ppp
    bits 0..6    bits 7..13   bits 14..20  bits 21..27  bits 28..30
A non-negative Java int (0..Integer.MAX_VALUE) has 31 value bits, so it can require up to 5 varint bytes. For an int-returning reader, the fifth byte may carry only 3 payload bits (bits 28..30).
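The layout above can be illustrated from the writer's side. The following is a minimal sketch, not code from clickhouse-java: the class name `VarIntEncodeSketch` and its methods are hypothetical, and it assumes the 7-bits-per-byte, least-significant-group-first encoding described in this section.

```java
import java.io.ByteArrayOutputStream;

final class VarIntEncodeSketch {
    // Encodes a non-negative int as a varint: 7 payload bits per byte,
    // least significant group first, high bit set on every byte but the last.
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex, e.g. "ac 02".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

With this sketch, `hex(encode(Integer.MAX_VALUE))` yields `ff ff ff ff 07`, `encode(0x0FFFFFFF)` needs 4 bytes, and `encode(0x10000000)` needs 5.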
Important Corner Cases
Four varint bytes carry only 28 payload bits, so values larger than 0x0FFFFFFF require a fifth byte.
Valid maximum non-negative int:
ff ff ff ff 07 -> Integer.MAX_VALUE
Integer overflow / out of range
This value is invalid for an int-returning reader because the fifth-byte payload exceeds 3 bits:
ff ff ff ff 08 -> 0x8FFFFFFF (2^31 + 2^28 - 1); any value of 2^31 or greater cannot be represented as a non-negative `int`
Overlong varint encoding
An encoding is also invalid/overlong for an int reader when the fifth byte still has the continuation flag set (that is, the fifth byte is 0x80 or greater), because more bytes would follow.
If a server or stream contains a valid protocol varint whose value is larger than Integer.MAX_VALUE, that value is valid at the wire level but not valid for an API returning int.
The issue is unlikely to affect normal server responses, but public low-level readers should either decode the value correctly or reject it explicitly. Silently returning negative, truncated, or corrupted values makes malformed or out-of-range metadata harder to diagnose.
Affected Implementations
The following links point to upstream main commit e9a3e259551876d80d219ae6b3fde35f5ba52fe6 as of June 29, 2026.
client-v2 BinaryStreamReader.readVarInt
    public static int readVarInt(InputStream input) throws IOException {
        int value = 0;

        for (int i = 0; i < 10; i++) {
            byte b = (byte) readByteOrEOF(input);
            value |= (b & 0x7F) << (7 * i);

            if ((b & 0x80) == 0) {
                break;
            }
        }

        return value;
    }

Relevant lines:

    int value = 0;
    for (int i = 0; i < 10; i++) {
        byte b = (byte) readByteOrEOF(input);
        value |= (b & 0x7F) << (7 * i);
Issues:
- Loops up to 10 bytes even though the return type is int.
- Java masks int shift distances modulo 32, so shift distances of 32 or more (from the sixth byte on, where 7 * i >= 35) wrap around and can corrupt lower bits.
- Does not reject fifth-byte payloads greater than 0x07.
- Does not reject continuation past the fifth byte.
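The effect of the shift masking and the missing fifth-byte check can be reproduced in isolation. This is a small sketch under stated assumptions: `ShiftMaskingDemo.lenientDecode` is a hypothetical array-based stand-in that uses the same accumulation as the loop quoted above; it is not the library method itself.

```java
final class ShiftMaskingDemo {
    // Same accumulation shape as the quoted loop, reading from a byte array
    // instead of a stream (the array-based signature is hypothetical).
    static int lenientDecode(byte[] bytes) {
        int value = 0;
        for (int i = 0; i < 10 && i < bytes.length; i++) {
            byte b = bytes[i];
            // For i >= 5 the distance 7 * i is >= 35; Java uses only the low
            // five bits of an int shift distance, so 35 behaves like 3.
            value |= (b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }
}
```

Under this sketch, the six-byte input `80 80 80 80 80 01` (wire value 2^35) decodes to 8, because `1 << 35` is evaluated as `1 << 3`. The input `ff ff ff ff 08` decodes to the negative int -1879048193 (bit pattern 0x8FFFFFFF).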
BinaryStreamUtils.readVarInt(InputStream)
    public static int readVarInt(InputStream input) throws IOException {
        // https://github.com/ClickHouse/ClickHouse/blob/abe314feecd1647d7c2b952a25da7abf5c19f352/src/IO/VarInt.h#L126
        long result = 0L;
        int shift = 0;
        for (int i = 0; i < 9; i++) {
            // gets 7 bits from next byte
            int b = input.read();
            if (b == -1) {
                try {
                    input.close();
                } catch (IOException e) {
                    // ignore error
                }
                throw new EOFException();
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }

        return (int) result;
    }

Relevant lines:

    long result = 0L;
    ...
    result |= (b & 0x7F) << shift;
    ...
    return (int) result;
Issues:
- The accumulator is long, but (b & 0x7F) << shift is still an int-width shift.
- Loops up to 9 bytes and then casts to int, so values outside the non-negative int range can be silently truncated or converted to a negative value.
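The int-width shift can be seen by comparing the expression as written with a variant that widens before shifting. This is an illustrative sketch only; the class and method names are hypothetical and not part of clickhouse-java.

```java
final class IntWidthShiftDemo {
    // Mirrors "result |= (b & 0x7F) << shift": both operands are int, so the
    // shift is evaluated in 32 bits and only then widened (sign-extended) to long.
    static long intWidthShift(int b, int shift) {
        long result = 0L;
        result |= (b & 0x7F) << shift;
        return result;
    }

    // Widening the payload to long before shifting keeps every payload bit.
    static long longWidthShift(int b, int shift) {
        long result = 0L;
        result |= (long) (b & 0x7F) << shift;
        return result;
    }
}
```

For a fifth byte of 0x08 (shift 28), `intWidthShift` produces 0xFFFFFFFF80000000L while `longWidthShift` produces 0x80000000L. For a sixth byte of 0x01 (shift 35), `intWidthShift` produces 8 while `longWidthShift` produces 2^35. Widening alone does not fix an int-returning reader, since the final cast to int still truncates; a range check is needed as well.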
BinaryStreamUtils.readVarInt(ByteBuffer)
    public static int readVarInt(ByteBuffer buffer) {
        long result = 0L;
        int shift = 0;
        for (int i = 0; i < 9; i++) {
            // gets 7 bits from next byte
            byte b = buffer.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }

        return (int) result;
Same issue pattern as the InputStream overload: long accumulator, int shift expression, up to 9 bytes, final cast to int.
ClickHouseByteUtils.getVarInt(ByteBuffer) and readVarInt(InputStream)
    public static int getVarInt(ByteBuffer buffer) {
        long result = 0L;
        int shift = 0;
        for (int i = 0; i < 9; i++) {
            // gets 7 bits from next byte
            byte b = buffer.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }

        return (int) result;

    public static int readVarInt(InputStream input) throws IOException {
        // https://github.com/ClickHouse/ClickHouse/blob/abe314feecd1647d7c2b952a25da7abf5c19f352/src/IO/VarInt.h#L126
        long result = 0L;
        int shift = 0;
        for (int i = 0; i < 9; i++) {
            // gets 7 bits from next byte
            int b = input.read();
            if (b == -1) {
                try {
                    input.close();
                } catch (IOException e) {
                    // ignore error
                }
                throw new EOFException();
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }

        return (int) result;
These have the same long result plus int shift/cast pattern as BinaryStreamUtils.
ClickHouseInputStream.readVarInt
    public int readVarInt() throws IOException {
        // https://github.com/ClickHouse/ClickHouse/blob/abe314feecd1647d7c2b952a25da7abf5c19f352/src/IO/VarInt.h#L126
        int b = readByte();
        if (b >= 0) {
            return b;
        }

        int result = b & 0x7F;
        for (int shift = 7; shift <= 28; shift += 7) {
            if ((b = readByte()) >= 0) {
                result |= b << shift;
                break;
            } else {
                result |= (b & 0x7F) << shift;
            }
        }
        // consume a few more bytes - readVarLong() should be called instead
        if (b < 0) {
            for (int shift = 35; shift <= 63; shift += 7) {
                if (peek() < 0 || readByte() >= 0) {
                    break;
                }
            }
        }
        return result;

Relevant lines:

    for (int shift = 7; shift <= 28; shift += 7) {
        if ((b = readByte()) >= 0) {
            result |= b << shift;
            break;
        }
Issues:
- This implementation is capped at the fifth byte, but it still accepts an overflowing fifth-byte payload.
- If more bytes follow, it consumes the remaining extra bytes and returns the truncated int result instead of rejecting the value.
- That can hide malformed or too-large size/count metadata.
Expected Behavior
For methods returning int, valid non-negative int varints should decode unchanged.
The reader should reject:
- Any fifth byte whose payload is greater than 0x07.
- Any fifth byte with the continuation bit set.
- Any varint value that cannot be represented as a non-negative int.
Rejecting these values with IOException is preferable to returning corrupted, truncated, or negative sizes/counts.
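One possible shape for a reader with this behavior is sketched below. It is a minimal illustration, not a proposed patch: `StrictVarIntSketch`, `readVarIntStrict`, and the `decode` helper are hypothetical names, and the exception messages are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class StrictVarIntSketch {
    // Reads at most 5 bytes and rejects anything that does not fit
    // in a non-negative int.
    static int readVarIntStrict(InputStream input) throws IOException {
        int result = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            int b = input.read();
            if (b == -1) {
                throw new EOFException("unexpected end of stream in varint");
            }
            // Fifth byte: only bits 0..2 may be set. The mask 0xF8 catches
            // both a payload above 0x07 and a set continuation bit.
            if (shift == 28 && (b & 0xF8) != 0) {
                throw new IOException("varint does not fit in a non-negative int");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        // The fifth-byte check guarantees a return or a throw inside the loop.
        throw new AssertionError("unreachable");
    }

    // Convenience for examples: decodes from literal byte values.
    static int decode(int... bytes) throws IOException {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        return readVarIntStrict(new ByteArrayInputStream(raw));
    }
}
```

In this sketch, `decode(0xff, 0xff, 0xff, 0xff, 0x07)` returns Integer.MAX_VALUE, while `decode(0xff, 0xff, 0xff, 0xff, 0x08)` and any input whose fifth byte is 0x80 or greater throw IOException. It does not reject zero-padded encodings of small values (such as `80 00`), which the list above does not require.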
Source locations (all at commit e9a3e25):
- BinaryStreamReader.readVarInt: clickhouse-java/client-v2/src/main/java/com/clickhouse/client/api/data_formats/internal/BinaryStreamReader.java, lines 959 to 972
- BinaryStreamUtils.readVarInt(InputStream): clickhouse-java/clickhouse-data/src/main/java/com/clickhouse/data/format/BinaryStreamUtils.java, lines 1673 to 1696
- BinaryStreamUtils.readVarInt(ByteBuffer): clickhouse-java/clickhouse-data/src/main/java/com/clickhouse/data/format/BinaryStreamUtils.java, lines 1704 to 1717
- ClickHouseByteUtils.getVarInt(ByteBuffer): clickhouse-java/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseByteUtils.java, lines 238 to 251
- ClickHouseByteUtils.readVarInt(InputStream): clickhouse-java/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseByteUtils.java, lines 276 to 298
- ClickHouseInputStream.readVarInt: clickhouse-java/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseInputStream.java, lines 1106 to 1130