Destination Improvement: Request for additional configuration options for schema subject naming in Confluent with the Avro format
HVR only allows one very specific subject naming strategy: each schema subject is the topic name with the table name appended, and two subjects are created per table, with -value and -key suffixes. This doesn't follow the industry-standard naming strategies (https://developer.confluent.io/courses/schema-registry/schema-subjects/). It looks close to TopicRecordNameStrategy, but the extra -key/-value suffixes on the schema subject name cause downstream validation to fail.
Current state: HVR can be configured with topic name (e.g., dev.mytopic) and table name (e.g., TABLE1). Its internal logic generates a schema based on table schema, calls it "<table name>" (e.g., TABLE1) with no namespace, and generates schema subjects by "<topic name>-<table name>-<key or value>" (e.g., "dev.mytopic-TABLE1-value" and "dev.mytopic-TABLE1-key").
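To make the current behaviour concrete, here is a minimal sketch of the naming rule as I understand it from observed output. The function name `hvr_subjects` is hypothetical and this is not HVR's actual code, only a restatement of the pattern described above.

```python
from typing import Dict


def hvr_subjects(topic: str, table: str) -> Dict[str, str]:
    """Subjects as described in this post: <topic>-<table>-<key or value>.

    Hypothetical helper for illustration; not taken from HVR itself.
    """
    return {part: f"{topic}-{table}-{part}" for part in ("key", "value")}


if __name__ == "__main__":
    for part, subject in hvr_subjects("dev.mytopic", "TABLE1").items():
        print(part, subject)
```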
Why does this matter?
Confluent lets you configure broker-side schema ID validation (see "Validate Broker-side Schemas IDs in Confluent Platform" in the Confluent documentation). It helps reject invalid messages before they poison the data. How it works:
- The producer looks up the schema by the generated subject name (according to the Subject Naming Strategy configuration), finds the schema version that matches the one it generated (or registers it), and uses that schema to serialize the message.
- The producer serializes the message in Avro using this schema (producing a binary stream) and wraps it in the wire format (https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format). For example, if the schema ID is 123456, the byte stream will be 0|123456|<avro-serialized binary>: a magic byte 0, the 4-byte schema ID, then the Avro data.
- Confluent Kafka brokers (if schema ID validation is turned on) receive this message, read the first part of the wire-format envelope, extract the schema ID (e.g., 123456), and check whether this schema is applicable to the topic according to the Subject Naming Strategy.
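The envelope described in the steps above can be sketched in a few lines. This assumes the documented wire format layout (one magic byte 0, a 4-byte big-endian schema ID, then the Avro payload); the function names `frame` and `unframe` are mine, and the payload here is a placeholder rather than real Avro data.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of the wire format
HEADER = struct.Struct(">bI")  # magic byte + 4-byte big-endian schema ID


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already Avro-serialized payload with the 5-byte header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


if __name__ == "__main__":
    msg = frame(123456, b"<avro-serialized binary>")
    print(msg[:5].hex())  # header bytes only
    print(unframe(msg))
```

The broker only needs the first five bytes to learn which schema ID the producer used; it does not have to deserialize the Avro payload.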
In our particular case, TopicRecordNameStrategy expects the schema subject to be dev.mytopic-TABLE1 (notice the lack of a -value suffix). Since the subject name does NOT match, the brokers treat the schema as invalid for the topic and reject the message (you can check the TopicRecordNameStrategy subject name implementation here: https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/subject/TopicRecordNameStrategy.java#L43 ).
This looks like a bug in schema subject name generation on the HVR side, and it prevents us from using HVR to publish data to Confluent Kafka with schema ID validation turned on.
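To illustrate the mismatch, here is a simplified model of the three standard strategies and of the broker check. The strategy functions follow the subject formats in the Confluent documentation; `broker_accepts` and the in-memory `registry` dict are hypothetical stand-ins for the real broker and Schema Registry, which I am only approximating.

```python
from typing import Callable, Dict, Set

Strategy = Callable[[str, str, bool], str]


def topic_name_strategy(topic: str, record_name: str, is_key: bool) -> str:
    # <topic>-key or <topic>-value
    return f"{topic}-{'key' if is_key else 'value'}"


def record_name_strategy(topic: str, record_name: str, is_key: bool) -> str:
    # fully qualified record name only
    return record_name


def topic_record_name_strategy(topic: str, record_name: str, is_key: bool) -> str:
    # <topic>-<record name>, with no -key/-value suffix
    return f"{topic}-{record_name}"


def broker_accepts(registry: Dict[str, Set[int]], strategy: Strategy,
                   topic: str, record_name: str, schema_id: int,
                   is_key: bool = False) -> bool:
    """Toy check: is schema_id registered under the subject the strategy expects?"""
    subject = strategy(topic, record_name, is_key)
    return schema_id in registry.get(subject, set())


if __name__ == "__main__":
    # Subject under which HVR registers the value schema, per this post.
    registry = {"dev.mytopic-TABLE1-value": {123456}}
    print(topic_record_name_strategy("dev.mytopic", "TABLE1", False))
    print(broker_accepts(registry, topic_record_name_strategy,
                         "dev.mytopic", "TABLE1", 123456))
```

In this model the message is rejected under all three standard strategies, because none of them produces a subject of the form <topic>-<table>-value.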
Hi Erika,
Thank you for highlighting this issue.
Let me discuss with engineering whether we can address this.
Thanks,
Mark.