SINCE 3.4
Overview
This section discusses bulk copy support for repeated fields on Xbuf generated messages and entities. A bulk copy transfers the encoded backing bytes of a repeated field as a single direct memory copy, without iterating over the individual values, so it is very cheap. At a high level, Xbuf generated objects provide efficient bulk transfer of repeated primitive fields under the following circumstances:
- An XIterator returned by getXXXIterator() is passed to setXXXFrom(XIterator) on another message or entity whose field has the same field id (a.k.a. protobuf field tag).
- An XIterator returned by getXXXIterator() is passed to setValuesFrom(XIterator) on an XbufRepeatedXXXFieldBuffer of the same type and field id.
- An XbufRepeatedXXXFieldBuffer is passed to setXXXFrom(XIterator) on a message or entity whose field has the same field tag as the field buffer.
The main takeaway is that when using Xbuf with large array fields, it is highly advantageous to arrange for copies to happen between fields with the same id whenever possible, because this avoids the message processing time that an iterative transfer would otherwise cost.
Challenging Aspects of Protobuf Repeated Fields
Xbuf messages and entities generated with Protobuf compatibility are wire compatible with those generated with the protoc compiler. This is good for interoperability, but the standard protobuf wire format for repeated fields has several challenges that make them inefficient to work with. This section describes some of these challenges for those that are curious.
- Non-Contiguous: Protobuf does not require repeated fields to be laid out contiguously on the wire. A repeated field's values can be interspersed with other field values, so the whole message must initially be 'walked' to find all of the values that make up the field.
  - Xbuf collects all of the values into a single buffer when it initially frames a message's fields, which then allows bulk transfers.
- Variable Length: Repeated field values are typically variable length. For example, an int in a repeated field won't always occupy 4 bytes (32 bits) on the wire; smaller values are encoded in fewer bytes. Consequently, even within a contiguous set of repeated values, individual values can't be accessed by an indexed offset.
  - Xbuf therefore provides access to repeated fields via XIterators, which offer zero garbage semantics and more accurately mirror the wire format. Generated messages still provide array accessors, but they are much less efficient and not zero garbage.
- Tag Prefixed: Each repeated value on the wire is also preceded by the field's tag (which is itself variable length). This is what allows field values to be serialized non-contiguously, but it also means that a contiguous set of values for one field cannot be bulk copied to a field with a different tag value.
  - Xbuf supports bulk copy of contiguous field values between messages when the field ids (tags) are the same. It transparently falls back to an iterative copy with tag translation when the tag values differ.
- Packed Encoding: Protobuf has the concept of packed repeated fields. With packed encoding, individual values are not tag prefixed; instead a group of values is preceded by a single tag and a length giving the number of bytes that make up the tagless encoded values. However, even with packed encoding a protobuf message can contain multiple non-contiguous sets of packed values. Additionally, packed encoding is not supported for all types (such as 'string').
  - Xbuf doesn't currently use the packed format when serializing fields, even for types that support it, but it may be implemented in a future release. With packed encoding, bulk transfer between fields with different tags can be more efficient and the serialized size can be reduced.
- Null Values: Protobuf doesn't support encoding null values on the wire, but ADM does support null values. This impedance mismatch can be costly for bulk operations such as setting a field value from an array (e.g. String[] or Date[]) or from an arbitrary iterator, because the setter must check every element for null, which requires iteration.
  - For values that come from another serialized object the null check can be skipped, because the absence of null values is implied by the fact that they were serialized.
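To make the wire-format challenges concrete, here is a minimal Java sketch of standard protobuf varint and tag encoding. It is not Xbuf code: the class and method names are hypothetical, only int32 fields with wire type 0 (varint) are modeled, and packed encoding and null handling are left out. It shows variable-length values, gathering one field's elements out of an interleaved message, and why moving a run to a different field id means decoding and re-encoding every element.

```java
import java.io.ByteArrayOutputStream;

class WireFormatSketch {
    // Wire type 0 marks a varint-encoded value in a protobuf tag.
    static final int WIRE_TYPE_VARINT = 0;

    // Appends v as a base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Decodes a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Variable length: encoded size of an int32; negatives are sign-extended to 10 bytes.
    static int int32Size(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, value);
        return out.size();
    }

    // Tag prefixed: one element is varint((fieldId << 3) | wireType) followed by the value.
    static void writeElement(ByteArrayOutputStream out, int fieldId, int value) {
        writeVarint(out, ((long) fieldId << 3) | WIRE_TYPE_VARINT);
        writeVarint(out, value);
    }

    // Non-contiguous: walk every field of a message (varint fields only in this sketch)
    // and gather the elements of fieldId, tags included, into one contiguous run.
    static byte[] collect(byte[] message, int fieldId) {
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int[] pos = {0};
        while (pos[0] < message.length) {
            int start = pos[0];
            long key = readVarint(message, pos);
            readVarint(message, pos); // skip the value
            if ((key >>> 3) == fieldId) {
                run.write(message, start, pos[0] - start);
            }
        }
        return run.toByteArray();
    }

    // A run can only be reused byte-for-byte under the same field id; for any other id
    // every element must be decoded and re-encoded behind the new tag.
    static byte[] retag(byte[] run, int newFieldId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(run.length);
        int[] pos = {0};
        while (pos[0] < run.length) {
            readVarint(run, pos); // drop the old tag
            writeElement(out, newFieldId, (int) readVarint(run, pos));
        }
        return out.toByteArray();
    }
}
```

Moving a run from field 1 to field 16 changes each tag from one byte to two, so even the length of the run changes; that is why a direct memory copy only works between fields with the same id.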
Repeated Field Buffers
Messages and entities generated with the Xbuf encoding type and Protobuf compatibility can create 'Field Buffer' objects that wrap the encoded bytes making up a repeated field and allow zero garbage iteration, addition, and bulk copy to and from the XIterators returned by the message or entity. The generated code for a message collects repeated field values into these buffers (or wraps the buffer around the message's backing buffer). The generated code for a message or entity also includes factory methods for creating field buffers for use in application state, which allows efficient, zero garbage, non-iterative copy of field values between messages and application state.
Types
Repeated field buffers are supported for the following primitive and enum types.
Type | Repeated Field Buffer Type | Proto Wire Encoding |
---|---|---|
boolean | com.neeve.xbuf.XbufRepeatedBooleanFieldBuffer | bool |
byte | com.neeve.xbuf.XbufRepeatedByteFieldBuffer | int32 |
char | com.neeve.xbuf.XbufRepeatedCharFieldBuffer | int32 |
short | com.neeve.xbuf.XbufRepeatedShortFieldBuffer | int32 |
int | com.neeve.xbuf.XbufRepeatedIntFieldBuffer | int32 |
long | com.neeve.xbuf.XbufRepeatedLongFieldBuffer | int64 |
float | com.neeve.xbuf.XbufRepeatedFloatFieldBuffer | float |
double | com.neeve.xbuf.XbufRepeatedDoubleFieldBuffer | double |
String | com.neeve.xbuf.XbufRepeatedStringFieldBuffer | string |
Date | com.neeve.xbuf.XbufRepeatedDateFieldBuffer | int64 |
enums | com.neeve.xbuf.XbufRepeatedEnumFieldBuffer<T> | int32 |
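The Proto Wire Encoding column determines how many bytes each element can occupy. The following sketch applies protobuf's scalar encoding rules to estimate a worst-case buffer length for tag-prefixed (unpacked) elements; the class and method names are hypothetical and Xbuf's own sizing may differ. String is omitted because its size depends on content.

```java
class WireEncodingSizes {
    // Bytes used by a base-128 varint holding v (7 payload bits per byte).
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // Bytes for the tag preceding each element. The wire type occupies the low 3 bits,
    // so only the field id affects the tag length.
    static int tagSize(int fieldId) {
        return varintSize((long) fieldId << 3);
    }

    // Worst-case bytes per element (tag included) for the wire encodings in the table.
    static int maxElementSize(String wireEncoding, int fieldId) {
        switch (wireEncoding) {
            case "bool":   return tagSize(fieldId) + 1;  // varint 0 or 1
            case "int32":  return tagSize(fieldId) + 10; // negatives sign-extend to 10 bytes
            case "int64":  return tagSize(fieldId) + 10;
            case "float":  return tagSize(fieldId) + 4;  // fixed 32-bit
            case "double": return tagSize(fieldId) + 8;  // fixed 64-bit
            default: throw new IllegalArgumentException("no fixed bound for " + wireEncoding);
        }
    }

    // A conservative buffer length for count elements of the given encoding.
    static int initialBufferLength(String wireEncoding, int fieldId, int count) {
        return count * maxElementSize(wireEncoding, fieldId);
    }
}
```

The int32 bound is pessimistic for small non-negative values (which take 1 to 5 bytes), so sizing from representative data can give a much tighter estimate.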
Creation
Because efficient copy depends on how a particular message encodes a field, repeated field buffers are created via static factory methods on messages and entities. These factory methods are only generated for types that use Xbuf encoding and are generated with Protobuf compatibility. The creation methods take the form createXXXFieldBuffer, where XXX is the field name as modeled in ADM.
The create methods take 2 arguments:
- initialBufferLength: The initial length of the backing buffer that holds the field values. Because protobuf encodes values in a variable length format, it can be difficult to tell up front how large a buffer to allocate. The underlying buffer grows to accommodate whatever is set into it, but it is best to allocate a large enough buffer up front, because growing the buffer causes additional copies and may result in additional buffer allocation under the covers. See Scalar Value Types in the protobuf documentation for sizing.
- isNative: When true, the backing buffer is a native, off-heap buffer; when false, it is a heap buffer. Consider the following when choosing between a native and a non-native buffer:
  - Native buffers mostly use off-heap memory, so allocating them at runtime puts less pressure on the garbage collector from object promotions.
  - Native buffers are allocated by 'slicing' into direct buffers that are memory page sized (typically 8k). Multiple small native buffers can therefore reference a single 8k page, and that page won't be freed until all of the buffers that reference it are freed. As a result, allocating and retaining native buffers at runtime (as opposed to preallocating them at application startup) risks one small allocation retaining a full page of memory that the garbage collector can't reclaim, if other areas of the application are allocating and leaking buffers.
So, if the application knows the maximum size of its repeated field buffers up front and can afford the memory, native buffers are a good fit. When not preallocating, native buffers can still be used provided other parts of the application aren't leaking native buffers; otherwise a field buffer could end up keeping off-heap memory pages from being freed.
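The page-sharing effect can be reproduced with the JDK's own java.nio.ByteBuffer. This sketch assumes nothing about Xbuf's allocator (the class and method names are hypothetical); it only shows that slices carved from one direct buffer share its memory, which is why one retained slice keeps the whole page alive.

```java
import java.nio.ByteBuffer;

class PageSlicingSketch {
    // Page size used for illustration (the typical 8k page).
    static final int PAGE_SIZE = 8 * 1024;

    // Carves [offset, offset + length) out of a shared direct page. The slice shares the
    // page's memory, so the page stays reachable while any slice of it is reachable.
    static ByteBuffer carve(ByteBuffer page, int offset, int length) {
        ByteBuffer view = page.duplicate(); // own position/limit, same memory
        view.position(offset);
        view.limit(offset + length);
        return view.slice();
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocateDirect(PAGE_SIZE);
        ByteBuffer small = carve(page, 0, 64);     // e.g. a short-lived field buffer
        ByteBuffer retained = carve(page, 64, 64); // e.g. a buffer held in state
        small.put(0, (byte) 1);
        retained.put(0, (byte) 2);
        // Both writes land in the same page: dropping `small` frees nothing while
        // `retained` (and through it, `page`) is still referenced.
        System.out.println(page.get(0) + " " + page.get(64) + " " + retained.isDirect());
    }
}
```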
Iteration
Repeated field buffer types all implement the XIterator interface corresponding to their element type, providing zero garbage access to field values directly from the backing buffer without autoboxing.
Type | Iterator Type |
---|---|
boolean | com.neeve.lang.XBooleanIterator |
byte | com.neeve.lang.XByteIterator |
char | com.neeve.lang.XCharIterator |
short | com.neeve.lang.XShortIterator |
int | com.neeve.lang.XIntIterator |
long | com.neeve.lang.XLongIterator |
float | com.neeve.lang.XFloatIterator |
double | com.neeve.lang.XDoubleIterator |
String | com.neeve.lang.XStringIterator |
Date | com.neeve.lang.XDateIterator |
enums | com.neeve.lang.XIterator<T> |
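The zero garbage pattern comes down to iterators whose next() returns a primitive rather than a boxed object, decoding straight from the encoded bytes. The IntIterator interface below is a hypothetical stand-in; the real com.neeve.lang iterator types may use different method names. The sketch assumes tag-prefixed int32 varint elements.

```java
class ZeroGarbageIteration {
    // Hypothetical shape of a primitive iterator.
    interface IntIterator {
        boolean hasNext();
        int next();
    }

    // Iterates tag-prefixed int32 varints in place. next() returns a primitive int, so no
    // Integer objects are created, and wrap() lets a single instance be reused.
    static final class VarintIntIterator implements IntIterator {
        private byte[] buf;
        private int pos;
        private int end;

        VarintIntIterator wrap(byte[] buf, int offset, int length) {
            this.buf = buf;
            this.pos = offset;
            this.end = offset + length;
            return this;
        }

        @Override
        public boolean hasNext() {
            return pos < end;
        }

        // Callers must check hasNext() first.
        @Override
        public int next() {
            readVarint();              // skip the element's tag
            return (int) readVarint(); // int32 negatives were sign-extended; the cast restores them
        }

        private long readVarint() {
            long result = 0;
            int shift = 0;
            while (true) {
                byte b = buf[pos++];
                result |= (long) (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    return result;
                }
                shift += 7;
            }
        }
    }

    // Sums the values without allocating: the loop only touches primitives.
    static long sum(IntIterator it) {
        long total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }
}
```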
Null Values
By default repeated field buffers don't allow addition of null values since standard Protobuf doesn't support encoding nulls on the wire. Behavior on addition of a null value depends on how the code was generated. See Null Value Handling in the ADM documentation.
Adding Values to a Field Buffer
The following operations are available on field buffers for setting or adding values:
Setter | Description |
---|---|
setValuesFrom(XIterator) | Sets the values in the field buffer to those in the provided XIterator. This operation supports bulk copy if the source iterator is backed by a field or field buffer with the same type and field id. |
setValuesFrom(FieldBuffer) | Sets the values in the field buffer to those from another field buffer. This operation supports bulk copy if the source field buffer has the same field id. |
add(T value) | Appends the value to the field buffer |
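A minimal model of how a field buffer might implement these operations, assuming the tag-prefixed varint layout described in Challenging Aspects of Protobuf Repeated Fields. IntFieldBuffer and its counters are hypothetical and are not the XbufRepeatedIntFieldBuffer implementation; the model shows add() growing the backing array and setValuesFrom() choosing between one memory copy and a per-element fallback.

```java
import java.util.Arrays;

class FieldBufferSketch {
    // Hypothetical repeated int32 field buffer holding tag-prefixed varint elements.
    static final class IntFieldBuffer {
        final int fieldId;
        private byte[] bytes;
        private int length;
        int growCount;           // backing-array reallocations so far
        boolean lastCopyWasBulk; // which path the last setValuesFrom took

        IntFieldBuffer(int fieldId, int initialBufferLength) {
            this.fieldId = fieldId;
            this.bytes = new byte[Math.max(1, initialBufferLength)];
        }

        int encodedLength() {
            return length;
        }

        // add(T value): append the element's tag, then its varint value.
        void add(int value) {
            writeVarint((long) fieldId << 3);
            writeVarint(value); // widened to long, so negatives take 10 bytes
        }

        // setValuesFrom(FieldBuffer): replace this buffer's contents with src's values.
        void setValuesFrom(IntFieldBuffer src) {
            if (src == this) {
                return;
            }
            length = 0;
            lastCopyWasBulk = src.fieldId == fieldId;
            if (lastCopyWasBulk) {
                ensureCapacity(src.length);
                System.arraycopy(src.bytes, 0, bytes, 0, src.length); // one memory copy
                length = src.length;
                return;
            }
            // Different field id: decode each value and re-add it under this buffer's tag.
            int[] pos = {0};
            while (pos[0] < src.length) {
                readVarint(src.bytes, pos); // skip the source tag
                add((int) readVarint(src.bytes, pos));
            }
        }

        private void ensureCapacity(int needed) {
            if (needed > bytes.length) {
                bytes = Arrays.copyOf(bytes, Math.max(needed, bytes.length * 2));
                growCount++;
            }
        }

        private void writeVarint(long v) {
            ensureCapacity(length + 10); // room for the longest possible varint
            while ((v & ~0x7FL) != 0) {
                bytes[length++] = (byte) ((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            bytes[length++] = (byte) v;
        }

        private static long readVarint(byte[] buf, int[] pos) {
            long result = 0;
            int shift = 0;
            while (true) {
                byte b = buf[pos[0]++];
                result |= (long) (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    return result;
                }
                shift += 7;
            }
        }
    }
}
```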
Use Cases
Bulk Copy From Inbound To Outbound Message
Bulk transfer from an inbound to an outbound message is done transparently using the getXXXIterator() and setXXXFrom(XIterator) accessors on the messages. The generated code checks whether the iterator is backed by a repeated field buffer; if the source and target have compatible encodings (same field tag, etc.), a non-iterative buffer copy is done.
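On the handler side this is a single call per target. The sketch below uses hypothetical message classes (OrderEvent, QuoteEvent, AuditEvent) and an int[]-backed stand-in for a field buffer, so it only models which copy path is taken, not the real encoding.

```java
import java.util.Arrays;

class InboundToOutboundSketch {
    // Hypothetical field buffer: values kept as an int[] for brevity, plus the field id
    // that decides whether setValuesFrom can take the bulk path.
    static final class IntFieldBuffer {
        final int fieldId;
        int[] values;
        int size;
        boolean lastCopyWasBulk;

        IntFieldBuffer(int fieldId, int initialLength) {
            this.fieldId = fieldId;
            this.values = new int[Math.max(1, initialLength)];
        }

        void add(int v) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2);
            }
            values[size++] = v;
        }

        void setValuesFrom(IntFieldBuffer src) {
            if (values.length < src.size) {
                values = new int[src.size];
            }
            lastCopyWasBulk = src.fieldId == fieldId;
            if (lastCopyWasBulk) {
                System.arraycopy(src.values, 0, values, 0, src.size); // stands in for one memory copy
            } else {
                for (int i = 0; i < src.size; i++) {
                    values[i] = src.values[i]; // stands in for per-element decode and re-tag
                }
            }
            size = src.size;
        }
    }

    // Hypothetical generated messages; only the field ids matter here.
    static final class OrderEvent { // inbound: prices is field 4
        final IntFieldBuffer prices = new IntFieldBuffer(4, 16);
        IntFieldBuffer getPricesIterator() { return prices; }
    }

    static final class QuoteEvent { // outbound: prices is also field 4
        final IntFieldBuffer prices = new IntFieldBuffer(4, 16);
        void setPricesFrom(IntFieldBuffer it) { prices.setValuesFrom(it); }
    }

    static final class AuditEvent { // outbound: prices is field 9
        final IntFieldBuffer prices = new IntFieldBuffer(9, 16);
        void setPricesFrom(IntFieldBuffer it) { prices.setValuesFrom(it); }
    }

    // The handler code is identical for both targets; only the cost differs.
    static void onOrder(OrderEvent in, QuoteEvent quote, AuditEvent audit) {
        quote.setPricesFrom(in.getPricesIterator()); // same field id: bulk copy
        audit.setPricesFrom(in.getPricesIterator()); // different field id: iterative copy
    }
}
```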
Bulk Copy to State
Preallocated State
If a repeated field needs to be held as part of domain state it is possible to preallocate the field buffer at application startup to avoid allocation of the field buffer at runtime. With Xbuf generated code you can create a preallocated field buffer to which a field can be copied by using the static createXXXFieldBuffer on the source message from which the field will be copied.
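A sketch of the preallocated pattern, using hypothetical MarketDataEvent and QuoteEvent types whose static createPricesFieldBuffer(initialBufferLength, isNative) methods follow the createXXXFieldBuffer naming. The field buffer is the same simplified int[]-backed stand-in (isNative is ignored), and both factory choices are shown side by side to illustrate where the bulk copy happens.

```java
import java.util.Arrays;

class PreallocatedStateSketch {
    // Hypothetical int[]-backed field buffer; only the copy path is modeled.
    static final class IntFieldBuffer {
        final int fieldId;
        int[] values;
        int size;
        boolean lastCopyWasBulk;

        IntFieldBuffer(int fieldId, int initialLength) {
            this.fieldId = fieldId;
            this.values = new int[Math.max(1, initialLength)];
        }

        void add(int v) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2);
            }
            values[size++] = v;
        }

        void setValuesFrom(IntFieldBuffer src) {
            if (values.length < src.size) {
                values = new int[src.size];
            }
            lastCopyWasBulk = src.fieldId == fieldId; // bulk only when the field ids match
            System.arraycopy(src.values, 0, values, 0, src.size);
            size = src.size;
        }
    }

    static final class MarketDataEvent { // source type: prices is field 4
        static final int PRICES_FIELD_ID = 4;
        final IntFieldBuffer prices = new IntFieldBuffer(PRICES_FIELD_ID, 16);
        IntFieldBuffer getPricesIterator() { return prices; }
        static IntFieldBuffer createPricesFieldBuffer(int initialBufferLength, boolean isNative) {
            return new IntFieldBuffer(PRICES_FIELD_ID, initialBufferLength); // isNative ignored
        }
    }

    static final class QuoteEvent { // outbound type: prices is field 7
        static final int PRICES_FIELD_ID = 7;
        final IntFieldBuffer prices = new IntFieldBuffer(PRICES_FIELD_ID, 16);
        void setPricesFrom(IntFieldBuffer it) { prices.setValuesFrom(it); }
        static IntFieldBuffer createPricesFieldBuffer(int initialBufferLength, boolean isNative) {
            return new IntFieldBuffer(PRICES_FIELD_ID, initialBufferLength);
        }
    }

    // Application state allocated once at startup with the maximum expected size.
    final IntFieldBuffer fromSource = MarketDataEvent.createPricesFieldBuffer(1024, true);
    final IntFieldBuffer forOutbound = QuoteEvent.createPricesFieldBuffer(1024, true);

    // Runtime handler: no field buffer allocation, only copies into preallocated state.
    void onMarketData(MarketDataEvent event) {
        fromSource.setValuesFrom(event.getPricesIterator());  // bulk on the way in
        forOutbound.setValuesFrom(event.getPricesIterator()); // iterative on the way in...
    }

    QuoteEvent buildQuote() {
        QuoteEvent quote = new QuoteEvent(); // a real app would use its message factory
        quote.setPricesFrom(forOutbound);    // ...but bulk on the way out, for every quote
        return quote;
    }
}
```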
In this approach the preallocated field buffer is created from the source message type, but it could equally be preallocated from the outbound message type. Because bulk copy can't be done when the field ids don't match, it can make sense to use the outbound message as the factory for the field buffer so that the field can be bulk copied on the outbound side. This is most worthwhile when the buffer will be set on multiple outbound messages.
Non Preallocated State
An application can still benefit from bulk buffer copy without preallocating the field buffer, by creating it on demand. In that case the application can create the field buffer on the fly with an initialBufferLength of 1 and let it grow to fit the values copied into it.
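On-demand creation might look like the following, again with a hypothetical MarketDataEvent type and a simplified field buffer stand-in that counts how often it grows. The first copy grows the 1-element buffer; later copies of the same size reuse it.

```java
import java.util.Arrays;

class OnDemandStateSketch {
    // Hypothetical int[]-backed field buffer that records how often it had to grow.
    static final class IntFieldBuffer {
        final int fieldId;
        int[] values;
        int size;
        int growCount;
        boolean lastCopyWasBulk;

        IntFieldBuffer(int fieldId, int initialLength) {
            this.fieldId = fieldId;
            this.values = new int[Math.max(1, initialLength)];
        }

        void add(int v) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2);
                growCount++;
            }
            values[size++] = v;
        }

        void setValuesFrom(IntFieldBuffer src) {
            if (values.length < src.size) {
                values = new int[src.size];
                growCount++;
            }
            lastCopyWasBulk = src.fieldId == fieldId;
            System.arraycopy(src.values, 0, values, 0, src.size);
            size = src.size;
        }
    }

    static final class MarketDataEvent { // prices is field 4
        static final int PRICES_FIELD_ID = 4;
        final IntFieldBuffer prices = new IntFieldBuffer(PRICES_FIELD_ID, 16);
        IntFieldBuffer getPricesIterator() { return prices; }
        static IntFieldBuffer createPricesFieldBuffer(int initialBufferLength, boolean isNative) {
            return new IntFieldBuffer(PRICES_FIELD_ID, initialBufferLength); // isNative ignored
        }
    }

    IntFieldBuffer lastPrices; // created lazily, on the first event that needs it

    void onMarketData(MarketDataEvent event) {
        if (lastPrices == null) {
            // initialBufferLength of 1: the buffer grows to fit whatever is copied in.
            lastPrices = MarketDataEvent.createPricesFieldBuffer(1, false);
        }
        lastPrices.setValuesFrom(event.getPricesIterator()); // still a bulk copy
    }
}
```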
Bulk Copy From State to Outbound Message
Repeated field buffers can be created and used to directly serialize repeated field values for outbound messages. This is useful when the same values will be set on multiple outbound messages, so that they don't need to be serialized into each message separately.
When the repeated values won't be set on more than one outbound message, and the application isn't already holding them in a field buffer in state, it is more efficient to add the values one by one directly to the outbound message. This serializes the values directly into the field buffer maintained by the message.
It is also best practice to add such values back to back and not interleaved with setting other fields.
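Both outbound strategies side by side, using a hypothetical QuoteEvent with an addPrices(int) adder and a createPricesFieldBuffer factory, plus the same simplified int[]-backed field buffer stand-in:

```java
import java.util.Arrays;

class StateToOutboundSketch {
    // Hypothetical int[]-backed field buffer; only the copy path is modeled.
    static final class IntFieldBuffer {
        final int fieldId;
        int[] values;
        int size;
        boolean lastCopyWasBulk;

        IntFieldBuffer(int fieldId, int initialLength) {
            this.fieldId = fieldId;
            this.values = new int[Math.max(1, initialLength)];
        }

        void add(int v) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2);
            }
            values[size++] = v;
        }

        void setValuesFrom(IntFieldBuffer src) {
            if (values.length < src.size) {
                values = new int[src.size];
            }
            lastCopyWasBulk = src.fieldId == fieldId;
            System.arraycopy(src.values, 0, values, 0, src.size);
            size = src.size;
        }
    }

    static final class QuoteEvent { // prices is field 7
        static final int PRICES_FIELD_ID = 7;
        final IntFieldBuffer prices = new IntFieldBuffer(PRICES_FIELD_ID, 16);
        void setPricesFrom(IntFieldBuffer it) { prices.setValuesFrom(it); }
        void addPrices(int v) { prices.add(v); } // hypothetical per-value adder
        static IntFieldBuffer createPricesFieldBuffer(int initialBufferLength, boolean isNative) {
            return new IntFieldBuffer(PRICES_FIELD_ID, initialBufferLength); // isNative ignored
        }
    }

    // Many recipients: serialize once into a buffer shaped like the outbound field...
    static QuoteEvent[] fanOut(int[] levels, int recipients) {
        IntFieldBuffer shared = QuoteEvent.createPricesFieldBuffer(levels.length, false);
        for (int level : levels) {
            shared.add(level);
        }
        QuoteEvent[] quotes = new QuoteEvent[recipients];
        for (int i = 0; i < recipients; i++) {
            quotes[i] = new QuoteEvent();
            quotes[i].setPricesFrom(shared); // ...then bulk copy it into each message
        }
        return quotes;
    }

    // One recipient: add the values straight to the message, back to back.
    static QuoteEvent single(int[] levels) {
        QuoteEvent quote = new QuoteEvent();
        for (int level : levels) {
            quote.addPrices(level);
        }
        return quote;
    }
}
```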
Injecting Messages with repeated fields into an engine
Populating a message with repeated fields to inject into an engine's dispatch loop is similar to populating outbound messages. If the repeated values will be set on multiple injected events, it is usually more efficient to serialize them once into a repeated field buffer and then set that buffer on each event to inject.
Consider 'syncing' the message
When using EventSourcing, injected messages must be replicated to the backup, which for large messages means they must be serialized. If a message hasn't already been serialized, the engine's business logic thread has to do it as part of the commit process, so the injecting thread can save the engine thread that work by calling message.sync() prior to injection to flush field values into the message's backing buffer.
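A single-threaded sketch of why syncing on the injecting thread helps. OrderEvent, Engine and their methods are hypothetical; only the idea of message.sync() comes from the text, and here it simply flushes values into a ByteBuffer using fixed 4-byte ints rather than the real encoding.

```java
import java.nio.ByteBuffer;

class SyncBeforeInjectSketch {
    // Hypothetical message that serializes lazily: values live in a working array
    // until sync() flushes them into the backing buffer.
    static final class OrderEvent {
        private final int[] prices;
        private int count;
        private ByteBuffer backing; // null while there are unsynced changes

        OrderEvent(int capacity) {
            prices = new int[capacity];
        }

        void addPrices(int price) {
            prices[count++] = price;
            backing = null; // invalidate any previously serialized form
        }

        boolean isSynced() {
            return backing != null;
        }

        void sync() {
            if (backing == null) {
                ByteBuffer buf = ByteBuffer.allocate(count * Integer.BYTES);
                for (int i = 0; i < count; i++) {
                    buf.putInt(prices[i]);
                }
                buf.flip();
                backing = buf;
            }
        }
    }

    // Hypothetical engine. Simplified: the serialization its commit would need for
    // replication is done (and counted) at injection time.
    static final class Engine {
        int serializationsOnEngineThread;

        void inject(OrderEvent event) {
            if (!event.isSynced()) {
                serializationsOnEngineThread++;
                event.sync();
            }
        }
    }

    // Injecting thread: sync first so the engine has no serialization left to do.
    static void injectFromFeedThread(Engine engine, OrderEvent event) {
        event.sync();
        engine.inject(event);
    }
}
```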