See also Transaction Log Tool for general usage information about the tool, along with a command reference.
An X Platform application has two sources of data backed by transaction log files that are interesting to query: application state and inbound/outbound messaging traffic. This document discusses the tools available to query transaction logs. For brevity, the term log is used interchangeably with transaction log herein.
Application State: In-memory application state is modeled as a singly rooted object tree. When using State Replication, the state tree is built incrementally by a series of PUT, UPDATE and REMOVE events that are replicated by a primary as it updates the state tree, and possibly also persisted to the application's recovery log. For an application using Event Sourcing, application state is built up by replaying all of the application's inbound messaging traffic, with the expectation that the replay produces exactly the same output as the original inbound message stream.
Recovery Transaction Log: The recovery transaction log holds all of the data needed to recover an application to its last stable state from disk. In an Event Sourcing application, this transaction log will hold all of an application's inbound messages. A StateReplication transaction log holds all of the incremental updates to the application's state tree plus the application's unacknowledged outbound messages.
Inbound Message Transaction Log: An inbound transaction log is an audit log that can be used to log inbound messages for audit purposes.
Outbound Message Transaction Log: An outbound transaction log is an audit log that can be used to log an application's outbound messages.
An application configured with recovery, inbound and outbound transaction logs is well positioned for after-the-fact historical analysis via query. Because an application often operates in collaboration with several other applications, a set of log files from multiple inter-operating applications is also generally of interest to a user. A user might want to, for example, track a message's path through a set of applications; therefore the X Platform query engine allows querying of multiple logs.
Querying in the X Platform can be accomplished via a programmatic API, which provides the ability to query a set of objects using bean path style selectors via a QueryEngine. Querying can also be performed on the command line using the Transaction Log Tool via the xtxnlogtool scripts provided with an X distribution. Command line queries are entered using XPQL, the X Platform Query Language, a SQL-like query grammar.
An X Platform transaction log is simply a series of Entries containing messages and/or state updates. This section discusses the mapping of such a log to a table structure suitable for query.
To understand how we query logs, one must first look at the structure of a log. A log is composed of a series of serialized Entries. These entries are made accessible for reading via the platform's RogLog.LogReader.Entry class.
For transaction logs, XPQL maps this object model into the following logical table structure to facilitate SQL-like querying. Each transaction log is modeled as a table, and each Entry in a transaction log is modeled as a row in the transaction log table. Each row's columns consist of the Entry class itself, Metadata and a column for each Entity and/or Message type present in the transaction log. For example, in a state replication log, we might have objects such as Customer, Employee, and OrderRequest. We would then have 5 columns in the table that represents this log, and any given row of the table will contain the Entry fields and Metadata for the log Entry and will have exactly one Entity or Message type column populated, e.g.
Entry | Metadata | Employee | Customer | OrderRequest |
---|---|---|---|---|
Entry entryType=Put | Metadata transactionId=1 | NULL | Customer | NULL |
Entry entryType=Put | Metadata transactionId=2 | Employee | NULL | NULL |
Entry entryType=Update | Metadata transactionId=3 | NULL | Customer cid=2 firstName='Bobby' | NULL |
Entry entryType=Put | Metadata transactionId=4 | Employee | NULL | NULL |
Similarly, an event sourcing log's table will consist of the Entry and Metadata columns plus an additional column for each type of Message contained within the log. This allows us to SELECT from among the Entity/Message types that are relevant to the log in question.
Each of these columns is a composite column. The column may be selected as-is, in which case the object itself is returned, but its fields can also be accessed via standard Java bean paths. For example, you can refer to Entry.entryType, Metadata.inboundSequenceno, or Customer.firstName. The following query serves as an example of this.
query>SELECT Employee.firstName, Customer.firstName, Entry.transactionId, Entry.timestamp, \
metadata.isMessage \
FROM logs WHERE firstName = 'Bob'
Query executed in 90ms. Use 'next' or 'dump' to display results
query>next 10
| Employee.firstName| Customer.firstName| transactionId|                  timestamp|Metadata.isMessage|
+--------------------+--------------------+----------------+---------------------------+------------------+
|                null|                 Bob|               1|2014-01-27 05:28:41.832 PST|             false|
|                 Bob|                null|               2|2014-01-27 05:28:41.835 PST|             false|
No more results
In order to find and retrieve the relevant rows from a log table, we create indexes against relevant fields of the table. For each log file, e.g. /rdat/warehouse.log, we create an index file, e.g. /rdat/warehouse.idx, that contains these indexes. Indexes can be created for any field of any Entity or Message in the log table. For example, we could index Customer.lastName or we could index Order.itemId. Each of these indexes must be declared as either unique or non-unique. This decision determines whether the index can refer to multiple rows of the table. For example, in an event sourcing system we might have a unique index on NewOrderSingle.id. We would not, however, have a unique index on the Customer.id of a state replication system, because we would expect to see one row for the initial PUT of the Entity and additional rows for its UPDATE and potential REMOVE events. An index can also be created on fields that span multiple Entities/Messages. For example, if many or even every Message had a timestamp field, we could create an INDEX against timestamp and thereby allow any Message, regardless of its type, to be retrieved by timestamp.
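For example, the indexes described above could be declared as follows, using the CREATE INDEX syntax covered later in this document (repository and field names hypothetical):

CREATE UNIQUE INDEX order_u1 ON warehouse(NewOrderSingle.id)
CREATE INDEX customer_n1 ON warehouse(Customer.lastName)
CREATE INDEX ts_n1 ON warehouse(timestamp)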
The Transaction Log Tool provides various utilities for working with X Platform logs including query via XPQL. The log tool can be launched via the xtxnlogtool.bat or xtxnlogtool.sh script provided with an X Platform distribution. See the Transaction Log Tool page for more details on launching the tool.
In the transaction log tool, any open log is available for query. Logs are opened with the open command. Given a transaction log named OrderProcessingApp.log in the tool's working directory:
open OrderProcessingApp  --Now eligible for SELECT statements
close OrderProcessingApp --No longer considered in SELECT statements
The tool also allows for opening all logs in a given directory:
open .                --Opens all logs in the tool's working directory
open /logs/backupLogs --Opens all logs in the /logs/backupLogs directory.
list logs             --Lists all logs that are currently open
close                 --Closes all open logs
The query syntax is a subset of ANSI SQL SELECT. The basic form is:
SELECT <columns>
FROM <logs>
WHERE <conditions>
GROUP BY <columns>
ORDER BY <columns>
LIMIT [skip,]fetch
The WHERE, GROUP BY, ORDER BY and LIMIT clauses are optional.
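For example, a sketch combining several of the optional clauses (log contents and field names hypothetical):

SELECT Order.orderId, Order.quantity
FROM logs
WHERE Order.quantity > 100
ORDER BY Order.quantity DESC
LIMIT 10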
Usually, you can simply specify a column and field by using the domain class's simple name and the bean path of the field, for example 'Order.orderId' (or even just 'orderId' if you don't care about the type that contains the field). This is typically enough for the query parser to locate the field of interest. In cases where ambiguity arises, the full column and field syntax is specified below:
'*' | 'x_repository_name' | [Column][.][fieldPath]

where:

Column := 'Entry' | 'Metadata' | DomainType | AggregateColumn
DomainType := SimpleName | QualifiedClassName
SimpleName := A simple classname, e.g. 'Customer'
QualifiedClassName := The fully qualified class name with '.' replaced by '/', e.g. com/mycompany/MyMessage
fieldPath := A bean path for the field of interest using standard bean path case.
AggregateColumn := AggregateFunction(Column)
AggregateFunction := COUNT | MIN | MAX | SUM | AVG
When specifying a field's path the field should be specified in a case consistent with standard bean paths described here. It is possible to use the `describe` command listed below to list an object's fields.
Column/Field | Notes |
---|---|
Customer | Selects all Customer objects from the log. This returns the Customer object itself rather than any of its fields which is useful in cases where you plan to dump raw results from the transaction log tool (or if you are working with the programmatic API). |
Customer.firstName | Selects/Matches any Customer object's first name field. Also matches any subclass of Customer that has a firstName field. If no such field exists this results in an error. This would not, for example, select or match Employee.firstName. |
Object.firstName | Selects/Matches any object that is a subclass of Object that has a firstName field. An error is reported if two types have a firstName field that differs in type. For example if Customer.firstName is a String, but CustomerFlags.firstName is a boolean this will produce an error. |
firstName | An unqualified fieldPath: shorthand for Object.firstName, or selects a class with a simpleName of 'firstName'. |
Entry.object.firstName | Nested select/match of Entry's object field for any object that has a firstName field. |
Customer.address.zipCode | Select of nested zipCode field in Customer object. |
address.zipCode | Unqualified select/match of nested address.zipCode field in any object that has one or: Select of address class and its field zipCode. This would be an error if there existed a class com.mycompany.address and a class Customer with an address.zipCode field. |
com/packageA/Customer com/packageB/Customer | Column type resolution can usually be achieved with just the domain object's simple name. Fully qualifying the type is necessary in cases where the type would otherwise be ambiguous. |
com/mycompany/Customer.firstName | Fully qualified select/match of Customer's firstName field. |
com/mycompany/Customer/firstName | Fully qualified select/match of the unconventionally named 'firstName' class |
com/mycompany/Customer$InnerClass.firstName | Fully qualified select/match of the InnerClass.firstName field from the inner class of Customer. |
* | Can be used in a Select clause to select all objects. This is synonymous with specifying 'Object' |
x_repository_name | Pseudo column that can be used in a SELECT clause to select the log name from which an entry came which can be useful in cases where you are querying across multiple logs. |
Metadata | Selects the metadata column (see examples below) |
Entry | Selects the Entry column |
COUNT(Customer) | A count of the number of non-null Customer entries with respect to the GROUP BY clause's aggregation key |
COUNT( * ) | A count of all of the entries with respect to the GROUP BY clause's aggregation key |
MIN(Customer.address.zipcode) | The least value found for this zipcode field with respect to the GROUP BY clause's aggregation key |
MAX(Customer.address.state) | The greatest value found for this state field with respect to the GROUP BY clause's aggregation key |
SUM(Employee.salary) | The arithmetic sum of all salary entries with respect to the GROUP BY clause's aggregation key |
AVG(Employee.bonus) | The arithmetic average or mean of all employee bonuses with respect to the GROUP BY clause's aggregation key |
The below fields can be selected using Entry.<fieldName>, or just <fieldName> if the name is not ambiguous:
Column | Type | Notes |
---|---|---|
entryType | enum | Select the type of entry ('Put', 'Update', 'Send', 'Message', or 'Remove'). |
id | uuid | The id of the payload object in this entry. |
timestamp | long | The timestamp at which the entry was logged. See Dates and Timestamps below for more details. |
transactionId | long | The AEP transactionId for the entry |
stableTransactionId | long | The Stable transaction id for the entry (id of a previous transaction that commit of this entry marks as stable) |
commitStart | boolean | Whether this entry demarcates the first entry in an AEP transaction |
commitEnd | boolean | Whether this entry demarcates the last entry in an AEP transaction. An AEP transaction isn't considered to be complete until the 'commitEnd' entry is written. |
className | String | The fully qualified name of the payload object (e.g. message or entity). |
simpleClassName | String | The simple class name of the payload object. |
entrySize | int | The serialized size of the entry including the log entry metadata. |
payloadSize | int | The serialized size of the contained payload object. |
logName | String | The name of the log from which this entry was read. |
filePosition | long | The position of the entry in the transaction log |
object | | Selects the object associated with the entry. This can be null if the entryType is 'Remove'. |
metadata | | The metadata for the object associated with the entry. This is never null, even for removes. |
You can see a listing of the available Entry fields by issuing:

describe entry
All domain objects in a transaction log implement the IRogMetadata interface which means that all entries have metadata. A listing of metadata fields can be found by consulting the javadoc for com.neeve.rog.IRogMetadata.
The following are all valid ways of representing the isMessage field:
Column | Notes |
---|---|
Metadata.isMessage | Select Metadata column and isMessage field |
Entry.metadata.isMessage | Select Entry column, its metadata.isMessage nested bean path |
metadata.isMessage | Unqualified select of Entry column, selecting its metadata.isMessage nested bean path |
Entry.object.isMessage | Select Entry column, its object.isMessage nested bean path (all objects implement metadata) |
object.isMessage | Unqualified select of Entry column, selecting its object.isMessage nested bean path (all objects implement metadata) |
Object.isMessage | Wildcard select of any object type (and all objects have metadata) |
isMessage | Unqualified select of isMessage (all objects have metadata) |
You can see a listing of the available metadata fields by issuing:

describe metadata
Note that Customer.transactionId and transactionId are different: the former only selects Customer transactionIds, while the latter selects all transactionIds.
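For example (hypothetical queries illustrating the difference):

SELECT Customer.transactionId FROM logs --only entries containing a Customer
SELECT transactionId FROM logs          --the transactionId of every entry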
The types in the log can be displayed using the 'list' command which displays a list of the types available through the object factories registered with the log:
list types
The fields of a type that are available for query can be described with the describe command:
describe Customer               --unqualified
describe com/mycompany/Customer --fully qualified
This command will list out each of the Customer class fields. If Customer has an embedded address field of type Address, the fields of Address can be described with:
describe Customer.address               --unqualified
describe com/mycompany/Customer.address --fully qualified
Conditions are of the general form: <field> <operator> <value(s)>
There is an implicit condition on any query: only rows in which the selected entities/messages are not null will be returned. Hence "SELECT Customer" would not return log entries for Addresses or Orders. See "Rows of a ResultSet", below, for further details.
Conditions may be joined by either AND or OR, with parentheses specifying order of application.
Fields are from Entry, Metadata, messages or entities. They may either be qualified by a particular type, e.g. Customer.firstName = 'Susan', or they may be used without a containing type, e.g. symbol = 'IBM'. In the latter case, any entities/messages with such a field would be returned.
quantity IN (40, 43, 44)
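For example, a hypothetical compound WHERE clause combining such conditions:

WHERE (Customer.lastName = 'Smith' OR Customer.lastName = 'Jones')
AND quantity IN (40, 43, 44)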
Some notes on the use of fields are in order with respect to performance. Deserializing an Entry's payload (Entry.object) is comparatively quite costly. Consequently, it is preferable to place predicates that do not require deserialization first in the WHERE clause. Such fields include the Entry fields described above (e.g. Entry.entryType, Entry.timestamp, Entry.transactionId and Entry.className), which can be evaluated without deserializing the payload.
Hence, the following WHERE clause is an example of this approach, in that it places a predicate against Entry.timestamp before other predicates that require deserialization of the payload.
WHERE Entry.timestamp BETWEEN 1421271440000 AND 1421271449255
AND firstName = 'John'
The query engine's optimizer is being enhanced in a coming release to identify such predicates so that they can be evaluated first, regardless of where they are located in the WHERE clause.
The GROUP BY clause is optional, and when present specifies the granularity to which a query's result set is aggregated. Specifically, this means that the result set of the query is aggregated to contain one row per distinct combination of values for the columns specified in the GROUP BY clause. Consider the following data.
id | firstName | lastName | city |
---|---|---|---|
1 | John | Smith | Boston |
2 | Emily | Smith | Boston |
3 | Alice | Jones | Chicago |
4 | John | Williams | Chicago |
An aggregate query might yield only the distinct last names and cities:
SELECT lastName, city FROM logs GROUP BY lastName, city
lastName | city |
---|---|
Williams | Chicago |
Jones | Chicago |
Smith | Boston |
An aggregate function, discussed in more detail later, could summarize the number of instances of each last name/city combination:

SELECT lastName, city, COUNT(*) FROM logs GROUP BY lastName, city
lastName | city | COUNT( * ) |
---|---|---|
Williams | Chicago | 1 |
Smith | Boston | 2 |
Jones | Chicago | 1 |
Aggregate functions may only be applied to non-aggregate columns/fields. Further, aggregate functions are only applied to the non-null values of those columns/fields.
Function | Description |
---|---|
COUNT | Returns the cardinality of non-null values, or 0 if all values are null. Hence COUNT(Customer.middleName) will return the number of Customer entries with non-null middleName fields. |
MIN | Returns the minimum non-null value obtained, or null if all values are null. MIN can only be applied to Comparable fields. |
MAX | Returns the maximum non-null value obtained, or null if all values are null. MAX can only be applied to Comparable fields. |
SUM | Returns the arithmetic sum of all non-null values, or null if all values are null. SUM can only be applied to numeric fields, fields that implement Number. |
AVG | Returns the arithmetic average or mean of all non-null values, or null if all values are null. AVG can only be applied to numeric fields, fields that implement Number. Note, in particular for AVG, that null values are completely excluded from the computation. Hence, averaging: {null, 4, null, null, 5, 0} yields (4+5+0)/3 = 3. |
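For example, a hypothetical aggregate query based on the COUNT description above, returning the number of Customer entries with a non-null middleName alongside the total count of returned entries:

SELECT COUNT(Customer.middleName), COUNT(*) FROM logs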
The ORDER BY clause is also optional. It allows a sort order to be specified with respect to one or more fields. Each field's ordering may optionally be specified as either ascending, the default, or descending.
For example,
SELECT lastName, city FROM logs ORDER BY city
would return rows for 'Boston' before 'Chicago'. Because no additional ORDER BY fields are specified the order of rows within each such group is arbitrary. If we additionally specified,
ORDER BY city, lastName
then each row within a given city grouping would be in alphabetic order by last name. Finally, the modifier ASC, for ascending, or DESC for descending may be appended after one or more ORDER BY fields. The default ordering is ascending, so the following clause is equivalent to the one above.
ORDER BY city ASC, lastName ASC
If, instead, we wished to have the last names in descending order we would specify either of the below, equivalent clauses.
ORDER BY city, lastName DESC
ORDER BY city ASC, lastName DESC
At this time, ORDER BY sorting may not be applied to aggregate queries, that is, queries that have aggregate fields and/or a GROUP BY clause. This is a temporary restriction and will be removed in an upcoming release.
The LIMIT clause restricts the number of rows to be fetched and optionally skips a given number of rows. For example,
SELECT * FROM logs LIMIT 10
will only return the first 10 rows from the log(s). An optional, two argument form specifies the number of rows to be skipped followed by the number of rows to be fetched. The query,
SELECT * FROM logs LIMIT 50, 10
will skip 50 rows and then return the next 10. The LIMIT clause is often used with an ORDER BY clause to specify the number of entries to be fetched with respect to a particular ordering of the result set. E.g.
SELECT * FROM logs ORDER BY transactionDate DESC LIMIT 10
would return the most recent 10 entries with respect to the transactionDate.
Where possible, Date and timestamp fields displayed within the tool will be displayed using the format defined by the timestampFormat and timestampTZ tool configuration properties, which can be set at any time. The current values of these properties can be viewed with the following command:
query>get -a timestamp
|       Property|                    Value|                         Description|
+---------------+-------------------------+------------------------------------+
|timestampFormat|yyyy-MM-dd HH:mm:ss.SSS z|The default timestamp format to use |
|               |                         |when displaying results (use quotes |
|               |                         |when setting).                      |
|timestampTZ    |DEFAULT                  |The timezone in which to display    |
|               |                         |timestamps. A value of 'DEFAULT'    |
|               |                         |uses the current system timezone    |
+---------------+-------------------------+------------------------------------+
See the Transaction Log Tool page for more details on these properties.
Dates and timestamps can be passed as conditions pertaining to fields that have a type of date or long.
Standard SQL Date and Timestamp literals are accepted.
Note that a date literal is specified by DATE'yyyy-MM-dd', and that the literal begins with DATE. A timestamp literal, likewise, must begin with TIMESTAMP. A timestamp literal includes hours, minutes, seconds and milliseconds. The timezone field is optional.
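For example, a timestamp literal with the optional timezone field might look like this (field and values hypothetical):

WHERE Order.receiptDate > TIMESTAMP'2010-04-01 09:30:00.000 PST'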
While DATE and TIMESTAMP literals are a surefire way to enter dates and timestamps into a query, they can be cumbersome, and also inconvenient in places where the timestamp output is not configured to match standard SQL syntax. Consequently, several additional mechanisms are provided for parsing a timestamp that is not specified as a literal:
If the supplied value is a long, then it is used to construct a Date, interpreted as the number of milliseconds since the epoch.
An attempt is made to parse the date with the standard SQL format:
yyyy-MM-dd HH:mm:ss[.f...] [z]
where the first 2 tokens are passed into java.sql.Timestamp.valueOf(java.lang.String) method and the final remaining argument is parsed via TimeZone.getTimeZone(String) and used as the timezone offset.
A plain date of the form yyyy-MM-dd is interpreted to mean the beginning of that day in the current timezone.
To accommodate cut and paste from a result, an attempt will be made to parse the date using the timestamp format specified by the tool's timestampFormat config property.
Examples:
-- Select Orders with a receipt Date after April 1st 2010 at 00:00 AM:
-- using a DATE literal
WHERE Order.receiptDate > DATE'2010-04-01'

-- Select Orders with a receipt Date after April 1st 2010 at 00:00 AM:
WHERE Order.receiptDate > '2010-04-01'

-- Select yesterday's orders
WHERE Order.receiptDate BETWEEN 'yesterday' AND 'today'

-- Orders since yesterday
WHERE Order.receiptDate >= 'yesterday'

-- Select any order with a timestamp of yesterday at 00:00 AM ... NOT any order yesterday.
WHERE Order.receiptDate = 'yesterday'

-- Orders between yesterday at 3pm and today at 3pm:
WHERE Order.receiptDate BETWEEN 'yesterday at 3pm' AND 'today at 3PM'

-- Orders received in the last 30 minutes:
WHERE Order.receiptDate > '30 minutes ago'
Some timestamps in transaction log entry metadata fields are captured with microsecond resolution. Examples of such fields are preProcessingTs, outTs, and enqueueTs. These fields have additional 'Micros'-suffixed accessors that allow the microsecond-level timestamp to be retrieved (e.g. preProcessingTsMicros, outTsMicros and enqueueTsMicros). The transaction log tool doesn't currently support timestamp formatting of these microsecond-level timestamps; they are displayed simply as longs and may be used for computing microsecond deltas.
A class literal consists of a string containing the fully qualified class name, for example:
WHERE salesOrder.timestamp.class IN ('java.lang.Long', 'java.util.Date')
For factory-generated classes the simple name may be used, as long as it is unambiguous, for example:
AND Entry.object.class = 'Customer'
Inner classes are referenced with the standard Java syntax, e.g. 'java.util.Map$Entry'. Anonymous inner classes use the standard naming as well, 'com.foo.Bar$1'.
Text/string scalars are defined via standard SQL in single quotes. Within a string, double quotes may be used freely, and single quotes are specified as two single quotes in a row. For example (field names and values hypothetical):
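WHERE Customer.lastName = 'O''Brien'
WHERE Customer.slogan = 'Say "hello"'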
Repositories should be given aliases that are ANSI SQL identifiers. This means that the identifier must begin with a letter and may contain letters, numbers and underscores. In some cases this is not possible; for example, if the contents of an entire directory are opened via "open ." or "open some/path", then the repositories/logs are specified by their corresponding file names, which may well not be valid SQL identifiers. In these cases, the repository names must be escaped by quoting them with double quotes. For example (log file name hypothetical):
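SELECT * FROM "orderProcessingApp.log"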
First, we will query a repository, finding Customers with a particular first & last name and an entry type of PUT.
SELECT Customer
FROM orderProcessingApp
WHERE Customer.lastName = 'Doe'
AND Customer.firstName = 'John'
AND Entry.entryType = 'PUT'
Inner classes may be specified with the standard Java class nomenclature. So, to select all of the History entries on April 1st 2013 you can issue:
SELECT Customer$History
FROM orderProcessingApp
WHERE Customer$History.lastOrderDate = DATE '2013-04-01'
We can also select fields of interest via standard Java bean-path notation. For example, to find the transaction in which Customer 'John Doe' was created you could issue:
SELECT transactionId, Customer.firstName, Customer.address.zipcode
FROM orderProcessingApp
WHERE Customer.lastName = 'Doe'
AND Customer.firstName = 'John'
AND Entry.entryType = 'Put'
The following query demonstrates a query across all 'columns' in the schema. In the table model this means across all object types, so it would select any object that has matching metadata criteria:
SELECT * FROM orderProcessingApp WHERE transactionId = '30' OR Entry.entryType = 'MESSAGE'
The following query demonstrates aggregation of sales across zip codes.
SELECT Shipment.address.zipcode, SUM(Shipment.total), AVG(Shipment.total)
FROM orderProcessingApp
WHERE Shipment.address.state = 'TX'
GROUP BY Shipment.address.zipcode
In cases where object names might be ambiguous, the fully qualified class name can be specified to resolve the ambiguity. The package structure is delimited with "/".
SELECT com/company/ordering/Request
FROM orderProcessingApp
WHERE com/company/ordering/Request.customerId = '12345'
AND com/company/ordering/Request.quantity >= 1
AND Entry.entryType = 'Send'

SELECT com/company/shipping/Request
FROM orderProcessingApp
WHERE com/company/shipping/Request.customerId = '12345'
AND com/company/shipping/Request.cost < 100.00
AND Entry.entryType = 'Message'
Finally, if you have added multiple logs to the query engine you can select all of them via the aggregate view "logs". This will return the results of querying each log, presented sequentially, i.e. log1's results, followed by log2's results, and on until the results of the last log are returned.
SELECT x_repository_name, NewOrderSingle
FROM logs
WHERE NewOrderSingle.symbol = 'IBM'
AND NewOrderSingle.id IN (4001,4007,4008)
AND NewOrderSingle.trader BETWEEN 'Johnson' AND 'Jones'
The execution of a query yields a RogLogResultSet. This is a navigable container that implements Iterator and has a count() method. Note that computing count() may result in a delay in some cases. Each row of the RogLogResultSet is comprised of the columns specified in the SELECT clause. A RogLogResultSet exposes its columns through accessors similar to those found in a java.sql.ResultSet. Specifically, this means that SELECT Customer will result in a single-column RogLogResultSet in which each row holds a single Customer object. Similarly, SELECT Metadata.seqNo, Entry.entryType, Address.city, Address.state would yield a RogLogResultSet comprised of four columns of type Long, Enum, String and String.
A result set will contain one row for each relevant entry/node of the underlying repository/log. It is essential to understand what the word relevant means in this context. Consider the following query:
SELECT Customer
FROM orderProcessingApp
WHERE Customer.lastName = 'Doe'
AND Customer.firstName = 'John'
AND Entry.entryType = 'Put'
Note that the only domain entity or message referred to is Customer. Because of this, only entries that contain a Customer as their payload will be returned. If there are entries in a log that contain an Order as their root payload, they will not be returned, regardless of whether there is an Order.customer field of type Customer. If, however, NO domain entities/messages are referenced, then all rows in the log/repository will be returned. Consider:
SELECT Entry.entryType, transactionId FROM orderProcessingApp WHERE Entry.entryType = 'Put'
This query returns every entry from a log that is a 'Put' regardless of the type of object it contains. If multiple domain entities/messages are referenced, then each underlying entry/node whose domain entity/message is referenced will be returned. Hence the following query will return entries from a log whose root payload is either a Customer or an Order.
SELECT Customer.id, Order.shipDate FROM orderProcessingApp WHERE transactionId = 1234
Finally, if only unqualified fields are used in a query (e.g. firstName vs. the qualified Customer.firstName), a separate implicit filtering is applied. In this case, only entries that contain at least one of the selected fields are returned. Consider:
SELECT firstName, address.province FROM orderProcessingApp WHERE firstName = 'Bob'
In this case, only entries that contain either a firstName (e.g. Customer or Employee) or an address.province (e.g. Customer or Warehouse) would be returned. It is not necessary that these fields be populated: as long as the fields exist on the entry, it will be returned, with null values where the fields are unset, including entries in which every such field is null.
Consider address.province more closely. If the class Address does not declare such a field, but only its subclass CanadianAddress has a province field, then address must be such a CanadianAddress. Entries containing a UsAddress (with perhaps a state field) would not be returned. Neither would entries with a null address be returned, because Address itself does not have a province field.
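To make this concrete, here is a minimal Java sketch of the hypothetical types discussed above:

class Address {
    private String city;                      // note: no province field declared here
    public String getCity() { return city; }
}

class CanadianAddress extends Address {
    private String province;                  // address.province resolves only to this subclass
    public String getProvince() { return province; }
}

class UsAddress extends Address {
    private String state;                     // entries whose address is a UsAddress are not returned
    public String getState() { return state; }
}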
You can navigate and display results from a select statement in the Transaction Log Tool with the next, rewind, skip, count and dump commands.
See Transaction Log Tool for more information.
The count command gives a count of the number of results. In its current form, this results in a scan of all of the results to perform the count.
Execute 'help count' in the tool for more information
The rewind command resets the query result to the beginning.
Execute 'help rewind' in the tool for more information
The skip command skips the next result or next N results; for example, skip 10 would skip the next 10 results.
Execute 'help skip' in the tool for more information
The next command prints the next or next 'n' results to the console. For example:
query>SELECT Customer.firstName, Entry.transactionId, timestamp, entryType from logs WHERE firstName = 'Bob'
Query executed in 90ms. Use 'next' or 'dump' to display results
query>next 10
| Customer.firstName|       transactionId|                  timestamp|        entryType |
+--------------------+--------------------+---------------------------+------------------+
|                 Bob|                   1|2014-01-27 05:28:41.832 PST|             'Put'|
|                 Bob|                   2|2014-01-27 05:28:41.835 PST|          'Update'|
No more results
By default, the tool prints results in a textual tabular format. You can also issue next -d to print the details of the corresponding entry, or specify -c to print results in csv format, which can be useful when values would otherwise be truncated.
Execute 'help next' in the tool for more information
The dump command dumps all of the results to either the console or a file. When dumping to a file, the file suffix .csv will cause the output to be formatted as csv, and the file suffix .html will cause the output to be formatted as an html table. Specifying the -d flag will dump detailed results for each entry that matched the query.
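A hypothetical invocation, assuming the output file name is passed as an argument (consult the tool's help for exact usage):

dump results.csv      --formatted as csv based on the .csv suffix
dump -d results.html  --detailed results formatted as an html table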
Execute 'help dump' in the tool for more information
The following syntax is used to create indexes (see Column and Field Syntax above for field specifiers):
CREATE <optional:UNIQUE> INDEX <name> ON <repository>(<indexField>)

CREATE UNIQUE INDEX customer_u1 ON some_repo(com/foo/bar/Customer.id)

--Creates a non-unique index on the address.zipcode field of Customer:
CREATE INDEX customer_n1 ON another_repo(com/foo/bar/Customer.address.zipcode)

CREATE UNIQUE INDEX metadata_id ON repo(Metadata.id)
Note that a default package can be specified. In this context either slashes or dots are accepted as delimiters.
SCHEMA com.foo.bar
SCHEMA com/foo/bar
With such a setting we could create two of the above indexes via:
CREATE INDEX customer_n1 ON repo(Customer.address.zipcode)
CREATE UNIQUE INDEX customer_u1 ON repo(Customer.id)
The Java API is in beta and still under development. The API below is fairly stable, but subject to change.
First we must instantiate a QueryEngine and a Repository. For TransactionLogs, a RogLog serves as a repository for queries. The following snippet demonstrates creating a transaction log and adding it to a query engine.
// Assuming an orderProcessingApp.log in a folder indicated by:
//   System.getProperty("NVROOT")/rdat
// OR
//   System.getProperty("nv.data.directory")
RogLog log = RogLogFactory.createLog("orderProcessingApp");
log.open();

RogLogRepository repository = log.asRepository();
repository.open();

RogLogQueryEngine queryEngine = RogLogFactory.createQueryEngine();
queryEngine.addRepository(repository, "orderProcessingApp");
Any number of repositories may be added to an engine. Unless a repository is specified in a query (see query details below), queries are run against all repositories associated with an engine. A repository may be removed from the engine by its alias:
queryEngine.removeRepository("orderProcessingApp");
The fields that are indexed, selected in queries, and used in query predicates (details below) are all defined similarly. A field is defined by the bean path that leads to its value. Since we are dealing with log entries, the bean path for any message or entity conceptually starts with "object", because you would retrieve the payload from Entry.getObject().
An example of a field might be a customer's first name, which would be accessed in Java via String val = ((Customer)entry.getObject()).getFirstName().
RogLogField<String> customerFirstName = queryEngine.getField(Customer.class, "firstName");
RogLogField<String> customerLastName = queryEngine.getField(Customer.class, "lastName");
RogLogField<Long> customerId = queryEngine.getField(Customer.class, "customerId");
Let's look at how the customerFirstName field was created: getField takes the containing entity class (Customer.class) and the bean path ("firstName") that leads from that entity to the value, and the returned RogLogField is parameterized by the field's value type (String).
A field could also have a nested path, such as ((Customer)entry.getObject()).getAddress().getZipcode().
RogLogField<Short> customerZipcode = queryEngine.getField(Customer.class, "address.zipcode");
Aggregate fields are specified directly from their parent field. Hence, using the above customerZipcode, we could effect "COUNT(Customer.address.zipcode)" in the Java API with, e.g.
customerZipcode.count()
customerZipcode.min()
customerZipcode.max()
employeeSalary.sum()
employeeSalary.average()
Either unique or non-unique indexes may be created. Indexes are defined as bean paths based on an entity or message. They are typed by their entity/message as well as the final field to which they refer.
RogLogField<String> customerZipcode = queryEngine.getField(Customer.class, "address.zipcode");
RogLogField<Integer> customerId = queryEngine.getField(Customer.class, "customerId");

queryEngine.createIndex(customerZipcode, false); // non-unique index
queryEngine.createIndex(customerId, true);       // unique index
The simplest way to execute queries using the API is to pass in an XPQL query:
// Select all Customers from open logs with a customer id of 2:
RogLogQuery query = queryEngine.createQuery("SELECT * FROM logs WHERE Customer.customerId = 2");
RogLogResultSet rs = queryEngine.execute(query);
while (rs.next()) {
    Customer customer = (Customer)rs.getLogEntry().getObject();
    System.out.println(customer.toJsonString());
}
rs.close();
The above query will select all log entries, from all logs added to the query engine, that contain a Customer object with a customerId of 2.
For more complex use cases, one can use the query engine's fluent APIs to build up queries programmatically. The following examples demonstrate the fluent API:
//Creating a query:
RogLogQuery query = queryEngine.createQuery();
query.select(customerFirstName)
     .select(customerLastName)
     .select(customerZipcode)
     .from("orderProcessingApp")
     .where(customerId.is((long)2));
RogLogResultSet rs = queryEngine.execute(query);

// do something with results:
while (rs.next()) {
    Customer customer = (Customer) rs.getLogEntry().getObject();
    System.out.println(customer.toJsonString());
}

// Close the result set
rs.close();

//Same query created using xpql:
query = queryEngine.createQuery("SELECT Customer.firstName, Customer.lastName, Customer.address.zipcode " +
        "FROM orderProcessingApp WHERE Customer.customerId = 2");
Note that because the QueryEngine may need to allocate substantial resources to serve query results, application code must close the returned ResultSet when done working with it. Failing to close a result set may lead to resource leaks.
Compound predicates may be formed with the conjunctions AND and OR, which can be built fluently from another Predicate; these may be nested arbitrarily. The SQL equivalent of "SELECT *" is available via RogLogQueryFieldResolver.SELECT_ALL_FIELD:
RogLogField<Entry.Type> entryType = queryEngine.getField("entryType");
query = queryEngine.createQuery().select(RogLogQueryFieldResolver.SELECT_ALL_FIELD)
        .from("orderProcessingApp")
        .where(customerLastName.is("Doe")
               .and(customerFirstName.like("J%"))
               .and(entryType.is(Entry.Type.Put)));
rs = queryEngine.execute(query);
rs.close();
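Aggregate fields and a GROUP BY clause can likewise be composed fluently, via the aggregate accessors (count(), min(), max()) and the groupBy() method: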
RogLogField<Long> timestamp = queryEngine.getField("timestamp");
query = queryEngine.createQuery().select(customerZipcode)
        .select(customerId.count())
        .select(customerId.min())
        .select(customerId.max())
        .from("logs")
        .where(timestamp.greaterThan(UtlDataTypes.parseAsDate("one day ago").getTime()))
        .groupBy(customerZipcode);
rs = queryEngine.execute(query);
rs.close();
An ORDER BY clause is indicated by calls to the orderBy() method on the query. This method has a two-parameter form that takes a field as well as a SortOrder value (ascending or descending). The single-parameter form takes only a field and is equivalent to calling the two-parameter version with SortOrder.ASCENDING as the second parameter.
query = queryEngine.createQuery().select(customerLastName)
        .select(customerFirstName)
        .from("logs")
        .orderBy(customerLastName, SortOrder.ASCENDING);
A LIMIT clause is specified by a call to either the single or two parameter method limit() on the query.
query = queryEngine.createQuery().select(customerLastName)
        .select(customerFirstName)
        .from("logs")
        .orderBy(customerLastName, SortOrder.ASCENDING)
        .limit(1);
The ResultSet that we obtain from a query is a subset of the standard Java ResultSet API. To that end, it allows iteration by repeated calls to next(), and column values may be accessed via the familiar ResultSet getters. Note that column indexes are 1-based. For example, if a query selected only Customer as its single column, its results could be read via:
query = queryEngine.createQuery();
query.select(queryEngine.getField("entry.object"))
     .from("orderProcessingApp");
rs = queryEngine.execute(query);
while (rs.next()) {
    Customer customer = (Customer) rs.getObject(1); // cast needed: the column value is returned as Object
}
Similarly, the first sample query in this section specifies columns for customerFirstName, customerLastName and customerZipcode. This query's ResultSet could be read via:
while (rs.next()) {
    String firstName = rs.getString(1);
    String lastName = rs.getString(2);
    Short zipcode = rs.getShort(3);
}
Finally, the underlying log entry containing both the object and metadata may be obtained via a call to getLogEntry(), so regardless of the fields selected the corresponding entry will always be available (unless the query is an aggregate query).
while (rs.next()) {
    Entry entry = rs.getLogEntry();
    IRogMetadata metadata = entry.getMetadata();
    Object object = entry.getObject();
}