Tickerplant (TP) using tick.q

tick.q is available from KxSystems/kdb-tick

Overview

All incoming streaming data is processed by a kdb+ process acting as a tickerplant. A tickerplant writes all data to a tickerplant log (to permit data recovery) and publishes data to subscribed clients, for example an RDB.

Customization

tick.q provides a starting point for most environments. The source code is freely available and can be tailored to individual needs.

Schema file

A tickerplant requires a schema file. A schema file describes the data you plan to capture, by specifying the tables to be populated by the tickerplant environment. The datatypes and attributes are denoted within the file as shown in this example:

quote:([]time:`timespan$(); sym:`g#`symbol$(); bid:`float$(); ask:`float$(); bsize:`long$(); asize:`long$(); mode:`char$(); ex:`char$())
trade:([]time:`timespan$(); sym:`g#`symbol$(); price:`float$(); size:`int$(); side:`char$())

The default setup requires the first two columns to be time and sym.
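
The time/sym requirement can be checked against a schema before it is handed to the tickerplant. A minimal sketch, assuming the trade definition above has been loaded into a q session; the helper name oktimesym is hypothetical and not part of tick.q:

```q
/ schema table as defined in the example above
trade:([]time:`timespan$(); sym:`g#`symbol$(); price:`float$(); size:`int$(); side:`char$())

/ hypothetical helper: true if the first two columns of the named table are time and sym
oktimesym:{[t] `time`sym~2#cols t}

oktimesym `trade        / check by table name
attr trade`sym          / attribute carried by the sym column
```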

Real-time vs Batch Mode

The mode is controlled via the -t command line parameter. Batch mode can alleviate CPU use on both the tickerplant and its subscribers by grouping together multiple ticks within the timer interval prior to sending/writing. This comes at the expense of tickerplant memory (needed to hold several ticks) and increased latency between a tick being added to the batch and the batch being sent. There is no ideal setting for all deployments, as it depends on the frequency of the ticks received. Real-time mode processes each tick as soon as it arrives.

A feedhandler can be written to send messages comprising multiple ticks to a tickerplant. In this situation real-time mode will already be processing batches of messages.

End-of-day

The tickerplant watches for a change in the current day. As the day ends, a new tickerplant log is created and the tickerplant informs all subscribed clients via their .u.end function. For example, an RDB may implement .u.end to write all in-memory tables to disk, where they can then be consumed by an HDB.
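
As an illustration of the client side, the following is a minimal sketch of an .u.end that an RDB-like process might define. The directory hdb_demo, the sample rows and the choice of .Q.dpft are assumptions made for this example; they are not taken from tick.q, and a production RDB will normally do more (for example, tell an HDB to reload).

```q
/ hypothetical HDB root directory used for this sketch
hdb:`:hdb_demo

/ in-memory table with a few rows, as an RDB might hold at the end of the day
trade:([]time:0D09:00 0D09:01 0D09:02; sym:`g#`MSFT`IBM`MSFT; price:10.5 20.1 10.6)

/ called by the tickerplant with the date that has just ended
.u.end:{[d]
  t:tables`.;                     / names of the tables in the root namespace
  .Q.dpft[hdb;d;`sym] each t;     / write each table to the date partition, sorted on sym
  @[`.;t;@[;`sym;`g#]0#];         / empty the tables and reapply the grouped attribute
  }

.u.end 2022.02.02                 / simulate the end-of-day call
count trade                       / the in-memory table is now empty
```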

Tickerplant Logs

Log files are created using the format <tickerplant log dir>/<schema filename><date> e.g. tplog/sym2022.02.02. These record all published messages and permit recovery by downstream clients, by allowing them to replay messages they have missed. The directory used should have enough space to record all published data.

As end-of-day causes a file roll, a process should be put in place to remove old log files that are no longer required.

The tickerplant does not replay log files for clients, but exposes log file details to clients so they can access the current log file.
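
A client that has the log file name can replay it locally with -11!. The sketch below does not need a running tickerplant: it writes a small log in the same (`upd;table;data) message form and replays it. The file name tplog_demo and the three-column trade table are assumptions for illustration only; a real client would use the name and message count exposed by the tickerplant (.u.L and .u.i).

```q
/ hypothetical log file for this sketch; a real one is named by .u.L on the tickerplant
logfile:`:tplog_demo
logfile set ();                                      / create an empty log
h:hopen logfile;
h enlist(`upd;`trade;(0D09:00:00;`MSFT;10.5));       / append messages in the form (`upd;table;data)
h enlist(`upd;`trade;(0D09:00:01;`IBM;20.1));
hclose h;

/ client side: an empty table and an upd that inserts each replayed row
trade:([]time:`timespan$(); sym:`symbol$(); price:`float$())
upd:insert

-11!logfile      / replay every message in the log by calling upd
count trade      / number of rows recovered
```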

Publishing to a tickerplant

Feed handlers publish ticks to the tickerplant using IPC. They can be kdb+ processes or clients written in other languages using one of the available client APIs. Each feed sends data to the tickerplant by calling the .u.upd function. The call can include one or many ticks. For example, publishing from kdb+:

q)h:hopen 5010                                                               / connect to TP on port 5010 of same host
q)neg[h](".u.upd";`trade;(.z.n;`APPL;35.65;100i;"B"))                        / async publish single tick to a table called trade
q)neg[h](".u.upd";`trade;(10#.z.n;10?`MSFT`AMZN;10?10000f;10?100i;10?"BS"))  / async publish 10 ticks of some random data to a table called trade
...

Subscribing to a tickerplant

Clients, such as an RDB or RTE, can subscribe by calling .u.sub over IPC.

q)h:hopen 5010                          / connect to TP on port 5010 of same host
q)h".u.sub[`;`]"                        / subscribe to all updates
q)h:hopen 5010                          / connect to TP on port 5010 of same host
q)h".u.sub[`trade;`MSFT.O`IBM.N]"       / subscribe to updates to trade table that contain sym value of MSFT.O or IBM.N only

Clients should implement the function upd to receive updates, and .u.end to perform any end-of-day actions.

Usage

q tick.q SRC DST [-p 5010] [-t 1000] [-o hours]
Parameter  Description                                                                         Default
SRC        schema filename, loaded using the format tick/<SRC>.q                               sym
DST        directory for tickerplant logs; no log is created if omitted                        <none>
-p         listening port for client communications                                            5010
-t         timer period in milliseconds; 0 enables real-time mode, any other value batch mode  real-time mode (with timer of 1000ms)
-o         UTC offset in hours                                                                 local time

Standard kdb+ command line options may also be passed.

Variables

Name  Description
.u.w  Dictionary of registered clients' interest in the data being processed, i.e. tables->(handle;syms)
.u.i  Msg count in log file
.u.j  Total msg count (log file plus those held in buffer) - used when in batch mode
.u.t  Table names
.u.L  TP log filename
.u.l  Handle to TP log file
.u.d  Current date
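
To make the layout of .u.w concrete, the sketch below builds a mock dictionary of the same shape. The handles 7 and 9 and the variable name w are made up for illustration; on a running tickerplant the real value is held in .u.w.

```q
/ mock of the .u.w layout: table name -> list of (handle;syms) pairs
/ an empty symbol in the syms slot means "all syms"
w:`quote`trade!(enlist(7i;`);((7i;`);(9i;`MSFT.O`IBM.N)))

w[`trade][;0]       / handles subscribed to trade
w[`trade][;1]       / sym filters registered by those handles
```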

Functions

Functions are open source & open to customisation.

.u.endofday

Performs end-of-day actions.

.u.endofday[]

Actions performed:

  • inform all subscribed clients (for example, RDB/RTE/etc) that the day is ending by calling .u.end
  • increment current date (.u.d) to next day
  • roll log if using tickerplant log, i.e.
    • close current tickerplant log (.u.l)
    • create a new tickerplant log file, i.e. set .u.l by calling .u.ld with the new date

.u.tick

Performs initialisation actions for the tickerplant.

.u.tick[x;y]

Where

  • x is the name of the schema file without the .q file extension i.e. SRC command line parameter
  • y is the directory used to store tickerplant logs i.e. DST command line parameter

Actions performed:

  • call .u.init[] to initialise table info, .u.t and .u.w
  • check first two columns in all tables of provided schema are called time and sym (throw timesym error if not)
  • apply grouped attribute to the sym column of all tables in provided schema
  • set .u.d to current local date, using .z.D
  • if a tickerplant log directory was provided:
    • set .u.L with a temporary value of `:<log directory>/<schema filename>.......... (will have date added in next step)
    • create/initialise the log file by calling .u.ld, passing .u.d (current local date)
    • set .u.l to log file handle

.u.ld

Initialise or reopen existing log file.

.u.ld[x]

Where x is current date. Returns handle of log file for that date.

Actions performed:

  • using .u.L, change the last 10 chars to the provided date and create the log file if it doesn't yet exist
  • set .u.i and .u.j to count of valid messages currently in log file
  • if the log file is found to be corrupt (the file is larger than its valid messages account for), an error is reported and the process exits
  • open new/existing log file
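
The naming and validation steps above can be sketched in a standalone session as follows. The directory tplog_demo is hypothetical, and the code is a simplified illustration rather than the body of .u.ld; it relies on -11!(-2;file), which returns the number of valid messages in a log without executing them.

```q
/ placeholder name in the form set by .u.tick (hypothetical directory tplog_demo, schema sym)
L:`$":tplog_demo/sym",10#"."

/ replace the last 10 characters with the date
L:`$(-10_string L),string 2022.02.02

/ create an empty log file if it does not exist yet
if[not type key L; L set ()]

/ count the valid messages without executing them
-11!(-2;L)
```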

.u.ts

Given a date, runs end-of-day procedure if a new day has started.

.u.ts[x]

Where x is a date.

Compares date provided with .u.d. If no change, no action taken. If one day difference (i.e. a new day), .u.endofday is called. More than one day results in an error and the kdb+ timer is cancelled.
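
The comparison can be illustrated with a small stand-in. The names d, rolls and ts below are hypothetical globals and a hypothetical function for this sketch; the real function works on .u.d, calls .u.endofday, and also cancels the timer before signalling the error.

```q
d:2022.02.02              / stand-in for .u.d
rolls:0                   / counts how many times the end-of-day branch ran

/ hypothetical restatement of the date check described above
ts:{[x]
  if[d<x;
    if[d<x-1; '"more than one day?"];   / jumped more than one day: signal an error
    rolls::rolls+1;                     / stand-in for calling .u.endofday
    d::d+1];                            / move on to the next date
  }

ts 2022.02.02             / same day: nothing happens
ts 2022.02.03             / next day: end-of-day branch runs once
```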

.u.upd

Update tickerplant with data to process/analyse. External processes call this to input data into the tickerplant.

.u.upd[x;y]

Where

  • x is table name (sym)
  • y is data for table x (list of column data, each element can be an atom or list)

Batch Mode

Add each received message to the batch and record message to the tickerplant log. Batch is published on running timer.

Actions performed:

  • If the first element of y is not a timespan (or list of timespans):
    • inspect .u.d; if a new day has occurred, call .z.ts
    • add a new timespan column populated with the current local time (.z.P). If multiple rows of data, all rows receive the same time.
  • Add data to the current batch (i.e. new data y inserted into table x), which will be published on batch timer .z.ts.
  • If a tickerplant log file was created, write the upd function call & params to the log and increment .u.j, so that an RDB can execute what was originally called during recovery.

Realtime Mode

Publish each received message to all interested clients & record message to tickerplant log.

Actions performed:

  • Checks if end-of-day procedure should be run by calling .u.ts with the current date
  • If the first element of y is not a timespan (or list of timespans), add a new timespan column populated with the current local time (.z.P). If multiple rows of data, all rows receive the same time.
  • Retrieves the column names of table x
  • Publish data to all interested clients, by calling .u.pub with table name x and table generated from y and column names.
  • If tickerplant log file created, write upd function call & params to the log and increment .u.i so that an RDB can execute what was originally called during recovery
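
The time-stamping and table-building steps can be sketched outside the tickerplant as below. The helper names addtime and totable are hypothetical and the code is a simplified illustration of the steps listed above, not the body of .u.upd; it uses the trade schema from the schema file example.

```q
trade:([]time:`timespan$(); sym:`g#`symbol$(); price:`float$(); size:`int$(); side:`char$())

/ hypothetical helper: prepend the current local time unless the data already starts with a timespan
addtime:{[x]
  if[-16=type first first x; :x];           / time already supplied
  a:"n"$.z.P;                               / current local time as a timespan
  $[0>type first x; a,x; (enlist(count first x)#a),x]}

/ hypothetical helper: build the table to publish from the column names of t and the data x
totable:{[t;x] f:cols t; $[0>type first x; enlist f!x; flip f!x]}

one:totable[`trade] addtime (`MSFT;300.1;100i;"B")                     / single tick
many:totable[`trade] addtime (`MSFT`IBM;300.1 140.2;100 200i;"BS")     / two ticks, same time on both rows
```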

.z.ts

Defines the action for the kdb+ timer callback function .z.ts.

The frequency of the timer was set on the command line (-t command-line option or \t system command).

Batch Mode

Runs on system timer at specified interval.

Actions performed:

  • For every table in .u.t
    • publish data to all interested clients, by calling .u.pub with the table name and the rows batched in that table
    • empty the table and reapply the grouped attribute to the sym column
  • Update count of processed messages by setting .u.i to .u.j (the number of batched messages).
  • Checks if end-of-day procedure should be run by calling .u.ts with the current date

Realtime Mode

If no batch timer is specified, the system timer is set to run every 1000 milliseconds to check whether end-of-day has occurred. End-of-day is checked by calling .u.ts, passing the current local date (.z.D).

Pub/Sub functions

tick.q also loads u.q which enables all of its features within the tickerplant.