Query Examples

Design patterns for Norikra queries

TODO: more examples and patterns

Counting

Count impressions and unique users

Just use DISTINCT, the same as in an RDBMS.

SELECT COUNT(DISTINCT cookie.id) AS uu,
       COUNT(*) AS imps
FROM impressions.win:time_batch(1 hours)
WHERE cookie.valid

Count logs of each pattern

We want to count access logs for a specified path pattern per 10 minutes. Totals, successes and internal server errors are required.

SELECT
  path,
  COUNT(1, status=200) AS success_count,
  COUNT(1, status=500) AS servererror_count,
  COUNT(*) AS count
FROM AccessLog.win:time_batch(10 min, 0L)
WHERE service='myservice' AND path LIKE '/api/%'
GROUP BY path

Esper's COUNT() can take an optional second argument as a filter expression, which restricts the rows to be counted. This optional second argument also exists for AVG(), SUM(), FMAX() and FMIN().

NOTE: When using COUNT() with a second argument, DO NOT USE * as the first argument. COUNT(*, filter_expression) is parsed as COUNT(filter_expression), and the result counts records for which 'filter_expression' IS NOT NULL. To avoid this, specify a constant such as '1' as the first argument.
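
The same filter form also works for the other aggregations listed above. A minimal sketch reusing the AccessLog target from the previous example; the duration and bytes fields are assumptions for illustration:

SELECT
  path,
  AVG(duration, status=200) AS success_avg_duration,
  SUM(bytes, status=200) AS success_bytes,
  COUNT(*) AS count
FROM AccessLog.win:time_batch(10 min, 0L)
WHERE service='myservice' AND path LIKE '/api/%'
GROUP BY path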

Count moving average of active unique users

Simply, we can use COUNT(DISTINCT user_id) with win:time(range) to count unique users. But win:time() emits an output record for every input record, which is far too many outputs for summarized data or graphs.

So, use LOOPBACK() and a second query with win:time_batch() to summarize the moving-average output, like below:

-- 1st query
-- with query group 'LOOPBACK(active_users_5min)'
SELECT COUNT(DISTINCT user_id) AS active_users
FROM activity.win:time(5 min)
-- 2nd query
SELECT MAX(active_users) AS active_users
FROM active_users_5min.win:time_batch(10 sec)

With these queries, we get one output of active unique users per 10 seconds. This event rate is suitable for many purposes.
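
If an averaged value per batch fits your graph better than the peak, the second query can use AVG() instead of MAX(). A sketch of that variation:

-- 2nd query (averaged variant)
SELECT AVG(active_users) AS active_users
FROM active_users_5min.win:time_batch(10 sec)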

(From this entry: beatsync.net)

Alerting

Finding attackers

A query example for finding too many requests from a single remote_host, to detect DoS attackers:

SELECT
  remote_host,
  COUNT(*) AS requests
FROM accesslog.win:time_batch(1 min)
GROUP BY remote_host
HAVING COUNT(*) >= 1000

Or, to find brute-force attackers:

SELECT
  remote_host,
  COUNT(*) AS failures
FROM authlog.win:time_batch(1 min)
WHERE result = 'failed'
GROUP BY remote_host
HAVING COUNT(*) >= 60

These queries are very simple, and we can adjust thresholds or other conditions at any time.
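
For example, the brute-force query can be tuned just by editing the window and threshold, or by narrowing the condition. A hedged sketch with arbitrary numbers; the path field and '/login' value are assumptions for illustration:

SELECT
  remote_host,
  COUNT(*) AS failures
FROM authlog.win:time_batch(5 min)
WHERE result = 'failed' AND path = '/login'
GROUP BY remote_host
HAVING COUNT(*) >= 100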

(From this entry: Developers.IO)

Summarize application logs

To report application error logs to programmers, without flooding the notification center with one kind of message:

SELECT
  level,
  file, line,
  LAST(host) AS host,
  LAST(message) AS message,
  COUNT(*) AS count
FROM application_logs.win:time_batch(5 min)
WHERE level IN ('error', 'critical')
GROUP BY level, file, line

Logs from the same point in the code (same file, line and log level) are expected to be the same kind of log. Programmers can judge severity from the level and count, and get a first investigation step at a glance from the sample log message provided by LAST(message) in the output events.

(From this entry: kawamuray's blog)