TODO: more examples and patterns
Just do with
DISTINCT as same as RDBMS.
SELECT COUNT(DISTINCT cookie.id) AS uu, COUNT(*) AS imps FROM impressions.win:time_batch(1 hours) WHERE cookie.valid
We want to count access logs of specified path per 10 minutes. Total, successes and internal server errors are required.
SELECT path, COUNT(1, status=200) AS success_count, COUNT(1, status=500) AS servererror_count, COUNT(*) AS count FROM AccessLog.win:time_batch(10 min, 0L) WHERE service='myservice' AND path LIKE '/api/%' GROUP BY path
COUNT() can receive optional 2nd argument as a filter expression, as filter for countings. This 2nd optional argument are also exists for
NOTE: When using
COUNT() with second argument, DO NOT USE
* for first argument.
COUNT(*, filter_expression) is parsed as
COUNT(filter_expression), and counting result is a number of records which 'filter_expression' IS NOT NULL. To avoid this situation, specify constant like '1'.
Simply, we can use
COUNT(DISTINCT user_id) with
win:time(range) to count unique users. But
win:time() puts output records per all input records. These are too many to get summarized data/graph.
LOOPBACK() and 2nd query with
win:time_batch() to summarize output data of moving average, like below:
-- 1st query -- with query group 'LOOPBACK(active_users_5min)' SELECT COUNT(DISTINCT user_id) AS active_users FROM activity.win:time(5 min)
-- 2nd query SELECT MAX(active_users) AS active_users FROM active_users_5min.win:time_batch(10 sec)
By these queries, we can get 1 output of active unique users per 10 seconds. This event rate is good for many purposes.
(From this entry: beatsync.net)
Query example for finding too many requests from single remote_host, to find DoS attacker:
SELECT remote_host, COUNT(*) as requests FROM accesslog.win:time_batch(1 min) GROUP BY remote_host HAVING COUNT(*) >= 1000
Or to find brute force attacker:
SELECT remote_host, COUNT(*) AS failures FROM authlog.win:time_batch(1 min) WHERE result = 'failed' GROUP BY remote_host HAVING COUNT(*) >= 60
These queries are very simple, and we can fix thresholds or other conditions anytime.
(From this entry: Developers.IO)
To tell application error logs to programmers, but not to flood notification center by one kind of messages:
SELECT level, file, line, LAST(host) AS host, LAST(message) AS message, COUNT(*) AS count FROM application_logs.win:time_batch(5 min) WHERE level IN ('error', 'critical') GROUP BY level, file, line
Logs from a same point of code (same file, line and log-level) are expected to be a same log. Programmers can understand its severity by level and count, and get a first investigation step at a glance by sample of log message by
LAST(message) in output events.
(From this entry: kawamuray's blog)