I was working on a chart system for a web application with thousands of data points for each chart. Clearly downloading thousands of points from the server is a bad thing, and rendering them all would be too heavy on the browser. I decided that I would simplify my charts by downloading only a subset of the available points, including only every nth point. In other words, given 50,000 points, I wanted to show only 500 points, so I would need to fetch every 100th point.

It wasn’t immediately obvious how to implement this as a MySQL query, which is what this post is all about.

My requirements were these: I have a REST endpoint that accepts a start date, an end date, and a maximum number of points to retrieve for a measurement device identified by a device_id. I decided to implement the require MySQL query as a stored procedure; my back-end uses Lumen/Eloquent and I wanted to keep the code there simple and hide MySQL details away.

Here’s how to create a stored procedure in MySQL:

CREATE PROCEDURE `GetReadings`(device_id INT, maximum INT, startdate DATE, enddate DATE)
    READS SQL DATA
BEGIN
  ...
END

This procedure accepts the arguments that the back-end will provide to it, i.e. start date, end date, etc. It can be called from SQL like so:

CALL GetReadings(1, 500, '2019-01-01', '2019-12-31');

and using Lumen/Laravel’s DB facade, you can do:

$readings = DB::select("CALL GetReadings($id, $maximum, '$startdate', '$enddate')");

Now in the stored procedure, we will first determine the total number of records available within the given date constraints (and for the specified device, in this example):

CREATE PROCEDURE `GetReadings`(device_id INT, maximum INT, startdate DATE, enddate DATE)
    READS SQL DATA
BEGIN
  DECLARE total INT;
  DECLARE skipnum INT; -- we'll get to this
  
  -- Determine number of readings present between dates (inclusive):
  SELECT COUNT(*) 
    INTO total
    FROM readings 
    WHERE readings.device_id=device_id 
      AND DATE(readings.datetime) >= DATE(startdate)
      AND DATE(readings.datetime) <= DATE(enddate);

With that total, we can now determine the number of rows to skip after reading each row. For 50,000 available rows and a requirement of 500 rows, we could do total/maximum to get 100. We must be careful that the skip value must be a whole number, so we can do FLOOR(total / maximum) instead. Doing this may yield more than the maximum number of rows requested, but never more than twice as many. Also, if the number of available rows is actually smaller than the requested number, we must be careful to set the skip value to 1. We add the following code to the stored procedure:

  -- To return every nth row, determine which n to use. n must be a
  -- whole number. By dividing total/maximum, we always return at least
  -- maximum, up to 2 * maximum.
  IF (total < maximum) THEN
    SET skipnum = 1;
  ELSE 
    SET skipnum = FLOOR(total / maximum);
  END IF;

Now for selecting the required rows. The trick here is create a subquery with a running row index, which is increased as each row is read. An outer query uses this subquery as input, and compares the running counter with skipnum by doing MOD(rownum, skipnum) = 0. Whenever the row number is divisible by the skipnum, we include the row in the result.

Note that we only consider rows between startdate and enddate, and apply a date order. Also, all dates are returned as UNIX timestamps, which will save space in the JSON we’ll eventually send back to the browser. Plus, field names are short (d and v), for the same purpose.

  -- Fetch every nth reading.
  SET @therow := 0;
  SELECT UNIX_TIMESTAMP(datetime) AS d, value AS v FROM (
	  SELECT @therow := @therow + 1 AS rownum, datetime, value FROM readings 
	  WHERE readings.device_id=device_id 
        AND DATE(readings.datetime) >= DATE(startdate)
        AND DATE(readings.datetime) <= DATE(enddate)
	) AS counted_readings
    WHERE MOD(rownum, skipnum) = 0
    ORDER BY datetime ASC;

That’s it. With details hidden away in a stored procedure, PHP or Lumen/Laravel code and easily query a set of data points with reduction applied to a maximum number.

Here is the full query:

CREATE PROCEDURE `GetReadings`(device_id INT, maximum INT, startdate DATE, enddate DATE)
    READS SQL DATA
BEGIN
  DECLARE total INT;
  DECLARE skipnum INT;
  
  -- Determine number of readings present between dates (inclusive):
  SELECT COUNT(*) 
    INTO total
    FROM readings 
    WHERE readings.device_id=device_id 
      AND DATE(readings.datetime) >= DATE(startdate)
      AND DATE(readings.datetime) <= DATE(enddate);
    
  -- To return every nth row, determine which n to use. n must be a
  -- whole number. By dividing total/maximum, we always return at least
  -- maximum, up to 2 * maximum.
  IF (total < maximum) THEN
    SET skipnum = 1;
  ELSE 
    SET skipnum = FLOOR(total / maximum);
  END IF;

  -- Fetch every nth reading.
  SET @therow := 0;
  SELECT UNIX_TIMESTAMP(datetime) AS d, value AS v FROM (
	  SELECT @therow := @therow + 1 AS rownum, datetime, value FROM readings 
	  WHERE readings.device_id=device_id 
		AND DATE(readings.datetime) >= DATE(startdate)
        AND DATE(readings.datetime) <= DATE(enddate)
	) AS counted_readings
    WHERE MOD(rownum, skipnum) = 0
    ORDER BY datetime ASC;
END