I was working on a chart system for a web application with thousands of data points for each chart. Clearly downloading thousands of points from the server is a bad thing, and rendering them all would be too heavy on the browser. I decided that I would simplify my charts by downloading only a subset of the available points, including only every nth point. In other words, given 50,000 points, I wanted to show only 500 points, so I would need to fetch every 100th point.
It wasn’t immediately obvious how to implement this as a MySQL query, which is what this post is all about.
My requirements were these: I have a REST endpoint that accepts a start date, an end date, and a maximum number of points to retrieve for a measurement device identified by a
device_id. I decided to implement the require MySQL query as a stored procedure; my back-end uses Lumen/Eloquent and I wanted to keep the code there simple and hide MySQL details away.
Here’s how to create a stored procedure in MySQL:
This procedure accepts the arguments that the back-end will provide to it, i.e. start date, end date, etc. It can be called from SQL like so:
and using Lumen/Laravel’s
DB facade, you can do:
Now in the stored procedure, we will first determine the total number of records available within the given date constraints (and for the specified device, in this example):
With that total, we can now determine the number of rows to skip after reading each row. For 50,000 available rows and a requirement of 500 rows, we could do
total/maximum to get 100. We must be careful that the skip value must be a whole number, so we can do
FLOOR(total / maximum) instead. Doing this may yield more than the maximum number of rows requested, but never more than twice as many. Also, if the number of available rows is actually smaller than the requested number, we must be careful to set the skip value to 1. We add the following code to the stored procedure:
Now for selecting the required rows. The trick here is create a subquery with a running row index, which is increased as each row is read. An outer query uses this subquery as input, and compares the running counter with skipnum by doing
MOD(rownum, skipnum) = 0. Whenever the row number is divisible by the skipnum, we include the row in the result.
Note that we only consider rows between
enddate, and apply a date order. Also, all dates are returned as UNIX timestamps, which will save space in the JSON we’ll eventually send back to the browser. Plus, field names are short (
v), for the same purpose.
That’s it. With details hidden away in a stored procedure, PHP or Lumen/Laravel code and easily query a set of data points with reduction applied to a maximum number.
Here is the full query: