In this tutorial, we’ll use a time-series model to forecast quarterly house sales.

This tutorial uses the Lightwood integration that requires the mindsdb/mindsdb:lightwood Docker image. Learn more here.

Connect a data source

We start by connecting a demo database to MindsDB using the CREATE DATABASE statement.

CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "3.220.66.106",
"port": "5432",
"database": "demo",
"schema": "demo_data"
};

Let’s preview the data that will be used to train the model.

SELECT *
FROM example_db.house_sales
LIMIT 10;

Deploy and train an ML model

Now, lets specify that we want to forecast the ma column, which is a moving average of the historical median price for house sales. Looking at the data, you can see several entries for the same date, which depend on two factors: how many bedrooms the properties have, and whether properties are “houses” or “units”. This means that we can have up to ten different groupings here. Let’s look at the data for one of them.

SELECT saledate, ma, type, bedrooms
FROM example_db.house_sales
WHERE type='house'
AND bedrooms=3;

We want to generate forecasts to predict the behavior of this and the other series for the next year. MindsDB makes it simple so that we don’t need to repeat the predictor creation process for every group there is. Instead, we can just group for both columns and the predictor will learn from all series and enable all forecasts. We are going to use the CREATE MODEL statement, where we specify what data to train FROM and what we want to PREDICT.

CREATE MODEL mindsdb.house_sales_model
FROM example_db
(SELECT * FROM house_sales)
PREDICT ma
ORDER BY saledate
GROUP BY bedrooms, type
-- as the data is quarterly, we will look back two years to forecast the next one year
WINDOW 8
HORIZON 4;

You can check the status of the model as below:

DESCRIBE house_sales_model;

Make predictions

Once the model’s status is complete, you can query it as a table to get forecasts for a given period of time. Usually, you’ll want to know what happens right after the latest training data point that was fed, for which we have a special bit of syntax, the LATEST keyword.

SELECT m.saledate as date, m.ma as forecast
FROM mindsdb.house_sales_model as m
JOIN example_db.house_sales as t
WHERE t.saledate > LATEST
AND t.type = 'house'
AND t.bedrooms=2
LIMIT 4;

Now, try changing the value of type and bedrooms columns and check how the forecast varies. This is because MindsDB recognizes each grouping as being its own different time series.

Automate continuous improvement of the model

Now, we can take this even further. MindsDB includes powerful automation features called Jobs which allow us to automate queries in MindsDB. This is very handy for production AI/ML systems which all require automation logic to help them to work.

We use the CREATE JOB statement to create a Job.

Now, let’s use a Job to retrain the model every two days, just like we might in production. You can retrain the model to improve predictions every time when either new data or new MindsDB version is available. And, if you want to retrain your model considering only new data, then go for finetuning it.

CREATE JOB retrain_model_and_save_predictions (

   RETRAIN mindsdb.house_sales_model
   FROM example_db
      (SELECT * FROM house_sales)
)
EVERY 2 days
IF (SELECT * FROM example_db.house_sales
    WHERE created_at > LAST);

This job will execute every 2 days only if there is new data available in the house_sales table. Learn more about the LAST keyword here.

And there you have it! You created an end-to-end automated production ML system in a few short minutes.