Time-Series Prediction Using Long Short-Term Memory (LSTM) Networks: Forecasting COVID-19 Confirmed Cases in India

The code in the post can be found at https://github.com/shivam04/covid-19-india-forecasting

This post forecasts COVID-19 confirmed cases in India from data recorded on previous dates.

The language I am using for this prediction is Python.

Part 1: Importing libraries used for forecasting results.

import pandas
import numpy
from keras.preprocessing.sequence import TimeseriesGenerator
import plotly.graph_objects as go
import datetime
from keras.models import Sequential
from keras.layers import LSTM, Dense

Part 2: Load data and preprocessing.

numpy.random.seed(7)
dataframe = pandas.read_csv('covid_19_india.csv')
dataframe.head()

Next, remove unnecessary fields. We need only the date and Confirmed cases.

dataframe = dataframe.drop(['State/UnionTerritory', 'ConfirmedIndianNational', 'ConfirmedForeignNational', 'Cured', 'Deaths', 'Time', 'Sno'], axis=1)
dataframe.head()

Parse Date as a datetime, then write it back as a YYYY-MM-DD string so that the dates sort chronologically.

dataframe['Date'] = pandas.to_datetime(dataframe['Date'], format='%d/%m/%y').dt.strftime('%Y-%m-%d')
dataframe

Group the data by Date and sum the confirmed cases recorded across all states of India on each date.

gdf = dataframe.groupby('Date')
data = []
date = []
cases = []
for name, df in gdf:
    date.append(name)
    # Total confirmed cases across all states on this date
    s = sum(df['Confirmed'].astype(float))
    cases.append(s)
    data.append([s])
data

Since the data are cumulative, we need to recover the daily counts.
Subtract the previous day's total from the current day's total to get the new cases for each day.

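# dataset is the model input: one row per date, shape (num_days, 1)
dataset = numpy.array(data)
# Walk backwards so cases[i-1] is still cumulative when it is subtracted
for i in range(len(cases)-1, 0, -1):
    cases[i] = cases[i] - cases[i-1]
    dataset[i][0] = cases[i]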
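
The cumulative-to-daily conversion can be checked on a small hand-made series. The following is a minimal sketch in plain Python. The helper name to_daily and the sample numbers are made up for illustration and are not part of the original notebook.

```python
def to_daily(cumulative):
    """Convert running totals into per-day counts.

    The first day keeps its value, because there is no earlier total to subtract.
    """
    daily = list(cumulative)
    # Iterate from the end so that daily[i - 1] is still a cumulative value
    # at the moment it is subtracted from daily[i].
    for i in range(len(daily) - 1, 0, -1):
        daily[i] = daily[i] - daily[i - 1]
    return daily


# Hypothetical running totals for five consecutive days
totals = [3, 5, 10, 10, 18]
print(to_daily(totals))
```

Summing the daily counts should give back the last cumulative total, which is a quick sanity check to run on the real data as well.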

A graph to show confirmed cases recorded every day.

trace = go.Scatter(
    x = date,
    y = cases,
    mode = 'lines',
    name = 'Data'
)
layout = go.Layout(
    title = "Covid 19 India",
    xaxis = {'title' : "Date"},
    yaxis = {'title' : "Confirmed Cases"}
)
fig = go.Figure(data=[trace], layout=layout)
fig.show()

Split data 80% to train the model and 20% to test the model.

split_percent = 0.80
split = int(split_percent*len(dataset))
dataset_train = dataset[:split]
dataset_test = dataset[split:]

date_train = date[:split]
date_test = date[split:]

print(len(dataset_train))
print(len(dataset_test))

In time-series prediction, we use data from previous time steps to predict future values.

  • Coming back to the format: for a given day t, the features for x(t) are the values x(t-1), x(t-2), …, x(t-n), where n is the look-back.

Here we set the look-back to n = 3. So if our data looks like this,
[1,2,3,4,5,6]
the required data format (n=3) would be this:

  • [1,2,3] -> [4]
  • [2,3,4] -> [5]
  • [3,4,5] -> [6]

look_back = 3
train_generator = TimeseriesGenerator(dataset_train, dataset_train, length=look_back, batch_size=1)
test_generator = TimeseriesGenerator(dataset_test, dataset_test, length=look_back, batch_size=1)
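
To make the windowing concrete, here is a plain-Python sketch of the input/target pairs that a look-back of 3 produces for the example series above. It only illustrates the data format: make_windows is a hypothetical helper, not a Keras function, and TimeseriesGenerator additionally takes care of batching.

```python
def make_windows(series, look_back):
    """Pair each run of look_back consecutive values with the value that follows it."""
    samples = []
    for start in range(len(series) - look_back):
        features = series[start:start + look_back]
        target = series[start + look_back]
        samples.append((features, target))
    return samples


for features, target in make_windows([1, 2, 3, 4, 5, 6], look_back=3):
    print(features, '->', target)
```

A series of length N yields N - look_back pairs, so a very short test split leaves only a handful of samples.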

Creating Model

  • We use LSTM to train our model.
  • We use three LSTM layers and one dense layer (i.e., a simple fully connected neural network layer).
  • This model is used to predict future confirmed cases.

model = Sequential()
model.add(
    LSTM(10,
        activation='relu',
        return_sequences=True,
        input_shape=(look_back,1))
)
model.add(LSTM(7, return_sequences=True, activation='relu'))
model.add(LSTM(3, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(optimizer='adam', loss='mse')
model.summary()

Training Model

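num_epochs = 200
# fit_generator is deprecated in newer Keras releases, where model.fit accepts the generator directly
history = model.fit_generator(train_generator, epochs=num_epochs, verbose=1)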

Predicting Beyond the Dataset

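# Flatten dataset from shape (num_days, 1) to (num_days,)
dataset = dataset.reshape((-1))
def predict(num_prediction, model):
    # Seed the forecast with the last look_back observed values
    prediction_list = dataset[-look_back:]

    for _ in range(num_prediction):
        # Feed the most recent look_back values, including earlier predictions
        x = prediction_list[-look_back:]
        x = x.reshape((1, look_back, 1))
        out = model.predict(x)[0][0]
        prediction_list = numpy.append(prediction_list, out)
    # Keep the last observed value followed by the num_prediction forecasts
    prediction_list = prediction_list[look_back-1:]

    return prediction_list

def predict_dates(num_prediction):
    # Start at the last observed date so the dates line up with prediction_list
    last_date = date[-1]
    prediction_dates = pandas.date_range(last_date, periods=num_prediction+1).tolist()
    return prediction_dates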
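
The idea behind predict is recursive forecasting: each predicted value is appended to the input window and used to predict the next one. The sketch below shows that loop in plain Python with a stand-in for the trained model. Both recursive_forecast and linear_extrapolation are hypothetical names used only for illustration, and the stand-in simply continues the trend of the last two values.

```python
def recursive_forecast(history, look_back, steps, predict_next):
    """Forecast `steps` values by feeding each prediction back in as input."""
    values = list(history[-look_back:])
    for _ in range(steps):
        window = values[-look_back:]
        values.append(predict_next(window))
    # Drop the seed values and return only the new predictions
    return values[look_back:]


def linear_extrapolation(window):
    # Stand-in for model.predict: continue the trend of the last two values
    return window[-1] + (window[-1] - window[-2])


print(recursive_forecast([1, 2, 3, 4], look_back=3, steps=3,
                         predict_next=linear_extrapolation))
```

Unlike predict, this sketch returns only the new values and does not keep the last observed value at the front. Because every step builds on earlier predictions, errors can accumulate as the forecast horizon grows.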

Predict future data.

num_prediction = 10
forecast = predict(num_prediction, model).astype(int)
forecast_dates = predict_dates(num_prediction)
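
The forecast array has num_prediction + 1 entries because it starts with the last observed value, so the date axis needs the same number of entries. Below is a standard-library sketch of that date arithmetic, assuming the dates are YYYY-MM-DD strings. The function name next_dates and the sample date are illustrative only.

```python
import datetime


def next_dates(last_date, num_prediction):
    """Return the last observed date followed by the next num_prediction days."""
    start = datetime.date.fromisoformat(last_date)
    return [
        (start + datetime.timedelta(days=offset)).isoformat()
        for offset in range(num_prediction + 1)
    ]


# Hypothetical last observed date
print(next_dates('2020-04-28', 3))
```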

A graph to show the confirmed cases forecast for the next 10 days.

given_trace = go.Scatter(
    x = date,
    y = cases,
    mode = 'lines',
    name = 'Data'
)
forcast_trace = go.Scatter(
    x = forecast_dates,
    y = forecast,
    mode = 'lines',
    name = 'Forecast'
)
layout = go.Layout(
    title = "Covid 19 India Forecast Information",
    xaxis = {'title' : "Date"},
    yaxis = {'title' : "Confirmed Cases"}
)
fig = go.Figure(data=[given_trace, forcast_trace], layout=layout)
fig.show()

Predictions can be wrong since we have limited information. This is just for learning purposes.
