COVID-19 STUDY OF THE STATE OF MAHARASHTRA USING DATA SCIENCE as on 08-05-2020
We run the code
The data has 61 rows and 5 columns, updated to 08/05/2020.
We call this subset dfMaharashtra and run the following code.
The linear regression of Confirmed vs Cured is obtained by running the following code:
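import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

# Features are every column except 'Confirmed'; the target is 'Confirmed'
X = dfMaharashtra.drop('Confirmed', axis=1)
y = dfMaharashtra[['Confirmed']]
seed = 10
test_data_size = 0.3
# Hold out 30% of the rows for testing, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_data_size, random_state=seed)
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)
fig, ax = plt.subplots(figsize=(12, 6))
# Scatter of the training rows with the fitted regression line (no confidence band)
sns.regplot(x='Confirmed', y='Cured', ci=None, data=train_data, ax=ax, color='k', scatter_kws={"s": 20, "color": "royalblue", "alpha": 1})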
The regression line is as follows.
From the graph we can see that the blue dots are well below the regression line, indicating fewer Cured cases relative to Confirmed cases.
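To make the idea of points lying below a regression line concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function names `fit_line` and `residuals` are our own illustrative helpers, and the numbers are made up for the example; they are not the Maharashtra figures. A negative residual means the observed point sits below the fitted line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values; negative means the point is below the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up illustrative numbers, NOT the Maharashtra data.
confirmed = [10, 50, 200, 1000, 5000]
cured = [0, 2, 20, 100, 900]

slope, intercept = fit_line(confirmed, cured)
res = residuals(confirmed, cured, slope, intercept)
below = sum(1 for r in res if r < 0)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, points below line: {below}")
```

Counting negative residuals this way gives a numeric check on what the eye reads from the scatter plot.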
The log plot for the data is obtained by running the following code:
fig, ax = plt.subplots(figsize=(12, 6))
y = np.log(train_data['Confirmed'])
sns.regplot(x='Cured', y=y, ci=95, data=train_data, ax=ax, color='k', scatter_kws={"s": 10,"color": "royalblue", "alpha":1})
ax.set_ylabel('log of Confirmed', fontsize=15,fontname='DejaVu Sans')
ax.set_xlabel("Cured",fontsize=15, fontname='DejaVu Sans')
ax.set_xlim(left=None, right=None)
ax.set_ylim(bottom=None, top=None)
ax.tick_params(axis='both', which='major', labelsize=12)
fig.tight_layout()
The plot shows the log curve running parallel to the Cured axis, indicating that the number of Cured is stagnant or lower than the number of Confirmed cases.
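As a small aside on why a log scale is useful here, the sketch below shows that taking the natural log of a geometrically growing series turns it into evenly spaced values, so steady multiplicative growth appears as a straight line. The counts are invented for illustration (a series that doubles at each step) and are not the real data. Note that the log is only defined for positive counts.

```python
import math

# Made-up, geometrically growing counts (doubling each step); NOT the real data.
confirmed = [1, 2, 4, 8, 16, 32]

# Natural log of each count, the same transform applied to 'Confirmed' in the plot
log_confirmed = [math.log(c) for c in confirmed]

# For geometric growth, consecutive log values differ by a constant (here ln 2)
steps = [b - a for a, b in zip(log_confirmed, log_confirmed[1:])]
print([round(s, 4) for s in steps])
```

If the real series departs from this pattern, the log plot bends or flattens, which is what makes such departures easy to spot.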
The linear regression of Confirmed vs Deaths is obtained by running the following code:
X = dfMaharashtra.drop('Confirmed',axis = 1)
y = dfMaharashtra[['Confirmed']]
seed = 10
test_data_size = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = test_data_size, random_state = seed)
train_data = pd.concat([X_train, y_train], axis = 1)
test_data = pd.concat([X_test, y_test], axis = 1)
fig, ax = plt.subplots(figsize=(12, 6))
sns.regplot(x='Confirmed', y='Deaths', ci=None, data=train_data, ax=ax, color='k', scatter_kws={"s": 20,"color":"royalblue", "alpha":1})
We get the following graph
From the graph, we can see that the blue data points initially fall below the line, then above it, and finally, as the figures keep increasing, below it again, indicating an irregular Confirmed vs Deaths correlation.
The log graph for the data is obtained by running the following code:
fig, ax = plt.subplots(figsize=(12, 6))
y = np.log(train_data['Confirmed'])
sns.regplot(x='Deaths', y=y, ci=95, data=train_data, ax=ax, color='k', scatter_kws={"s": 10,"color": "royalblue", "alpha":1})
ax.set_ylabel('log of Confirmed', fontsize=15,fontname='DejaVu Sans')
ax.set_xlabel("Deaths",fontsize=15, fontname='DejaVu Sans')
ax.set_xlim(left=None, right=None)
ax.set_ylim(bottom=None, top=None)
ax.tick_params(axis='both', which='major', labelsize=12)
fig.tight_layout()
The blue dots do not fall along the line, which indicates high variation between Confirmed and Deaths. The number of deaths is high.
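The heatmap showing the correlation matrix, using Pearson's method, for the State of Maharashtra is obtained by using the following code:
# plot_corr is provided by statsmodels
from statsmodels.graphics.correlation import plot_corr
corrMatrix = train_data.corr(method = 'pearson')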
xnames=list(train_data.columns)
ynames=list(train_data.columns)
plot_corr(corrMatrix, xnames=xnames, ynames=ynames,title=None,normcolor=False, cmap='RdYlBu_r')
Finally, the correlation coefficients between the variables used in our data frame are obtained with:
train_data.corr(method = 'pearson')
There is a high degree of correlation between the variables: 0.993 for (Confirmed, Cured) and 0.99 for (Confirmed, Deaths).
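For readers who want to see what the Pearson coefficient measures, here is a minimal sketch that computes it directly from its definition (covariance divided by the product of the standard deviations). The helper name `pearson_r` is our own, and the sample numbers are invented for illustration, so the printed value is not one of the coefficients reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT the Maharashtra data.
confirmed = [10, 50, 200, 1000, 5000]
deaths = [1, 3, 12, 50, 240]

r = pearson_r(confirmed, deaths)
print(f"r = {r:.3f}")
```

A value near 1 means the two series rise together almost linearly. Cumulative counts that only ever increase tend to produce values close to 1, so a high coefficient on its own says little about cause.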
These strong correlations support taking the study further with more advanced machine learning models.