COVID 19 STUDY OF STATE OF MAHARASHTRA USING DATA SCIENCE as on 08-05-2020






We now study the COVID-19 data for the State of Maharashtra, where the crisis was worsening as of 08-05-2020.
We run the following code to select the rows for Maharashtra:
is_subset1_Maharashtra = subset1.STUT == "Maharashtra"
subset1[is_subset1_Maharashtra]
This loads the dataframe for the State of Maharashtra. It has 61 rows and 5 columns and is updated to 08/05/2020.
We call this subset dfMaharashtra:
dfMaharashtra = subset1[is_subset1_Maharashtra]
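The selection step above is a boolean mask. As a minimal sketch of how it works, here is the same pattern on a tiny made-up table. Only the STUT column name and the dfMaharashtra name come from the text; the rows and the other column names are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the source table (made-up rows, assumed column names).
subset1 = pd.DataFrame({
    "Date": ["06/05/20", "06/05/20", "07/05/20", "07/05/20"],
    "STUT": ["Maharashtra", "Kerala", "Maharashtra", "Kerala"],
    "Cured": [10, 5, 12, 6],
    "Deaths": [2, 0, 3, 0],
    "Confirmed": [100, 20, 130, 21],
})

# Boolean mask: True where the row belongs to Maharashtra
is_maharashtra = subset1["STUT"] == "Maharashtra"

# Keep only those rows; copy() makes the subset independent of subset1
dfMaharashtra = subset1[is_maharashtra].copy()

print(dfMaharashtra.shape)
```

On the real table, the same shape check is what gives the row and column counts quoted above.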
The linear regression of Confirmed Vs Cured is obtained by running the following code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows; Confirmed is the target column
X = dfMaharashtra.drop('Confirmed', axis=1)
y = dfMaharashtra[['Confirmed']]
seed = 10
test_data_size = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_data_size, random_state=seed)
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)
# Scatter of the training rows with a fitted regression line
fig, ax = plt.subplots(figsize=(12, 6))
sns.regplot(x='Confirmed', y='Cured', ci=None, data=train_data, ax=ax, color='k', scatter_kws={"s": 20, "color": "royalblue", "alpha": 1})
The regression line is as follows
From the graph we can see that the blue dots lie well below the regression line, indicating fewer Cured cases relative to Confirmed cases.
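A reading like this can also be checked numerically rather than by eye, by fitting a straight line and counting the points that fall below it. The sketch below uses made-up numbers, not the Maharashtra figures, and uses numpy.polyfit as a stand-in for the least-squares line drawn on the plot.

```python
import numpy as np

# Made-up illustrative values, NOT the real Maharashtra data
confirmed = np.array([10.0, 50.0, 200.0, 800.0, 3000.0, 9000.0])
cured = np.array([0.0, 5.0, 20.0, 90.0, 400.0, 1600.0])

# Least-squares straight line: cured ~ slope * confirmed + intercept
slope, intercept = np.polyfit(confirmed, cured, deg=1)

# Residual = observed - fitted; a negative residual means the point lies below the line
residuals = cured - (slope * confirmed + intercept)
n_below = int((residuals < 0).sum())

print(f"slope={slope:.3f}, points below line: {n_below} of {len(cured)}")
```

Because a least-squares line with an intercept always leaves residuals that sum to zero, some points must sit above the line whenever others sit below it. The count and the size of the residuals are what carry the information.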
The log plot for the data is obtained by running the following code:
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))
# Natural log of Confirmed on the y-axis, Cured on the x-axis
y = np.log(train_data['Confirmed'])
sns.regplot(x='Cured', y=y, ci=95, data=train_data, ax=ax, color='k', scatter_kws={"s": 10, "color": "royalblue", "alpha": 1})
ax.set_ylabel('log of Confirmed', fontsize=15, fontname='DejaVu Sans')
ax.set_xlabel("Cured", fontsize=15, fontname='DejaVu Sans')
ax.set_xlim(left=None, right=None)
ax.set_ylim(bottom=None, top=None)
ax.tick_params(axis='both', which='major', labelsize=12)
fig.tight_layout()

The plot shows the log curve running parallel to the Cured axis, which indicates that the number of Cured is stagnant, or low relative to the number of Confirmed cases.
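As background to the log plot, taking the logarithm turns growth at a constant percentage rate into a straight line, which is why log scales are common for case counts. The sketch below illustrates this with a synthetic series (an assumed 10% growth per step, not the real data) and recovers that rate from the slope of the log values.

```python
import numpy as np

# Synthetic series, NOT real data: 100 cases growing 10% per day for 30 days
days = np.arange(30)
confirmed = 100.0 * 1.10 ** days

# On the raw scale the series curves upward; on the log scale it is a straight line
log_confirmed = np.log(confirmed)

# Fit a line to the log values; the slope is the log of the daily growth factor
slope, intercept = np.polyfit(days, log_confirmed, deg=1)
daily_growth = np.exp(slope) - 1.0

print(f"estimated daily growth: {daily_growth:.1%}")
```

When the points on a log plot bend away from a straight line, the growth is no longer a constant percentage of the variable on the other axis.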
The linear regression of Confirmed Vs Deaths is obtained by running the following code:
# Same 70/30 split as before (same seed, so the same rows are selected)
X = dfMaharashtra.drop('Confirmed', axis=1)
y = dfMaharashtra[['Confirmed']]
seed = 10
test_data_size = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_data_size, random_state=seed)
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)
fig, ax = plt.subplots(figsize=(12, 6))
sns.regplot(x='Confirmed', y='Deaths', ci=None, data=train_data, ax=ax, color='k', scatter_kws={"s": 20, "color": "royalblue", "alpha": 1})
We get the following graph 

From the graph, we can see that the blue points initially fall below the line, then above it, and finally, as the figures keep increasing, below it again, indicating an irregular Confirmed Vs Deaths relationship.

The log graph for the data is obtained by running the following code:
fig, ax = plt.subplots(figsize=(12, 6))
# Natural log of Confirmed on the y-axis, Deaths on the x-axis
y = np.log(train_data['Confirmed'])
sns.regplot(x='Deaths', y=y, ci=95, data=train_data, ax=ax, color='k', scatter_kws={"s": 10, "color": "royalblue", "alpha": 1})
ax.set_ylabel('log of Confirmed', fontsize=15, fontname='DejaVu Sans')
ax.set_xlabel("Deaths", fontsize=15, fontname='DejaVu Sans')
ax.set_xlim(left=None, right=None)
ax.set_ylim(bottom=None, top=None)
ax.tick_params(axis='both', which='major', labelsize=12)
fig.tight_layout()

The blue dots do not fall along the line, which indicates high variation between Confirmed and Deaths. The number of deaths is high.

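The heatmap of the Pearson correlation matrix for the State of Maharashtra is obtained with the following code:
# plot_corr is taken here to be the statsmodels helper, whose signature matches the call below
from statsmodels.graphics.correlation import plot_corr

# Correlate the numeric columns only; text columns such as STUT cannot be correlated
corrMatrix = train_data.select_dtypes('number').corr(method='pearson')
# Take the axis labels from the matrix itself so they always match its shape
xnames = list(corrMatrix.columns)
ynames = list(corrMatrix.columns)
plot_corr(corrMatrix, xnames=xnames, ynames=ynames, title=None, normcolor=False, cmap='RdYlBu_r')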

Finally, we compute the correlation coefficients between the variables in our data frame:
train_data.select_dtypes('number').corr(method='pearson')
There is a high degree of correlation between the variables: 0.993 for (Confirmed, Cured) and 0.99 for (Confirmed, Deaths).
These strong correlations support taking the study further with more advanced machine learning models.
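For readers who want to see what the Pearson coefficient measures, it can be reproduced by hand: the covariance of the two variables divided by the product of their standard deviations. The sketch below uses made-up numbers, not the Maharashtra data, and a hypothetical helper named pearson_r. It checks the manual result against the pandas Series.corr method.

```python
import numpy as np
import pandas as pd

# Made-up illustrative values, NOT the real Maharashtra data
df = pd.DataFrame({
    "Confirmed": [100, 250, 600, 1400, 3200, 7000],
    "Cured": [5, 20, 60, 150, 330, 800],
})

def pearson_r(x, y):
    """Pearson r: sum of co-deviations over the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r_manual = pearson_r(df["Confirmed"], df["Cured"])
r_pandas = df["Confirmed"].corr(df["Cured"], method="pearson")

print(round(r_manual, 4), round(r_pandas, 4))
```

Two cumulative counts that both rise over time will usually show a coefficient close to 1, so a high value on its own says little about how one variable drives the other.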


 
