# quant analysis

I have a business project and need a sample draft to help me learn.

Stock Selection Model

In your first step, you should have completed these parts:

Use the WFE Testing PDF as the checklist

(1) Fundamental Research into the companies: Ratios and Ownership

(2) Descriptive Statistics

(3) Stationarity Check + AR models to check autocorrelations in residuals

(4) 5-6 Statistical Tests

(5) Construct Linear Regression models and check assumptions

The highlighted parts are added this time:

(6) Use the first 90% of the data as the training set to predict the remaining 10%:

The most important step is the final prediction.

The final prediction method should be selected from the ARIMA model, the Linear State Space Model (OPTIONAL), and VARMA models (OPTIONAL).
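The 90/10 split in step (6) has to be chronological: the model only sees the earliest 90% of observations and is scored on the last 10%. The sketch below illustrates the idea with an AR(1) model, which is the same as ARIMA(1,0,0), fitted by least squares with numpy only. The series is synthetic (not real ATVI data), and the function names are hypothetical helpers introduced for illustration, not part of any library.

```python
import numpy as np

def train_test_split_ordered(series, train_frac=0.9):
    """Split chronologically: the first train_frac of the data trains, the rest tests."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def fit_ar1(train):
    """Estimate y_t = c + phi * y_{t-1} + e_t by ordinary least squares."""
    X = np.column_stack([np.ones(len(train) - 1), train[:-1]])
    coef, *_ = np.linalg.lstsq(X, train[1:], rcond=None)
    return coef[0], coef[1]

def one_step_forecasts(train, test, c, phi):
    """Forecast each test point from the previous observed value (parameters stay fixed)."""
    previous = np.concatenate([train[-1:], test[:-1]])
    return c + phi * previous

# Synthetic stand-in for a daily return series (assumption: AR(1) with phi = 0.3).
rng = np.random.default_rng(0)
n = 1000
returns = np.zeros(n)
for t in range(1, n):
    returns[t] = 0.3 * returns[t - 1] + rng.normal(scale=0.01)

train, test = train_test_split_ordered(returns, 0.9)
c_hat, phi_hat = fit_ar1(train)
pred = one_step_forecasts(train, test, c_hat, phi_hat)
rmse = np.sqrt(np.mean((test - pred) ** 2))
print("train size:", len(train), "test size:", len(test))
print("estimated phi:", phi_hat, "out-of-sample RMSE:", rmse)
```

For the actual assignment, the same split would feed a full ARIMA fit on the training part; the out-of-sample RMSE on the last 10% is one reasonable way to compare candidate models.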

Target Firm: The same as in your first task

The whole idea is to make your analysis of this company very comprehensive.

Some References

ARIMA model

ARIMA, short for ‘Auto Regressive Integrated Moving Average’, is a class of models that ‘explains’ a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the resulting equation can be used to forecast future values.

Any ‘non-seasonal’ time series that exhibits patterns and is not random white noise can be modeled with ARIMA models.

An ARIMA model is characterized by 3 terms: p, d, q

where,

p is the order of the AR term

q is the order of the MA term

d is the number of times the series must be differenced to make it stationary
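Putting the three terms together, an ARIMA(p, d, q) model can be written as below. The notation is introduced here for illustration and may differ from the referenced tutorial: B is the backshift operator (B y_t = y_{t-1}), c is a constant, the φ's are the AR coefficients, the θ's are the MA coefficients, and ε_t is white noise.

```latex
y'_t = (1 - B)^d \, y_t, \qquad
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

The first expression is the differencing step controlled by d; the second is an ARMA(p, q) model applied to the differenced series.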

If a time series has seasonal patterns, then you need to add seasonal terms and it becomes SARIMA, short for ‘Seasonal ARIMA’. More on that once we finish ARIMA.

So, what does the ‘order of AR term’ even mean? Before we go there, let’s first look at the ‘d’ term.
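The role of d can be seen on a simple example. The sketch below builds a synthetic random walk (an assumption for illustration, not ATVI prices), differences it once, and compares the lag-1 autocorrelation before and after. `lag1_autocorr` is a hypothetical helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random walk: each value is the previous value plus noise, so it is non-stationary.
steps = rng.normal(size=500)
walk = np.cumsum(steps)

# d = 1: one round of differencing recovers the stationary increments.
diff1 = np.diff(walk, n=1)

# Differencing is reversible given the starting value, which is how forecasts
# made on the differenced scale are mapped back to the original scale.
rebuilt = np.concatenate([[walk[0]], walk[0] + np.cumsum(diff1)])

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))

print("lag-1 autocorrelation of the level:      ", lag1_autocorr(walk))
print("lag-1 autocorrelation after differencing:", lag1_autocorr(diff1))
```

A level series with autocorrelation close to 1 that drops to roughly 0 after one difference is the typical sign that d = 1 is enough; a formal stationarity test should still confirm it.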

For reference, see this website:

https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/

Linear State Space Model (OPTIONAL)

Its many applications include:

representing dynamics of higher-order linear systems

predicting the position of a system j steps into the future

predicting a geometric sum of future values of a variable like

For reference, see this website:

https://python.quantecon.org/linear_models.html
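As a rough illustration of the first two applications, the sketch below rewrites a second-order autoregression as a first-order linear state space system, x_{t+1} = A x_t + C w_{t+1} with y_t = G x_t, and predicts j steps ahead using powers of A. The AR(2) coefficients and the starting state are hypothetical numbers chosen for the example, and `forecast` and `simulate` are helper names introduced here, not library functions.

```python
import numpy as np

# Hypothetical AR(2): y_{t+1} = phi0 + phi1 * y_t + phi2 * y_{t-1} + sigma * w_{t+1}
phi0, phi1, phi2, sigma = 0.1, 0.5, 0.2, 0.05

# State vector x_t = [1, y_t, y_{t-1}]' turns the second-order equation into
# the first-order system x_{t+1} = A x_t + C w_{t+1}, y_t = G x_t.
A = np.array([[1.0, 0.0, 0.0],
              [phi0, phi1, phi2],
              [0.0, 1.0, 0.0]])
C = np.array([0.0, sigma, 0.0])
G = np.array([0.0, 1.0, 0.0])

def forecast(x_t, j):
    """Conditional mean E[y_{t+j} | x_t] = G A^j x_t (the shocks have mean zero)."""
    return float(G @ np.linalg.matrix_power(A, j) @ x_t)

def simulate(x0, steps, rng):
    """Simulate observations y_1..y_steps from the state space system."""
    x = np.array(x0, dtype=float)
    path = []
    for _ in range(steps):
        x = A @ x + C * rng.normal()
        path.append(G @ x)
    return np.array(path)

x_t = np.array([1.0, 1.0, 0.5])  # constant, y_t = 1.0, y_{t-1} = 0.5
for j in (1, 2, 10):
    print("forecast", j, "steps ahead:", forecast(x_t, j))
```

With these numbers the one-step forecast is 0.1 + 0.5 * 1.0 + 0.2 * 0.5 = 0.7, and long-horizon forecasts settle at the unconditional mean phi0 / (1 - phi1 - phi2).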

VARMA models (OPTIONAL)

For reference, see this file:

https://www.economics-sociology.eu/files/12_Simionescu_1_7.pdf

Your submission list

Original Code – you could use R or Python

Final PPT

Here is example code for the first two steps; please refer to the attachment and build on it.

This is on ATVI.

#step one
import pandas as pd
import numpy as np
from matplotlib import pyplot

ATVI = pd.read_csv("C:/Users/Administrator/Downloads/ATVI.csv", parse_dates=['Date'])
ATVI['Return'] = ATVI['Adj Close'].pct_change()
ATVI['Logrt'] = np.log(ATVI['Adj Close'] / ATVI['Adj Close'].shift(1))
# the first row has no previous price, so its returns are NaN
ATVI = ATVI.drop(ATVI.index[0])
ATVI.plot(x='Date', y='Return')
ATVI.plot(x='Date', y='Adj Close')
pyplot.show()

#step two
from scipy.stats import kurtosis
from scipy.stats import skew

ATVImean = np.mean(ATVI['Return'])
ATVIvar = np.var(ATVI['Return'])
ATVImax = np.max(ATVI['Return'])
ATVImin = np.min(ATVI['Return'])
ATVIstd = np.std(ATVI['Return'])
ATVIkur = kurtosis(ATVI['Return'])
ATVIskew = skew(ATVI['Return'])
ATVInew = ATVI[['Date', 'Return']].copy()

print('mean', ATVImean)
print('var', ATVIvar)
print('max', ATVImax)
print('min', ATVImin)
print('std', ATVIstd)
print('kur', ATVIkur)
print('skew', ATVIskew)

#from matplotlib import pyplot

#from pandas.plotting import lag_plot

#lag_plot(ATVInew)

#pyplot.show()

from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot

# correlation between the return at t-1 and the return at t+1
values = DataFrame(ATVInew.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t1', 't-1', 't2', 't+1']
dataframe = dataframe.drop('t1', axis=1)
dataframe = dataframe.drop('t2', axis=1)
dataframe = dataframe.drop(dataframe.index[0])
dataframe = dataframe.astype(float)
rst = dataframe.corr()
print(rst)

ATVInew = ATVInew['Return'].astype(float)

from pandas.plotting import autocorrelation_plot
autocorrelation_plot(ATVInew)
pyplot.show()

from pandas.plotting import lag_plot
lag_plot(ATVInew)
pyplot.show()

"""
Autoregressive (AR) Model
"""
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
#from statsmodels.tsa.arima_model import ARMA

# Detect ACF&PACF
values.columns = ['Date', 'Value']
fig, axes = pyplot.subplots(1, 2, figsize=(16, 3), dpi=100)
plot_acf(values["Value"].tolist(), lags=50, ax=axes[0])
plot_pacf(values["Value"].tolist(), lags=50, ax=axes[1])

# Data processing: lag the series by one period and drop the first row, whose lag is NaN
import statsmodels.api as sm
y = values['Value'].values
y = np.array(y, dtype=float)
X = values[['Value']].shift(1).values
# convert to float before adding the constant, since the column has object dtype
X = np.array(X, dtype=float)
X = sm.add_constant(X)
X = np.delete(X, 0, 0)
y = np.delete(y, 0, 0)

# train autoregression AR(1)
model = sm.OLS(y, X)
#mod= ARMA()
results = model.fit()
print('AR')
print(results.summary())

"""
Regression:Day of the Week Effect
"""
#Create day dummy
names = ['D1', 'D2', 'D3', 'D4', 'D5']
# the Date column has object dtype here, so convert it before using the .dt accessor
dates = pd.to_datetime(values['Date'])
for i, x in enumerate(names):
    values[x] = (dates.dt.dayofweek == i).astype(int)
print(values.head(n=5))

# Regression: Seasonality
X2 = values[['D1', 'D2', 'D3', 'D4', 'D5']].values
X2 = np.delete(X2, 0, 0)
model = sm.OLS(y, X2)
results = model.fit()
print(results.summary())

"""
Regression:OLS Assumption Tests
"""
#get residuals
def calculate_residuals(model, features, label):
    """
    Creates predictions on the features with the model and calculates residuals
    """
    # fit the model that was passed in instead of relying on the global `results`
    predictions = model.fit().predict(features)
    df_results = pd.DataFrame({'Actual': label, 'Predicted': predictions})
    # residual = actual - predicted; taking absolute values first distorts sign and size
    df_results['Residuals'] = df_results['Actual'] - df_results['Predicted']
    return df_results

calculate_residuals(model, X2, y)

"""
Assumption 2: Normality of the Error Terms
Besides the test below, the regression output showed Prob(JB) < 0.05, so we reject the null hypothesis that the residuals are normally distributed
"""
import seaborn as sns
def normal_errors_assumption(model, features, label, p_value_thresh=0.05):
    from statsmodels.stats.diagnostic import normal_ad
    print('Assumption 2: The error terms are normally distributed', '\n')
    # Calculating residuals for the Anderson-Darling test
    df_results = calculate_residuals(model, features, label)
    print('Using the Anderson-Darling test for normal distribution')
    # Performing the test on the residuals
    p_value = normal_ad(df_results['Residuals'])[1]
    print('p-value from the test - below 0.05 generally means non-normal:', p_value)
    # Reporting the normality of the residuals
    if p_value < p_value_thresh:
        print('Residuals are not normally distributed')
    else:
        print('Residuals are normally distributed')
    # Plotting the residuals distribution
    pyplot.subplots(figsize=(12, 6))
    pyplot.title('Distribution of Residuals')
    # histplot with kde=True replaces the deprecated sns.distplot
    sns.histplot(df_results['Residuals'], kde=True)
    pyplot.show()
    print()
    if p_value > p_value_thresh:
        print('Assumption satisfied')
    else:
        print('Assumption not satisfied')
        print()
        print('Confidence intervals will likely be affected')
        print('Try performing nonlinear transformations on variables')

normal_errors_assumption(model, X2, y)

"""
Assumption 3: No Autocorrelation
Performing Durbin-Watson Test:
Values of 1.5 < d < 2.5 generally show that there is no autocorrelation in the data
0 to <2 is positive autocorrelation
>2 to 4 is negative autocorrelation
Durbin-Watson: 2.115
Conclusion: Little to no autocorrelation
Assumption satisfied
"""
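The Durbin-Watson value quoted above comes from the regression summary. As a minimal sketch of how the statistic is defined, it can be computed directly from a residual series as the sum of squared successive differences divided by the sum of squared residuals. `durbin_watson_stat` is a hypothetical helper written for this example, and the inputs are synthetic; statsmodels also ships a `durbin_watson` function in `statsmodels.stats.stattools`.

```python
import numpy as np

def durbin_watson_stat(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); roughly 2 * (1 - lag-1 autocorrelation)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Synthetic residuals for illustration only.
rng = np.random.default_rng(7)
white_noise = rng.normal(size=2000)           # no autocorrelation: d near 2
alternating = np.array([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation: d above 2
constant = np.ones(4)                          # extreme positive autocorrelation: d equals 0

for name, e in [("white noise", white_noise),
                ("alternating", alternating),
                ("constant", constant)]:
    print(name, durbin_watson_stat(e))
```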

"""
Assumption 4:Homoscedasticity
"""
def homoscedasticity_assumption(model, features, label):
    """
    Homoscedasticity: Assumes that the errors exhibit constant variance
    """
    print('Assumption 4: Homoscedasticity of Error Terms', '\n')
    print('Residuals should have relative constant variance')
    # Calculating residuals for the plot
    df_results = calculate_residuals(model, features, label)
    # Plotting the residuals
    pyplot.subplots(figsize=(12, 6))
    ax = pyplot.subplot(111)  # To remove spines
    pyplot.scatter(x=df_results.index, y=df_results.Residuals, alpha=0.5)
    pyplot.plot(np.repeat(0, df_results.index.max()), color='darkorange', linestyle='--')
    ax.spines['right'].set_visible(False)  # Removing the right spine
    ax.spines['top'].set_visible(False)  # Removing the top spine
    pyplot.title('Residuals')
    pyplot.show()

homoscedasticity_assumption(model, X2, y)
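The residual plot is a visual check only. One way to back it with a number is a Breusch-Pagan style test, which is not part of the original script: regress the squared residuals on the regressors and use LM = n * R^2, which is approximately chi-square under constant variance. The sketch below uses numpy only and synthetic stand-in residuals; `breusch_pagan_lm` is a hypothetical helper, and it assumes the regressor matrix already contains a constant column.

```python
import numpy as np

def breusch_pagan_lm(residuals, exog):
    """LM = n * R^2 from regressing squared residuals on exog (exog includes a constant)."""
    e2 = np.asarray(residuals, dtype=float) ** 2
    X = np.asarray(exog, dtype=float)
    beta, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((e2 - fitted) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    return float(len(e2) * (1.0 - ss_res / ss_tot))

# Synthetic data for illustration: one regressor plus a constant.
rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
e_const = rng.normal(size=n)       # constant variance
e_grow = rng.normal(size=n) * x    # variance grows with x

lm_const = breusch_pagan_lm(e_const, X)
lm_grow = breusch_pagan_lm(e_grow, X)

# 5% critical value of a chi-square distribution with 1 degree of freedom
# (one non-constant regressor in this example).
CHI2_95_DF1 = 3.841
print("constant variance LM:", lm_const)
print("growing variance LM: ", lm_grow, "reject:", lm_grow > CHI2_95_DF1)
```

An LM value above the critical value suggests heteroscedasticity; the degrees of freedom equal the number of non-constant regressors, so the threshold changes for the day-of-week regression.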

Requirements: Use Python to complete the assignment | .doc file