Algorithmic Trading: Game-theoretic and Simulation Approach to Reinforcement Learning Bot


Bui Ngoc Duc

Keywords: data mining, game theory, policy-making process, reinforcement learning

Chapter 1: Introduction

1.1 Problem statement

Trading stocks on the stock market is one of the major investment activities. In the past, investors developed a number of stock analysis methods to help them predict the direction of stock price movements. Modelling and predicting future equity prices from current financial information and news is of enormous use to investors, who want to know whether a stock will rise or fall over a certain period of time. To predict how a company they want to invest in will perform in the future, investors developed a number of analysis methods based on current and past financial data and other information about the company. Financial balance sheets and the various ratios that describe the health of a company are the basis of the fundamental analysis that investors undertake to analyze and predict a company’s future stock price. Predicting the direction of the stock price is particularly important for value investing.

Experienced analysts can apply mathematical models that have been validated on past data to evaluate a company’s intrinsic value. However, markets do not remain stable, and indicators that have strong predictive value over one period may cease to generate excess returns as soon as market conditions change. New investment strategies and new technologies have been introduced, which made some of the old models obsolete. As financial literacy has risen, there are more market players than ever. Two measures have been proposed to counter this evolving market behavior. First, some trading systems are based on genetic algorithms that transform the indicators used as attributes over time [6] [28]. Second, more commonly, the data set is fit to
nonlinear models using machine learning algorithms such as Artificial Neural Networks [10].

The introduction of algorithms in trading has definitely changed the stock market. Algorithms make it easy to react quickly to certain events on the stock market. Machine learning algorithms also make it much easier for analysts to create models for predicting stock prices, because new models can be developed from past data. As evidence, AI funds have outperformed their peers while providing downside protection, according to Eurekahedge’s report.

[Table: AI funds compared with the average hedge fund and with systematic CTA/managed futures strategies, which can be considered a rough approximation of the average quant fund. Source: Eurekahedge]

Motivated by the successful performance of AI funds, this paper introduces a method for creating an artificial agent that trades on the stock market using stock prices and several machine learning algorithms.

1.2 Objective of research

The monetary motivation behind the predictive value of buying and selling stocks at profitable positions is a key driver of this research. Our main hypothesis is that, by training machine learning models on past data, it is possible to predict the movement of the stock price through the market’s patterns, and then to apply algorithms to create a profitable trading agent. We use the agent’s profit and loss (PnL) during testing to assess its profitability. We conduct simulations to examine whether the agent is profitable on different data sets (seen and unseen) and then calculate the agent’s average PnL.

1.3 Scope of the research

This thesis provides only an elementary introduction to algorithmic trading, with game theory as the framework for the market environment. The game environment is kept simple by assuming that the other participants’ responses to our agent’s strategies are reflected in the stock price movement. Moreover, the algorithms used to create and train the agent come from the machine learning libraries “Scikit-learn” and “Keras”. The algorithms and functions used are explained in the Appendix of this thesis.

1.4 Overview

The thesis is organized in the following manner:

• Chapter 1 states the motivation for writing this thesis and the objectives and scope of the research.
• Chapter 2 provides the background of the Efficient Market Hypothesis (EMH) and the evidence that contradicts it, as well as works relevant to this topic.
• Chapter 3 establishes the game-theoretic framework used to describe the market, the simulation approach and the algorithms.
• Chapter 4 describes the methods of data collection and data processing, the implementation, and simulations on different variables of the model.
• Chapter 5, the last chapter, discusses the final results of our agent, explains the limitations of our research and states future improvements.

Chapter 2: Literature review

This section begins with a background to efficient markets and then gives a brief review of previous empirical studies that use machine learning algorithms to construct trading strategies.

1.5 Efficient Markets

One of the strongest objections to the existence of profitable trading strategies is founded on the ideas of the Efficient Market Hypothesis (EMH). Since EMH implies that our search for continuously profitable trading strategies is futile, we first give an overview of EMH and then show the empirical results that contradict this theory.

EMH states that the current market price reflects the assimilation of all the information available [13]. That is, its proponents argue that since stocks always trade at their fair value on stock exchanges, it is impossible to outperform the overall market through expert stock selection or market timing. Any new
information is quickly integrated into the market price. Fama formalized the concept of efficient markets in 1970 by expressing the non-predictability of market prices:

\[ E(\tilde{p}_{j,t+1} \mid \Phi_t) = \left[ 1 + E(\tilde{r}_{j,t+1} \mid \Phi_t) \right] p_{j,t} \]

Where: \(p_{j,t}\) is the price of security j at time t; \(r_{j,t+1}\) is the one-period percentage return; and \(\Phi_t\) is the information reflected at time t.

Based on this expectation expression, Fama argues that there is no possibility of finding excess market returns via market timing based solely on information in \(\Phi_t\), hence dispelling the possibility of trading strategies based on technical indicators.

On the other hand, despite the theoretically sound nature of EMH, research over the last 30 years has shown that several assumptions made in EMH may be unrealistic. First, a fundamental assumption is that investors behave rationally, or that the deviations of the many irrational investors cancel out. However, some research has shown that investors are not strictly rational [41], nor devoid of biases [20]. Indeed, people with a conservatism bias tend to underweight new information. Moreover, experiments have shown that these biases tend to be systematic and that deviations do not cancel each other out [21]. This leads to over- and under-reaction to news events.

Since the 1990s, the literature has seen the growing decline of the EMH and the emergence of behavioral finance. Behavioral finance views the market as an aggregate of human actions filled with imperfect and inefficient decisions. Under this theory, the financial markets are a reflection of human desires, goals, motivations, errors and overconfidence [40]. An alternative to EMH that has gained traction is the Adaptive Market Hypothesis, which posits that profit opportunities from inefficiencies exist in financial markets but are eroded away as knowledge of the inefficiency spreads throughout the public and the public capitalizes on the opportunities. By this view of financial markets, many have built evolutionary and/or non-linear models and
demonstrated that excess returns can be attained on out-of-sample data.

1.6 Previous Research

Because of their ability to model nonlinear relationships without pre-specification during the modeling process, neural networks (NNs) have become a popular method in financial time-series forecasting. NNs also offer huge flexibility in model architecture, in terms of the number of hidden nodes and layers. Indeed, Pekkaya and Hamzacebi compare the results of a linear regression with those of a NN model for forecasting macro variables and show that the NN gives much better results [35].

Many studies have used NNs and shown promising results in the financial markets. Grudnitski and Osburn implemented NNs to forecast S&P500 and Gold futures price directions and found they were able to correctly predict the direction of monthly price changes 75% and 61% of the time, respectively [15]. Another study showed that a NN-based model leads to higher arbitrage profits compared to cost-of-carry models. Phua, Ming and Lin implement a NN using Singapore’s stock market index and show a forecasting accuracy of 81% [36]. Similarly, NN models applied to weekly forecasting of Germany’s FAZ index find favorable predictive results compared to conventional statistical approaches [14].

More recently, NNs have been augmented or adapted to improve performance on financial time series forecasting. Shaoo et al. show that cascaded functional link artificial neural networks (CFLANN) perform the best in FX markets [39]. Egrioglu et al. introduce a new method based on feed-forward artificial neural networks to analyze multivariate high-order fuzzy time series forecasting models [12]. Liao and Wang used a stochastic time effective neural network model to show predictive results on global stock indices. Bildirici and Ersin combined NNs with ARCH/GARCH and other volatility-based models to produce a model that outperformed ANN or GARCH-based models alone. Moreover, Yudong and
Lenan used bacterial chemotaxis optimization (BCO) and a back-propagation NN on the S&P500 index and conclude that their hybrid model (IBCO-BP) offers less computational complexity, better prediction accuracy and less training time.

Another popular machine learning classification technique that does not require any domain knowledge or parameter setting is the decision tree. It also often offers a more visually interpretable model than a NN, as the nodes in the tree can be easily understood. The simplest type of decision tree model is the classification and regression tree (CART). Sorensen et al. show that CART decision trees perform better than single-factor models based on the same variables in picking stock portfolios [42]. Wang and Chan use a two-layer bias decision tree to predict the daily stock prices of Microsoft, Intel and IBM, finding excess returns compared to a buy-and-hold method [43]. Another study found that a boosted alternating decision tree with expert weighting generated abnormal returns for the S&P500 index during the test period [11].

To improve accuracy, some studies used the random forest algorithm for classification, which is discussed further in a later chapter. Namely, Booth et al. show that a recency-weighted ensemble of random forests produced superior results, in terms of both profitability and prediction accuracy, compared with other ensemble techniques when analyzed on a large sample of stocks from the DAX [7]. Similarly, a gradient boosted random forest model applied to Singapore’s stock market was able to generate excess returns compared with a buy-and-hold strategy [37].

Some recent research combines decision tree analysis with evolutionary algorithms to allow the model to adapt to changing market conditions. Hsu et al. present constraint-based evolutionary classification trees (CECT) and show strong predictability of a company’s financial performance [16].

Support Vector
Machines (SVM) are also often used to predict market behavior. Huang et al. compare SVM with other classification methods (random walk, linear discriminant analysis, quadratic discriminant analysis and Elman backpropagation neural networks) and find that SVM performs the best in forecasting weekly movements of the Nikkei 225 index [17]. Similarly, Kim compares SVM with NN and case-based reasoning (CBR) and finds that SVM outperforms both in forecasting the daily direction of change in the Korea composite stock price index (KOSPI) [23]. Likewise, Yang et al. use a margin-varying Support Vector Regression model and show empirical results that have good predictive value for the Hang Seng Index [46]. Nair et al. propose a genetic-algorithm-optimized decision tree-support vector machine hybrid, validate its performance on the BSE-Sensex and find that its predictive accuracy is better than that of both a NN and a naive Bayes based model [31].

While some studies have tried to compare various machine learning algorithms against each other, the results have been inconsistent. Patel et al. compare four prediction models (NN, SVM, random forest and naive Bayes) and find that, over a ten-year period and various indices, the random forest model performed the best. However, Ou and Wang examine the performance of ten machine learning classification techniques on the Hang Seng Index and find that the SVM outperformed the other models [33]. Kara et al. compared the performance of NN and SVM on the daily Istanbul Stock Exchange National 100 Index and found that the average performance of the NN model (75.74%) was significantly better than that of the SVM model (71.52%) [22].

Machine learning research has focused on predictive modeling. However, creating an agent that can learn in a dynamic environment and improve its policy during training requires another branch of machine learning, namely reinforcement learning, when ...
[Figure: Position-taking decision at the 100th iteration]

[Figure: PnL of our agent after 100 iterations]

Visually, we can see that our agent decides accurately when to take long and short positions. Indeed, the PnL values tend to reach a maximum of about 600 as the iterations proceed, which suggests that the convergence point can be reached in this ideal case. Moreover, this market pattern is easy to recognize, since the model took only a few seconds to converge.

Real price movement

In this part, we report the results of our agent’s original model when applied to the more realistic data set described earlier. We want to observe whether our agent remains profitable when working in a real-life environment. The position-taking policy and the PnL of our agent are shown in the following figures, respectively.

[Figure: Position-taking decision when applied to the test data]

[Figure: PnL after 30 iterations]

As can be seen from the figure above, the PnL increases after 30 iterations. However, this does not guarantee convergence as the number of iterations increases. Because of the complexity of the price movement patterns, it took 2179.9 seconds for our agent to complete its predictions and derive optimal policies. This long processing time is due to limited processing power.

Refinement

Although processing only 30 iterations took a large amount of time, we want to see whether the model improves with more iterations. Thus, starting from the original model, we changed the epochs attribute from 30 to 100, expecting the PnL to progress toward convergence. The results of this test are reported in the following figures:

[Figure: Position-taking policies]

[Figure: PnL after 100 iterations]

Even with the number of iterations increased to 100, the PnL still does not converge. In our view, this unexpected outcome results from the large number of states in the environment together with the limited buffer size. The buffer is the experience replay storage, and its size determines how many of the agent’s experiences can be stored; this model uses a buffer size of 200. If the number of states in the environment exceeds the buffer size, the agent is unable to learn from experience. This problem can be addressed by reducing the dimension of the states or by increasing the buffer size. However, either choice involves a trade-off between model complexity and processing time. A model that is not complex enough may struggle when applied to sophisticated real-life problems.

Testing model on different data set

In this section, we use other combinations of train and test data from our original data set. We calculate the cumulative PnL on the different test data and then calculate the average PnL.

[Figure: Different cumulative PnL from different datasets]

The model was able to make money on two different days after being trained on the session preceding each data set. The performance on the third data set was poor. However, even after losing a lot of money at the beginning of the data, the agent was able to recover most of its loss by the end of the session. Looking at this data alone, the performance of the model looks very unstable and a little disappointing.

Chapter 5: Conclusions

Final remark

The results of this model across the various tests in Chapter 4 indicate that our agent is able to learn from simple market patterns and that convergence is feasible. However, our elementary model is not yet suited to the complexity of real-life problems. Thus, recognizing the limitations of this model and proposing future improvements is essential.

Limitations:

As mentioned at the beginning of this thesis, we provide only a preliminary model that applies the game-theoretic framework in combination with simulation and a deep reinforcement learning approach. This elementary knowledge and these assumptions make this complicated, content-rich research feasible for
us at the moment. However, it is this basic level of knowledge that brings about the limitations of this research.

The first limitation is the assumption that the interaction of all other market participants results in the movement of stock prices. In a more realistic game environment, our agent has to face both human and learning agents, and their best responses to our agent’s strategies are much more sophisticated and need further study.

The second shortcoming concerns the simplicity of the model. The agent’s profitability is judged on only one factor; in fact, many factors indicate the ability to make positive earnings, such as the Sharpe ratio. The portfolio is limited to a single stock, which can weaken the agent’s profitability because diversification is ignored.

Moreover, the agent is only able to open a position in a given state; it cannot close a position based on information about its currently available assets. This is a likely obstacle when applying the agent to the real market. In addition, the reward function is not optimized; further research on the reward function may bring faster learning to the agent. The cost of taking a position is not taken into consideration either.

Another critical limitation of this simulation is the lack of processing power (hardware). This forces a trade-off: a more sophisticated model is more time consuming and requires more processing power. Since the agent is trained and tested on devices with low processing power, the number of layers in the neural network is small.

One of the most interesting parts of this project was defining the state representation of the environment. We recognize that when we increase the state space too much, it becomes very hard for the agent to learn an acceptable policy in the number of trials we used.

In terms of data, the input is currently limited to price and volume. However, there could be more factors that influence the price
movement. Another source of data is sentiment data, which takes the form of text.

In summary, this preliminary agent was created to show that the application of algorithmic trading is becoming more and more feasible and may be able to replace conventional technical analysis methods. For this agent, we have a clearly defined procedure for future improvement.

Improvement

Firstly, we need to complete our agent’s state and action sets so that they apply more realistically to the market. The agent needs to consider the cost of taking a position, available cash and asset value, and profit and loss limits.

Secondly, we must take into account more profitability measures for our agent, such as profit and loss, and set benchmarks for the agent’s performance based on a random agent strategy.

Thirdly, further research and testing are needed to optimize the reward function so that it reduces the training time and makes the agent adapt better to changes in the environment’s patterns.

The next available improvement is applying sentiment analysis to determine the effects of information on price changes. This analysis is not as complicated as creating the agent, because of the large number of libraries available for processing text-like data.

The model could be built with more sophisticated algorithms given better processing power; it currently runs on an NVIDIA GTX 1050 Ti graphics processing unit. Nowadays, the most powerful processors on the market are several times faster than ours at affordable prices. Indeed, those devices can be stacked to create a more powerful processing unit. Thus, upgrading computing power is essential for this model.

However, the most complicated issue is building a more realistic game theory environment, which is expected to require more sophisticated research. The available algorithms for solving a realistic game design include, but are not limited to, the Bayesian Nash model and Monte Carlo Tree Search.

Appendix A: Library used
Scikit-learn

There are several Python libraries which provide solid implementations of a range of machine learning algorithms. One of the best known is Scikit-Learn, a package that provides efficient versions of a large number of common algorithms. Scikit-Learn is characterized by a clean, uniform, and streamlined API, as well as by very useful and complete online documentation. A benefit of this uniformity is that once you understand the basic use and syntax of Scikit-Learn for one type of model, switching to a new model or algorithm is very straightforward [Introduction to scikit-learn, Jake VanderPlas’ Python Data Science Handbook].

StandardScaler

Standardize features by removing the mean and scaling to unit variance. Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method. The mean and standard deviation are calculated for the feature and then the feature is scaled based on:

\[ z = \frac{x - \mu}{\sigma} \]

where \(x\) is a feature value, \(\mu\) is the mean of that feature over the training samples and \(\sigma\) is its standard deviation.

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with zero mean and unit variance). For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around zero and have variance of the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected. This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.

Keras

Keras is an open source neural network
library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or MXNet [Keras backends]. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System) [Keras documentation], and its primary author and maintainer is François Chollet, a Google engineer.

To understand how Keras’s Recurrent Neural Networks work, we first discuss some important facts about “normal” Feed-Forward Neural Networks.

Feed-Forward Neural Networks

RNNs and Feed-Forward Neural Networks are both named after the way they channel information. In a Feed-Forward Neural Network, the information only moves in one direction: from the input layer, through the hidden layers, to the output layer. The information moves straight through the network and therefore never touches a node twice. Feed-Forward Neural Networks have no memory of the input they received previously and are therefore bad at predicting what is coming next. Because a feed-forward network only considers the current input, it has no notion of order in time. It simply cannot remember anything about what happened in the past, except for its training.

Recurrent Neural Networks

In an RNN, the information cycles through a loop. When it makes a decision, it takes into consideration the current input and also what it has learned from the inputs it received previously.

[Figure: Information flow in an RNN compared with a Feed-Forward Neural Network]

A usual RNN has a short-term memory. In combination with an LSTM it also has a long-term memory, which is explained further below.

Imagine you have a normal feed-forward neural network and give it the word “neuron” as an input and
it processes the word character by character. By the time it reaches the character “r”, it has already forgotten about “n”, “e” and “u”, which makes it almost impossible for this type of neural network to predict what character would come next.

A Recurrent Neural Network is able to remember exactly that, because of its internal memory. It produces output, copies that output and loops it back into the network.

Long-Short Term Memory

As briefly introduced in chapter 2, LSTM networks are an extension of recurrent neural networks which basically extends their memory. They are therefore well suited to learning from important experiences that have very long time lags in between. The units of an LSTM are used as building units for the layers of an RNN, which is then often called an LSTM network.

LSTMs enable RNNs to remember their inputs over a long period of time. This is because LSTMs hold their information in a memory that is much like the memory of a computer: the LSTM can read, write and delete information from its memory.

This memory can be seen as a gated cell, where gated means that the cell decides whether to store or delete information (i.e. whether it opens the gates or not), based on the importance it assigns to the information. The assigning of importance happens through weights, which are also learned by the algorithm. This simply means that it learns over time which information is important and which is not.

In an LSTM you have three gates: the input, forget and output gates. These gates determine whether to let new input in (input gate), to delete the information because it is not important (forget gate) or to let it impact the output at the current time step (output gate).

[Figure: An RNN with its three gates]

Dropout rate

Dropout is a regularization technique which aims to reduce the complexity of the model with the goal of preventing overfitting. Using “dropout”,
you randomly deactivate certain units (neurons) in a layer with a certain probability p from a Bernoulli distribution (typically 50%, but this is yet another hyperparameter to be tuned). So, if you set half of the activations of a layer to zero, the neural network will not be able to rely on particular activations in a given feed-forward pass during training. As a consequence, the neural network will learn different, redundant representations; the network cannot rely on particular neurons, or on combinations (interactions) of them, being present. Another nice side effect is that training will be faster.

... key factor to solve our problem: simulation, and a computer science approach in the form of machine learning.

Simulation: In the following parts, we shall mention some key concepts of simulation and machine ...

... Then, when we need to use the model, we just provide some input and the output comes out automatically.

Application to stock data: The application of the machine learning approach to stock data is quite ...

... characteristics and behaviors, the use of simplifying approximations and assumptions within the simulation, and fidelity and validity of the simulation outcomes. Procedures and protocols for model ...
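
The dropout description in Appendix A (deactivate each unit with probability p drawn from a Bernoulli distribution) can be illustrated with a minimal pure-Python sketch. It is not the thesis code and not the Keras implementation; the function name `dropout` and its parameters are hypothetical, and the rescaling of surviving activations by 1/(1 - p) ("inverted dropout") is an assumption added so that the expected activation stays unchanged.

```python
import random


def dropout(activations, p=0.5, rng=None, training=True):
    """Zero each activation with probability p (hypothetical helper, not a Keras API)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # At inference time (or with p == 0) the layer passes values through unchanged.
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    # Bernoulli mask: 1 keeps the unit, 0 deactivates it.
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    # Assumption: rescale the survivors so the expected activation is preserved.
    return [a * m / keep_prob for a, m in zip(activations, mask)]


if __name__ == "__main__":
    layer_output = [0.2, 1.5, -0.7, 0.9]
    print(dropout(layer_output, p=0.5, rng=random.Random(42)))
```

Because a different random mask is drawn on every training pass, no single unit can be relied on, which is the redundancy effect the appendix describes.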

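The StandardScaler description in Appendix A (compute the mean and standard deviation on the training set, store them, and reuse them on later data) can be sketched without the library as follows. This is an illustration under stated assumptions rather than the scikit-learn implementation: the helper names `fit_standardizer` and `transform` are hypothetical, the population standard deviation is used, and a constant feature is given a scale of 1.0 to avoid division by zero.

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Return one (mean, std) pair per feature, computed on training rows only."""
    columns = list(zip(*rows))
    # Assumption: a zero standard deviation is replaced by 1.0 for constant features.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def transform(rows, params):
    """Apply z = (x - mean) / std to each feature using the stored statistics."""
    return [
        [(x - mu) / sigma for x, (mu, sigma) in zip(row, params)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical (price, volume) samples; the values are made up for illustration.
    train = [[10.0, 1000.0], [12.0, 1500.0], [11.0, 1250.0]]
    test = [[13.0, 900.0]]
    params = fit_standardizer(train)   # statistics come from the training set only
    print(transform(train, params))
    print(transform(test, params))     # unseen data reuses the stored statistics
```

Fitting on the training rows and only transforming the test rows keeps information from the unseen data out of the scaling step.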
Date posted: 31/03/2019

Table of contents

  • 1. Chapter 1: Introduction

    • 1.1. Problem statement

    • 1.2. Objective of research

    • 1.3. Scope of the research

    • 1.4. Overview

  • Chapter 2: Literature review

    • 1.5. Efficient Markets

    • 1.6. Previous Research

      • Chapter 3: Theoretical reviews

    • Game theory framework

      • Game

      • Agents’ strategies

      • Optimal strategy

    • Game representations

      • Normal form

      • Extensive form

      • Stochastic games (Markov Games)

    • Simulation

      • Simulation

    • Algorithms

      • Machine learning

      • Reinforcement learning

      • Recurrent Neural Network and LSTM networks

  • Chapter 3: Proof of concept

    • Data collecting and preprocessing

      • Data collecting

      • Data preprocessing

        • Remove unused data

        • Separate data

          • Overfitting

          • Underfitting

        • Choosing features

        • Scaling data

    • Implementation

      • Q-learning function design

        • Algorithm 2: Train-Test Q-Learning Trader

        • LSTM networks

  • Chapter 4: Result

    • Result from original model

    • Refinement

    • Testing model on different data set

  • Chapter 5: Conclusions

    • Final remark

    • Limitations:

    • Improvement

  • Appendix A: Library used

    • Scikit-learn

      • StandardScaler

    • Keras

      • Feed-Forward Neural Networks

      • Recurrent Neural Networks

      • Long-Short Term Memory

      • Dropout rate
