Regression forecasting and predicting - Practical Machine Learning Tutorial with Python p.5
In this video, make sure you define the X's like so. I flipped the last two lines by mistake:

X = np.array(df.drop(['label'], axis=1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

To forecast out, we need some data. We decided that we're forecasting out 10% of the data, so we will want to (or at least *can*) generate forecasts for each row in the final 10% of the dataset. So when can we do this? When would we identify that data? We could grab it now, but consider that the data we're trying to forecast would then not be scaled like the training data was.

Okay, so then what? Do we just run preprocessing.scale() against the last 10%? The scale method scales based on all of the known data that is fed into it, so scaling the last 10% on its own would put it on a different scale. Ideally, you would scale the training, testing, AND forecast/prediction data all together. Is this always possible or reasonable? No. If you can do it, however, you should. In our case, right now, we can: our data is small enough and the processing time is low enough, so we'll preprocess and scale the data all at once.

In many cases, you won't be able to do this. Imagine you were using gigabytes of data to train a classifier. It may take days to train, and since adding new data changes the scaling of everything, scaling it all together would mean retraining every...single...time you wanted to make a prediction. Thus, you may need to either NOT scale anything, or scale the data separately. As usual, you will want to test both options and see which is best in your specific case.

With that in mind, let's handle all of the rows from the definition of X onward.

https://pythonprogramming.net/forecasting-predicting-machine-learning-tutorial/
https://twitter.com/sentdex
https://www.facebook.com/pythonprogramming.net/
https://plus.google.com/+sentdex
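A minimal sketch of the corrected order, on a small hypothetical DataFrame rather than the Quandl data. It assumes the `label` column is built with `shift(-forecast_out)` as earlier in the series, and it uses a NumPy `standardize` helper (hypothetical) in place of `preprocessing.scale`, which with default arguments likewise subtracts each column's mean and divides by its population standard deviation. The key point: `X_lately` must be sliced from the full, scaled `X` before `X` is trimmed. Option 2 shows the "scale separately" idea, reusing statistics from the labelled rows only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the price data: 20 rows, two feature columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Adj. Close": 100 + rng.normal(0, 1, 20).cumsum(),
    "HL_PCT": rng.normal(0, 1, 20),
})
forecast_out = 2  # about 10% of 20 rows

# Assumed label: the price forecast_out rows ahead (NaN for the final rows).
df["label"] = df["Adj. Close"].shift(-forecast_out)


def standardize(a, ref):
    """Hypothetical helper: scale columns of `a` with the mean and (ddof=0) std of `ref`."""
    return (a - ref.mean(axis=0)) / ref.std(axis=0)


X = np.array(df.drop(["label"], axis=1))

# Option 1, as in the video: scale all rows together, THEN split.
X_all = standardize(X, X)
X_lately = X_all[-forecast_out:]  # newest rows: no label yet, used to forecast
X_train = X_all[:-forecast_out]   # rows that do have a label

# The mistaken order (trim first, slice second) picks rows that already have labels.
wrong_lately = X_train[-forecast_out:]

# Option 2: take the scaling from the labelled rows only and reuse it on new rows.
ref = X[:-forecast_out]
X_train_sep = standardize(ref, ref)
X_lately_sep = standardize(X[-forecast_out:], ref)

df.dropna(inplace=True)  # drops exactly the forecast_out unlabelled rows
y = np.array(df["label"])
```

Option 2 is what makes it possible to score new rows later without rescaling (and retraining on) the whole dataset.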
Comments
-
Do you have a GitHub account?
-
Hi, I want to predict the weather every hour using historical weather data in Python. Could you help me with this?
-
The code is great. However, there seems to be a lack of economic sense: adjusted close is not what we usually want to predict. If we use this method to predict the daily return, it would have an accuracy of 0.0033976780251.
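The commenter's point can be illustrated with a simulated random walk (hypothetical data, not real prices). A naive "tomorrow equals today" forecast scores an R² close to 1 on price levels simply because prices are persistent, while the same naive idea applied to daily returns scores no better than chance. R² (the coefficient of determination) is the number `LinearRegression.score` returns; the `r2` helper below is a hand-written stand-in for it.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination (R^2), the quantity LinearRegression.score reports."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


# Hypothetical price series: a random walk, so daily moves are unpredictable.
rng = np.random.default_rng(42)
price = 500 + rng.normal(0, 1, 2000).cumsum()

# Naive level forecast: tomorrow's price = today's price.
r2_levels = r2(price[1:], price[:-1])

# Same naive idea applied to daily returns: tomorrow's return = today's return.
returns = np.diff(price) / price[:-1]
r2_returns = r2(returns[1:], returns[:-1])

print("R^2 on price levels:  %.3f" % r2_levels)
print("R^2 on daily returns: %.3f" % r2_returns)
```

A high score on price levels therefore says little about whether the model has any predictive edge.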
-
Hi Sentdex,
I'm having trouble with:
last_unix = last_date.timestamp()
I'm on Python 2.7, and I arrived at a solution by doing:
last_unix = time.mktime(dt.datetime.strptime(str(last_date), "%Y-%m-%d %H:%M:%S").timetuple())
This worked, but is there a simpler method? -
last_unix = last_date.timestamp()
AttributeError: 'Timestamp' object has no attribute 'timestamp' -
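The AttributeError means this Python/pandas combination's `Timestamp` has no `timestamp()` method (the `datetime.timestamp()` method it would otherwise inherit only exists on Python 3.3+). Two workarounds that avoid string formatting are sketched below, with a hypothetical `last_date`: `calendar.timegm(ts.timetuple())`, which treats the naive time as UTC, and the `value` attribute, which holds nanoseconds since the Unix epoch. Note that `time.mktime`, used in the workaround above, interprets the time as local time, so its result can differ by your UTC offset.

```python
import calendar
import pandas as pd

last_date = pd.Timestamp("2016-01-01")  # hypothetical last index value

# Works on Python 2.7 and 3: seconds since the epoch, treating the time as UTC.
last_unix = calendar.timegm(last_date.timetuple())

# Alternative: Timestamp.value is nanoseconds since the epoch (UTC for naive times).
last_unix_alt = last_date.value // 10**9

one_day = 86400
next_unix = last_unix + one_day
```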
Hi, I can't understand this line of code:
df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]
Can you explain it? -
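That line appends one new row, labelled `next_date`, to `df`: every column except the last gets NaN, and the last column gets the forecast value `i`. This assumes `Forecast` was the last column added to the frame. A small sketch with a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a price column plus an empty Forecast column added last.
df = pd.DataFrame(
    {"Adj. Close": [10.0, 10.5]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05"]),
)
df["Forecast"] = np.nan

next_date = pd.Timestamp("2016-01-06")
i = 10.9  # one predicted value from forecast_set

# len(df.columns) - 1 NaNs for every column but the last, then the prediction.
row = [np.nan for _ in range(len(df.columns) - 1)] + [i]  # [nan, 10.9]
df.loc[next_date] = row  # .loc with a new label appends a row ("setting with enlargement")
```

The price column stays NaN on forecast dates, which is why only the Forecast line is drawn there.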
My forecast_set is giving me values around 2.2 instead of around 700, and I have no idea why.
Edit: It turns out it was because I was scaling y. -
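Why scaling y produces numbers like 2.2: standardizing the labels turns them into z-scores (mean 0, standard deviation 1), so a model trained on them predicts in those units. A sketch with made-up prices, using NumPy standardization that matches `preprocessing.scale`'s defaults, shows the z-scores and how a scaled prediction maps back to a price. The simpler fix is to scale only X.

```python
import numpy as np

y = np.array([650.0, 680.0, 700.0, 720.0, 760.0])  # hypothetical prices

# What scaling y does: z-scores using the ddof=0 standard deviation.
mu, sigma = y.mean(), y.std()
y_scaled = (y - mu) / sigma

# A prediction in scaled units (e.g. 1.5) maps back to a price like this:
pred_scaled = 1.5
pred_price = pred_scaled * sigma + mu
```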
Why is forecast_out negative in X_lately?
Wouldn't it have to be positive to predict into the future? -
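A negative start index in a Python/NumPy slice counts from the end, so `X[-forecast_out:]` means "the last forecast_out rows". Those are the newest rows, whose labels lie in the future, which is exactly why they are the ones used to forecast. A tiny illustration with a stand-in array:

```python
import numpy as np

X = np.arange(10)  # stand-in for 10 rows of features, oldest first
forecast_out = 3

X_lately = X[-forecast_out:]  # last 3 rows: the features we predict from
X_train = X[:-forecast_out]   # everything except the last 3 rows
```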
Are these examples on stock prediction the actual way someone would go about building a trading platform?
-
For those of you getting an error with the code below while running Python 3.5, consider how you are importing your data.
last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day
For example, the Quandl API pull is ordered from oldest to newest:
df = quandl.get('WIKI/GOOGL', authtoken=auth_key)
Whereas if you downloaded the dataset and are importing it from a CSV, the file is ordered from newest to oldest, so df.iloc[-1] picks up the oldest date instead of the latest:
df = pd.read_csv('{}{}'.format(f_path, security)) -
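A sketch of normalizing CSV data so `df.iloc[-1]` is the latest row, using an inline CSV in place of a real download (the column names are assumptions). `index_col` plus `parse_dates` give a DatetimeIndex, so `.name` is a Timestamp rather than an integer row number, and `sort_index()` puts the rows oldest-first like the Quandl pull.

```python
import io

import pandas as pd

# Hypothetical CSV export, newest row first.
csv_text = """Date,Adj. Close
2016-01-06,10.8
2016-01-05,10.5
2016-01-04,10.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
df = df.sort_index()  # oldest -> newest, same order as the Quandl pull

last_date = df.iloc[-1].name  # a pandas Timestamp for the newest row
```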
I am getting an error because X and y have different lengths... Can somebody help me fix this?
It's the same code as yours:
X = np.array(df.drop(['label'],1))
X = preprocessing.scale(X)
# print(len(X)) # len X :
X = X[:-forecast_out]
X_lately = X[-forecast_out:]
y = np.array(df['label'])
df.dropna(inplace=True)
print(len(X),len(y)) #length is not the same here !!!! -
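In the snippet above, X is trimmed before X_lately is sliced (so X_lately ends up as rows that already have labels), and y is built before `df.dropna()`, so y still includes the NaN labels of the final forecast_out rows. A sketch of an order that keeps the lengths equal, on hypothetical data, with NumPy standardization standing in for `preprocessing.scale(X)` (equivalent by default for columns with nonzero variance):

```python
import numpy as np
import pandas as pd

# Hypothetical price data.
rng = np.random.default_rng(1)
df = pd.DataFrame({"Adj. Close": 100 + rng.normal(0, 1, 30).cumsum()})
forecast_out = 3
df["label"] = df["Adj. Close"].shift(-forecast_out)  # last 3 labels are NaN

X = np.array(df.drop(["label"], axis=1))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # stand-in for preprocessing.scale(X)
X_lately = X[-forecast_out:]  # slice the unlabelled rows first...
X = X[:-forecast_out]         # ...then drop them from the training features

df.dropna(inplace=True)       # remove the NaN-label rows *before* building y
y = np.array(df["label"])
print(len(X), len(y))
```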
This application failed to start because it could not find or load the Qt platform plugin "windows". Has anybody who encountered this problem resolved it? I'm using Anaconda3, BTW.
-
Dear Sir,
Thanks for the wonderful tutorial. I have a question.
My understanding of what is being done:
The features are from some date K, and the label is from date K+30.
So when we predict using the features of, say, 1 October, aren't we predicting the label (price) of 31 October?
So we are actually not predicting the next 30 days, but the 30 days after the next 30 days.
Please clarify. -
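One way to check the date arithmetic, assuming the label was built with `shift(-forecast_out)` as earlier in the series: row k's label is the price at row k + forecast_out. X_lately holds the features of the last forecast_out rows, so its predictions target the forecast_out rows just after the data ends, not 30 days beyond that. (The plotting code steps through calendar days with one_day = 86400, while the shift counts rows, i.e. trading days, so the dates line up only approximately.) A sketch with hypothetical prices:

```python
import numpy as np
import pandas as pd

forecast_out = 3
price = pd.Series(np.arange(100.0, 110.0))  # hypothetical prices, rows 0..9
label = price.shift(-forecast_out)          # label[k] = price[k + forecast_out]

# Rows 0..6 have a known label; rows 7..9 (the X_lately rows) do not.
known = label.dropna()
lately_rows = np.arange(len(price))[-forecast_out:]  # [7, 8, 9]
target_rows = lately_rows + forecast_out             # [10, 11, 12]: the next 3 rows
```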
Sorry to ask another question so quickly, but do you have the Python 2.7 fix for the Unix timestamp section? I am having difficulty getting a Unix result when I try to use this workaround:
last_date = datetime.datetime(df.iloc[-1].name)
last_unix = last_date.timedelta(seconds = one_day) -
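That workaround fails because `df.iloc[-1].name` is already a Timestamp (wrapping it in `datetime.datetime(...)` raises a TypeError), and datetime objects have no `timedelta` method. One option that works on Python 2.7 and 3 is to skip Unix seconds entirely and add a `datetime.timedelta` to the date. A minimal sketch with a hypothetical frame and hypothetical forecast values:

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical data: two known prices and an empty Forecast column.
df = pd.DataFrame({"Adj. Close": [10.0, 10.5]},
                  index=pd.to_datetime(["2016-01-04", "2016-01-05"]))
df["Forecast"] = np.nan
forecast_set = [10.7, 10.9, 11.2]  # hypothetical predictions

last_date = df.iloc[-1].name          # already a Timestamp; no conversion needed
one_day = datetime.timedelta(days=1)  # replaces the 86400-second arithmetic
next_date = last_date + one_day

for i in forecast_set:
    df.loc[next_date] = [np.nan for _ in range(len(df.columns) - 1)] + [i]
    next_date += one_day
```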
Is there any way you can post the final script you're using? I made the corrections noted both in the video and in the comments below, and I'm still getting an error at the cross_validation stage saying the rows aren't lining up...
-
What exactly do we use the classifier for?
In clf.fit(), does "fit" find the best-fitting line for the given dataset? If not, what does it do? -
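For LinearRegression, `fit()` does find the best-fitting line (a hyperplane, with several features): it chooses coefficients and an intercept that minimize the sum of squared errors, i.e. ordinary least squares. To show the idea without depending on scikit-learn, the sketch below solves the same least-squares problem with NumPy on made-up data lying exactly on y = 2x + 1; "predicting" then just evaluates the fitted line.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Append a column of ones so the intercept is fitted too.
A = np.column_stack([x, np.ones_like(x)])
(coef, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# "predict" is simply the fitted line evaluated at new inputs.
x_new = np.array([5.0, 6.0])
y_pred = coef * x_new + intercept
```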
And why did matplotlib plot the graph against the date without us specifying it? You only labeled the axis "Date" but never specified the x values.
What about that? -
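When you call `.plot()` on a pandas Series or DataFrame without passing x values, pandas uses the index as the x-axis. Because `df` is indexed by date, the dates become the x values automatically; `plt.xlabel('Date')` only sets the axis label text. A sketch using a hypothetical five-day series and the off-screen Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no GUI window needed
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2016-01-04", periods=5, freq="D")
df = pd.DataFrame({"Adj. Close": [10.0, 10.5, 10.2, 10.8, 11.0]}, index=dates)

ax = df["Adj. Close"].plot()  # no x argument: pandas plots against df.index
plt.xlabel("Date")            # only the label text; the dates came from the index
n_points = len(ax.get_lines()[0].get_xdata())
```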
There is a gap between Adj. Close and Forecast if I plot them separately, but if I plug the values of forecast_set into the Adj. Close column, there is no gap. Can you explain why, and how to tackle it?
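The gap appears because the Forecast column only has values from the day after the last Adj. Close, so the two plotted lines never share a point. One possible workaround, sketched here on hypothetical values, is to copy the last known price into the Forecast column on the last real date, so the forecast line starts where the price line ends:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: prices end on 2016-01-05, forecasts start on 2016-01-06.
df = pd.DataFrame(
    {"Adj. Close": [10.0, 10.5, np.nan, np.nan],
     "Forecast": [np.nan, np.nan, 10.7, 10.9]},
    index=pd.date_range("2016-01-04", periods=4, freq="D"),
)

last_real = df["Adj. Close"].last_valid_index()  # last date with an actual price
df.loc[last_real, "Forecast"] = df.loc[last_real, "Adj. Close"]
```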
-
I think there is something seriously wrong with the logic of this algorithm, OR I am completely on the wrong track.
The algorithm "forecasts" the stock price so well that it knows when the market dropped after Brexit. That seems very fishy.
I really like your videos and I am learning a lot, but I do feel something is not right here. -
Error: AttributeError: 'Timestamp' object has no attribute 'timestamp'
I changed it to last_unix = time.mktime(last_date.timetuple()) and got SyntaxError: invalid syntax.
I'm using Python version 2.7.6.