One way to implement cross-validation with TimeSeriesSplit() on a DataFrame in an end-to-end Python pipeline is as follows:
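import pandas as pd
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Load the data with the first column parsed as a datetime index
df = pd.read_csv('data.csv', parse_dates=[0], index_col=0)
# TimeSeriesSplit assumes rows are in chronological order
df = df.sort_index()

# Separate features from the target
X = df.drop(columns='target_variable')
y = df['target_variable'].values

# Scale the features, then fit a linear regression
pipe = Pipeline([('scaler', StandardScaler()),
                 ('regressor', LinearRegression())])

# Each split trains on earlier rows and tests on the rows that follow
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(pipe, X, y, cv=tscv, scoring='neg_mean_squared_error')

# Scores are negated MSE values, so flip the sign before reporting
print('Mean Squared Error: ', -np.mean(scores))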
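This pipeline standardizes the features, fits a linear regression model, and evaluates it by mean squared error under a TimeSeriesSplit() cross-validation strategy, in which each fold trains on earlier rows and tests on the rows that follow. Because the scaler sits inside the pipeline, it is refit on the training portion of each fold only, so no information from the test rows leaks into preprocessing. scikit-learn reports this metric as a negative value (neg_mean_squared_error) so that higher is always better, which is why the sign is flipped before printing.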
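To see what cross_val_score does with a TimeSeriesSplit under the hood, the folds can be walked by hand. The following is only a rough sketch: the synthetic arrays X_demo and y_demo, and the names pipe_demo, tscv_demo and fold_mse, are made up for illustration and stand in for the real data.csv.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, time-ordered data standing in for data.csv (60 rows, 3 features)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

pipe_demo = Pipeline([('scaler', StandardScaler()),
                      ('regressor', LinearRegression())])
tscv_demo = TimeSeriesSplit(n_splits=5)

fold_mse = []
for fold, (train_idx, test_idx) in enumerate(tscv_demo.split(X_demo)):
    # Every training row precedes every test row, so the model never sees the future
    assert train_idx.max() < test_idx.min()

    # Fit a fresh copy of the pipeline; the scaler is fit on the training rows only
    model = clone(pipe_demo)
    model.fit(X_demo[train_idx], y_demo[train_idx])

    mse = mean_squared_error(y_demo[test_idx], model.predict(X_demo[test_idx]))
    fold_mse.append(mse)
    print(f"fold {fold}: train rows 0-{train_idx.max()}, "
          f"test rows {test_idx.min()}-{test_idx.max()}, MSE={mse:.4f}")

print('Mean MSE across folds:', np.mean(fold_mse))
```

The training window grows with each fold while the test window slides forward, so early folds are fit on relatively few rows. Looking at the per-fold values rather than only the mean can show whether the error settles down as more history becomes available.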
Asked: 2022-03-22 11:00:00 +0000
Seen: 10 times
Last updated: Nov 23 '22