# What is the effective way to implement dynamic time warping with early abandonment in Python for significant multidimensional data sets?

edit retag close merge delete

Sort by » oldest newest most voted

There are different ways to implement dynamic time warping (DTW) with early abandonment in Python for large multidimensional datasets. Here is one possible approach:

1. Use numpy arrays to represent the multidimensional time series data that you want to compare with DTW. Each numpy array should have shape (nsamples, nfeatures), where nsamples is the length of the time series and nfeatures is the number of dimensions (e.g., sensors, channels).

2. Define a distance function that computes the Euclidean distance (or other distance metric) between pairs of samples in the two input arrays. You can use the numpy function np.linalg.norm() to compute the Euclidean distance between two vectors.

3. Define a cost matrix that contains the pairwise distances between the samples in the two input arrays. The (i,j) entry of the cost matrix should be the distance between the i-th sample in the first input array and the j-th sample in the second input array. You can use nested loops or numpy broadcasting to compute the cost matrix.

4. Define a window size w that determines the maximum allowed temporal alignment between the two input arrays. For example, if w=5, then the algorithm can align any two samples whose temporal distance is at most 5. You can set w to a fixed value or compute it dynamically based on the length of the input arrays.

5. Initialize a numpy array D of shape (nsamples1+1, nsamples2+1) with large values (e.g., infinity). The (i,j) entry of D represents the minimum cost of aligning the first i samples of the first input array with the first j samples of the second input array.

6. Set D[0,0]=0 and fill the first row and first column of D with large values (e.g., infinity), except for D[0,1:w] and D[1:w,0], which should be set to infinity if the corresponding temporal distance is larger than w.

7. Compute the DTW path by filling in the remaining entries of D using dynamic programming. For each entry D[i,j], compute the minimum of D[i-1,j], D[i,j-1], and D[i-1,j-1], plus the cost of aligning the i-th sample of the first input array with the j-th sample of the second input array. Only consider the entries within the window size w. Stop computing the path as soon as D[i,j] exceeds a certain threshold (early abandonment).

8. Compute the final DTW distance as D[nsamples1,nsamples2]/L, where L is the length of the shortest time series. This normalization factor ensures that the DTW distance is between 0 and 1.

9. Optionally, visualize the resulting DTW path and check its correctness using matplotlib.

Here is an example code for implementing DTW with early abandonment in Python:

import numpy as np
import matplotlib.pyplot as plt

def dtw(X, Y, w):
"""Compute the DTW distance between two multidimensional arrays X and Y"""
dist = lambda x, y: np.linalg.norm(x - y)  # Euclidean distance function
X, Y = np.asarray(X), np.asarray(Y)
n_samples1, n_samples2 = X.shape, Y.shape
D = np.ones((n_samples1+1, n_samples2+1)) * np.inf  # initialize D with inf values
D[0,0] = 0
for i in range(1, ...
more