In SQL (PostgreSQL), the dissimilarity of two rows over aggregated columns is usually computed with a distance metric such as the Euclidean distance, or derived from a similarity measure such as cosine similarity.

To compute the Euclidean distance between two rows, square the difference between each pair of corresponding columns, add the squares together, and take the square root of the total. For example:

-- Exactly one row pair is compared, so no SUM() aggregate is needed
SELECT SQRT(POWER(t1.col1 - t2.col1, 2) + POWER(t1.col2 - t2.col2, 2)) AS euclidean_distance
FROM table1 AS t1 CROSS JOIN table2 AS t2
WHERE t1.id = 1 AND t2.id = 2;

This query calculates the Euclidean distance between the row with id 1 in table1 and the row with id 2 in table2, based on the values in columns col1 and col2.
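When the rows being compared are themselves the result of an aggregation, the same formula can be applied to the aggregated output. The following is only a sketch under stated assumptions: the measurements data, the group_id, x and y columns, and the choice of AVG as the aggregate are all hypothetical and stand in for whatever your real query aggregates.

```sql
-- Hypothetical raw data; names and values are illustrative only.
WITH measurements (group_id, x, y) AS (
    VALUES (1, 1.0, 2.0),
           (1, 3.0, 4.0),
           (2, 4.0, 6.0),
           (2, 6.0, 8.0)
),
agg AS (
    -- One aggregated row per group: these are the rows being compared.
    SELECT group_id,
           AVG(x) AS col1,
           AVG(y) AS col2
    FROM measurements
    GROUP BY group_id
)
SELECT a.group_id AS group_a,
       b.group_id AS group_b,
       SQRT(POWER(a.col1 - b.col1, 2) + POWER(a.col2 - b.col2, 2)) AS euclidean_distance
FROM agg AS a
JOIN agg AS b ON a.group_id < b.group_id;  -- each unordered pair once
```

With this sample data the two aggregated rows are (2, 3) and (5, 7), so the column differences are 3 and 4 and the distance works out to 5. The a.group_id < b.group_id condition avoids comparing a row with itself and avoids listing each pair twice.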

Another option is cosine similarity, which compares two vectors by their orientation rather than their magnitude. Strictly speaking it is a similarity measure, not a distance metric. It is calculated as the dot product of the two vectors divided by the product of their magnitudes. For instance:

-- Dot product divided by the product of the two magnitudes
SELECT (t1.col1 * t2.col1 + t1.col2 * t2.col2)
       / (SQRT(POWER(t1.col1, 2) + POWER(t1.col2, 2))
          * SQRT(POWER(t2.col1, 2) + POWER(t2.col2, 2))) AS cosine_similarity
FROM table1 AS t1 CROSS JOIN table2 AS t2
WHERE t1.id = 1 AND t2.id = 2;

This query computes the cosine similarity between the row with id 1 in table1 and the row with id 2 in table2, based on the values in columns col1 and col2. The result lies between -1 and 1, with higher values indicating greater similarity between the two rows. To express it as a dissimilarity, subtract it from 1, which gives the cosine distance.
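One practical caveat is that the division fails with a division-by-zero error when either row has a magnitude of 0 (all compared columns are 0). A minimal sketch of one way to guard against that, assuming a hypothetical vectors data set with illustrative values, is to wrap the denominator in NULLIF so the result becomes NULL instead of an error. The same query also reports the cosine distance (1 minus the similarity).

```sql
-- Hypothetical sample data; names and values are illustrative only.
WITH vectors (id, col1, col2) AS (
    VALUES (1, 3.0, 4.0),
           (2, 6.0, 8.0),   -- same direction as id 1
           (3, 0.0, 0.0)    -- zero vector: similarity is undefined
),
norms AS (
    -- Compute each row's magnitude once so it is not repeated per pair.
    SELECT id, col1, col2,
           SQRT(POWER(col1, 2) + POWER(col2, 2)) AS magnitude
    FROM vectors
)
SELECT a.id AS id_a,
       b.id AS id_b,
       (a.col1 * b.col1 + a.col2 * b.col2)
           / NULLIF(a.magnitude * b.magnitude, 0) AS cosine_similarity,
       1 - (a.col1 * b.col1 + a.col2 * b.col2)
           / NULLIF(a.magnitude * b.magnitude, 0) AS cosine_distance
FROM norms AS a
JOIN norms AS b ON a.id < b.id   -- each unordered pair once
ORDER BY a.id, b.id;
```

For the pair (1, 2) the dot product is 50 and the product of the magnitudes is also 50, so the similarity should be 1 and the distance 0. Any pair involving id 3 should come back as NULL. Whether NULL is the right result for zero vectors depends on your data, so treat this as one possible convention rather than the definitive one.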