By James Bell
Euclidean Distance is one method of measuring the straight-line distance between two points on a graph. It is one of many ways to calculate distance and applies to continuous variables. For categorical data, we suggest Hamming Distance, or Gower Distance if the data mixes categorical and continuous variables. Euclidean Distance works in 2-dimensional, 3-dimensional, or even 1000-dimensional space. Going from 2 dimensions to 3 is actually pretty simple, and the same basic formula extends to n dimensions, as we describe below.
We use the Pythagorean Theorem to determine the distance between two points because the differences along the x and y axes form a right triangle. On a Cartesian coordinate system, we use x and y to communicate where on the graph our point is. In a 2-D plane, we have only an x axis and a y axis, so the distance between points $(x_1, y_1)$ and $(x_2, y_2)$ is

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
Which, if you consider $a$ to be

$$a = x_2 - x_1$$

and $b$ to be

$$b = y_2 - y_1$$

you have

$$d = \sqrt{a^2 + b^2}$$

Then let's say that this distance equals a variable $c$. We can move the square root to the other side of the equals sign by squaring both sides, and you get

$$c^2 = a^2 + b^2$$
Look familiar? When we look at a, b, and c as distances between points, the relationship between the Pythagorean Theorem and the Euclidean Distance equation becomes clear. While I personally detest Pythagoras as a person, his math does work. This is a fundamental relationship in Euclidean Geometry.
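As a quick sketch of the relationship above (the point coordinates here are my own example values), the legs a and b of the right triangle give the distance c directly:

```python
import math

# Two hypothetical points in a 2-D plane
x1, y1 = 1.0, 2.0
x2, y2 = 4.0, 6.0

# The differences along each axis are the legs of a right triangle
a = x2 - x1  # 3.0
b = y2 - y1  # 4.0

# Pythagorean form: c^2 = a^2 + b^2, so c = sqrt(a^2 + b^2)
c = math.sqrt(a**2 + b**2)

print(c)                 # 5.0
print(math.hypot(a, b))  # same result via the standard-library helper
```

The classic 3-4-5 right triangle makes it easy to check the result by hand.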
In 3-D space, we add a z axis to our x and y axes to create a 3rd dimension. We simply add the new dimension's term to our 2-D formula:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
Now what if we have more than 3 dimensions? I am not trying to argue that there are more than 3 dimensions of physical space, but in advanced calculus and applied machine learning, we often do have more than 3 dimensions. This is because we treat each dimension as a variable that describes a point, cell, or case. The general formula is more robust because it is the same as the 2-D and 3-D formulas described above, with n as the number of dimensions:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$
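A minimal sketch of the n-dimensional formula (the function name and example points are my own):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points of equal, arbitrary dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared difference along each dimension, then take the root
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# The same function handles 2, 3, or n dimensions
print(euclidean_distance((0, 0), (3, 4)))          # 5.0
print(euclidean_distance((0, 0, 0), (1, 2, 2)))    # 3.0
print(euclidean_distance((0,) * 100, (1,) * 100))  # 10.0 (100-dimensional)
```

Python 3.8+ also ships `math.dist`, which computes the same quantity for any pair of equal-length coordinate sequences.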
Adding more and more dimensions introduces the Curse of Dimensionality. Increasing dimensions makes it harder to separate signal from noise, which can cause over-fitting in classification methods such as kNN (k-Nearest Neighbors). The more dimensions you add, the farther you get from the familiar 2-D and 3-D world. It can be done, but caution must be taken.
Euclidean distance can also suffer from extreme variables. You may have a point whose x and y are really close to another point's, but whose z dimension is miles apart. Does that really mean these two points should not be grouped together?
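A small numeric sketch of that scenario (the points here are hypothetical), showing how a single extreme dimension dominates the distance:

```python
import math

def euclidean_distance(p, q):
    # n-dimensional Euclidean distance, as defined earlier in the article
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Two points nearly identical in x and y, but far apart in z
p = (1.0, 2.0, 0.0)
q = (1.1, 2.1, 500.0)

print(euclidean_distance(p[:2], q[:2]))  # ~0.14: very close in the x-y plane
print(euclidean_distance(p, q))          # ~500.0: z alone drives the result
```

The x and y differences contribute almost nothing once z is included, which is why clustering on raw, unscaled variables can group points in misleading ways.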