Geographical Distance Calculations in SQL: A Comprehensive Exploration

Bruno Peixoto
4 min readDec 27, 2023

--

Geographical distance calculations are fundamental in applications dealing with location-based data. Whether you’re building a mapping application, a geospatial analytics platform, or a recommendation system, efficiently calculating distances between geographical points is a common requirement. In this article, we’ll explore different methods for performing geographical distance calculations in SQL, ranging from well-known formulas to database-specific functions.

Haversine Formula: The Classic Approach

The Haversine formula is a popular method for calculating distances between two points on the surface of a sphere, such as the Earth. It’s known for its simplicity and efficiency, making it a go-to solution for many applications. Let’s implement it as a SQL function:

CREATE FUNCTION HaversineDistance(
lat1 FLOAT, lon1 FLOAT,
lat2 FLOAT, lon2 FLOAT
) RETURNS FLOAT
BEGIN
DECLARE R FLOAT DEFAULT 6371; -- Earth radius in kilometers

DECLARE dlat FLOAT;
DECLARE dlon FLOAT;
DECLARE a FLOAT;
DECLARE c FLOAT;
DECLARE distance FLOAT;

SET dlat = RADIANS(lat2 - lat1);
SET dlon = RADIANS(lon2 - lon1);

SET a = SIN(dlat / 2) * SIN(dlat / 2) +
COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
SIN(dlon / 2) * SIN(dlon / 2);

SET c = 2 * ATAN2(SQRT(a), SQRT(1 - a));

SET distance = R * c;

RETURN distance;
END;

This function calculates the distance between two points specified by latitude and longitude.

PostGIS: Leveraging Database Extensions

PostGIS is a powerful extension of PostgreSQL that provides advanced geospatial capabilities. It includes functions like ST_Distance for efficient distance calculations. Here's an example of using PostGIS in SQL:

SELECT ST_Distance(
'POINT(-122.4194 37.7749)'::geography,
'POINT(-118.2437 34.0522)'::geography
) AS distance_postgis;

PostGIS handles the complexities of geospatial calculations internally, providing a convenient and optimized solution.

SQL Server: Database-Specific Functions

SQL Server, too, has built-in functions for handling geospatial data. The GEOGRAPHY data type and associated methods, such as STDistance(), offer a straightforward way to calculate distances:

DECLARE @point1 geography = geography::Point(37.7749, -122.4194, 4326);
DECLARE @point2 geography = geography::Point(34.0522, -118.2437, 4326);
SELECT @point1.STDistance(@point2) AS distance_sql_server;

Vincenty Formulae: An Alternative for Precision

While the Haversine formula is suitable for short to medium distances, it might lack the precision needed for long distances. The Vincenty formulae offer increased accuracy and can be implemented ad-hoc in SQL:

-- Function to calculate Vincenty distance between two points
CREATE FUNCTION VincentyDistance(
lat1 FLOAT, lon1 FLOAT, -- Latitude and longitude of the first point
lat2 FLOAT, lon2 FLOAT -- Latitude and longitude of the second point
) RETURNS FLOAT
BEGIN
DECLARE a FLOAT DEFAULT 6378137; -- Semi-major axis of the WGS-84 ellipsoid in meters
DECLARE f FLOAT DEFAULT 1/298.257223563; -- Flattening of the WGS-84 ellipsoid

DECLARE U1 FLOAT;
DECLARE U2 FLOAT;
DECLARE L FLOAT;
DECLARE lambda FLOAT;
DECLARE sinU1 FLOAT;
DECLARE cosU1 FLOAT;
DECLARE sinU2 FLOAT;
DECLARE cosU2 FLOAT;
DECLARE sinLambda FLOAT;
DECLARE cosLambda FLOAT;
DECLARE sinSigma FLOAT;
DECLARE cosSigma FLOAT;
DECLARE sigma FLOAT;
DECLARE sinAlpha FLOAT;
DECLARE cosSqAlpha FLOAT;
DECLARE cos2SigmaM FLOAT;
DECLARE C FLOAT;

SET U1 = ATAN((1 - f) * TAN(RADIANS(lat1)));
SET U2 = ATAN((1 - f) * TAN(RADIANS(lat2)));
SET L = RADIANS(lon2 - lon1);
SET lambda = L;
SET sinU1 = SIN(U1);
SET cosU1 = COS(U1);
SET sinU2 = SIN(U2);
SET cosU2 = COS(U2);

REPEAT
SET sinLambda = SIN(lambda);
SET cosLambda = COS(lambda);
SET sinSigma = SQRT((cosU2*SIN(lambda)) * (cosU2*SIN(lambda)) +
(cosU1*sinU2-sinU1*cosU2*cosLambda) * (cosU1*sinU2-sinU1*cosU2*cosLambda));

IF (sinSigma = 0) THEN
SET distance = 0; -- Coincident points
RETURN distance;
END IF;

SET cosSigma = sinU1*sinU2 + cosU1*cosU2*cosLambda;
SET sigma = ATAN2(sinSigma, cosSigma);
SET sinAlpha = cosU1 * cosU2 * sinLambda / sinSigma;
SET cosSqAlpha = 1 - sinAlpha * sinAlpha;
SET cos2SigmaM = cosSigma - 2*sinU1*sinU2/cosSqAlpha;

SET C = f/16*cosSqAlpha*(4+f*(4-3*cosSqAlpha));
SET lambda_prev = lambda;
SET lambda = L + (1-C) * f * sinAlpha *
(sigma + C*sinSigma*(cos2SigmaM+C*cosSigma*(-1+2*cos2SigmaM*cos2SigmaM)));

UNTIL (ABS(lambda - lambda_prev) < 1e-12) END REPEAT;

SET uSq = cosSqAlpha * (a*a - b*b) / (b*b);
SET A = 1 + uSq/16384*(4096+uSq*(-768+uSq*(320-175*uSq)));
SET B = uSq/1024 * (256+uSq*(-128+uSq*(74-47*uSq)));
SET deltaSigma = B*sinSigma*(cos2SigmaM+B/4*(cosSigma*(-1+2*cos2SigmaM*cos2SigmaM) -
B/6*cos2SigmaM*(-3+4*sinSigma*sinSigma)*(-3+4*cos2SigmaM*cos2SigmaM)));

SET distance = b*A*(sigma - deltaSigma);

RETURN distance;
END;

This formula is particularly useful when precision is crucial, such as in applications dealing with global distances. This SQL function, named VincentyDistance, calculates the distance between two points specified by latitude and longitude using the Vincenty formula. The constants a and f represent the semi-major axis and flattening of the WGS-84 ellipsoid, respectively. The function iteratively refines the calculation until a convergence threshold is met.

Conclusion

Geographical distance calculations in SQL involve a range of methods, from classic formulas like Haversine to leveraging database-specific extensions like PostGIS and SQL Server’s geospatial functions. The choice of method depends on factors such as the specific use case, the required precision, and the capabilities of the underlying database system.

In your next geospatial project, consider the advantages each method brings to the table. Whether you opt for the simplicity of Haversine, the power of database extensions, or the precision of ad-hoc formulas, each approach contributes to the rich landscape of geographical distance calculations.

Happy mapping! 🌍🔍

--

--

Bruno Peixoto

A person. Also engineer by formation, mathematician and book reader as hobby.