Architect for Usability - Predicting Service with KNN Algorithm
The topic of algorithms has been in the news a lot lately with regards to social media and the harm they can cause. In this article we will explore making predictions with an algorithm in a positive way, to improve user experience and customer acquisition.
I started my career as a graphic designer before moving into software development in the early 2000s. The transition wasn’t as drastic as it might sound—both fields are about solving problems for people under constraints. One early challenge captures a valuable lesson I learned: solve for the user, not the spec.
I worked at a regional telecom company that was dependent on a third-party system to determine whether a new address qualified for service. Due to peculiarities of copper-wire service, not all addresses qualify. The third-party lookup system was notoriously slow. Waiting several minutes for an answer might have been acceptable in back-office processes, but for someone filling out a web form, it meant abandonment, and a lost customer.
The Problem
The third-party API was a black box, and performance wasn’t going to improve. So how could we work around this hostile dependency to provide an acceptable user experience?
We couldn’t fix the API, but we had an existing database of customers - and addresses - that do have service. This provides positive examples (hasService = 1).
A manual verification of service via the 3rd party API was required before scheduling installation, but it did not need to negatively impact the user. As new addresses are verified, new data is added both positive (hasService = 1) and negative (hasService = 0). Over time this creates a more accurate dataset.
At its core, this is a geospatial classification problem: can nearby known addresses be used to predict service availability at a new one—fast enough to reduce abandonment rates?
The Solution
We start with the assumption that if your neighbors have service, you probably qualify too. That maps neatly to a k-nearest neighbors (KNN) approach. Each known, nearby address was labeled hasService: 1 or hasService: 0. For any new address, we’d find the closest known ones, within a reasonable distance, and let them “vote.”
A minimal version might use k = 3. If two or more neighbors had service, the new address was predicted to qualify. Variants could weight votes by distance, increase k, or switch to radius-based searches.
Other proximity-based methods—geohashing, quadtrees, Delaunay triangulation—offer different trade-offs, but they all follow the same principle: let nearby data guide the decision. We will explore these in future posts.
KNN in Practice
Here’s a simple JavaScript example using the Haversine formula to account for Earth’s curvature:
// =============================================================
// KNN implementation for 2D proximity search (lat/lon)
// =============================================================
/**
 * @typedef {Object} GeoPoint
 * @property {number} lat         - Latitude
 * @property {number} lon         - Longitude
 * @property {number} hasService  - 1 if service is available, 0 if not
 */
/**
 * Convert degrees to radians
 * @param {number} deg
 * @returns {number}
 */
const toRad = deg => deg * Math.PI / 180;
/**
 * Calculate the Haversine distance in meters
 * @param {GeoPoint} a
 * @param {GeoPoint} b
 * @returns {number}
 */
const haversine = (a, b) => {
    const R     = 6371000;
    const dLat  = toRad(b.lat - a.lat);
    const dLon  = toRad(b.lon - a.lon);
    const lat1  = toRad(a.lat);
    const lat2  = toRad(b.lat);
    const x     = Math.sin(dLat / 2) ** 2 +
                  Math.cos(lat1) * Math.cos(lat2) * Math.sin(dLon / 2) ** 2;
    return 2 * R * Math.atan2(Math.sqrt(x), Math.sqrt(1 - x));
};
/**
 * Search for k nearest neighbors
 * @param {GeoPoint[]} points - Array of known points
 * @param {GeoPoint} target   - The target point to search for
 * @param {number} k          - The number of neighbors to find
 * @returns {GeoPoint[]}
 */
const knnSearch = (points, target, k) => {
    // Calculate distances and sort by nearest
    const distances = points.map(p => ({
        point     : p,
        distance  : haversine(p, target)
    })).sort((a, b) => a.distance - b.distance);
    // Return the top k nearest neighbors
    return distances.slice(0, k).map(d => d.point);
};
/**
 * Estimate service availability using KNN
 * @param {GeoPoint} target - The target point to estimate
 * @param {GeoPoint[]} known - Array of known points
 * @param {number} k - Number of neighbors to consider
 * @returns {Object} - { method, decision, probability, neighbors }
 */
const estimateService = (target, known, k = 3) => {
    const neighbors = knnSearch(known, target, k);
    if (!neighbors.length) {
        return {
            method      : 'knn',
            decision    : 'unknown',
            probability : 0,
            neighbors   : []
        };
    }
    // Calculate the probability of service based on neighbors
    const total = neighbors.reduce((sum, n) => sum + n.hasService, 0);
    const probability = total / neighbors.length;
    return {
        method    : 'knn',
        decision  : probability > 0.5 ? 'available' : 'unavailable',
        probability,
        neighbors
    };
};
// =============================================================
// Example usage
// =============================================================
const knownPoints = [
    { lat: 38.8992, lon: -77.0399, hasService: 0 }, // 18th & N St NW
    { lat: 38.8997, lon: -77.0388, hasService: 1 }, // 18th & M St NW
    { lat: 38.9010, lon: -77.0410, hasService: 1 }, // 20th & O St NW
    { lat: 38.8977, lon: -77.0365, hasService: 0 }  // White House
];
// New address - Central Dupont Circle area
const newAddress = { lat: 38.9000, lon: -77.0400 }; 
const result = estimateService(newAddress, knownPoints, 3);
console.log(result);
The result for a new address is a quick estimate of service availability—milliseconds instead of minutes.
Pros
- Easy to implement and explain
- Works with raw latitude/longitude data
- Adapts dynamically as new addresses are added
Cons
- Scales poorly (O(n)lookups)
- Lacks spatial indexing
- Can be skewed by outliers or sparse neighborhoods
Human in the Loop
There was a hard requirement that every new order eventually had to be verified against the official (and slow) API. That lookup was mandatory before sending a technician out. But that was our problem, not the user’s.
By shifting the API check later in the process—handled by a rep after the customer had already signed up—we removed the bottleneck from the signup experience. The customer got an instant answer, and the confirmation step happened quietly in the background.
This is a good example of designing around a hostile dependency: respecting the spec, but refusing to let it dictate a bad experience.
Other Spatial Approaches
In future posts, we’ll compare this approach to:
- Geohashing, which uses spatial hashing for box-based lookup
- Quadtree search, which recursively splits maps into regions
- Delaunay triangulation, which finds the containing triangle and interpolates
All examples use real-world coordinates and are available in the GitHub repo.
Key Takeaways
Sometimes the “spec” demands calling the official system, even when it creates a painful user experience. But don’t let a hostile dependency define the user experience. Solve for the user first.