Efficient Median Finding Algorithm for Two Databases with Logarithmic Time Complexity

Efficient Median Finding in Two Databases

This document outlines an algorithm designed to efficiently compute the median of 2n values stored across two distinct databases. The median is defined as the nth smallest value in the combined dataset.

The Challenge: You can only interact with the databases through queries. A single query allows you to request the kth smallest value from a chosen database. Given the cost associated with each query, we aim to minimize the total number of queries required.

The Algorithm

Initialization: - left = 1: Represents the lower bound of the search range. - right = n: Represents the upper bound of the search range.
Iterative Search: - While left <= right: - mid = (left + right) / 2: Calculate the midpoint of the current range. - a = queryDatabase1(mid): Query the first database for its midth smallest value. - b = queryDatabase2(mid): Query the second database for its midth smallest value. - If a <= b: - left = mid + 1: Adjust the lower bound, discarding values less than or equal to a. - Else: - right = mid - 1: Adjust the upper bound, discarding values greater than or equal to b.
Result: - The final value of left represents the nth smallest value, which is the median of the combined dataset.

Pseudo-code

function findMedian(n): left = 1 right = n while left <= right: mid = (left + right) / 2 a = queryDatabase1(mid) b = queryDatabase2(mid) if a <= b: left = mid + 1 else: right = mid - 1 return left

Correctness

The algorithm leverages a binary search strategy. In each iteration, the search range is halved by comparing the midth smallest values (a and b) from the two databases. If a <= b, we know that the median must be greater than a and we adjust the lower bound accordingly. Conversely, if a > b, the median must be less than b and we adjust the upper bound. This iterative narrowing of the search space guarantees that the final left value will pinpoint the nth smallest element, our desired median.

Complexity Analysis

Time Complexity: The binary search approach ensures that the search space is halved with every iteration. This leads to a logarithmic number of iterations, resulting in a time complexity of O(log n). - Query Complexity: Within each iteration, we perform two database queries. As the number of iterations is O(log n), the total number of queries is also O(log n).

Conclusion

The presented algorithm provides an efficient solution for finding the median of values distributed across two databases. Its logarithmic time and query complexity make it a highly suitable choice for scenarios where minimizing database interactions is critical.

Efficient Median Finding Algorithm for Two Databases with Logarithmic Time Complexity