We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false positive rate by presenting both a theoretical model for the false positive rate and simulations using both synthetic data sets and real packet traces.
Given an urban road network and a set of origin-destination pairs, the traffic assignment problem asks for the traffic flow on each road segment. A common solution employs a feasible-direction method, where the direction-finding step requires many shortest-path computations. In this paper, we significantly accelerate the computation of flow patterns, enabling interactive transportation and urban planning applications. We achieve this by building a traffic assignment procedure upon customizable contraction hierarchies (CCHs), revisiting and carefully engineering CCH customization and queries, and adapting CCHs to compute batched point-to-point shortest paths. Although motivated by the traffic assignment problem, our optimizations apply to CCHs in general. In contrast to previous work, our evaluation uses real-world production data for all parts of the input. On a metropolitan area encompassing more than 2.7 million inhabitants, we decrease the flow-pattern computation for a typical two-hour morning peak from 75.6 to 9.5 seconds on one core, and 3.1 seconds on a four-core commodity machine. This represents a speedup of 24 over the state of the art, and 500 over the Dijkstra-based baseline.
We study the problem of computing isocontours in static and dynamic road networks, where the objective is to identify the boundary of the region that is reachable from a given source within a certain amount of time (or another limited resource). Although there is a wide range of practical applications for this problem (e.g., urban planning, geomarketing, visualizing the cruising range of a vehicle), there has been little research on fast algorithms for large, realistic inputs, and existing approaches tend to compute more information than necessary. Our contribution is twofold: (1) We propose compact but sufficient definitions of isocontours, based on which, (2) we provide several easy-to-parallelize, scalable algorithmic approaches for faster computation. By extensive experimental analysis, we demonstrate that our techniques enable interactive isocontour computation within milliseconds even on continental networks, significantly faster than the state-of-the-art.
Scalable Kernelization for Maximum Independent Sets
The state-of-the-art tools for practical graph canonization are all based on the indivi\-dualization-refinement paradigm, and their difference is primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas affect the performance on different graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behaviour of different algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariant, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of different graph classes we investigate the effect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of difficult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.
Clustering is a fundamental problem in data science, yet, the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory and various indices (Rand, Jaccard) have been developed. We go beyond these by providing a novel class of methods computing meta-clusters within each clustering? a meta-cluster is a group of clusters, together with a matching between these. Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, the edges represent the non empty intersection between two clusters, and the weight of an edge is the number of common items. We introduce the so-called D-family-matching problem on intersection graphs, with D the upper-bound on the diameter of the graph induced by the clusters of any meta-cluster. First we prove NP-completeness and APX-hardness results, and unbounded approximation ratio of simple strategies. Second, we design exact polynomial time dynamic programming algorithms for some classes of graphs (in particular trees). Then, we prove spanning-tree based efficient algorithms for general graphs. Our experiments illustrate the role of D as a scale parameter providing information on the relationship between clusters within a clustering and in-between two clusterings. They also show the advantages of our built-in mapping over classical cluster comparison measures such as the variation of information (VI).