Although many fast methods exist for constructing kNN-graph for low dimensional data, it is still an open question how to do it efficiently for high dimensional data. We present a new method to construct an approximate kNN-graph for medium to high dimensional data. Our method uses one dimensional mapping with Z-order curve to construct an initial graph and then continues to improve this using neighborhood propagation. Experiments show that the method is faster than the compared methods with four different benchmark data sets, the dimensionality of which ranges from 14 to 544. We also show that errors in the approximate kNN-graph originate more likely from outlier points; and it can be detected during run time which points are likely to have errors in their neighbors.

We introduce the Connection Scan Algorithm (CSA) to efficiently answer queries to timetable information systems. The input consists, in the simplest setting, of a source position and a desired target position. The output consist is a sequence of vehicles such as trains or buses that a traveler should take to get from the source to the target. We study several problem variations such as the earliest arrival and profile problems. We present algorithm variants that only optimize the arrival time or additionally optimize the number of transfers in the Pareto sense. An advantage of CSA is that is can easily adjust to changes in the timetable, allowing the easy incorporation of known vehicle delays. We additionally introduce the Minimum Expected Arrival Time (MEAT) problem to handle possible, uncertain, future vehicle delays. We present a solution to the MEAT problem that is based upon CSA. Finally, we extend CSA using the multilevel overlay paradigm to answer complex queries on nation-wide integrated timetables with trains and buses.

The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. Here, we introduce a linear-time algorithm to compute near-minimum cuts. Our algorithm is based on cluster contraction using label propagation and Padberg and Rinaldi's contraction heuristics [SIAM Review, 1991]. We give both sequential and shared-memory parallel implementations of our algorithm. Extensive experiments on both real-world and generated instances show that our algorithm finds the optimal cut on nearly all instances significantly faster than other state-of-the-art exact algorithms, and our error rate is lower than that of other heuristic algorithms. In addition, our parallel algorithm runs a factor 7.5x faster on average with 32 threads. We also give a version of our algorithm that first performs random edge contraction. This version achieves a lower running time and better parallel scalability at the expense of a higher error rate.

It is well-known that Quicksort - which is commonly considered as one of the fastest in-place sorting algorithms - suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant size; then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons (except for the final Insertionsort). Moreover, we prove that when sorting $n$ elements the average total number of branch mispredictions is at most $\epsilon n \log n + \Oh(n)$ for some small $\epsilon$ depending on the block size. Our experimental results are promising: when sorting random integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of Quicksort (C++ std::sort). Also for many other data types and non-random inputs, there is still a significant speedup. Only in few special cases like (almost) sorted inputs, std::sort can beat our implementation. Moreover, on random input permutations, our implementation is even slightly faster than Super Scalar Sample Sort, which uses a linear amount of additional space. Finally, we also apply our approach to Quickselect and we obtain a speed-up of more than 100% over the GCC implementation (C++ std::nth_element).