Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in its shortest paths. In several scenarios, having a high betweenness can have a positive impact on the node itself. Hence, in this paper we consider the problem of determining how much a vertex can increase its centrality by creating a limited number of new edges incident to it. In particular, we study the problem of maximizing the betweenness score of a given node - Maximum Betweenness Improvement (MBI) - and that of maximizing the ranking of a given node - Maximum Ranking Improvement (MRI). We show that MBI cannot be approximated in polynomial time within a factor of 1-1/(2e) and that MRI does not admit any polynomial-time constant-factor approximation algorithm, both unless P=NP. We then propose a simple greedy approximation algorithm for MBI with an almost tight approximation ratio and test its performance on several real-world networks. We experimentally show that our algorithm substantially increases both the betweenness score and the ranking of a given node and that it outperforms several competitive baselines. To speed up the computation of our greedy algorithm, we also propose a new dynamic algorithm for updating the betweenness of one node after an edge insertion, which might be of independent interest. Using the dynamic algorithm, we are able to compute an approximation of MBI on networks with up to 10^5 edges, in most cases in a matter of seconds or a few minutes.
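The greedy strategy for MBI can be illustrated with a small in-memory sketch: repeatedly add the single edge incident to the target node that most increases its betweenness. This is our own illustrative baseline using full recomputation via networkx, not the paper's dynamic update algorithm; the function name is hypothetical.

```python
import networkx as nx

def greedy_betweenness_improvement(G, v, k):
    """Greedily add up to k new edges incident to v, each time choosing
    the edge that most increases v's betweenness score (recomputed from
    scratch here; the paper's dynamic algorithm avoids this cost)."""
    G = G.copy()
    added = []
    for _ in range(k):
        best_u, best_score = None, nx.betweenness_centrality(G)[v]
        for u in list(G.nodes):
            if u == v or G.has_edge(v, u):
                continue
            G.add_edge(v, u)  # tentatively insert the candidate edge
            score = nx.betweenness_centrality(G)[v]
            if score > best_score:
                best_u, best_score = u, score
            G.remove_edge(v, u)
        if best_u is None:  # no candidate improves the score
            break
        G.add_edge(v, best_u)
        added.append(best_u)
    return added
```

On a path graph, for example, the endpoint gains the most betweenness by connecting to the opposite endpoint, which routes half of the middle pairs' shortest paths through it.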
LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences, which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massively parallel distributed algorithm, and is faster than a state-of-the-art internal memory implementation even on instances fitting in main memory. EM-LFR's implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyse the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process.
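The deterministic materialization step can be sketched with the classical in-memory Havel-Hakimi procedure; the I/O-efficient EM-HH variant is far more involved, so the following is only a reference implementation of the underlying idea.

```python
def havel_hakimi(degrees):
    """Return an edge list of a simple graph realizing the given degree
    sequence, or None if the sequence is not graphical. Classical
    in-memory Havel-Hakimi: repeatedly connect the highest-degree
    vertex to the next-highest-degree vertices."""
    nodes = list(range(len(degrees)))
    remaining = list(degrees)
    edges = []
    while True:
        nodes.sort(key=lambda i: -remaining[i])  # highest residual degree first
        v = nodes[0]
        d = remaining[v]
        if d == 0:            # all residual degrees satisfied
            return edges
        if d > len(nodes) - 1:
            return None       # not enough other vertices to connect to
        remaining[v] = 0
        for u in nodes[1:d + 1]:
            if remaining[u] == 0:
                return None   # sequence is not graphical
            remaining[u] -= 1
            edges.append((v, u))
```

For instance, the sequence (2, 2, 2) is realized by a triangle, while (3, 1, 1) is rejected as non-graphical.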
Let T be a terrain and P be a set of points on its surface. An important problem in Geographic Information Science (GIS) is computing the visibility index of a point p in P, that is, the number of points in P that are visible from p. The total visibility-index problem asks for the visibility index of every point in P. We present the first subquadratic-time algorithm to solve the 1D total-visibility-index problem. Our algorithm uses a geometric dualization technique to reduce the problem to a set of instances of the red-blue line segment intersection counting problem, allowing us to find the total visibility index in O(n (log(n))^2) time. We implement a naive O(n^2) approach and four variations of our algorithm: one that uses an existing red-blue line segment intersection counting algorithm and three new approaches that leverage features specific to our problem. Two of our implementations allow for parallel execution, requiring O((log(n))^2) time and O(n (log(n))^2) work in the CREW PRAM model. We present experimental results for both serial and parallel implementations on synthetic and real-world datasets, using two hardware platforms. Results show that all variants of our algorithm outperform the naive approach by several orders of magnitude. Furthermore, we show that our special-case red-blue line segment intersection counting implementations outperform the existing general-case solution by up to a factor of 10. Our fastest parallel implementation is able to process a terrain of more than 100 million vertices in under 3 minutes, achieving up to 85% parallel efficiency.
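In the 1D setting, a quadratic baseline is easy to state: from each vertex, sweep outwards and keep the maximum slope seen so far; a vertex is visible exactly when its slope exceeds every nearer one. The sketch below is our own illustrative O(n^2) baseline on a height array (it treats a grazing, equal-slope vertex as blocked), not the paper's implementation.

```python
def visibility_index(heights):
    """O(n^2) visibility indices on a 1D terrain given as vertex heights
    at unit spacing: sweep left and right from each vertex, counting a
    vertex as visible iff its slope strictly exceeds all nearer slopes."""
    n = len(heights)
    vis = [0] * n
    for i in range(n):
        for step in (1, -1):          # sweep right, then left
            max_slope = float("-inf")
            j = i + step
            while 0 <= j < n:
                slope = (heights[j] - heights[i]) / ((j - i) * step)
                if slope > max_slope:  # not occluded by any nearer vertex
                    vis[i] += 1
                    max_slope = slope
                j += step
    return vis
```

On the zigzag terrain 0, 1, 0, 1, 0, for example, each peak sees three vertices while the outer valleys see only their adjacent peak.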
Complex networks are increasingly used to model various real-world phenomena. Realistic generative network models simplify complex network research regarding data sharing, reproducibility, and scalability studies. Random hyperbolic graphs (RHGs) are a promising family of geometric graphs with a unit-disk neighborhood in the hyperbolic plane. Previous work provided empirical and theoretical evidence that this generative graph model creates networks with many realistic features. In this work we provide a new generation algorithm for RHGs. We prove its time complexity to be O((n^(3/2)+m) log n) with high probability and confirm this running time experimentally. While a recent theoretical result suggests the generation of RHGs in expected linear time, it has not been put into practice so far. Thus, our implementation is currently the only one running in subquadratic time; it is at least two orders of magnitude faster than a previous implementation -- the latter allows more general neighborhoods than the unit disk, but uses the vanilla generation process. Networks with billions of edges can now be generated in a few minutes. The acceleration stems primarily from the reduction of pairwise distance computations through a polar quadtree, which we adapt to hyperbolic space for this purpose and which can be of independent interest. Due to this acceleration, one can analyze for the first time massive (i.e., with billions of edges) generated RHGs in practice. Our empirical analysis shows that RHGs are indeed similar to real-world networks with respect to many (but not all) considered network properties, for example a power-law degree distribution.
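The "vanilla" generation process mentioned above can be sketched directly from the model definition: sample n points in a hyperbolic disk of radius R (angles uniform, radii with density proportional to sinh(alpha*r)) and connect every pair at hyperbolic distance below R. This quadratic reference sampler is our own assumption-laden sketch; the paper's contribution is precisely avoiding the O(n^2) pair checks with a polar quadtree.

```python
import math
import random

def sample_rhg(n, alpha, R, seed=0):
    """Quadratic-time reference sampler for the unit-disk RHG model:
    draw n points in the hyperbolic disk of radius R and connect pairs
    at hyperbolic distance < R."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # inverse-CDF sampling of the radial density ~ sinh(alpha * r)
        u = rng.random()
        r = math.acosh(1.0 + (math.cosh(alpha * R) - 1.0) * u) / alpha
        pts.append((r, theta))
    edges = []
    for i in range(n):
        r1, t1 = pts[i]
        for j in range(i + 1, n):
            r2, t2 = pts[j]
            dt = math.pi - abs(math.pi - abs(t1 - t2))  # angular difference
            # hyperbolic law of cosines for the distance between the points
            cosh_d = (math.cosh(r1) * math.cosh(r2)
                      - math.sinh(r1) * math.sinh(r2) * math.cos(dt))
            if math.acosh(max(cosh_d, 1.0)) < R:
                edges.append((i, j))
    return pts, edges
```

Choosing R on the order of 2 ln n yields the sparse, power-law regime the model is known for; a fixed seed makes runs reproducible.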
We consider the critical node detection problem (CNDP) in directed graphs, which can be defined as follows. Given a directed graph G and a parameter k, we wish to remove a subset S of at most k vertices of G such that the residual graph G \ S has minimum pairwise strong connectivity. This problem is NP-hard, and thus we are interested in practical heuristics. In this paper we apply the framework of Georgiadis et al. [SODA 2017] and provide a sophisticated linear-time algorithm for the k = 1 case. Based on this algorithm, we provide an efficient heuristic for the general case. Then, we conduct a thorough experimental evaluation of various heuristics for CNDP. Our experimental results suggest that our heuristic performs very well in practice, both in terms of running time and solution quality.
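The objective can be made concrete with a brute-force version of the k = 1 case: pairwise strong connectivity counts the ordered vertex pairs lying in a common strongly connected component, and the most critical vertex is the one whose removal minimizes this count. The sketch below (names ours, using networkx) takes time proportional to n times the SCC computation, in contrast to the paper's linear-time algorithm.

```python
import networkx as nx

def strong_connectivity(G):
    """Number of ordered pairs (u, v), u != v, that lie in the same
    strongly connected component of the directed graph G."""
    return sum(len(c) * (len(c) - 1)
               for c in nx.strongly_connected_components(G))

def most_critical_node(G):
    """Brute-force k = 1 CNDP: the vertex whose removal minimizes the
    pairwise strong connectivity of the residual graph."""
    return min(G.nodes,
               key=lambda v: strong_connectivity(
                   G.subgraph([u for u in G if u != v])))
```

For example, in two directed triangles sharing a vertex, removing the shared vertex destroys all strong connectivity, while removing any other vertex leaves one triangle intact.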
Let P be a point set in R^d, and let M be a function mapping any subset of P to a positive real. We examine the problem of computing the mean and variance of M when a subset of P is selected according to a random distribution. We consider two distributions; in the first distribution (the Bernoulli distribution), each point p in P is included in the random subset independently, with probability π(p). In the second distribution (the fixed-size distribution), a subset of exactly s points is selected uniformly at random among all possible subsets of s points in P. We present efficient algorithms for computing the mean and variance of several geometric measures when point sets are selected under one of the described random distributions. We also implemented three of those algorithms: an algorithm that computes the mean volume of the 2D bounding box in the Bernoulli distribution, an algorithm that computes the exact mean and variance of the mean pairwise distance (MPD) for d-dimensional point sets in the fixed-size distribution, and a (1-ε)-approximation algorithm for the same measure. We conducted experiments where we compared the performance of our implementations with a standard heuristic approach. We show that our implementations are very efficient, and they produce results of high precision. We also compared the implementation of our exact MPD algorithm with the corresponding (1-ε)-approximation method; we show that the approximation method performs faster in certain cases, while also providing high-precision approximations.
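The mean of the MPD under the fixed-size distribution admits a short closed form by linearity of expectation: each pair of points survives into a uniform s-subset with probability s(s-1)/(n(n-1)), which exactly cancels the 1/C(s,2) normalization in the MPD, so the mean equals the MPD of the full point set regardless of s. The sketch below illustrates this derivation (it is our own worked example, not the paper's variance or approximation algorithms).

```python
import itertools
import math

def mean_mpd_fixed_size(points, s):
    """Exact mean of the mean pairwise distance over a uniformly random
    s-subset of the given points. Each pair is included with probability
    s(s-1)/(n(n-1)), cancelling the 1/C(s,2) factor in the MPD, so the
    result equals the MPD of the whole set and is independent of s."""
    n = len(points)
    total = sum(math.dist(p, q)
                for p, q in itertools.combinations(points, 2))
    return total / (n * (n - 1) / 2)
```

The variance, by contrast, genuinely depends on s, since it couples pairs of point pairs; that is where the paper's algorithmic work lies.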