I received my Laurea in Mathematics from the University of Milano in 1991, and my Ph.D. in Computer Science from the University of Milano in 1996.
Presently I am a full professor at the Dipartimento di Informatica of the Università degli Studi di Milano, and a member of the LAW. You can email me if you want. Information related to the DI (e.g., phone numbers) can be found on my unimi web page.
- Compression of web and social graphs;
- Analysis of web and social graphs;
- Pseudorandom number generators;
- Efficient data structures for large datasets;
- Indexing of large document collections;
- High-performance parallel web crawling;
- Computability on the reals;
- Theory of anonymous networks (topological properties and computability problems);
- Self-stabilizing systems;
- ...and, last but not least, any programming activity.
More in detail...
Compression of web and social graphs
Snapshots of the web generated very large graphs—you need a compact representation. WebGraph is a framework built to this purpose. Among other things, WebGraph uses new instantaneous codes for the integers and new aggressive algorithmic compression techniques.
Nonetheless, compression algorithms need good orderings of the nodes—orderings that expose the inner structure of the graph. We have made some interesting experiments to find new, efficient orderings, which led to the development of Layered Label Propagation—our current technique to permute graphs so to increase their locality.
Analysis of web and social graphs
Web graphs have special properties whose study requires a sizeable amount of mathematics, but also a careful study of actual web graphs. I participated to the study of the largest web snapshot publicly available in 2013, dispelling a number of myths, and leading to the first open ranking of the World Wide Web.
We developed a new algorithm, called HyperBall, that can approximate quickly a number of geometric centralities and the neighbourhood function of large graphs. The algorithm was used to compute for the first time statistics about the distance distribution of a large number of web and social graphs: the results can be accessed at the LAW web site. In particular, HyperBall was used to show that Facebook has just four degrees of separation and to compute our open ranking.
HyperBall has also been used to compute the scores for the first open ranking of Wikipedia, which you can browse using Wikidata categories. The site is also a living example of our axioms for centrality, which studies the (often unexpected) behavior of classical centrality measures with respect to a number of intuitive axioms. The same idea inspired our follow-up paper on rank monotonicity and our analysis of the pathology of Newman's assortativity.
If you're interested in PageRank and similar scores for the web, please have a look at this survey about the history of spectral ranking methods. You might be surprised 8^). We have also studied the paradoxical way PageRank evolves during a crawl, and the way PageRank changes depending on the damping factor.
The development of WebGraph and of the software necessary for our large-scale experiments spun a number of other satellite Java projects whose code is available as free software, including fastutil and MG4J.
Pseudorandom number generators
I designed a few pseudorandom number generators based on
linear recurrences. In particular,
I worked with Guy Steele at the new
family of PRNGs available in Java 17. The family, called LXM, uses new, better
tables of multipliers for LCGs with power-of-two moduli. Moreover,
java.util.random contains ready-to-use implementations of
Faster and better generators are described in the PRNG shootout page and in the papers linked therein.
Efficient data structures for large datasets
Very large data sets requires efficient compressed data structures. An exciting recent development in this direction is that of succinct data structures, which implement standard data structures using the minimum possible space without slowing down more than by a constant factor. I proposed very efficient implementations of several basic building blocks used in succinct data structures by means of broadword programming. The building blocks are used throughout the code of the Sux and Sux4J projects, which aim at providing off-the-shelf useful basic succinct data structures such as the Elias–Fano representation of monotone sequences, Genuzio–Ottaviano–Vigna functions, the implementation of a minimal perfect hash function using, to the best of our knowledge, the least amount of space, and Fenwick trees optimized for dynamic rank and selection.
In the same spirit, we devised and implemented (almost) optimal monotone minimal perfect hash functions and other low-memory, high-speed data structures for very large sets of strings.
Indexing of large document collections
MG4J is a framework for building indices of large document collection based on the classical inverted-index approach. The kind of index constructed is very configurable (e.g., you can choose your preferred coding method), and moreover some new research has gone into providing efficient skips and minimal-interval semantics.
High-performance parallel web crawling
I collaborated with Paolo Boldi, Bruno Codenotti and Massimo Santini to the development of UbiCrawler, a scalable, fault-tolerant and fully distributed web crawler. The first report on the design of UbiCrawler won the Best Poster Award at the Tenth World Wide Web Conference. You can also read a more in-depth report.
A new open-source crawler, BUbiNG, developed with Paolo Boldi, Andrea Marino and Massimo Santini is now available at the LAW web site and through Maven.
Computability on the reals
I proved, in collaboration with Paolo Boldi, the first results comparing the Blum–Shub–Smale model (i.e., computation with exact reals) and Type 2 Theory of Effectivity (i.e., feasible discrete approximations). We gave a precise quantification of the gap between the two models using degree theory: in a slogan, equality is a jump. We also introduced a notion of Turing closure for Archimedean fields, which was used to characterize those fields in which Type 2-decidable sets exist.
Computability problems in anonymous networks
Deciding whether a problem can be solved with a certain knowledge in a network where all processors are identically programmed (an anonymous network) is a problem dating about 20 years. The concept of graph fibration gave rise to an effective characterization of the tasks solvable under a wide range of models and knowledge assumptions.
Topological properties of distributed systems
Faster solutions to classical distributed problems are possible if the way processors label the link interconnecting them respects a global coherence property called sense of direction, and formalized by Flocchini, Mans and Santoro (for instance, a mesh with the standard North/South/Est/West labelling has this property). We showed that weak sense of direction can be decided in logarithmic parallel time, and this implies that the same is true of Cayley colour graphs; this is the first complexity result on Cayley graph (the non-coloured case is not known to be polynomial or NP-complete).
We also proved the first lower bounds for weak sense of direction, using noncostructive techniques based on random graph theory, and we extended these results to regular graphs by proving some new properties of random regular graphs of high degree.
Self-stabilizing distributed system must converge to a desired behaviour in spite of an arbitrary initial state. By making self-stabilizing the classical view construction algorithm, we showed that the effective characterization for anonymous network can be extended to self-stabilizing systems. As a (surprising) consequence, there is no difference in distributed computational power between an arbitrary initial state and a chosen initial state if the latter is forced to be the same for all processors.