I received my Laurea in Mathematics from the University of Milano in 1991, and my Ph.D. in Computer Science from the University of Milano in 1996.

Presently I am an associate professor at the Dipartimento di Informatica
of the Università degli Studi di Milano, and a member of
the LAW.
You can mail me if you want. Information
related to the DI (e.g., phone numbers) can be found on my `unimi` web page.

I am the coordinator of the UMIL unit of the NADINE FET EU project.

# Current Interests

- Compression of web and social graphs;
- Analysis of web and social graphs;
- Efficient data structures for large datasets;
- Indexing of large document collections;
- High-performance parallel web crawling;
- Automatic generation of software from conceptual models;
- Computability on the reals;
- Theory of anonymous networks (topological properties and computability problems);
- Self-stabilizing systems;
- ...and, last but not least, any programming activity.

# More in detail...

## Compression of web and social graphs

Snapshots of the web generated very large graphs—you need a compact representation. WebGraph is a framework built to this purpose. Among other things, WebGraph uses new instantaneous codes for the integers and new aggressive algorithmic compression techniques.

Nonetheless, compression algorithms need good *orderings* of the
nodes—orderings that expose the inner structure of the graph. We
have made some interesting experiments to find new, efficient orderings,
which led to the development of Layered Label Propagation—our
current technique to permute graphs so to increase their locality.

## Analysis of web and social graphs

Web graphs have special properties
whose study requires a sizeable amount of mathematics, but also
a careful study of *actual* web graphs. Recently, I participated to the study of the
largest web snapshot publicly available in 2013, dispelling a number of myths, and leading to the
first open ranking of the World Wide Web.

Recently, we developed a new algorithm, called HyperBall, that can approximate quickly a number of geometric centralities and the neighbourhood function of large graphs. The algorithm was used to compute for the first time second-order statistics about the distance distribution of a large number of web and social graphs: the results can be accessed at the LAW web site. In particular, HyperBall was used to show that Facebook has just four degrees of separation and to compute our open ranking.

HyperBall has also been used to compute the scores for the first open ranking of Wikipedia, which you can browse using Wikidata categories.

If you're interested in PageRank and similar scores for the web, please have a look at this
survey about the history of spectral ranking methods. You might
be surprised `8^)`. We have also studied the
paradoxical way PageRank evolves during a crawl, and the way PageRank changes
depending on the damping factor.

The development of UbiCrawler, WebGraph, and of the software necessary for our large-scale experiments spun a number of other satellite Java projects whose code is available as free software, including fastutil and MG4J.

## Efficient data structures for large datasets

More generally, very large data sets requires efficient compressed data structures. An exciting
recent development in this direction is that of *succinct* data structures, which implement
standard data structures using the minimum possible space without slowing down more than by a constant
factor. I proposed very efficient implementations of several
basic building blocks used in succinct data structures by means of *broadword programming*. The
building blocks are used throughout the code of the Sux and
Sux4J projects, which aim at providing off-the-shelf useful
basic succinct data structures such as
the Elias–Fano representation of monotone sequences,
Majewski-Wormald-Havas-Czech functions, and more.

In the same spirit, we devised and implemented (almost) optimal monotone minimal perfect hash functions and other low-memory, high-speed data structures for very large sets of strings.

## Indexing of large document collections

MG4J is a
framework for building indices of large document collection based on the classical inverted-index approach.
The kind of index constructed is very configurable (e.g., you can choose your preferred coding method), and
moreover some new research has gone into providing efficient skips and minimal-interval semantics. Its *direct*
counterpart is Archive4J, which makes it possible to access
quickly and in a highly compressed form summaries of information about documents in a collection.

## High-performance parallel web crawling

I collaborated with Paolo Boldi, Bruno Codenotti and Massimo Santini to the development of UbiCrawler, a scalable, fault-tolerant and fully distributed web crawler. The first report on the design of UbiCrawler won the Best Poster Award at the Tenth World Wide Web Conference. You can also read a more in-depth report.

A new open-source crawler, BUbiNG, developed with Paolo Boldi, Andrea Marino and Massimo Santini is now available at the LAW web site.

## Automatic generation of software from conceptual models

ERW is an innovative system for
handling complex databases using a web browser. It uses the most recent
standards endorsed by the W3C to offer to the user a sophisticated
environment, similar to a dedicated client. Moreover, the user interface is
generated in a completely automatic way starting from a conceptual
description of the database by means of an XML-based language for
entity-relationship schemata. Customisation of the resulting application
is governed by *specification percolation*.

## Computability on the reals

I proved, in collaboration with Paolo Boldi, the
first results comparing
the Blum–Shub–Smale model (i.e., computation with exact reals)
and Type 2 Theory of Effectivity (i.e., feasible discrete approximations).
We gave a precise quantification of the gap between the two models using
degree theory: in a slogan, equality is a
jump. We also introduced a notion of *Turing closure* for Archimedean
fields, which was used to characterize those fields in which Type 2-decidable sets exist.

## Computability problems in anonymous networks

Deciding whether a problem can be solved with a certain knowledge in a
network where all processors are identically programmed (an
*anonymous* network) is a problem dating about 20 years. The concept
of graph fibration gave rise to an effective characterization of the tasks
solvable under a wide range of models and knowledge assumptions.

## Topological properties of distributed systems

Faster solutions to classical distributed problems are possible if the way
processors label the link interconnecting them respects a global coherence
property called *sense of direction*, and formalized by Flocchini,
Mans and Santoro (for instance, a mesh with the standard
North/South/Est/West labelling has this property). We showed that weak sense of direction can be decided in
logarithmic parallel time, and this implies that the same is true of
Cayley colour graphs; this is the first complexity result on Cayley graph
(the non-coloured case is not known to be polynomial or NP-complete).

We also proved the first lower bounds for weak sense of direction, using noncostructive techniques based on random graph theory, and we extended these results to regular graphs by proving some new properties of random regular graphs of high degree.

## Self-stabilizing systems

Self-stabilizing distributed system must converge to a desired behaviour in spite of an arbitrary initial state. By making self-stabilizing the classical view construction algorithm, we showed that the effective characterization for anonymous network can be extended to self-stabilizing systems. As a (surprising) consequence, there is no difference in distributed computational power between an arbitrary initial state and a chosen initial state if the latter is forced to be the same for all processors.