# Projects

Institute for Artificial Intelligence (KI)

Information Systems (ELLIS-SimTech Research Group)

## Current projects

Constrained learning uses the language of constrained optimization to tackle learning tasks that involve statistical requirements. Its strength lies in combining the data-driven, model-free nature of learning with the expressiveness of constrained programming. Doing so, however, also combines the statistical challenges of the former with the computational complexity of the latter. It is therefore natural to ask whether constrained learning is feasible at all, and whether solving multiple learning problems simultaneously compounds their complexity. These are the types of questions that concern **constrained learning theory**. We now know that constrained learning can solve problems that unconstrained learning cannot, while often being essentially as hard. In fact, it is usually possible to learn under constraints by solving a sequence of unconstrained problems. These results have already enabled many applications and opened up new theoretical questions on the limits of this new learning task.
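To make the dual idea concrete, here is a minimal sketch (synthetic data, illustrative step sizes, not the method of any specific paper): minimizing the loss on one task subject to a loss budget on another, where each dual update only requires solving an *unconstrained* Lagrangian problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic regression tasks (hypothetical data): minimize the loss on
# task 1 subject to the loss on task 2 staying below a tolerance c.
X1 = rng.normal(size=(200, 2)); y1 = X1 @ np.array([1.0, 0.0]) + 0.05 * rng.normal(size=200)
X2 = rng.normal(size=(200, 2)); y2 = X2 @ np.array([0.0, 1.0]) + 0.05 * rng.normal(size=200)
c = 0.5  # constraint level: loss2(theta) <= c

def loss(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

def grad(X, y, theta):
    return 2 * X.T @ (X @ theta - y) / len(y)

theta, lam = np.zeros(2), 0.0
for _ in range(500):          # dual ascent on the multiplier lam
    for _ in range(50):       # inner loop: (approximately) minimize the Lagrangian,
        theta -= 0.05 * (grad(X1, y1, theta) + lam * grad(X2, y2, theta))  # unconstrained
    lam = max(0.0, lam + 0.1 * (loss(X2, y2, theta) - c))  # dual (projected) gradient step
# at convergence, the constraint loss2(theta) <= c is (approximately) active
```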

Statistical requirements lie on a spectrum between *in expectation* and *almost surely*. On the former end, learning is performed using empirical averages over the available data. This is the case, for example, in wireless resource allocation, safe reinforcement learning, and certain definitions of fairness. **Semi-infinite constrained learning** is primarily concerned with problems on the other end of the spectrum, e.g., those involving min-max properties such as robustness and invariance. For these almost sure requirements to hold, however, an infinite number of constraints must be satisfied for each data point. Combining duality with hybrid stochastic optimization–MCMC sampling algorithms yields a new approach to tackle these seemingly intractable problems. These developments have led to new theoretical questions and applications, such as smooth learning and probabilistic robustness, a property that lies strictly in the interior of the expectation–almost sure spectrum.
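As a toy illustration of the sampling idea (not any specific algorithm from this line of work), the sketch below replaces the intractable supremum over perturbations, i.e., one constraint per perturbation, by a tilted average over sampled perturbations; the temperature controls how closely the surrogate tracks the worst case. Model, data, and temperature are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D model and loss; a robustness requirement asks that
# loss(x + delta) <= c for *every* perturbation |delta| <= eps, i.e.,
# infinitely many constraints per data point.
w = 2.0
loss = lambda x: (w * x - 1.0) ** 2
x, eps = 0.4, 0.1

# Sampling surrogate: draw perturbations and weight them toward high-loss
# regions (a crude stand-in for an MCMC sampler targeting the worst case).
deltas = rng.uniform(-eps, eps, size=5000)
vals = loss(x + deltas)
temp = 0.05
weights = np.exp(vals / temp); weights /= weights.sum()
soft_worst = np.sum(weights * vals)   # tilted average: a soft maximum
hard_worst = vals.max()               # empirical worst case over the samples
# soft_worst lower-bounds hard_worst and approaches it as temp -> 0
```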

Until 60 years ago, the tractability boundary in optimization separated *linear* from *nonlinear* programs. Advances in convex analysis and barrier methods have since made it commonplace to hear *convex* used as a synonym for *tractable* and *non-convex* as a synonym for *intractable*. Reality is naturally more subtle. In fact, there are both computationally challenging *convex* programs—e.g., large-scale semi-definite programming—and tractable *non-convex* ones—e.g., low-rank approximation. I am particularly interested in a case of the latter that arises from the observation that **problems known to be intractable in finite dimensions often become tractable in infinite dimensions**. This means, for instance, that sparse regression is NP-hard in finite dimensions whereas its functional form is tractable. This observation obviates the need for convex relaxations to tackle *off-the-grid compressive sensing* problems and makes it possible to fit complex nonlinear models (from multi-resolution kernels to GPs), giving rise to new statistical questions. It is also instrumental in the development of constrained learning.
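The low-rank example mentioned above can be made concrete: although the set of rank-r matrices is non-convex, the best rank-r approximation in Frobenius norm is computed exactly by a truncated SVD (the Eckart–Young–Mirsky theorem). A minimal sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 6))
r = 2

# Truncated SVD solves the (non-convex) best rank-r approximation problem
# exactly: keep the r largest singular values/vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] * s[:r] @ Vt[:r]

# The optimal error is the norm of the discarded tail singular values.
err = np.linalg.norm(A - A_r)
```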

Massive amounts of data in our increasingly interconnected world only make sense in the context of the networks from which they arise, be they social networks, power grids, IoT devices, or Industry 4.0. Graph signal processing (GSP) and graph neural networks (GNNs) grew out of the need to extract information from these network (graph) signals. These techniques, however, are difficult to scale, hindering their use for large networks that can only be partially observed or in non-stationary, dynamic settings. Yet, it seems reasonable that if two graphs are "similar," then their graph Fourier transforms, graph filters, and GNNs should also be similar. Formalizing this intuition is one of the motivations for developing the theory of **graphon signal processing**. In fact, it has been used to show that GNNs are transferable between graphs, i.e., that they can be trained on subgraphs and then deployed on the full network. These results raise fundamental questions on the limits of this transferability as well as on the right graph similarity metric to characterize it.
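For readers unfamiliar with GSP, its basic building block is the polynomial graph filter h(S)x = Σ_k h_k S^k x, where S is a graph shift operator (e.g., the adjacency matrix); GNN layers compose such filters with nonlinearities. A minimal sketch on a hypothetical 4-node path graph:

```python
import numpy as np

def graph_filter(S, x, h):
    """Apply the polynomial graph filter y = sum_k h[k] * S^k x."""
    y, Skx = np.zeros_like(x, dtype=float), x.astype(float)
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx   # next power of the shift applied to x
    return y

# Path graph on 4 nodes (adjacency as shift operator), impulse at node 0.
S = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])
y = graph_filter(S, x, [1.0, 0.5])   # y = x + 0.5 * S x: diffuses to neighbors
```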

## Past projects

When the scale of the underlying system and/or technological constraints such as computation, power, and communication limit our capabilities, we are forced to choose a subset of the available resources to use. These might be sensors for state estimation, actuators for control, pixels for face recognition, or movie ratings to kick-start a recommender system. These selection problems are notoriously hard (often NP-hard, in fact), so the best we can expect is to find an approximate solution. Greedy search is widely used in this context due to its simplicity, its iterative nature, and the fact that it is near-optimal when the objective has a "diminishing returns" property known as *submodularity*. Yet, despite the empirical evidence for its effectiveness in the above problems, **none of them are submodular.** In fact, quadratic costs generally only display "diminishing returns" under stringent conditions. This research has shown that while the MSE and the LQR objective are not submodular, they are not far from it, enjoying similar near-optimality properties. While many notions of approximate submodularity had been investigated before, these were the first computable, *a priori* guarantees for these problems, obviating the need for submodular surrogates (e.g., logdet) or convex relaxations.
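The greedy guarantee referred to above is easy to demonstrate on a genuinely submodular objective. The sketch below uses weighted coverage (a stand-in, since the MSE/LQR costs discussed here are only approximately submodular); the sets and weights are illustrative.

```python
import itertools

# Weighted-coverage selection: pick k sets to maximize the total weight of
# covered elements. Coverage is monotone submodular, so greedy is within a
# (1 - 1/e) factor of the optimum (Nemhauser-Wolsey-Fisher).
universe_weights = {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0, 4: 1.0}
sets = {"a": {0, 1}, "b": {1, 2, 3}, "c": {3, 4}, "d": {0, 4}}

def value(chosen):
    covered = set().union(*(sets[s] for s in chosen)) if chosen else set()
    return sum(universe_weights[u] for u in covered)

def greedy(k):
    chosen = []
    for _ in range(k):
        # add the set with the largest marginal gain (sorted for determinism)
        best = max(sorted(set(sets) - set(chosen)), key=lambda s: value(chosen + [s]))
        chosen.append(best)
    return chosen

sel = greedy(2)
opt = max(itertools.combinations(sets, 2), key=lambda c: value(list(c)))
```

On this instance greedy is near-optimal but not optimal, which is exactly the behavior the (1 − 1/e) bound allows.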

Like most stochastic optimization algorithms, adaptive filters suffer from trade-offs that can hinder their use in practice. Larger step sizes, for example, lead to faster convergence and better tracking, but also to worse steady-state errors. Combinations of adaptive filters were proposed to address such compromises by mixing the output of a fast filter with that of an accurate one and adjusting the mixture depending on which filter is performing best. How these outputs are combined has a large influence on the resulting performance, so this research program set out to determine **the best way to combine adaptive filters.** To do so, it developed an algebra that describes combinations of adaptive filters using message passing graphs and used it to design and analyze a myriad of combination topologies, tackling a diversity of new applications. In one instance, a combination of simple adaptive filters was used to **reduce the complexity of adaptive algorithms**, outperforming complex, Newton-type adaptive methods at roughly 30 times lower computational complexity.
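A minimal sketch of the classic convex combination of two LMS filters (step sizes and data are illustrative, not taken from any specific paper): a mixing weight η = σ(a) is adapted alongside the filters so the combination favors whichever is currently performing best.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic system identification task (hypothetical data).
n, d = 5000, 4
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w1, w2 = np.zeros(d), np.zeros(d)   # fast LMS (mu=0.1) and accurate LMS (mu=0.01)
a, mu_a = 0.0, 1.0                  # mixing parameter: eta = sigmoid(a)
for x_t, y_t in zip(X, y):
    y1, y2 = x_t @ w1, x_t @ w2
    eta = 1 / (1 + np.exp(-a))
    e = y_t - (eta * y1 + (1 - eta) * y2)        # error of the combined output
    w1 += 0.1 * (y_t - y1) * x_t                 # each filter adapts independently
    w2 += 0.01 * (y_t - y2) * x_t
    a += mu_a * e * (y1 - y2) * eta * (1 - eta)  # stochastic gradient on e^2 w.r.t. a
```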

Air passenger traffic has grown enormously over the last few decades. The resulting increase in competition has made aircraft carriers and manufacturers aware of the need to find new ways to attract customers, inevitably turning their attention to comfort. Although studies on automotive comfort are abundant, those on aircraft environments are scarce, partly due to the difficulties of running experiments (costs, risks, etc.). To address these issues, a full-scale aircraft cabin simulator capable of controlling variables such as sound, vibration, temperature, air flow, pressure, and lighting was built at the University of São Paulo in collaboration with EMBRAER.

I co-designed and built the vibro-acoustic reproduction system, composed of more than 20 loudspeakers and 30 shakers. Using new MIMO equalization methods [23], we were able to precisely simulate the noise patterns of dozens of aircraft, including take-off and landing. From the control room, it is possible to monitor the environment inside the simulator using microphones and accelerometers installed on each seat. After half a decade of work, this project culminated in more than 60 simulated flights involving over 1,000 people. I was responsible for the statistical analysis of the results to understand the interplay between these variables and passenger comfort. The simulator is still used in collaborations between the University of São Paulo and the aeronautics industry.