**Joining the dots - Statistics for spatial point patterns**

In 1854 a physician investigating cholera had the brilliant idea to make a map in which each cholera victim was represented by a dot placed at their home address. The clustering of dots on the map strongly suggested a common source for cholera infection, and pointed to its location.

The same idea can be used to study the spatial locations of gold deposits in a geological survey, galaxies in a survey of the distant universe, bird nests in a forest, and so on.

To draw scientific conclusions from a scatter of dots, we need statistical principles (whether we are aware of them or not!). Over the last 50 years, statisticians and other scientists have struggled to establish those principles. The history of this field of 'spatial statistics' is full of missteps, blind alleys, quirky ideas and amazing achievements. Finally it seems that a coherent science of spatial statistics may be emerging.

**Variational approach for spatial point process intensity estimation**

I will introduce a new variational estimator for the intensity function of an inhomogeneous spatial point process with points in d-dimensional Euclidean space, observed within a bounded region. The variational estimator applies in a simple and general setting where the intensity function is assumed to be of log-linear form.

Obtained as the solution of a linear system of equations, the variational estimator is very simple to implement and quicker than alternative estimation procedures. I will present its asymptotic properties and compare its finite-sample properties with those of the maximum first-order composite likelihood estimator, for various inhomogeneous spatial point process models and dimensions.

This is joint work with Jesper Møller (Aalborg University).
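As a point of reference, a log-linear intensity function is commonly written as below. The notation is an assumption made for illustration and is not taken from the talk: $z(u)$ denotes a vector of covariates observed at location $u$, $\beta$ an intercept, $\theta$ a vector of regression parameters and $W$ the bounded observation region.

$$
\lambda(u) = \exp\{\beta + \theta^\top z(u)\}, \qquad u \in W \subset \mathbb{R}^d ,
$$

so that $\log \lambda(u)$ is linear in the parameters.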

**Gibbsian germ-grain models**

Germ-grain models are built by taking the union of random convex sets (the grains) centered at the points (the germs) of a spatial point process. They are used for modelling random surfaces and interfaces, geometrical structures growing from germs, etc. When the grains are independent and identically distributed, and the germs are given by the locations of a Poisson point process, the germ-grain model is known as the Boolean model. Because of the independence properties of the Poisson process, the Boolean model is sometimes too crude for applications in biology or physics. Gibbsian modifications, based on a morphological Hamiltonian, are therefore considered in order to obtain more realistic models. The classical Quermass-Hamiltonian is defined as a linear combination of the fundamental Minkowski functionals (area, perimeter and Euler-Poincaré characteristic in dimension 2). However, other Hamiltonians may be considered. Examples are the random cluster interaction, based on the number of connected components, and the exclusion interaction, which penalizes the overlapping of convex sets of different types.

During the talk we will discuss three questions around these Gibbsian germ-grain models: existence, phase transition (and percolation, of course), and statistical inference.
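The Boolean model mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the simplest possible grains (discs of a fixed radius in the plane) and germs from a homogeneous Poisson process; all function names are hypothetical, and the Gibbsian modifications discussed in the talk are not covered. The covered area fraction of one realisation can be compared with the standard theoretical value $1 - \exp(-\lambda \pi R^2)$ for this special case.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count with Knuth's multiplication method
    (adequate for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_boolean_discs(intensity, radius, side, rng):
    """Germ locations of a Boolean model of discs that can hit [0, side]^2."""
    # Germs are simulated on a window enlarged by `radius`, so that discs
    # centred just outside the square are not missed.
    lo, hi = -radius, side + radius
    n = sample_poisson(intensity * (hi - lo) ** 2, rng)
    return [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n)]

def is_covered(point, germs, radius):
    """True if `point` lies in the union of the discs."""
    return any(math.dist(point, g) <= radius for g in germs)

def covered_fraction(germs, radius, side, grid=50):
    """Fraction of a regular grid on [0, side]^2 covered by the union."""
    step = side / grid
    hits = sum(
        is_covered(((i + 0.5) * step, (j + 0.5) * step), germs, radius)
        for i in range(grid)
        for j in range(grid)
    )
    return hits / grid ** 2

def boolean_area_fraction(intensity, radius):
    """Theoretical area fraction for fixed-radius discs."""
    return 1.0 - math.exp(-intensity * math.pi * radius ** 2)

rng = random.Random(1)
germs = simulate_boolean_discs(intensity=50.0, radius=0.1, side=1.0, rng=rng)
print("empirical:", covered_fraction(germs, 0.1, 1.0))
print("theoretical:", boolean_area_fraction(50.0, 0.1))
```

A single realisation fluctuates around the theoretical value; averaging over many realisations would be needed for a closer match.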

**Determinantal point process models and statistical inference: Theoretical aspects**

Statistical models and methods for determinantal point processes (DPPs) seem largely unexplored, although DPPs possess a number of appealing properties and have been studied in mathematical physics, combinatorics, and random matrix theory. We demonstrate that DPPs provide useful models for the description of repulsive spatial point processes. In particular, we investigate the range of repulsiveness that DPPs can cover, and we characterize the most repulsive DPP, showing that DPPs are particularly well adapted to 'soft-core' repulsive data. Such data are usually modelled by Gibbs point processes, for which the likelihood and moment expressions are intractable and simulations are time-consuming. We develop parametric models of DPPs for which the likelihood and moment expressions can be easily evaluated and realizations can be quickly simulated. We discuss how statistical inference is conducted using the likelihood or moment properties of DPP models.

This work has been carried out in collaboration with Jesper Møller and Ege Rubak, Aalborg University (the paper is available at arXiv:1205.4818). The study of the repulsiveness of DPPs is part of joint work with Christophe Biscio, University of Nantes.
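The repulsiveness of DPPs can be seen directly from their moment structure: the second-order intensity is the determinant of a 2×2 kernel matrix, so the pair correlation function never exceeds 1. The Python sketch below illustrates this under the assumption of a Gaussian kernel $C(x, y) = \rho \exp(-\|x - y\|^2 / \alpha^2)$; the function names and parameter values are hypothetical, and the code does not check that the kernel actually defines a valid DPP.

```python
import math

def gaussian_kernel(x, y, rho=100.0, alpha=0.05):
    """Assumed kernel C(x, y) = rho * exp(-|x - y|^2 / alpha^2)."""
    return rho * math.exp(-(math.dist(x, y) / alpha) ** 2)

def second_order_intensity(kernel, x, y):
    """Determinant of the 2x2 matrix [[C(x,x), C(x,y)], [C(y,x), C(y,y)]]."""
    return kernel(x, x) * kernel(y, y) - kernel(x, y) * kernel(y, x)

def pair_correlation(kernel, x, y):
    """g(x, y) = rho2(x, y) / (rho(x) * rho(y)), with rho(x) = C(x, x)."""
    return second_order_intensity(kernel, x, y) / (kernel(x, x) * kernel(y, y))

# The pair correlation rises from 0 at distance 0 towards 1 at large distances.
for distance in (0.0, 0.025, 0.05, 0.1, 0.2):
    g = pair_correlation(gaussian_kernel, (0.0, 0.0), (distance, 0.0))
    print(f"distance {distance:5.3f}: g = {g:.4f}")
```

Values of the pair correlation below 1 at short distances correspond to the 'soft-core' repulsion described in the abstract.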

**General solution for multiple testing problem in Monte Carlo tests**

The rank envelope test was shown to solve the problem of multiple testing for the so-called envelope test in Myllymäki M., Mrkvicka T., Seijo H., Grabarnik P., 2013, *Global envelope tests for spatial processes*, arxiv.org/abs/1307.0239. Indeed, it extends the envelope test to a formal statistical test which provides both a $p$-value and a simultaneous envelope that is adjusted for inference at all distances at the significance level $\alpha$. However, the idea of the rank envelope test can be used in any Monte Carlo test where the distribution of a random vector with dependent components is explored. Several such examples will be presented in this talk: multiple testing adjustment for goodness-of-fit tests with several summary functions, a test for comparing two or more groups of point patterns at once, a test for dependence of components in a multi-type process with more than two types, and a simultaneous goodness-of-fit test for several point patterns.

This is joint work with Ute Hahn and Mari Myllymäki.

**Rank envelope test for spatial point patterns**

The envelope test, where an empirical test function T(r) is compared with the "extremal" behaviour of test functions estimated from simulations of a null model, is often used in testing spatial hypotheses. This method is a valid test for a single fixed distance r only, whereas in practice the functions are inspected for all distances r on an interval I=[r_min,r_max]. In this talk, the rank envelope test is introduced. This test provides simultaneous inference for all distances r on I. It provides p-values and allows the global type I error probability to be controlled exactly when a simple hypothesis is tested. Further, the distances at which the empirical test function goes outside the constructed simultaneous envelope on I indicate the reasons for rejecting the null hypothesis.

This is joint work with Tomás Mrkvicka, Henri Seijo and Pavel Grabarnik.
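The extreme-rank idea behind the rank envelope test can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the observed and simulated test functions have been evaluated on a common grid of distances, it ignores ties in the function values, and the function names are hypothetical. Each curve receives, at every distance, its rank counted from the nearest extreme; the smallest of these ranks over all distances is its extreme rank. Because several curves can share the same extreme rank, the sketch reports a lower and an upper bound for the p-value.

```python
def extreme_ranks(curves):
    """Extreme rank of each curve, where curves[i][k] is the value of
    curve i at the k-th distance of a common grid."""
    n = len(curves)
    ranks = [n] * n
    for k in range(len(curves[0])):
        column = [curve[k] for curve in curves]
        for i, value in enumerate(column):
            from_below = 1 + sum(1 for other in column if other < value)
            from_above = 1 + sum(1 for other in column if other > value)
            # Two-sided pointwise rank, minimised over all distances.
            ranks[i] = min(ranks[i], from_below, from_above)
    return ranks

def rank_envelope_p_interval(observed, simulated):
    """Lower and upper p-value bounds based on extreme ranks."""
    curves = [observed] + list(simulated)
    ranks = extreme_ranks(curves)
    observed_rank = ranks[0]
    total = len(curves)
    p_lower = sum(1 for r in ranks if r < observed_rank) / total
    p_upper = sum(1 for r in ranks if r <= observed_rank) / total
    return p_lower, p_upper

# Toy example with four simulated curves on a grid of two distances.
simulated = [[1, 1], [2, 2], [8, 8], [9, 9]]
print(rank_envelope_p_interval([5, 5], simulated))   # observed curve in the middle
print(rank_envelope_p_interval([10, 5], simulated))  # observed curve extreme at one distance
```

In practice a much larger number of simulations is needed for the two bounds to be close to each other.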

**Determinantal point processes and statistical inference: Some case studies with R**

Determinantal point process (DPP) models constitute one of the few non-Poisson point process model classes where we have access to closed form expressions for both the likelihood function and the moments. Furthermore, we have an exact simulation algorithm which avoids the use of Markov chain Monte Carlo methods. These properties make DPP models well suited for statistical analysis. In this talk I will demonstrate how simulation and statistical inference for DPPs are carried out in practice using software developed in R. Specifically, I will show how we have analyzed several real datasets using this software and the DPP framework. This includes model specification, parameter estimation, simulation from the fitted model, and goodness-of-fit assessment.

Time permitting, I will end the talk with a brief demonstration of how recent developments allow us to extend the software to handle stationary DPPs on a sphere (e.g. the surface of Earth).

The main part of the work has been carried out in collaboration with Jesper Møller from Aalborg University and Frédéric Lavancier from Nantes University, while the final part concerning DPPs on spheres is an ongoing collaboration which also includes Morten Nielsen (Aalborg University).

**Hierarchical modeling of spatial structure of epidermal nerve fibers**

Epidermal nerve fiber (ENF) density and morphology are used to diagnose small fiber involvement in diabetic and other small fiber neuropathies. ENF density and summed length of ENFs per epidermal surface area are reduced, and, based mainly on visual inspection, ENFs appear more clustered within the epidermis (the outermost living layer of the skin) in subjects with small fiber neuropathy than in healthy subjects. We have investigated the spatial structure of ENF entry points, which are the locations where the nerves enter the epidermis, and ENF end points, which are the terminal nodes of ENFs. The study is based on suction skin blister specimens from two body locations of 32 healthy subjects and 15 subjects with diabetic neuropathy. The ENF entry (end) points are regarded as a realization of a spatial point process and Ripley's K function is used to summarize the spatial structure. A hierarchical Bayesian approach is then used to model the relationship between this summary characteristic and the disease status and some other covariates (gender, age, body mass index).
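For readers unfamiliar with Ripley's K function, the following Python sketch computes a naive estimate for a planar point pattern. It is a simplified illustration under stated assumptions, not the estimator used in the study: no edge correction is applied (so the estimate is biased downwards near the window boundary), and the function name is hypothetical. For a homogeneous Poisson process in the plane, $K(r) = \pi r^2$; larger values suggest clustering and smaller values suggest regularity.

```python
import math

def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at distance r, without edge correction.

    points: list of (x, y) locations observed in a window of the given area.
    """
    n = len(points)
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= r
    )
    # Ordered pairs within distance r, scaled by area / (n * (n - 1)).
    return area * close_pairs / (n * (n - 1))

# Four points at the corners of the unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    print(r, ripley_k(corners, r, area=1.0))
```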

**J-functions for inhomogeneous point processes in space and time**

The analysis of data in the form of a map of (marked) points often starts with the computation of summary statistics. Some statistics are based on inter-point distances, others on the average number of points in sample regions, or on geometric information. The J-function (Van Lieshout and Baddeley, 1996) is a particular example that compares the size of gaps under the distribution of the point process to that under its Palm distribution.
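For the classical stationary case, the J-function is the ratio $J(r) = (1 - G(r)) / (1 - F(r))$, where $F$ is the empty space function (the distribution of the distance from a fixed location to the nearest point) and $G$ is the nearest-neighbour distance distribution function. The Python sketch below uses naive empirical versions of $F$ and $G$ without edge correction; the function names and the choice of test locations are assumptions made for illustration. Values of $J(r)$ equal to 1 are consistent with a Poisson process, values below 1 suggest clustering and values above 1 suggest regularity.

```python
import math

def nearest_distance(location, points, skip=None):
    """Distance from `location` to the nearest point, ignoring index `skip`."""
    return min(
        math.dist(location, p) for k, p in enumerate(points) if k != skip
    )

def empty_space_f(points, test_locations, r):
    """Fraction of test locations with a point of the pattern within r."""
    hits = sum(nearest_distance(u, points) <= r for u in test_locations)
    return hits / len(test_locations)

def nearest_neighbour_g(points, r):
    """Fraction of points whose nearest other point lies within r."""
    hits = sum(
        nearest_distance(p, points, skip=i) <= r for i, p in enumerate(points)
    )
    return hits / len(points)

def j_function(points, test_locations, r):
    """Naive J(r); returns None where the estimate of F(r) reaches 1."""
    f = empty_space_f(points, test_locations, r)
    if f >= 1.0:
        return None
    return (1.0 - nearest_neighbour_g(points, r)) / (1.0 - f)

# Tiny deterministic example: two points and two test locations.
points = [(0.0, 0.0), (1.0, 0.0)]
test_locations = [(0.5, 0.0), (3.0, 0.0)]
print(j_function(points, test_locations, 0.75))
```

In real analyses the test locations would form a fine grid over the observation window, and edge-corrected estimators would be used.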

In the exploratory stage, it is usually assumed that the data constitute a realisation of a stationary point process, and deviations from a stationary Poisson process are studied to suggest a suitable model. Although stationarity is a convenient assumption, especially if (as is often the case) only a single map is available, heterogeneity is present in many areas of application. To account for possible non-stationarity, Baddeley et al. (2000) defined a reduced second moment function by considering the random measure obtained from the mapped point pattern by weighting each observed point according to the (estimated) intensity at its location. Gabriel and Diggle (2009) took this idea further into the domain of space-time point processes.

In this talk, we describe an extension of the J-function that is able to accommodate spatial and/or temporal inhomogeneity.

**Thinning-stable point processes as a model for bursty spatial data**

Thinning-stable point processes are an important class of generally infinite intensity point processes since they are exactly the processes arising as a limit in superposition-thinning schemes. It can be shown that these processes are exactly Cox (doubly stochastic Poisson) processes with strictly stable random intensity measures and, in a regular case, they are cluster processes with a specific heavy-tailed distribution of cluster size. The cluster representation uses the so-called Sibuya point processes, which constitute a new family of purely random point processes. Based on these facts, we develop statistical inference for stable processes and also discuss their generalisation when the thinning operation is replaced by a stochastic operation based on branching.