Introduction





What is geostatistics?

The basic idea of geostatistics is to describe and estimate spatial covariance, or correlation, in a set of point data. While the main tool, the semi-variogram, is quite easy to implement and use, it rests on a number of important assumptions.


The typical application of geostatistics is interpolation. Therefore, although it works with point data, a basic concept is to understand these points as a sample of a (spatially) continuous variable that can be described as a random field rf or, more precisely in many cases, a Gaussian random field.


The most fundamental assumption in geostatistics is that any two values $x_{i}$ and $x_{i+h}$ are more similar the smaller $h$ is, where $h$ is the separating distance on the random field. In other words: close observation points will show higher covariances than distant points. If this most fundamental conceptual assumption does not hold for a specific variable, geostatistics is not the correct tool to analyse and interpolate this variable.


One of the easiest approaches to interpolating point data is IDW (inverse distance weighting). This technique is implemented in almost any GIS (Geographic Information System) software. The fundamental conceptual model can be described as:
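
The formula itself is not reproduced in this copy of the text. A common way to write the IDW estimate, assuming the weights are normalised by their sum (the original may normalise differently), is the following, where $Z_{i}$ is a symbol introduced here for the observed value at sample point $x_{i}$:

```latex
Z_{u} = \frac{\sum_{i=1}^{N} w_{i} \, Z_{i}}{\sum_{i=1}^{N} w_{i}}
```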


where $Z_{u}$ is the value of rf at a non-observed location with $N$ observations around it. These observations are weighted by the weight $w_{i}$, which can be calculated as:
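
The weight formula is also missing at this point. Going by the description in the text (the inverse of the Euclidean distance between $u$ and $x_{i}$), it can plausibly be reconstructed as:

```latex
w_{i} = \frac{1}{\lVert \vec{u x_{i}} \rVert}
```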


where $u$ is the unobserved point and $x_{i}$ is one of the sample points. Thus, $\lVert \vec{ux_{i}} \rVert$ is the 2-norm of the vector between the two points: the Euclidean distance in the coordinate space (which by no means has to be limited to the $\mathbb{R}^{2}$ case).

This basically describes a concept in which a value of the random field is estimated by a distance-weighted mean of the surrounding points. As close points should have a higher impact, the inverse distance is used, hence the name inverse distance weighting.
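
The distance-weighted mean described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation of any GIS package: the function name `idw`, the `power` parameter and the sample data are assumptions made for the example, and the weights are normalised by their sum.

```python
import math


def idw(u, points, values, power=1.0):
    """Estimate the value at location u as an inverse-distance-weighted mean.

    `power` is an assumption of this sketch; power=1 is plain inverse distance.
    """
    weights = []
    for p, z in zip(points, values):
        d = math.dist(u, p)  # Euclidean distance (2-norm), any dimension
        if d == 0.0:
            return z  # u coincides with a sample point: return its value
        weights.append(1.0 / d ** power)
    # normalise by the sum of the weights so the result is a weighted mean
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)


# hypothetical sample: four observations on the corners of a unit square
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]

centre = idw((0.5, 0.5), points, values)      # equidistant from all samples
near_first = idw((0.1, 0.0), points, values)  # dominated by the first sample
```

At the centre of the square all four weights are equal, so the estimate is the plain mean of the observations. Close to the first corner the estimate is pulled towards that corner's value.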


In the case of geostatistics this basic model still holds, but it is extended. Instead of making the weights depend exclusively on the separating distance, each weight is derived from the variance over all values that are separated by a similar distance.


This has the main advantage of incorporating the actual (co)variance found in the observations and basing the interpolation on this (co)variance, but comes at the cost of some strict assumptions about the statistical properties of the sample. Elaborating and assessing these assumptions is one of the main challenges of geostatistics.
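
To make "a variance over all values that are separated by a similar distance" concrete, here is a minimal sketch of an empirical semi-variogram. It assumes the widely used Matheron estimator (half the mean squared difference of all point pairs in a distance bin), which the text above does not name explicitly. The function name, the bin edges and the sample data are hypothetical.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, bin_edges):
    """Half the mean squared difference of all pairs, per distance bin."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])  # separating distance of the pair
        for k in range(n_bins):
            # bins are half-open intervals (lower, upper]
            if bin_edges[k] < h <= bin_edges[k + 1]:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
                break
    # bins without any pair have no estimate
    return [s / (2 * n) if n else float("nan") for s, n in zip(sums, counts)]


# hypothetical 1-D transect with six equally spaced observations
points = [(float(x),) for x in range(6)]
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

gamma = empirical_semivariogram(points, values, [0.0, 1.0, 2.0, 3.0])
```

For this sample the semi-variance increases from bin to bin: pairs that are further apart differ more, which is exactly the behaviour the fundamental assumption of geostatistics expects.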


Geostatistical Tools

Geostatistics is a wide field spanning a variety of disciplines, like geology, biology, hydrology or geomorphology. Each discipline defines its own set of tools, and apparently its own definitions, and the field is still progressing today.


It is not the objective of scikit-gstat to be a comprehensive collection of all available tools. The objective is rather to offer some common, and also some more sophisticated, tools for variogram analysis. Thus, when using scikit-gstat, you typically need another library for the actual application, like interpolation. In most cases that will be gstools. In general, one can split geostatistics into three main fields, each with its own tools:


  • variography: with the variogram as its main tool, variography focuses on describing, visualizing and modelling covariance structures in space and time.

  • kriging: a family of interpolation methods that use a variogram to estimate the kriging weights, as sketched above.

  • geostatistical simulation: aims at generating random fields that fit a given set of observations or a pre-defined variogram or covariance function.
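
As an illustration of how kriging derives its weights from a variogram rather than from distance alone, the following is a rough sketch of ordinary kriging with NumPy. It is not the scikit-gstat or gstools API. The exponential variogram model, its parameters (`sill`, `effective_range`), the function names and the sample data are all assumptions chosen for the example.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, effective_range=3.0):
    """Exponential model without nugget (hypothetical parameters)."""
    return sill * (1.0 - np.exp(-3.0 * h / effective_range))


def ordinary_kriging(u, points, values, variogram=exponential_variogram):
    """Return the estimate at u and the kriging weights of the samples."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(points)
    # distances between all samples, and from each sample to the target u
    d_pp = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d_pu = np.linalg.norm(points - np.asarray(u, dtype=float), axis=-1)
    # kriging system; the extra row and column hold the Lagrange multiplier
    # that forces the weights to sum to one
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_pp)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d_pu)
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ values), weights


# hypothetical sample: four observations on the corners of a unit square
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]

estimate, weights = ordinary_kriging((0.5, 0.5), points, values)
at_sample, _ = ordinary_kriging((1.0, 0.0), points, values)
```

The weights sum to one, and because the model has no nugget the estimate at a sample location reproduces the observed value. In practice the variogram model would be fitted to an empirical variogram instead of being fixed by hand.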

