data-cleaning.land-use-sample

clean

(clean settings-path)
Forms a sample of the cleaned .asc map files.

    settings-path: A path to the XML settings file.


Steps include: 1) filtering of cells,
               2) transformation of the land use map
                  by counting the number of cells with urban land use
                  in the cell neighbourhood,
               3) transformation of other maps with log and rescaling to [0,1],
                  if specified.

1) Excludes cells from every map if
       - a cell value is not defined originally
         (in any of the original maps specified),
       - any value in the cell Moore neighbourhood is not defined originally
         (in any of the original maps specified),
       - a cell belongs to the map-matrix border,
       - a cell is masked in the "black-list" file,
       - any value in the cell Moore neighbourhood is masked in the "black-list" file.
   The "black-list" file should mark cells with "1" in order to exclude them
   from the resulting maps.
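
The exclusion rules above can be sketched as a single predicate. This is a
minimal illustration, not the library's implementation: `moore-offsets` and
`excluded?` are hypothetical names, and the sketch assumes each map is a vector
of row vectors with nil standing for an undefined cell.

```clojure
;; Hypothetical helpers; not part of data-cleaning.land-use-sample.
(def moore-offsets
  ;; Row/column offsets of the eight cells surrounding a cell.
  (for [dr [-1 0 1] dc [-1 0 1] :when (not= [dr dc] [0 0])] [dr dc]))

(defn excluded?
  "True if the cell at [r c] must be dropped from every map.
  Assumption: `grids` is a seq of maps (vectors of row vectors, nil for
  undefined cells) and `black-list` is a grid of the same shape in which
  1 marks a masked cell."
  [grids black-list r c]
  (let [rows    (count black-list)
        cols    (count (first black-list))
        border? (or (zero? r) (zero? c) (= r (dec rows)) (= c (dec cols)))
        ;; A position is bad if it is masked or undefined in any map.
        bad?    (fn [[rr cc]]
                  (or (= 1 (get-in black-list [rr cc]))
                      (some #(nil? (get-in % [rr cc])) grids)))]
    (boolean
     (or border?
         (bad? [r c])
         (some (fn [[dr dc]] (bad? [(+ r dr) (+ c dc)])) moore-offsets)))))
```

The border check comes first, so the neighbourhood lookups only run for
interior cells, where all eight neighbours are inside the matrix.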

2) The original land use map should contain binary cell values, where 0 is non-urban land
   (e.g., vegetation, wetlands, agricultural land and water) and 1 is urban land
   (e.g., all artificial surfaces).

   Every cell value in the land use map is substituted with the number of cells with
   urban land use in the cell Moore neighbourhood.
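
The substitution can be illustrated with a short sketch. `urban-count` is a
hypothetical name, and the sketch assumes that only the eight surrounding cells
are counted (the centre cell is left out), which the description above does not
state explicitly.

```clojure
;; Hypothetical helper; not part of data-cleaning.land-use-sample.
(defn urban-count
  "Number of urban (1) cells among the eight Moore neighbours of [r c].
  Assumptions: `land-use` is a vector of row vectors holding 0 or 1, the
  centre cell is not counted, and [r c] is an interior cell whose
  neighbours are all defined (as guaranteed by the filtering step)."
  [land-use r c]
  (reduce + (for [dr [-1 0 1]
                  dc [-1 0 1]
                  :when (not (and (zero? dr) (zero? dc)))]
              (get-in land-use [(+ r dr) (+ c dc)]))))
```

Under this assumption the transformed land use values range from 0 to 8.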

3) If specified, transforms cell values with the natural logarithm (ln) or rescales them to the [0,1] range.
   Rescaling to the [0,1] range is done using the following transformation:
      x'=(x-x_min)/(x_max-x_min),
   where x is an original value. The minimum and maximum values (x_min, x_max respectively)
   are taken from the original cell values.

   The logarithmic transformation should be specified with the 'log' attribute in the settings file.
   Rescaling to [0,1] should be specified with the 'unit-rescaling' attribute
   in the settings file.
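
Both transformations can be sketched on a flat sequence of cell values. The
function names `unit-rescale` and `log-transform` are hypothetical, and the
handling of constant maps or non-positive values is an assumption of this
sketch, not documented behaviour.

```clojure
;; Hypothetical helpers; not part of data-cleaning.land-use-sample.
(defn unit-rescale
  "Maps each value x to (x - x_min) / (x_max - x_min).
  Assumes `xs` is non-empty and not constant, so that x_max > x_min."
  [xs]
  (let [x-min (apply min xs)
        x-max (apply max xs)
        span  (- x-max x-min)]
    (map #(/ (double (- % x-min)) span) xs)))

(defn log-transform
  "Natural logarithm of each value. Assumes all values are positive."
  [xs]
  (map #(Math/log (double %)) xs))
```

For example, `(unit-rescale [2 4 6])` maps the minimum to 0.0 and the maximum
to 1.0, with 4 landing halfway between them.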


If the original maps are valid,
returns a hash-map of
   a) the contents of the cleaned .asc files with their file names,
   b) a sorted list of Moore neighbourhoods by their frequency in the land use map
      (in csv and latex formats) with names ["land-use.asc"-neighbourhoods.csv] and
      ["land-use.asc"-neighbourhoods.tex],
   c) a list of validation warnings.

Otherwise, returns a list of validation errors.

## Usage

  (require '[data-cleaning.land-use-sample :refer :all])

  (clean "settings.xml")