spopt.region.Skater

class spopt.region.Skater(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})

Skater is a spatial regionalization algorithm, introduced in [ANCdCF06], that forms regions by pruning a spanning tree.
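
The idea of spanning tree pruning can be illustrated on toy data. The following is a minimal sketch under stated assumptions, not spopt's implementation: the helper names (`minimum_spanning_tree`, `components`, `within_ssd`, `toy_skater`) are hypothetical, edge costs are Manhattan distances between neighboring attribute vectors, and each cut is chosen greedily as the one giving the lowest total within-region sum of squared deviations. The `floor` constraint is ignored here.

```python
import numpy as np


def minimum_spanning_tree(n, edges, cost):
    """Kruskal's algorithm with union-find; returns a list of (i, j) tree edges."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    tree = []
    for i, j in sorted(edges, key=cost):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            tree.append((i, j))
    return tree


def components(n, tree):
    """Label the connected components of the forest given by `tree`."""
    neighbors = {i: [] for i in range(n)}
    for i, j in tree:
        neighbors[i].append(j)
        neighbors[j].append(i)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in neighbors[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def within_ssd(data, labels):
    """Total sum of squared deviations from each region's mean."""
    return sum(
        ((data[labels == r] - data[labels == r].mean(axis=0)) ** 2).sum()
        for r in np.unique(labels)
    )


def toy_skater(data, edges, n_clusters):
    """Greedy pruning (an assumption of this sketch, not spopt's code)."""
    n = len(data)

    def cost(edge):
        i, j = edge
        return np.abs(data[i] - data[j]).sum()

    tree = minimum_spanning_tree(n, edges, cost)
    # A forest with n nodes and n - n_clusters edges has n_clusters components.
    while len(tree) > n - n_clusters:
        best = min(
            range(len(tree)),
            key=lambda k: within_ssd(data, components(n, tree[:k] + tree[k + 1:])),
        )
        tree.pop(best)  # cut the edge that leaves the most homogeneous regions
    return components(n, tree)


# Six observations arranged on a line (0-1-2-3-4-5), one attribute each.
data = np.array([[1.0], [1.2], [0.9], [5.0], [5.1], [9.0]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = toy_skater(data, edges, n_clusters=3)
print(labels)
```

Because every region is a connected piece of the spanning tree, each region is contiguous by construction.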

Parameters:
gdf : geopandas.GeoDataFrame

A GeoDataFrame containing the original data. The attribute data used for regionalization are taken from the attrs_name columns of gdf.

w : libpysal.weights.W

A PySAL weights object created from given data expressing the neighbor relationships between observations. It must be symmetric and binary, for example: Queen/Rook, DistanceBand, or a symmetrized KNN.

attrs_name : list

Strings naming the attribute columns of gdf to use.

n_clusters : int (default 5)

The number of clusters to form.

floor : int, float (default -numpy.inf)

The minimum size a region may have. The default, -numpy.inf, imposes no minimum.

trace : bool (default False)

Flag denoting whether to store intermediate labelings as the tree gets pruned.

islands : str (default 'increase')

What to do with islands. If 'ignore', the algorithm discovers n_clusters regions, and each island counts as one of those regions. If 'increase', the algorithm discovers n_clusters regions and treats the islands as separate from them, so the islands do not count toward n_clusters.

spanning_forest_kwds : dict (default dict())

Keyword arguments passed to SpanningForest, including dissimilarity, affinity, reduction, and center. See the docstrings of spopt.region.skater.SpanningForest for details.
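
To give a feel for what the dissimilarity, reduction, and center callables might do together, here is a rough sketch. It assumes that a labeling is scored by measuring each region's distances to its center, reducing them per region, and then reducing across regions; how SpanningForest combines them internally is not confirmed here. `manhattan` and `region_score` are hypothetical helpers written in plain NumPy, and affinity is not illustrated.

```python
import numpy as np


def manhattan(X, Y):
    """Pairwise Manhattan distances between rows of X and rows of Y."""
    return np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)


def region_score(data, labels, dissimilarity=manhattan,
                 reduction=np.sum, center=np.mean):
    """Hypothetical scoring helper; lower means more homogeneous regions."""
    per_region = []
    for r in np.unique(labels):
        members = data[labels == r]
        c = center(members, axis=0).reshape(1, -1)  # one row: the region center
        per_region.append(reduction(dissimilarity(members, c)))
    return float(reduction(per_region))


data = np.array([[1.0], [3.0], [10.0], [14.0]])

good = region_score(data, np.array([0, 0, 1, 1]))  # groups similar values
poor = region_score(data, np.array([0, 1, 1, 1]))  # mixes 3.0 with 10.0 and 14.0
print(good, poor)
```

Swapping `center=np.median` or `reduction=np.max` changes what counts as a homogeneous region without touching the rest of the sketch, which is the kind of flexibility these keyword arguments appear to offer.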

Attributes:
labels_ : numpy.array

Region IDs for observations.

Examples

>>> from spopt.region import Skater
>>> import geopandas
>>> import libpysal
>>> import numpy
>>> from sklearn.metrics import pairwise as skm

Read the data.

>>> pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
>>> chicago = geopandas.read_file(pth)

Initialize the parameters.

>>> w = libpysal.weights.Queen.from_dataframe(chicago)
>>> attrs_name = ['num_spots']
>>> n_clusters = 10
>>> floor = 3
>>> trace = False
>>> islands = 'increase'
>>> spanning_forest_kwds = dict(
...     dissimilarity=skm.manhattan_distances,
...     affinity=None,
...     reduction=numpy.sum,
...     center=numpy.mean
... )
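
Queen contiguity, as used above, is symmetric and binary already. A nearest-neighbor graph generally is not, because i can be j's nearest neighbor without the reverse holding. The sketch below illustrates the property with a dense NumPy adjacency matrix built from made-up coordinates. It does not use libpysal, whose weights objects have their own tools for this, and `is_symmetric_binary` is a hypothetical helper.

```python
import numpy as np

# Hypothetical 1-D coordinates for four observations.
coords = np.array([0.0, 1.0, 2.5, 10.0])
dist = np.abs(coords[:, None] - coords[None, :])
np.fill_diagonal(dist, np.inf)  # an observation is not its own neighbor
nearest = dist.argmin(axis=1)   # index of each observation's nearest neighbor

# Directed 1-nearest-neighbor adjacency: knn[i, j] = 1 if j is i's nearest.
knn = np.zeros((4, 4), dtype=int)
knn[np.arange(4), nearest] = 1


def is_symmetric_binary(a):
    """True if `a` equals its transpose and holds only 0s and 1s."""
    return bool(np.array_equal(a, a.T) and np.isin(a, (0, 1)).all())


# Keep a link whenever it exists in either direction.
sym = np.maximum(knn, knn.T)

print(is_symmetric_binary(knn), is_symmetric_binary(sym))
```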

Run the skater algorithm.

>>> model = Skater(
...     chicago, w,
...     attrs_name,
...     n_clusters,
...     floor,
...     trace,
...     islands,
...     spanning_forest_kwds
... )
>>> model.solve()

Get the region IDs for unit areas.

>>> model.labels_
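
Once region IDs are available, region sizes can be compared against floor. The sketch below assumes that size means the number of observations in a region, and it uses a made-up label array in place of model.labels_.

```python
import numpy as np

# Hypothetical stand-in for model.labels_ (ten observations, three regions).
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
floor = 3

region_ids, sizes = np.unique(labels, return_counts=True)
too_small = region_ids[sizes < floor]  # regions violating the floor, if any

print(dict(zip(region_ids.tolist(), sizes.tolist())))
print(too_small.tolist())
```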

Show the clustering results.

>>> chicago['skater_new'] = model.labels_
>>> chicago.plot(
...     column='skater_new', categorical=True, figsize=(12,8), edgecolor='w'
... )

__init__(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})

Methods

__init__(gdf, w, attrs_name[, n_clusters, ...])

solve()

Solve the optimization model.
