数学建模美赛O奖论文总结

Anil S. Damle Colin G. West Eric J. Benzel
University of Colorado–Boulder
Boulder, CO
Advisor: Anne Dougherty

Abstract

Research shows that most violent serial criminals tend to commit crimes in a radial band around a central point: home, workplace, We will give a predicting of a criminal’s spatial patterns is called geographic profiling. we assume that the offender is a ―violent serial criminal, since research suggests that serial burglars and arsonists are less likely to follow spatial patterns. We treat the single-anchor-point case first taking the spatial coordinates of the criminal’s last strikes and the sequence of the crimes as inputs. For the multiple-anchor-point case, we use a cluster-finding and sorting method

Assumptions

Domain is Approximately Urban

Entire domain is a potential crime spot
Criminal’s movement is unconstrained.
Domain contains all possible strike points.
###Developing a Serial Crime Test Set
Existing Crime Sets
The Problem with Simulation
Pixel Point Analysis

Metrics of Success

The Effectiveness Multiplier
\(\kappa=\frac{Z_1(CrimePoint)}{Z_2(CrimePoint)}\) \(\kappa_s=\frac{Z_{our\ model}(CrimePoint)}{Z_{flat}(CrimePoint)}\)

Two Schemes for Spatial Prediction

Serial crime is patterned around place of daily activity. The key is the crime center. The two scenarios are shown below

Single Anchor Point: Centroid Method

Algorithm

Create Search Domain
Constructs the smallest rectangle that contains all existing offenses, and scales each dimension three times.It meets the requirements

Find Centroid of Crime Sites
The anchor point is the average of the n crime coordinates\((x_i,y_i)\).

Build Likelihood Crater
We use cratering technique. The two dimensional crime points \(x_i\) are mapped to their radius from the anchor point \(a_i\). We have \(f:x_i→r_i\),where \(f(x_i)= \Arrowvert x_i−a_i\Arrowvert_2\) (a shifted modulus). Then using the set \(r_i\) to generate a crater around the anchor point. The following two methods can be used:

There is a buffer zone around the anchor point.
Crimes follow a decaying exponential pattern from the anchor point.

We use the gamma distribution. Define the random variable \(X_i\) to be the distance between the with crime point and the anchor point \(r\). We let each \(X_i\) have a gamma distribution with parameters \(\kappa\) and \(θ\): \(X_i ∼ Γ(\kappa, θ)\), with probability density function pdf

\(f(x;\kappa;\theta)=\frac{\theta^k}{\ulcorner(\kappa)}x^ {\kappa-1}\theta^{-\theta x}\)

Suppose \(X_i\) is independent, using the maximum likelihood estimates of \(\kappa\) and \(θ\). Use the resulting distribution to calculate possible crime locations. The pdf was evaluated for each point and normalized to give a volume of 1 under the likelihood surface.

Adjust for Temporal Trends

The outward or inward trend of \(r_i\) might indicate that the next crime will follow this trend. We let \(\stackrel{\sim}{X}=X +\overline{\Delta r}\),Where \(r=r_n-r_{n-1}\). The new random variable \(\stackrel{\sim}{X}\) Temporal adjustment in expected value:

\(E[\stackrel{\sim}{X}]=E[X +\overline{\Delta r}]=E[X]+\overline{\Delta r}\)

Results and Analysis

Analysis of three criminals by removal the final criminal data. Produce the likelihood plane \(Z(x,y)\). Then estimate the location of the final crime, and calculate the standard effect multiplier \(\kappa_s\).
For the offenders B&C, the model is relatively successful, \(\kappa_s ≈ 12\). And \(\Delta r=-0.276\), the temporal corrections in this distribution are negligible.
Since two outlier models failed for crime A (\( \kappa_s≈0.4 \)). There is a problem with the model. But the model still applies to most crimes. Unless some external influence distracts off the previous models of criminals.

Multiple Anchor Points: Cluster Method

Algorithm

We force a minimum of 2 clusters and a maximum of 4. The clustering algorithm is accomplished in a 3-step process.

Compute the Euclidean distances between all crime locations.
Organize the distances into a hierarchical cluster tree, represented by a dendrogram.
Merge the two clusters that are the closest, and continue such merging until the desired number of clusters is reached. The height is based on the distance between merged clusters at the time of merging.

To determine the optimal number of clusters, we use the notion of silhouettes. We denote by a\((P_i)\) the average distance from \(P_i\) to all other points in its cluster and by \(b(P_i,\kappa)\) the average distance from \(P_i\) to points in a different cluster \(C_k\) . Then the silhouette of \(P_i\) is

\(s(P_i)=\frac{\left[\min\limits_{\kappa|P_i\notin C_k}b(P_i,\kappa)\right] - a(P_i)}{\max\left(a(P_i), \min\limits_{\kappa|P_i\notin C_k}b(P_i,\kappa)\right)}\)

The silhouette s can take values in \([−1, 1]\): The closer \(s(P_i)\) is to \(1\), the better \(P_i\) fits into its current cluster; and the closer \(s(P_i)\) is to \(−1\), the worse it fits within its current cluster. To optimize the number of clusters, we compute the clusterings for 2, 3, and 4 clusters. We then find the maximum of the three average silhouette values.

Cluster Loop Algorithm and Combining Cluster Predictions

We compute the likelihood surface for the centroid of each cluster. We use a Gaussian distribution centered at the point as the likelihood surface, with mean the expected value of the gamma distribution placed over every anchor point of a cluster that has more than one point.
we create our final surface as a normalized linear combination of the individual surfaces, using weights for the number of points in the cluster and for the average temporal index of the events in the cluster.

Results and Analysis

Offender C: The cluster method identifies the point directly below the centroid as an outlier and therefore excludes it, which slightly reduces the variance and therefore narrowing the fit function.
Offender B: Although the actual crime point no longer appears in the band of maximum likelihood, the cluster method still outperforms the centroid method with a \(\kappa_s≈23\), for fewer resources are “wasted” at high-likelihood areas where no crime is committed.
Offender A: Since the outlier points are excluded from the centroid calculation for the larger cluster, the model bets even more aggressively on this cluster, with a resulting \(\kappa_s≈0\).

Summary

The predictions are based on the assumption of trends in serial crime behavior which has been tested on large sets of real-world data. Similar mathematical techniques are used in the anchor-point estimation solutions currently employed, which consistently outperform random guesses when tested across data samples.
The model is applicable only to violent serial criminals. Simultaneously, it has not been validated on a large set of empirical data, and cannot make use of underlying map data.