LABSS-UCSC meeting 20 Dec #71

mariopaolucci · 2018-12-21T08:40:51Z

(UCSC - Gianmaria)

"C" FUNCTION -- DESCRIPTION

Remarks on data and basic structure

The "C" function aims to provide a computational way to model the
probability that agent i will commit a crime at time t; this document
describes how the function works. Its structure relies on two types of
data source. The first is official statistics gathered from both the
Italian National Institute of Statistics (ISTAT) and the Palermo
registry office. The second is a set of systematic reviews from which
effect sizes (in different forms, e.g. odds ratios) are retrieved for
relevant factors that can explain the risk of committing an offense or
getting involved in delinquency.

The first source type (official data) provides information to:

1. distribute gender and age classes empirically across the whole
simulated population, along with education and socio-economic status
data;

2. estimate the probability of committing a crime in a given year for
all individuals, based on their gender and age class (Table 1).
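As a minimal sketch of point 1, assuming the official statistics have already been reduced to population shares per gender|age class, each simulated agent can be assigned a class by weighted sampling. The function name, class labels and shares below are hypothetical placeholders, not ISTAT or registry figures.

```python
import random
from collections import Counter


def assign_gender_age(n_agents: int,
                      class_shares: dict[tuple[str, str], float],
                      seed: int = 0) -> list[tuple[str, str]]:
    """Draw a (gender, age class) pair for each agent, weighted by its share."""
    rng = random.Random(seed)
    classes = list(class_shares)
    weights = [class_shares[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n_agents)


# Hypothetical shares for illustration only (they sum to 1).
shares = {
    ("Female", "18-39"): 0.30,
    ("Female", "40-64"): 0.20,
    ("Male", "18-39"): 0.30,
    ("Male", "40-64"): 0.20,
}
population = assign_gender_age(10_000, shares)
print(Counter(population))
```

With a fixed seed the assignment is reproducible, which helps when comparing runs that differ only in the C parameters.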

Table 1. Gender and age class probabilities of committing a crime in a
given year (source: authors' elaboration on ISTAT data)

| Gender | Age class | Probability |
|---|---|---|

These data are fundamental because they allow us to estimate the average
probability of committing a crime for each subclass of the population.
The figures have been calculated from two different datasets in the
ISTAT repository. Both refer to the Sicilian region (in the absence of
more specific geographic detail, e.g. the Palermo province): one gives
the gender and age class of all known authors of crimes in the years
2012-2016, the other the gender and age distribution of the overall
Sicilian population in the same period. Probabilities are then
calculated as the ratio of the two, and the figures provided are the
averages of these yearly ratios (the probability that a man/woman in a
given age class is a known author of a crime) across the considered
time span.
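The averaging described above can be sketched as follows for a single gender|age class. The input layout (one count per year) and the numbers in the usage example are hypothetical placeholders, not ISTAT data.

```python
from statistics import mean


def class_probability(authors_by_year: dict[int, int],
                      population_by_year: dict[int, int]) -> float:
    """Mean over years of (known crime authors / resident population)
    for one gender|age class."""
    years = sorted(authors_by_year)
    ratios = [authors_by_year[y] / population_by_year[y] for y in years]
    return mean(ratios)


# Toy example for one class over 2012-2016 (made-up counts).
authors = {2012: 120, 2013: 110, 2014: 100, 2015: 90, 2016: 80}
population = {year: 10_000 for year in authors}
print(round(class_probability(authors, population), 4))  # 0.01
```

Averaging the yearly ratios (rather than dividing total authors by total population) matches the description in the text; the two coincide only when the population is constant across years, as in this toy example.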

The additional factors retrieved from the systematic reviews, in
accordance with the theoretical structure of C, will allow us to tune
these values, additively increasing or decreasing the baseline
probability depending on whether a given characteristic is present. To
keep the structure compact and computationally affordable, we have
selected a few risk factors to test how the function works and the
emergent structure it creates within the model. The factors are
presented below:

Table 2. Risk factors for committing a crime

| Factor | Odds ratio | Coefficient¹ | Probability² | Official data to be matched |
|---|---|---|---|---|
| Unemployment | 1.30 | 0.11 | 0.57 | Yes |
| Education | 0.94 | -0.03 | 0.48 | Yes |
| Natural propensity | 1.97 | 0.29 | 0.66 | No |
| Criminal history | 1.62 | 0.21 | 0.62 | Emergent from the model |
| Criminal family | 1.45 | 0.16 | 0.59 | Emergent from the model |
| Criminal peer | 1.81 | 0.26 | 0.64 | Emergent from the model |
| OC membership | Assessed otherwise | | | Emergent from the model |
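To make the relationship between the columns explicit, the sketch below recomputes the coefficient and probability columns from the odds ratios. It assumes, as the printed values suggest, that the coefficients are base-10 logarithms of the odds ratios; for unemployment, log10(1.30) ≈ 0.11 is the value consistent with the other rows. Since logistic regression coefficients are conventionally natural logarithms (ln OR), the sketch prints both for comparison. The dictionary and function names are illustrative.

```python
import math

# Odds ratios from Table 2 (OC membership is assessed otherwise).
ODDS_RATIOS = {
    "Unemployment": 1.30,
    "Education": 0.94,
    "Natural propensity": 1.97,
    "Criminal history": 1.62,
    "Criminal family": 1.45,
    "Criminal peer": 1.81,
}


def coefficient(odds_ratio: float) -> float:
    """Base-10 log of the odds ratio (reproduces the Table 2 coefficients)."""
    return math.log10(odds_ratio)


def or_to_probability(odds_ratio: float) -> float:
    """Footnote 2: treat the OR as odds and convert it, OR / (1 + OR)."""
    return odds_ratio / (1.0 + odds_ratio)


for factor, or_value in ODDS_RATIOS.items():
    print(f"{factor:<20} log10={coefficient(or_value):+.2f} "
          f"ln={math.log(or_value):+.2f} p={or_to_probability(or_value):.2f}")
```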

These risk factors will be used in dichotomous form, because the odds
ratios gathered from the systematic reviews measure the risk of
committing a crime when being in a category (e.g. unemployed) versus not
being in that category. Therefore, at this stage we cannot make any
further assumption about how these odds change when a more fine-grained
structure is imposed (e.g. three or four classes instead of two).

Furthermore, the last column of the table provides additional
information on the origin and nature of the data at our disposal. The
distributions of unemployment and education within the population are
available (from official statistics, as already mentioned); natural
propensity cannot be modelled from data; criminal history, deviant
family ties, peer ties and OC membership emerge from the model itself
and will be generated by both C and R (the relational dimension that
will capture embeddedness in, among others, OC communities).

Specifically, the number of deviant (criminal) family ties depends on
the C function of parents/relatives, and the same applies to (criminal)
peer ties. At this stage, these two factors are also modelled as binary,
but we will look for a way to account for the intensity/frequency/
absolute number of criminal family and peer ties, in order to enrich the
model and reflect the hypothesis that the higher the number of criminal
ties (whether family or friendship), the higher the probability of
getting involved in criminal activities.
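The difference between the current binary form and the possible count-based refinement can be sketched as below. The representation is a hypothetical simplification: the set `offenders` stands for agents whose own C function has produced a crime, and family and peer ties are plain lists of agent ids.

```python
def has_criminal_tie(tie_ids: list[int], offenders: set[int]) -> int:
    """Binary form used at this stage: 1 if at least one tie is an offender."""
    return int(any(agent in offenders for agent in tie_ids))


def count_criminal_ties(tie_ids: list[int], offenders: set[int]) -> int:
    """Possible refinement: number of criminal ties (intensity)."""
    return sum(1 for agent in tie_ids if agent in offenders)


# Hypothetical data: agents 3 and 7 have committed a crime.
offenders = {3, 7}
family = [1, 3, 7]
peers = [2, 4, 5]

print(has_criminal_tie(family, offenders), count_criminal_ties(family, offenders))  # 1 2
print(has_criminal_tie(peers, offenders), count_criminal_ties(peers, offenders))    # 0 0
```

Because both factors feed back from other agents' C outcomes, they need to be recomputed at every time of reference.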

Criminal propensity (the "natural propensity" factor in Table 2), which
has the highest weight in terms of odds ratio and probability, cannot be
retrieved from real data. For this reason, we include it by assuming
that it follows a lognormal distribution. A positive random variable $X$
is log-normally distributed when its logarithm is normally distributed:

$$\ln(X)\sim\mathcal{N}\left( \mu,\sigma^{2} \right)$$

This distribution is well known and widely used in several scientific
fields, including economics: for instance, there is evidence that income
distributions usually follow a lognormal density function.³


In our case, the lognormal distribution allows us to distribute criminal
propensity in accordance with the assumption that most of the population
will not commit a crime and lacks the "intrinsic" characteristics to
offend: the magnitude of this propensity will depend on the parameters
to be tested (i.e. the mean $\mu$ and standard deviation $\sigma$ of
$\ln(X)$).
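A minimal sketch of how propensity could be drawn for each agent, assuming Python's standard `random.lognormvariate(mu, sigma)`, whose parameters are the mean and standard deviation of ln(X). The values mu = 0 and sigma = 1 are placeholders, not calibrated parameters.

```python
import random
import statistics


def sample_propensity(n: int, mu: float, sigma: float,
                      seed: int = 42) -> list[float]:
    """Draw n criminal-propensity values with ln(X) ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Placeholder parameters: mean and standard deviation of ln(X), not of X.
mu, sigma = 0.0, 1.0
propensity = sample_propensity(100_000, mu, sigma)

mean_x = statistics.fmean(propensity)
median_x = statistics.median(propensity)
share_below_mean = sum(x < mean_x for x in propensity) / len(propensity)

# Right skew: the median (about exp(mu)) sits below the mean
# (about exp(mu + sigma**2 / 2)), so most agents fall below the average.
print(f"median={median_x:.2f} mean={mean_x:.2f} "
      f"share below mean={share_below_mean:.2f}")
```

With these placeholder values the median is close to 1, the mean close to e^0.5 ≈ 1.65, and roughly 69% of agents have a propensity below the mean, which is the skewed shape the assumption relies on. Larger sigma values make the tail heavier.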

The Logistic Function and the Population Average Constraint

To derive the probability of committing a crime, we then fit a logistic
regression model for each individual in the classic form:

$$y_{it} = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \ldots + \beta_{n}X_{n} + \varepsilon, \qquad y_{it} \in \left\lbrack 0,1 \right\rbrack$$

where the outcome variable is C, the probability that individual i
commits a crime at time t, and the β coefficients are those reported in
Table 2. As pointed out before, the intercept $\beta_{0}$ is the
baseline value calculated from the gender and age class of subject i,
to which the other risk factors are added when they occur. An error term
is included to control for the potential explosion of the right-hand
side of the equation: since our odds ratios are independent of one
another, their sum may yield a value that exceeds a probability of 1. In
this sense the error acts as a normalization term that prevents the
function from returning invalid outcomes. The error term is also tied to
the need to bind individual probabilities of committing a crime to the
population average. Indeed, at each time of reference (still to be
decided: a year? a month?), the following equation should hold:

$$C_{gender|age} \approx \frac{\sum_{i = 1}^{n_{gender|age}}\left( \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \ldots + \beta_{n}X_{n} + \varepsilon \right)}{n_{gender|age}}$$

The equation means that, at each time of reference, the average
probability of committing a crime across all individuals in the same
gender|age class should be approximately equal to the fixed average
values in Table 1, where "approximately" means that the model is allowed
to float within a 95% confidence interval, so as not to impose overly
deterministic mechanics.
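The text does not specify how the 95% interval is built. A minimal sketch, assuming the normal approximation for a proportion (target ± 1.96·sqrt(target·(1 − target)/n)) around the Table 1 value of a class; the function name and the toy numbers are hypothetical.

```python
import math
from statistics import fmean


def within_class_interval(individual_probs: list[float], target: float,
                          z: float = 1.96) -> bool:
    """Return True if the class-average probability lies inside an
    approximate 95% interval around the Table 1 target.

    Assumption: normal approximation for a proportion,
    target +/- z * sqrt(target * (1 - target) / n).
    """
    n = len(individual_probs)
    if n == 0:
        raise ValueError("empty gender|age class")
    half_width = z * math.sqrt(target * (1.0 - target) / n)
    return abs(fmean(individual_probs) - target) <= half_width


# Toy check for one class with a hypothetical target of 0.05 and 1,000 agents.
print(within_class_interval([0.055] * 1_000, 0.05))  # True
print(within_class_interval([0.080] * 1_000, 0.05))  # False
```

Other interval definitions, for example one based on the year-to-year variability of the 2012-2016 ratios, would be equally compatible with the constraint as stated.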

This is the function to be fitted for individuals belonging to most
gender|age classes. There are, however, four exceptions: (Female|<13),
(Female|>65), (Male|<13) and (Male|>65). In these four cases the simple
probability based on gender and age class is sufficient to model the
risk of committing a crime, with a stochastically distributed error term
added so that these low probabilities float and strict determinism is
avoided. The decision rests on the assumption that none of the risk
factors retrieved from the literature plays a role in crime commission
when individuals are either too young or too old.
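The per-individual computation can be sketched as below, under loudly stated assumptions: the error is Gaussian with a chosen standard deviation, and clipping to [0, 1] stands in for the normalization role the text gives the error term. For comparison, a second function shows the standard logistic link, logit(p) = logit(baseline) + Σ ln OR, which keeps the output in (0, 1) without clipping; this is a swapped-in alternative, not the additive form written above. Age-class labels such as "18-24" and all function names are hypothetical.

```python
import math
import random

# Hypothetical labels for the age classes excluded from the risk-factor model.
EXCEPTION_AGE_CLASSES = {"<13", ">65"}


def crime_probability_additive(baseline: float, age_class: str,
                               risk_factors: dict[str, bool],
                               coefficients: dict[str, float],
                               noise_sd: float,
                               rng: random.Random) -> float:
    """Additive form: beta_0 + coefficients of present factors + error.

    Exception classes get only baseline + error. The result is clipped
    to [0, 1] (an assumption standing in for the normalization term).
    """
    p = baseline + rng.gauss(0.0, noise_sd)
    if age_class not in EXCEPTION_AGE_CLASSES:
        p += sum(coefficients[f] for f, present in risk_factors.items() if present)
    return min(1.0, max(0.0, p))


def crime_probability_logistic(baseline: float,
                               risk_factors: dict[str, bool],
                               ln_odds_ratios: dict[str, float]) -> float:
    """Logistic-link alternative: coefficients must be natural logs of the ORs."""
    logit = math.log(baseline / (1.0 - baseline))
    logit += sum(ln_odds_ratios[f] for f, present in risk_factors.items() if present)
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(1)
factors = {"Unemployment": True, "Criminal peer": False}
base10 = {"Unemployment": 0.11, "Criminal peer": 0.26}  # Table 2 coefficients
ln_or = {"Unemployment": math.log(1.30), "Criminal peer": math.log(1.81)}

print(crime_probability_additive(0.02, "18-24", factors, base10, 0.0, rng))
print(crime_probability_additive(0.02, "<13", factors, base10, 0.0, rng))
print(crime_probability_logistic(0.02, factors, ln_or))
```

With a hypothetical baseline of 0.02, the additive form adds the full coefficient (giving 0.13 for an unemployed agent), while the logistic form multiplies the baseline odds by 1.30 and yields about 0.026; the choice between the two has a large effect on how strongly the risk factors move individuals away from the Table 1 averages.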

Footnotes

¹ The coefficient is calculated as the logarithm of the odds ratio; the
values in Table 2 correspond to the base-10 logarithm.

² Calculated in the standard way as OR/(1+OR).

³ Fabio Clementi & Mauro Gallegati, 2005. "Pareto's Law of Income
Distribution: Evidence for Germany, the United Kingdom, and the United
States," Microeconomics 0505006, University Library of Munich, Germany.
