George Rebane
Today the Justice Department releases new regulatory guidelines on which federal agencies may do profiling and under which circumstances. The report has been six months in the making and we are told its release in such close proximity to the events in Ferguson and Staten Island is a coincidence. AG Holder will travel around the country explaining the new regulations to various police jurisdictions which, he hopes, will also incorporate them into their enforcement protocols (more here).
Profiling has gotten a bad name over the years, even though all of us do it constantly in our daily round, yet very few understand what profiling really involves and why its use is important. Profiling is the lay term for an important function in the field of pattern recognition. A pattern in a dataset is a collection of data items that share some common meaning important to the pattern recognizer, for example, who would most likely take a cruise next summer. So in this case the data item would be a person with a bunch of attributes like age, gender, marital status, income level, zip code, and so on. Each data item or person in the dataset could be represented by, say, 20 to 50 different attributes, each taking on various values.
If each data item had only two attributes, then we could plot the dataset on x-y graph paper where each axis represented an attribute. The dataset would then be a collection of dots representing the attribute values which determined its coordinates on the plot. We could even visualize that with each data item or person being represented by three attributes. The collection of points representing the dataset would now be in x-y-z coordinates or 3-space which we could examine from various perspectives.
For more than three dimensions we humans have trouble visualizing the dataset, but conceptually we should have no problem imagining how the data items may distribute themselves. Fortunately, our mathematics that deals with such higher dimensioned data items has no problem at all with such added dimensions.
So if we look at a dataset of potential cruise customers in order to know which ones to send advertisements or even call so as to maximize the likely number of positive respondents, we might start by first looking at our dataset, or even another dataset of people who have already bought cruises. From there we attempt to see which of their attributes fall into what ranges of values. The results of such an analysis yield what is called a "discriminant", which usually defines a hopefully compact volume of data points in attribute space. Our assumption will be that people (data points) falling into the discriminated volume form a pattern of likely cruise customers. We then take this discriminant and apply it to our dataset, and perhaps to other datasets we may buy, to come up with a mailing list (pattern) of people to receive our cruise adverts.
If we think about it a bit, discriminants are not all created equal. The better discriminants consist of fewer attributes that are arrayed in tighter ranges of values. That lets us efficiently apply them to various datasets each having overlapping but not identical sets of attributes. An appropriately sparse discriminant will then be applicable to most datasets we encounter. A great discriminant might consist of AGE > 60, MARITAL = married, INCOME > $100K. Then we could cull our datasets using only this discriminant and be sure of a high hit rate for our cruise pitch. We have essentially found a useful pattern, also called a profile, of potential customers. And now when we encounter another person, we can apply this profile and quickly determine whether we should pay further attention to him/her. This kind of profiling is done by government jurisdictions, corporations, and all of us to some degree or other every hour of every day.
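For readers who like to see such things in code, here is a minimal Python sketch of applying the example discriminant (AGE > 60, MARITAL = married, INCOME > $100K) to a dataset. The `Person` record, the `matches_profile` function, and the five sample people are all hypothetical, invented only for illustration; a real dataset would carry many more attributes.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """One data item; only the three attributes the discriminant needs."""
    age: int
    marital: str
    income: float  # annual income in dollars


def matches_profile(p: Person) -> bool:
    """Apply the example discriminant: AGE > 60, MARITAL = married, INCOME > $100K."""
    return p.age > 60 and p.marital == "married" and p.income > 100_000


# Hypothetical records, made up for this sketch.
people = [
    Person(67, "married", 140_000),  # inside the discriminated volume
    Person(45, "married", 180_000),  # fails on age
    Person(72, "single", 120_000),   # fails on marital status
    Person(63, "married", 95_000),   # fails on income
    Person(61, "married", 100_001),  # just inside on every attribute
]

# The resulting pattern: the people who would receive the cruise advert.
mailing_list = [p for p in people if matches_profile(p)]
print(len(mailing_list), "of", len(people), "people fit the profile")
```

Because the discriminant touches only three attributes, the same function would work on any dataset that happens to record age, marital status, and income, which is the point about sparse discriminants made above.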
The kind of profiling that has gotten a bad name is what law enforcement agencies do to maximize the effectiveness of their limited resources of people, time, and funding. To simplify the discussion, officers of the law have to find bad guys and potential bad guys while they are "in the field" on routine patrol or on a specific mission. The opportunity for capture is usually fleeting, and decisions on how to proceed must often be made quickly.
Referring to the figure below, we can see what cops are up against. Assume that the entire population from which bad guys come is represented by the outside blue rectangle. This population further divides itself along a single attribute into the tan and gray rectangles. The red rectangle represents the population of bad guys. The various areas denote the proper proportions of how many people are in each subgroup. Past data (arrests, convictions, crime reports, ...) show that the red rectangle should be oriented as shown to correctly represent the proportion of bad guys between the tan and gray subpopulations. Looking at the red rectangle, it is clear that there are about four times as many bad guys having the gray attribute as those who have the tan attribute.
Given this data, it should be apparent that, when using the tan/gray attribute in a profile, it would behoove the police to assign a higher probability of being a bad guy to a suspicious person who has the gray attribute, all other things being equal. In other words, the probability of getting suspicious evidence from a gray attribute person is higher than that from a tan attribute person. It is for this reason that, with limited resources, cops have not ignored any such useful attribute when tracking down bad guys. However, allocating resources in such a manner is also called profiling, and especially the bad kind of profiling if the tan/gray attribute is politically incorrect.
Things would be nicer if another, politically less sensitive attribute could be found that had the same discriminating power; then, of course, it should and would be used. However, if that is not possible, then the only choices are to profile using the tan/gray attribute or to abandon that and allocate resources evenly across the entire (blue rectangle) population. Allocating evenly across the entire population would emphasize tan over gray, also by about a four to one ratio, exactly the opposite of what the crime data would indicate. (more here and here)
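The two four-to-one ratios above can be checked with a little arithmetic. The Python sketch below uses made-up head counts, chosen only so that the proportions match the verbal description of the rectangles; none of these numbers come from the figure or from any real dataset, and the variable names are likewise my own.

```python
# Hypothetical head counts, invented only to reproduce the stated proportions.
population = {"tan": 8000, "gray": 2000}  # the blue rectangle, split in two
bad_guys = {"tan": 20, "gray": 80}        # the red rectangle, split the same way

# Ratio of gray to tan among the bad guys (the "about four times" claim).
bad_guy_ratio = bad_guys["gray"] / bad_guys["tan"]

# Under even allocation, attention falls on each subgroup in proportion to
# its size, so the emphasis on tan relative to gray is the population ratio.
even_allocation_ratio = population["tan"] / population["gray"]

print("gray : tan among bad guys =", bad_guy_ratio)
print("tan : gray under even allocation =", even_allocation_ratio)
```

With these assumed counts both ratios come out to four, one favoring gray and the other favoring tan, which is the mismatch described above. Different counts would give different ratios.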
AG Holder apparently also realizes this dilemma and has exempted the DHS, ICE, and TSA from the new federal anti-profiling regulation. Yet the nation's police departments now stand indicted by the mobs, and Mr Holder will be touring the country attempting to convince them to relinquish their own profiling procedures. The impact of this on crime rates is completely predictable, else the man would not be making exceptions for the mentioned federal agencies.

