Where (not) to park your car in Israel?
How to take generic tabular open data about crime and use location technology to analyze and view where, and why, vehicles are stolen.
In the previous post on analyzing crime data in Israel, I explained how to take a text table and turn it into a map using a geographic information system (GIS).
In this post, we focus on one specific type of crime: we analyze and map vehicle theft in Israel, try to identify trends in car theft in certain areas, and speculate on why it happens there.
This post is an expanded version of an article (in Hebrew) that we wrote for a real estate magazine in Israel.
Crime data matters to the public. For example, it can help set the price of home or business insurance, show where the safer neighborhoods are, reveal where agricultural crime occurs, and more.
Following a government decision in 2016 and numerous requests from the public, the Israeli Police opened the crime data in Israel for 2016-2022 to the public.
By analyzing this data, it is possible to gain interesting insights into the quantity, dispersion, and mapping of crimes by type.
Or where (not) to park your car.
The previous post explained how to prepare the data for processing, clean it up, and load it into a database and a geographic information system to display it visually on a map.
In this post we focus on analyzing a specific cross-section of the crime data:
Is there a relationship between location and car theft? Are there certain neighborhoods where there are more thefts than others? Why?
Once all the data has been cleaned and loaded into the database (for those who have not read the previous post, here is how), we filter it by the desired period and type of crime. You can also filter by other fields, such as city, if desired.
In this case, we will filter only the data for 2021 and the crimes related to car theft.
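As a minimal sketch of this filter, the snippet below uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL database used in this post. The table name `crimes`, its column names, and the sample rows are hypothetical and invented purely for illustration; the real police table uses different field names.

```python
import sqlite3

# Hypothetical schema and toy rows for illustration only; not real police data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE crimes (
        year INTEGER,
        city TEXT,
        stat_area TEXT,
        crime_category TEXT,
        crime_type TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?, ?, ?, ?)",
    [
        (2021, "City A", "A-001", "property", "car theft"),
        (2021, "City A", "A-002", "property", "theft from a vehicle"),
        (2021, "City B", "B-001", "property", "car theft"),
        (2020, "City B", "B-001", "property", "car theft"),
        (2021, "City B", "B-002", "public order", "vandalism"),
    ],
)


def filter_crimes(connection, year, crime_type, city=None):
    """Return rows for one year and crime type, optionally limited to one city."""
    sql = (
        "SELECT year, city, stat_area, crime_type FROM crimes "
        "WHERE year = ? AND crime_type = ?"
    )
    params = [year, crime_type]
    if city is not None:
        sql += " AND city = ?"
        params.append(city)
    sql += " ORDER BY city, stat_area"
    return connection.execute(sql, params).fetchall()


car_thefts_2021 = filter_crimes(conn, 2021, "car theft")
car_thefts_city_b = filter_crimes(conn, 2021, "car theft", city="City B")
print(car_thefts_2021)
```

The same WHERE clause should carry over to PostgreSQL or to a QGIS filter expression with only minor syntax changes.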
Open data usually comes with little explanation of the data itself (metadata).
So one of the challenges, besides cleaning up the data, is learning the data on your own.
In the case of crime data, if you want to filter only specific crime types, you can learn that there are two relevant fields: type of crime and crime category.
If you want to get statistics and details, you can use a “Group by” query together with “Count” to get a list of unique values with the number of records for each one.
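Here is a small sketch of such a query, again run through Python's sqlite3 module rather than PostgreSQL. The column names `crime_category` and `crime_type` and the rows are assumptions made up for the example, not the actual field names or values in the police data.

```python
import sqlite3

# Hypothetical columns and toy rows; the real table has its own field names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crimes (crime_category TEXT, crime_type TEXT)")
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?)",
    [
        ("property", "car theft"),
        ("property", "car theft"),
        ("property", "car theft"),
        ("property", "theft from a vehicle"),
        ("property", "theft from a vehicle"),
        ("public order", "vandalism"),
    ],
)

# One output row per unique (category, type) pair, with the number of records.
type_counts = conn.execute(
    """
    SELECT crime_category, crime_type, COUNT(*) AS cases
    FROM crimes
    GROUP BY crime_category, crime_type
    ORDER BY cases DESC, crime_type
    """
).fetchall()

for category, crime_type, cases in type_counts:
    print(f"{category:<14}{crime_type:<24}{cases}")
```

Listing both fields side by side is a quick way to see which types sit under each category before deciding which field to filter on.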
The category field is a grouping of crime types and it is not always suitable if you want a specific crime (such as car theft).
Therefore, we will only use the crime type field and choose a suitable type of crime.
The crime type definition is sometimes subjective. For example, should the “theft from a vehicle” type be included?
We chose to focus on one type that seems unambiguous: “car theft”.
The filter or selection itself can be done with an SQL query in your favorite tool (any database, QGIS, ArcGIS, or even Excel).
PostgreSQL was my choice.
With SQL alone, the selected data can only produce numerical statistics, such as the total number of thefts (SUM), the number of records (COUNT), and which city had the most or fewest thefts (MAX/MIN).
But if you want to visually analyze and understand where this happens and why, a map is the best way to do it. In a GIS, you can do both.
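The numerical side can be sketched as follows, using Python's sqlite3 module in place of PostgreSQL. The table `car_thefts`, its columns, and all the numbers are hypothetical and exist only to show how SUM, COUNT, MAX, and MIN fit together.

```python
import sqlite3

# Hypothetical table: one row per statistical area with a made-up case count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_thefts (city TEXT, stat_area TEXT, cases INTEGER)")
conn.executemany(
    "INSERT INTO car_thefts VALUES (?, ?, ?)",
    [
        ("City A", "A-001", 12),
        ("City A", "A-002", 3),
        ("City B", "B-001", 7),
        ("City C", "C-001", 1),
        ("City C", "C-002", 2),
    ],
)

# Total thefts (SUM) and number of statistical-area records (COUNT).
total_cases, area_count = conn.execute(
    "SELECT SUM(cases), COUNT(*) FROM car_thefts"
).fetchone()

# Thefts per city, highest first, so the first and last rows are the extremes.
per_city = conn.execute(
    """
    SELECT city, SUM(cases) AS total
    FROM car_thefts
    GROUP BY city
    ORDER BY total DESC, city
    """
).fetchall()

# MAX and MIN of the per-city totals, computed over a subquery.
highest, lowest = conn.execute(
    """
    SELECT MAX(total), MIN(total)
    FROM (SELECT SUM(cases) AS total FROM car_thefts GROUP BY city)
    """
).fetchone()

print(total_cases, area_count, per_city, highest, lowest)
```

These figures say how much, but not where within a city, which is the gap the map fills.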
The next step is joining the car theft data to the geographic statistical areas layer and loading the information onto the map.
The process is not as simple as it sounds; for a detailed explanation, read the previous post.
The data can be colored according to the number of cases. It is also important to normalize by population size (with the help of information from the Central Bureau of Statistics) so that the measure is reliable.
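To illustrate why normalization matters, here is a minimal Python sketch that joins theft counts to population figures by statistical area code and computes thefts per 1,000 residents. The area codes, counts, population numbers, and the per-1,000 scale are all assumptions invented for the example; they are not Central Bureau of Statistics figures.

```python
# Hypothetical inputs keyed by statistical area code; all numbers are made up.
thefts_by_area = {"A-001": 12, "A-002": 3, "B-001": 7}
population_by_area = {"A-001": 6000, "A-002": 500, "B-001": 0, "B-002": 2500}


def thefts_per_1000(thefts, population):
    """Join the two tables on area code and return thefts per 1,000 residents.

    Areas without recorded thefts get a rate of 0.0. Areas with no residents
    (for example industrial zones) get None, since a rate is undefined there.
    """
    rates = {}
    for area, residents in population.items():
        cases = thefts.get(area, 0)
        if residents > 0:
            rates[area] = round(cases * 1000 / residents, 2)
        else:
            rates[area] = None
    return rates


rates = thefts_per_1000(thefts_by_area, population_by_area)
for area, rate in sorted(rates.items()):
    print(area, rate)
```

In this toy data, A-001 has four times as many cases as A-002, yet A-002 has the higher rate once population is taken into account, which is exactly the kind of distortion that coloring by raw counts would hide.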
This is what a map of car theft looks like throughout Israel by locality in 2021: