I - Discretization Testing:

In this part, we test the performance of the home localization method proposed by the paper by computing the distance between the localized homes and the known homes of users. We use a different dataset from the one used in the paper, because the paper's dataset has no labeled check-ins.

1 - True Home Location:

We first select the users whose home check-ins have low variance. For each user, we compute the mean and standard deviation of the latitude and longitude of the check-ins labeled as Home. We then construct two points by respectively subtracting the standard deviation from, and adding it to, the mean latitude and longitude. If the distance between these two points is smaller than 100 m, we can be confident about the home location.
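
The check can be sketched as follows. This is a minimal illustration rather than the exact code used for the analysis: the function names, the use of the haversine formula for the distance, and the choice of the population standard deviation (so that a single check-in yields a spread of zero) are our assumptions.

```python
import math
from statistics import mean, pstdev

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def home_spread_km(home_checkins):
    """Distance between (mean - std) and (mean + std) of a user's Home check-ins."""
    lats = [lat for lat, _ in home_checkins]
    lons = [lon for _, lon in home_checkins]
    lat_mu, lon_mu = mean(lats), mean(lons)
    # Assumption: population standard deviation, which is defined for one point.
    lat_sd, lon_sd = pstdev(lats), pstdev(lons)
    return haversine_km(lat_mu - lat_sd, lon_mu - lon_sd, lat_mu + lat_sd, lon_mu + lon_sd)


def is_confident_home(home_checkins, max_spread_km=0.1):
    """True when the spread of the Home check-ins is below the threshold (100 m)."""
    return home_spread_km(home_checkins) < max_spread_km
```

Users for whom `is_confident_home` returns `False` would be dropped from the test set.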

2 - Discretization:

The discretization method proposed by the paper has the following steps:

We use this method to find home locations of the users of the new dataset.
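
A minimal sketch of a grid-based home inference, assuming the method is the one commonly described for the Friendship and Mobility paper: divide the world into cells of roughly 25 by 25 km, pick the cell with the most check-ins, and average the check-ins inside it. The cell size, the degree-based approximation of the cell edges, and the function name `infer_home` are assumptions made for illustration.

```python
import math
from collections import defaultdict

CELL_KM = 25.0          # assumed cell edge length
KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def infer_home(checkins, cell_km=CELL_KM):
    """Return the mean (lat, lon) of the check-ins in the busiest grid cell."""
    # Assumption: cells are squares in degree space, so they get narrower
    # east-west at high latitudes. A projected grid would avoid this.
    cell_deg = cell_km / KM_PER_DEGREE
    cells = defaultdict(list)
    for lat, lon in checkins:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))
    busiest = max(cells.values(), key=len)
    n = len(busiest)
    return (sum(lat for lat, _ in busiest) / n, sum(lon for _, lon in busiest) / n)
```

A check-in far away from the busiest cell does not affect the result, since only the points inside that cell are averaged.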

3 - Testing:

We start by computing the distance between the true home location and the home location provided by discretization. Then, we visualize the distribution of these distances.

We see that the distribution follows a power law that can be approximated by:

This indicates that, for some users, the predicted home location lies orders of magnitude farther from the actual home location than for most others.
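
A power-law approximation of this kind can be estimated from (distance, probability) pairs. The sketch below is an assumption on our part and not necessarily the procedure used for the approximation reported here: it fits a straight line by least squares in log-log space, which is simple but can be biased, so it is best read as a first approximation. The name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y ~ c * x**(-alpha) by least squares on (log x, log y).

    Returns (c, alpha). Pairs with non-positive values are skipped,
    since their logarithm is undefined.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```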

Moreover, the CDF shows that achieving high accuracy requires tolerating large distances, on the order of thousands of kilometers.

In fact, the paper claims that its method reaches 85% accuracy, as checked through manual verification. Nevertheless, we see that this does not hold for the Foursquare dataset: to reach the same accuracy, we need to tolerate distances of up to 8,305 km, which exceeds the size of Europe.
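
The tolerance needed for a given accuracy can be read off the sorted errors. A minimal sketch, assuming that accuracy means the fraction of users whose predicted home lies within the tolerated distance of the true home; the function names are hypothetical.

```python
import math


def accuracy_at(errors_km, tolerance_km):
    """Fraction of users whose home is predicted within tolerance_km."""
    return sum(e <= tolerance_km for e in errors_km) / len(errors_km)


def tolerance_for(errors_km, target=0.85):
    """Smallest observed error such that at least `target` of the users fall within it."""
    ordered = sorted(errors_km)
    # Number of users that must be within the tolerance (at least one).
    k = max(math.ceil(target * len(ordered)), 1)
    return ordered[k - 1]
```

Evaluating `tolerance_for` on the per-user errors of each method gives directly comparable figures.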

In the following section, we use machine learning methods to find the homes of users, and we compare our solution with the method proposed by the paper in terms of performance on the same dataset.

II - Home Location Prediction:

1 - Method:

i - Feature engineering:

To predict users’ home locations, we based our work on the paper "Fine-scale prediction of people's home location using social media footprints" by H. Kavak et al. The authors used DBSCAN, an unsupervised density-based clustering algorithm that groups points which can be connected to each other within a given radius. The clustering is performed separately on each user’s check-ins. This plays the same role as the discretization approach of the Friendship and Mobility paper.
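
To make the clustering step concrete, here is a small, quadratic-time DBSCAN written from the textbook description of the algorithm. It is an illustrative sketch, not the implementation used by the authors or in our experiments (a library implementation would normally be preferred); the `eps_km` and `min_pts` parameters and the helper names are assumptions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) tuples in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels
```

Running `dbscan` on one user's check-in coordinates yields that user's clusters, with isolated check-ins marked as noise.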

The authors then created the following mobility features for each cluster:

After creating these features for each cluster, we label, for each user, the cluster with the most Home tags as the home cluster, whereas the remaining ones are labeled as not home.
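
The labeling rule can be sketched as below, assuming each check-in carries a cluster id (with -1 for noise, as is conventional for DBSCAN output) and a flag telling whether it is tagged as Home. The function name and the tie-breaking rule (the first cluster encountered wins) are our assumptions.

```python
from collections import Counter


def label_home_cluster(cluster_ids, is_home_tag):
    """Map each cluster id of one user to 1 (home) or 0 (not home)."""
    # Count Home-tagged check-ins per cluster, ignoring noise points.
    home_counts = Counter(
        c for c, tagged in zip(cluster_ids, is_home_tag) if tagged and c != -1
    )
    clusters = sorted({c for c in cluster_ids if c != -1})
    if not home_counts:
        # No Home tag falls in any cluster: nothing can be labeled as home.
        return {c: 0 for c in clusters}
    home = max(home_counts, key=home_counts.get)
    return {c: int(c == home) for c in clusters}
```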

ii - Classification:

iii - Prediction testing:

We see that our prediction does not perform as expected. Indeed, the probability of large distances is quite high, as the plot below shows.

It even performs worse than the method proposed by the paper: to reach 85% accuracy, we must tolerate distances of up to 12,977 km.

III - Check-in Patterns Between Friends:

In this part, we look for meeting patterns between friends. We say that two friends have met if they checked in at the same place at most one hour apart.
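
This definition of a meeting can be sketched as follows. The input layout (tuples of user, venue and timestamp, plus a set of friend pairs) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import timedelta


def find_meetings(checkins, friendships, max_gap=timedelta(hours=1)):
    """Return (user1, user2, venue, time1, time2) for every meeting between friends.

    checkins: iterable of (user, venue, timestamp) with datetime timestamps.
    friendships: set of frozenset({user1, user2}) pairs.
    """
    by_venue = defaultdict(list)
    for user, venue, ts in checkins:
        by_venue[venue].append((ts, user))

    meetings = []
    for venue, visits in by_venue.items():
        visits.sort()  # chronological order within the venue
        for i, (t1, u1) in enumerate(visits):
            for t2, u2 in visits[i + 1:]:
                if t2 - t1 > max_gap:
                    break  # later visits are even further apart in time
                if u1 != u2 and frozenset((u1, u2)) in friendships:
                    meetings.append((u1, u2, venue, t1, t2))
    return meetings
```

Sorting the visits per venue lets the inner loop stop as soon as the time gap exceeds one hour, which avoids comparing every pair of check-ins.
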
We will do three tasks in this part:

1 - Check-in Patterns and Distribution:

We begin by computing the probability distribution for two different datasets: one that takes all check-ins into account, and one that contains only the check-ins made with friends. In both cases, we only study users who checked in at least once at their home, and we assume that their home is located at the average of their check-ins labeled Home (private). We plot (log-log) the distribution for both datasets as a function of the distance from home, and we try to describe the case where a user only moves to meet friends with a function of the type:

We now move to the visualisation.
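
For a log-log plot, one common choice is to group the distances into logarithmically spaced bins, so that the sparse tail is not dominated by empty bins. A minimal sketch under that assumption; the bin parameters and the function name are hypothetical and not necessarily those used for our plots.

```python
import math


def log_binned_density(distances_km, bins_per_decade=4, min_km=0.1):
    """Return (bin_center_km, probability_density) pairs over logarithmic bins.

    Distances below min_km are discarded, since they have no place on a
    logarithmic axis starting at min_km.
    """
    kept = [d for d in distances_km if d >= min_km]
    counts = {}
    for d in kept:
        b = math.floor(math.log10(d / min_km) * bins_per_decade)
        counts[b] = counts.get(b, 0) + 1

    density = []
    for b in sorted(counts):
        lo = min_km * 10 ** (b / bins_per_decade)
        hi = min_km * 10 ** ((b + 1) / bins_per_decade)
        center = math.sqrt(lo * hi)  # geometric mean of the bin edges
        # Divide by the bin width so that bins of different sizes are comparable.
        density.append((center, counts[b] / (len(kept) * (hi - lo))))
    return density
```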

From the plot above, we can draw some conclusions:

2 - Place Check-in Patterns:

i - Day of the Week Check-in Patterns

ii - Period of the Day Check-in Patterns: