Inferring Context of Mobile Data Crowdsensed in the Wild


Understanding the sensing context of raw data is crucial for assessing the quality of large crowdsourced spatio-temporal datasets. An accelerometer's precision can vary considerably depending on whether the phone is in-pocket or out-pocket, i.e., held in hand [1]. GPS accuracy can be very low in places such as underground metro stations [2]. Further, jump lengths are shorter and more frequent when a person is indoors. Hence, we focus on the in/out-pocket, under/over-ground, and in/out-door contexts, which can be essential for reliably inferring human mobility attributes and properties (e.g., location, jump length, and mobility activity such as walking or driving) from crowdsensed data. Our work is motivated by the fact that most publicly available crowdsensing datasets (e.g., PRIVA'MOV [3] and the Beijing taxi dataset [4]) do not include data from the specialized sensors (light, barometer, etc.) that state-of-the-art algorithms rely on to detect these contexts. Therefore, we focus on mining context from the limited features available in publicly available mobility-related crowdsensing datasets. Moreover, as ground truth is typically not available in these datasets, we pay special attention to minimizing the training or tuning effort of the introduced algorithms. Our algorithms are unsupervised binary classifiers with a small memory footprint and short execution time. As the lack of certain features prevents us from using state-of-the-art algorithms as baselines, we compare the performance of our heuristic algorithms against Machine Learning (ML) models built by an AutoML tool [5] using the same set of features. Our experimental evaluation on a segment of the Ambiciti [6] dataset demonstrates that, compared to the best baseline ML model w.r.t. balanced accuracy (see Table I), our algorithm for the in/out-pocket context performs equally well, while for the under/over-ground and in/out-door contexts, for a specific hyper-parameter, our corresponding algorithms are within 4.3% and 1% of it, respectively. Our algorithms require 0 kB, 4 kB, and 0 kB of memory, respectively, and take 0.08 s, 0.17 s, and 0.003 s, respectively, to execute. They are lightweight enough to be integrated into smartphone applications. Context information mined onboard thus remains private and can be used to annotate users' personal trajectories and to incentivize them to participate in crowd-measurement campaigns.
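
For readers unfamiliar with the metric used in the comparison above, balanced accuracy for a binary classifier is the mean of the recall on each of the two classes, which makes it less sensitive to class imbalance than plain accuracy. The following is a minimal sketch of that standard definition only; the function name and the toy labels are illustrative assumptions and are not taken from the Ambiciti dataset or from the algorithms described here.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels encoded as 0/1.

    Illustrative helper (hypothetical name), not the authors' evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    recall_pos = tp / (tp + fn)  # recall on class 1
    recall_neg = tn / (tn + fp)  # recall on class 0
    return (recall_pos + recall_neg) / 2


# Toy, made-up labels: 1 could stand for "in-pocket", 0 for "out-pocket".
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1]  # a classifier that always predicts class 1

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On this imbalanced toy example, a classifier that always predicts the majority class reaches 80% plain accuracy but only 50% balanced accuracy, which is why the latter is a reasonable basis for comparing binary context classifiers.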

In NetMob