Complex Pattern Mining, 1st ed., Annalisa Appice, Michelangelo Ceci, Corrado Loglisci, Giuseppe Manco, Elio Masciari, Zbigniew W. Ras, 2020

Studies in Computational Intelligence 880

Annalisa Appice · Michelangelo Ceci · Corrado Loglisci · Giuseppe Manco · Elio Masciari · Zbigniew W. Ras, Editors

Complex Pattern Mining: New Challenges, Methods and Applications

Studies in Computational Intelligence, Volume 880

Series Editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series "Studies in Computational Intelligence" (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.

The books of this series are submitted to indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/7092

Editors
Annalisa Appice, Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci, Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, Bari, Italy
Corrado Loglisci, Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, Bari, Italy
Giuseppe Manco, ICAR-CNR, Rende, Italy
Elio Masciari, Università degli Studi di Napoli Federico II, Naples, Italy
Zbigniew W. Ras, Department of Computer Science, University of North Carolina, Charlotte, NC, USA

ISSN 1860-949X, ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-36616-2, ISBN 978-3-030-36617-9 (eBook)
https://doi.org/10.1007/978-3-030-36617-9

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Complex pattern mining provides concepts and techniques to process the huge volumes of data with a complex structure that are nowadays gathered in various applications. These massive and complex data
pose new challenges for current research in Knowledge Discovery and Data Mining. They require new theory and design methods for storing, managing, and analyzing them, taking into account various aspects of complexity. These include complex structures (e.g., multi-relational data, time series and sequences, networks, and trees) as input/output of the data mining process. They also include massive amounts of high-dimensional data that arrive as high-speed streams and require (near) real-time processing and model adaptation to concept drift. Finally, there are new application scenarios involving security issues, interaction with other entities, and real-time response to events triggered by sensors.

Recent literature reports plentiful efforts in this research area, with significant breakthroughs. In terms of scientific research, complex pattern mining has focused on developing specialized techniques and algorithms. These preserve the informative richness of the data and allow us to identify, efficiently and effectively, the complex information units present in such data. On the other hand, as a fundamental field of data mining, complex pattern mining is emerging in a wide range of real-world applications, from process mining to cybersecurity, medicine, language processing, and remote sensing.

The intent of this book is to cover the recent developments in the theory, applications, and design methods of complex pattern mining as embedded in the fields of data science and big data analytics. In particular, the works presented in this book should attract the attention of both researchers and practitioners of data mining who are interested in the advances and latest developments in pattern extraction.

In our open call for contributions, we solicited submissions discussing and introducing new algorithmic foundations and representation formalisms in mining patterns from complex data. We received 16 submissions, which shows the liveliness of this field. From the 16 submissions received, we selected
14 for inclusion in this book. These articles are briefly summarized below.

Chapter “Efficient Infrequent Pattern Mining Using Negative Itemset Tree” describes a novel algorithm that discovers rare patterns by employing both top-down and depth-first traversal paradigms.

Chapter “Hierarchical Adversarial Training for Multi-domain Adaptive Sentiment Analysis” illustrates a hierarchical adversarial neural network (HANN) for adaptive sentiment analysis, which shares information between multiple domains bidirectionally.

Chapter “Optimizing C-Index via Gradient Boosting in Medical Survival Analysis” investigates whether directly optimizing the C-index via gradient boosting can improve accuracy in medical survival analysis.

Chapter “Order-Preserving Biclustering Based on FCA and Pattern Structures” explores the relation between biclustering and pattern structures. It does so by studying order-preserving biclusters, whose rows induce the same linear order across all columns.

Chapter “A Text-Based Regression Approach to Predict Bug-Fix Time” tackles the problem of predicting bug-fixing time. It uses multiple regression analysis and accounts for textual information extracted from the bug reports.

Chapter “A Named Entity Recognition Approach for Albanian Using Deep Learning” proposes a deep learning approach to Named Entity Recognition in the Albanian language. It employs LSTM cells as the hidden layers, a Conditional Random Field as the output layer, and word and character tagging.

Chapter “A Latitudinal Study on the Use of Sequential and Concurrency Patterns in Deviance Mining” illustrates a latitudinal study on the use of sequential and concurrency patterns in deviance process mining.

Chapter “Efficient Declarative-Based Process Mining Using an Enhanced Framework” presents a process mining approach aimed at improving both the efficiency of learning process models and the readability of the learned models.

Chapter “Exploiting Pattern Set Dissimilarity
for Detecting Changes in Communication Networks” describes a data mining approach that analyzes evolving communication data and detects changes in the communication modalities.

Chapter “Classification and Clustering of Emotive Microblogs in Albanian: Two User-Oriented Tasks” proposes a data mining methodology that uses both classification and clustering. It analyzes microblogging content and characterizes users who write posts with emotional content.

Chapter “Dealing with Class Imbalance in Android Malware Detection by Cascading Clustering and Classification” describes a supervised learning approach for classifying Android applications. It combines clustering and classification in order to deal with the imbalanced data problem.

Chapter “Applying Analytics to Artist Provided Text to Model Prices of Fine Art” develops a set of text-based features. These are processed in combination with clustering and sentiment analysis to predict the price of a work of contemporary art sold online.

Chapter “Approximate Query Answering over Incomplete Data” compares several recently proposed approximation algorithms designed to deal with incomplete data in big data applications.

Chapter “A Machine Learning Approach for Walker Identification Using Smartphone Sensors” illustrates a classification-based methodology that analyzes the data collected through MEMS smartphone sensors. Its goal is to recognize the identity of the walker and the pose of the device during the walk.

We would like to thank all the authors who submitted papers for publication in this book. We are also grateful to the members of the Program Committee and the external referees for their excellent work in reviewing submitted and revised contributions with expertise and patience. Last but not least, we thank Janusz Kacprzyk and Ramamoorthy Rajangam of Springer for their continuous support.

Bari, Italy
Bari, Italy
Bari, Italy
Rende, Italy
Naples, Italy
Charlotte, USA
October 2019
Annalisa Appice
Michelangelo Ceci
Corrado Loglisci
Giuseppe Manco
Elio Masciari
Zbigniew W. Ras

Contents

Efficient Infrequent Pattern Mining Using Negative Itemset Tree
Yifeng Lu, Florian Richter and Thomas Seidl

Hierarchical Adversarial Training for Multi-domain Adaptive Sentiment Analysis
Zhao Xu, Lorenzo von Ritter and Giuseppe Serra

Optimizing C-Index via Gradient Boosting in Medical Survival Analysis
Alicja Wieczorkowska and Wojciech Jarmulski

Order-Preserving Biclustering Based on FCA and Pattern Structures
Nyoman Juniarta, Miguel Couceiro and Amedeo Napoli

A Text-Based Regression Approach to Predict Bug-Fix Time
Pasquale Ardimento, Nicola Boffoli and Costantino Mele

A Named Entity Recognition Approach for Albanian Using Deep Learning
Evis Trandafili, Elinda Kajo Meçe and Enea Duka

A Latitudinal Study on the Use of Sequential and Concurrency Patterns in Deviance Mining
Laura Genga, Domenico Potena, Andrea Chiorrini, Claudia Diamantini and Nicola Zannone

Efficient Declarative-Based Process Mining Using an Enhanced Framework
Stefano Ferilli and Sergio Angelastro

Exploiting Pattern Set Dissimilarity for Detecting Changes in Communication Networks
Angelo Impedovo, Corrado Loglisci, Michelangelo Ceci and Donato Malerba

Classification and Clustering of Emotive Microblogs in Albanian: Two User-Oriented Tasks
Marjana Prifti Skenduli and Marenglen Biba

Dealing with Class Imbalance in Android Malware Detection by Cascading Clustering and Classification
Giuseppina Andresini, Annalisa Appice and Donato Malerba

Applying Analytics to Artist Provided Text to Model Prices of Fine Art
Laurel Powell, Anna Gelich and Zbigniew W. Ras

Approximate Query Answering over Incomplete Data
Nicola Fiorentino, Cristian Molinaro and Irina Trubitsyna

A Machine Learning Approach for Walker Identification Using Smartphone Sensors
Antonio Angrisano, Pasquale Ardimento, Mario Luca Bernardi, Marta Cimitile and Salvatore Gaglione

Author Index

A Machine Learning Approach for Walker Identification … 235

Fig. 1 The overall classification process

Let $\min_X$ and $\max_X$ be the minimum and maximum values of the attribute $X$, respectively. Min-max normalization maps a value $v_i$ of $X$ to a value $v'_i$ in the range $[\mathit{newMin}_X, \mathit{newMax}_X]$ by computing:

$$ v'_i = \frac{v_i - \min_X}{\max_X - \min_X}\,(\mathit{newMax}_X - \mathit{newMin}_X) + \mathit{newMin}_X \qquad (1) $$

Finally, the cleaned and normalized dataset becomes the input for the training and test set generation activity. This activity splits the data into two sets. The first, called the training set, is used to train the classifier. The second, called the test set, is used to assess the performance of the classifier.

4.2.2 Training and Time-Series Classification

Figure 1b describes the training and time-series classification sub-process. As shown in the figure, the process is mainly divided into the following steps:

• Time Series Segmentation: analyzes the time series and divides them into segments (i.e., time series windows);
• Post-processing: for each window, generates a representation based on value features and trend features;
• Decision Trees Model Generation: trains a simple decision-tree-based classifier;
• Classification: tests the performance of the trained classifier on new time-series samples.

In the time series segmentation step, a sliding window approach incrementally divides the multivariate time series into a sequence of segments across the time series values. In particular, we adopted a fixed sliding window approach, which defines several windows of increasing size (from 0.32 to 10 s).

In the post-processing step, a set of features is evaluated for each time series window identified during the segmentation. Specifically, we associate to each window $w_i$ the following sequence of features:

$$ (F_{v_1}, \ldots, F_{v_n}, F_{t_1}, \ldots, F_{t_m}) \qquad (2) $$

The value feature $F_{v_j}$ represents a discretized value of the time series contained in the [0, 1]
range. The features $F_{t_k}$ represent the trend of the time series local to the window. They can correspond to shape-based metrics (e.g., standard deviation, mean, average energy, entropy, skewness, and kurtosis). Following [5], in this study the value component is described by a single feature, while the trend component is described by two metrics: standard deviation and skewness.

The decision tree model generation step performs classification using a decision-tree-based classifier [2]. The classifier inputs are all the vectors representing the window components (i.e., the data for each sensor involved in this study). The classifier is trained using the class labels available for the value-based and trend-based information of each window. Finally, the trained classifier is used to classify new data, and its performance is evaluated on new samples.

For training the classifiers, we defined $T$ as a set of labeled traces $(M, l)$, where each $M$ is associated with a label $l \in \{W_1, \ldots, W_n\}$ and $W_n$ represents the $n$th walker. For each $M$ we built a feature vector $F \in \mathbb{R}^y$, where $y$ is the number of features used in the training phase ($y$ = … for all the sensor data taken into account).

In the learning phase, the dataset is assessed with a k-fold cross-validation approach [27], which splits the data into $k$ equally sized subsets using random sampling. One subset is retained as the validation dataset to assess the trained model, and the remaining $k-1$ subsets are used for training. This process is repeated $k = 10$ times, so that each of the $k$ subsets is used exactly once as the validation dataset. To obtain a single reliable estimate, the final result is the average of the results obtained during the ten iterations. The process starts by partitioning the dataset into $k$ slices. Then, for each iteration $i$, we train the classifier and evaluate its effectiveness following the steps reported
below:

1. the training set $T_i \subset D$ is generated by selecting a unique set of $k-1$ slices from the dataset $D$;
2. the test set $\overline{T}_i = D - T_i$ is generated by selecting the remaining $k$th slice (i.e., the complement of $T_i$ in $D$);
3. a classifier is trained on the set $T_i$;
4. the trained classifier is applied to $\overline{T}_i$ to evaluate its accuracy.

Since $k = 10$, each iteration $i$ uses 90% of the dataset $D$ as the training set ($T_i$) and the remaining 10% as the test set ($\overline{T}_i$). Moreover, the data are stratified before being split into subsets, which keeps each subset representative of the whole dataset. According to [13], this model selection method provides a less biased estimate of accuracy.

4.3 The Adopted Classifiers

In this study, two different classifiers are implemented. The first is a generic walker classifier ($C_1$) able to recognize the identity of the walkers. Depending on the needs, it can be trained in two ways: on walking sessions in which the device is in any pose, or on walking sessions in which the device pose is fixed. The second is a device pose classifier ($C_2$) able to recognize the pose of the phone (i.e., texting, pocket, phoning, swinging). Both classifiers are first tested with cross-validation on a partition of the data. They are then tested on an external dataset of real walking sessions to assess robustness.

5 Evaluation

In this section, we describe the experiments we performed (i.e., the classification goals of each trained classifier). We also describe the related evaluation settings (i.e., the context of the experiments and the metrics used for validation).

5.1 Description of the Experiments

The main objective of the experiments is to evaluate how effectively the proposed classifiers ($C_1$, $C_2$) recognize the identity of a walker carrying a typical smartphone, along with the device pose. Each evaluation consists of performing the classification process on the studied classifier, according to what is
defined in Sect. 4.2.2. The evaluation considers different feature models, shown in Table 1, and different time window sizes. This is because, as described in Sect. 4.2.2, we adopted a fixed sliding window approach for training and time series classification. This allows us to investigate the optimal time window and the best feature model. According to these goals, the following experiments are performed:

• the classifier $C_1$ with the feature set $S_7$ performs walker identification regardless of the device pose, using increasing time window sizes (from 0.32 to 10 s);
• the classifier $C_1$ with each of the feature sets ($S_1$, $S_2$, $S_3$, $S_4$, $S_5$, $S_6$, $S_7$) performs walker identification regardless of the device pose, using the shortest time window size (0.32 s);
• the classifier $C_2$ with the feature set $S_7$ performs device pose identification using the shortest time window size (0.32 s).

Moreover, we replicated these experiments on a different testing set (a new dataset) in order to generalize the obtained results.

The classification analysis is performed with Weka,¹ a well-known collection of machine learning algorithms for data mining tasks. Instead of an explicit pruning step, we controlled performance with the following random forest parameters: the number of trees in the forest, the maximum number of features considered for splitting a node, and the maximum number of levels in each decision tree.

5.2 Evaluation Setting

The adopted dataset (D1) consists of raw measurements collected from nine different persons (6 males and 3 females). Figure 2 reports, for D1, the total number of samples (size), the number of samples used for the training set and the number used for the test set. Each person walked indoors for about 40 meters with the smartphones in different poses (texting, phoning, pocket, and swinging). In this study we considered,
as discriminating features, the accelerometer and gyroscope triads (both physical sensors) and the orientation (a virtual sensor), i.e., the roll, pitch and yaw angles. A data rate of 100 Hz is adopted. Two smartphones from different brands and of different grades are used: the mid-grade Samsung S6 and the high-grade Apple iPhone7plus. The data are logged with the "Matlab Mobile" application from MathWorks, which allows simple acquisition from the smartphone sensors and storage in cloud memory.

Moreover, the experiment is replicated on a new dataset (D2) to assess performance on a different group of walkers. The second dataset includes the data of nine additional walkers (5 females and 4 males). As for D1, each person walked indoors for about 40 meters in all four poses and with all the smartphones. Unlike D1, the two smartphones adopted for this data acquisition are the Xiaomi Mi A2 and the Xiaomi Mi A2 Lite.

¹ http://www.cs.waikato.ac.nz/ml/weka/

Fig. 2 Total number of records (size) and their distribution in training and test set for each walker for D1

Five well-known metrics have been used to evaluate the classification results: Precision, Recall, F-Measure, ROC Area and Accuracy. Precision (P) is the proportion of examples that truly belong to the class of a specific walker among all those that were assigned to that class. It is computed as the ratio of the number of relevant retrieved records (true positives) to the total number of retrieved records, i.e., the irrelevant retrieved records (false positives) plus the relevant retrieved records (true positives):

$$ P = \frac{tp}{tp + fp} \qquad (3) $$

where $tp$ is the number of true positives and $fp$ is the number of false positives. Recall (R) is the proportion of examples assigned to the class of a specific walker among all the examples that truly belong to that class. It is computed as the ratio of the number of
retrieved relevant records (true positives) to the total number of relevant records (i.e., the sum of true positives and false negatives):

$$ R = \frac{tp}{tp + fn} \qquad (4) $$

where $fn$ is the number of false negatives. The F-Measure ($F_1$) is a measure of a test's accuracy computed as the harmonic mean of precision and recall:

$$ F_1 = 2 \cdot \frac{P \cdot R}{P + R} \qquad (5) $$

The ROC Area is the probability that a randomly selected positive example is ranked above a randomly selected negative one.

Finally, accuracy measures the overall proportion of correct classifications. It is computed as the ratio of the sum of true positives and true negatives to the total number of records:

$$ \mathit{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \qquad (6) $$

where $tn$ is the number of true negatives.

6 Results and Discussion

This section discusses the issues and lessons learned while performing this study. To assess the performance of the classifiers, we used the accuracy, precision, recall and ROC metrics on the dataset D1. To examine the results further, we also considered the confusion matrix. Finally, the experiment is replicated on the dataset D2. As a naming convention, each walker is indicated by the letter "w", then an identification number, then the letter "m" or "f" for a male or a female walker. For instance, w02m indicates a male walker identified by the number 02.

6.1 D1 Dataset Results

Table 2 lists the profiles of the selected walkers collected in D1 and involved in the study.

Table 2 Walkers' profiles for D1

ID    Gender  Age (years)  Height (cm)
w01m  Male    23           169
w02m  Male    23           170
w03f  Female  23           173
w04m  Male    40           192
w05m  Male    22           183
w06m  Male    42           180
w07m  Male    31           183
w08f  Female  31           152
w09f  Female  34           154

Considering all the walkers carrying any phone in any pose, the accuracy, precision and recall of the classifier C1 are all equal to 98.2%, while the ROC Area is 99.3%. These metrics demonstrate the good performance of C1 in recognizing the walker's identity.
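The metric definitions in Eqs. (3)–(6) can be made concrete with a short sketch. The Python function below is a minimal illustration. `confusion_metrics` is a hypothetical helper of ours, not part of Weka or of the authors' pipeline. It assumes a confusion matrix of raw record counts, with rows as the actual walker and columns as the classified walker. It derives $tp$, $fp$ and $fn$ for each class and applies Eqs. (3)–(5). Accuracy is computed, as an assumption on our part, as the overall fraction of correctly classified records (the diagonal over the total), which is the usual multi-class reading of Eq. (6).

```python
from typing import Dict, Sequence, Tuple


def confusion_metrics(
    cm: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Per-class precision, recall and F1 (Eqs. 3-5) plus overall accuracy.

    Hypothetical helper: cm[i][j] is the number of records whose actual
    class is labels[i] and whose predicted class is labels[j].
    """
    n = len(labels)
    if len(cm) != n or any(len(row) != n for row in cm):
        raise ValueError("confusion matrix must be n x n, with n = len(labels)")

    per_class: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        fn = sum(cm[k]) - tp                        # class k predicted as others
        precision = tp / (tp + fp) if tp + fp else 0.0  # Eq. (3)
        recall = tp / (tp + fn) if tp + fn else 0.0     # Eq. (4)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)           # Eq. (5)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Overall accuracy: correctly classified records (the diagonal) over all records.
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    accuracy = correct / total if total else 0.0
    return per_class, accuracy


if __name__ == "__main__":
    # Toy counts for two hypothetical walkers (not data from the study).
    labels = ["w01m", "w02m"]
    cm = [[8, 2],
          [1, 9]]
    per_class, acc = confusion_metrics(cm, labels)
    print(round(per_class["w01m"]["precision"], 3), acc)  # prints: 0.889 0.85
```

The confusion matrices reported in this chapter are row-normalized percentages rather than counts. Passing them directly to such a helper would still give the correct per-class recall (the diagonal). Precision, however, would implicitly treat all walkers as equally represented, which is why raw counts are assumed above.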
Table 3 Confusion matrix of classifier C1: all walkers with any phone in any pose (%; rows: actual walker, columns: classified walker)

Actual ↓ / Classified →  w01m  w02m  w03f  w04m  w05m  w06m  w07m  w08f  w09f
w01m                     97.3   0.5   0.0   0.8   0.5   0.6   0.3   0.0   0.0
w02m                      0.6  97.4   0.0   0.7   0.3   0.3   0.7   0.0   0.0
w03f                      0.0   0.0  99.6   0.0   0.0   0.0   0.0   0.1   0.3
w04m                      0.5   0.6   0.0  97.4   0.5   0.8   0.2   0.0   0.0
w05m                      0.4   0.3   0.0   0.7  97.7   0.3   0.5   0.0   0.0
w06m                      0.6   0.3   0.0   0.9   0.4  97.7   0.1   0.0   0.0
w07m                      0.3   0.9   0.0   0.3   0.5   0.1  98.0   0.0   0.0
w08f                      0.0   0.0   0.2   0.0   0.0   0.0   0.0  98.8   1.1
w09f                      0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.6  99.2

Table 4 Impact of each sensor on the classifier C1 performance (A: accelerometers, G: gyroscopes, O: orientation)

Feature set  Sensors      Accuracy (%)  Precision (%)  Recall (%)  ROC (%)
S1           A            63.2          63.0           63.2        86.1
S2           G            30.1          30.1           30.1        62.6
S3           O            93.8          93.8           93.8        98.5
S4           A and G      81.0          81.0           81.0        92.0
S5           A and O      95.7          95.7           95.7        98.4
S6           G and O      95.8          95.8           95.8        98.5
S7           A, G and O   98.2          98.2           98.2        99.3

The confusion matrix of C1, shown in Table 3, reports very good classification results. In particular, both male and female walkers are well identified. The largest percentage of identification failures is below 2%. It occurs between the walkers w08f and w09f, who are female walkers of similar ages and heights. We have also investigated the impact of the specific phone on the performance of classifier C1. With the Samsung S6 the percentage of correctly classified instances is 99.1%, whereas with the iPhone7plus it drops to 97.9%.

To analyze the impact of each sensor on the classifier performance, all possible combinations of sensors are considered; the comparison of the obtained results is shown in Table 4. As expected, the best performance is obtained with the configuration including all the considered sensors. The configuration combining accelerometers and gyroscopes provides significantly worse results than orientation coupled with either accelerometers or gyroscopes. The main contribution to walker recognition seems to come from the orientation
sensor, with 93.8% of successful identifications, while accelerometers or gyroscopes alone allow only 63.2% and 30.1% of correct classifications, respectively.

Table 5 Confusion matrix of classifier C1, applied to walkers w05m and w06m with any phone (%; rows: true walker, columns: predicted walker)

True ↓ / Pred. →  w01m  w02m  w03f  w04m  w05m  w06m  w07m  w08f  w09f
w05m               1.2   0.4   1.1   1.9  91.9   2.5   0.8   –     –
w06m               2.2   1.0   0.4   2.5   1.9  88.3   0.7   –     –

Table 6 Confusion matrix of classifier C2, for all walkers with any phone (%; rows: actual phone pose, columns: classified phone pose)

Actual ↓ / Classified →  Texting  Pocket  Phoning  Swinging
Texting                  97.1      2.7     0.1      0.1
Pocket                    1.6     97.7     0.5      0.2
Phoning                   0.2      0.8    98.8      0.2
Swinging                  0.1      0.2     0.3     99.4

To further test its robustness, classifier C1 has been applied to data collected in a real-world context from two walkers, w05m and w06m, each carrying an iPhone7plus. The session was an outdoor walk on a pedestrian street on a typical working day. The percentage of correct identifications drastically decreased to 90.2%. The confusion matrix in Table 5 shows that walkers w05m and w06m are still correctly identified in 91.9% and 88.3% of the cases, respectively.

For the classifier C2, the objective is to recognize which of four possible poses the phone is in: texting, pocket, phoning or swinging. The confusion matrix for this case is shown in Table 6. The swinging pose is the easiest to detect, with 99.4% of correct classifications. On the other hand, the worst performance is obtained for the texting pose, with 97.1% of correct classifications, while 2.7% of texting poses are confused with pocket ones.

All the results shown so far are obtained by processing the samples in a very short time window of 0.32 s, in order to allow a real-time application of the method. A longer time window, and hence more samples for the recognition, could improve the performance of the C1 classifier. To demonstrate this, a sliding window approach has been adopted with fixed windows of increasing
sizes up to 10 s. The obtained results, in terms of accuracy and error, are shown in Fig. 3. The accuracy clearly increases with the window size, and the error decreases accordingly; starting from a window of 4.5 s, the accuracy becomes ≈100%. The choice of the time window size depends on the application. If a real-time response is required, a short window is necessary (at the cost of more errors); otherwise, a larger one can be considered.

The impact of the time window size is also analyzed for C1 classifiers trained with fixed device poses, and the accuracy/error behaviors are shown in Fig. 4. For these classifiers, accuracy increases with window size more rapidly than for the C1 classifier trained on the device in all poses. A window of … s provides an accuracy very near to 100% for the texting, pocket and phoning poses, whereas a window of 2.5 s is necessary for swinging.

Fig. 3 Accuracy and error of C1 classifier, for all device poses, with respect to window size

Fig. 4 Accuracy (red line) and error (blue line) behavior of C1 classifiers for the four device poses considered with respect to window size (expressed in seconds): (a) texting, (b) pocket, (c) phoning, (d) swinging

6.2 D2 Dataset Results

Table 7 lists the profiles of the selected walkers collected in D2 and involved in the replication of the study. For each walker, the table also reports the number of collected records.

Table 7 Walkers' profiles for D2 and D2 size

ID    Gender  Age (years)  Height (cm)  Size (records)
w01m  Male    27           175          5550
w02m  Male    27           172          5596
w03f  Female  21           158          5608
w04m  Male    28           175          5608
w05f  Female  26           161          5803
w06f  Female  27           160          6315
w07f  Female  26           159          6008
w08m  Male    31           177          4853
w09f  Female  18           163          6970

Table 8 Confusion matrix for the dataset D2 of classifier C1: all walkers with any phone in any pose
Classified → w01m w02m w03f w04m w05f w06f w07f w08m w09f
↑ Actual w01m w02m w03f w04m w05f w06f w07f w08m w09f
97.8 1.2 0.6 0.7 0.9 0.9 0.6
0.6 0.8 95.9 0.6 0.1 0.1 0.2 0.6 0.6 0.3 0.8 0.8 93.9 0.2 0.6 1.4 0.4 0.4 0.8 0.8 0.2 0.2 96.1 0.5 0.2 0.1 0.3 0.2 0.6 0.1 0.8 0.9 94 1.7 0.2 0.0 1.1 0.8 0.2 1.1 0.2 2.2 93.5 0.5 0.3 1.2 1.24 0.7 0.6 0.3 0.7 0.3 96.9 0.3 0.2 1.52 0.5 0.1 0.4 0.2 97.4 0.0 0.65 0.1 0.9 0.1 1.1 1.4 0.3 0.1 95.7

For brevity, we only report the replication of the experiment with the classifiers C1 and C2 on the feature model S7. Considering all the walkers carrying any phone in any pose, the accuracy of the classifier C1 is 95.2%, confirming the good performance of C1 in recognizing the walker's identity. The confusion matrix of C1, shown in Table 8, confirms the considerations made for the dataset D1. Similarly, the confusion matrix of classifier C2 for the dataset D2 is reported in Table 9. For all the poses, the percentage of correct classifications is at least 99.7%, confirming that the proposed classifier performs well for all the walkers with any phone.

Table 9 Confusion matrix of classifier C2 for the dataset D2, for all walkers with any phone (%; rows: actual phone pose, columns: classified phone pose)

Actual ↓ / Classified →  Texting  Pocket  Phoning  Swinging
Texting                  100       0.0     0.0      0.0
Pocket                     0.3    99.7     0.0      0.0
Phoning                    0.1     0.0    99.8      0.1
Swinging                   0.1     0.1     0.1     99.7

7 Conclusions

We applied a machine learning technique to measurements from smartphone sensors (accelerometers, gyroscopes and orientation). The goal was to retrieve the identity of the walker carrying the smartphone and the pose of the device during the walk. The most relevant sensor for identity recognition is the orientation, and the least relevant is the gyroscope; using all three sensors is nevertheless the best choice. For walker identification, the best classifier obtains an accuracy of 98.2% with the smallest time window size of 0.32 s. Walker identification is performed with several time windows of measurements, and it has been demonstrated that increasing
the window size produces significant improvements in the results; specifically, with time windows over 4.5 s the percentage of correct identification is ≈100%. Finally, the performance of the C2 classifier (i.e., device pose detection) is equally satisfying, with an average accuracy across the four classes of ≈99% even at the smallest time window. The same experiment has been replicated on an additional group of people with different smartphones, obtaining comparable results.

References

1. Bogue, R.: Recent developments in MEMS sensors: a review of applications, markets and technologies. Sens. Rev. 33(4), 300–304 (2013)
2. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
4. Bulling, A., Blanke, U., Schiele, B.: A tutorial on human activity recognition using body-worn inertial sensors. ACM Comput. Surv. 46(3), 33:1–33:33 (2014)
5. Esmael, B., Arnaout, A., Fruhwirth, R.K., Thonhauser, G.: Multivariate time series classification by combining trend-based and value-based approximations. In: Murgante, B., Gervasi, O., Misra, S., Nedjah, N., Rocha, A.M.A.C., Taniar, D., Apduhan, B.O. (eds.)
Computational Science and Its Applications – ICCSA 2012, pp. 392–403. Springer, Berlin (2012)
6. Fareed, U.: Smartphone sensor fusion based activity recognition system for elderly healthcare. In: Proceedings of the 2015 Workshop on Pervasive Wireless Healthcare, pp. 29–34. MobileHealth ’15, ACM, New York, NY, USA (2015)
7. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006). https://doi.org/10.1007/s10994-006-6226-1
8. Hammerla, N.Y., Halloran, S., Plötz, T.: Deep, convolutional, and recurrent models for human activity recognition using wearables. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 1533–1540. IJCAI’16, AAAI Press (2016). http://dl.acm.org/citation.cfm?id=3060832.3060835
9. Hoang, T., Choi, D., Nguyen, T.: On the instability of sensor orientation in gait verification on mobile phone. In: 2015 12th International Joint Conference on e-Business and Telecommunications (ICETE), vol. 04, pp. 148–159 (2015)
10. Jain, Y., Chowdhury, D., Chattopadhyay, M.: Machine learning based fitness tracker platform using MEMS accelerometer. In: 2017 International Conference on Computer, Electrical Communication Engineering (ICCECE), pp. 1–5 (2017). https://doi.org/10.1109/ICCECE.2017.8526202
11. Kabigting, J.E.T., Chen, A.D., Chang, E.J., Lee, W., Roberts, R.C.: MEMS pressure sensor array wearable for traditional Chinese medicine pulse-taking. In: 2017 IEEE 14th International Conference on Wearable and Implantable Body Sensor Networks (BSN), pp. 59–62 (2017). https://doi.org/10.1109/BSN.2017.7936007
12. Khan, W., Xiang, Y., Aalsalem, M., Arshad, Q.: Mobile phone sensing systems: a survey. IEEE Commun. Surv. Tutor. 15(1), 402–427 (2013). https://doi.org/10.1109/SURV.2012.031412.00077
13. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence – Volume 2, pp. 1137–1143.
IJCAI’95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1995)
14. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Cell phone-based biometric identification. In: 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–7 (2010)
15. Lammel, G.: The future of MEMS sensors in our connected world. In: 2015 28th IEEE International Conference on Micro Electro Mechanical Systems (MEMS), pp. 61–64 (2015)
16. Lane, N.D., Miluzzo, E., Lu, H., Peebles, D., Choudhury, T., Campbell, A.T.: A survey of mobile phone sensing. IEEE Commun. Mag. 48(9), 140–150 (2010)
17. Lee, W.H., Lee, R.: Implicit sensor-based authentication of smartphone users with smartwatch. In: Proceedings of the Hardware and Architectural Support for Security and Privacy 2016, pp. 9:1–9:8. HASP 2016, ACM, New York, NY, USA (2016)
18. Liu, S., Gao, R.X., John, D., Staudenmayer, J.W., Freedson, P.S.: Multisensor data fusion for physical activity assessment. IEEE Trans. Biomed. Eng. 59(3), 687–696 (2012)
19. Masoud, M., Jaradat, Y., Manasrah, A., Jannoud, I.: Sensors of smart devices in the internet of everything (IoE) era: big opportunities and massive doubts (2019). https://doi.org/10.1155/2019/6514520
20. Nweke, H.F., Teh, Y.W., Al-garadi, M.A., Alo, U.R.: Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: state of the art and research challenges. Expert Syst. Appl. 105, 233–261 (2018). https://doi.org/10.1016/j.eswa.2018.03.056
21. Ordonez, F.J., Roggen, D.: Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16(1) (2016)
22. Pei, L., Liu, J., Guinness, R., Chen, Y., Kuusniemi, H., Chen, R.: Using LS-SVM based motion recognition for smartphone indoor wireless positioning. Sensors (Basel) (2012)
23. Radu, V., Lane, N.D., Bhattacharya, S., Mascolo, C., Marina, M.K., Kawsar, F.: Towards multimodal deep learning for activity recognition on mobile devices. In: Proceedings of the 2016 ACM
International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 185–188. UbiComp ’16, ACM, New York, NY, USA (2016)
24. Rokni, S.A., Ghasemzadeh, H.: Autonomous training of activity recognition algorithms in mobile sensors: a transfer learning approach in context-invariant views. IEEE Trans. Mob. Comput. 17(8), 1764–1777 (2018). https://doi.org/10.1109/TMC.2018.2789890
25. Ronao, C.A., Cho, S.B.: Human activity recognition with smartphone sensors using deep learning neural networks. Expert Syst. Appl. 59(C), 235–244 (2016)
26. Russell, S.J., Norvig, P.: Artificial intelligence: a modern approach. Pearson Education Limited, Malaysia (2016)
27. Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B 36, 111–147 (1974)
28. Susi, M., Renaudin, V., Lachapelle, G.: Motion mode recognition and step detection algorithms for mobile phone user. J. Locat. Based Serv. (2012)
29. Yan, Y., Cosgrove, S., Blanton, E., Ko, S.Y., Ziarek, L.: Real-time sensing on Android. In: Proceedings of the 12th International Workshop on Java Technologies for Real-time and Embedded Systems, pp. 67:67–67:75. JTRES ’14, ACM, New York, NY, USA (2014)
30. Yu, K., Liu, Y., Qing, L., Wang, B., Cheng, Y.: Positive and unlabeled learning for user behavior analysis based on mobile internet traffic data. IEEE Access 6, 37568–37580 (2018). https://doi.org/10.1109/ACCESS.2018.2852008
31. Zhang, H., Yuan, W., Shen, Q., Li, T., Chang, H.: A handheld inertial pedestrian navigation system with accurate step modes and device poses recognition. IEEE Sens. J. 15(3), 1421–1429 (2015)

Author Index

A
Andresini, Giuseppina, 173
Angelastro, Sergio, 121
Angrisano, Antonio, 229
Appice, Annalisa, 173
Ardimento, Pasquale, 63, 229

B
Bernardi, Mario Luca, 229
Biba, Marenglen, 153
Boffoli, Nicola, 63

C
Ceci, Michelangelo, 137
Chiorrini, Andrea, 103
Cimitile, Marta, 229
Couceiro, Miguel, 47

D
Diamantini, Claudia, 103
Duka, Enea, 85

F
Ferilli, Stefano, 121
Fiorentino, Nicola, 213

G
Gaglione, Salvatore, 229
Gelich, Anna, 189
Genga, Laura, 103

I
Impedovo, Angelo, 137

J
Jarmulski, Wojciech, 33
Juniarta, Nyoman, 47

L
Loglisci, Corrado, 137
Lu, Yifeng,

M
Malerba, Donato, 137, 173
Meçe, Elinda Kajo, 85
Mele, Costantino, 63
Molinaro, Cristian, 213

N
Napoli, Amedeo, 47

P
Potena, Domenico, 103
Powell, Laurel, 189
Prifti, Marjana, 153

R
Ras, Zbigniew W., 189
Richter, Florian,

S
Seidl, Thomas,
Serra, Giuseppe, 17

T
Trandafili, Evis, 85
Trubitsyna, Irina, 213

V
von Ritter, Lorenzo, 17

W
Wieczorkowska, Alicja, 33

X
Xu, Zhao, 17

Z
Zannone, Nicola, 103

© Springer Nature Switzerland AG 2020
A. Appice et al. (eds.), Complex Pattern Mining, Studies in Computational Intelligence 880, https://doi.org/10.1007/978-3-030-36617-9
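The window-size trade-off discussed in the results and conclusions of the walker-identification chapter (a 0.32 s window for fast responses versus windows of 4.5 s or more for ≈100% correct identification) comes down to how many sensor samples each classification sees. The Python sketch below is illustrative only: it assumes a hypothetical sampling rate of 50 Hz, non-overlapping windows, and a simple per-axis mean/standard-deviation feature vector. None of these choices is claimed to match the chapter's acquisition setup or its feature model S7.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) reading of a 3-axis sensor

def segment(samples: Sequence[Sample], rate_hz: float, window_s: float) -> List[List[Sample]]:
    """Split a sensor stream into consecutive, non-overlapping windows.

    rate_hz is an assumed sampling rate; trailing samples that do not fill a
    whole window are discarded.
    """
    size = int(round(rate_hz * window_s))  # samples per window
    if size < 1:
        raise ValueError("window is shorter than one sample")
    n_windows = len(samples) // size
    return [list(samples[k * size:(k + 1) * size]) for k in range(n_windows)]

def window_features(window: Sequence[Sample]) -> List[float]:
    """Hypothetical features: mean and population standard deviation per axis."""
    features: List[float] = []
    for axis_values in zip(*window):  # transpose into x, y and z sequences
        features.append(mean(axis_values))
        features.append(pstdev(axis_values))
    return features

# 10 s of a constant reading at the assumed 50 Hz rate.
stream: List[Sample] = [(0.0, 0.0, 9.81)] * 500
short_windows = segment(stream, rate_hz=50, window_s=0.32)  # 16 samples each
long_windows = segment(stream, rate_hz=50, window_s=4.5)    # 225 samples each
print(len(short_windows), len(long_windows))  # 31 2
print(window_features(short_windows[0]))     # [0.0, 0.0, 0.0, 0.0, 9.81, 0.0]
```

Under the assumed 50 Hz rate, a 10 s recording yields 31 windows of 16 samples at 0.32 s but only 2 windows of 225 samples at 4.5 s. Each longer window gives the classifier more evidence, at the price of fewer and later decisions, which is the real-time versus accuracy trade-off described in the text.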

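The accuracies quoted for classifiers C1 and C2 in the walker-identification chapter are read off confusion matrices. As a hedged sketch using only the Python standard library, the functions below compute the per-class accuracy (recall) for a matrix with actual classes on the rows, the unweighted (macro) average across classes, and the overall accuracy. The two-class counts are hypothetical and only show that the macro and overall figures can differ. The last line averages the diagonal values reported for C2 on D2 (100, 99.7, 99.8 and 99.7%).

```python
from statistics import mean
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows: actual class, columns: predicted class

def per_class_accuracy(cm: Matrix) -> List[float]:
    """Recall of each class in %: diagonal entry divided by its row total."""
    # Rows with no samples get 0.0 instead of raising ZeroDivisionError.
    return [100.0 * row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

def macro_accuracy(cm: Matrix) -> float:
    """Unweighted mean of the per-class accuracies."""
    return mean(per_class_accuracy(cm))

def overall_accuracy(cm: Matrix) -> float:
    """Share of all samples on the diagonal in %; meaningful for count matrices."""
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(cm[i][i] for i in range(len(cm))) / total

# Hypothetical counts for two classes (not data from the chapter).
counts = [[90, 10],   # actual A: 90 correct, 10 predicted as B
          [20, 30]]   # actual B: 20 predicted as A, 30 correct

print(per_class_accuracy(counts))  # [90.0, 60.0]
print(macro_accuracy(counts))      # 75.0
print(overall_accuracy(counts))    # 80.0

# Diagonal of the C2 confusion matrix for D2 (row-normalized, in %).
print(round(mean([100, 99.7, 99.8, 99.7]), 1))  # 99.8
```

For a row-normalized matrix such as the C2 one, each row already sums to 100, so the per-class accuracy is simply the diagonal entry. The overall accuracy only makes sense on raw counts, where larger classes weigh more.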