Feature-Level Fusion Multi-Sensor Aggregation Temporal Network for Smartphone-Based Human Activity Recognition
Keywords
Smartphone, inertial signal, deep learning, multi-sensor, dilated convolution
Abstract
Smartphone-based Human Activity Recognition (HAR) identifies human movements from inertial signals gathered by multiple smartphone sensors. Typically, these signals are stacked into a single input (data-level fusion) and fed into deep learning algorithms for feature extraction. This research studies feature-level fusion instead and proposes a lightweight deep temporal learning model, the Feature-Level Fusion Multi-Sensor Aggregation Temporal Network (FLF-MSATN), which extracts features from each sensor’s inertial signals separately. The raw signals, segmented into equally sized time windows, are passed into individual Dilated-Pooled Convolutional Heads (DPC Heads) for temporal feature analysis. Each DPC Head contains a spatiotemporal block of dilated causal convolutions and average pooling to extract underlying patterns. The DPC Heads’ outputs are concatenated and passed into a Global Average Pooling layer to generate a condensed confidence map before activity classification. FLF-MSATN is assessed under a subject-independent protocol on a publicly available HAR dataset, UCI HAR, and a self-collected HAR dataset, achieving accuracies of 96.67% and 82.70%, respectively. A Data-Level Fusion MSATN is built as a baseline to verify the performance of the proposed FLF-MSATN. The empirical results show that FLF-MSATN improves accuracy by ~3.4% on UCI HAR and ~9.68% on the self-collected dataset.
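To make the feature-level-fusion pipeline concrete, below is a minimal PyTorch sketch of the idea described in the abstract: one DPC Head per sensor (dilated causal convolutions with average pooling), channel-wise concatenation of the heads’ outputs, Global Average Pooling, and a linear classifier. The layer counts, kernel size, dilation rates (1, 2, 4), and channel widths here are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class DPCHead(nn.Module):
    """One Dilated-Pooled Convolutional (DPC) Head for a single sensor.

    Kernel size, dilation rates, and channel widths are illustrative
    guesses; the abstract does not fix these hyperparameters.
    """
    def __init__(self, in_channels=3, channels=32, kernel_size=5, dilations=(1, 2, 4)):
        super().__init__()
        layers, c_in = [], in_channels
        for d in dilations:
            layers += [
                # Left-pad so the convolution stays causal (no future leakage).
                nn.ConstantPad1d(((kernel_size - 1) * d, 0), 0.0),
                nn.Conv1d(c_in, channels, kernel_size, dilation=d),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.AvgPool1d(kernel_size=2),  # temporal average pooling
            ]
            c_in = channels
        self.block = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, in_channels, time)
        return self.block(x)


class FLFMSATN(nn.Module):
    """Feature-level fusion: one DPC Head per sensor, fused before classification."""
    def __init__(self, num_sensors=2, in_channels=3, channels=32, num_classes=6):
        super().__init__()
        self.heads = nn.ModuleList(
            DPCHead(in_channels, channels) for _ in range(num_sensors)
        )
        self.gap = nn.AdaptiveAvgPool1d(1)  # Global Average Pooling over time
        self.classifier = nn.Linear(num_sensors * channels, num_classes)

    def forward(self, inputs):  # inputs: list of (batch, in_channels, time) tensors
        feats = [head(x) for head, x in zip(self.heads, inputs)]
        fused = torch.cat(feats, dim=1)  # concatenate per-sensor features channel-wise
        return self.classifier(self.gap(fused).squeeze(-1))


# Example: accelerometer + gyroscope windows of 128 samples (the UCI HAR window size).
model = FLFMSATN(num_sensors=2, in_channels=3, channels=32, num_classes=6)
acc, gyro = torch.randn(8, 3, 128), torch.randn(8, 3, 128)
logits = model([acc, gyro])  # shape: (8, 6)
```

A data-level fusion baseline, by contrast, would stack all sensor channels into one tensor and feed it through a single head, which is the comparison the abstract reports.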