Feature-Level Fusion Multi-Sensor Aggregation Temporal Network for Smartphone-Based Human Activity Recognition
Keywords
Smartphone, inertial signal, deep learning, multi-sensor, dilated convolution
Abstract
Smartphone-based Human Activity Recognition (HAR) identifies human movements from inertial signals gathered by multiple smartphone sensors. Typically, these signals are stacked into a single input (data-level fusion) and fed into deep learning algorithms for feature extraction. This research studies feature-level fusion instead and proposes a lightweight deep temporal learning model, the Feature-Level Fusion Multi-Sensor Aggregation Temporal Network (FLF-MSATN), which extracts features from each sensor’s inertial signals separately. The raw signals, segmented into equally sized time windows, are passed into individual Dilated-Pooled Convolutional Heads (DPC Heads) for temporal feature analysis. Each DPC Head contains a spatiotemporal block of dilated causal convolutions and average pooling to extract underlying patterns. The DPC Heads’ outputs are concatenated and passed into a Global Average Pooling layer to generate a condensed confidence map before activity classification. FLF-MSATN is assessed under a subject-independent protocol on a publicly available HAR dataset, UCI HAR, and a self-collected HAR dataset, achieving accuracies of 96.67% and 82.70%, respectively. A Data-Level Fusion MSATN is built as a baseline to verify the performance of the proposed FLF-MSATN. The empirical results show that FLF-MSATN improves accuracy by ~3.4% on UCI HAR and ~9.68% on the self-collected dataset.
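To make the feature-level-fusion pipeline concrete, below is a minimal PyTorch sketch of the idea described in the abstract: one DPC Head per sensor (dilated causal convolutions with average pooling), channel-wise concatenation of the heads’ outputs, Global Average Pooling, and a linear classifier. The layer counts, kernel size, dilation rates (1, 2, 4), and channel widths here are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class DPCHead(nn.Module):
    """One Dilated-Pooled Convolutional (DPC) Head for a single sensor.

    Kernel size, dilation rates, and channel widths are illustrative
    guesses; the abstract does not fix these hyperparameters.
    """
    def __init__(self, in_channels=3, channels=32, kernel_size=5, dilations=(1, 2, 4)):
        super().__init__()
        layers, c_in = [], in_channels
        for d in dilations:
            layers += [
                # Left-pad so the convolution stays causal (no future leakage).
                nn.ConstantPad1d(((kernel_size - 1) * d, 0), 0.0),
                nn.Conv1d(c_in, channels, kernel_size, dilation=d),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.AvgPool1d(kernel_size=2),  # temporal average pooling
            ]
            c_in = channels
        self.block = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, in_channels, time)
        return self.block(x)


class FLFMSATN(nn.Module):
    """Feature-level fusion: one DPC Head per sensor, fused before classification."""
    def __init__(self, num_sensors=2, in_channels=3, channels=32, num_classes=6):
        super().__init__()
        self.heads = nn.ModuleList(
            DPCHead(in_channels, channels) for _ in range(num_sensors)
        )
        self.gap = nn.AdaptiveAvgPool1d(1)  # Global Average Pooling over time
        self.classifier = nn.Linear(num_sensors * channels, num_classes)

    def forward(self, inputs):  # inputs: list of (batch, in_channels, time) tensors
        feats = [head(x) for head, x in zip(self.heads, inputs)]
        fused = torch.cat(feats, dim=1)  # concatenate per-sensor features channel-wise
        return self.classifier(self.gap(fused).squeeze(-1))


# Example: accelerometer + gyroscope windows of 128 samples (the UCI HAR window size).
model = FLFMSATN(num_sensors=2, in_channels=3, channels=32, num_classes=6)
acc, gyro = torch.randn(8, 3, 128), torch.randn(8, 3, 128)
logits = model([acc, gyro])  # shape: (8, 6)
```

A data-level fusion baseline, by contrast, would stack all sensor channels into one tensor and feed it through a single head, which is the comparison the abstract reports.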