International Journal of Multidisciplinary: Applied Business and Education Research, vol. 6, no. 11 (2025)

Cross-Attention Multimodal Transformer for Calibrated Binary Time-Series Forecasting of Rural Public Services

Daryl John C. Ragadio

Discipline: Education


Abstract:

Good governance, evidence-based planning, and sustainable rural development all rely on accurate predictions of rural public service performance. This study presents a Cross-Attention Multimodal Transformer developed for binary time-series classification of service conditions in agriculture, health, and environment at the local government unit (LGU) level. Through bidirectional cross-attention layers, the model fuses multiple temporal signals so that the healthcare and agriculture-environment streams can interact with one another. A loss function incorporating uncertainty weighting and calibration awareness helps to ensure that confidence scores are properly calibrated. Experimental results on a rural public service dataset indicate strong discriminative and calibration performance, with AUCs of 83.00% (agriculture) and 79.40% (environment), and a lower 63.90% for healthcare. Brier scores of 61.50%, 23.10%, and 18.90%, respectively, suggest that the forecasts for healthcare and the environment are well calibrated. These findings indicate that cross-attention multimodal transformers can produce precise binary predictions of rural service outcomes, supporting data-driven decision-making at the LGU level.
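The two core mechanisms named in the abstract, bidirectional cross-attention between two temporal streams and the Brier score used to assess calibration, can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual model: the stream names, dimensions, and single-head attention are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d_k):
    # Scaled dot-product attention: one stream supplies the queries,
    # the other stream supplies the keys and values.
    scores = queries @ context.T / np.sqrt(d_k)      # (T_q, T_kv)
    return softmax(scores) @ context                  # (T_q, d_k)

def bidirectional_cross_attention(stream_a, stream_b):
    # Each stream attends to the other, so information flows both ways.
    d_k = stream_a.shape[-1]
    a_out = cross_attention(stream_a, stream_b, d_k)
    b_out = cross_attention(stream_b, stream_a, d_k)
    return a_out, b_out

def brier_score(probs, labels):
    # Mean squared error between predicted probabilities and binary outcomes;
    # lower values indicate better-calibrated forecasts.
    return float(np.mean((np.asarray(probs) - np.asarray(labels)) ** 2))

# Example: a healthcare stream (12 timesteps) interacting with an
# agriculture-environment stream (8 timesteps), both with 16 features.
rng = np.random.default_rng(0)
health = rng.standard_normal((12, 16))
agri_env = rng.standard_normal((8, 16))
h_out, ae_out = bidirectional_cross_attention(health, agri_env)
print(h_out.shape, ae_out.shape)  # (12, 16) (8, 16)
```

Note that each output keeps its own stream's sequence length while drawing content from the other stream, which is what lets the two modalities interact without being forced onto a shared timeline.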


