Patient safety is paramount in the healthcare industry, and reducing medication errors is essential to improving it. A promising solution to this problem involves the development of automated systems capable of helping patients verify their pill intake. This paper investigates a Pill-Prescription matching task that seeks to associate pills in a multi-pill photo with their corresponding names in the prescription. We specifically aim to overcome the limitations of existing pill detection methods when faced with unseen pills, a situation characteristic of zero-shot learning. We propose a novel method named Zero-PIMA (Zero-shot Pill-Prescription Matching), designed to match pill images with prescription names effectively, even for pills not included in the training dataset. Zero-PIMA is an end-to-end model that includes an object localization module to detect and extract features of pill images and a graph convolutional network to capture the spatial relationships of the pill names' text in the prescription. We then leverage the contrastive learning paradigm to increase the distance between mismatched pill image and pill name pairs while minimizing the distance between matched pairs. In addition, to deal with the zero-shot pill detection problem, we leverage pill metadata retrieved from the DrugBank database to fine-tune a pre-trained text encoder, thereby incorporating visual information about pills (e.g., shape, color) into their names, making them more informative and ultimately enhancing the pill image-name matching accuracy. Extensive experiments are conducted on our collected real-world VAIPE-PP dataset of multi-pill photos and prescriptions. Across a series of comprehensive experiments, the proposed method outperforms other methods for both seen and unseen pills in terms of mean average precision. These results indicate that the proposed method could reduce medication errors and improve patient safety.
@article{Nguyen2024IEEEAccess,note={International-Journal},author={Nguyen, Trung Thanh and Nguyen, Phi Le and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},title={Zero-shot Pill-Prescription Matching with Graph Convolutional Network and Contrastive Learning},journal={IEEE Access},publisher={IEEE},year={2024},doi={10.1109/ACCESS.2024.3390153},}
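As an illustrative sketch of the contrastive learning paradigm described in the abstract (not the Zero-PIMA implementation; the symmetric InfoNCE-style form, the names `img_emb`/`txt_emb`, and the `temperature` value are assumptions):

```python
import numpy as np

def contrastive_matching_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: pull matched pill-image/name pairs
    together and push mismatched pairs apart (row i matches column i)."""
    # L2-normalize so the dot product becomes cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix

    def xent(l):
        # cross-entropy with the diagonal (matched pairs) as ground truth
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average of image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss raises the similarity of matched image-name pairs relative to all mismatched pairs in the batch.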
2022
IEEE TNSM
Fuzzy Q-Learning-Based Opportunistic Communication for MEC-Enhanced Vehicular Crowdsensing
Trung Thanh Nguyen, Truong Thao Nguyen, Thanh-Hung Nguyen, and 1 more author
IEEE Transactions on Network and Service Management, 2022
This study focuses on MEC-enhanced, vehicle-based crowdsensing systems that rely on devices installed on automobiles. We investigate an opportunistic communication paradigm in which devices can transmit measured data directly to a crowdsensing server over a 4G communication channel, or via Wi-Fi to nearby devices or so-called Road Side Units positioned along the road. We tackle a new problem: how to reduce the 4G cost while preserving latency. We propose an offloading strategy that combines a reinforcement learning technique known as Q-learning with Fuzzy logic to accomplish this purpose. Q-learning assists devices in learning to decide the communication channel, while Fuzzy logic is used to optimize the reward function in Q-learning. The experiment results show that our offloading method cuts the 4G communication cost by around 30-40% while keeping the latency of 99% of packets below the required threshold.
@article{9841517,note={International-Journal},author={Nguyen, Trung Thanh and Thao Nguyen, Truong and Nguyen, Thanh-Hung and Nguyen, Phi Le},journal={IEEE Transactions on Network and Service Management},title={Fuzzy Q-Learning-Based Opportunistic Communication for MEC-Enhanced Vehicular Crowdsensing},year={2022},volume={19},number={4},pages={5021-5033},doi={10.1109/TNSM.2022.3192397},}
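A minimal sketch of the core idea of shaping a Q-learning reward with fuzzy logic (the triangular memberships, equal weighting, and action names `"4g"`/`"wifi"` are illustrative assumptions, not the paper's fuzzy rule base):

```python
def triangular(x, a, b, c):
    """Simple triangular fuzzy membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_reward(cost, latency):
    """Blend the two objectives via fuzzy memberships of 'low cost'
    and 'low latency' (ranges here are illustrative)."""
    cheap = triangular(cost, -1.0, 0.0, 1.0)
    fast = triangular(latency, -1.0, 0.0, 1.0)
    return 0.5 * cheap + 0.5 * fast

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update used to choose the channel."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```

Each device would call `q_update` after every transmission, with the fuzzy reward trading off 4G cost against observed latency.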
international conference
2025
IEEE FG
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh Nguyen, Yasutomo Kawanishi, Vijay John, and 2 more authors
In Proceedings of the 19th IEEE International Conference on Automatic Face and Gesture Recognition, 2025
@inproceedings{nguyen2025multisensor,note={International-Conference},title={MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and John, Vijay and Komamizu, Takahiro and Ide, Ichiro},booktitle={Proceedings of the 19th IEEE International Conference on Automatic Face and Gesture Recognition},year={2025},primaryclass={cs.CV},}
IEEE/CVF WACV
CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach
Dac Thai Nguyen, Trung Thanh Nguyen, Huu Tien Nguyen, and 5 more authors
In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
Positron Emission Tomography (PET) and Computed Tomography (CT) are essential for diagnosing, staging, and monitoring various diseases, particularly cancer. Despite their importance, the use of PET/CT systems is limited by the necessity for radioactive materials, the scarcity of PET scanners, and the high cost associated with PET imaging. In contrast, CT scanners are more widely available and significantly less expensive. In response to these challenges, our study addresses the issue of generating PET images from CT images, aiming to reduce both the medical examination cost and the associated health risks for patients. Our contributions are twofold: First, we introduce a conditional diffusion model named CPDM, which, to our knowledge, is one of the initial attempts to employ a diffusion model for translating from CT to PET images. Second, we provide the largest CT-PET dataset to date, comprising 2,028,628 paired CT-PET images, which facilitates the training and evaluation of CT-to-PET translation models. For the CPDM model, we incorporate domain knowledge to develop two conditional maps: the Attention map and the Attenuation map. The former helps the diffusion process focus on areas of interest, while the latter improves PET data correction and ensures accurate diagnostic information. Experimental evaluations across various benchmarks demonstrate that CPDM surpasses existing methods in generating high-quality PET images in terms of multiple metrics.
@inproceedings{Nguyen2025CTPET,note={International-Conference},author={Nguyen, Dac Thai and Nguyen, Trung Thanh and Nguyen, Huu Tien and Nguyen, Thanh Trung and Pham, Huy Hieu and Nguyen, Thanh Hung and Nguyen, Truong Thao and Nguyen, Phi Le},title={CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach},booktitle={Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision},year={2025},publisher={IEEE},}
2024
ACM MMAsia
Action Selection Learning for Multi-label Multi-view Action Recognition
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, and 1 more author
In Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024
Multi-label multi-view action recognition aims to recognize multiple concurrent or sequential actions from untrimmed videos captured by multiple cameras. Existing work has focused on multi-view action recognition in a narrow area with strong labels available, where the onset and offset of each action are labeled at the frame level. This study focuses on real-world scenarios where cameras are distributed to capture a wide area with only weak labels available at the video level. We propose a method named Multi-view Action Selection Learning (MultiASL), which leverages action selection learning to enhance view fusion by selecting the most useful information from different viewpoints. The proposed method includes a Multi-view Spatial-Temporal Transformer video encoder to extract spatial and temporal features from multi-viewpoint videos. Action Selection Learning is employed at the frame level, using pseudo ground truth obtained from weak video-level labels, to identify the most relevant frames for action recognition. Experiments in a real-world office environment using the MM-Office dataset demonstrate the superior performance of the proposed method compared to existing methods.
@inproceedings{10.1145/3696409.3700211,note={International-Conference},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},title={Action Selection Learning for Multi-label Multi-view Action Recognition},year={2024},isbn={9798400712739},doi={10.1145/3696409.3700211},booktitle={Proceedings of the 6th ACM International Conference on Multimedia in Asia},}
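A minimal sketch of deriving pseudo frame-level ground truth from per-view frame scores (the max-over-views fusion and top-k selection are illustrative assumptions, not the exact MultiASL procedure):

```python
import numpy as np

def pseudo_frame_labels(view_scores, k):
    """view_scores: (num_views, num_frames) per-frame action scores.
    Fuse views by keeping the most confident view per frame, then
    select the top-k frames as pseudo frame-level ground truth."""
    fused = view_scores.max(axis=0)          # best view for each frame
    topk = np.argsort(fused)[::-1][:k]       # k most confident frames
    return np.sort(topk)                     # frame indices in order
```

The selected frame indices can then supervise frame-level action selection even though only video-level labels exist.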
IEEE NCA
FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization
Manh Duong Nguyen, Trung Thanh Nguyen, Huy Hieu Pham, and 3 more authors
In Proceedings of the 22nd IEEE International Symposium on Network Computing and Applications, 2024
@inproceedings{Nguyen2024FedMAC,note={International-Conference},author={Nguyen, Manh Duong and Nguyen, Trung Thanh and Pham, Huy Hieu and Hoang, Trong Nghia and Nguyen, Phi Le and Huynh, Thanh Trung},title={FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization},booktitle={Proceedings of the 22nd IEEE International Symposium on Network Computing and Applications},year={2024},publisher={IEEE},}
IEEE NCA
FedCert: Federated Accuracy Certification
Minh Hieu Nguyen, Huu Tien Nguyen, Trung Thanh Nguyen, and 4 more authors
In Proceedings of the 22nd IEEE International Symposium on Network Computing and Applications, 2024
@inproceedings{Nguyen2024FedCert,note={International-Conference},author={Nguyen, Minh Hieu and Nguyen, Huu Tien and Nguyen, Trung Thanh and Nguyen, Manh Duong and Hoang, Trong Nghia and Nguyen, Truong Thao and Nguyen, Phi Le},title={FedCert: Federated Accuracy Certification},booktitle={Proceedings of the 22nd IEEE International Symposium on Network Computing and Applications},year={2024},publisher={IEEE},}
IEEE FG
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-Scale and Action Label Features
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, and 1 more author
In Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024
Open-vocabulary Temporal Action Detection (Open-vocab TAD) is an advanced video analysis approach that expands Closed-vocabulary Temporal Action Detection (Closed-vocab TAD) capabilities. Closed-vocab TAD is typically confined to localizing and classifying actions based on a predefined set of categories. In contrast, Open-vocab TAD goes further and is not limited to these predefined categories. This is particularly useful in real-world scenarios where the variety of actions in videos can be vast and not always predictable. The prevalent methods in Open-vocab TAD typically employ a 2-stage approach, which involves generating action proposals and then identifying those actions. However, errors made during the first stage can adversely affect the subsequent action identification accuracy. Additionally, existing studies face challenges in handling actions of different durations owing to the use of fixed temporal processing methods. Therefore, we propose a 1-stage approach consisting of two primary modules: Multi-scale Video Analysis (MVA) and Video-Text Alignment (VTA). The MVA module captures actions at varying temporal resolutions, overcoming the challenge of detecting actions with diverse durations. The VTA module leverages the synergy between visual and textual modalities to precisely align video segments with corresponding action labels, a critical step for accurate action identification in Open-vocab scenarios. Evaluations on the widely recognized THUMOS14 and ActivityNet-1.3 datasets showed that the proposed method achieved superior results compared to the other methods in both Open-vocab and Closed-vocab settings. This serves as a strong demonstration of the effectiveness of the proposed method in the TAD task.
@inproceedings{10581896,note={International-Conference},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},booktitle={Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition},title={One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-Scale and Action Label Features},year={2024},doi={10.1109/FG59268.2024.10581896},}
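A minimal sketch of the video-text alignment idea behind a VTA-style module, scoring temporal segments against action-label embeddings by cosine similarity (the function name, shapes, and argmax decision are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def align_segments_to_labels(segment_feats, label_feats):
    """segment_feats: (num_segments, d) visual features; label_feats:
    (num_labels, d) text embeddings of action names. Returns the cosine
    similarity matrix and the best-matching label per segment."""
    seg = segment_feats / np.linalg.norm(segment_feats, axis=1, keepdims=True)
    lab = label_feats / np.linalg.norm(label_feats, axis=1, keepdims=True)
    sim = seg @ lab.T                  # (num_segments, num_labels)
    return sim, sim.argmax(axis=1)
```

Because labels are matched in a shared embedding space rather than through a fixed classifier head, unseen action names can be scored at test time simply by encoding their text.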
2022
PRICAI
A Novel Approach for Pill-Prescription Matching with GNN Assistance and Contrastive Learning
Trung Thanh Nguyen, Hoang Dang Nguyen, Thanh Hung Nguyen, and 3 more authors
In Proceedings of the 2022 Pacific Rim International Conference on Artificial Intelligence, 2022
@inproceedings{nguyen2022novel,note={International-Conference},title={A Novel Approach for Pill-Prescription Matching with GNN Assistance and Contrastive Learning},author={Nguyen, Trung Thanh and Nguyen, Hoang Dang and Nguyen, Thanh Hung and Pham, Huy Hieu and Ide, Ichiro and Nguyen, Phi Le},booktitle={Proceedings of the 2022 Pacific Rim International Conference on Artificial Intelligence},pages={261--274},year={2022},organization={Springer},doi={10.1007/978-3-031-20862-1_19},}
ICPP
FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning
Nang Hung Nguyen, Phi Le Nguyen, Thuy Dung Nguyen, and 5 more authors
In Proceedings of the 51st International Conference on Parallel Processing, 2022
@inproceedings{nguyen2022feddrl,note={International-Conference},title={FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning},author={Nguyen, Nang Hung and Nguyen, Phi Le and Nguyen, Thuy Dung and Nguyen, Trung Thanh and Nguyen, Duc Long and Nguyen, Thanh Hung and Pham, Huy Hieu and Truong, Thao Nguyen},booktitle={Proceedings of the 51st International Conference on Parallel Processing},year={2022},doi={10.1145/3545008.3545085},}
IEEE WCNC
Deep Reinforcement Learning-based Offloading for Latency Minimization in 3-tier V2X Networks
Hieu Dinh, Nang Hung Nguyen, Trung Thanh Nguyen, and 3 more authors
In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference, 2022
@inproceedings{9771583,note={International-Conference},author={Dinh, Hieu and Nguyen, Nang Hung and Nguyen, Trung Thanh and Nguyen, Thanh Hung and Nguyen, Truong Thao and Le Nguyen, Phi},booktitle={Proceedings of the 2022 IEEE Wireless Communications and Networking Conference},title={Deep Reinforcement Learning-based Offloading for Latency Minimization in 3-tier V2X Networks},year={2022},pages={1803-1808},doi={10.1109/WCNC51071.2022.9771583}}
2021
KSE
Data Augmentation for Small Face Datasets and Face Verification by Generative Adversarial Networks Inversion
Dinh Tan Nguyen, Cao Truong Tran, Trung Thanh Nguyen, and 4 more authors
In Proceedings of the 13th International Conference on Knowledge and Systems Engineering, 2021
@inproceedings{9648720,note={International-Conference},author={Nguyen, Dinh Tan and Tran, Cao Truong and Nguyen, Trung Thanh and Hoang, Cao Bao and Luu, Van Phu and Nguyen, Ba Ngoc and Cheong, Pou Ian},booktitle={Proceedings of the 13th International Conference on Knowledge and Systems Engineering},title={Data Augmentation for Small Face Datasets and Face Verification by Generative Adversarial Networks Inversion},year={2021},pages={1-6},doi={10.1109/KSE53942.2021.9648720},}
IEEE IPCCC
Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems
Trung Thanh Nguyen, Truong Thao Nguyen, Tuan Anh Nguyen Dinh, and 2 more authors
In Proceedings of the 2021 IEEE International Performance, Computing, and Communications Conference, 2021
In this research, we focus on real-time air quality monitoring systems that rely on devices installed on automobiles. We investigate an opportunistic communication model in which devices can send the measured data directly to the air quality server through a 4G communication channel, or via Wi-Fi to adjacent devices or so-called Road Side Units deployed along the road. We aim to reduce the 4G cost while assuring data latency, where data latency is defined as the amount of time it takes for data to reach the server. We propose an offloading scheme that leverages Q-learning to accomplish this purpose. The experiment results show that our offloading method cuts the 4G communication cost by around 40-50% while keeping the latency of 99.5% of packets below the required threshold.
@inproceedings{9679398,note={International-Conference},author={Nguyen, Trung Thanh and Thao Nguyen, Truong and Nguyen Dinh, Tuan Anh and Nguyen, Thanh Hung and Nguyen, Phi Le},booktitle={Proceedings of the 2021 IEEE International Performance, Computing, and Communications Conference},title={Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems},year={2021},doi={10.1109/IPCCC51483.2021.9679398},}
domestic conference
2025
PRMU - Japan
広域多視点マルチモーダル行動認識のためのセンサ統合手法とMultiSensor-Homeデータセットの提案
Trung Thanh Nguyen, Yasutomo Kawanishi, Vijay John, and 2 more authors
In Proceedings of the 2025 Pattern Recognition and Media Understanding, 2025
@inproceedings{Nguyen2025PRMU,note={Domestic-Conference},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and John, Vijay and Komamizu, Takahiro and Ide, Ichiro},title={広域多視点マルチモーダル行動認識のためのセンサ統合手法とMultiSensor-Homeデータセットの提案},booktitle={Proceedings of the 2025 Pattern Recognition and Media Understanding},year={2025},}
2024
PRMU - Japan
大域・局所特徴統合埋め込みに基づくオープン語彙時系列行動検出
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, and 1 more author
In Proceedings of the 2024 Pattern Recognition and Media Understanding, 2024
@inproceedings{Nguyen2024PRMU,note={Domestic-Conference},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},title={大域・局所特徴統合埋め込みに基づくオープン語彙時系列行動検出},booktitle={Proceedings of the 2024 Pattern Recognition and Media Understanding},year={2024},}
2023
DBWS - Japan
医療分野におけるゼロショット錠剤‐処方箋対応付け
Trung Thanh Nguyen, Phi Le Nguyen, Yasutomo Kawanishi, and 2 more authors
In Proceedings of the 2023 Tokai-Kansai Database Workshop, 2023
To prevent medication intake errors, we proposed a method that matches photos of medication with their prescriptions, associating the pills in the images with the names listed in the prescription. The presentation was evaluated through participant voting and received the Best Presentation Award.
@inproceedings{Nguyen2023DBWS,note={Domestic-Conference},author={Nguyen, Trung Thanh and Nguyen, Phi Le and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},title={医療分野におけるゼロショット錠剤‐処方箋対応付け},booktitle={Proceedings of the 2023 Tokai-Kansai Database Workshop},year={2023},}
MIRU - Japan
時間マルチスケール特徴と行動ラベル特徴によるオープンボキャブラリ行動区間認識
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, and 1 more author
In Proceedings of the 26th Meeting on Image Recognition and Understanding, 2023
@inproceedings{Nguyen2023,note={Domestic-Conference},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},title={時間マルチスケール特徴と行動ラベル特徴によるオープンボキャブラリ行動区間認識},booktitle={Proceedings of the 26th Meeting on Image Recognition and Understanding},year={2023}}
IEICE - Japan
PiDP: 処方箋に対する GCN の対照学習に基づく錠剤検出
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, and 1 more author
In Proceedings of the 2023 IEICE General Conference, 2023
@inproceedings{2023pidp,note={Domestic-Conference},title={PiDP: 処方箋に対する GCN の対照学習に基づく錠剤検出},author={Nguyen, Trung Thanh and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},booktitle={Proceedings of the 2023 IEICE General Conference},year={2023},organization={The Institute of Electronics, Information and Communication Engineers},}
2022
HUST
Tối ưu truyền tin trong mạng điện toán biên di động bằng thuật toán Fuzzy Q-learning
Trung Thanh Nguyen, Nang Hung Nguyen, Manh Cuong Dao, and 2 more authors
Scientific Research Student Conference, Hanoi University of Science and Technology, 2022
Applying Mobile Cloud Computing (MCC) and Mobile Edge Computing (MEC) to offload computation tasks is a promising approach that enables low-latency applications to run on intelligent vehicles. Most existing offloading studies focus on a single model, either MEC or MCC. In this report, we study the combined use of these technologies in a 3-tier intelligent V2X transportation network, in which vehicles can offload computation tasks via both MEC and MCC. Specifically, we consider the optimal probabilities for routing transmissions over three channels: V2I, V2N, and I2N. Our study makes three main contributions. First, we propose a mathematical model that accurately computes the processing latency of tasks, together with a formulation for finding the latency-minimizing solution. Second, we propose an optimization method based on a Genetic Algorithm (GA). Third, observing the weaknesses of GA, we propose a reinforcement learning algorithm, multi-agent multi-armed bandits (MAB), with an improved Sigmoid-based exploration mechanism (SEM) to minimize latency. Experimental results show that the proposed GA in the 3-tier model reduces latency by up to 99.9% compared with existing 2-tier models, the proposed SEM exploration mechanism reduces average task latency by 35% compared with the basic MAB, and the improved MAB shortens latency by 18.5% on average and by 56.9% in the best case compared with the GA.
@article{JSTIC2021,note={Domestic-Journal},title={Mô hình mạng giao thông thông minh ba tầng và giải pháp giảm thiểu độ trễ truyền tin ứng dụng phương pháp học tăng cường},author={Nguyen, Nang Hung and Nguyen, Trung Thanh and Pham, Minh Khiem and Dinh, Van Hieu and Nguyen Dinh, Tuan Anh and Nguyen, Thanh Hung and Nguyen, Phi Le},year={2021},journal={Journal of Science and Technology on Information and Communications},}
HUST
Mô hình mạng giao thông thông minh ba tầng và giải pháp giảm thiểu độ trễ truyền tin ứng dụng phương pháp học tăng cường
Minh Khiem Pham, Nang Hung Nguyen, Trung Thanh Nguyen, and 2 more authors
Scientific Research Student Conference, Hanoi University of Science and Technology, 2021
This thesis was awarded the Best Thesis Award by the School of Information and Communications Technology, Hanoi University of Science and Technology, Vietnam.
miscellaneous (article, arXiv, etc.)
2025
arXiv
MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition
Trung Thanh Nguyen, Yasutomo Kawanishi, Vijay John, and 2 more authors