NIE Yingwang, WANG Lei, MEI Chenyang, et al. FR-PVT: A feature-refined pyramid vision transformer for accurate image segmentation[J]. JOURNAL OF WENZHOU MEDICAL UNIVERSITY, 2024, 54(8): 631-640.
Abstract: Objective: To accurately extract target regions in medical images used for morphological assessment and clinical disease monitoring, a hybrid network combining a convolutional neural network (CNN) and a Transformer was explored to simultaneously learn local and global information in images. Methods: ①A novel feature-refined segmentation network (FR-PVT) was developed by introducing a CNN-based decoder and integrating it with the pyramid vision transformer (PVT). The decoder, consisting of a feature refinement module (FRM), a context attention module (CAM), and a similarity aggregation module (SAM), was used to refine the multiscale global features captured by the PVT. ②To validate FR-PVT, it was used to segment polyps in five public colonoscopy image datasets (ClinicDB, ColonDB, EndoScene, ETIS, and KvasirSEG) and palpebral fissures in frame images from the eye videography dataset provided by the Eye Hospital of Wenzhou Medical University. ③The performance of FR-PVT was evaluated with four metrics: the Dice coefficient, intersection over union (IoU), Matthews correlation coefficient (MCC), and Hausdorff distance (Hdf). On the same segmentation tasks, FR-PVT was compared with existing networks (Polyp-PVT, U-Net, and several U-Net variants). Results: ①FR-PVT was able to handle colonoscopy images acquired under various imaging conditions and achieved average Dice coefficients of 0.937, 0.819, 0.892, 0.800, and 0.909 on the testing subsets of the ClinicDB, ColonDB, EndoScene, ETIS, and KvasirSEG datasets, respectively. ②On frame images from the eye videography dataset, FR-PVT obtained an average Dice, IoU, MCC, and Hdf of 0.966, 0.943, 0.957, and 4.706, respectively. ③Across the five polyp datasets, FR-PVT obtained an average Dice and IoU of 0.840 and 0.764, outperforming Polyp-PVT (0.834 and 0.760), U-Net (0.561 and 0.493), U-Net++ (0.546 and 0.476), SFA (0.476 and 0.367), and PraNet (0.741 and 0.675).
Conclusion: FR-PVT achieves better segmentation performance than Polyp-PVT and several existing CNN-based networks (such as U-Net and its variants).
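The overlap metrics reported above (Dice and IoU) can be computed directly from binary masks. The following is a minimal illustrative sketch, not the authors' implementation; the function name and the epsilon smoothing term are assumptions for illustration.

```python
import numpy as np

def dice_iou(pred, target, eps=1e-7):
    """Compute the Dice coefficient and IoU of two binary segmentation
    masks (illustrative sketch, not the paper's code)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Dice = 2|A∩B| / (|A| + |B|); IoU = |A∩B| / |A∪B|
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

# Toy example: two 2x2 masks overlapping in one pixel
p = np.array([[1, 1], [0, 0]])
t = np.array([[1, 0], [1, 0]])
d, i = dice_iou(p, t)  # dice = 0.5, iou = 1/3
```

The MCC and Hausdorff distance used in the paper can likewise be obtained from standard libraries (e.g. `sklearn.metrics.matthews_corrcoef` and `scipy.spatial.distance.directed_hausdorff`).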