A Simple Introduction to ROCm: Accelerating PyTorch with AMD GPUs

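ROCm has become the second-largest GPU parallel-computing platform after CUDA. For PyTorch, the ROCm build keeps the same semantics at the Python API level. It even reuses the torch.cuda namespace and the "cuda" device name for AMD GPUs, so migrating existing code to the ROCm build of PyTorch requires almost no changes. ROCm may lose some performance compared with CUDA. Even so, the relatively low price of AMD GPUs makes AMD + ROCm an outstanding value-for-money choice for AI work.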

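In this post, I (暗雨冥) will briefly show how to use ROCm to accelerate PyTorch on an AMD GPU. I will also fill in a few details that the official tutorials leave out. Let's get started~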

Hardware / System Configuration

My setup is an AMD Radeon RX 7800 XT, an AMD Ryzen 5 9600X and 32 GB of DDR5. It is given for reference only; for the actual hardware requirements, see AMD's official documentation.

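On the OS side, AMD officially supports three major commercial Linux distributions: Ubuntu, Red Hat Enterprise Linux (RHEL), and SUSE Linux Enterprise Server (SLES). Closely related distributions such as Linux Mint, Rocky Linux, and openSUSE will most likely work as well. However, AMD clearly seems to prefer Ubuntu, and quite a few documents are only available in an Ubuntu version. To avoid potential problems, I therefore chose Zorin OS 17.2, which is based on Ubuntu 22.04 LTS (mainly because it looks nice ヾ(≧▽≦*)o).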

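*Note: ROCm does not yet support Windows for this workflow (the ROCm build of PyTorch is Linux-only). To use it on Windows you need WSL2; for that part, please refer directly to AMD's official documentation.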

ROCm Installation

Installing ROCm is actually very simple. Following AMD's official documentation,
on Ubuntu you only need to run the following commands:

sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # add the current user to the render and video groups so the AMD GPU can be accessed without root privileges
wget https://repo.radeon.com/amdgpu-install/6.2.2/ubuntu/jammy/amdgpu-install_6.2.60202-1_all.deb # jammy is the codename of Ubuntu 22.04; for Ubuntu 24.04 and its derivatives, replace jammy with noble
sudo apt install ./amdgpu-install_6.2.60202-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
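Derivative distributions such as Zorin OS or Linux Mint have their own release names, so it is easy to put the wrong codename into the URL above. The Python sketch below reads /etc/os-release and fills the codename into the same URL. It assumes that derivatives expose their Ubuntu base through the UBUNTU_CODENAME field, which is common but worth checking on your own system. It only accepts jammy and noble, the two codenames mentioned above.

```python
# Sketch: build the amdgpu-install URL from /etc/os-release.
# Assumption: Ubuntu derivatives (Zorin OS, Linux Mint, ...) expose their
# Ubuntu base in UBUNTU_CODENAME; check your own /etc/os-release.
from pathlib import Path

# Same URL as in the wget command above; only the codename varies.
URL_TEMPLATE = (
    "https://repo.radeon.com/amdgpu-install/6.2.2/ubuntu/{codename}/"
    "amdgpu-install_6.2.60202-1_all.deb"
)
SUPPORTED = {"jammy", "noble"}  # Ubuntu 22.04 and 24.04


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines, dropping optional surrounding quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def ubuntu_codename(info: dict) -> str:
    """Prefer UBUNTU_CODENAME (derivatives), fall back to VERSION_CODENAME."""
    codename = info.get("UBUNTU_CODENAME") or info.get("VERSION_CODENAME", "")
    if codename not in SUPPORTED:
        raise ValueError(f"unsupported or unknown Ubuntu base: {codename!r}")
    return codename


def installer_url(os_release_text: str) -> str:
    info = parse_os_release(os_release_text)
    return URL_TEMPLATE.format(codename=ubuntu_codename(info))


if __name__ == "__main__":
    try:
        print(installer_url(Path("/etc/os-release").read_text()))
    except (OSError, ValueError) as exc:
        print(f"could not determine the installer URL: {exc}")
```

If the script reports an unknown base, your distribution is probably not derived from Ubuntu 22.04 or 24.04, and the commands above may not apply as-is.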

After that, a few extra configuration steps are needed.

Configure ld (the dynamic linker)

sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
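You can confirm that the ldconfig step worked by looking for ROCm libraries in the linker cache. The sketch below filters the output of ldconfig -p for paths under /opt/rocm. That prefix is an assumption taken from the configuration above. Which libraries show up depends on the packages you installed.

```python
# Sketch: list ROCm libraries known to the dynamic linker cache.
import subprocess


def rocm_libs_in_cache(ldconfig_output: str, prefix: str = "/opt/rocm") -> list:
    """Return library paths from `ldconfig -p` output located under prefix.

    Cache entries look like: "<name> (<flags>) => <path>".
    The prefix also matches versioned directories such as /opt/rocm-6.2.2.
    """
    paths = []
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue  # header line or malformed entry
        path = line.split("=>", 1)[1].strip()
        if path.startswith(prefix):
            paths.append(path)
    return paths


if __name__ == "__main__":
    try:
        result = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run ldconfig: {exc}")
    else:
        libs = rocm_libs_in_cache(result.stdout)
        print(f"{len(libs)} ROCm libraries in the linker cache")
        for path in libs[:10]:
            print("  " + path)
```

An empty result usually means the rocm.conf file was not written or ldconfig was not rerun afterwards.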

Add the ROCm executables to PATH

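  • Plan A: use update-alternatives
    Most Linux distributions ship the update-alternatives tool, which helps manage multiple versions of a command or program. For more information about update-alternatives, see its Linux man page.
    Configure it with the following commands: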

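    1. List the ROCm installations registered with update-alternatives:
    update-alternatives --list rocm
    2. If multiple ROCm versions are installed, update-alternatives uses the latest one by default. To choose a specific ROCm version, run (root privileges are needed to change the selection):
    sudo update-alternatives --config rocm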
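  • Plan B: use environment-modules
    The environment-modules tool simplifies shell initialization by letting you modify the session environment through module files. For more information, see the Environment Modules documentation.
    Configure it with the following commands: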

    1. List the available ROCm versions:
    module avail

    2. If multiple ROCm versions are installed, select the one you need with:

    module load rocm/<version>
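  • Plan C: manual configuration
    The ROCm module files used by Plan B are located in /opt/rocm-/lib/rocmmod.
    If none of the methods above meets your needs, you can add the ROCm executables to PATH manually.
    For example, add the following line to .bashrc (adjust the version number to match your installation):

    export PATH=$PATH:/opt/rocm-6.2.2/bin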
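Whichever plan you choose, you can check the result by seeing where the ROCm tools resolve on PATH. The sketch below runs shutil.which on rocminfo, rocm-smi and hipcc, all of which come with the ROCm packages installed above. It also resolves symlinks, so if several ROCm versions are installed you can see which versioned directory is actually in use. The tool list is my own choice; extend it as needed.

```python
# Sketch: see where the ROCm command-line tools resolve on PATH.
import os
import shutil
from typing import Dict, Optional

# Tools shipped with the ROCm packages installed above (my own selection).
TOOLS = ("rocminfo", "rocm-smi", "hipcc")


def locate_tools(tools=TOOLS, path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each tool to its real (symlink-resolved) location, or None if missing.

    path=None searches the current PATH, like the shell would.
    """
    found = {}
    for tool in tools:
        hit = shutil.which(tool, path=path)
        found[tool] = os.path.realpath(hit) if hit else None
    return found


if __name__ == "__main__":
    for tool, location in locate_tools().items():
        print(f"{tool:10s} -> {location or 'NOT FOUND on PATH'}")
```

Run it in a fresh shell after editing .bashrc or switching versions, since an already open shell keeps its old PATH.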

Verify the installation status of the kernel driver, ROCm, and packages

dkms status
rocminfo
clinfo
apt list --installed # this may list a large number of installed packages

For reference output, see the end of this post.
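You can also check these outputs programmatically. The rough Python sketch below extracts two things: the kernels for which dkms has built the amdgpu module, and the GPU agents that rocminfo lists. The regular expressions assume the formats shown in the reference output at the end of this post and may need adjusting for other versions. Listing every GPU agent is useful because the CPU's integrated Radeon Graphics can also show up as a second GPU, as it does in the reference output.

```python
# Sketch: summarise `dkms status` and `rocminfo` output.
# The patterns assume the formats shown in the reference output of this post.
import platform
import re
import subprocess

DKMS_LINE = re.compile(r"amdgpu/[^,]+,\s*([^,]+),\s*[^:]+:\s*installed")


def amdgpu_kernels(dkms_output: str) -> set:
    """Kernel versions for which dkms reports the amdgpu module as installed."""
    kernels = set()
    for line in dkms_output.splitlines():
        match = DKMS_LINE.match(line.strip())
        if match:
            kernels.add(match.group(1).strip())
    return kernels


def gpu_agents(rocminfo_output: str) -> list:
    """(Name, Marketing Name) for each rocminfo agent whose Device Type is GPU."""
    agents = []
    # Every agent section starts with a line such as "Agent 2".
    for section in re.split(r"^Agent \d+[ \t]*$", rocminfo_output, flags=re.M)[1:]:
        device = re.search(r"^  Device Type:\s+(\S+)", section, flags=re.M)
        if not device or device.group(1) != "GPU":
            continue
        # Agent-level keys are indented by two spaces; ISA entries by more.
        name = re.search(r"^  Name:\s+(.+?)\s*$", section, flags=re.M)
        marketing = re.search(r"^  Marketing Name:\s+(.+?)\s*$", section, flags=re.M)
        agents.append((name.group(1) if name else "?",
                       marketing.group(1) if marketing else "?"))
    return agents


def run(cmd: list) -> str:
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"{' '.join(cmd)} failed: {exc}")
        return ""


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    built = amdgpu_kernels(run(["dkms", "status"]))
    print(f"amdgpu built for the running kernel ({running}): {running in built}")
    for name, marketing in gpu_agents(run(["rocminfo"])):
        print(f"GPU agent: {name} ({marketing})")
```

Fed the reference rocminfo output from this post, gpu_agents returns two entries: the AMD Radeon RX 7800 XT and the integrated AMD Radeon Graphics.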

Reboot to make sure the ROCm configuration takes effect

reboot

PyTorch Installation

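AMD officially recommends using a Docker image for easier management; see AMD's official documentation.
Since I couldn't be bothered to install Docker, I went straight for pip.
Just follow the instructions on the PyTorch website and run the following command: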

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
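One possible pitfall is ending up with a CUDA or CPU-only wheel instead of the ROCm one. This can happen, for instance, if torch was already installed from the default index, because pip then considers the requirement satisfied. ROCm wheels from the index above carry a local version suffix such as +rocm6.2, and torch.version.hip is set only on ROCm builds. The sketch below checks both; the regular expression is an assumption about the suffix format.

```python
# Sketch: confirm that the installed torch is a ROCm build.
import re
from typing import Optional


def rocm_tag(torch_version: str) -> Optional[str]:
    """ROCm version from a local version suffix like "+rocm6.2", else None."""
    match = re.search(r"\+rocm(\d+(?:\.\d+)*)", torch_version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch version:", torch.__version__)
        print("ROCm wheel tag:", rocm_tag(torch.__version__))
        # A version string on ROCm builds, None on CUDA/CPU builds.
        print("torch.version.hip:", getattr(torch.version, "hip", None))
```

If the tag comes out as None, uninstall torch, torchvision and torchaudio, then rerun the pip command above.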

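If you run into network problems, download the corresponding .whl files with a download tool first, then install them with pip.
If all goes well, PyTorch is now installed~
Let's do a quick check: run the following in Python to verify that PyTorch was installed successfully: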

import torch
x = torch.rand(5, 3)
print(x)

The output should look similar to the following (the values are random):

tensor([[0.3380, 0.3845, 0.3217],
        [0.8337, 0.9050, 0.2650],
        [0.2979, 0.7141, 0.9069],
        [0.1449, 0.1132, 0.1375],
        [0.4675, 0.3947, 0.1426]])

Run the following in Python to verify that ROCm is working:

import torch
print(torch.cuda.is_available())  # print() so the result also shows when run as a script
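torch.cuda.is_available() only tells you that a device is visible. The sketch below goes a little further. It reports the HIP version and the device names, then runs a small matrix multiplication on the GPU and compares it with the CPU result. On ROCm builds these are the regular torch.cuda APIs. The matrix size and tolerances are arbitrary choices of mine.

```python
# Sketch: a slightly fuller check than torch.cuda.is_available().
def rocm_report() -> dict:
    """Collect basic device information and run a tiny GPU smoke test."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": str(torch.__version__),
        # A version string on ROCm builds, None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["gpu_available"]:
        # The ROCm build exposes AMD GPUs through the torch.cuda namespace.
        report["devices"] = [torch.cuda.get_device_name(i)
                             for i in range(torch.cuda.device_count())]
        # Multiply two random matrices on the GPU and compare with the CPU result.
        a, b = torch.rand(256, 256), torch.rand(256, 256)
        on_gpu = (a.to("cuda") @ b.to("cuda")).cpu()
        report["matmul_matches_cpu"] = torch.allclose(on_gpu, a @ b, rtol=1e-4, atol=1e-3)
    return report


if __name__ == "__main__":
    for key, value in rocm_report().items():
        print(f"{key}: {value}")
```

On a machine that also has an integrated GPU, the devices list may contain more than one entry.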

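If you get True, congratulations, you're all set!
If, unfortunately, ROCm is not available, read on.
Run the following commands to view the logs, look for suspicious output, and make good use of search engines: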

export AMD_LOG_LEVEL=7
python -c "import torch;print(torch.cuda.is_available())"
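The log at AMD_LOG_LEVEL=7 is very verbose. To skim it from Python, the sketch below runs the same one-line check in a subprocess with AMD_LOG_LEVEL set, then prints only the lines that contain a few keywords. The keyword list is my own heuristic, not anything defined by ROCm. Treat an empty result as "nothing obvious" rather than "nothing wrong".

```python
# Sketch: run the availability check with HIP logging and skim the output.
import os
import subprocess
import sys

CHECK = "import torch; print(torch.cuda.is_available())"
# Heuristic keywords of my own choosing, not a list defined by ROCm.
KEYWORDS = ("error", "fail", "denied", "not found", "unsupported")


def debug_env(level: int = 7, base=None) -> dict:
    """Copy of the environment (or of base) with AMD_LOG_LEVEL set."""
    env = dict(os.environ if base is None else base)
    env["AMD_LOG_LEVEL"] = str(level)
    return env


def suspicious_lines(log: str, keywords=KEYWORDS) -> list:
    """Lines containing any keyword, case-insensitively."""
    return [line for line in log.splitlines()
            if any(word in line.lower() for word in keywords)]


if __name__ == "__main__":
    proc = subprocess.run([sys.executable, "-c", CHECK],
                          env=debug_env(), capture_output=True, text=True)
    lines = proc.stdout.strip().splitlines()
    print("check printed:", lines[-1] if lines else "(nothing)")
    hits = suspicious_lines(proc.stdout + "\n" + proc.stderr)
    print(f"{len(hits)} suspicious line(s); showing up to 20:")
    for line in hits[:20]:
        print("  " + line)
```

The full log is still the authoritative source; this filter only shortens the first pass.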

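It's worth mentioning that if tools such as rocm-smi work normally, the cause is very likely that the user is not in the render group. Run the following command to add the user to the render and video groups again: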

sudo usermod -a -G render,video $LOGNAME

Reboot the system when done (group membership changes only take effect in a new login session):

reboot
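usermod only changes the group database. Processes in a session that is already running keep their old groups, which is why a reboot (or at least logging out and back in) is needed. The sketch below compares the groups of the current session with those configured for the user. This tells you whether the fix is already in place and only a re-login is pending. It relies on Python's Unix-only grp and pwd modules.

```python
# Sketch: are we in the render and video groups, now and after re-login?
import grp
import os
import pwd

REQUIRED = ("render", "video")


def missing_groups(current, required=REQUIRED) -> list:
    """Required group names that are absent from current."""
    return [name for name in required if name not in current]


def session_groups() -> set:
    """Group names of this process, i.e. what the running session really has."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid without an entry in the group database
    return names


def configured_groups(user: str) -> set:
    """Group names listed for user in the group database (used at next login)."""
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    try:
        names.add(grp.getgrgid(pwd.getpwnam(user).pw_gid).gr_name)  # primary group
    except KeyError:
        pass
    return names


if __name__ == "__main__":
    print("missing in this session:", missing_groups(session_groups()) or "none")
    try:
        user = pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        user = None
    if user:
        print("missing after re-login:", missing_groups(configured_groups(user)) or "none")
```

Suppose the first line reports missing groups but the second does not. In that case the usermod change is already in place, and only a re-login or reboot is pending.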

Reference command-line output for verifying the kernel driver, ROCm, and package installation

# dkms status
amdgpu/6.8.5-2041575.22.04, 6.8.0-49-generic, x86_64: installed (original_module exists)
amdgpu/6.8.5-2041575.22.04, 6.8.0-52-generic, x86_64: installed (original_module exists)
# rocminfo
ROCk module version 6.8.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 5 9600X 6-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 5 9600X 6-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      49152(0xc000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   5484
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            12
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Memory Properties:
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    31870192(0x1e64cf0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    31870192(0x1e64cf0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    31870192(0x1e64cf0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1100
  Uuid:                    GPU-3fbe3742bc309e9e
  Marketing Name:          AMD Radeon RX 7800 XT
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      4096(0x1000) KB
    L3:                      65536(0x10000) KB
  Chip ID:                 29822(0x747e)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2169
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            60
  SIMDs per CU:            2
  Shader Engines:          3
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Memory Properties:
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 232
  SDMA engine uCode::      22
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*******
Agent 3
*******
  Name:                    gfx1100
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    2
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      256(0x100) KB
  Chip ID:                 5056(0x13c0)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2200
  BDFID:                   5376
  Internal Node ID:        2
  Compute Unit:            2
  SIMDs per CU:            2
  Shader Engines:          1
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 21
  SDMA engine uCode::      9
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    15935096(0xf32678) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    15935096(0xf32678) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
# clinfo
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3625.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon RX 7800 XT
  Device Topology:				 PCI[ B#3, D#0, F#0 ]
  Max compute units:				 30
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 2169Mhz
  Address bits:					 64
  Max memory allocation:			 14588628168
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 16384
  Max image 3D height:				 16384
  Max image 3D depth:				 8192
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 32768
  Global memory size:				 17163091968
  Constant buffer size:				 14588628168
  Max number of constant args:			 8
  Local memory type:				 Local
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 1703726280
  Max global variable size:			 14588628168
  Max global variable preferred total size:	 17163091968
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 32
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7e6eab7f0ff0
  Name:						 gfx1101
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0
  Driver version:				 3625.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0
  Extensions:					 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program


  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon Graphics
  Device Topology:				 PCI[ B#21, D#0, F#0 ]
  Max compute units:				 1
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 2200Mhz
  Address bits:					 64
  Max memory allocation:			 13869907552
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 16384
  Max image 3D height:				 16384
  Max image 3D depth:				 8192
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 16317538304
  Constant buffer size:				 13869907552
  Max number of constant args:			 8
  Local memory type:				 Local
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 985005664
  Max global variable size:			 13869907552
  Max global variable preferred total size:	 16317538304
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 32
  Error correction support:			 0
  Unified memory for Host and Device:		 1
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0xxxxxxxxxxxxx
  Name:						 gfx1036
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0
  Driver version:				 3625.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0
  Extensions:					 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
# apt list --installed
Listing...
...
amd-smi-lib/jammy,now 24.6.3.60202-116~22.04 amd64 [installed,automatic]
amd64-microcode/jammy-updates,jammy-security,now 3.20191218.1ubuntu2.3 amd64 [installed,automatic]
amdgpu-core/jammy,jammy,now 1:6.2.60202-2041575.22.04 all [installed,automatic]
amdgpu-dkms-firmware/jammy,jammy,now 1:6.8.5.60202-2041575.22.04 all [installed,automatic]
amdgpu-dkms/jammy,jammy,now 1:6.8.5.60202-2041575.22.04 all [installed]
amdgpu-install/jammy,jammy,now 6.2.60202-2041575.22.04 all [installed]
amdgpu-lib/jammy,now 1:6.2.60202-2041575.22.04 amd64 [installed,automatic]
...
rocm-cmake/jammy,now 0.13.0.60202-116~22.04 amd64 [installed]
rocm-core-asan/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-core/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-dbgapi/jammy,now 0.76.0.60202-116~22.04 amd64 [installed]
rocm-debug-agent/jammy,now 2.0.3.60202-116~22.04 amd64 [installed]
rocm-dev/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-developer-tools/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-device-libs/jammy,now 1.0.0.60202-116~22.04 amd64 [installed]
rocm-gdb/jammy,now 14.2.60202-116~22.04 amd64 [installed]
rocm-hip-libraries/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-hip-runtime-dev/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-hip-runtime/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-hip-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-language-runtime/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-llvm/jammy,now 18.0.0.24355.60202-116~22.04 amd64 [installed]
rocm-ml-libraries/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-ml-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-opencl-dev/jammy,now 2.0.0.60202-116~22.04 amd64 [installed]
rocm-opencl-icd-loader/jammy,now 1.2.60202-116~22.04 amd64 [installed]
rocm-opencl-runtime/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-opencl-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-opencl/jammy,now 2.0.0.60202-116~22.04 amd64 [installed]
rocm-openmp-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-smi-lib/jammy,now 7.3.0.60202-116~22.04 amd64 [installed]
rocm-utils/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocminfo/jammy,now 1.0.0.60202-116~22.04 amd64 [installed]
...
hip-dev/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hip-doc/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hip-runtime-amd/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hip-samples/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hipblas-dev/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipblas/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipblaslt-dev/jammy,now 0.8.0.60202-116~22.04 amd64 [installed,automatic]
hipblaslt/jammy,now 0.8.0.60202-116~22.04 amd64 [installed,automatic]
hipcc/jammy,now 1.1.1.60202-116~22.04 amd64 [installed,automatic]
hipcub-dev/jammy,now 3.2.0.60202-116~22.04 amd64 [installed,automatic]
hipfft-dev/jammy,now 1.0.15.60202-116~22.04 amd64 [installed,automatic]
hipfft/jammy,now 1.0.15.60202-116~22.04 amd64 [installed,automatic]
hipfort-dev/jammy,now 0.4.0.60202-116~22.04 amd64 [installed,automatic]
hipify-clang/jammy,now 18.0.0.60202-116~22.04 amd64 [installed,automatic]
hiprand-dev/jammy,now 2.11.0.60202-116~22.04 amd64 [installed,automatic]
hiprand/jammy,now 2.11.0.60202-116~22.04 amd64 [installed,automatic]
hipsolver-dev/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipsolver/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipsparse-dev/jammy,now 3.1.1.60202-116~22.04 amd64 [installed,automatic]
hipsparse/jammy,now 3.1.1.60202-116~22.04 amd64 [installed,automatic]
hipsparselt-dev/jammy,now 0.2.1.60202-116~22.04 amd64 [installed,automatic]
hipsparselt/jammy,now 0.2.1.60202-116~22.04 amd64 [installed,automatic]
hiptensor-dev/jammy,now 1.3.0.60202-116~22.04 amd64 [installed,automatic]
hiptensor/jammy,now 1.3.0.60202-116~22.04 amd64 [installed,automatic]
...
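Thanks for reading this far~ I hope this helps. If anything here is unclear, feel free to contact 暗雨冥 (if you repost this article, please credit the original source)!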