<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>注意事项</title>
<link href="/2023/11/11/%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9/"/>
<url>/2023/11/11/%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9/</url>
<content type="html"><![CDATA[<h1 id="python相关">python相关</h1><h2 id="代理">代理</h2><p>如下操作可以让程序走代理</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> os</span><br><span class="line"></span><br><span class="line">os.environ[<span class="string">"http_proxy"</span>] = <span class="string">"http://127.0.0.1:7890"</span></span><br><span class="line">os.environ[<span class="string">"https_proxy"</span>] = <span class="string">"http://127.0.0.1:7890"</span></span><br><span class="line">``` </span><br><span class="line"></span><br><span class="line"><span class="comment">## 防止使用过多cpu的核导致电脑死机 </span></span><br><span class="line">如果你想让 Numpy 最多只使用 <span class="number">4</span> 个线程,你可以这样设置 </span><br><span class="line">请注意,这个设置必须在你导入 Numpy 之前进行。如果你在导入 Numpy 之后再设置这个环境变量,那么这个设置将不会生效。 </span><br><span class="line"></span><br><span class="line">```python </span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line">os.environ[<span class="string">"OMP_NUM_THREADS"</span>] = <span class="string">"8"</span></span><br></pre></td></tr></table></figure><h2 id="pip下载过慢">pip下载过慢</h2><h3 id="单次设置">单次设置</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch</span><br><span class="line">或者</span><br><span class="line">pip install torch -i https://pypi.tuna.tsinghua.edu.cn/simple</span><br><span class="line">``` </span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="comment">## 全局设置</span></span> </span><br><span class="line"></span><br><span class="line">```shell </span><br><span class="line">pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple</span><br><span class="line">pip config set install.trusted-host mirrors.aliyun.com</span><br></pre></td></tr></table></figure><h2 id="apex安装">apex安装</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">pip install -r requirements.txt</span><br><span class="line">python setup.py install</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">git clone https://github.com/ptrblck/apex.git</span><br><span class="line">cd apex</span><br><span class="line">git checkout apex_no_distributed</span><br><span class="line">pip 
install -v --no-cache-dir ./</span><br></pre></td></tr></table></figure><h2 id="环境相关">环境相关</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">conda create -n swin python=3.10</span><br><span class="line">conda remove -n swin --all</span><br></pre></td></tr></table></figure><p>在一些项目中,如果有setup.py执行 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip install -e .</span><br></pre></td></tr></table></figure> 可以将其装入需要的环境</p><h3 id="requirements.txt">requirements.txt</h3><h4 id="安装依赖项">安装依赖项</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip install -r requirements.txt</span><br></pre></td></tr></table></figure><h4 id="生成requirements.txt">生成requirements.txt</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip freeze > requirements.txt</span><br></pre></td></tr></table></figure><h2 id="导包">导包</h2><p>python import模块时,是在sys.path里按顺序查找的。 sys.path是一个列表,里面以字符串的形式存储了许多路径。 使用A.py文件中的函数需要先将他的文件路径放到sys.path中。 如果出问题了可以重启一下jupyter的kernel</p><h2 id="tensorflow安装">tensorflow安装</h2><p>tensorflow(((有事可以先看官方,先看官方,先看官方!!!tf2.10之后不再支持windows了!!</p><blockquote><p>安装东西学会看官方文档真的很重要</p></blockquote><h1 id="操作系统">操作系统</h1><ul><li>注意可执行文件放到这个文件夹下即可全局运行:/usr/local/bin</li><li>os.getcwd()查看工作目录,os.chdir()更改工作目录</li><li>注意如果在/sss开头和直接sss开头表示不同的路径,前一个方法表示在根目录下找到sss的文件</li><li>如果ubuntu卡死了,可以通过下面的键组合来对系统进行强制重启 <figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Alt+B+Fn+PrtSc</span><br></pre></td></tr></table></figure></li><li>如果显卡驱动出了问题,那么用下面这个命令一般都能解决问题 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">dpkg --configure -a</span><br></pre></td></tr></table></figure></li></ul><h2 id="在22.04安装libssl1.1">在22.04安装libssl1.1</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">echo "deb http://security.ubuntu.com/ubuntu focal-security main" | sudo tee /etc/apt/sources.list.d/focal-security.list</span><br><span class="line"> </span><br><span class="line">sudo apt-get update</span><br><span class="line">sudo apt-get install libssl1.1</span><br></pre></td></tr></table></figure><blockquote><p>注意注意注意一定要检查空格,以后输入路径都在pycharm中copy相对路径好了</p></blockquote><h2 id="文件传输">文件传输</h2><p>大文件一定要压缩了再上传!!!!!否则不仅上传慢且文件占用硬盘体积还大</p><h2 id="权限管理">权限管理</h2><p>如果你在尝试向某个目录粘贴文件时收到了 "Permission do not allow pasting files in this directory" 的错误消息,那么可能是你没有足够的权限来写入该目录。</p><p>这个问题通常会在你尝试向属于 root 或其他用户的目录粘贴文件时出现。</p><p>有两种主要的解决方法: 1. 
更改目录权限:你可以更改目录的权限以允许你的用户帐号进行写入。在终端中,使用 chmod 命令来修改权限。例如,如果你想向 /path/to/directory/ 目录写入文件,可以运行以下命令: <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo chmod 777 /path/to/directory/</span><br></pre></td></tr></table></figure> 这将允许所有用户读取、写入和执行该目录中的文件。请注意,这可能会带来安全风险,特别是如果你在一个多用户系统上这样做。你也可以使用更严格的权限设置,比如 755(允许所有用户读取和pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch 或者 pip install torch -i https://pypi.tuna.tsinghua.edu.cn/simple 执行,但只有所有者可以写入)。</p><ol start="2" type="1"><li>以 root 用户身份粘贴:你也可以使用 root 权限来粘贴文件。首先,打开一个以 root 权限运行的文件管理器实例,然后在这个窗口中粘贴文件。例如,你可以在终端中运行以下命令来打开以 root 权限运行的 Nautilus: <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nautilus</span><br></pre></td></tr></table></figure> 或者 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">pkexec nautilus</span><br><span class="line">```git clone https://github.com/ptrblck/apex.git</span><br><span class="line">cd apex</span><br><span class="line">git checkout apex_no_distributed</span><br><span class="line">pip install -v --no-cache-dir ./</span><br><span class="line"></span><br><span class="line">zip命令的基本语法如下:</span><br><span class="line"></span><br><span class="line">```bash</span><br><span class="line">zip [选项] [文件名] [文件列表]</span><br></pre></td></tr></table></figure></li></ol><p>选项说明: - r:递归压缩子目录中的文件。 - v:显示压缩的详细信息。 - q:不显示压缩的详细信息。 - u:更新已经存在的文件。 - m:将压缩的文件移动到指定目录。</p><p>例如,将/home目录下的所有文件压缩成一个zip文件:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">zip -r home.zip /home</span><br></pre></td></tr></table></figure><p>解释: - -r:递归压缩子目录中的文件。 - home.zip:要压缩的文件名。 - /home:要压缩的目录。<br />## 蓝牙问题<br /><a href="https://zhuanlan.zhihu.com/p/563070545">ubuntu22.04蓝牙问题解决</a></p><h2 id="一些快捷键">一些快捷键</h2><ul><li><code>Ctrl+Shift+F</code>:中文简体繁体转换</li></ul>]]></content>
<tags>
<tag> Linux </tag>
<tag> Python </tag>
<tag> Network </tag>
<tag> Deep Learning </tag>
</tags>
</entry>
<entry>
<title>PyTorch Model Save</title>
<link href="/2023/10/28/Pytorch_model_save/"/>
<url>/2023/10/28/Pytorch_model_save/</url>
<content type="html"><![CDATA[<h1 id="state_dict">state_dict</h1><h2 id="state_dict简介">state_dict简介</h2><p>state_dict是Python的字典对象,可用于保存模型参数、超参数以及优化器(torch.optim)的状态信息。需要注意的是,只有具有可学习参数的层(如卷积层、线性层等)才有state_dict</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"><span class="keyword">import</span> torch.nn <span class="keyword">as</span> nn</span><br><span class="line"><span class="keyword">import</span> torch.nn.functional <span class="keyword">as</span> F</span><br><span class="line"><span class="keyword">import</span> torch.optim <span class="keyword">as</span> optim</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 定义模型</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyModel</span>(nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="built_in">super</span>(MyModel, self).__init__()</span><br><span class="line"> self.conv1 = nn.Conv2d(<span class="number">2</span>, <span class="number">3</span>, <span class="number">3</span>)</span><br><span class="line"> self.pool = nn.MaxPool2d(<span class="number">2</span>, <span class="number">2</span>)</span><br><span class="line"> self.conv2 = nn.Conv2d(<span class="number">3</span>, <span class="number">4</span>, <span class="number">3</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> x = self.pool(F.relu(self.conv1(x)))</span><br><span class="line"> x = self.pool(F.relu(self.conv2(x)))</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化模型</span></span><br><span class="line">model = MyModel()</span><br><span class="line"><span class="comment"># 初始化优化器</span></span><br><span class="line">optimizer = optim.SGD(model.parameters(), lr=<span class="number">0.001</span>, momentum=<span class="number">0.9</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 打印模型的状态字典</span></span><br><span class="line"><span class="built_in">print</span>(<span class="string">"Model's state_dict:"</span>)</span><br><span class="line"><span class="keyword">for</span> param_tensor <span 
class="keyword">in</span> model.state_dict():</span><br><span class="line"> <span class="built_in">print</span>(param_tensor, <span class="string">"\t"</span>, model.state_dict()[param_tensor].size())</span><br></pre></td></tr></table></figure><pre><code>Model's state_dict:conv1.weight torch.Size([3, 2, 3, 3])conv1.bias torch.Size([3])conv2.weight torch.Size([4, 3, 3, 3])conv2.bias torch.Size([4])</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 打印优化器的状态字典</span></span><br><span class="line"><span class="built_in">print</span>(<span class="string">"Optimizer's state_dict:"</span>)</span><br><span class="line"><span class="keyword">for</span> var_name <span class="keyword">in</span> optimizer.state_dict():</span><br><span class="line"> <span class="built_in">print</span>(var_name, <span class="string">"\t"</span>, optimizer.state_dict()[var_name])</span><br></pre></td></tr></table></figure><pre><code>Optimizer's state_dict:state {}param_groups [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1, 2, 3]}]</code></pre><h2 id="state_dict保存与加载">state_dict保存与加载</h2><p>可以通过torch.save()来保存模型的state_dict,即只保存学习到的模型参数,并通过load_state_dict()来加载并恢复模型参数。PyTorch中最常见的模型保存扩展名为’.pt’或’.pth’。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">PATH = <span class="string">'./test_state_dict.pth'</span></span><br><span class="line">torch.save(model.state_dict(), PATH)</span><br><span class="line"> </span><br><span class="line">model = MyModel() <span class="comment"># 首先通过代码获取模型结构</span></span><br><span class="line">model.load_state_dict(torch.load(PATH)) <span class="comment"># 然后加载模型的state_dict</span></span><br></pre></td></tr></table></figure><pre><code><All keys matched successfully></code></pre><blockquote><p>注意:load_state_dict()函数只接受字典对象,不可直接传入模型路径,所以需要先使用torch.load()反序列化已保存的state_dict。<br />注意:nn.DataParallel会自动在模型的参数名称前添加"module.",以表示这些参数属于模块。这是因为在多GPU训练时,模型会被复制到不同的GPU上,每个GPU上都有一份模型的副本,因此需要明确标识哪些参数属于哪个模块。这就有可能会导致模型无法加载入参数</p></blockquote><h2 id="保存和加载完整模型">保存和加载完整模型</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 保存完整模型</span></span><br><span class="line">PATH = <span class="string">'./test_total_model.pt'</span></span><br><span class="line">torch.save(model, PATH)</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 加载完整模型</span></span><br><span class="line">model = torch.load(PATH)</span><br></pre></td></tr></table></figure><p>这种方式虽然代码看起来较state_dict方式要简洁,但是灵活性会差一些。因为torch.save()函数使用Python的pickle模块进行序列化,但pickle无法保存模型本身,而是保存包含类的文件路径,该文件会在模型加载时使用。所以当在其他项目对模型进行重构之后,就可能会出现意想不到的错误。</p><blockquote><p>注意:用load_state_dict()函数的时候,模型state_dict与pt文件的state_dict的key必须要完全一样才会加载进去,所以有时候我们需要对pt文件state_dict的值进行相应的调整</p></blockquote><h1 
id="ordereddict">OrderedDict</h1><p>如果我们打印一下state_dict的数据类型,我们会得到如下的输出:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">type</span>(model.state_dict())</span><br></pre></td></tr></table></figure><pre><code>collections.OrderedDict</code></pre><p>collections 模块实现了特定目标的容器,以提供Python标准内建容器 dict , list , set , 和 tuple 的替代选择。<br />collections 模块中的 OrderedDict 是一个有序字典,它与普通的 dict 不同之处在于它会记住键值对的插入顺序。 一些与dict 的不同:<br />1. 常规的 dict 被设计为非常擅长映射操作。跟踪插入顺序是次要的;<br />2. OrderedDict 旨在擅长重新排序操作。空间效率、迭代速度和更新操作的性能是次要的;<br />3. 算法上, OrderedDict 可以比 dict 更好地处理频繁的重新排序操作。这使其适用于跟踪最近的访问(例如在 LRU cache 中;</p><h2 id="保存部分模型参数并在新的模型中加载">保存部分模型参数,并在新的模型中加载</h2><p>如果我们只想保存conv1的训练完成的参数,我们可以这样操作:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">save_state = {}</span><br><span class="line"><span class="built_in">print</span>(<span class="string">"Model's state_dict:"</span>)</span><br><span class="line"><span class="keyword">for</span> param_tensor <span class="keyword">in</span> model.state_dict():</span><br><span class="line"> <span class="comment"># 找到layer_name中有conv1的地方保存</span></span><br><span class="line"> <span class="keyword">if</span> <span class="string">'conv1'</span> <span class="keyword">in</span> param_tensor:</span><br><span class="line"> save_state.update({param_tensor:torch.ones((model.state_dict()[param_tensor].size()))})</span><br><span class="line"> <span class="built_in">print</span>(param_tensor, <span class="string">"\t"</span>, model.state_dict()[param_tensor].size())</span><br></pre></td></tr></table></figure><pre><code>Model's state_dict:conv1.weight torch.Size([3, 2, 3, 3])conv1.bias torch.Size([3])</code></pre><p>这里为了方便后续的演示,我们关键的一句话是这样的写的:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">dict对象的update()方法用于将一个字典的键值对添加到另一个字典中或更新一个字典中已存在的键的值。</span></span><br><span class="line"><span class="string">这个方法接受一个字典作为参数,并将这个字典的键值对合并到调用该方法的字典中。如果有重复的键,新值将覆盖旧值。</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line">save_state.update({param_tensor:torch.ones((model.state_dict()[param_tensor].size()))}) <span class="comment"># conv1就是将参数全置为1</span></span><br></pre></td></tr></table></figure><p>但是在实际保存的时候,我们应该这样写:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">save_state.update({param_tensor:model.state_dict()[param_tensor]})</span><br></pre></td></tr></table></figure><p>然后加载新的模型,并将保存的参数赋给新的模型:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">model = MyModel() <span class="comment"># 首先通过代码获取模型结构</span></span><br><span class="line">model.load_state_dict(save_state, strict=<span class="literal">False</span>) <span 
class="comment"># 不严格加载可以忽略缺失的值</span></span><br></pre></td></tr></table></figure><pre><code>_IncompatibleKeys(missing_keys=['conv2.weight', 'conv2.bias'], unexpected_keys=[])</code></pre><p>这里为热启动模式,通过在load_state_dict()函数中将strict参数设置为False来忽略非匹配键的参数。<br />我们再查看一下新的模型的参数:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> n, p <span class="keyword">in</span> model.named_parameters(): <span class="comment"># 我们上面将kernelsize设置成3,inchannel和outchannel设置成了2,3</span></span><br><span class="line"> <span class="keyword">if</span> <span class="string">'conv1'</span> <span class="keyword">in</span> n:</span><br><span class="line"> <span class="built_in">print</span>(p)</span><br></pre></td></tr></table></figure><pre><code>Parameter containing:tensor([[[[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]], [[[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]], [[[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]]], requires_grad=True)Parameter containing:tensor([1., 1., 1.], requires_grad=True)</code></pre><p>再看一下模型中的其他参数:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">model.state_dict()[<span class="string">'conv2.bias'</span>] <span class="comment"># 可以看到没有变化</span></span><br></pre></td></tr></table></figure><pre><code>tensor([-0.1684, -0.1891, -0.1000, -0.0915])</code></pre><h1 id="state_dictnamed_parametersmodel.parameternamed_modules-的区别">state_dict()、named_parameters()、model.parameter()、named_modules() 的区别</h1><h2 id="model.state_dict">model.state_dict()</h2><p>state_dict()是将 layer_name 与 layer_param 以键的形式存储为 dict 。输出的值不包括 require_grad 。在固定某层时不能采用 model.state_dict() 来获取参数设置 require_grad 属性。</p><h2 id="model.named_parameters">model.named_parameters()</h2><p>named_parameters()是将 layer_name 与 layer_param 以打包成一个元组然后再存到 list 当中。 只保存可学习、可被更新的参数。model.named_parameters() 所存储的模型参数 tensor 的 require_grad 属性都是默认为True。常用于固定某层的参数是否被训练,通常是通过 model.named_parameters() 来获取参数设置 require_grad 属性。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"><span class="keyword">import</span> torch.nn <span class="keyword">as</span> nn</span><br><span 
class="line"><span class="keyword">import</span> torch.optim <span class="keyword">as</span> optim</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 定义模型</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">TheModelClass</span>(nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="built_in">super</span>(TheModelClass, self).__init__()</span><br><span class="line"> self.conv1 = nn.Conv2d(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>)</span><br><span class="line"> self.bn = nn.BatchNorm2d(num_features=<span class="number">2</span>)</span><br><span class="line"> self.act = nn.ReLU()</span><br><span class="line"> self.pool = nn.MaxPool2d(<span class="number">2</span>, <span class="number">2</span>)</span><br><span class="line"> self.fc1 = nn.Linear(<span class="number">8</span>, <span class="number">4</span>)</span><br><span class="line"> self.softmax = nn.Softmax(dim=-<span class="number">1</span>)</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> x = self.conv1(x)</span><br><span class="line"> x = self.bn(x)</span><br><span class="line"> x = self.act(x)</span><br><span class="line"> x = self.pool(x)</span><br><span class="line"> x = x.view(-<span class="number">1</span>, <span class="number">8</span>)</span><br><span class="line"> x = self.fc1(x)</span><br><span class="line"> x = self.softmax(x)</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化模型</span></span><br><span class="line">model = TheModelClass()</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化优化器</span></span><br><span class="line">optimizer = optim.SGD(model.parameters(), lr=<span class="number">0.001</span>, momentum=<span class="number">0.9</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> layer_name, layer_param <span class="keyword">in</span> model.named_parameters():</span><br><span class="line"> <span class="built_in">print</span>(layer_name, <span class="string">"\n"</span>, layer_param)</span><br></pre></td></tr></table></figure><pre><code>conv1.weight Parameter containing:tensor([[[[-0.1057, 0.0993, 0.0687], [-0.2232, -0.0912, 0.0515], [-0.0758, -0.3306, 0.2587]]], [[[ 0.0357, 0.1247, 0.1937], [-0.1526, 0.2632, -0.1273], [-0.2205, 0.2305, -0.0460]]]], requires_grad=True)conv1.bias Parameter containing:tensor([ 0.1786, -0.0278], requires_grad=True)bn.weight Parameter containing:tensor([1., 1.], requires_grad=True)bn.bias Parameter containing:tensor([0., 0.], requires_grad=True)fc1.weight Parameter containing:tensor([[-0.2401, -0.2905, 0.1927, 0.0429, 0.3074, 0.2779, -0.1469, 0.3467], [-0.1879, 0.0466, 0.1249, -0.2596, -0.1794, 0.3399, -0.1363, 0.0404], [ 0.3511, -0.0246, -0.1646, 0.2226, -0.1836, -0.1917, -0.3264, 0.2842], [-0.3279, 0.1204, -0.0360, 0.1502, -0.3035, -0.3031, 0.1071, 0.0294]], requires_grad=True)fc1.bias Parameter containing:tensor([ 
0.2755, 0.0264, -0.2187, -0.0648], requires_grad=True)</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">print</span>(<span class="string">f"Unfrozen_Parameters:<span class="subst">{<span class="built_in">sum</span>([p.numel() <span class="keyword">for</span> p <span class="keyword">in</span> model.parameters() <span class="keyword">if</span> p.requires_grad==<span class="literal">True</span>])}</span>"</span>)</span><br></pre></td></tr></table></figure><pre><code>Unfrozen_Parameters:60</code></pre><p>我们通过named_parameters来冻结卷积层的参数</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> n, p <span class="keyword">in</span> model.named_parameters():</span><br><span class="line"> <span class="keyword">if</span> <span class="string">'conv'</span> <span class="keyword">in</span> n:</span><br><span class="line"> <span class="built_in">print</span>(n)</span><br><span class="line"> p.requires_grad=<span class="literal">False</span></span><br><span class="line"><span class="built_in">print</span>(<span class="string">f"Unfrozen_Parameters:<span class="subst">{<span class="built_in">sum</span>([p.numel() <span class="keyword">for</span> p <span class="keyword">in</span> model.parameters() <span class="keyword">if</span> p.requires_grad==<span class="literal">True</span>])}</span>"</span>)</span><br></pre></td></tr></table></figure><pre><code>conv1.weightconv1.biasUnfrozen_Parameters:40</code></pre><p>我们可以看到没有冻结的参数量减少了</p><blockquote><p>注: 此刻被冻结的参数在进行反向传播时依旧进行求导,只是参数没有更新。我们也可以采用在优化器中不传入冻结参数且同时冻结参数的办法使资源减少消耗。</p></blockquote><h2 id="model.parameter">model.parameter()</h2><p>parameter()返回的只是参数,不包括 layer_name 。返回结果包含 require_grad,如果没有修改则默认为 Ture。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"><span class="keyword">import</span> torch.nn <span class="keyword">as</span> nn</span><br><span class="line"><span class="keyword">import</span> torch.optim <span class="keyword">as</span> optim</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 定义模型</span></span><br><span class="line"><span class="keyword">class</span> 
<span class="title class_">TheModelClass</span>(nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="built_in">super</span>(TheModelClass, self).__init__()</span><br><span class="line"> self.conv1 = nn.Conv2d(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>)</span><br><span class="line"> self.bn = nn.BatchNorm2d(num_features=<span class="number">2</span>)</span><br><span class="line"> self.act = nn.ReLU()</span><br><span class="line"> self.pool = nn.MaxPool2d(<span class="number">2</span>, <span class="number">2</span>)</span><br><span class="line"> self.fc1 = nn.Linear(<span class="number">8</span>, <span class="number">4</span>)</span><br><span class="line"> self.softmax = nn.Softmax(dim=-<span class="number">1</span>)</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> x = self.conv1(x)</span><br><span class="line"> x = self.bn(x)</span><br><span class="line"> x = self.act(x)</span><br><span class="line"> x = self.pool(x)</span><br><span class="line"> x = x.view(-<span class="number">1</span>, <span class="number">8</span>)</span><br><span class="line"> x = self.fc1(x)</span><br><span class="line"> x = self.softmax(x)</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化模型</span></span><br><span class="line">model = TheModelClass()</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化优化器</span></span><br><span class="line">optimizer = optim.SGD(model.parameters(), lr=<span class="number">0.001</span>, momentum=<span class="number">0.9</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> layer_param <span class="keyword">in</span> model.parameters():</span><br><span class="line"> <span class="built_in">print</span>(layer_param)</span><br></pre></td></tr></table></figure><pre><code>Parameter containing:tensor([[[[ 0.2674, 0.0372, -0.0216], [ 0.2953, 0.0751, 0.0727], [-0.2018, -0.1860, -0.3286]]], [[[-0.3124, 0.3087, -0.0310], [ 0.1500, -0.2332, 0.0454], [-0.1980, -0.2644, -0.1480]]]], requires_grad=True)Parameter containing:tensor([0.2166, 0.1575], requires_grad=True)Parameter containing:tensor([1., 1.], requires_grad=True)Parameter containing:tensor([0., 0.], requires_grad=True)Parameter containing:tensor([[-0.3116, 0.1185, -0.1725, 0.1541, -0.1431, -0.0970, -0.0413, -0.0950], [ 0.0391, 0.2460, 0.1450, 0.1834, 0.3273, 0.2182, 0.3023, 0.0531], [ 0.0548, 0.3145, -0.3225, -0.1229, -0.0785, 0.1050, 0.2095, 0.1562], [-0.1588, -0.3408, -0.0009, 0.2991, -0.2878, 0.2652, 0.0463, 0.2734]], requires_grad=True)Parameter containing:tensor([-0.1301, 0.1449, -0.1081, -0.2999], requires_grad=True)</code></pre><h2 id="model.named_modules">model.named_modules()</h2><p>返回每一层模型的名字和结构</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"><span class="keyword">import</span> torch.nn <span class="keyword">as</span> nn</span><br><span class="line"><span class="keyword">import</span> torch.optim <span class="keyword">as</span> optim</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 定义模型</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">TheModelClass</span>(nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="built_in">super</span>(TheModelClass, self).__init__()</span><br><span class="line"> self.conv1 = nn.Conv2d(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>)</span><br><span class="line"> self.bn = nn.BatchNorm2d(num_features=<span class="number">2</span>)</span><br><span class="line"> self.act = nn.ReLU()</span><br><span class="line"> self.pool = nn.MaxPool2d(<span class="number">2</span>, <span class="number">2</span>)</span><br><span class="line"> self.fc1 = nn.Linear(<span class="number">8</span>, <span class="number">4</span>)</span><br><span class="line"> self.softmax = nn.Softmax(dim=-<span class="number">1</span>)</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> x = self.conv1(x)</span><br><span class="line"> x = self.bn(x)</span><br><span class="line"> x = self.act(x)</span><br><span class="line"> x = self.pool(x)</span><br><span class="line"> x = x.view(-<span class="number">1</span>, <span class="number">8</span>)</span><br><span class="line"> x = self.fc1(x)</span><br><span class="line"> x = self.softmax(x)</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化模型</span></span><br><span class="line">model = TheModelClass()</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 初始化优化器</span></span><br><span class="line">optimizer = optim.SGD(model.parameters(), lr=<span class="number">0.001</span>, momentum=<span class="number">0.9</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> name, module <span class="keyword">in</span> 
model.named_modules():</span><br><span class="line"> <span class="built_in">print</span>(name,<span class="string">'\n'</span>, module)</span><br></pre></td></tr></table></figure><pre><code> TheModelClass( (conv1): Conv2d(1, 2, kernel_size=(3, 3), stride=(1, 1)) (bn): BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (act): ReLU() (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (fc1): Linear(in_features=8, out_features=4, bias=True) (softmax): Softmax(dim=-1))conv1 Conv2d(1, 2, kernel_size=(3, 3), stride=(1, 1))bn BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)act ReLU()pool MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)fc1 Linear(in_features=8, out_features=4, bias=True)softmax Softmax(dim=-1)</code></pre>]]></content>
<tags>
<tag> Python </tag>
<tag> Pytorch </tag>
</tags>
</entry>
<entry>
<title>matplotlib</title>
<link href="/2023/10/08/matplotlib/"/>
<url>/2023/10/08/matplotlib/</url>
<content type="html"><![CDATA[<h1 id="解决python里matplotlib不显示中文的问题">解决Python里matplotlib不显示中文的问题</h1><p>不显示中文只有一个原因就是他没有这个字体,虽然电脑里有这个字体但是不代表matplotlib里也有这个字体,所以解决matplotlib中的中文显示问题主要就是要找到它所内置支持的字体,那么我们首先查看一下它的内置字体,运行以下代码查看所支持的字体。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 查询当前系统所有字体</span></span><br><span class="line"><span class="keyword">from</span> matplotlib.font_manager <span class="keyword">import</span> FontManager</span><br><span class="line"><span class="keyword">import</span> subprocess</span><br><span class="line"></span><br><span class="line">mpl_fonts = <span class="built_in">set</span>(f.name <span class="keyword">for</span> f <span class="keyword">in</span> FontManager().ttflist)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">'all font list get from matplotlib.font_manager:'</span>)</span><br><span class="line"><span class="keyword">for</span> f <span class="keyword">in</span> <span class="built_in">sorted</span>(mpl_fonts):</span><br><span class="line"> <span class="built_in">print</span>(<span class="string">'\t'</span> + f)</span><br></pre></td></tr></table></figure><p>但是你会发现这个都是英文字体啊,中文字体在哪里,其实我当时也非常困扰,但是细心的我发现了其中的奥秘,>>>其实他是有中文的只不过是用拼音写的....<<<</p><p>其中你会发现有如下字体:<br />- DengXian<br />- FangSong<br />- KaiTi<br />- LiSu<br />- YouYuan<br />- Adobe Fan Heiti Std<br />- Adobe Fangsong Std<br />- Adobe Heiti Std<br />- Adobe Kaiti Std</p><p>其实这些都是中文</p><p>最后通过<code>matplotlib.rc</code>来更换字体,具体代码如下</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">matplotlib.rc(<span class="string">"font"</span>,family=<span class="string">'MicroSoft YaHei'</span>,weight=<span class="string">"bold"</span>)</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Python </tag>
<tag> Grammar </tag>
</tags>
</entry>
<entry>
<title>用PyTorch和BERT进行文本分类</title>
<link href="/2023/09/23/%E7%94%A8PyTorch%E5%92%8CBERT%E8%BF%9B%E8%A1%8C%E6%96%87%E6%9C%AC%E5%88%86%E7%B1%BB/"/>
<url>/2023/09/23/%E7%94%A8PyTorch%E5%92%8CBERT%E8%BF%9B%E8%A1%8C%E6%96%87%E6%9C%AC%E5%88%86%E7%B1%BB/</url>
<content type="html"><![CDATA[<h1 id="代码">代码</h1><h2 id="预处理数据">预处理数据</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> os</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="keyword">import</span> torch.utils.data</span><br><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"></span><br><span class="line">os.environ[<span class="string">"http_proxy"</span>] = <span class="string">"http://127.0.0.1:7890"</span></span><br><span class="line">os.environ[<span class="string">"https_proxy"</span>] = <span class="string">"http://127.0.0.1:7890"</span></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> transformers <span class="keyword">import</span> BertTokenizer</span><br><span class="line"></span><br><span class="line">tokenizer = BertTokenizer.from_pretrained(<span class="string">'bert-base-cased'</span>)</span><br><span class="line">example_text = <span class="string">'I will watch Memento tonight'</span></span><br><span class="line">bert_input = tokenizer(example_text, padding=<span class="string">'max_length'</span>,</span><br><span class="line"> max_length=<span class="number">10</span>,</span><br><span class="line"> truncation=<span class="literal">True</span>,</span><br><span class="line"> return_tensors=<span class="string">'pt'</span>)</span><br><span class="line">bert_input</span><br></pre></td></tr></table></figure><pre><code>{'input_ids': tensor([[ 101, 146, 1209, 2824, 2508, 26173, 3568, 102, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])}</code></pre><p>面是对上面BertTokenizer参数的解释:</p><ul><li>padding:将每个sequence填充到指定的最大长度。<br /></li><li>max_length: 每个sequence的最大长度。本示例中我们使用 10,但对于本文实际数据集,我们将使用 512,这是 BERT 允许的sequence 的最大长度。<br /></li><li>truncation:如果为True,则每个序列中超过最大长度的标记将被截断。<br /></li><li>return_tensors:将返回的张量类型。由于我们使用的是 Pytorch,所以我们使用pt;如果你使用 Tensorflow,那么你需要使用tf。</li></ul><p>从上面的变量中看到的输出bert_input,是用于稍后的 BERT 模型。但是这些输出是什么意思?</p><p>1.第一行是 input_ids,它是每个 token 的 id 表示。实际上可以将这些输入 id 解码为实际的 token,如下所示:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">example_text = tokenizer.decode(bert_input.input_ids[<span class="number">0</span>])</span><br><span class="line"><span class="built_in">print</span>(example_text)</span><br></pre></td></tr></table></figure><pre><code>[CLS] I will watch Memento tonight [SEP] [PAD] [PAD]</code></pre><p>由上述结果所示,BertTokenizer负责输入文本的所有必要转换,为 BERT 模型的输入做好准备。它会自动添加 [CLS]、[SEP] 和 [PAD] token。由于我们指定最大长度为 10,所以最后只有两个 [PAD] 
token。</p><p>2.第二行是 token_type_ids,它是一个 binary mask,用于标识 token 属于哪个 sequence。如果我们只有一个 sequence,那么所有的 token 类型 id 都将为 0。对于文本分类任务,token_type_ids是 BERT 模型的可选输入参数。<br />例如: tokens:[CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP] token_type_ids:0 0 0 0 0 0 0 0 1 1 1 1 1 1</p><p>3.第三行是 attention_mask,它是一个 binary mask,用于标识 token 是真实 word 还是只是由填充得到。如果 token 包含 [CLS]、[SEP] 或任何真实单词,则 mask 将为 1。如果 token 只是 [PAD] 填充,则 mask 将为 0。</p><p>注意到,我们使用了一个预训练BertTokenizer的bert-base-cased模型。如果数据集中的文本是英文的,这个预训练的分词器就可以很好地工作。</p><p>如果有来自不同语言的数据集,可能需要使用bert-base-multilingual-cased。具体来说,如果你的数据集是德语、荷兰语、中文、日语或芬兰语,则可能需要使用专门针对这些语言进行预训练的分词器。可以在此处查看相应的预训练标记器的名称。特别地,如果数据集中的文本是中文的,需要使用bert-base-chinese 模型,以及其相应的BertTokenizer等。</p><h2 id="数据集类">数据集类</h2><p>现在我们知道从BertTokenizer中获得什么样的输出,接下来为新闻数据集构建一个Dataset类,该类将作为一个类来将新闻数据转换成模型需要的数据格式。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">from</span> transformers <span class="keyword">import</span> BertTokenizer</span><br><span class="line"><span class="keyword">from</span> torch.utils.data <span class="keyword">import</span> Dataset</span><br><span class="line"></span><br><span class="line">tokenizer = BertTokenizer.from_pretrained(<span class="string">'bert-base-cased'</span>)</span><br><span class="line"></span><br><span class="line">labels = {</span><br><span class="line"> <span class="string">'business'</span>: <span class="number">0</span>,</span><br><span class="line"> <span class="string">'entertainment'</span>: <span class="number">1</span>,</span><br><span class="line"> <span class="string">'sport'</span>: <span class="number">2</span>,</span><br><span class="line"> <span class="string">'tech'</span>: <span class="number">3</span>,</span><br><span class="line"> <span class="string">'politics'</span>: <span class="number">4</span>,</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Dataset</span>(<span class="title class_ 
inherited__">Dataset</span>):</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, df</span>):</span><br><span class="line"> self.labels = [labels[label] <span class="keyword">for</span> label <span class="keyword">in</span> df[<span class="string">'Category'</span>]]</span><br><span class="line"> self.texts = [tokenizer(text,</span><br><span class="line"> padding=<span class="string">'max_length'</span>,</span><br><span class="line"> max_length=<span class="number">512</span>,</span><br><span class="line"> truncation=<span class="literal">True</span>,</span><br><span class="line"> return_tensors=<span class="string">'pt'</span>)</span><br><span class="line"> <span class="keyword">for</span> text <span class="keyword">in</span> df[<span class="string">'Text'</span>]]</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">classes</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> self.labels</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__len__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">len</span>(self.labels)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">get_batch_labels</span>(<span class="params">self, idx</span>):</span><br><span class="line"> <span class="keyword">return</span> np.array(self.labels[idx])</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">get_batch_texts</span>(<span class="params">self, idx</span>):</span><br><span class="line"> <span class="keyword">return</span> self.texts[idx]</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__getitem__</span>(<span class="params">self, idx</span>):</span><br><span class="line"> batch_texts = self.get_batch_texts(idx)</span><br><span class="line"> batch_y = self.get_batch_labels(idx)</span><br><span class="line"> <span class="keyword">return</span> batch_texts, batch_y</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 下载数据的地址:https://www.kaggle.com/competitions/learn-ai-bbc/data</span></span><br><span class="line">df = pd.read_csv(<span class="string">r"./BBC_newsData/BBC News Train.csv"</span>)</span><br><span class="line">np.random.seed(<span class="number">112</span>)</span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">简单介绍一下这里的操作,df.sample后面的frac就是取出的信息的比例,np.split就是按照后面列表的两个数字,将整个集合分成三部分——注意,在第一次见这些函数的时候可以先通过源代码中的注释对函数进行一个了解</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line">df_train, df_val, df_test = np.split(df.sample(frac=<span class="number">1</span>, random_state=<span class="number">42</span>), 
[<span class="built_in">int</span>(<span class="number">.8</span> * <span class="built_in">len</span>(df)), <span class="built_in">int</span>(<span class="number">.9</span> * <span class="built_in">len</span>(df))])</span><br><span class="line"><span class="comment"># df_test = pd.read_csv(r"./BBC_newsData/BBC News Test.csv")</span></span><br><span class="line"><span class="built_in">len</span>(df_train), <span class="built_in">len</span>(df_val), <span class="built_in">len</span>(df_test)</span><br></pre></td></tr></table></figure><pre><code>(1192, 149, 149)</code></pre><h2 id="构建模型">构建模型</h2><p>至此,我们已经成功构建了一个 Dataset 类来生成模型输入数据。现在使用具有 12 层 Transformer 编码器的预训练 BERT 基础模型构建实际模型。</p><p>如果数据集中的文本是中文的,需要使用bert-base-chinese 模型。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> torch <span class="keyword">import</span> nn</span><br><span class="line"><span class="keyword">from</span> transformers <span class="keyword">import</span> BertModel</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">BertClassifier</span>(nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, dropout=<span class="number">0.5</span></span>):</span><br><span class="line"> <span class="built_in">super</span>(BertClassifier, self).__init__()</span><br><span class="line"> self.bert = BertModel.from_pretrained(<span class="string">'bert-base-cased'</span>)</span><br><span class="line"> self.dropout = nn.Dropout(dropout)</span><br><span class="line"> self.linear = nn.Linear(<span class="number">768</span>, <span class="number">5</span>)</span><br><span class="line"> self.relu = nn.ReLU()</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, input_id, mask</span>):</span><br><span class="line"> _, pooled_output = self.bert(input_ids=input_id, attention_mask=mask, return_dict=<span class="literal">False</span>)</span><br><span class="line"> dropout_output = self.dropout(pooled_output)</span><br><span class="line"> linear_output = self.linear(dropout_output)</span><br><span class="line"> final_layer = self.relu(linear_output)</span><br><span class="line"> <span class="keyword">return</span> final_layer</span><br></pre></td></tr></table></figure><p>从上面的代码可以看出,BERT 模型输出了两个变量:</p><p>在上面的代码中命名的第一个变量_包含sequence中所有 token 的 Embedding 向量层。<br />命名的第二个变量pooled_output包含 [CLS] token 的 Embedding 向量。对于文本分类任务,使用这个 Embedding 作为分类器的输入就足够了。<br />然后将pooled_output变量传递到具有 ReLU 激活函数的线性层。在线性层中输出一个维度大小为 5 的向量,每个向量对应于标签类别(运动、商业、政治、 娱乐和科技)。</p><h2 id="训练模型">训练模型</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> torch.optim <span class="keyword">import</span> Adam</span><br><span class="line"><span class="keyword">from</span> tqdm <span class="keyword">import</span> tqdm</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">train</span>(<span class="params">model, train_data, val_data, learning_rate, epochs</span>):</span><br><span class="line"> train_data, val_data = Dataset(train_data), Dataset(val_data)</span><br><span class="line"> train_dataloader = torch.utils.data.DataLoader(train_data, batch_size=<span class="number">8</span>, shuffle=<span class="literal">True</span>)</span><br><span class="line"> val_dataloader = torch.utils.data.DataLoader(val_data, batch_size=<span class="number">8</span>)</span><br><span class="line"> use_cuda = torch.cuda.is_available()</span><br><span class="line"> <span class="comment">#判断是否使用GPU</span></span><br><span class="line"> device = torch.device(<span class="string">"cuda"</span> <span class="keyword">if</span> use_cuda <span class="keyword">else</span> <span class="string">"cpu"</span>)</span><br><span class="line"> criterion = nn.CrossEntropyLoss()</span><br><span class="line"> optimizer = Adam(model.parameters(), lr=learning_rate)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> use_cuda:</span><br><span class="line"> model.cuda()</span><br><span class="line"> criterion.cuda()</span><br><span class="line"></span><br><span 
class="line"> <span class="keyword">for</span> epoch_num <span class="keyword">in</span> <span class="built_in">range</span>(epochs):</span><br><span class="line"> total_acc_train = <span class="number">0</span></span><br><span class="line"> total_loss_train = <span class="number">0</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> train_input, train_label <span class="keyword">in</span> tqdm(train_dataloader):</span><br><span class="line"> train_label = train_label.to(device)</span><br><span class="line"> mask = train_input[<span class="string">'attention_mask'</span>].to(device)</span><br><span class="line"> input_id = train_input[<span class="string">'input_ids'</span>].squeeze(<span class="number">1</span>).to(device)</span><br><span class="line"></span><br><span class="line"> output = model(input_id, mask)</span><br><span class="line"></span><br><span class="line"> batch_loss = criterion(output, train_label)</span><br><span class="line"> total_loss_train += batch_loss.item()</span><br><span class="line"></span><br><span class="line"> acc = (output.argmax(dim=<span class="number">1</span>) == train_label).<span class="built_in">sum</span>().item()</span><br><span class="line"> total_acc_train += acc</span><br><span class="line"></span><br><span class="line"> <span class="comment"># model.zero_gard()</span></span><br><span class="line"> batch_loss.backward()</span><br><span class="line"> optimizer.step()</span><br><span class="line"></span><br><span class="line"> total_acc_val = <span class="number">0</span></span><br><span class="line"> total_loss_val = <span class="number">0</span></span><br><span class="line"> <span class="comment"># 不需要计算梯度</span></span><br><span class="line"> <span class="keyword">with</span> torch.no_grad():</span><br><span class="line"> <span class="keyword">for</span> val_input, val_label <span class="keyword">in</span> val_dataloader:</span><br><span class="line"> val_label = val_label.to(device)</span><br><span class="line"> mask = val_input[<span class="string">'attention_mask'</span>].to(device)</span><br><span class="line"> input_id = val_input[<span class="string">'input_ids'</span>].squeeze(<span class="number">1</span>).to(device)</span><br><span class="line"></span><br><span class="line"> output = model(input_id, mask)</span><br><span class="line"></span><br><span class="line"> batch_loss = criterion(output, val_label)</span><br><span class="line"> total_loss_val += batch_loss.item()</span><br><span class="line"></span><br><span class="line"> acc = (output.argmax(dim=<span class="number">1</span>) == val_label).<span class="built_in">sum</span>().item()</span><br><span class="line"> total_acc_val += acc</span><br><span class="line"> <span class="built_in">print</span>(</span><br><span class="line"> <span class="string">f'''Epochs: <span class="subst">{epoch_num + <span class="number">1</span>}</span> </span></span><br><span class="line"><span class="string"> | Train Loss: <span class="subst">{total_loss_train / <span class="built_in">len</span>(train_data): <span class="number">.3</span>f}</span> </span></span><br><span class="line"><span class="string"> | Train Accuracy: <span class="subst">{total_acc_train / <span class="built_in">len</span>(train_data): <span class="number">.3</span>f}</span> </span></span><br><span class="line"><span class="string"> | Val Loss: <span class="subst">{total_loss_val / <span class="built_in">len</span>(val_data): <span class="number">.3</span>f}</span> 
</span></span><br><span class="line"><span class="string"> | Val Accuracy: <span class="subst">{total_acc_val / <span class="built_in">len</span>(val_data): <span class="number">.3</span>f}</span>'''</span>)</span><br><span class="line"></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">model = BertClassifier()</span><br><span class="line"></span><br><span class="line">train(model, df_train, df_val, learning_rate=<span class="number">1e-6</span>, epochs=<span class="number">10</span>)</span><br></pre></td></tr></table></figure><pre><code>Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).100%|██████████| 168/168 [00:17<00:00, 9.70it/s]Epochs: 1 | Train Loss: 0.198 | Train Accuracy: 0.277 | Val Loss: 0.177 | Val Accuracy: 0.470100%|██████████| 168/168 [00:17<00:00, 9.55it/s]Epochs: 2 | Train Loss: 0.146 | Train Accuracy: 0.618 | Val Loss: 0.109 | Val Accuracy: 0.772100%|██████████| 168/168 [00:17<00:00, 9.67it/s]Epochs: 3 | Train Loss: 0.062 | Train Accuracy: 0.925 | Val Loss: 0.053 | Val Accuracy: 0.913100%|██████████| 168/168 [00:17<00:00, 9.71it/s]Epochs: 4 | Train Loss: 0.023 | Train Accuracy: 0.976 | Val Loss: 0.018 | Val Accuracy: 0.987100%|██████████| 168/168 [00:17<00:00, 9.71it/s]Epochs: 5 | Train Loss: 0.012 | Train Accuracy: 0.989 | Val Loss: 0.015 | Val Accuracy: 0.973100%|██████████| 168/168 [00:17<00:00, 9.71it/s]Epochs: 6 | Train Loss: 0.009 | Train Accuracy: 0.990 | Val Loss: 0.025 | Val Accuracy: 0.953100%|██████████| 168/168 [00:17<00:00, 9.76it/s]Epochs: 7 | Train Loss: 0.006 | Train Accuracy: 0.990 | Val Loss: 0.010 | Val Accuracy: 0.973100%|██████████| 168/168 [00:17<00:00, 9.78it/s]Epochs: 8 | Train Loss: 0.006 | Train Accuracy: 0.991 | Val Loss: 0.015 | Val Accuracy: 0.973100%|██████████| 168/168 [00:17<00:00, 9.72it/s]Epochs: 9 | Train Loss: 0.006 | Train Accuracy: 0.993 | Val Loss: 0.011 | Val Accuracy: 0.987100%|██████████| 168/168 [00:17<00:00, 9.71it/s]Epochs: 10 | Train Loss: 0.008 | Train Accuracy: 0.988 | Val Loss: 0.014 | Val Accuracy: 0.980</code></pre><h2 id="在测试数据集上评估模型">在测试数据集上评估模型</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">evaluate</span>(<span class="params">model, test_data</span>):</span><br><span class="line"> test_data = Dataset(df_test)</span><br><span class="line"> test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=<span class="number">1</span>)</span><br><span class="line"> use_cuda = torch.cuda.is_available()</span><br><span class="line"> device = torch.device(<span class="string">'cuda'</span> <span class="keyword">if</span> use_cuda <span class="keyword">else</span> <span class="string">'cpu'</span>)</span><br><span class="line"> <span class="keyword">if</span> use_cuda:</span><br><span class="line"> model.cuda()</span><br><span class="line"></span><br><span class="line"> total_acc_test = <span class="number">0</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">with</span> torch.no_grad():</span><br><span class="line"> <span class="keyword">for</span> test_input, test_label <span class="keyword">in</span> test_dataloader:</span><br><span class="line"> test_label = test_label.to(device)</span><br><span class="line"> mask = test_input[<span class="string">'attention_mask'</span>].to(device)</span><br><span class="line"> input_id = test_input[<span class="string">'input_ids'</span>].squeeze(<span class="number">1</span>).to(device)</span><br><span class="line"> output = model(input_id, mask)</span><br><span class="line"> acc = (output.argmax(dim=<span class="number">1</span>) == test_label).<span class="built_in">sum</span>().item()</span><br><span class="line"> total_acc_test += acc</span><br><span class="line"></span><br><span class="line"> <span class="built_in">print</span>(<span class="string">f"Test Accuacy:<span class="subst">{total_acc_test / <span class="built_in">len</span>(test_data):<span class="number">.3</span>f}</span>"</span>)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">evaluate(model, df_test)</span><br></pre></td></tr></table></figure><pre><code>Test Accuacy:0.980</code></pre><h2 id="保存模型">保存模型</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">Savemodel</span>(<span class="params">model,modelName</span>):</span><br><span class="line"> save_folder = <span class="string">'models'</span></span><br><span class="line"> os.makedirs(save_folder, exist_ok=<span class="literal">True</span>)</span><br><span class="line"> save_path = os.path.join(save_folder, modelName)</span><br><span class="line"> torch.save(model.state_dict(), save_path)</span><br><span class="line"></span><br><span class="line">Savemodel(model, <span class="string">'Bert.ckpt'</span>)</span><br></pre></td></tr></table></figure><p>用上面这中方法保存模型后,要先写出模型结构的类,再通过<code>model.load_state_dict()</code>来读取模型的参数</p><h2 id="附">附</h2><h3 id="magic-command">magic command</h3><p>在 Jupyter Notebook 或 
Jupyter Lab 中,%%capture 是一个所谓的 "cell magic" 命令,用于捕获和丢弃或存储特定单元格的 stdout、stderr 输出。这在您想要防止某些命令的输出被显示在输出单元格中时非常有用。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">%%capture</span><br><span class="line"><span class="built_in">print</span>(<span class="string">"This won't be displayed in the output cell."</span>)</span><br></pre></td></tr></table></figure><p>执行上面的单元格将不会显示任何输出,尽管 print 语句被执行了。</p><p>参考自: <a href="https://zhuanlan.zhihu.com/p/524487313">用PyTorch和BERT进行文本分类</a></p><h3 id="jupyter转markdown">jupyter转markdown</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">jupyter nbconvert --to markdown {notebook_path} --output {output_path}</span><br></pre></td></tr></table></figure>]]></content>
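<p>上文保存模型部分提到,要先定义好模型结构的类,再用 <code>model.load_state_dict()</code> 读取参数。下面补一个读取参数的最小示意,假设 BertClassifier 类已按前文定义、模型保存在 models/Bert.ckpt(与上文 Savemodel 的保存路径一致),仅作参考:</p><pre><code class="python">import torch

# 示意代码:先按前文定义好 BertClassifier,再加载保存的参数
# 路径 models/Bert.ckpt 与上文 Savemodel(model, 'Bert.ckpt') 保持一致
model = BertClassifier()
state_dict = torch.load('models/Bert.ckpt', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()  # 推理前记得切换到评估模式
</code></pre>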
<tags>
<tag> Python </tag>
<tag> Deep Learning </tag>
</tags>
</entry>
<entry>
<title>Debug Macbook</title>
<link href="/2023/08/25/debug-macbook/"/>
<url>/2023/08/25/debug-macbook/</url>
<content type="html"><![CDATA[<h1 id="硬盘无法读取问题">硬盘无法读取问题</h1><p>如果你没有正确弹出外置移动硬盘,可以尝试:</p><ol type="1"><li><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">➜ diskutil list</span><br><span class="line">/dev/disk0 (internal):...</span><br><span class="line">/dev/disk1 (synthesized): ...</span><br><span class="line">/dev/disk2 (external, physical): </span><br><span class="line"></span><br><span class="line">diskutil unmountDisk /dev/disk2</span><br><span class="line">diskutil eject /dev/disk2</span><br></pre></td></tr></table></figure></li><li>或者直接打开Disk Utility进行一个repair</li></ol>]]></content>
<tags>
<tag> Macbook </tag>
</tags>
</entry>
<entry>
<title>About git</title>
<link href="/2023/08/23/About-git/"/>
<url>/2023/08/23/About-git/</url>
<content type="html"><![CDATA[<h2 id="git-clone">git clone</h2><ul><li><code>-b</code>可以制定克隆的分支</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git clone -b master https://github.com/jerryc127/hexo-theme-butterfly.git themes/butterfly</span><br></pre></td></tr></table></figure><h2 id="本地和远程如果发生冲突了怎么解决">本地和远程如果发生冲突了怎么解决</h2><p>1、把远程仓库master分支下载到本地并存为tmp分支</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git fetch origin master:tmp</span><br></pre></td></tr></table></figure><p>2、查看tmp分支与本地原有分支的不同</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git diff tmp</span><br></pre></td></tr></table></figure><p>这里主要是看看有没有其他的改动…</p><p>3、将tmp分支和本地的master分支合并</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git merge tmp</span><br></pre></td></tr></table></figure><p>这个时候呢,本地与远程就没有冲突了,而且还保留了我今天的代码,现在Push就OK啦!</p><p>4、最后别忘记删除tmp分支 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git branch -d tmp</span><br></pre></td></tr></table></figure></p><h2 id="撤销一个git-commit">撤销一个Git commit</h2><ol type="1"><li><p>git reset --hard HEAD~1: 这条命令会删除最后一次的commit。所有的更改(包括提交和工作目录)都会被彻底删除,就好像这次commit从未发生过一样。</p></li><li><p>git reset --soft HEAD~1: 这条命令也会撤销最后一次的commit,但是不会删除你所做的更改。这些更改会被重新放回你的工作目录中,以便你能够重新提交它们。</p></li><li><p>git revert HEAD: 这条命令会创建一个新的commit,该commit的内容与你想要撤销的commit正好相反。这意味着你的commit历史不会被更改,只会添加一个新的commit。</p></li></ol><p>在使用这些命令之前,请确保你的工作目录是清晰的,以避免任何未提交的更改被覆盖或者删除。</p>]]></content>
<tags>
<tag> Git </tag>
</tags>
</entry>
<entry>
<title>配置深度学习工作站</title>
<link href="/2023/08/23/%E9%85%8D%E7%BD%AE%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E5%B7%A5%E4%BD%9C%E7%AB%99/"/>
<url>/2023/08/23/%E9%85%8D%E7%BD%AE%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E5%B7%A5%E4%BD%9C%E7%AB%99/</url>
<content type="html"><![CDATA[<h1 id="安装深度学习工作站">安装深度学习工作站</h1><p><a href="https://www.zhihu.com/question/33996159/answer/2435396388">如何配置一台适用于深度学习的工作站?</a></p><h2 id="中文输入法配置">中文输入法配置</h2><p><a href="https://muzing.top/posts/3fc249cf/">在 Ubuntu 安装配置 Fcitx 5 中文输入法_</a></p><h2 id="cudacudnn配置">cuda,cudnn配置</h2><p><a href="https://www.bilibili.com/video/BV1YX4y1b7La/?spm_id_from=333.880.my_history.page.click&vd_source=7bc6e2633aba83c9f343b3df8a31905d">ubuntu系统安装CUDA和CUDNNcudnn安装</a></p><h2 id="tensorrt-安装">TensorRT 安装</h2><blockquote><p>参考自Nvidia官方安装文档</p></blockquote><ol type="1"><li>Install the TensorRT Python Wheel</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">python -m pip install --upgrade tensorrt</span><br><span class="line">python -m pip install --upgrade tensorrt_lean</span><br><span class="line">python -m pip install --upgrade tensorrt_dispatch</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>To verify that your installation is working,use the following Python commands</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> tensorrt</span><br><span class="line"><span class="built_in">print</span>(tensorrt.__version__)</span><br><span class="line"><span class="keyword">assert</span> tensorrt.Builder(tensorrt.Logger())</span><br></pre></td></tr></table></figure><h2 id="ubuntu安装了没有网络">ubuntu安装了没有网络</h2><p><a href="https://www.bilibili.com/video/BV11X4y1h7qN/?spm_id_from=333.880.my_history.page.click&vd_source=7bc6e2633aba83c9f343b3df8a31905d">Ubuntu安装后,没有网络连接怎么办?</a></p><h2 id="nvidia显卡问题">nvidia显卡问题</h2><p><a href="Ubuntu安装英伟达NVIDIA显卡驱动黑屏99%的解决办法">Ubuntu安装英伟达NVIDIA显卡驱动黑屏99%的解决办法</a></p><h2 id="tensorflow问题">tensorflow问题</h2><p>tensorflow(((有事可以先看官方,先看官方,先看官方!!!tf2.10之后不再支持windows了!!</p>]]></content>
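<p>装好 CUDA 和 cuDNN 之后,可以用 PyTorch 做一个快速自检(前提是已经装好 GPU 版的 PyTorch,下面只是一个示意):</p><pre><code class="python">import torch

# 简单自检:确认 PyTorch 能找到刚装好的 CUDA
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # 打印显卡型号
</code></pre>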
<tags>
<tag> Ubuntu </tag>
<tag> Deep Learning </tag>
</tags>
</entry>
<entry>
<title>Python装饰器</title>
<link href="/2023/08/19/Python%E8%A3%85%E9%A5%B0%E5%99%A8/"/>
<url>/2023/08/19/Python%E8%A3%85%E9%A5%B0%E5%99%A8/</url>
<content type="html"><![CDATA[<h1 id="property"><span class="citation" data-cites="property">@property</span></h1><h2 id="什么是property">什么是property</h2><p>简单地说就是一个类里面的方法一旦被@property装饰,就可以像调用属性一样地去调用这个方法,它能够简化调用者获取数据的流程,而且不用担心将属性暴露出来,有人对其进行赋值操作(避免使用者的不合理操作)。需要注意的两点是:</p><ol type="1"><li>调用被装饰方法的时候是不用加括号的</li><li>方法定义的时候有且只能有self一个参数</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Goods</span>():</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self,unit_price,weight</span>):</span><br><span class="line"> self.unit_price = unit_price</span><br><span class="line"> self.weight = weight</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta"> @property</span></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">price</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> self.unit_price * self.weight</span><br><span class="line"><span class="meta">>>> </span>lemons = Goods(<span class="number">7</span>,<span class="number">4</span>)</span><br><span class="line"><span class="meta">>>> </span>lemons.price</span><br><span class="line"><span class="number">28</span></span><br></pre></td></tr></table></figure><p>上面通过调用属性的方式直接调用到 price 方法,property把复杂的处理过程封装到了方法里面去,取值的时候调用相应的方法名即可。</p><h2 id="property属性定义的两种方式">property属性定义的两种方式</h2><h3 id="装饰器方式">装饰器方式</h3><p>在类的方法上应用@property装饰器,即上面那种方式。</p><h3 id="类属性方式">类属性方式</h3><p>创建一个实例对象赋值给类属性</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Lemons</span>():</span><br><span class="line"> <span 
class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self,unit_price=<span class="number">7</span></span>):</span><br><span class="line"> self.unit_price = unit_price</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">get_unit_price</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> self.unit_price</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">set_unit_price</span>(<span class="params">self,new_unit_price</span>):</span><br><span class="line"> self.unit_price = new_unit_price</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">del_unit_price</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">del</span> self.unit_price</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"> x = <span class="built_in">property</span>(get_unit_price, set_unit_price, del_unit_price)</span><br><span class="line"></span><br><span class="line"><span class="meta">>>> </span>fruit = Lemons()</span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>fruit.x <span class="comment">#调用 fruit.x 触发 get_unit_price</span></span><br><span class="line"><span class="number">7</span></span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>fruit.x = <span class="number">9</span> <span class="comment">#调用 fruit.x = 9 触发 set_unit_price</span></span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>fruit.x</span><br><span class="line"><span class="number">9</span></span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>fruit.unit_price <span class="comment">#调用 fruit.unit_price 触发 get_unit_price</span></span><br><span class="line"><span class="number">9</span></span><br><span class="line"><span class="meta">>>> </span><span class="keyword">del</span> fruit.x <span class="comment">#调用 del fruit.x 触发 del_unit_price </span></span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>fruit.unit_price</span><br><span class="line">Traceback (most recent call last):</span><br><span class="line"> File <span class="string">"<pyshell#23>"</span>, line <span class="number">1</span>, <span class="keyword">in</span> <module></span><br><span class="line"> l.unit_price</span><br><span class="line">AttributeError: <span class="string">'Lemons'</span> <span class="built_in">object</span> has no attribute <span class="string">'unit_price'</span></span><br></pre></td></tr></table></figure><p>property方法可以接收四个参数</p><ul><li>第一个参数是获得属性的方法名,调用 对象.属性时自动触发</li><li>第二个参数是设置属性的方法名, 给属性赋值时自动触发</li><li>第三个参数是删除属性的方法名,删除属性时自动触发</li><li>第四个参数是字符串,是属性的描述文档,调用对象.属性.doc时触发</li></ul><h2 id="用property代替getter和setter方法">用property代替getter和setter方法</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Watermelon</span>():</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self,price</span>):</span><br><span class="line"> self._price = price <span class="comment">#私有属性,外部无法修改和访问</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">get_price</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> self._price</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">set_price</span>(<span class="params">self,new_price</span>):</span><br><span class="line"> <span class="keyword">if</span> new_price > <span class="number">0</span>:</span><br><span class="line"> self._price = new_price</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">raise</span> <span class="string">'error:价格必须大于零'</span></span><br></pre></td></tr></table></figure><p>用property代替getter和setter</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Watermelon</span>():</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self,price</span>):</span><br><span class="line"> self._price = price</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta"> @property </span><span class="comment">#使用@property装饰price方法</span></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">price</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> self._price</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta"> @price.setter </span><span class="comment">#使用@property装饰方法,当对price赋值时,调用装饰方法</span></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">price</span>(<span class="params">self,new_price</span>):</span><br><span class="line"> <span class="keyword">if</span> new_price > <span class="number">0</span>:</span><br><span class="line"> self._price = 
new_price</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">raise</span> <span class="string">'error:价格必须大于零'</span></span><br><span class="line"></span><br><span class="line"><span class="meta">>>> </span>watermelon = Watermelon(<span class="number">4</span>)</span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>watermelon.price</span><br><span class="line"><span class="number">4</span></span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>watermelon.price = <span class="number">7</span></span><br><span class="line"><span class="meta">>>> </span></span><br><span class="line"><span class="meta">>>> </span>watermelon.price</span><br><span class="line"><span class="number">7</span></span><br></pre></td></tr></table></figure>]]></content>
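<p>顺带一提,除了 @price.setter,property 还提供 @price.deleter 来定义删除属性时的行为。下面是一个沿用上文思路的最小示意(类名、属性名仅作示例):</p><pre><code class="python">class Watermelon():
    def __init__(self, price):
        self._price = price

    @property
    def price(self):
        return self._price

    @price.deleter
    def price(self):
        # 执行 del watermelon.price 时触发,删除内部的 _price
        del self._price


watermelon = Watermelon(4)
print(watermelon.price)   # 4
del watermelon.price      # 触发 deleter
</code></pre>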
<tags>
<tag> Python </tag>
<tag> Grammar </tag>
</tags>
</entry>
<entry>
<title>Linux command</title>
<link href="/2023/08/19/Linux-command/"/>
<url>/2023/08/19/Linux-command/</url>
<content type="html"><![CDATA[<h1 id="wc">wc</h1><p>这是一个用于文件内字符、字节、行数统计的命令</p><h2 id="相关参数">相关参数</h2><ul><li><code>-w</code>:只计算字(word)数</li><li><code>-l</code>:只计算行数</li><li><code>-c</code>:只计算字节数</li><li><code>-m</code>:只计算字符数</li></ul><h1 id="tar">tar</h1><p>这是一个用于压缩和解压缩的命令</p><h2 id="相关参数-1">相关参数</h2><ul><li><code>-z</code>:用gzip算法压缩文件,后缀通常为tar.gz</li><li><code>-x</code>:解压缩文件</li><li><code>-c</code>:压缩文件</li><li><code>-v</code>:解压或者压缩时列出文件名字</li><li><code>-f</code>:文件名</li></ul><h2 id="操作">操作</h2><ul><li><code>tar -cf testdir.tar testdir</code>普通压缩testdir</li><li><code>tar -xf testdir.tar</code>普通解压缩</li><li><code>tar -zcf testdir.tar.gz testdir</code>用gzip算法压缩</li><li><code>tar -zxvf testdir.tar.gz</code>解压gzip算法压缩的文件并列出文件名字</li></ul><h1 id="tail">tail</h1><p>查看文件尾部的几行内容,默认打印最后10行,常常用于查看日志文件</p><h2 id="相关参数-2">相关参数</h2><ul><li><code>-n</code>:指定几行(5表示打印倒数五行,+5表示从第5行开始打印)</li><li><code>-f</code>:监视一个文件,实时更新现实内容</li><li><code>-F</code>:持续监视,不会因为文件变化而之后无法监视</li></ul><h1 id="head">head</h1><p>与tail对应,但相对用的较少,没有<code>-f</code>和<code>-F</code>参数</p><h2 id="相关参数-3">相关参数</h2><ul><li><code>-n</code>:指定几行(5表示打印前5行的内容,-5表示打印到倒数第5行)</li></ul><h1 id="vimvi">vim/vi</h1><h2 id="命令行模式一般模式">命令行模式/一般模式</h2><ol type="1"><li><code>ctrl f</code>:forward,向前翻一页</li><li><code>ctrl b</code>:backward,向后翻一页</li><li><code>ctrl d</code>:down,往下走</li><li><code>ctrl u</code>:up,往上走</li><li><code>y y</code>:复制</li><li><code>P</code>:往这一行前面粘贴</li><li><code>p</code>:往这一行后面粘贴</li><li><code>space</code>:光标往右移动</li><li><code>$</code>:跳到行末</li><li><code>0</code>:跳到行首</li><li><code>g g</code>:文首跳到文章尾部</li><li><code>G</code>:跳到文章尾部</li></ol><blockquote><p>在这些命令前加一个数字,你可以设置往上翻多少行,往下翻多少行,往右多少个字符</p></blockquote><h2 id="编辑模式">编辑模式</h2><p>在一般模式下按i进入</p><h2 id="底线命令行模式">底线命令行模式</h2><p>按<code>esc</code>回到一般模式,再在一般模式下按<code>:</code></p><ol type="1"><li><code>set number</code>:给文件加一个行号</li><li><code>set nonumber</code>:取消行号</li><li><code>/</code>可以往后进行搜索,<code>?</code>可以往回搜索,按<code>n</code>就可以往下切,按<code>N</code>可以往上切</li></ol><h1 id="chmod">chmod</h1><h2 id="文件用户">文件用户</h2><ul><li><code>u</code>:文件所有者</li><li><code>g</code>:文件所有者同组的用户</li><li><code>o</code>:其他用户</li><li><code>a</code>:所有用户</li></ul><h2 id="文件权限">文件权限</h2><ul><li><code>r</code>:读权限——4</li><li><code>w</code>:写权限——2</li><li><code>x</code>:执行权限——1</li></ul><blockquote><p>同时添加两种权限可以通过对数字进行一个相加实现</p></blockquote><blockquote><p><code>---X------</code>(用<code>ll</code>命令可以看到):第一个-代表是一个普通文件,左三个代表所有者权限,中间三个代表所有者同组用户的权限,右边三个代表其他用户权限</p></blockquote><h2 id="命令参数">命令参数</h2><ul><li><code>-c</code>:打印权限改变信息</li><li><code>-v</code>:打印详细信息</li><li><code>-R</code>:递归改变文件夹及其下所有文件的权限</li></ul><h2 id="操作-1">操作</h2><ul><li><code>chmod u+rw testfile.txt</code>:给当前用户添加读写权限</li><li><code>chmod 760 testfile.txt</code>:所有者——执行,读写,所有者同组用户——读写,其他用户——无</li><li><code>chmod u-w testfile.txt</code>:给当前用户取消写权限</li><li><code>chmod 100 testfile.txt</code>:所有者——执行,所有者同组用户——无,其他用户——无</li><li><code>chmod a-x testdir</code>:给所有用户取消执行权限</li><li><code>chmod 777 -R testdir</code>:递归给目录和其下面所有文件都给予所有权限</li><li><code>chmod -c a-x testdir</code>:取消执行权限的同时,打印出有变化的权限</li><li><code>chmod -v a-x testdir</code>:取消权限的同时,打印出当前命令的一个效果</li></ul><h1 id="grep">grep</h1><p>这是一个用于文件查找的命令</p><h2 id="相关参数-4">相关参数</h2><ul><li><code>-i</code>:忽略大小写</li><li><code>-r</code>:递归查找目录下面的所有文件</li><li><code>-e</code>:指定查找内容,可以是多个</li><li><code>-E</code>:用正则表达式去查找</li><li><code>-v</code>:反向查找,只输出不匹配的行</li><li><code>-l</code>:只输出包含匹配内容的文件名</li><li><code>-n</code>:显示匹配内容的行号</li><li><code>-w</code>:只输出完全匹配的内容</li></ul><h2 
id="操作-2">操作</h2><ul><li><code>grep hello testfile.txt</code>:区分大小写查找文件中的内容</li><li><code>grep -i hello testfile.txt</code>:不区分大小写查找</li><li><code>grep -w hello testfile.txt</code>:精确匹配,只打印出文件中的hello</li><li><code>grep -e hello -e today testfile.txt</code>:同时匹配包含hello或是包含today的</li><li><code>grep -n hello testfile.txt</code>:打印对应的字符串行数</li><li><code>grep -v hello testfile.txt</code>:打印不包含hello的字符串</li><li><code>grep -r hello testdir</code>:递归查找文件夹中包含hello的文件,打印文件以及内容</li><li><code>grep -l hello testdir</code>:递归查找文件夹中包含hello的文件,只打印文件名</li><li><code>grep -E ‘hello|today testfile.txt’</code>:用正则表达式来查找</li></ul><h1 id="sed">sed</h1><p>替换</p><h2 id="相关参数-5">相关参数</h2><ul><li>常用选项<ul><li><code>-e</code>:(expression)后面跟脚本的表达式,单独使用时(只可以看到表达式的作用效果)一般不省略,平常一般会省略</li><li><code>-i</code>:(in-place)直接修改原始文件内容</li><li><code>-f</code>:file,指定执行的脚本文件</li><li><code>-n</code>:silent,只打印经过编辑的行</li></ul></li><li>常用脚本<ul><li><code>i</code>:(insert)指定行前面插入内容</li><li><code>a</code>:(append)指定行后面插入内容</li><li><code>d</code>:(delete)删除指定内容</li><li><code>c</code>:(copy)覆盖指定的整行内容</li><li><code>s</code>:(substitute)局部替换</li><li><code>p</code>:(print)打印指定行内容</li></ul></li></ul><h2 id="操作-3">操作</h2><ul><li><p><code>sed -e '1i\a new line' test.txt</code>:在test.txt的第一行前面去插入一行,内容是a new line,但不直接生效于文件本身</p></li><li><p><code>sed -ie '1i\a new line' test.txt</code>:在test.txt的第一行前面去插入一行,内容是a new line,直接作用于文件,且生成一个文件,注意这里的e被作为扩展后缀了,所以生成的文件名字是test.txte</p></li><li><p><code>sed -i '1i\a new line test.txt'</code>:在test.txt的第一行前面去插入一行,内容是a new line,直接作用于文件,不生成备份文件</p></li><li><p><code>sed -e '4a\line' test.txt</code>:在test.txt的第四行后面去插入一行,内容是line,但不直接生效于文件本身</p></li><li><p><code>sed -e '1d' test.txt</code>:删除test.txt的第一行,但不直接生效于文件本身</p></li><li><p><code>sed -e '1c\line' test.txt</code>:替换text.txt的第一行,替换后的内容为line,但不直接生效于文件本身</p></li><li><p><code>sed -e '1s/new/old' test.txt</code>:替换第一行中的第一个new替换为old,但不直接生效于文件本身</p></li><li><p><code>sed -e '1s/new/old/g' test.txt</code>:替换第一行中的全部的new替换为old,但不直接生效于文件本身(这里是正则表达式的用法)</p></li><li><p><code>sed -n '1p' test.txt</code>:只打印出了我们想要的第一行的内容</p></li><li><p><code>sed -n -e '1p' test.txt -n -e '2p' test.txt</code>:同时让两个命令一起执行</p></li><li><p>编辑脚本test.sh</p><ul><li><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">1a\hahaha</span><br><span class="line">2p</span><br></pre></td></tr></table></figure></li><li><code>sed -f test.sh test.txt</code>:在第一行的后面新增了一个hahaha,又重复打印了第二行</li></ul></li></ul><h1 id="管道符号">管道符号 |</h1><p>将前一条命令的标准输出作为下一条命令的标准输入来使用</p><h2 id="前置知识">前置知识</h2><ul><li>标准输入——0</li><li>标准输出——1</li><li>标准错误输出——2</li></ul><h2 id="操作-4">操作</h2><ul><li><code>cat test.txt |wc -l</code>:统计cat test.txt输出的行数</li></ul><h1 id="输出重定向">输出重定向 ></h1><h2 id="相关参数-6">相关参数</h2><ul><li><code>></code>:用标准输出去覆盖原有文件内容</li><li><code>>></code>:将标准输出追加到文件的后面</li><li><code>2></code>:用标准错误输出覆盖原有文件的内容</li><li><code>2>></code>:将标准错误输出追加到文件的后面</li></ul><h2 id="操作-5">操作</h2><ul><li><code>cat test.txt | grep helloworld > test.log</code>:将从test.txt中查找到的helloworld覆盖test.log中的内容</li><li><code>cat test.txt | grep helloworld >> test.log</code>:将从test.txt中查找到的helloworld追加到test.log的后面</li><li><code>cat test.txt test1.txt > test.log 2>& 1</code>(test1.txt不存在):将标准输出和标准错误输出都输出到test.log中间去</li><li><code>cat test.txt test1.txt > /dev/null 2>& 1</code>(test1.txt不存在):将标准输出和标准错误输出都输出到不存在的一个空间中间去(可以想象成黑洞),这个是我们在脚本里面的常用用法</li></ul><blockquote><p>注意我们的shell在执行命令之前会先去确定我们的输出会写到哪里去再去执行我们的命令</p></blockquote><h1 
id="输入重定向">输入重定向 <</h1><h2 id="相关参数-7">相关参数</h2><ul><li><code><[文件名]</code>:将文件内容作为标准输入</li><li><code><<[结束标识符]</code>:<ul><li>将两个结束标识符之间的内容作为标准输入</li><li>结束标识符由用户自定义,常用EOF(end of file)</li></ul></li></ul><h2 id="操作-6">操作</h2><ul><li><code>cat > test2.txt << EOF</code>:将标准输出重定向到test2.txt中,其中输出以EOF结束</li></ul><blockquote><p>注意命令指向文件<code>></code>的是输出重定向,文件指向命令<code><</code>是输入重定向</p></blockquote><h1 id="awk">awk</h1><p>格式化处理文本</p><p>基本的命令格式:<code>awk [ ] ' ' [ ]</code></p><blockquote><p>后面跟着的参数:操作,脚本(important),要处理的文件名</p></blockquote><p>其中一行称为一个record,一个字段称为一个field,有field separator也有record separator</p><h2 id="相关参数-8">相关参数</h2><h3 id="常用参数">常用参数</h3><ul><li><code>-v</code>:value,设置变脸</li><li><code>-F</code>:filed separator,输入字段分隔符</li></ul><h3 id="常用内置变量">常用内置变量</h3><ul><li><code>$[n]</code>:number,第n个字段</li><li><code>$0</code>:当前整条记录</li><li><code>FS</code>:filed separator,输入字段分隔符</li><li><code>RS</code>:record separator,输入记录分隔符</li><li><code>OFS</code>:output filed separator,输出字段分隔符</li><li><code>ORS</code>:output record separator,输出记录分隔符</li><li><code>NF</code>:number of fileds,当前记录包含的字段数</li><li><code>NR</code>:number of records,当前已经处理的记录数</li></ul><h2 id="操作-7">操作</h2><ul><li><code>awk '{print $NF}' testfile.txt</code>:打印最后一个字段(NF——number of fields)</li><li><code>awk '{OFS="#";$1=$1;print $0}' testfile.txt</code>:将输出分隔符修改成#(OFS——output field separator)</li><li><code>awk -v OFS="#" '{$1=$1;print $0}' testfile.txt</code>:将输出分隔符修改成#,其中-v支持我们从外面也就是我们的shell当中去读入我们的参数</li><li><code>awk '{OFS="#";print $1,$2,$3,$4}' testfile.txt</code>:将输出分隔符修改成#并输出,这里是一个字段一个字段去读取,所以不需要添加<code>$1=$1</code>也可以实现刷新</li><li><code>awk '{printf "%-3s %2d %2d %2d\n",$1,$2,$3,$4}' testfile.txt</code>:用printf让输出更加整齐</li><li><code>awk '{if(NR==3){print $0} else{print "不是第三行"}}' testfile.txt</code>:只输出第三行,在其他行输出不是第三行</li><li><code>awk 'NR==1,NR==3{print $0}' testfile.txt</code>:打印第一行到第三行,这是一个模式加上操作的用法</li><li><code>awk 'NR==1||NR==3{print $0}' testfile.txt</code>:打印第一行和第三行</li><li><code>awk '/xm/{print $0}' testfile.txt</code>:打印xm开头的record,这里的模式是用正则表达式是去匹配</li></ul><h1 id="环境变量source">环境变量,source</h1><p>我们一般执行一个可执行文件,直接使用<code>shell</code>,是会报错的</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">bash:shell:command not found</span><br></pre></td></tr></table></figure><p>这是因为我们shell文件所在的路径没有被添加到环境变量当中,所谓环境变量就是一些路径,当我们直接在terminal中执行一些命令时,系统就会去这些环境变量中找是否存在相应的可执行文件。如果我们需要直接使用<code>shell</code>,则需要将shell所在的路径添加到环境变量中去。</p><ol type="1"><li>修改用户目录下的.bashrc文件,在文件的最后添加如下内容</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">export PATH=$PATH:[shell文件所在的路径]</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>使用<code>source ~/.bashrc</code>立即更新一下,或者在关闭终端后再重新打开也能生效</li></ol>]]></content>
<tags>
<tag> Linux </tag>
</tags>
</entry>
<entry>
<title>Concerning Network</title>
<link href="/2023/08/13/concerning-network/"/>
<url>/2023/08/13/concerning-network/</url>
<content type="html"><![CDATA[<h1 id="windows">Windows</h1><h2 id="github">Github</h2><h3 id="通过sshkey的方式拉取代码报错kex_exchange_identification-connection-closed-by-remote">通过sshkey的方式拉取代码报错kex_exchange_identification: Connection closed by remote</h3><h4 id="前言">前言</h4><p>最近通过sshkey的方式拉取GitHub代码报错:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kex_exchange_identification: Connection closed by remote host</span><br></pre></td></tr></table></figure><p>通过查阅资料,这个报错其实跟梯子有关~但是不用梯子,速度感人!</p><h4 id="解决">解决</h4><ol type="1"><li>关掉梯子(不推荐)</li><li>将 Github 的连接端口从 22 改为 443 即可</li></ol><h4 id="操作">操作</h4><p>编辑 <strong>~/.ssh/config</strong> 文件(没有就新增),windows在用户目录下的.ssh目录,添加如下内容</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">Host github.com</span><br><span class="line"> HostName ssh.github.com</span><br><span class="line"> User git</span><br><span class="line"> Port 443</span><br><span class="line"></span><br><span class="line">Host github.com</span><br><span class="line"> User git</span><br><span class="line"> ProxyCommand connect -H 127.0.0.1:7890 %h %p</span><br></pre></td></tr></table></figure><h1 id="ubuntu">Ubuntu</h1><h2 id="github-1">Github</h2><p>对于Ubuntu用户,你可以将端口号替换为7890,并使用相应的命令来设置代理。以下是在Ubuntu系统下配置SSH代理的方法:</p><ol type="1"><li>打开或创建SSH的config文件:</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi ~/.ssh/config</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>将以下内容加到config文件中(在一定情况下可以去掉第一行):</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">ProxyCommand nc -v -x 127.0.0.1:7890 %h %p</span><br><span class="line"></span><br><span class="line">Host github.com</span><br><span class="line"> Hostname ssh.github.com</span><br><span class="line"> User git</span><br><span class="line"> Port 443 </span><br><span class="line"> Hostname github.com</span><br><span class="line"><span class="meta prompt_"> # </span><span class="language-bash">注意修改路径为你的路径</span></span><br><span class="line"> IdentityFile /home/your_user_name/.ssh/id_rsa</span><br><span class="line"> TCPKeepAlive yes</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">Host ssh.github.com</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> User git</span></span><br><span class="line"><span class="meta 
prompt_"># </span><span class="language-bash"> Port 443</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> Hostname ssh.github.com</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="comment"># 注意修改路径为你的路径</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> IdentityFile /home/your_user_name/.ssh/id_rsa</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> TCPKeepAlive <span class="built_in">yes</span></span></span><br></pre></td></tr></table></figure><p>因为有些梯子对于 22 端口做了限制,要么禁止了,要么有些抽风<br />所以如果 22 端口不畅就使用 443,安全稳定可靠。ps: 22 端口时 hostname 请填 github.com。</p><ol start="3" type="1"><li>确保config文件的权限设置正确:</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">chmod 600 ~/.ssh/config</span><br></pre></td></tr></table></figure><p>这样,Ubuntu用户就可以通过端口7890的代理连接到GitHub。当你通过SSH与GitHub进行通信时,你的连接将通过此代理。</p><blockquote><p>参考自:<a href="https://zhuanlan.zhihu.com/p/481574024">设置代理解决github被墙</a>、<a href="https://hellodk.cn/post/975">GitHub 加速终极教程</a></p></blockquote><p><strong>测试与github连接</strong></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh -T [email protected]</span><br></pre></td></tr></table></figure><blockquote><p>亲测有效:不如直接断开wifi重新连接一次看看能不能连通<code>ssh -T [email protected]</code></p></blockquote><h2 id="apt">apt</h2><ol type="1"><li>编辑 apt 的代理设置:使用以下命令打开 apt 的配置文件</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/apt/apt.conf</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>在配置文件中添加代理设置:在打开的文件中,你可以添加以下行来设置代理(将 "your_proxy_address" 和 "your_proxy_port" 替换为实际的代理地址和端口):</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Acquire::http::Proxy "http://127.0.0.1:7890";</span><br><span class="line">Acquire::https::Proxy "http://127.0.0.1:7890";</span><br></pre></td></tr></table></figure><blockquote><p>其实就是http://your_proxy_address:your_proxy_port/</p></blockquote><ol start="3" type="1"><li>保存和退出:按下 Ctrl + O 来保存修改,然后按下 Ctrl + X 来退出文本编辑器。</li><li>更新软件包列表,运行软件更新</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure><h1 id="python">Python</h1><h2 id="python脚本">Python脚本</h2><p>如下操作可以让程序走代理 <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> os</span><br><span class="line"></span><br><span class="line">os.environ[<span class="string">"http_proxy"</span>] = <span class="string">"http://127.0.0.1:7890"</span></span><br><span class="line">os.environ[<span class="string">"https_proxy"</span>] = <span 
class="string">"http://127.0.0.1:7890"</span></span><br></pre></td></tr></table></figure></p><h2 id="pip">pip</h2><h3 id="单次设置">单次设置</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch</span><br><span class="line">或者</span><br><span class="line">pip install torch -i https://pypi.tuna.tsinghua.edu.cn/simple</span><br></pre></td></tr></table></figure><h3 id="全局设置">全局设置</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple</span><br><span class="line">pip config set install.trusted-host mirrors.aliyun.com</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Network </tag>
</tags>
</entry>
<entry>
<title>让hexo支持数学公式</title>
<link href="/2023/08/13/%E8%AE%A9hexo%E6%94%AF%E6%8C%81%E6%95%B0%E5%AD%A6%E5%85%AC%E5%BC%8F/"/>
<url>/2023/08/13/%E8%AE%A9hexo%E6%94%AF%E6%8C%81%E6%95%B0%E5%AD%A6%E5%85%AC%E5%BC%8F/</url>
<content type="html"><![CDATA[<h1 id="数学公式">数学公式</h1><h2 id="卸载hexo-math和hexo-renderer-marked">卸载hexo-math和hexo-renderer-marked</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">npm un hexo-math</span><br><span class="line">npm un hexo-renderer-marked</span><br></pre></td></tr></table></figure><h2 id="安装hexo-renderer-pandoc渲染器">安装hexo-renderer-pandoc渲染器</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">npm i hexo-renderer-pandoc</span><br></pre></td></tr></table></figure><h1 id="pandoc报错">Pandoc报错</h1><h2 id="安装pandoc">安装Pandoc</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install pandoc</span><br></pre></td></tr></table></figure><p>在安装好后可以通过下面的命令查看是否安装成功 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pandoc -v</span><br></pre></td></tr></table></figure></p><h2 id="一个莫名其妙的错误">一个莫名其妙的错误</h2><p>先回到hexo目录,执行hexo -s,如果你没有出现这个报错:</p><blockquote><p>pandoc exited with code 9: pandoc: Unknown extension: smart</p></blockquote><p>那么恭喜你,你的这个问题并不存在,可以选择跳过。但是如果你和我 一样报这个错误,可能就开始头疼了。不过我终于还是找到了解决方法。 首先我是找到了这篇文章:<a href="https://www.cnblogs.com/diralpo/p/12542450.html">配置hexo时遇到的问题 - diralpo - 博客园 (cnblogs.com)</a></p><p>从这篇文章得知,导致该报错的原因是pandoc版本过低 ,而且还不是一般原因引起的版本过低,因为前面我们已经安装了最新版本的pandoc。但是最新版本的没起作用。于是我打开了everything查找电脑上存在的pandoc。然后发现位于Anaconda,真正问题也出在这儿。</p><p>是因为Anaconda安装的pandoc版本过低,而且hexo默认使用的是Anaconda的pandoc。</p><p>在某一篇文章得知,pandoc版本应该在2.0以上,但那个pandoc好像是1.9。那接下来的就简单了,直接把新下载的pandoc.exe替换Anaconda里的pandoc.exe。</p>]]></content>
<tags>
<tag> Blog </tag>
</tags>
</entry>
<entry>
<title>常用markdown</title>
<link href="/2023/08/13/%E5%B8%B8%E7%94%A8markdown/"/>
<url>/2023/08/13/%E5%B8%B8%E7%94%A8markdown/</url>
<content type="html"><![CDATA[<h1 id="一些矩阵的打印方法">一些矩阵的打印方法</h1><p><span class="math display">\[\begin{matrix} a & b \\ c & d \end{matrix}\]</span></p><p><span class="math display">\[\begin{array}{cc} a & b \\ c & d \end{array}\]</span></p><p><span class="math display">\[\begin{Bmatrix} & \end{Bmatrix}\begin{bmatrix} & \end{bmatrix}\begin{pmatrix} & \end{pmatrix}\begin{vmatrix} & \end{vmatrix}\]</span></p>]]></content>
<tags>
<tag> Markdown </tag>
</tags>
</entry>
<entry>
<title>层次分析法</title>
<link href="/2023/08/13/%E5%B1%82%E6%AC%A1%E5%88%86%E6%9E%90%E6%B3%95/"/>
<url>/2023/08/13/%E5%B1%82%E6%AC%A1%E5%88%86%E6%9E%90%E6%B3%95/</url>
<content type="html"><![CDATA[<h1 id="典型应用">典型应用</h1><ol type="1"><li>用于最佳方案的选取(选择运动员、选择地址)</li><li>用于评价类问题(评价水质状况、评价环境)</li><li>于指标体系的优选(兼顾科学和效率)</li></ol><h1 id="思维方式归纳">思维方式归纳</h1><h2 id="三个层次">三个层次</h2><h3 id="最高层目标层">最高层:目标层</h3><p>表示解决问题的目的,即层次分析要达到的总目标。通常只有一个总目标。</p><h3 id="中间层准则层指标层...">中间层:准则层、指标层、...</h3><p>表示采取某种措施、政策、方案等实现预定总目标所涉及的中间环节;又分为准则层、指标层、策略层、约束层等。</p><h3 id="最低层方案层">最低层:方案层</h3><p>表示将选用的解决问题的各种措施、政策、方案等。通常有几个方案可选。每层有若干元素,层间元素的关系用相连直线表示。</p><h2 id="构造判断比较矩阵">构造判断比较矩阵</h2><p>在确定各层次各因素之间的权重时,如果只是定性的结果,则常常不容易被别人接受,因而Santy等人提出:一致矩阵法,即:</p><ol type="1"><li>不把所有因素放在一起比较,而是两两相互比较</li><li>对此时采用相对尺度,以尽可能减少性质不同的诸因素相互比较的困难,以提高准确度。</li></ol><blockquote><p>判断矩阵是表示本层所有因素针对上一层某一个因素的相对重要性的比较。判断矩阵的元素<span class="math inline">\(a_{ij}\)</span>用Santy的1—9标度方法给出。</p></blockquote><p>设要比较各准则<span class="math inline">\(C_1,C_2,... , C_n\)</span>对目标<span class="math inline">\(O\)</span>的重要性</p><p><span class="math display">\[A = \begin{bmatrix}1 & \frac{1}{2} & 4 & 3 & 3 \\2 & 1 & 7 & 5 & 5 \\\frac{1}{4} & \frac{1}{7} & 1 & \frac{1}{2} & \frac{1}{3} \\\frac{1}{3} & \frac{1}{5} & 2 & 1 & 1 \\\frac{1}{3} & \frac{1}{5} & 3 & 1 & 1 \end{bmatrix},\quad a_{ij} > 0, \quad a_{ji} = \frac{1}{a_{ij}}\]</span></p><p>注意:公式排版可以用 \begin{align *}和\end{align*} 来排版,这样在Blog中就不用加入$$符号了了</p><p>注意:在该Blog中*的转义只用给后面的那个*进行转义即可</p><p>我们很快就能发现成对比较的不一致情况:</p><p><span class="math inline">\(a_{21}=C_2:C_1=2,a_{13}=C_1:C_3=4\)</span>如果二者要一致的话<span class="math inline">\(a_{23}=a_{21}*a_{13}=8\)</span></p><blockquote><p>我们在这里允许不一致,但要确定不一致的允许范围</p></blockquote><h3 id="一致性">一致性</h3><p>由于最大的特征值<span class="math inline">\(λ\)</span>连续的依赖于<span class="math inline">\(a_{ij}\)</span> ,则<span class="math inline">\(λ\)</span>比<span class="math inline">\(n\)</span>大的越多,<span class="math inline">\(A\)</span>的不一致性越严重。用最大特征值对应的特征向量作为被比较因素对上层某因素影响程度的权向量,其不一致程度越大,引起的判断误差越大。因而可以用<span class="math inline">\(λ-n\)</span>数值的大小来衡量<span class="math inline">\(A\)</span>的不一致程度。 定义一致性指标: <span class="math display">\[CI = \frac{\lambda_{\text{max}} - n}{n - 1}\]</span></p><p>其中,<span class="math inline">\(CI = 0\)</span> 时,有完全的一致性;</p><p><span class="math inline">\(CI\)</span> 接近于 0 时,有满意的一致性;</p><p><span class="math inline">\(CI\)</span> 越大,不一致越严重。</p><p>为衡量 <span class="math inline">\(CI\)</span> 的大小,引入随机一致性指标 <span class="math inline">\(RI\)</span>。方法为随机构造 500 个成对比较矩阵 <span class="math inline">\(A_1, A_2, \ldots, A_{500}\)</span>,则可得一致性指标 <span class="math inline">\(CI_1, CI_2, \ldots, CI_{500}\)</span>。 <span class="math display">\[RI = \frac{\sum_{i=1}^{500}CI_i}{500}\\]</span></p><p>Saaty的结果如下: <span class="math display">\[\begin{array}{c|c}n & RI \\\hline1 & 0.00 \\2 & 0.00 \\3 & 0.58 \\4 & 0.90 \\5 & 1.12 \\6 & 1.24 \\7 & 1.32 \\8 & 1.41 \\9 & 1.45 \\10 & 1.49 \\11 & 1.51 \\\end{array}\]</span></p>]]></content>
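<p>一致性检验可以直接用 numpy 求最大特征值来完成。下面用上文给出的判断矩阵 A 做一个示意:n=5 时 RI 取上表中的 1.12,通常认为 CR = CI/RI 小于 0.1 即可接受:</p><pre><code class="python">import numpy as np

# 上文给出的 5x5 判断矩阵 A
A = np.array([
    [1,   1/2, 4,   3,   3],
    [2,   1,   7,   5,   5],
    [1/4, 1/7, 1,   1/2, 1/3],
    [1/3, 1/5, 2,   1,   1],
    [1/3, 1/5, 3,   1,   1],
])

n = A.shape[0]
lam_max = np.max(np.linalg.eigvals(A).real)  # 最大特征值
CI = (lam_max - n) / (n - 1)                 # 一致性指标
RI = 1.12                                    # n=5 时的随机一致性指标(见上表)
CR = CI / RI                                 # 一致性比率,小于 0.1 视为通过检验
print(lam_max, CI, CR)
</code></pre>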
<tags>
<tag> Mathematics Modeling </tag>
</tags>
</entry>
<entry>
<title>OCR截图脚本</title>
<link href="/2023/08/12/OCR%E8%84%9A%E6%9C%AC/"/>
<url>/2023/08/12/OCR%E8%84%9A%E6%9C%AC/</url>
<content type="html"><![CDATA[<h1 id="创建python脚本">创建python脚本</h1><blockquote><p>创建一个python脚本文件,名为 ocr.py 并填充以下内容</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment">#coding=utf-8</span></span><br><span class="line"><span class="keyword">from</span> paddleocr <span class="keyword">import</span> PaddleOCR, draw_ocr</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"><span class="keyword">import</span> getopt</span><br><span class="line"><span class="keyword">from</span> PIL <span class="keyword">import</span> Image</span><br><span class="line"></span><br><span class="line"><span class="comment"># 执行ocr并写入txt文件</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">exe_ocr</span>(<span class="params">img_path,file_txt = <span class="string">"result.txt"</span>,img_result = <span class="string">"result.jpg"</span></span>):</span><br><span class="line"> <span class="comment"># Paddleocr目前支持的多语言语种可以通过修改lang参数进行切换</span></span><br><span class="line"> <span class="comment"># 例如`ch`, `en`, `fr`, `german`, `korean`, `japan`</span></span><br><span class="line"> ocr = 
PaddleOCR(use_angle_cls=<span class="literal">False</span>, lang=<span class="string">"ch"</span>) <span class="comment"># need to run only once to download and load model into memory</span></span><br><span class="line"> result = ocr.ocr(img_path, cls=<span class="literal">False</span>)</span><br><span class="line"> res = result[<span class="number">0</span>] <span class="comment"># 因为只有一张图片,所以结果只有1个,直接取出</span></span><br><span class="line"> boxes = [] <span class="comment"># 检测框坐标</span></span><br><span class="line"> txt = <span class="string">""</span> <span class="comment"># 检测识别结果</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> line <span class="keyword">in</span> res:</span><br><span class="line"> <span class="comment">#print(line[1][0])</span></span><br><span class="line"> txt += line[<span class="number">1</span>][<span class="number">0</span>]+<span class="string">"\n"</span> <span class="comment"># 取出文本</span></span><br><span class="line"> boxes.append(line[<span class="number">0</span>]) <span class="comment"># 取出检测框</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">with</span> <span class="built_in">open</span>(file_txt, <span class="string">'w'</span>)<span class="keyword">as</span> f: <span class="comment"># 以w方式打开,没有就创建,有就覆盖</span></span><br><span class="line"> f.write(txt)</span><br><span class="line"></span><br><span class="line"> image = Image.<span class="built_in">open</span>(img_path).convert(<span class="string">'RGB'</span>) <span class="comment"># 读取原图片</span></span><br><span class="line"> im_show = draw_ocr(image, boxes) <span class="comment"># 画检测框</span></span><br><span class="line"> im_show = Image.fromarray(im_show) <span class="comment"># 转换</span></span><br><span class="line"> im_show.save(img_result) <span class="comment"># 保存</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment"># 主函数</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">main</span>(<span class="params">argv</span>):</span><br><span class="line"></span><br><span class="line"> img_path = <span class="string">""</span> <span class="comment"># 图片路径</span></span><br><span class="line"> file_txt = <span class="string">"result.txt"</span> <span class="comment"># 输出的文本文件路径</span></span><br><span class="line"> img_result = <span class="string">"result.jpg"</span> <span class="comment"># 检测结果图片路径</span></span><br><span class="line"></span><br><span class="line"> <span class="comment"># 解析参数</span></span><br><span class="line"> <span class="comment"># "hi:o:": 短格式分析串, h 后面没有冒号, 表示后面不带参数; i 和 o 后面带有冒号, 表示后面带参数</span></span><br><span class="line"> <span class="comment"># ["help", "input_file=", "output_file="]: 长格式分析串列表, help后面没有等号, 表示后面不带参数; input_file和output_file后面带冒号, 表示后面带参数</span></span><br><span class="line"> <span class="comment"># 返回值包括 `opts` 和 `args`, opts 是以元组为元素的列表, 每个元组的形式为: (选项, 附加参数),如: ('-i', 'test.png');</span></span><br><span class="line"> <span class="comment"># args是个列表,其中的元素是那些不含'-'或'--'的参数</span></span><br><span class="line"> opts, args = getopt.getopt(argv[<span class="number">1</span>:], <span class="string">"hi:o:"</span>, [<span class="string">"help"</span>, <span class="string">"input_file="</span>, <span class="string">"output_file="</span>])</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> opt, arg <span class="keyword">in</span> 
opts:</span><br><span class="line"> <span class="keyword">if</span> opt <span class="keyword">in</span> (<span class="string">"-h"</span>, <span class="string">"--help"</span>):</span><br><span class="line"> <span class="built_in">print</span>(<span class="string">'python3 ocr.py -i <input_file.png> -o <output_file.txt>'</span>)</span><br><span class="line"> <span class="built_in">print</span>(<span class="string">'or: python3 ocr.py --input_file=<input_file.png> --output_file=<output_file.txt>'</span>)</span><br><span class="line"> sys.exit()</span><br><span class="line"> <span class="keyword">elif</span> opt <span class="keyword">in</span> (<span class="string">"-i"</span>, <span class="string">"--input_file"</span>):</span><br><span class="line"> img_path = arg</span><br><span class="line"> <span class="keyword">elif</span> opt <span class="keyword">in</span> (<span class="string">"-o"</span>, <span class="string">"--output_file"</span>):</span><br><span class="line"> file_txt = arg</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> img_path == <span class="string">""</span>:</span><br><span class="line"> <span class="comment">#print("必须指定一个图片文件")</span></span><br><span class="line"> sys.exit()</span><br><span class="line"></span><br><span class="line"> img_result = file_txt[:file_txt.rindex(<span class="string">'.'</span>)+<span class="number">1</span>]+<span class="string">"jpg"</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">#print('输入图片文件为:', img_path)</span></span><br><span class="line"> <span class="comment">#print('输出txt文件为: ', file_txt)</span></span><br><span class="line"> <span class="comment">#print('输出result文件为: ', img_result)</span></span><br><span class="line"></span><br><span class="line"> exe_ocr(img_path,file_txt,img_result)</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line"> main(sys.argv)</span><br></pre></td></tr></table></figure><h1 id="创建shell脚本">创建shell脚本</h1><blockquote><p>创建一个shell脚本文件,名为 ocr.sh 并填充以下内容</p></blockquote><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_">#</span><span class="language-bash">!/bin/env 
bash</span></span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">定义一个文件基本路径,假设用户名是 pi,临时文件放在主目录下的ocrtemp下</span></span><br><span class="line">SCR="/home/rexkev/ocrtemp/ocr_image" # 注意最后的ocr_image${SCR}</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">激活虚拟环境</span></span><br><span class="line">source /home/rexkev/miniconda3/etc/profile.d/conda.sh #注意此处自己找到conda.sh激活</span><br><span class="line">conda activate ocr</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">切换到脚本所在的目录</span></span><br><span class="line">cd /home/rexkev/ocrtemp</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">获取一个截图</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">这里用到了gnome-screenshot,需要先安装好:sudo apt install gnome-screenshot</span></span><br><span class="line">gnome-screenshot -a -f ${SCR}.png</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">放大图片</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">如果觉得效果不好,可以尝试把图片放大</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">需要先安装一个软件包:sudo apt install imagemagick</span></span><br><span class="line">mogrify -modulate 100,0 -resize 400% ${SCR}.png</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">OCR by paddleocr</span></span><br><span class="line">python3 ocr.py -i ${SCR}.png -o ${SCR}.txt</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">打开文件</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">调用系统默认程序分别打开:原始图片、检测结果、识别出来的文本</span></span><br><span class="line">xdg-open ${SCR}.png</span><br><span class="line">xdg-open ${SCR}.jpg</span><br><span class="line">xdg-open ${SCR}.txt</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">把文本复制到剪切板</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">需要先安装软件包:sudo apt install xclip</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">由于ocr结果一般都会有各种问题,其实并不能直接使用</span></span><br><span class="line">cat ${SCR}.txt | xclip -selection clipboard</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">退出</span></span><br><span class="line">exit</span><br></pre></td></tr></table></figure><h1 id="准备目录">准备目录</h1><blockquote><p>在主目录下创建临时文件夹:mkdir ocrtemp 把ocr.py和ocr.sh 两个文件放到主目录下</p></blockquote><h1 id="配置快捷键">配置快捷键</h1><ol type="1"><li>在Ubuntu的系统设置里面,找到“键盘”,再找到“键盘快捷键”,然后点开“查看及自定义快捷键”</li><li>点开“自定义快捷键”</li><li>点加号新建一个快捷键</li><li>设置好名称和快捷键,命令指向ocr.sh</li></ol><h1 id="常见问题">常见问题</h1><h2 id="在ubuntu-22.04安装libssl1.1">在ubuntu 
22.04安装libssl1.1</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">echo "deb http://security.ubuntu.com/ubuntu focal-security main" | sudo tee /etc/apt/sources.list.d/focal-security.list</span><br><span class="line"> </span><br><span class="line">sudo apt-get update</span><br><span class="line">sudo apt-get install libssl1.1</span><br></pre></td></tr></table></figure><h2 id="pip下载过慢">pip下载过慢</h2><h3 id="单次设置">单次设置</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch</span><br><span class="line">或者</span><br><span class="line">pip install torch -i https://pypi.tuna.tsinghua.edu.cn/simple</span><br></pre></td></tr></table></figure><h3 id="全局设置">全局设置</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple</span><br><span class="line">pip config set install.trusted-host pypi.tuna.tsinghua.edu.cn</span><br></pre></td></tr></table></figure><h2 id="pip安装内容">pip安装内容</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip install paddlepaddle paddleocr pillow</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Ubuntu </tag>
<tag> Script </tag>
</tags>
</entry>
<entry>
<title>pathlib</title>
<link href="/2023/08/08/pathlib/"/>
<url>/2023/08/08/pathlib/</url>
<content type="html"><![CDATA[<h1 id="pathlib">pathlib</h1><p>用该库获取某个文件夹下所有的txt文件将会变得极其简单</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pathlib <span class="keyword">import</span> Path</span><br><span class="line"></span><br><span class="line">dir_path = Path(<span class="string">"/home/user/documents"</span>)</span><br><span class="line">files = <span class="built_in">list</span>(dir_path.glob(<span class="string">"*.txt"</span>)) </span><br></pre></td></tr></table></figure><h1 id="处理路径">处理路径</h1><h2 id="创建路径">创建路径</h2><p>几乎所有pathlib 的功能都可以通过其 Path 子类访问,可以使用该类创建文件和目录</p><p>有多种初始化Path的方式,比如,获取当前工作路径 :</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pathlib <span class="keyword">import</span> Path</span><br><span class="line"></span><br><span class="line">Path.cwd()</span><br></pre></td></tr></table></figure><p>使用home</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Path.home()<span class="comment">#PosixPath('/Users/mac')</span></span><br></pre></td></tr></table></figure><p>同样的可以指定字符串路径创建路径</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">p = Path(<span class="string">"documents"</span>) <span class="comment"># PosixPath('documents') </span></span><br></pre></td></tr></table></figure><p>使用<strong>正斜杠</strong>运算符进行路径连接</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">data_dir = Path(<span class="string">"."</span>) / <span class="string">"data"</span></span><br><span class="line">csv_file = data_dir / <span class="string">"file.csv"</span></span><br><span class="line"><span class="built_in">print</span>(data_dir) <span class="comment"># data</span></span><br><span class="line"><span class="built_in">print</span>(csv_file) <span class="comment"># data/file.csv</span></span><br></pre></td></tr></table></figure><p>检查路径或者文件是否存在,可以使用布尔函数 exists</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">data_dir.exists()</span><br><span class="line">csv_file.exists()</span><br></pre></td></tr></table></figure><p>使用 is_dir 或 is_file 函数来检查是否为文件夹、文件</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">data_dir.is_dir()</span><br><span class="line"></span><br><span class="line">csv_file.is_file()</span><br></pre></td></tr></table></figure><p>大多数路径都与当前运行目录相关,但某些情况下必须提供文件或目录的绝对路径,可以使用 absolute</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">csv_file.absolute() <span class="comment"># 
PosixPath('/home/user/Downloads/data/file.csv') </span></span><br></pre></td></tr></table></figure><h2 id="path属性">Path属性</h2><p>Path 对象有许多有用属性,一起来看看这些示例,首先定义一个图片路径</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_file = Path(<span class="string">"images/midjourney.png"</span>).absolute() <span class="comment">#PosixPath('/home/user/Downloads/images/midjourney.png')</span></span><br></pre></td></tr></table></figure><p>先从 parent 开始,它将返回该路径的上一级目录,也就是文件所在的目录</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_file.parent <span class="comment"># PosixPath('/home/user/Downloads/images')</span></span><br></pre></td></tr></table></figure><p>获取文件名</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_file.name <span class="comment"># 'midjourney.png'</span></span><br></pre></td></tr></table></figure><p>它将返回带有后缀的文件名,若只想要前缀,则使用stem</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_file.stem <span class="comment"># midjourney</span></span><br></pre></td></tr></table></figure><p>只想要后缀也很简单</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_file.suffix <span class="comment"># '.png'</span></span><br></pre></td></tr></table></figure><p>如果要将路径分成多个部分,可以使用 parts</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_file.parts # ('/', 'home', 'user', 'Downloads', 'images', 'midjourney.png')</span><br></pre></td></tr></table></figure><p>如果希望这些组件本身就是 Path 对象,可以使用 <strong>parents(注意s)</strong> 属性,它会创建一个生成器</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> image_file.parents:</span><br><span class="line"> <span class="built_in">print</span>(i)</span><br><span class="line"></span><br><span class="line"><span class="comment"># /home/user/Downloads/images</span></span><br><span class="line"><span class="comment"># /home/user/Downloads</span></span><br><span class="line"><span class="comment"># /home/user</span></span><br><span class="line"><span class="comment"># /home</span></span><br><span class="line"><span class="comment"># /</span></span><br></pre></td></tr></table></figure><h1 id="处理文件">处理文件</h1><p>想要创建文件并写入内容,不必再使用 open 函数,只需创建一个 Path 对象搭配 write_text 或 write_bytes 即可</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">markdown = data_dir / <span class="string">"file.md"</span> <span class="comment"># 此处的data_dir是Path对象 </span></span><br><span 
class="line"></span><br><span class="line"><span class="comment">#Create (override) and write text</span></span><br><span class="line"></span><br><span class="line">markdown.write_text(<span class="string">"This is a test markdown"</span>)</span><br></pre></td></tr></table></figure><p>读取文件,则可以 read_text 或 read_bytes</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">markdown.read_text() <span class="comment"># 'This is a test markdown'</span></span><br><span class="line"><span class="built_in">len</span>(image_file.read_bytes()) <span class="comment"># 1962148</span></span><br></pre></td></tr></table></figure><p>但请注意, write_text 或 write_bytes 会覆盖文件的现有内容</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Write new text to existing file</span></span><br><span class="line"></span><br><span class="line">markdown.write_text(<span class="string">"## This is a new line"</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># The file is overridden</span></span><br><span class="line"></span><br><span class="line">markdown.read_text() <span class="comment"># '## This is a new line'</span></span><br></pre></td></tr></table></figure><p>要将新信息附加到现有文件,应该在 a (附加)模式下使用 Path 对象的 open 方法:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Append text</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">with</span> markdown.<span class="built_in">open</span>(mode=<span class="string">"a"</span>) <span class="keyword">as</span> file:</span><br><span class="line"> file.write(<span class="string">"\n### This is the second line"</span>)</span><br><span class="line"></span><br><span class="line">markdown.read_text() <span class="comment"># '## This is a new line\n### This is the second line'</span></span><br></pre></td></tr></table></figure><p>使用rename 重命名文件,比如在当前目录中重命名,如下file.md 变成了 new_markdown.md</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">renamed_md = markdown.with_stem(<span class="string">"new_markdown"</span>)</span><br><span class="line"></span><br><span class="line">markdown.rename(renamed_md) <span class="comment"># PosixPath('data/new_markdown.md')</span></span><br></pre></td></tr></table></figure><p>通过 stat().st_size 查看文件大小</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Display file size</span></span><br><span class="line"></span><br><span class="line">renamed_md.stat().st_size <span class="comment"># 
49</span></span><br></pre></td></tr></table></figure><p>查看最后一次修改文件的时间</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> datetime <span class="keyword">import</span> datetime</span><br><span class="line"></span><br><span class="line">modified_timestamp = renamed_md.stat().st_mtime</span><br><span class="line"></span><br><span class="line">datetime.fromtimestamp(modified_timestamp) <span class="comment"># datetime.datetime(2023, 8, 1, 13, 32, 45, 542693)</span></span><br></pre></td></tr></table></figure><p>st_mtime 返回一个自 1970 年 1 月 1 日以来的秒数。为了使其可读,搭配使用 datetime 的 fromtimestamp 函数。</p><p>要删除不需要的文件,可以使用 unlink</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">renamed_md.unlink(missing_ok=<span class="literal">True</span>)</span><br></pre></td></tr></table></figure><p>将 missing_ok 设置为 True,即使文件不存在也不会报错</p><h1 id="处理目录">处理目录</h1><p>要捕获具有特定扩展名或名称的所有文件,可以将 glob 函数与通配符模式结合使用。</p><p>例如,使用 glob("*.txt") 查找主目录中所有文本文件</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">home = Path.home()</span><br><span class="line">text_files = <span class="built_in">list</span>(home.glob(<span class="string">"*.txt"</span>))</span><br><span class="line"></span><br><span class="line"><span class="built_in">len</span>(text_files) <span class="comment"># 3</span></span><br></pre></td></tr></table></figure><p>要递归搜索文本文件(即在所有子目录中),可以使用 rglob:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">all_text_files = [p <span class="keyword">for</span> p <span class="keyword">in</span> home.rglob(<span class="string">"*.txt"</span>)]</span><br><span class="line"></span><br><span class="line"><span class="built_in">len</span>(all_text_files) <span class="comment"># 5116 </span></span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Python </tag>
<tag> Grammar </tag>
</tags>
</entry>
<entry>
<title>collections</title>
<link href="/2023/08/08/collections/"/>
<url>/2023/08/08/collections/</url>
<content type="html"><![CDATA[<h1 id="defaultdict">defaultdict</h1><h2 id="认识defaultdict">认识defaultdict</h2><p>当我们使用普通的字典时,用法一般是dict={},添加元素的只需要dict[element] =value即,调用的时候也是如此,dict[element] = xxx,但前提是element字典里,如果不在字典里就会报错。</p><p>这时defaultdict就能排上用场了,defaultdict的作用是在于,当字典里的key不存在但被查找时,返回的不是keyError而是一个默认值。</p><h2 id="使用defaultdict">使用defaultdict</h2><p>defaultdict接受一个工厂函数作为参数,如下来构造:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">dict</span> =defaultdict( factory_function)</span><br></pre></td></tr></table></figure><p>这个factory_function可以是list、set、str等等,作用是当key不存在时,返回的是工厂函数的默认值,比如list对应[ ],str对应的是空字符串,set对应set( ),int对应0,如下举例:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> collections <span class="keyword">import</span> defaultdict</span><br><span class="line"></span><br><span class="line">dict1 = defaultdict(<span class="built_in">int</span>)</span><br><span class="line">dict2 = defaultdict(<span class="built_in">set</span>)</span><br><span class="line">dict3 = defaultdict(<span class="built_in">str</span>)</span><br><span class="line">dict4 = defaultdict(<span class="built_in">list</span>)</span><br><span class="line">dict1[<span class="number">2</span>] =<span class="string">'two'</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(dict1[<span class="number">1</span>])</span><br><span class="line"><span class="built_in">print</span>(dict2[<span class="number">1</span>])</span><br><span class="line"><span class="built_in">print</span>(dict3[<span class="number">1</span>])</span><br><span class="line"><span class="built_in">print</span>(dict4[<span class="number">1</span>])</span><br></pre></td></tr></table></figure><p>输出:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">0</span></span><br><span class="line"><span class="built_in">set</span>()</span><br><span class="line"></span><br><span class="line">[]</span><br></pre></td></tr></table></figure><h1 id="counter">Counter</h1><p>Counter是一个计数器,用于方便快捷的计数</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Tally occurrences of words in a list</span></span><br><span class="line">cnt = Counter()</span><br><span class="line"><span class="keyword">for</span> word <span class="keyword">in</span> [<span class="string">'red'</span>, <span class="string">'blue'</span>, <span class="string">'red'</span>, <span class="string">'green'</span>, <span class="string">'blue'</span>, 
<span class="string">'blue'</span>]:</span><br><span class="line"> cnt[word] += <span class="number">1</span></span><br><span class="line">cnt</span><br><span class="line"><span class="comment">#输出:</span></span><br><span class="line"><span class="comment">#Counter({'blue': 3, 'red': 2, 'green': 1})</span></span><br></pre></td></tr></table></figure><p>Counter 是一个 dict 子类,用于可哈希对象进行计数。它是一个集合,其中的元素作为字典键存储,其计数作为字典值存储。计数可以是任何整数值,包括零或负计数。Counter 类类似于其他语言中的 bag 或 multisets。</p><h2 id="创建counter对象">创建Counter对象</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">c = Counter() <span class="comment"># a new, empty counter</span></span><br><span class="line">c = Counter(<span class="string">'gallahad'</span>) <span class="comment"># a new counter from an iterable</span></span><br><span class="line">c = Counter({<span class="string">'red'</span>: <span class="number">4</span>, <span class="string">'blue'</span>: <span class="number">2</span>}) <span class="comment"># a new counter from a mapping</span></span><br><span class="line">c = Counter(cats=<span class="number">4</span>, dogs=<span class="number">8</span>) <span class="comment"># a new counter from keyword args</span></span><br></pre></td></tr></table></figure><h2 id="methods">Methods</h2><h3 id="elements">elements()</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> c.elements():</span><br><span class="line"> <span class="built_in">print</span>(i,end=<span class="string">''</span>)</span><br><span class="line"><span class="comment">#输出:</span></span><br><span class="line"><span class="comment">#aaaabbbbb</span></span><br></pre></td></tr></table></figure><h3 id="most_commonn">most_common(n)</h3><p>返回 n 个最常见元素的列表,以及从最常见到最少的计数。如果省略 n 或为 None,most_common() 将返回计数器中的所有元素。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Counter(<span class="string">'abracadabra'</span>).most_common(<span class="number">3</span>)</span><br><span class="line"><span class="comment">#输出:</span></span><br><span class="line"><span class="comment">#[('a', 5), ('b', 2), ('r', 2)]</span></span><br></pre></td></tr></table></figure><h1 id="defaultdictcounter">defaultdict&Counter</h1><p>通过使用defaultdict和Counter实现将多个csv通过投票法合成一个csv</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> csv</span><br><span class="line"><span class="keyword">from</span> collections <span class="keyword">import</span> defaultdict, Counter</span><br><span class="line"></span><br><span class="line"><span class="comment"># 存储每个名称的所有标签</span></span><br><span class="line">name_labels = defaultdict(<span class="built_in">list</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># 读取每个CSV文件并收集标签</span></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">1</span>, <span class="number">8</span>):</span><br><span class="line"> <span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">f'<span class="subst">{i}</span>.csv'</span>, <span class="string">'r'</span>) <span class="keyword">as</span> csvfile:</span><br><span class="line"> reader = csv.DictReader(csvfile)</span><br><span class="line"> <span class="keyword">for</span> row <span class="keyword">in</span> reader:</span><br><span class="line"> name_labels[row[<span class="string">'name'</span>]].append(row[<span class="string">'label'</span>])</span><br><span class="line"> </span><br><span class="line"><span class="comment"># 找到每个名称对应的标签的众数,并保存到新的字典中</span></span><br><span class="line">result = {}</span><br><span class="line"><span class="keyword">for</span> name, labels <span class="keyword">in</span> name_labels.items():</span><br><span class="line"> most_common_label, _ = Counter(labels).most_common(<span class="number">1</span>)[<span class="number">0</span>]</span><br><span class="line"> result[name] = most_common_label</span><br><span class="line"></span><br><span class="line"><span class="comment"># 将结果保存到新的CSV文件</span></span><br><span class="line"><span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">'result.csv'</span>, <span class="string">'w'</span>, newline=<span class="string">''</span>) <span class="keyword">as</span> csvfile:</span><br><span class="line"> fieldnames = [<span class="string">'name'</span>, <span class="string">'label'</span>]</span><br><span class="line"> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)</span><br><span class="line"> writer.writeheader()</span><br><span class="line"> <span class="keyword">for</span> name, label <span class="keyword">in</span> result.items():</span><br><span class="line"> writer.writerow({<span class="string">'name'</span>: name, <span class="string">'label'</span>: label})</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">'合并完成,结果已保存到result.csv文件中。'</span>)</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Python </tag>
<tag> Grammar </tag>
</tags>
</entry>
<entry>
<title>博客搭建</title>
<link href="/2023/08/06/%E5%8D%9A%E5%AE%A2%E6%90%AD%E5%BB%BA/"/>
<url>/2023/08/06/%E5%8D%9A%E5%AE%A2%E6%90%AD%E5%BB%BA/</url>
<content type="html"><![CDATA[<h1 id="安装相关项目">安装相关项目</h1><h2 id="node.js">Node.js</h2><p><a href="https://m.php.cn/faq/520426.html">Nodejs安装博客</a></p><p>在 Linux 系统下,你可以通过查看 Node.js 的版本号来判断它是否已经安装成功。在终端中输入以下命令: <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">node -v</span><br></pre></td></tr></table></figure> 如果你系统中安装了 Node.js,终端会显示它的版本号。如果终端中没有任何输出,说明你需要先安装 Node.js。 在终端中输入以下命令安装: <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt-get install nodejs</span><br></pre></td></tr></table></figure> 安装完成后,你可以再次输入: <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">node -v </span><br></pre></td></tr></table></figure> 来确认是否安装成功</p><h3 id="升级nodejs">升级nodejs</h3><p>查看当前 nodejs 的版本为</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">node -v </span><br></pre></td></tr></table></figure><p>首先下载 n 这个用于更新 node 版本的工具</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo npm install n -g</span><br></pre></td></tr></table></figure><p>然后通过 n 这个工具下载 nodejs 的最新稳定版本</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">sudo n stable</span><br><span class="line"><span class="meta prompt_"> </span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">也可升级到指定版本 sudo n (node版本号)</span></span><br><span class="line"> </span><br><span class="line">sudo n 18.17.1</span><br></pre></td></tr></table></figure><blockquote><p>下载完成后 如果发现 node -v 仍然是之前的版本,根据不同的 shell 版本执行 hash -r 或者 rehash 即可 hash -r (for bash, zsh, ash, dash, and ksh) rehash (for csh and tcsh)</p></blockquote><h2 id="hexo">hexo</h2><p><a href="https://hexo.io/zh-cn/docs/index.html">hexo安装博客</a><br />### 安装前提<br />安装 Hexo 相当简单,只需要先安装下列应用程序即可:</p><ul><li>Node.js (Node.js 版本需不低于 10.13,建议使用 Node.js 12.0 及以上版本)</li><li>Git</li></ul><p>如果您的电脑中已经安装上述必备程序,那么恭喜您!你可以直接前往 安装 Hexo 步骤。</p><p>如果您的电脑中尚未安装所需要的程序,请根据以下安装指示完成安装。</p><h3 id="安装-hexo">安装 Hexo</h3><p>所有必备的应用程序安装完成后,即可使用 npm 安装 Hexo。</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">npm install -g hexo-cli</span><br></pre></td></tr></table></figure><h1 id="配置本地hexo">配置本地hexo</h1><ol type="1"><li>创建一个本地文件夹,命名为My_blog如果是其他名字也无所谓</li><li>通过一下命令来初始化仓库</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hexo init</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>安装相关配置</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">npm install</span><br></pre></td></tr></table></figure><ol start="4" type="1"><li>测试本地仓库</li></ol><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">hexo cl</span><br><span class="line">hexo g</span><br><span class="line">hexo s</span><br></pre></td></tr></table></figure><h1 id="部署">部署</h1><p>参考自:<a href="https://hebe-tian.github.io/2023/02/15/%E7%AC%AC%E4%B8%80%E7%AF%87%E5%8D%9A%E5%AE%A2%EF%BC%8CHexo-GitHub-Pages/">Hexo+GitHub Pages</a></p><h2 id="github">github</h2><p>目前博客只能在本地访问,如果需要关联到github上,我们需要另外在_config.yml文件中配置 1. 在github对应的仓库中使用ssh形式clone,得到对应的SSH key(就是可以用ssh远程访问该仓库的链接) 2. 编辑本地仓库中_config.yml文件</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">deploy: </span><br><span class="line"> type: git</span><br><span class="line"> repository: <yourSSHKey></span><br><span class="line"> branch: master</span><br></pre></td></tr></table></figure><p>需要注意所有:之后都要带一个空格,否则不会生效 3. 部署到GitHub Pages - 打开命令行,先输入<code>hexo clean</code>清理缓存 - 在命令行输入<code>hexo generate</code>或者<code>hexo g</code>用来渲染页面 - 在命令行输入<code>hexo server</code>或者<code>hexo s</code>用来打开本地服务器,命令行会提示可以访问http://localhost:4000 - 调试完成之后使用<code>hexo deploy</code>或者<code>hexo d</code>用来把博客部署到git的服务器上这一步需要注意,如果之前没有执行过<code>npm install hexo-deployer-git --save</code>,需要在hexo d之前执行</p><h2 id="相关问题">相关问题</h2><ul><li>注意,尽量不使用sudo来进行部署,因为sudo用的是root用户,可能导致git相关配置出现问题,如果在执行<code>hexo d</code>时出现了<code>permission denied</code>可使用如下命令进行解决</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo chown -R $USER:$USER /home/rexkev/sources/github/Rex_Blog(<path/to/your/blog>)</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Blog </tag>
</tags>
</entry>
</search>