<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Lil'Log</title>
<link>https://lilianweng.github.io/</link>
<description>Recent content on Lil'Log</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Thu, 08 Sep 2022 10:00:00 -0700</lastBuildDate><atom:link href="https://lilianweng.github.io/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Some Math behind Neural Tangent Kernel</title>
<link>https://lilianweng.github.io/posts/2022-09-08-ntk/</link>
<pubDate>Thu, 08 Sep 2022 10:00:00 -0700</pubDate>
<guid>https://lilianweng.github.io/posts/2022-09-08-ntk/</guid>
<description>Neural networks are well known to be over-parameterized and can often easily fit data with near-zero training loss while keeping decent generalization performance on the test dataset. Although all these parameters are initialized at random, the optimization process can consistently lead to similarly good outcomes. And this is true even when the number of model parameters exceeds the number of training data points.
Neural tangent kernel (NTK) (Jacot et al. 2018) is a kernel to explain the evolution of neural networks during training via gradient descent.</description>
</item>
<item>
<title>Generalized Visual Language Models</title>
<link>https://lilianweng.github.io/posts/2022-06-09-vlm/</link>
<pubDate>Thu, 09 Jun 2022 15:10:30 -0700</pubDate>
<guid>https://lilianweng.github.io/posts/2022-06-09-vlm/</guid>
<description>Processing images to generate text, such as image captioning and visual question-answering, has been studied for years. Traditionally such systems rely on an object detection network as a vision encoder to capture visual features and then produce text via a text decoder. Given a large amount of existing literature, in this post, I would like to only focus on one approach for solving vision language tasks, which is to extend pre-trained generalized language models to be capable of consuming visual signals.</description>
</item>
<item>
<title>Learning with not Enough Data Part 3: Data Generation</title>
<link>https://lilianweng.github.io/posts/2022-04-15-data-gen/</link>
<pubDate>Fri, 15 Apr 2022 15:10:30 -0700</pubDate>
<guid>https://lilianweng.github.io/posts/2022-04-15-data-gen/</guid>
<description>Here comes Part 3 on learning with not enough data (Previous: Part 1 and Part 2). Let’s consider two approaches for generating synthetic data for training.
Augmented data. Given a set of existing training samples, we can apply a variety of augmentations, distortions and transformations to derive new data points without losing the key attributes. We have covered a bunch of augmentation methods on text and images in a previous post on contrastive learning.</description>
</item>
<item>
<title>Learning with not Enough Data Part 2: Active Learning</title>
<link>https://lilianweng.github.io/posts/2022-02-20-active-learning/</link>
<pubDate>Sun, 20 Feb 2022 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2022-02-20-active-learning/</guid>
<description>This is part 2 of what to do when facing a limited amount of labeled data for supervised learning tasks. This time we will get some amount of human labeling work involved, but within a budget limit, and therefore we need to be smart when selecting which samples to label.
Notations Symbol Meaning $K$ Number of unique class labels. $(\mathbf{x}^l, y) \sim \mathcal{X}, y \in \{0, 1\}^K$ Labeled dataset.</description>
</item>
<item>
<title>Learning with not Enough Data Part 1: Semi-Supervised Learning</title>
<link>https://lilianweng.github.io/posts/2021-12-05-semi-supervised/</link>
<pubDate>Sun, 05 Dec 2021 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2021-12-05-semi-supervised/</guid>
<description>When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed.
Pre-training + fine-tuning: Pre-train a powerful task-agnostic model on a large unsupervised data corpus, e.g. pre-training LMs on free text, or pre-training vision models on unlabeled images via self-supervised learning, and then fine-tune it on the downstream task with a small set of labeled samples. Semi-supervised learning: Learn from the labeled and unlabeled samples together.</description>
</item>
<item>
<title>How to Train Really Large Models on Many GPUs?</title>
<link>https://lilianweng.github.io/posts/2021-09-25-train-large/</link>
<pubDate>Fri, 24 Sep 2021 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2021-09-25-train-large/</guid>
<description>[Updated on 2022-03-13: add expert choice routing.] [Updated on 2022-06-10: Greg and I wrote a shortened and upgraded version of this post, published on OpenAI Blog: &ldquo;Techniques for Training Large Neural Networks&rdquo;.]
In recent years, we have seen better results on many NLP benchmark tasks with larger pre-trained language models. Training large and deep neural networks is challenging, as it demands a large amount of GPU memory and a long horizon of training time.</description>
</item>
<item>
<title>What are Diffusion Models?</title>
<link>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</link>
<pubDate>Sun, 11 Jul 2021 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</guid>
<description>[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references).] [Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen.] [Updated on 2022-08-31: Added latent diffusion model.]
So far, I&rsquo;ve written about three types of generative models, GAN, VAE, and Flow-based models. They have shown great success in generating high-quality samples, but each has some limitations of its own.</description>
</item>
<item>
<title>Contrastive Representation Learning</title>
<link>https://lilianweng.github.io/posts/2021-05-31-contrastive/</link>
<pubDate>Mon, 31 May 2021 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2021-05-31-contrastive/</guid>
<description>The goal of contrastive representation learning is to learn such an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart. Contrastive learning can be applied to both supervised and unsupervised settings. When working with unsupervised data, contrastive learning is one of the most powerful approaches in self-supervised learning.
Contrastive Training Objectives In early versions of loss functions for contrastive learning, only one positive and one negative sample are involved.</description>
</item>
<item>
<title>Reducing Toxicity in Language Models</title>
<link>https://lilianweng.github.io/posts/2021-03-21-lm-toxicity/</link>
<pubDate>Sun, 21 Mar 2021 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2021-03-21-lm-toxicity/</guid>
<description>Large pretrained language models are trained over a sizable collection of online data. They unavoidably acquire certain toxic behavior and biases from the Internet. Pretrained language models are very powerful and have shown great success in many NLP tasks. However, safely deploying them in practical real-world applications demands strong safety control over the model generation process.
Many challenges are associated with the effort to diminish various types of unsafe content:</description>
</item>
<item>
<title>Controllable Neural Text Generation</title>
<link>https://lilianweng.github.io/posts/2021-01-02-controllable-text-generation/</link>
<pubDate>Sat, 02 Jan 2021 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2021-01-02-controllable-text-generation/</guid>
<description>[Updated on 2021-02-01: Updated to version 2.0 with several works added and many typos fixed.] [Updated on 2021-05-26: Add P-tuning and Prompt Tuning in the &ldquo;prompt design&rdquo; section.] [Updated on 2021-09-19: Add &ldquo;unlikelihood training&rdquo;.]
There is a gigantic amount of free text on the Web, several orders of magnitude more than labelled benchmark datasets. The state-of-the-art language models (LM) are trained with unsupervised Web data at large scale. When generating samples from an LM by iteratively sampling the next token, we do not have much control over attributes of the output text, such as the topic, the style, the sentiment, etc.</description>
</item>
<item>
<title>How to Build an Open-Domain Question Answering System?</title>
<link>https://lilianweng.github.io/posts/2020-10-29-odqa/</link>
<pubDate>Thu, 29 Oct 2020 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2020-10-29-odqa/</guid>
<description>[Updated on 2020-11-12: add an example on closed-book factual QA using OpenAI API (beta).]
A model that can answer any question with regard to factual knowledge can lead to many useful and practical applications, such as working as a chatbot or an AI assistant🤖. In this post, we will review several common approaches for building such an open-domain question answering system.
Disclaimers given so many papers in the wild:
Assume we have access to a powerful pretrained language model.</description>
</item>
<item>
<title>Neural Architecture Search</title>
<link>https://lilianweng.github.io/posts/2020-08-06-nas/</link>
<pubDate>Thu, 06 Aug 2020 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2020-08-06-nas/</guid>
<description>Although most popular and successful model architectures are designed by human experts, it doesn&rsquo;t mean we have explored the entire network architecture space and settled down with the best option. We would have a better chance to find the optimal solution if we adopt a systematic and automatic way of learning high-performance model architectures.
Automatically learning and evolving network topologies is not a new idea (Stanley &amp; Miikkulainen, 2002). In recent years, the pioneering work by Zoph &amp; Le 2017 and Baker et al.</description>
</item>
<item>
<title>Exploration Strategies in Deep Reinforcement Learning</title>
<link>https://lilianweng.github.io/posts/2020-06-07-exploration-drl/</link>
<pubDate>Sun, 07 Jun 2020 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2020-06-07-exploration-drl/</guid>
<description>[Updated on 2020-06-17: Add &ldquo;exploration via disagreement&rdquo; in the &ldquo;Forward Dynamics&rdquo; section.]
Exploitation versus exploration is a critical topic in Reinforcement Learning. We&rsquo;d like the RL agent to find the best solution as fast as possible. However, in the meantime, committing to solutions too quickly without enough exploration sounds pretty bad, as it could lead to local minima or total failure. Modern RL algorithms that optimize for the best returns can achieve good exploitation quite efficiently, while exploration remains more like an open topic.</description>
</item>
<item>
<title>The Transformer Family</title>
<link>https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/</link>
<pubDate>Tue, 07 Apr 2020 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/</guid>
<description>It has been almost two years since my last post on attention. Recent progress on new and enhanced versions of Transformer motivates me to write another post on this specific topic, focusing on how the vanilla Transformer can be improved for longer-term attention span, less memory and computation consumption, RL task solving and more.
Notations Symbol Meaning $d$ The model size / hidden state dimension / positional encoding size.</description>
</item>
<item>
<title>Curriculum for Reinforcement Learning</title>
<link>https://lilianweng.github.io/posts/2020-01-29-curriculum-rl/</link>
<pubDate>Wed, 29 Jan 2020 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2020-01-29-curriculum-rl/</guid>
<description>[Updated on 2020-02-03: mentioning PCG in the &ldquo;Task-Specific Curriculum&rdquo; section.] [Updated on 2020-02-04: Add a new &ldquo;curriculum through distillation&rdquo; section.]
It sounds like an impossible task if we want to teach integrals or derivatives to a 3-year-old who does not even know basic arithmetic. That&rsquo;s why education is important, as it provides a systematic way to break down complex knowledge and a nice curriculum for teaching concepts from simple to hard.</description>
</item>
<item>
<title>Self-Supervised Representation Learning</title>
<link>https://lilianweng.github.io/posts/2019-11-10-self-supervised/</link>
<pubDate>Sun, 10 Nov 2019 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2019-11-10-self-supervised/</guid>
<description>[Updated on 2020-01-09: add a new section on Contrastive Predictive Coding]. [Updated on 2020-04-13: add a &ldquo;Momentum Contrast&rdquo; section on MoCo, SimCLR and CURL.] [Updated on 2020-07-08: add a &ldquo;Bisimulation&rdquo; section on DeepMDP and DBC.] [Updated on 2020-09-12: add MoCo V2 and BYOL in the &ldquo;Momentum Contrast&rdquo; section.] [Updated on 2021-05-31: remove section on &ldquo;Momentum Contrast&rdquo; and add a pointer to a full post on &ldquo;Contrastive Representation Learning&rdquo;]</description>
</item>
<item>
<title>Evolution Strategies</title>
<link>https://lilianweng.github.io/posts/2019-09-05-evolution-strategies/</link>
<pubDate>Thu, 05 Sep 2019 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2019-09-05-evolution-strategies/</guid>
<description>Stochastic gradient descent is a universal choice for optimizing deep learning models. However, it is not the only option. With black-box optimization algorithms, you can evaluate a target function $f(x): \mathbb{R}^n \to \mathbb{R}$, even when you don&rsquo;t know the precise analytic form of $f(x)$ and thus cannot compute gradients or the Hessian matrix. Examples of black-box optimization methods include Simulated Annealing, Hill Climbing and the Nelder-Mead method.
Evolution Strategies (ES) is one type of black-box optimization algorithm, born in the family of Evolutionary Algorithms (EA).</description>
</item>
<item>
<title>Meta Reinforcement Learning</title>
<link>https://lilianweng.github.io/posts/2019-06-23-meta-rl/</link>
<pubDate>Sun, 23 Jun 2019 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2019-06-23-meta-rl/</guid>
<description>In my earlier post on meta-learning, the problem is mainly defined in the context of few-shot classification. Here I would like to explore more into cases when we try to &ldquo;meta-learn&rdquo; Reinforcement Learning (RL) tasks by developing an agent that can solve unseen tasks fast and efficiently.
To recap, a good meta-learning model is expected to generalize to new tasks or new environments that have never been encountered during training. The adaptation process, essentially a mini learning session, happens at test time with limited exposure to the new configurations.</description>
</item>
<item>
<title>Domain Randomization for Sim2Real Transfer</title>
<link>https://lilianweng.github.io/posts/2019-05-05-domain-randomization/</link>
<pubDate>Sun, 05 May 2019 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2019-05-05-domain-randomization/</guid>
<description>In Robotics, one of the hardest problems is how to make your model transfer to the real world. Due to the sample inefficiency of deep RL algorithms and the cost of data collection on real robots, we often need to train models in a simulator which theoretically provides an infinite amount of data. However, the reality gap between the simulator and the physical world often leads to failure when working with physical robots.</description>
</item>
<item>
<title>Are Deep Neural Networks Dramatically Overfitted?</title>
<link>https://lilianweng.github.io/posts/2019-03-14-overfit/</link>
<pubDate>Thu, 14 Mar 2019 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2019-03-14-overfit/</guid>
<description>[Updated on 2019-05-27: add the section on Lottery Ticket Hypothesis.]
If you are like me, entering the field of deep learning with experience in traditional machine learning, you may often ponder over this question: Since a typical deep neural network has so many parameters and training error can easily be perfect, it should surely suffer from substantial overfitting. How could it ever generalize to out-of-sample data points?
The effort in understanding why deep neural networks can generalize somehow reminds me of this interesting paper on Systems Biology &mdash; &ldquo;Can a biologist fix a radio?</description>
</item>
<item>
<title>Generalized Language Models</title>
<link>https://lilianweng.github.io/posts/2019-01-31-lm/</link>
<pubDate>Thu, 31 Jan 2019 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2019-01-31-lm/</guid>
<description>[Updated on 2019-02-14: add ULMFiT and GPT-2.] [Updated on 2020-02-29: add ALBERT.] [Updated on 2020-10-25: add RoBERTa.] [Updated on 2020-12-13: add T5.] [Updated on 2020-12-30: add GPT-3.] [Updated on 2021-11-13: add XLNet, BART and ELECTRA; Also updated the Summary section.]
Fig. 0. I guess they are Elmo &amp; Bert? (Image source: here) We have seen amazing progress in NLP in 2018. Large-scale pre-trained language models like OpenAI GPT and BERT have achieved great performance on a variety of language tasks using generic model architectures.</description>
</item>
<item>
<title>Object Detection Part 4: Fast Detection Models</title>
<link>https://lilianweng.github.io/posts/2018-12-27-object-recognition-part-4/</link>
<pubDate>Thu, 27 Dec 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-12-27-object-recognition-part-4/</guid>
<description>In Part 3, we have reviewed models in the R-CNN family. All of them are region-based object detection algorithms. They can achieve high accuracy but could be too slow for certain applications such as autonomous driving. In Part 4, we only focus on fast object detection models, including SSD, RetinaNet, and models in the YOLO family.
Links to all the posts in the series: [Part 1] [Part 2] [Part 3] [Part 4].</description>
</item>
<item>
<title>Meta-Learning: Learning to Learn Fast</title>
<link>https://lilianweng.github.io/posts/2018-11-30-meta-learning/</link>
<pubDate>Fri, 30 Nov 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-11-30-meta-learning/</guid>
<description>[Updated on 2019-10-01: thanks to Tianhao, we have this post translated into Chinese!]
A good machine learning model often requires training with a large number of samples. Humans, in contrast, learn new concepts and skills much faster and more efficiently. Kids who have seen cats and birds only a few times can quickly tell them apart. People who know how to ride a bike are likely to discover the way to ride a motorcycle fast with little or even no demonstration.</description>
</item>
<item>
<title>Flow-based Deep Generative Models</title>
<link>https://lilianweng.github.io/posts/2018-10-13-flow-models/</link>
<pubDate>Sat, 13 Oct 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-10-13-flow-models/</guid>
<description>So far, I&rsquo;ve written about two types of generative models, GAN and VAE. Neither of them explicitly learns the probability density function of real data, $p(\mathbf{x})$ (where $\mathbf{x} \in \mathcal{D}$) &mdash; because it is really hard! Taking the generative model with latent variables as an example, $p(\mathbf{x}) = \int p(\mathbf{x}\vert\mathbf{z})p(\mathbf{z})d\mathbf{z}$ can hardly be calculated as it is intractable to go through all possible values of the latent code $\mathbf{z}$.
Flow-based deep generative models conquer this hard problem with the help of normalizing flows, a powerful statistics tool for density estimation.</description>
</item>
<item>
<title>From Autoencoder to Beta-VAE</title>
<link>https://lilianweng.github.io/posts/2018-08-12-vae/</link>
<pubDate>Sun, 12 Aug 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-08-12-vae/</guid>
<description>[Updated on 2019-07-18: add a section on VQ-VAE &amp; VQ-VAE-2.] [Updated on 2019-07-26: add a section on TD-VAE.]
An autoencoder is designed to reconstruct high-dimensional data using a neural network model with a narrow bottleneck layer in the middle (oops, this is probably not true for Variational Autoencoder, and we will investigate it in detail in later sections). A nice byproduct is dimension reduction: the bottleneck layer captures a compressed latent encoding.</description>
</item>
<item>
<title>Attention? Attention!</title>
<link>https://lilianweng.github.io/posts/2018-06-24-attention/</link>
<pubDate>Sun, 24 Jun 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-06-24-attention/</guid>
<description>[Updated on 2018-10-28: Add Pointer Network and the link to my implementation of Transformer.] [Updated on 2018-11-06: Add a link to the implementation of Transformer model.] [Updated on 2018-11-18: Add Neural Turing Machines.] [Updated on 2019-07-18: Correct the mistake on using the term &ldquo;self-attention&rdquo; when introducing the show-attention-tell paper; moved it to Self-Attention section.] [Updated on 2020-04-07: A follow-up post on improved Transformer models is here.]
Attention is, to some extent, motivated by how we pay visual attention to different regions of an image or correlate words in one sentence.</description>
</item>
<item>
<title>Implementing Deep Reinforcement Learning Models with Tensorflow + OpenAI Gym</title>
<link>https://lilianweng.github.io/posts/2018-05-05-drl-implementation/</link>
<pubDate>Sat, 05 May 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-05-05-drl-implementation/</guid>
<description>The full implementation is available in lilianweng/deep-reinforcement-learning-gym
In the previous two posts, I have introduced the algorithms of many deep reinforcement learning models. Now it is time to get our hands dirty and practice how to implement the models in the wild. The implementation is gonna be built in Tensorflow and OpenAI gym environment. The full version of the code in this tutorial is available in [lilianweng/deep-reinforcement-learning-gym].
Environment Setup Make sure you have Homebrew installed: /usr/bin/ruby -e &#34;$(curl -fsSL https://raw.</description>
</item>
<item>
<title>Policy Gradient Algorithms</title>
<link>https://lilianweng.github.io/posts/2018-04-08-policy-gradient/</link>
<pubDate>Sun, 08 Apr 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-04-08-policy-gradient/</guid>
<description>[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.] [Updated on 2018-09-30: add a new policy gradient method, TD3.] [Updated on 2019-02-09: add SAC with automatically adjusted temperature.] [Updated on 2019-06-26: Thanks to Chanseok, we have a version of this post in Korean.] [Updated on 2019-09-12: add a new policy gradient method SVPG.] [Updated on 2019-12-22: add a new policy gradient method IMPALA.]</description>
</item>
<item>
<title>A (Long) Peek into Reinforcement Learning</title>
<link>https://lilianweng.github.io/posts/2018-02-19-rl-overview/</link>
<pubDate>Mon, 19 Feb 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-02-19-rl-overview/</guid>
<description>[Updated on 2020-09-03: Updated the algorithm of SARSA and Q-learning so that the difference is more pronounced.] [Updated on 2021-09-19: Thanks to 爱吃猫的鱼, we have this post in Chinese.]
Several exciting developments in Artificial Intelligence (AI) have happened in recent years. AlphaGo defeated the best professional human player in the game of Go. Very soon the extended algorithm AlphaGo Zero beat AlphaGo by 100-0 without supervised learning on human knowledge.</description>
</item>
<item>
<title>The Multi-Armed Bandit Problem and Its Solutions</title>
<link>https://lilianweng.github.io/posts/2018-01-23-multi-armed-bandit/</link>
<pubDate>Tue, 23 Jan 2018 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2018-01-23-multi-armed-bandit/</guid>
<description>The algorithms are implemented for Bernoulli bandit in lilianweng/multi-armed-bandit.
Exploitation vs Exploration The exploration vs exploitation dilemma exists in many aspects of our life. Say, your favorite restaurant is right around the corner. If you go there every day, you would be confident of what you will get, but miss the chances of discovering an even better option. If you try new places all the time, very likely you are gonna have to eat unpleasant food from time to time.</description>
</item>
<item>
<title>Object Detection for Dummies Part 3: R-CNN Family</title>
<link>https://lilianweng.github.io/posts/2017-12-31-object-recognition-part-3/</link>
<pubDate>Sun, 31 Dec 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-12-31-object-recognition-part-3/</guid>
<description>[Updated on 2018-12-20: Remove YOLO here. Part 4 will cover multiple fast object detection algorithms, including YOLO.] [Updated on 2018-12-27: Add bbox regression and tricks sections for R-CNN.]
In the series of &ldquo;Object Detection for Dummies&rdquo;, we started with basic concepts in image processing, such as gradient vectors and HOG, in Part 1. Then we introduced classic convolutional neural network architecture designs for classification and pioneering models for object recognition, Overfeat and DPM, in Part 2.</description>
</item>
<item>
<title>Object Detection for Dummies Part 2: CNN, DPM and Overfeat</title>
<link>https://lilianweng.github.io/posts/2017-12-15-object-recognition-part-2/</link>
<pubDate>Fri, 15 Dec 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-12-15-object-recognition-part-2/</guid>
<description>Part 1 of the &ldquo;Object Detection for Dummies&rdquo; series introduced: (1) the concept of image gradient vector and how HOG algorithm summarizes the information across all the gradient vectors in one image; (2) how the image segmentation algorithm works to detect regions that potentially contain objects; (3) how the Selective Search algorithm refines the outcomes of image segmentation for better region proposal.
In Part 2, we are about to find out more on the classic convolutional neural network architectures for image classification.</description>
</item>
<item>
<title>Object Detection for Dummies Part 1: Gradient Vector, HOG, and SS</title>
<link>https://lilianweng.github.io/posts/2017-10-29-object-recognition-part-1/</link>
<pubDate>Sun, 29 Oct 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-10-29-object-recognition-part-1/</guid>
<description>I&rsquo;ve never worked in the field of computer vision and have no idea how the magic could work when an autonomous car is configured to tell apart a stop sign from a pedestrian in a red hat. To motivate myself to look into the maths behind object recognition and detection algorithms, I&rsquo;m writing a few posts on this topic &ldquo;Object Detection for Dummies&rdquo;. This post, part 1, starts with super rudimentary concepts in image processing and a few methods for image segmentation.</description>
</item>
<item>
<title>Learning Word Embedding</title>
<link>https://lilianweng.github.io/posts/2017-10-15-word-embedding/</link>
<pubDate>Sun, 15 Oct 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-10-15-word-embedding/</guid>
<description>Human vocabulary comes in free text. In order to make a machine learning model understand and process natural language, we need to transform the free-text words into numeric values. One of the simplest transformation approaches is to do a one-hot encoding in which each distinct word stands for one dimension of the resulting vector and a binary value indicates whether the word is present (1) or not (0).
However, one-hot encoding is impractical computationally when dealing with the entire vocabulary, as the representation demands hundreds of thousands of dimensions.</description>
</item>
<item>
<title>Anatomize Deep Learning with Information Theory</title>
<link>https://lilianweng.github.io/posts/2017-09-28-information-bottleneck/</link>
<pubDate>Thu, 28 Sep 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-09-28-information-bottleneck/</guid>
<description>Professor Naftali Tishby passed away in 2021. Hope the post can introduce his cool idea of information bottleneck to more people.
Recently I watched the talk &ldquo;Information Theory in Deep Learning&rdquo; by Prof Naftali Tishby and found it very interesting. He presented how to apply information theory to study the growth and transformation of deep neural networks during training. Using the Information Bottleneck (IB) method, he proposed a new learning bound for deep neural networks (DNN), as the traditional learning theory fails due to the exponentially large number of parameters.</description>
</item>
<item>
<title>From GAN to WGAN</title>
<link>https://lilianweng.github.io/posts/2017-08-20-gan/</link>
<pubDate>Sun, 20 Aug 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-08-20-gan/</guid>
<description>[Updated on 2018-09-30: thanks to Yoonju, we have this post translated into Korean!] [Updated on 2019-04-18: this post is also available on arXiv.]
Generative adversarial network (GAN) has shown great results in many generative tasks to replicate the real-world rich content such as images, human language, and music. It is inspired by game theory: two models, a generator and a critic, are competing with each other while making each other stronger at the same time.</description>
</item>
<item>
<title>How to Explain the Prediction of a Machine Learning Model?</title>
<link>https://lilianweng.github.io/posts/2017-08-01-interpretation/</link>
<pubDate>Tue, 01 Aug 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-08-01-interpretation/</guid>
<description>Machine learning models have started penetrating into critical areas like health care, justice systems, and the financial industry. Thus, figuring out how the models make decisions and making sure the decision-making process is aligned with ethical requirements or legal regulations becomes a necessity.
Meanwhile, the rapid growth of deep learning models pushes the requirement of interpreting complicated models further. People are eager to apply the power of AI fully on key aspects of everyday life.</description>
</item>
<item>
<title>Predict Stock Prices Using RNN: Part 2</title>
<link>https://lilianweng.github.io/posts/2017-07-22-stock-rnn-part-2/</link>
<pubDate>Sat, 22 Jul 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-07-22-stock-rnn-part-2/</guid>
<description>In the Part 2 tutorial, I would like to continue the topic on stock price prediction and to endow the recurrent neural network that I have built in Part 1 with the capability of responding to multiple stocks. In order to distinguish the patterns associated with different price sequences, I use the stock symbol embedding vectors as part of the input.
Dataset During the search, I found this library for querying Yahoo!</description>
</item>
<item>
<title>Predict Stock Prices Using RNN: Part 1</title>
<link>https://lilianweng.github.io/posts/2017-07-08-stock-rnn-part-1/</link>
<pubDate>Sat, 08 Jul 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-07-08-stock-rnn-part-1/</guid>
<description>This is a tutorial on how to build a recurrent neural network using Tensorflow to predict stock market prices. The full working code is available in github.com/lilianweng/stock-rnn. If you don&rsquo;t know what a recurrent neural network or LSTM cell is, feel free to check my previous post.
One thing I would like to emphasize is that, because my motivation for writing this post is more on demonstrating how to build and train an RNN model in Tensorflow and less on solving the stock prediction problem, I didn&rsquo;t try hard on improving the prediction outcomes.</description>
</item>
<item>
<title>An Overview of Deep Learning for Curious People</title>
<link>https://lilianweng.github.io/posts/2017-06-21-overview/</link>
<pubDate>Wed, 21 Jun 2017 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/posts/2017-06-21-overview/</guid>
<description>(The post originated from my talk for WiMLDS x Fintech meetup hosted by Affirm.)
I believe many of you have watched or heard of the games between AlphaGo and professional Go player Lee Sedol in 2016. Lee has the highest rank of nine dan and many world championships. No doubt, he is one of the best Go players in the world, but he lost 1-4 in this series versus AlphaGo.</description>
</item>
<item>
<title>FAQ</title>
<link>https://lilianweng.github.io/faq/</link>
<pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
<guid>https://lilianweng.github.io/faq/</guid>
<description></description>
</item>
</channel>
</rss>