2024-08-22 Akkana Peck <[email protected]>
New config variable allow_dup_titles.
Allow guid as unique id.
Accept <link> as well as <links>.
2024-02-12 Akkana Peck <[email protected]>
Add html_index_links, an option that allows feeding from an
HTML page rather than RSS/Atom.
2024-01-07 Akkana Peck <[email protected]>
Add multipage_pat, for sites that spread stories over multiple pages.
2023-12-17 Akkana Peck <[email protected]>
Add a new levels=1.5 stage that skips fetching story pages for sites
that put the entire story in the feed.
Bump version to 1.1b6.
2023-12-15 Akkana Peck <[email protected]>
Refactoring, and rename feedmeparser to pageparser.
2023-12-12 Akkana Peck <[email protected]>
More improvement on not showing nonlocal images.
2023-12-05 Akkana Peck <[email protected]>
Continuing work on showing something in the index file
when stories couldn't be fetched.
urlrss: better local operation (helpful for testing).
2023-07-21 Akkana Peck <[email protected]>
Use BeautifulSoup for parsing.
2023-07-20 Akkana Peck <[email protected]>
Smarter handling of nonlocal images.
(Work continues for a few months.)
2023-05-18 Akkana Peck <[email protected]>
Add a new config variable, skip_nodes, to skip whole parsed
nodes and all children.
2023-05-11 Akkana Peck <[email protected]>
Show images in index page as well as sub-pages.
2022-11-05 Akkana Peck <[email protected]>
Show images in index page as well as sub-pages, if RSS has images.
2022-09-09 Akkana Peck <[email protected]>
Various tweaks to handle images from Wordpress plugins
that don't use img src.
2021-10-07 Akkana Peck <[email protected]>
Implement page and feed helpers, including a page helper
that uses selenium for the New York Times.
2021-06-07 Akkana Peck <[email protected]>
New site variable, allow_repeats.
2021-11-11 Akkana Peck <[email protected]>
Allow specifying a firefox cookie file, for authenticated sites.
Add a NY Times site file using the cookie file.
2021-10-26 Akkana Peck <[email protected]>
Allow helper files, and add an example that fetches NY Times
using Selenium.
2021-03-27 Akkana Peck <[email protected]>
Allow links in RSS content.
2020-08-11 Akkana Peck <[email protected]>
Ensure legal HTML after truncating RSS entries.
Add new dependency on BeautifulSoup.
2020-08-11 Akkana Peck <[email protected]>
Commandline args: accept filename, with or without .conf,
instead of requiring the full quoted feed name.
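A rough sketch of the kind of argument handling this entry describes, not feedme's actual code. The helper name `conf_path_for` and the `confdir` argument are hypothetical and used only for illustration.

```python
import os


def conf_path_for(arg, confdir):
    """Map a command-line argument to a .conf file path.

    Hypothetical helper: accepts either "sitename" or "sitename.conf"
    and returns the path of the config file inside confdir.
    """
    name = arg if arg.endswith(".conf") else arg + ".conf"
    return os.path.join(confdir, name)
```

With this sketch, `conf_path_for("nytimes", confdir)` and `conf_path_for("nytimes.conf", confdir)` resolve to the same file.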
2020-04-25 Akkana Peck <[email protected]>
Improve nonlocal image blocking, rename pref to block_nonlocal_images.
2020-04-12 Akkana Peck <[email protected]>
Skip <source> tags, in ongoing effort to cache images
locally and not use bandwidth later when the file is read.
2019-11-30 Akkana Peck <[email protected]>
Add a test framework.
IMPORTANT NOTE: This required renaming feedme to feedme.py.
2019-11-30 Akkana Peck <[email protected]>
1.0:
Bump version to 1.0, finally!
2019-09-22 Akkana Peck <[email protected]>
Add a meta viewport line to every HTML file: iOS needs it.
Skip style tags that set fonts.
2018-10-14 Akkana Peck <[email protected]>
Skip stories with the same title as one we've seen before
(Washington Post repeats stories over and over).
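One way such title-based filtering could look, as a minimal sketch. The function name, the story dictionaries and the case/whitespace normalization are assumptions made for illustration, not taken from feedme.

```python
def filter_repeated_titles(stories, seen_titles):
    """Return only the stories whose title has not been seen before.

    stories is a list of dicts with a "title" key (an assumed shape).
    seen_titles is a set that is updated in place, so it can be
    carried over from one run to the next.
    """
    fresh = []
    for story in stories:
        # Assumption: titles differing only in case or surrounding
        # whitespace count as the same story.
        key = story["title"].strip().lower()
        if key in seen_titles:
            continue
        seen_titles.add(key)
        fresh.append(story)
    return fresh
```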
2018-09-28 Akkana Peck <[email protected]>
New config option "alt_domains"
for allowing images from domains other than a site's main domain.
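The domain check behind an option like alt_domains might be sketched as follows. `image_allowed` and its matching rule (exact host or subdomain) are illustrative assumptions, not feedme's implementation.

```python
from urllib.parse import urlparse


def image_allowed(img_url, site_url, alt_domains=()):
    """Return True if img_url is on the site's host or an allowed domain.

    Hypothetical rule: a host matches a domain if it is equal to it
    or is a subdomain of it.
    """
    host = urlparse(img_url).hostname or ""
    site_host = urlparse(site_url).hostname or ""
    allowed = (site_host,) + tuple(alt_domains)
    return any(host == d or host.endswith("." + d) for d in allowed if d)
```

Note that under this rule a site at www.example.com would not match images on img.example.com unless example.com is listed among the alternate domains.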
2018-04-13 Akkana Peck <[email protected]>
Feedviewer, a minimal Python feed viewing program,
in case there's ever a portable reader that can run Python.
2018-03-30 Akkana Peck <[email protected]>
Handle img srcset. Add a new config var max_srcset_size
specifying what size of image we should try to download
if there are multiple sizes.
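A minimal sketch of choosing among srcset candidates. It assumes the size limit is a maximum width in pixels and that candidates use width descriptors such as "800w"; both are assumptions, as is the function name `pick_srcset_url`.

```python
def pick_srcset_url(srcset, max_width):
    """Pick the widest srcset candidate that is no wider than max_width.

    Only "URL 800w"-style candidates are considered; density
    descriptors such as "2x" are ignored. Splitting on commas is a
    simplification that breaks on URLs containing commas.
    Returns None if no candidate fits.
    """
    best_url, best_width = None, -1
    for candidate in srcset.split(","):
        parts = candidate.split()
        if len(parts) != 2 or not parts[1].endswith("w"):
            continue
        try:
            width = int(parts[1][:-1])
        except ValueError:
            continue
        if best_width < width <= max_width:
            best_url, best_width = parts[0], width
    return best_url
```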
2018-03-13 Akkana Peck <[email protected]>
Add a new config variable, "block_nonlocal_images",
to replace remote img src with a bogus local entry,
to avoid unwanted bandwidth.
2018-03-10 Akkana Peck <[email protected]>
Screen out stories that are repeated multiple times
in the same day's feed.
2018-02-03 Akkana Peck <[email protected]>
CSS: Restrict width of figure as well as img,
for sites like High Country News that wrap every img in a figure.
2017-10-09 Akkana Peck <[email protected]>
1.0b4:
Add two new config flags: simplify and rss_entry_size.
Both are for the LA Daily Post's new misbehavior of putting
the whole story into the RSS, along with broken formatting
that sometimes makes the whole story unreadable (font colors
and sizes).
2017-07-26 Akkana Peck <[email protected]>
Port to Python 3.
Rewrite images to local from RSS as well as HTML.
2017-06-23 Akkana Peck <[email protected]>
Try to eliminate audio/video links.
2017-06-17 Akkana Peck <[email protected]>
1.0b3:
Rename all the skip_*_pat to end with "pats" for consistency:
they all accept multiple values.
Add skip_content_pats and skip_title_pats.
Document all the skip_*_pats more clearly.
2017-04-17 Akkana Peck <[email protected]>
1.0b2:
Accept multiple configuration files (e.g. one .conf file per site).
Write each story's URL in the footer.
Write .html files first to MANIFEST, so they'll be
fetched first in case of dodgy networks.
Fix sites that use duplicate image names.
Update the documentation.
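The MANIFEST ordering mentioned above (.html files first) could be achieved with a stable sort. This is a sketch with a hypothetical function name, not the code feedme uses.

```python
def manifest_order(filenames):
    """Return filenames with .html files first.

    sorted() is stable, so the original relative order is kept
    within the .html group and within the remaining files.
    """
    return sorted(filenames, key=lambda name: not name.lower().endswith(".html"))
```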
2016-12-13 Akkana Peck <[email protected]>
Write the cache in a custom, human readable format instead of pickle.
Skip entries so old they've expired from cache.
Accept application/atom+xml as well as application/rss for RSS pages.
2016-11-23 Akkana Peck <[email protected]>
Fetch the RSS page with urllib2, not with feedparser,
to guard against feedparser doing bogus charset remapping
(may be just a bug in Debian Stretch's feedparser) and feedparser's
inability to read from file:// URLs.
Add allow_gzip = false option for sites where gzip is broken.
2016-10-07 Akkana Peck <[email protected]>
On URL errors, include a link so the user can try again.
Check publication date against last time we fetched the current feed.
Include continue_on_timeout in the ConfigParser options.
Add user_agent as a config option.
Don't run if there's a feedme process already running.
Guard against problems due to recent Python strptime parser changes.
Handle pickle errors better.
2015-09-25 Akkana Peck <[email protected]>
1.0b1: Save a MANIFEST file with a list of all filenames written.
Add a line to index.html on errors fetching stories.
Allow for cookies in the request. Handle gzipped http.
2015-05-25 Akkana Peck <[email protected]>
Set the User-Agent. Better handling of timeouts.
Pay attention to base href when downloading images.
Make images fit on screen (mostly) on phones.
2013-12-10 Akkana Peck <[email protected]>
Add urlrss python CGI script, and make it easier to kick off
feedme from CGI so it can be initiated remotely.
Add LOG file.
2013-06-25 Akkana Peck <[email protected]>
Add "when" so sites can be checked less often than daily.
Handle file:// since feedparser doesn't. Add skip_links option.
Handle meta refresh directives, and skip a lot of problematic tags.
Handle cases where no content is downloaded.
2012-05-18 Akkana Peck <[email protected]>
0.9: Parse with lxml.html, better URL rewriting and image downloading.
Add author names to stories. Handle redirects. Omit iframes.
2011-12-23 Akkana Peck <[email protected]>
0.8: Several reliability fixes, guards against bad file types, etc.
2011-01-30 Akkana Peck <[email protected]>
0.7: Clean up old feed directories (new config param "save_days").
Handle multi-line configs, e.g. for skip_patterns.
Match skip, start and end patterns that span multiple lines in the source page.
Add stylesheet to output files, for use with html readers or FeedViewer.
Fix case where truncated title includes a start tag but not the corresponding end tag.
New "formats" parameter to specify which format(s) to generate.
2010-12-22 Akkana Peck <[email protected]>
0.7b1: handle epub format, or none (specified in config file); save each day's feed to its own dated directory.
2010-11-17 Akkana Peck <[email protected]>
0.6: handle optional FB2 format; show author.
2010-08-02 Akkana Peck <[email protected]>
0.5: Beef up the interrupt handling to fix places where it didn't work; reject non-text files (e.g. MP3s from podcasts).
2010-03-03 Akkana Peck <[email protected]>
0.4:
Rewrite URLs so unfollowed links show up as coming from the original site, not file://
Error/message logging: show at end, not inline.
Handle failures such as not finding plucker.
Make a backup of the cache file.
2009-12-09 Akkana Peck <[email protected]>
0.3: add commandline arguments, including -c to bypass caching. Handle failures to download articles.
2009-10-20 Akkana Peck <[email protected]>
0.2: integrate ununicode.toascii; add extra content link at end of each index page entry.
2009-10-15 Akkana Peck <[email protected]>
0.2pre1: smarter config parsing (~ and HOME), some support for ascii conversion, some extra links for added convenience.
2009-10-06 Akkana Peck <[email protected]>
First release.