.TH GET-COMICS "1" "October 2002" "get-comics" "get-comics"
.SH NAME
get-comics \- download comics from the net
.SH SYNOPSIS
.B get-comics
[\fI-hckvCV\fR]
[\fI-d comics-directory\fR]
[\fI-i index-directory\fR]
[\fI-l links_file\fR]
[\fI-t threads\fR]
[\fI-T timeout\fR]
[\fIconfig-file...\fR]
.SH DESCRIPTION
.PP
get-comics is a program to batch download comics from the net. It was
written to automagically download comics every day while I was on
vacation and computerless.
.PP
get-comics uses a JSON file to describe how to download the comics.
.SH OPTIONS
.TP
\fB\-c\fR
clean (remove) images from comics dir before downloading. Also removes
html files from the index directory if specified.
.TP
\fB\-d comics-directory\fR
where to download the comics to. Overrides the json file. Defaults to
$HOME/comics.
.TP
\fB\-h\fR
usage help message.
.TP
\fB\-i index-directory\fR
where to download the index files to. Defaults to the comics directory.
.TP
\fB\-k\fR
keep the downloaded index files. Usually these are deleted. Useful
for debugging. Note: failed comics are always kept.
.TP
\fB\-l links_file\fR
produce a file with links to all the comics. Does not download the
comics.
.TP
\fB\-t threads\fR
number of simultaneous download threads
.TP
\fB\-v\fR
increase verbosity
.TP
\fB\-C\fR
list ciphers if SSL enabled.
.TP
\fB\-T timeout\fR
read timeout in seconds. (default 5 minutes)
.TP
\fB\-V\fR
verify the json file and exit.
.TP
\fBjson_file(s)\fR
alternate json file, or files, to use. Defaults to /usr/share/get-comics/comics.json
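.PP
As an illustration only, the following hypothetical invocation combines
several of the options above. The directory and file names are placeholders,
not defaults:
.PP
.RS
.nf
get-comics \-c \-d /tmp/comics \-t 4 \-T 60 my-comics.json
.fi
.RE
.PP
This would clean /tmp/comics, then download the comics described in
my-comics.json using at most 4 threads and a 60 second read timeout.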
.SH "HOW IT WORKS"
.PP
get-comics reads the json file and downloads the comics. It can
download more than one comic at a time. The default
is to limit to 100 concurrent connections, but the \fB\-t threads\fR option
can be used to override this.
.PP
The comics are divided into two types: direct download and
two-stage download. Direct download corresponds to comics with a fixed
or date-based URL. All URLs are passed through
.BR strftime (3)
with the current time, so all
.I strftime
format characters work.
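.PP
A minimal sketch of a direct-download entry, assuming a hypothetical site
that names each strip by date. Only the comic's own tags are shown, the
host name and output name are made up, and the enclosing structure of the
JSON file is omitted:
.PP
.RS
.nf
{
    "url": "http://example.com/strips/%Y/%m/%d.gif",
    "output": "example-daily"
}
.fi
.RE
.PP
Here %Y, %m, and %d are replaced by the current four-digit year,
two-digit month, and two-digit day of the month.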
.PP
Two-stage downloads are for non-fixed URLs. First, the comics page is
downloaded. A regular expression (see
.BR regex (7))
is applied to the page and the result is the comic to download.
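.PP
A minimal sketch of a two-stage entry, again with a made-up host and
output name and with the enclosing structure of the JSON file omitted.
The pattern avoids backslashes so that no JSON escaping is needed:
.PP
.RS
.nf
{
    "url": "http://example.com/today.html",
    "regexp": "http://example.com/strips/[0-9][0-9]*[.]png",
    "output": "example-two-stage"
}
.fi
.RE
.PP
The index page named by url is fetched first. The text in it that matches
regexp is then used as the URL of the comic to download.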
.SH "JSON FORMAT"
.PP
The json file is very powerful. The sample \fIget-comics.json\fR file has a lot of
examples, some fairly complex.
.PP
C-style comments (\fB/* */\fR) can be placed almost anywhere.
.PP
.SH "Top Level Tags"
.PP
The following tags are general and are overridden by the command line
options:
.TP
.B directory
specifies the directory to put the comics in.
.TP
.B gocomics-regexp
specifies the regular expression to use for gocomics.
.TP
.B gocomics-url
specifies the URL format to use for gocomics. It is run through
\fIsprintf\fR to add the specific comic. It is also run through
\fIstrftime\fR like other urls.
.TP
.B threads
specifies the maximum number of threads to create at one time.
.TP
.B timeout
specifies the maximum time to wait for a read in milliseconds.
.SH "Comic Tags"
.TP
.B gocomic
specifies a comic from http://gocomics.com/. Just give the gocomic
name, the \fBgocomics-url\fR and \fBgocomics-regexp\fR tags are used
to build the request. The only other tag that makes sense here is
\fBdays\fR.
.TP
.B url
specifies a URL. Either the comic URL for direct comics, or the index
page for two-stage comics. The URL is processed through \fIstrftime\fR
before being used.
.TP
.B regexp
for two-stage comics only, this is the regular expression used to try
to match the comic in the index page.
.TP
.B regmatch
if you want to match only a sub-expression of the regular expression,
put the sub-expression number here. get-comics will store up to three sub-expressions.
.TP
.B days
some comics are only available on certain days of the week. The days
tag has the following format: \fB"days": "smtwtfs"\fR, that is, the
days of the week starting with Sunday. If a certain day should not be
downloaded, replace its letter with an X. For example, to download
comics on Monday, Wednesday, and Friday only:
\fB"days": "XmXwXfX"\fR.
.TP
.B output
sometimes the file name of the comic will clash with other comics. For
example, two comics may use the date as the file name. To keep the
comics from overwriting each other, a unique output file name can be
specified with this tag. It is highly recommended that you always set this.
.TP
.B href
sometimes the URL from the regular expression is not good enough to
specify the entire URL. In this case, you can specify a path to
prepend with the \fBhref\fR tag.
.TP
.B referer
Some pages must see a referer to work. You can specify the referer
here. You can also specify the literal value \fBurl\fR, in which case
the value of the \fBurl\fR tag is used as the referer.
.TP
.B insecure
If you set insecure to non-zero, then curl will not verify the peer or
the hostname. This is less secure, but might be needed for some
comics.
.SH "FILES"
.BR comics.json
.SH "SEE ALSO"
.BR strftime (3),
.BR regex (7)
.SH AUTHOR
Written by Sean MacLennan
.SH "REPORTING BUGS"
Report bugs to <[email protected]>.
.SH COPYRIGHT
Copyright \(co 2002-2017 Sean MacLennan
.br
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.