* Author: Zack Galbreath
* Software used: Mac OS 10.6, Python 2.6.1, SQLite 3.6.12, PycURL 7.19.0,
BeautifulSoup 3.2.0, Pipermail 0.09 (Mailman edition).
* Project home: git://github.com/zackgalbreath/MailingListData.git
* MD5 checksum of the "official" database file is:
c17555700f3c9f0a019be6a537f83bc6
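To check a downloaded copy against the checksum above, something like the following sketch should work. It is written for a current Python 3 rather than the Python 2.6.1 listed above, and the helper names md5_of_file and matches_official are made up for this example; they are not part of the project's scripts.

```python
import hashlib

# Checksum quoted in this README for the "official" database file.
EXPECTED_MD5 = "c17555700f3c9f0a019be6a537f83bc6"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_official(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the checksum listed in this README."""
    return md5_of_file(path) == expected.lower()
```

For example, matches_official("MailingListData.db") should return True for an unmodified copy of the official file.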
== Contents ==
README: The document you're currently reading
MailingListData.{db,csv,xlsx}: My dataset, in SQLite, comma-separated values
(CSV), and Microsoft Excel 2007-2008 formats
collectData.py: Generates the dataset and stores it in an SQLite database
convertSQLiteToCSV.py: Converts the SQLite database to CSV format
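For readers who only want the general idea of an SQLite-to-CSV export, here is a minimal sketch in current Python 3. It is not the code in convertSQLiteToCSV.py. The function name export_table_to_csv is hypothetical, and the table name is left as a parameter because this README does not state it.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` to `csv_path`, with a header row.

    Any existing file at `csv_path` is overwritten.
    Returns the list of column names.
    """
    connection = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so the
        # identifier is quoted by hand instead.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = connection.execute("SELECT * FROM %s" % quoted)
        header = [column[0] for column in cursor.description]
        with open(csv_path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        connection.close()
    return header
```

The csv module quotes fields that contain commas or newlines, so message subjects with commas in them stay in a single column.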
== Usage notes ==
* Running collectData.py will overwrite any existing MailingListData.db.
If you have modified the database and wish to keep your changes, rename it
or save a copy somewhere else first.
* Similarly, convertSQLiteToCSV.py overwrites MailingListData.csv.
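Since both scripts overwrite their output, one way to avoid losing changes is to copy the existing file aside before running them. The sketch below assumes current Python 3. The helper backup_if_exists and the ".bak" naming scheme are inventions for this example, not something the scripts do themselves.

```python
import os
import shutil
import time


def backup_if_exists(path):
    """Copy `path` to a timestamped sibling file and return the new path.

    Returns None if `path` does not exist, so it is safe to call
    before a first run.
    """
    if not os.path.exists(path):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "%s.%s.bak" % (path, stamp)
    # copy2 also preserves the modification time of the original.
    shutil.copy2(path, backup_path)
    return backup_path
```

Calling backup_if_exists("MailingListData.db") before collectData.py, and backup_if_exists("MailingListData.csv") before convertSQLiteToCSV.py, keeps the previous versions around. Two calls within the same second produce the same backup name, so the second copy replaces the first.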
== Details about the columns ==
* When recording Message_Subject, any tab character was converted to a single
space.
* A Received_Reply of "self only" means that the original author was the only
person to reply to the message.
* Time_of_Day is recorded in EDT using a 24-hour clock.
* Message_Length is the number of characters in the HTML source of Archive_URL.
* Any_Attachments is set to "yes" when we detect a "non-text" attachment that is
not a "pgp-signature". Note that Pipermail considers C++ source code
(MIME-type text/x-c++src) to be a "non-text" attachment.
* You should be able to read each original email in its entirety at its
Archive_URL. This list of URLs was taken from:
http://www.itk.org/pipermail/insight-users/2011-August/thread.html
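The column rules above can be illustrated with a few small functions. This is only a sketch of the rules as described in this README, in current Python 3, and not the code in collectData.py. All function names are hypothetical, and the inputs (a list of attachment type strings, a list of reply authors) are assumed shapes chosen for the example.

```python
def normalize_subject(subject):
    """Message_Subject rule: each tab character becomes a single space."""
    return subject.replace("\t", " ")


def message_length(html_source):
    """Message_Length rule: number of characters in the archive page's HTML."""
    return len(html_source)


def has_reportable_attachment(non_text_types):
    """Any_Attachments rule: True if some "non-text" attachment is not a
    "pgp-signature".

    `non_text_types` is assumed to be a list of the type strings that
    Pipermail flagged as non-text for one message.
    """
    return any(kind != "pgp-signature" for kind in non_text_types)


def is_self_only(author, reply_authors):
    """Received_Reply rule for "self only": there is at least one reply,
    and every reply came from the original author."""
    return bool(reply_authors) and all(name == author for name in reply_authors)
```

For instance, a message whose only non-text attachment is a pgp-signature does not count as having an attachment, while one carrying a C++ source file does, because Pipermail treats text/x-c++src as non-text.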