<!------------------------------------------------------------------------
<! Movies
<!----------------------------------------------------------------------->
<HTML>
<HEAD>
<TITLE> Movies </TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<!------------------------------------------------------------------------
<! Title
<!----------------------------------------------------------------------->
<H1> Movies </H1>
<!------------------------------------------------------------------------
<! Data Type
<!----------------------------------------------------------------------->
<H2>Data Type</H2>
relational
<!------------------------------------------------------------------------
<! Abstract
<!----------------------------------------------------------------------->
<H2>Abstract</H2>
<p>
This data set lists over 10,000 films, including many older, odd, and cult films. It holds information on actors, casts, directors, producers, and studios. The material also includes some social information, such as `lived-with' and `married to'.
</p>
<!------------------------------------------------------------------------
<! Sources
<!----------------------------------------------------------------------->
<H2> Sources</H2>
<H4> Original Owner and Donor</H4>
<PRE>
Gio Wiederhold
Stanford University
650-725-8363
<A HREF="mailto:[email protected]">[email protected]</A>
</PRE>
<B>Date Donated: </B> July 7, 1999
<!------------------------------------------------------------------------
<! Data Characteristics
<!----------------------------------------------------------------------->
<H2> Data Characteristics</H2>
<P>
The data is stored in relational form across several files. The central file (MAIN) is a list of movies, each with a unique identifier. These identifiers may change in successive versions. The actors for those movies are listed with their roles in a second file (CASTS). More information about individual actors is in a third file (ACTORS). All directors in MAIN are listed in a fourth file (PEOPLE), along with a number of important producers, writers, and cinematographers. A fifth file (REMAKES) links movies that were copied to a substantial extent from one another. The sixth file (STUDIOS) provides some information about the studios named in MAIN.
</P>
<P>
The data set was originally built for database class exercises, to replace the boring `manager of the toy-department' queries. Note that CASTS, which refers to MAIN and ACTORS, is logically identical to the inventory file that refers to suppliers and assemblies in the standard bill-of-materials problem. Personal interest caused the database to be made complete for all Hitchcock movies and TV episodes. Related films by type and actor were added gradually.
</P>
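<P>
As an illustration of the kind of class exercise described above, the following minimal sketch loads a few invented rows into an in-memory SQLite database and joins CASTS to MAIN and ACTORS. The table layout, the column names (film_id, title, director, actor, role, born), and the sample rows are all assumptions made for this example; the real fields are documented in DOC.
</P>

```python
import sqlite3

# Hypothetical, simplified schema: the real field names and formats are in DOC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main   (film_id TEXT PRIMARY KEY, title TEXT, director TEXT);
    CREATE TABLE actors (actor TEXT PRIMARY KEY, born INTEGER);
    CREATE TABLE casts  (film_id TEXT, actor TEXT, role TEXT);
    INSERT INTO main   VALUES ('F1', 'Example Film', 'Director X');
    INSERT INTO actors VALUES ('Actor A', 1920);
    INSERT INTO casts  VALUES ('F1', 'Actor A', 'Lead'), ('F1', 'Actor B', 'Guest');
""")

# CASTS is the linking relation: each row points at one film and one actor.
# The LEFT JOIN keeps cast members that have no ACTORS entry.
rows = conn.execute("""
    SELECT m.title, c.actor, c.role, a.born
    FROM casts AS c
    JOIN main AS m ON m.film_id = c.film_id
    LEFT JOIN actors AS a ON a.actor = c.actor
    ORDER BY c.actor
""").fetchall()

for row in rows:
    print(row)
```

<P>
The left join is a deliberate choice here: a cast member without an ACTORS entry still appears in the result, with an empty value for the actor details.
</P>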
<P>
Subsequent research on temporal databases caused date fields (years only) to be added. These allow testing, say, whether the dates-of-work of an actor in ACTORS match the dates of the MAIN films that the CASTS relation shows. Object-oriented database features can be tested with fields that have multiple and two-level values, as documented in DOC.
</P>
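<P>
The temporal test mentioned above could be sketched roughly as follows. This is only an illustration: the record layout, the names films, actors, and casts, and the sample values are invented here, and missing years are assumed to be represented as None rather than in the encoding that DOC describes.
</P>

```python
# Hypothetical, simplified records; the real field names and encodings are in DOC.
films = {"H1": 1960, "H2": 1935}        # film id to release year
actors = {"Actor A": (1950, 1970)}      # actor to (first, last) year of work
casts = [("H1", "Actor A"), ("H2", "Actor A")]


def out_of_range_casts(films, actors, casts):
    """Return cast pairs whose film year lies outside the actor's working years."""
    problems = []
    for film_id, actor in casts:
        year = films.get(film_id)
        span = actors.get(actor)
        if year is None or span is None or None in span:
            continue  # missing values are common, so skip rather than flag them
        first, last = span
        if year not in range(first, last + 1):
            problems.append((film_id, actor))
    return problems


result = out_of_range_casts(films, actors, casts)
print(result)
```

<P>
In this invented sample, the second cast entry is reported because the film year 1935 falls before the actor's first recorded working year.
</P>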
<P>
The entries were collected gradually during course work starting about 1975 and are still being updated. Most of the entries were made manually. The DOC file lists some of the reference works used. Corrections and additions continue to be appreciated.
</P>
<P>
Detailed descriptions of the fields and their formats are provided in <A HREF="doc.html">doc.html</A>.
</P>
<H4>Missing Values</H4>
<P>
Outside of key fields, missing values are common. Their encoding is described in DOC. Sometimes the data seems to be unavailable; sometimes it has not been entered. Some information, such as `lived-with', is inherently incomplete.
</P>
<H4>Censored Data</H4>
Minor actors are ignored.
<H4>Dependencies</H4>
<P> Every MAIN film must have a director in PEOPLE. About 50 pseudo director names have been listed in PEOPLE to allow interesting films with (as yet) unknown directors to be entered. Every CASTS entry must relate to a MAIN film entry. Every ACTORS entry should appear in some CASTS entry, but not vice versa. See DOC for more type information.
</P>
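<P>
The three dependencies above can be checked mechanically once the files have been parsed. The sketch below assumes the data has already been loaded into plain Python structures. The function name check_dependencies, the argument shapes, and the sample values are hypothetical and chosen only for illustration.
</P>

```python
def check_dependencies(main, people, casts, actors):
    """Report violations of the stated dependencies.

    main:   dict mapping film id to director name
    people: set of names listed in PEOPLE
    casts:  list of (film id, actor name) pairs
    actors: set of names listed in ACTORS
    """
    cast_actors = {actor for _, actor in casts}
    return {
        # Every MAIN film must have a director in PEOPLE.
        "films_without_director_in_people": sorted(
            film for film, director in main.items() if director not in people
        ),
        # Every CASTS entry must relate to a MAIN film entry.
        "casts_without_film_in_main": sorted(
            {film for film, _ in casts if film not in main}
        ),
        # Every ACTORS entry should appear in some CASTS entry.
        "actors_never_cast": sorted(actors - cast_actors),
    }


# Invented sample data, not taken from the data set.
main = {"F1": "Director X", "F2": "Director Y"}
people = {"Director X"}
casts = [("F1", "Actor A"), ("F3", "Actor B")]
actors = {"Actor A", "Actor C"}

report = check_dependencies(main, people, casts, actors)
print(report)
```

<P>
Note that Actor B, who appears in CASTS but not in ACTORS, is not reported: the dependency runs from ACTORS to CASTS only, not the other way.
</P>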
<!------------------------------------------------------------------------
<! Other Relevant Information
<!----------------------------------------------------------------------->
<H2> Other Relevant Information</H2>
Films are listed under their original-language titles, where known. An Alt(T: ) field provides English translations, where known.
<!------------------------------------------------------------------------
<! Data Format
<!----------------------------------------------------------------------->
<H2>Data Format</H2>
The current files are in HTML, to allow easy parsing to other
formats. An XML version is being considered.
<PRE>
The approximate file sizes are:
DOC .......    50K
MAIN ...... 1 145K   11 400 entries
PEOPLE ....   355K    3 290 entries
CASTS ..... 4 340K   46 000 entries
ACTORS ....   811K    6 800 entries
REMAKES ...   135K    1 278 entries
STUDIOS ...    26K      200 entries
</PRE>
<!------------------------------------------------------------------------
<! Acknowledgements
<!----------------------------------------------------------------------->
<H2> Acknowledgements, Copyright Information, and Availability</H2>
<P>
Copyright held by Gio Wiederhold, 1990-1999. This data may not be used for commercial resale.
</P>
<P>
Please acknowledge the source when used: Gio Wiederhold, Stanford University.
</P>
<!------------------------------------------------------------------------
<! References
<!----------------------------------------------------------------------->
<H2>References and Further Information</H2>
<p>
<A HREF="http://www-db.stanford.edu/pub/movies/doc.html"> Master Documentation</A>.
</p>
<!------------------------------------------------------------------------
<! Signature
<!----------------------------------------------------------------------->
<p>
<hr>
<ADDRESS>
<A href="http://kdd.ics.uci.edu/">The UCI KDD Archive</A><br>
<a href="http://www.ics.uci.edu/">Information and Computer Science</a><br>
<a href="http://www.uci.edu/">University of California, Irvine</a><br>
Irvine, CA 92697-3425 <br>
</ADDRESS>
Last modified: July 12, 1999 </BODY>
</HTML>