What can we learn by applying network and text analysis to the law? This project contains code to analyze legal text and citation networks using data generously provided by CourtListener and the Supreme Court Database.
Some interesting networks include:
- Supreme Court citation network (27,885 nodes, 234,312 directed edges)
- Federal Appellate circuit (959,985 nodes, 6,649,916 directed edges)
- any one of the over 400 jurisdiction subnetworks listed on CourtListener
These all have accompanying opinion text files as well as additional node metadata, such as the case date and hand-coded issue area (for SCOTUS).
We recently gave a presentation about our exploratory analysis at the PyData conference.
You can load the SCOTUS subnetwork (saved in this directory as a .graphml file) with igraph:
import igraph as ig
G = ig.Graph.Read_GraphML('scotus_network.graphml')
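Once the graph is loaded, a quick sanity check is to compare the node and edge counts against the numbers listed above and to look at the most-cited cases. The following is a minimal sketch, not part of the project code. The helper name `summarize_citations` is hypothetical, and it assumes edges point from the citing case to the cited case, so that in-degree counts how often a case is cited. If the edges run the other way, use `outdegree()` instead.

```python
def summarize_citations(G, top_n=5):
    """Return (node count, edge count, indices of the top_n highest in-degree vertices).

    G is expected to be an igraph.Graph; only vcount(), ecount() and
    indegree() are used.
    """
    indegree = G.indegree()  # one in-degree value per vertex, in vertex order
    # Rank vertex indices from highest to lowest in-degree.
    ranked = sorted(range(G.vcount()), key=lambda v: indegree[v], reverse=True)
    return G.vcount(), G.ecount(), ranked[:top_n]

# Usage, assuming scotus_network.graphml is in the working directory:
#   import igraph as ig
#   G = ig.Graph.Read_GraphML('scotus_network.graphml')
#   n_nodes, n_edges, most_cited = summarize_citations(G)
```

The returned indices are igraph vertex indices, not CourtListener ids. The vertex attributes of the loaded graph (`G.vs.attributes()`) show which attribute holds the case id.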
User beware: we have not yet made the code clean/robust/user friendly/pleasant/etc. We will get to this soon. If you have trouble with something, please reach out to Iain ([email protected]).
To download much more data, see download_data.ipynb. This notebook allows you to work with other jurisdiction subnetworks and the opinion text files. Note the two directories you have to change at the top of the notebook.
One of the functions in download_data.ipynb will set up a data directory. I suggest putting data_dir outside your copy of the GitHub repo or Dropbox. GitHub doesn't like large data files, and Dropbox might slow things down if you do a lot of reading and writing (e.g. for some NLP operations).
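To guard against accidentally placing data_dir inside the repo or a Dropbox folder, a small check along these lines can be run before downloading anything. This is only a sketch under stated assumptions: the helper names `is_inside` and `check_data_dir` and the example paths are hypothetical and do not come from download_data.ipynb.

```python
import os

def is_inside(path, parent):
    """Return True if `path` is `parent` itself or lies somewhere beneath it."""
    rel = os.path.relpath(os.path.realpath(path), os.path.realpath(parent))
    # A relative path that climbs out of `parent` starts with '..'.
    return rel != os.pardir and not rel.startswith(os.pardir + os.sep)

def check_data_dir(data_dir, synced_dirs):
    """Return the synced directories (e.g. repo clone, Dropbox) that contain data_dir."""
    return [d for d in synced_dirs if is_inside(data_dir, d)]

# Usage with made-up paths; an empty list means data_dir is safely outside both:
#   check_data_dir('/home/me/law-data', ['/home/me/repo', '/home/me/Dropbox'])
```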
Currently we are using data from CourtListener (CL) and the Supreme Court Database (SCDB):
- the citation network comes from CL
- opinion texts come from CL
- some case metadata (jurisdiction, date, judges) comes from CL
- additional case metadata comes from SCDB
  - for issueArea we have coded Missing as 0. Only SCOTUS cases can have issueArea.
  - for
- we identify cases by their CourtListener opinion id
  - CL opinion ids and cluster ids are not necessarily the same. One cluster can have many opinions.
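Two of the notes above are easy to trip over in analysis code: an issueArea of 0 means Missing rather than a real category, and one cluster id can map to several opinion ids. The sketch below illustrates both. The function names and the example ids are hypothetical, and it assumes the issueArea values have already been pulled out of the node metadata as a plain list of numbers.

```python
from collections import Counter, defaultdict

MISSING_ISSUE_AREA = 0  # per the note above, Missing is coded as 0

def count_issue_areas(issue_areas):
    """Tally issueArea codes, skipping cases coded as Missing (0)."""
    counts = Counter()
    for code in issue_areas:
        code = int(code)  # numeric attributes may come back as floats
        if code != MISSING_ISSUE_AREA:
            counts[code] += 1
    return counts

def opinions_by_cluster(pairs):
    """Group (opinion_id, cluster_id) pairs into {cluster_id: [opinion_id, ...]}."""
    clusters = defaultdict(list)
    for opinion_id, cluster_id in pairs:
        clusters[cluster_id].append(opinion_id)
    return dict(clusters)

# Made-up ids: cluster 7 holds two opinions, cluster 8 holds one.
#   opinions_by_cluster([(101, 7), (102, 7), (103, 8)])
```

Because cases are identified by opinion id, joining against a source keyed by cluster id needs an explicit mapping like the one above rather than an assumption that the two ids match.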
The code is written in Python 2.7. You need
-
- after installing nltk, run the following commands in Python:
import nltk
nltk.download()
If you are interested in collaborating, feel free to reach out to us! This is a collaboration between