This is the data and code accompanying the paper: Jeff Alstott, Giorgio Triulzi, Bowen Yan, Jianxi Luo. (2017). "Mapping Technology Space by Normalizing Technology Relatedness Networks." Scientometrics. 110(1):443–479. Available at Scientometrics or on arXiv at arXiv:1509.07285 [physics.soc-ph]
- The data describing the technology space are available on Zenodo in output_data-technology_space.zip. The simplest data are the networks calculated with data from 1975-2010, which are recorded as CSVs. These are in three folders, one for each classification system:
- "USPC" (the United States Patent Classification System)
- "IPC" (the International Patent Classification System, at the 3-digit level)
- "IPC4" (the International Patent Classification System, at the 4-digit level)
There are several different possible measures of relatedness, and accordingly there are several different CSVs in the folder for each classification system. The paper found that after normalization these different measures of relatedness correlate more strongly with one another, and so we recommend the simplest measure: "Direct Citation" (just the count of citations from patents in one class to patents in the other).
Each CSV is just a square array (number of classes * number of classes) whose values give the strength of the relatedness between each pair of classes. Short names for each class in the IPC and IPC4 systems are included as separate text files.
- More sophisticated data is in the HDF5 file class_relatedness_networks.h5. This file contains pandas data frames with information such as:
- empirical networks' values
- randomized networks' values (mean and standard deviation)
- empirical networks' values, expressed as z-scores relative to the randomized controls
- empirical networks' values, expressed as z-scores relative to the randomized controls, but deflated to counteract the fact that z-scores grow with more patents (this is the data expressed in the simple CSVs)
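As a rough illustration of how the CSVs and the HDF5 file described above might be read with pandas, here is a minimal sketch. The helper names `load_relatedness_csv` and `list_hdf5_keys`, the toy file written in the demo, and the assumption that each CSV carries class labels in its first row and first column are ours, not taken from the repository. Check the actual files and pass `has_labels=False` if they turn out to be bare numeric arrays. Reading HDF5 with pandas additionally requires the PyTables package.

```python
import os
import tempfile

import pandas as pd


def load_relatedness_csv(path, has_labels=True):
    """Read one relatedness CSV into a square DataFrame.

    ASSUMPTION: with has_labels=True the first row and first column hold
    class labels; with has_labels=False the file is a bare numeric array.
    """
    if has_labels:
        network = pd.read_csv(path, index_col=0)
    else:
        network = pd.read_csv(path, header=None)
    if network.shape[0] != network.shape[1]:
        raise ValueError(f"expected a classes x classes array, got {network.shape}")
    return network


def list_hdf5_keys(path):
    """List the names of the objects stored in an HDF5 file written by pandas."""
    with pd.HDFStore(path, mode="r") as store:
        return store.keys()


# Demo on a tiny made-up file; replace the path with one of the unzipped CSVs.
toy = pd.DataFrame(
    [[0.0, 1.5, -0.2], [1.5, 0.0, 0.7], [-0.2, 0.7, 0.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_network.csv")
    toy.to_csv(toy_path)
    network = load_relatedness_csv(toy_path)

print(network.shape)
print(network.loc["A", "B"])
```

For the HDF5 file, calling `list_hdf5_keys("class_relatedness_networks.h5")` should show which data frames are stored, after which `pd.read_hdf(path, key)` can load any one of them. The key names are not documented here, so listing them first avoids guessing.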
This repository contains the code to perform the analyses and create the figures in the paper.
The code base is organized as a set of IPython notebooks, which are also duplicated as simple Python .py script files. The only thing you should need to touch directly is the notebook Manuscript_Code, which walks through all the steps of:
- organizing the raw empirical data
- creating technology relatedness networks from the empirical data
- creating randomized versions of the data and calculating technology relatedness networks from them
- comparing the empirical and randomized versions of the networks
- creating figures for the manuscript, the source code for which is also contained in this repository.
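To make the comparison step in the list above more concrete, here is a minimal sketch of expressing an empirical network as z-scores relative to randomized controls. Everything in it is illustrative: the toy citation list and the function names are made up, and the randomization simply shuffles the cited classes, which is a simplistic stand-in and not the randomization used in the paper. It also leaves out the deflation applied to the published CSVs.

```python
import numpy as np


def citation_counts(citing, cited, n_classes):
    """Count citations from each citing class (rows) to each cited class (columns)."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (citing, cited), 1)
    return counts


def zscores(empirical, randomized):
    """z = (empirical - mean of controls) / std of controls, entry by entry.

    Entries whose controls never vary (std == 0) are returned as NaN.
    """
    mean = randomized.mean(axis=0)
    std = randomized.std(axis=0)
    z = np.full(empirical.shape, np.nan)
    np.divide(empirical - mean, std, out=z, where=std > 0)
    return z


rng = np.random.default_rng(0)
n_classes = 3

# Toy data: the class of the citing patent and of the cited patent, one pair per citation.
citing = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
cited = np.array([1, 1, 1, 2, 0, 0, 2, 0, 1, 2])

empirical = citation_counts(citing, cited, n_classes)

# ASSUMPTION: shuffling the cited classes stands in for the real randomization.
# It keeps the number of citations made and received by each class fixed.
randomized = np.stack([
    citation_counts(citing, rng.permutation(cited), n_classes)
    for _ in range(1000)
])

z = zscores(empirical, randomized)
print(np.round(z, 2))
```

A positive entry means two classes cite each other more often than the shuffled controls would suggest, and a negative entry means less often.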
The data files we use are too large to host on GitHub (>100MB), and so are hosted as a 1.9GB ZIP file on Zenodo in input_data-technology_space.zip. Just download it and unzip it in the technologyspace folder, alongside src and manuscript. This file contains both the raw input data and several intermediate data files produced by the pipeline.
This pipeline involves creating thousands of randomized versions of the historical patent data. In order to do this, we employ a computational cluster running the PBS job scheduling system. Running this code currently assumes you have one of those. If you are lucky enough to be from the future, maybe you have a big enough machine that you can simply create and analyze thousands of randomized versions of the historical patent data using a simple for loop. We don't yet support that.
- Python 3.x
- powerlaw
- seaborn
- pyBiRewire
- the standard scientific computing Python stack, which we recommend setting up by simply using the Anaconda Python distribution. Relevant packages include:
- numpy
- scipy
- matplotlib
- pandas
- citing_cited.csv
- PATENT_US_CLASS_SUBCLASSES_1975_2011.csv
- pid_issdate_ipc.csv
- disamb_data_ipc_citations_2.csv
- pnts_multiple_ipcs_76_06_valid_ipc.csv
- patent_ipc_1976_2010.csv
- Please contact the authors if you have questions/comments/concerns/stories:
- gtriulzi at mit dot edu
- jeffalstott at gmail dot com