DOI: 10.5281/zenodo.3569549
In June 2019 we conducted a survey of software use across 6355 academic staff and PhD students. The survey was open for two weeks and collected 603 responses.
The raw data was cleaned using Open Refine to remove email addresses (for privacy reasons), to remove invalid responses (namely those not associated with a known faculty at the University of Southampton), and to reduce the job titles provided by respondents to a set of known job titles (e.g. converting "Prof", "Professor" and "Proffessor" [sic] to "Professor"). The result of this cleaning is the file data/Cleaning-of-Uni-Soton-Software-Survey-26Jun19.csv.
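The cleaning itself was done in Open Refine, but the job-title step can be illustrated in Python. The sketch below is not the project's code: the mapping `TITLE_VARIANTS` and the function `normalise_title` are hypothetical, and only the three variants quoted above come from this README.

```python
# Hypothetical mapping from raw job titles to a known title.
# Only the "Professor" variants come from the README; real cleaning
# would need an entry for every variant seen in the responses.
TITLE_VARIANTS = {
    "prof": "Professor",
    "professor": "Professor",
    "proffessor": "Professor",
}


def normalise_title(raw_title):
    """Return the known job title for a raw response, or the stripped input if unmapped."""
    # Ignore surrounding whitespace, case and a trailing full stop ("Prof.").
    key = raw_title.strip().lower().rstrip(".")
    return TITLE_VARIANTS.get(key, raw_title.strip())


for raw in ["Prof", "Professor", "Proffessor", " prof. "]:
    print(repr(raw), "->", normalise_title(raw))
```

Titles that are not in the mapping are returned unchanged (apart from stripped whitespace), so unmapped variants remain visible for manual review.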
If you want quick access to the results, take a look at the report.
Charts of the univariate analysis can also be seen in this simple presentation.
- Licences for the code, data, reports and charts can be found in the LICENCE, DATA LICENCE and REPORT LICENCE files respectively.
- The code runs on Python 3.
- Get the files and data: Clone the git repository
- We suggest the use of a virtual environment. The file requirements.txt can be used to load the necessary libraries.
- Run the analysis script analyse_survey.py.
Note that the file column_name_renaming.py contains instructions for shortening the column names (using the full question as the column name gets tedious) and lists how the results of each question are sorted. Some questions are best sorted by the size of response; others, such as the scale questions that rank responses from 1 to 5, need their results sorted in a specific order (i.e. 1 to 5).
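To illustrate the renaming and the two kinds of sorting, here is a minimal sketch using only the standard library. It does not reproduce the contents of column_name_renaming.py: the names `short_names`, `sort_by_scale` and `summarise`, the question text and the responses are all made up for the example.

```python
from collections import Counter

# Hypothetical lookup from the full question text to a short column name.
short_names = {
    "Do you develop your own code?": "develop_own_code",
    "How would you rate your software training? (1 to 5)": "training_rating",
}

# Hypothetical list of short names whose results must keep the 1-to-5 scale order.
sort_by_scale = ["training_rating"]


def summarise(question, responses):
    """Count responses and sort them the way the question calls for."""
    name = short_names[question]
    counts = Counter(responses)
    if name in sort_by_scale:
        # Scale questions: order by the answer itself (1, 2, ..., 5).
        ordered = sorted(counts.items())
    else:
        # Other questions: largest response count first.
        ordered = counts.most_common()
    return name, ordered


print(summarise("Do you develop your own code?", ["Yes", "No", "Yes", "Yes"]))
print(summarise("How would you rate your software training? (1 to 5)", [3, 5, 3, 1, 3]))
```

The first call sorts by the size of response, while the second keeps the scale order even though 3 is the most common answer.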
Bivariate analysis is controlled by the file bivariate_instructions.py, which contains a dictionary called which_by_which. Each key is the question by which you wish to segment, and its value lists the questions of interest to be segmented. For example, if you want to investigate how the number of people who develop code varies by faculty, you would set up the dictionary in bivariate_instructions.py as follows:
which_by_which = {'faculty': ['develop_own_code']}
If you also wanted to investigate how the training question is segmented by faculty, you would use the dictionary:
which_by_which = {'faculty': ['develop_own_code', 'training']}
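As a rough picture of what segmenting one question by another means, the sketch below counts the answers to each question of interest within each value of the segmenting question. It is an assumption about the general approach, not a copy of analyse_survey.py; the function `segment_counts` and the sample rows are invented for illustration.

```python
from collections import Counter, defaultdict

which_by_which = {'faculty': ['develop_own_code', 'training']}

# Invented sample responses, one dictionary per respondent.
rows = [
    {'faculty': 'Engineering', 'develop_own_code': 'Yes', 'training': 'No'},
    {'faculty': 'Engineering', 'develop_own_code': 'No', 'training': 'No'},
    {'faculty': 'Medicine', 'develop_own_code': 'Yes', 'training': 'Yes'},
    {'faculty': 'Engineering', 'develop_own_code': 'Yes', 'training': 'Yes'},
]


def segment_counts(rows, which_by_which):
    """Count answers to each question of interest within each segment."""
    results = {}
    for segment_by, questions in which_by_which.items():
        for question in questions:
            counts = defaultdict(Counter)
            for row in rows:
                counts[row[segment_by]][row[question]] += 1
            # Convert to plain dicts so the result prints cleanly.
            results[(question, segment_by)] = {
                segment: dict(answers) for segment, answers in counts.items()
            }
    return results


for (question, segment_by), table in segment_counts(rows, which_by_which).items():
    print(f"{question} by {segment_by}: {table}")
```

Each key in which_by_which produces one table per question of interest, which matches the idea of one output file per bivariate analysis.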
The separate bivariate files (found in output_csv/bivariate) are brought together by the script combine_bivariate_results_for_graphing.py to produce the summary csvs found in output_csv/bivariate/summaries.
- analyse_survey.py: the main analysis script that converts the survey data into csvs that each summarise a question.
- column_name_renaming.py: lookup file used for shortening names of columns of data.
- bivariate_instructions.py: lookup file used to instruct analyse_survey.py on which bivariate analyses to conduct.
- combine_bivariate_results_for_graphing.py: combines the individual bivariate csv files to produce useful summaries.
- UniSotonSoftwareSurvey_June2019.pdf: a pdf file of the original survey used to collect the data
- data/Cleaning-of-Uni-Soton-Software-Survey-26Jun19.csv: an anonymised version of the survey results
- output/csv/: all output csvs are stored in this directory and the enclosed directories.
- report/Results of University of Southampton software survey June 2019.ipynb: Jupyter notebook used to write report
- report/Results of University of Southampton software survey June 2019.pdf: pdf of Jupyter notebook
- charts/: charts of all the output csvs as png images
- charts/plot_details/: csvs holding parameters used to draw charts
You can plot the csv files using any graphing program of your choice. Personally, I use a graphing program I wrote in Python to make the results look pretty. Feel free to use it too (this is easier if you use the pre-existing parameters in the csvs held in report/charts/plot_details/).