Skip to content

Other_analysis_methods

JF edited this page Oct 20, 2021 · 21 revisions

Data Analysis: Getting Started Tutorial

Contents


The goal of this tutorial is to introduce basic aspects of working with neuropixels data and to point to some useful existing code, to help you get started quickly. I hope that you can find the particular figures produced here useful, but also that you can look at how they are implemented if you want to make something similar yourself.

This brief tutorial specifically covers Neuropixels Phase3A data in Matlab, and in some cases specifically for files recorded with SpikeGLX and/or processed with Kilosort. But it will hopefully be useful also as a guide for other situations.

The code here is also contained in the exampleScript.m file of the spikes repository. You will also need the npy-matlab repository to load kilosort/phy files.

For questions, and especially if you find that any of these functions are missing dependencies, please email me - nick[dot]steinmetz[at]gmail. Thanks to Andy Peters, Mush Okun, Carl Schoonover, and Andrew Fink for contributions to the shared code that I've highlighted here! Further contributions are always welcome.

For Neuropixels analysis code in Python, this resource might be helpful: https://github.com/m-beau

Loading kilosort/phy data easily

Kilosort output files can be loaded directly:

spikeTimes = readNPY('spike_times.npy');

But more convenient will be to load many of the relevant files quickly with one command:

myKsDir = 'C:\...\data\myKilosortOutputDirectory';
sp = loadKSdir(myKsDir)

sp.st are spike times in seconds, and sp.clu are cluster identities.

sp.cgs are the "cluster groups", i.e. the labels that you gave to the clusters during manual sorting in phy (1=MUA, 2=Good, 3=Unsorted). Spikes from clusters labeled "noise" have already been omitted. sp.cids tells you which cluster corresponds to each entry of sp.cgs, e.g. sp.cgs(sp.cids==943) will give you the cluster group for cluster 943. See this phy/kilosort documentation for further description of the variables.

Analyzing Drift

Because the Neuropixels probes sample densely of a long continuous span, movement of the brain relative to the electrode can be well detected and quantified. Methods to correct for it are under development.

To observe whether there was drift in your recording, a useful type of plot is the “driftmap”. Make sure to zoom in on the y-axis to look carefully. A drift of 20µm can move a neuron off your site, but the probe is 4mm long.

[spikeTimes, spikeAmps, spikeDepths, spikeSites] = ksDriftmap(myKsDir);
figure; plotDriftmap(spikeTimes, spikeAmps, spikeDepths);

Note that the channel map file used here has channel 1 at y-position=0, and since channel 1 is the site nearest the tip of the probe, this plot goes from the tip of the probe at the bottom to the most superficial part at the top.

Quantification of spiking amplitudes

To see where on the probe spikes of different amplitudes were recorded, we can plot a colormap of the distribution of spikes across depth and amplitude.

depthBins = 0:40:3840;
ampBins = 0:30:min(max(spikeAmps),800);
recordingDur = sp.st(end);

[pdfs, cdfs] = computeWFampsOverDepth(spikeAmps, spikeDepths, ampBins, depthBins, recordingDur);
plotWFampCDFs(pdfs, cdfs, ampBins, depthBins);

Basic LFP characterization

Here we compute some power spectra from the recorded local field potentials and plot them across depth. The method that computes them ("lfpBandPower") can be a useful starting point for seeing how to load LFP data.

lfpD = dir(fullfile(myKsDir, '*.lf.bin')); % LFP file from spikeGLX specifically
lfpFilename = fullfile(myKsDir, lfpD(1).name);

lfpFs = 2500;  % neuropixels phase3a
nChansInFile = 385;  % neuropixels phase3a, from spikeGLX

[lfpByChannel, allPowerEst, F, allPowerVar] = ...
    lfpBandPower(lfpFilename, lfpFs, nChansInFile, []);

chanMap = readNPY(fullfile(myKsDir, 'channel_map.npy'));
nC = length(chanMap);

allPowerEst = allPowerEst(:,chanMap+1)'; % now nChans x nFreq

% plot LFP power
dispRange = [0 100]; % Hz
marginalChans = [10:50:nC];
freqBands = {[1.5 4], [4 10], [10 30], [30 80], [80 200]};

plotLFPpower(F, allPowerEst, dispRange, marginalChans, freqBands);

Computing some useful properties of the spikes and templates

One function is particularly useful for some standard computations you might want to do, like the position on the probe of each template, and the waveform duration of each template.

[spikeAmps, spikeDepths, templateYpos, tempAmps, tempsUnW, tempDur, tempPeakWF] = ...
    templatePositionsAmplitudes(sp.temps, sp.winv, sp.ycoords, sp.spikeTemplates, sp.tempScalingAmps);

Loading synchronization data

Neuropixels Phase3a probes have 16 digital inputs recorded alongside the data from the probe. In spikeGLX these are encoded as the 16 bits in the last (385th) row of the data file. This is true of both the LFP and AP band files, but it's much quicker to load the information from the LFP file (because it is smaller) and the temporal resolution should be sufficient for most applications (2.5kHz). So to load and parse them:

syncChanIndex = 385;
syncDat = extractSyncChannel(myKsDir, nChansInFile, syncChanIndex);

eventTimes = spikeGLXdigitalParse(syncDat, lfpFs);

% - eventTimes{1} contains the sync events from digital channel 1, as three cells: 
% - eventTimes{1}{1} is the times of all events
% - eventTimes{1}{2} is the times the digital bit went from off to on
% - eventTimes{1}{2} is the times the digital bit went from on to off

To synchronize the event times recorded here with those recorded on another system, like a NI board or the FPGA of another probe, it's important to not just find an offset but also a slope. I find that the clocks on the probes drift by up to 50msec/hour relative to each other. A linear regression between the times in the two systems will do this, and to make it simple you can use these functions:

[~,b] = makeCorrection(syncTimesProbe1, syncTimesProbe2, false);
correctedSpikeTimes = applyCorrection(spikeTimesProbe2, b);

Looking at PSTHs aligned to some event

If you now have a vector of relevant event times, called eventTimes (but not the cell array as above, just a vector):

window = [-0.3 1]; % look at spike times from 0.3 sec before each event to 1 sec after

% if your events come in different types, like different orientations of a
% visual stimulus, then you can provide those values as "trial groups",
% which will be used to construct a tuning curve. Here we just give a
% vector of all ones. 
trialGroups = ones(size(eventTimes)); 

psthViewer(sp.st, sp.clu, eventTimes, window, trialGroups);

This gives a figure like below, and you can browse through all the neurons with the left/right arrow keys.

You can also compute the PSTH for spikes at all depths on the probe (in this case, independent of which neuron they came from).

depthBinSize = 80; % in units of the channel coordinates, in this case µm
timeBinSize = 0.01; % seconds
bslWin = [-0.2 -0.05]; % window in which to compute "baseline" rates for normalization
psthType = 'norm'; % show the normalized version
eventName = 'stimulus onset'; % for figure labeling

[timeBins, depthBins, allP, normVals] = psthByDepth(spikeTimes, spikeDepths, ...
    depthBinSize, timeBinSize, eventTimes, window, bslWin);

figure;
plotPSTHbyDepth(timeBins, depthBins, allP, eventName, psthType);

Loading raw waveforms (or spike-triggered LFP, e.g.)

To get the true waveforms of the spikes (not just kilosort's template shapes), use the getWaveForms function. This function can also be used to load clips of any raw binary file at specified times (such as for computing a spike-triggered LFP).

gwfparams.dataDir = myKsDir;    % KiloSort/Phy output folder
apD = dir(fullfile(myKsDir, '*ap*.bin')); % AP band file from spikeGLX specifically
gwfparams.fileName = apD(1).name;         % .dat file containing the raw 
gwfparams.dataType = 'int16';            % Data type of .dat file (this should be BP filtered)
gwfparams.nCh = 385;                      % Number of channels that were streamed to disk in .dat file
gwfparams.wfWin = [-40 41];              % Number of samples before and after spiketime to include in waveform
gwfparams.nWf = 2000;                    % Number of waveforms per unit to pull out
gwfparams.spikeTimes = ceil(sp.st(sp.clu==155)*30000); % Vector of cluster spike times (in samples) same length as .spikeClusters
gwfparams.spikeClusters = sp.clu(sp.clu==155);

wf = getWaveForms(gwfparams);

figure; 
imagesc(squeeze(wf.waveFormsMean))
set(gca, 'YDir', 'normal'); xlabel('time (samples)'); ylabel('channel number'); 
colormap(colormap_BlueWhiteRed); caxis([-1 1]*max(abs(caxis()))/2); box off;