About the read pairing. #657

DontCareYaYa · 2024-11-15T22:18:41Z

A great tool for Hi-C data analysis.

The alignment rates of my raw R1 and R2 reads are both around 65%, which I know is not ideal. However, in the read pairing step (the mergeSAM.py script), the BAM obtained by merging R1_hg38.bwt2merged.bam and R2_hg38.bwt2merged.bam contains too many Pairs_with_singleton: about 55% according to the statistics file. I checked the pairing and naming of my files and found no problem. The Hi-C library was built with the Arima kit. Is the restriction enzyme information used in the read pairing step? I think not. Do you know what could cause this? Thank you!
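To make the question concrete, here is a minimal sketch of pairing mates by alignment status alone. It is a simplified model, not the code of mergeSAM.py: the function name `classify_pair`, its arguments, and the toy input are hypothetical, and the category labels are borrowed from the statistics file. The point is only that, in a scheme like this, a pair is counted as a singleton as soon as exactly one mate is unmapped, with no restriction enzyme information involved.

```python
from collections import Counter

def classify_pair(r1_mapped, r2_mapped, r1_mapq=0, r2_mapq=0, min_mapq=10):
    """Classify one read pair from the two mates' alignment status only.

    Hypothetical helper: a simplified model of by-name pairing,
    not a copy of HiC-Pro's mergeSAM.py.
    """
    if not r1_mapped and not r2_mapped:
        return "Unmapped_pairs"
    if r1_mapped != r2_mapped:
        # Exactly one mate aligned: the pair is a singleton.
        return "Pairs_with_singleton"
    if r1_mapq < min_mapq or r2_mapq < min_mapq:
        return "Low_qual_pairs"
    return "Reported_pairs"

# Toy input: (R1 mapped, R2 mapped, R1 MAPQ, R2 MAPQ)
pairs = [
    (True, True, 42, 40),
    (True, False, 42, 0),
    (False, False, 0, 0),
    (True, True, 42, 3),
]
counts = Counter(classify_pair(*p) for p in pairs)
for category, n in sorted(counts.items()):
    print(category, n)
```

In this model the singleton count depends only on how often one mate aligns and the other does not, so it is driven by the per-read alignment rate and not by the digestion settings.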
Below are my configuration file and the statistics for bwt2pairs.bam.

Please change the variable settings below if necessary

#########################################################################

Paths and Settings - Do not edit !

#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################

SYSTEM AND SCHEDULER - Start Editing Here !!

#######################################################################
N_CPU = 50
SORT_RAM = 30000M
LOGFILE = hicpro.log

JOB_NAME =
JOB_MEM =
JOB_WALLTIME =
JOB_QUEUE =
JOB_MAIL =

#########################################################################

Data

#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################

Alignment options

#######################################################################

MIN_MAPQ = 10

BOWTIE2_IDX_PATH = /mnt/d/Reference/hg38/bowtie2_index/
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################

Annotation files

#######################################################################

REFERENCE_GENOME = hg38
GENOME_SIZE = /mnt/d/Softwares/HiC-Pro_3.1.0/annotation/chrom_hg38.sizes

#######################################################################

Allele specific analysis

#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################

Capture Hi-C analysis

#######################################################################

CAPTURE_TARGET =
REPORT_CAPTURE_REPORTER = 1

#######################################################################

Digestion Hi-C

#######################################################################

GENOME_FRAGMENT = /mnt/d/Softwares/HiC-Pro_3.1.0/annotation/ArimaKit_redfrag_hg38.bed
LIGATION_SITE = GATCGATC,GANTGATC,GANTANTC,GATCANTC
MIN_FRAG_SIZE =
MAX_FRAG_SIZE =
MIN_INSERT_SIZE =
MAX_INSERT_SIZE =

#######################################################################

Hi-C processing

#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 0
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################

Contact Maps

#######################################################################

BIN_SIZE = 5000 10000 20000 25000 40000 100000 500000
MATRIX_FORMAT = upper

#######################################################################

Normalization

#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1

########### read information
Total_pairs_processed 661159587 100.0
Unmapped_pairs 56087180 8.483
Low_qual_pairs 235828621 35.669
Unique_paired_alignments 312027 0.047
Multiple_pairs_alignments 0 0.0
Pairs_with_singleton 368931759 55.801
Low_qual_singleton 0 0.0
Unique_singleton_alignments 0 0.0
Multiple_singleton_alignments 0 0.0
Reported_pairs 312027 0.047
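As a consistency check on the numbers above, the following sketch parses the statistics block and derives the per-read alignment rate implied by the pair categories. It assumes that Low_qual_pairs, Unique_paired_alignments and Multiple_pairs_alignments all count pairs in which both mates aligned, and that Pairs_with_singleton counts pairs with exactly one aligned mate; the helper name `parse_stats` is illustrative.

```python
STATS = """\
Total_pairs_processed 661159587 100.0
Unmapped_pairs 56087180 8.483
Low_qual_pairs 235828621 35.669
Unique_paired_alignments 312027 0.047
Multiple_pairs_alignments 0 0.0
Pairs_with_singleton 368931759 55.801
"""

def parse_stats(text):
    """Return {category: count} from 'name count percent' lines."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not line.startswith("#"):
            counts[fields[0]] = int(fields[1])
    return counts

counts = parse_stats(STATS)
total = counts["Total_pairs_processed"]

# Assumption: these three categories are pairs with both mates aligned.
both_mapped = (counts["Low_qual_pairs"]
               + counts["Unique_paired_alignments"]
               + counts["Multiple_pairs_alignments"])
singleton = counts["Pairs_with_singleton"]
unmapped = counts["Unmapped_pairs"]

# The categories should partition the total number of pairs.
partition_ok = (both_mapped + singleton + unmapped == total)

# Each both-mapped pair contributes two aligned reads, each singleton one.
per_read_rate = (2 * both_mapped + singleton) / (2 * total)

print(f"categories sum to total: {partition_ok}")
print(f"implied per-read alignment rate: {per_read_rate:.3%}")
```

With these counts the categories add up exactly to the total, and the implied per-read alignment rate is about 63.6%, close to the roughly 65% observed for R1 and R2 separately.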
