About the read pairing. #657

Open
DontCareYaYa opened this issue Nov 15, 2024 · 0 comments
A great tool for Hi-C data analysis.

The alignment rates of my raw R1 and R2 reads are both around 65%, which I know is not ideal. However, in the read pairing step (the mergeSAM.py script), the BAM obtained by merging R1_hg38.bwt2merged.bam and R2_hg38.bwt2merged.bam contains a very high share of Pairs_with_singleton, about 55% according to the statistics file. I checked the matching and naming of my files and found no problem. The Hi-C library was prepared with the Arima kit. Is the restriction enzyme information used in the read pairing step? I think it is not. Do you know what could cause this? Thank you!
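For reference, a check of read-name matching between the two mate files could look like the minimal sketch below. It is not part of HiC-Pro; the helper names `read_name` and `first_mismatch` are hypothetical, and it assumes well-formed four-line FASTQ records whose mates appear in the same order in both files.

```python
from itertools import islice, zip_longest


def read_name(header):
    """Return the read ID from a FASTQ header line, without a /1 or /2 mate suffix."""
    name = header.split()[0]          # drop the description after the first space
    if name.startswith("@"):
        name = name[1:]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(fastq1_lines, fastq2_lines):
    """Return the 0-based index of the first read whose names differ, or None."""
    headers1 = islice(fastq1_lines, 0, None, 4)   # every 4th line is a header
    headers2 = islice(fastq2_lines, 0, None, 4)
    for index, (h1, h2) in enumerate(zip_longest(headers1, headers2)):
        # None means one file ran out of reads before the other.
        if h1 is None or h2 is None or read_name(h1) != read_name(h2):
            return index
    return None


# Tiny in-memory example (made-up reads, not the data from this issue).
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGA", "+", "IIII"]
r2 = ["@read1/2", "GGCA", "+", "IIII", "@read2/2", "CATG", "+", "IIII"]
r2_swapped = r2[4:] + r2[:4]          # same reads, wrong order

result_ok = first_mismatch(r1, r2)
result_swapped = first_mismatch(r1, r2_swapped)
print(result_ok, result_swapped)
```

For real data, the two arguments would be open text handles, for example from `gzip.open(path, "rt")` for compressed FASTQ files.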
Below are my configuration file and the statistics for bwt2pairs.bam.

# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################
## SYSTEM AND SCHEDULER - Start Editing Here !!
#######################################################################
N_CPU = 50
SORT_RAM = 30000M
LOGFILE = hicpro.log

JOB_NAME =
JOB_MEM =
JOB_WALLTIME =
JOB_QUEUE =
JOB_MAIL =

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################
## Alignment options
#######################################################################

MIN_MAPQ = 10

BOWTIE2_IDX_PATH = /mnt/d/Reference/hg38/bowtie2_index/
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = hg38
GENOME_SIZE = /mnt/d/Softwares/HiC-Pro_3.1.0/annotation/chrom_hg38.sizes

#######################################################################
## Allele specific analysis
#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################
## Capture Hi-C analysis
#######################################################################

CAPTURE_TARGET =
REPORT_CAPTURE_REPORTER = 1

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /mnt/d/Softwares/HiC-Pro_3.1.0/annotation/ArimaKit_redfrag_hg38.bed
LIGATION_SITE = GATCGATC,GANTGATC,GANTANTC,GATCANTC
MIN_FRAG_SIZE =
MAX_FRAG_SIZE =
MIN_INSERT_SIZE =
MAX_INSERT_SIZE =

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 0
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 5000 10000 20000 25000 40000 100000 500000
MATRIX_FORMAT = upper

#######################################################################
## Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1
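
The `LIGATION_SITE` value in the config above can be cross-checked against the enzyme cut sites. The sketch below assumes the two Arima cut sites are `^GATC` and `G^ANTC` (these are not stated in the post) and that both leave 5' overhangs that are filled in before ligation. The function names are hypothetical and this is not HiC-Pro code.

```python
def filled_in_ends(site):
    """Return (left_end, right_start) after cutting at '^' and filling in the 5' overhang.

    Assumes a palindromic site cut at the same offset on both strands,
    so the left fragment keeps everything except the last `cut` bases
    and the right fragment keeps everything except the first `cut` bases.
    """
    cut = site.index("^")
    seq = site.replace("^", "")
    left_end = seq[: len(seq) - cut]
    right_start = seq[cut:]
    return left_end, right_start


def ligation_junctions(sites):
    """Combine every filled-in left end with every filled-in right start."""
    ends = [filled_in_ends(site) for site in sites]
    lefts = [left for left, _right in ends]
    rights = [right for _left, right in ends]
    return [left + right for left in lefts for right in rights]


# Assumed Arima cut sites; the configured value is copied from LIGATION_SITE.
derived = ligation_junctions(["^GATC", "G^ANTC"])
configured = "GATCGATC,GANTGATC,GANTANTC,GATCANTC".split(",")

print(sorted(derived))
print(sorted(derived) == sorted(configured))
```

Under these assumptions the four derived junctions are the same four sequences listed in `LIGATION_SITE`, only in a different order.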

########### read information
Total_pairs_processed 661159587 100.0
Unmapped_pairs 56087180 8.483
Low_qual_pairs 235828621 35.669
Unique_paired_alignments 312027 0.047
Multiple_pairs_alignments 0 0.0
Pairs_with_singleton 368931759 55.801
Low_qual_singleton 0 0.0
Unique_singleton_alignments 0 0.0
Multiple_singleton_alignments 0 0.0
Reported_pairs 312027 0.047
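
The statistics above can be sanity-checked with a few lines of arithmetic. The sketch below parses the table, confirms that the pair categories add up to `Total_pairs_processed`, and derives the mean per-mate alignment rate these counts imply. It assumes that the five pair categories are mutually exclusive, that `Pairs_with_singleton` counts pairs with exactly one aligned mate, and that low-quality, unique and multiple pairs all have both mates aligned; these are assumptions about the statistics file, not confirmed details of mergeSAM.py.

```python
STATS = """\
Total_pairs_processed 661159587 100.0
Unmapped_pairs 56087180 8.483
Low_qual_pairs 235828621 35.669
Unique_paired_alignments 312027 0.047
Multiple_pairs_alignments 0 0.0
Pairs_with_singleton 368931759 55.801
Low_qual_singleton 0 0.0
Unique_singleton_alignments 0 0.0
Multiple_singleton_alignments 0 0.0
Reported_pairs 312027 0.047
"""


def parse_pairstat(text):
    """Map each statistic name to its integer count (second column)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and not line.startswith("#"):
            counts[fields[0]] = int(fields[1])
    return counts


counts = parse_pairstat(STATS)
total = counts["Total_pairs_processed"]

# Pairs assumed to have both mates aligned.
both_aligned = (
    counts["Low_qual_pairs"]
    + counts["Unique_paired_alignments"]
    + counts["Multiple_pairs_alignments"]
)
one_aligned = counts["Pairs_with_singleton"]
categories_sum = counts["Unmapped_pairs"] + both_aligned + one_aligned

# Each fully aligned pair contributes two aligned mates, each singleton pair one.
mate_rate = (2 * both_aligned + one_aligned) / (2 * total)

print("categories sum to total:", categories_sum == total)
print("implied mean per-mate alignment rate: {:.1%}".format(mate_rate))
```

With the numbers above the categories do sum to the total, and the implied per-mate alignment rate is roughly 64%, in line with the alignment rates of around 65% reported for R1 and R2.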
