network issue for jcmq.run #49
Is this because of the need for networking?
Hi @shengxindaniu, From what I can see, yes, this is very likely caused by the network connection. For alternative promoters, the junction sequence is usually too far away from the encoding gene, so SNAF queries the UCSC Genome Browser by coordinate to retrieve the sequence, and that requires an internet connection. Normally an institutional HPC or Linux server can be configured for network access, with the UCSC Genome Browser on the whitelist. You can also refer to this answer (#33 (comment)) to check whether you can retrieve the junction sequence for a particular junction. Best,
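A quick way to check whether a node can reach UCSC is to request the same kind of URL that appears in the traceback of this issue. The sketch below is only an illustration: `das_url` and `can_reach_ucsc` are hypothetical helper names, not part of SNAF, and the URL template and default coordinates are copied from the error message in this thread.

```python
import urllib.request


def das_url(chr_, start, end):
    # Same URL template that SNAF's retrieveSeqFromUCSCapi uses in this thread.
    return 'http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment={0}:{1},{2}'.format(chr_, start, end)


def can_reach_ucsc(chr_='chr7', start=72969600, end=72969699, timeout=10):
    """Return True if the UCSC DAS endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(das_url(chr_, start, end), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, DNS failures and socket timeouts are all OSError subclasses.
        return False
```

Calling `can_reach_ucsc()` from a Python session on the compute node returns `False` when the request is blocked, which is the situation the `Name or service not known` error describes.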
Thank you very much for your answer, but our hospital's HPC prohibits the use of networks. Is there any way to skip the networking step?
Hi @shengxindaniu, OK, that's going to be an issue. My first suggestion would be to confirm that this is really the case, because there are usually ways of getting through the firewall and using the network, even if only temporarily. If it turns out to be an absolute no, here is what I can think of:

[1] In your counts file, manually remove all junctions that have "U" in them, either with code or in Excel. Because of this limitation we simply won't consider this type of splicing junction, but all the others should be able to run.

[2] Manually change the code (https://github.com/frankligy/SNAF/blob/main/snaf/snaf.py#L1305-L1316). You can find your downloaded code in your conda_env_name/lib/python3.7/site-packages/snaf

# original
def retrieveSeqFromUCSCapi(chr_,start,end):
    url = 'http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment={0}:{1},{2}'.format(chr_,start,end)
    response = requests.get(url)
    status_code = response.status_code
    assert status_code == 200
    try:
        my_dict = xmltodict.parse(response.content)
    except:
        exon_seq = '#' * 10 # indicating the UCSC doesn't work
        return exon_seq
    exon_seq = my_dict['DASDNA']['SEQUENCE']['DNA']['#text'].replace('\n','').upper()
    return exon_seq

# change to
# HG38_SEQ can be downloaded from here (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz), and this is the path to where you gunzip it
from Bio.SeqIO.FastaIO import SimpleFastaParser

HG38_SEQ = '/path/to/hg38.fa' # placeholder, point this at your gunzipped file

def retrieveSeqFromUCSCapi(chr_,start,end):
    dict_fa = {}
    with open(HG38_SEQ,'r') as in_handle:
        for title,seq in SimpleFastaParser(in_handle):
            dict_fa[title] = seq
    # coordinates are 1-based inclusive; upper() matches the original function, since hg38.fa is soft-masked (lowercase repeats)
    dna = dict_fa[chr_][start-1:end].upper()
    return dna

[3] As a last resort, I can help you run it and send you the result. In that case, I just need your counts file and HLA information sent to my email ([email protected]).

Best,
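For option [1], the filtering can be scripted rather than done by hand. This is a minimal sketch under stated assumptions: the counts file is taken to be tab-delimited with a header row and the junction ID in the first column, and "U" junctions are recognized by a subexon token such as `U0.1` (the name that appears in the `KeyError` of this issue). The function name `drop_u_junctions` and the regular expression are illustrative and not part of SNAF.

```python
import csv
import re

# Assumed pattern: a subexon token like "U0.1" that starts at a word boundary,
# so "chrUn" or an Ensembl ID would not match.
UTR_SUBEXON = re.compile(r'\bU\d+\.\d+')


def drop_u_junctions(in_path, out_path, delimiter='\t'):
    """Copy the counts file, skipping rows whose junction ID has a U subexon.

    Returns a (kept, dropped) tuple of row counts, header excluded.
    """
    kept = dropped = 0
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator='\n')
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            if row and UTR_SUBEXON.search(row[0]):
                dropped += 1
                continue
            writer.writerow(row)
            kept += 1
    return kept, dropped
```

Writing to a new file keeps the original counts file intact, so the removed junctions can be restored if network access becomes available later.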
Thank you very much for your help. After choosing the second option, the error with jcmq.run has been resolved. But snaf.JunctionCountMatrixQuery.generate_results still reports an error when writing the output:

snaf.JunctionCountMatrixQuery.generate_results(path='/public/home/shutao/Neo/SNAF/result/after_prediction.p',outdir='/public/home/shutao/Neo/SNAF/result')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
(base) [shutao@login01 result]$ ls -lh

Also, my output contains only a few of these files. Could you please take a look at why? Is it because pymc is not installed that the file frequency_stage3_verbosity1_uid_gene_symbol_coord_mean_mle.txt was not written?
Ah right, that's the second place where an internet connection is needed. Since you have managed to modify the code, I believe the following should be easy for you. Code block (https://github.com/frankligy/SNAF/blob/main/snaf/downstream.py#L913-L933):

def ensemblgene_to_symbol(query,species):
    '''
    Examples::
        from sctriangulate.preprocessing import GeneConvert
        converted_list = GeneConvert.ensemblgene_to_symbol(['ENSG00000010404','ENSG00000010505'],species='human')
    '''
    # assume query is a list, will also return a list
    import mygene
    mg = mygene.MyGeneInfo()
    out = mg.querymany(query,scopes='ensemblgene',fileds='symbol',species=species,returnall=True,as_dataframe=True,df_index=True)
    df = out['out']
    df_unique = df.loc[~df.index.duplicated(),:]
    df_unique['symbol'].fillna('unknown_gene',inplace=True)
    mapping = df_unique['symbol'].to_dict()
    result = []
    for item in query:
        result.append(mapping[item])
    return result

Change it to the version below. I attached the mapping file it reads (ensg2symbol.txt).

def ensemblgene_to_symbol(query,species):
    # species is unused here but kept so existing callers still work
    mapping = {}
    with open('ensg2symbol.txt','r') as f:
        for line in f:
            # two whitespace-separated columns: versioned ENSG id, gene symbol
            ensg_v, gs = line.rstrip('\n').split()
            ensg = ensg_v.split('.')[0] # drop the version suffix
            mapping[ensg] = gs
    result = []
    for ensg in query:
        result.append(mapping.get(ensg,'unknown_gene'))
    return result

I literally just wrote it on the fly and haven't tested it, but the logic is very simple: it just maps the ENSG ID to the symbol. And don't worry about your results for now, the pickle file means your

Hoping it works on your end, and sorry for the inconvenience,
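Since the offline function above was untested, it can be tried on a tiny mapping file before patching the installed package. The sketch below is a self-contained variant under stated assumptions: the mapping file has two whitespace-separated columns (versioned Ensembl gene ID, then symbol), the function name `ensemblgene_to_symbol_offline` and its `mapping_path` parameter are hypothetical, and `GENE_A` is a placeholder symbol rather than a real annotation.

```python
import os
import tempfile


def ensemblgene_to_symbol_offline(query, mapping_path):
    """Map Ensembl gene IDs to symbols from a local two-column file."""
    mapping = {}
    with open(mapping_path, 'r') as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Strip the version suffix, e.g. "ENSG00000010404.18" -> "ENSG00000010404".
            mapping[fields[0].split('.')[0]] = fields[1]
    return [mapping.get(ensg, 'unknown_gene') for ensg in query]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'ensg2symbol.txt')
    with open(path, 'w') as f:
        f.write('ENSG00000010404.18\tGENE_A\n')  # placeholder symbol
    symbols = ensemblgene_to_symbol_offline(['ENSG00000010404', 'ENSG00000010505'], path)
    print(symbols)  # ['GENE_A', 'unknown_gene']
```

The second ID is absent from the file, so it falls back to `unknown_gene`, mirroring the `fillna('unknown_gene')` behaviour of the online version.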
Thank you very much for your help. All issues have been successfully resolved. Thanks again.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1363, in subexon_tran
attrs = dict_exonCoords[EnsID][subexon]
KeyError: 'U0.1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 175, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 723, in urlopen
chunked=chunked,
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 244, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1036, in _send_output
self.send(msg)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 976, in send
self.connect()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 187, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x2b40a8228890>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 803, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/util/retry.py", line 594, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='genome.ucsc.edu', port=80): Max retries exceeded with url: /cgi-bin/das/hg38/dna?segment=chr7:72969600,72969699 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b40a8228890>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 360, in each_chunk_func
nj.retrieve_junction_seq()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1025, in retrieve_junction_seq
seq1 = subexon_tran(subexon1,ensid,'site1',code)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1366, in subexon_tran
exon_seq = utrJunction(suffix,EnsID,strandUTR,chrUTR,flag)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1266, in utrJunction
exon_seq = retrieveSeqFromUCSCapi(chr_,int(otherSite),int(site))
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1282, in retrieveSeqFromUCSCapi
response = requests.get(url)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='genome.ucsc.edu', port=80): Max retries exceeded with url: /cgi-bin/das/hg38/dna?segment=chr7:72969600,72969699 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b40a8228890>: Failed to establish a new connection: [Errno -2] Name or service not known'))
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 1, in
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 429, in run
self.parallelize_run(kind=1,strict=strict)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 488, in parallelize_run
result = collect.get()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /cgi-bin/das/hg38/dna?segment=chr7:72969600,72969699 (Caused by None)