network issue for jcmq.run #49

Open
shengxindaniu opened this issue Sep 25, 2024 · 8 comments

@shengxindaniu

jcmq.run(hlas=hlas, outdir='/public/home/shutao/Neo/SNAF/result')
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1327, in subexon_tran
attrs = dict_exonCoords[EnsID][subexon] # [chr,strand,start,end,suffer]
KeyError: 'U0.1_72969699'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1363, in subexon_tran
attrs = dict_exonCoords[EnsID][subexon]
KeyError: 'U0.1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 175, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 723, in urlopen
chunked=chunked,
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 244, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 1036, in _send_output
self.send(msg)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/http/client.py", line 976, in send
self.connect()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 187, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x2b40a8228890>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 803, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/util/retry.py", line 594, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='genome.ucsc.edu', port=80): Max retries exceeded with url: /cgi-bin/das/hg38/dna?segment=chr7:72969600,72969699 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b40a8228890>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 360, in each_chunk_func
nj.retrieve_junction_seq()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1025, in retrieve_junction_seq
seq1 = subexon_tran(subexon1,ensid,'site1',code)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1366, in subexon_tran
exon_seq = utrJunction(suffix,EnsID,strandUTR,chrUTR,flag)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1266, in utrJunction
exon_seq = retrieveSeqFromUCSCapi(chr_,int(otherSite),int(site))
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1282, in retrieveSeqFromUCSCapi
response = requests.get(url)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='genome.ucsc.edu', port=80): Max retries exceeded with url: /cgi-bin/das/hg38/dna?segment=chr7:72969600,72969699 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b40a8228890>: Failed to establish a new connection: [Errno -2] Name or service not known'))
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "", line 1, in
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 429, in run
self.parallelize_run(kind=1,strict=strict)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 488, in parallelize_run
result = collect.get()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /cgi-bin/das/hg38/dna?segment=chr7:72969600,72969699 (Caused by None)

@shengxindaniu
Author

Is this because this step needs network access?

@frankligy
Owner

Hi @shengxindaniu,

From what I can see, yes, I strongly suspect this is caused by the network connection. For alternative promoters, the junction sequence is usually too far from the encoding gene, so SNAF queries the UCSC Genome Browser by coordinate to retrieve the sequence, which requires an internet connection. Normally an institutional HPC or Linux server can be configured for network access, and the UCSC Genome Browser should be on the whitelist.

You can further refer to this answer (#33 (comment)) to check if you can get the junction sequence for a particular junction.
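As a quick first check before digging further, something like the sketch below could confirm whether the compute node can even resolve the hosts involved. The host name comes from the traceback above; the helper `can_resolve` is hypothetical and not part of SNAF. It only tests DNS resolution, which is the step that fails with `[Errno -2] Name or service not known`, so a True result does not prove the firewall allows HTTP traffic.

```python
import socket


def can_resolve(host, port=80, resolver=socket.getaddrinfo):
    """Return True if DNS resolution of `host` succeeds, False otherwise.

    `resolver` is injectable so the function can be exercised without a network;
    by default it is the same socket.getaddrinfo call that fails in the traceback.
    """
    try:
        resolver(host, port)
        return True
    except OSError:
        # socket.gaierror ("Name or service not known") is a subclass of OSError
        return False
```

Calling `can_resolve("genome.ucsc.edu")` on the node where SNAF runs should return False in the situation shown in the traceback.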

Best,
Frank

@shengxindaniu
Author

Thank you very much for your answer, but our hospital's HPC prohibits network access. Is there any way to skip the networking step?

@frankligy
Owner

frankligy commented Sep 26, 2024

Hi @shengxindaniu,

OK, that's going to be an issue. My first suggestion is to confirm that this is really the case, because there are usually ways to get through the firewall and use the network, even just temporarily.

If it turns out to be an absolute no, here is what I can think of:

[1] In your counts file, manually remove all junctions that have "U" in them; you can do this either with code or in Excel. Basically, because of this limitation, this type of splicing junction is not considered, but all the others should be able to run.

[2] Manually change the code (https://github.com/frankligy/SNAF/blob/main/snaf/snaf.py#L1305-L1316). You can find your downloaded copy in conda_env_name/lib/python3.7/site-packages/snaf

# original
def retrieveSeqFromUCSCapi(chr_,start,end):
    url = 'http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment={0}:{1},{2}'.format(chr_,start,end)
    response = requests.get(url)
    status_code = response.status_code
    assert status_code == 200
    try:
        my_dict = xmltodict.parse(response.content)
    except:
        exon_seq = '#' * 10  # indicating the UCSC doesn't work
        return exon_seq
    exon_seq = my_dict['DASDNA']['SEQUENCE']['DNA']['#text'].replace('\n','').upper()
    return exon_seq

# change to
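# HG38_SEQ is the path to the gunzipped hg38.fa, which can be downloaded from here (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz)
from Bio.SeqIO.FastaIO import SimpleFastaParser

HG38_SEQ = '/path/to/hg38.fa'  # placeholder, set this to where you gunzipped the file

def retrieveSeqFromUCSCapi(chr_,start,end):

    # read every chromosome into memory, keyed by FASTA title (e.g. 'chr7')
    # note: this re-reads the whole genome on every call; cache dict_fa at module level if it is too slow
    dict_fa = {}
    with open(HG38_SEQ,'r') as in_handle:
        for title,seq in SimpleFastaParser(in_handle):
            dict_fa[title] = seq

    # start and end are 1-based and inclusive; upper() matches the original function
    dna = dict_fa[chr_][start-1:end].upper()

    return dna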

[3] As a last resort, I can run it for you and send you the results; in that case, I just need your counts file and HLA information sent to my email ([email protected]).
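For option [1], the filtering could be sketched roughly as below. This is only an illustration under assumptions not confirmed in this thread: the counts file is taken to be tab-delimited with a header line, and the junction ID is taken to sit in the first column in a form like `ENSG...:E2.1-E3.1`. The function names and file paths are hypothetical.

```python
def drop_u_junctions(lines, sep="\t"):
    """Keep the header plus every row whose junction ID contains no 'U' subexon.

    `lines` is any iterable of text lines (an open file works).
    """
    kept = []
    for i, line in enumerate(lines):
        uid = line.split(sep, 1)[0]
        # look only at the part after the gene ID, e.g. 'U0.1_72969699-E2.1'
        subexons = uid.split(":", 1)[-1]
        if i == 0 or "U" not in subexons:
            kept.append(line)
    return kept


def filter_counts_file(src, dst):
    """Write a copy of the counts file `src` to `dst` without 'U' junctions."""
    with open(src) as fin:
        kept = drop_u_junctions(fin)
    with open(dst, "w") as fout:
        fout.writelines(kept)
```

Writing to a new file rather than overwriting keeps the original counts file available in case network access is granted later.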

Best,
Frank

@frankligy changed the title from "jcmq.run" to "network issue for jcmq.run" on Sep 26, 2024
@shengxindaniu
Author

snaf.JunctionCountMatrixQuery.generate_results(path='/public/home/shutao/Neo/SNAF/result/after_prediction.p',outdir='/public/home/shutao/Neo/SNAF/result')
adding gene symbol
Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 175, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 723, in urlopen
chunked=chunked,
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 404, in _make_request
self._validate_conn(conn)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1061, in _validate_conn
conn.connect()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 363, in connect
self.sock = conn = self._new_conn()
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connection.py", line 187, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x2ae749825c90>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/connectionpool.py", line 803, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/urllib3/util/retry.py", line 594, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='mygene.info', port=443): Max retries exceeded with url: /v3/query/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2ae749825c90>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 456, in generate_results
enhance_frequency_table(df,True,True,outdir,'frequency_stage{}_verbosity1_uid_gene_symbol_coord_mean_mle.txt'.format(stage))
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py", line 1395, in enhance_frequency_table
df = add_gene_symbol_frequency_table(df=df,remove_quote=remove_quote)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/downstream.py", line 893, in add_gene_symbol_frequency_table
symbol_list = ensemblgene_to_symbol(ensg_list,'human')
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/snaf/downstream.py", line 921, in ensemblgene_to_symbol
out = mg.querymany(query,scopes='ensemblgene',fileds='symbol',species=species,returnall=True,as_dataframe=True,df_index=True)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py", line 575, in _querymany
for hits in self._repeated_query(query_fn, qterms, verbose=verbose):
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py", line 255, in _repeated_query
from_cache, query_result = query_fn(batch, **fn_kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py", line 573, in query_fn
return self._querymany_inner(qterms, verbose=verbose, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py", line 519, in _querymany_inner
return self._post(_url, params=_kwargs, verbose=verbose)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py", line 197, in _post
res = requests.post(url, data=params, headers=headers)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/public/home/shutao/.conda/envs/SNAF/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='mygene.info', port=443): Max retries exceeded with url: /v3/query/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2ae749825c90>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Thank you very much for your help. After choosing the second option, the error with jcmq.run has been resolved. But snaf.JunctionCountMatrixQuery.generate_results still reports an error when writing the output.

@shengxindaniu
Author

(base) [shutao@login01 result]$ ls -lh
total 67M
-rw-rw-r-- 1 shutao shutao 45M Sep 27 16:56 after_prediction.p
-rw-rw-r-- 1 shutao shutao 5.0M Sep 27 16:58 burden_stage0.txt
-rw-rw-r-- 1 shutao shutao 8.1M Sep 27 16:59 burden_stage3.txt
-rw-rw-r-- 1 shutao shutao 1.9M Sep 27 16:58 frequency_stage0.txt
-rw-rw-r-- 1 shutao shutao 321K Sep 27 16:59 frequency_stage3.txt
-rw-rw-r-- 1 shutao shutao 524K Sep 27 16:59 frequency_stage3_verbosity1_uid.txt
-rw-rw-r-- 1 shutao shutao 6.2M Sep 27 16:35 NeoJunction_statistics_maxmin.txt
-rw-rw-r-- 1 shutao shutao 129K Sep 27 16:59 x_neoantigen_frequency_stage3.pdf
-rw-rw-r-- 1 shutao shutao 8.7K Sep 27 16:59 x_occurence_frequency_stage3.pdf

Also, only these few files were output. Could you please take a look at the reason? Is it because pymc is not installed that the file frequency_stage3_verbosity1_uid_gene_symbol_coord_mean_mle.txt was not generated?

@frankligy
Owner

frankligy commented Sep 27, 2024

Ah right, that's the second place where an internet connection is needed. Since you have managed to modify the code, I believe the following should be easy for you.

code block (https://github.com/frankligy/SNAF/blob/main/snaf/downstream.py#L913-L933)

def ensemblgene_to_symbol(query,species):
    '''
    Examples::
        from sctriangulate.preprocessing import GeneConvert
        converted_list = GeneConvert.ensemblgene_to_symbol(['ENSG00000010404','ENSG00000010505'],species='human')
    '''
    # assume query is a list, will also return a list
    import mygene
    mg = mygene.MyGeneInfo()
    out = mg.querymany(query,scopes='ensemblgene',fileds='symbol',species=species,returnall=True,as_dataframe=True,df_index=True)

    df = out['out']
    df_unique = df.loc[~df.index.duplicated(),:]
    df_unique['symbol'].fillna('unknown_gene',inplace=True)
    mapping = df_unique['symbol'].to_dict()

    result = []
    for item in query:
        result.append(mapping[item])

    return result

Change it to the version below. I attached the ensg2symbol.txt here:
ensg2symbol.txt

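def ensemblgene_to_symbol(query,species):

    # build the ENSG -> gene symbol lookup from the attached file
    # (use the full path to wherever you saved ensg2symbol.txt)
    mapping = {}
    with open('ensg2symbol.txt','r') as f:
        for line in f:
            ensg_v, gs =  line.rstrip('\n').split(' ')
            ensg = ensg_v.split('.')[0]   # strip the version suffix after the dot
            mapping[ensg] = gs

    # IDs missing from the file fall back to 'unknown_gene', as in the original
    result = []
    for ensg in query:
        result.append(mapping.get(ensg,'unknown_gene'))

    return result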

I literally just wrote this on the fly and haven't tested it, but the logic is very simple: it just maps each ENSG ID to its symbol.
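Since the snippet above is untested, a slightly more defensive variant may be worth trying on a few lines before patching downstream.py. The sketch below assumes each line of the mapping file is a versioned ENSG ID followed by whitespace and a symbol; the function names and the demo content are made up for illustration and are not part of SNAF.

```python
import io


def load_ensg2symbol(handle):
    """Build {ENSG ID without version: gene symbol} from whitespace-separated lines."""
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines instead of raising
        ensg = fields[0].split(".")[0]  # drop the version suffix after the dot
        mapping[ensg] = fields[1]
    return mapping


def ensemblgene_to_symbol_offline(query, mapping):
    """Return one symbol per queried ID, 'unknown_gene' when the ID is missing."""
    return [mapping.get(ensg, "unknown_gene") for ensg in query]


# made-up demo content: one valid line, one blank line, one line without a symbol
demo = io.StringIO("ENSG00000010404.18 SYMBOL_A\n\nENSG00000010505.5\n")
demo_mapping = load_ensg2symbol(demo)
print(ensemblgene_to_symbol_offline(["ENSG00000010404", "ENSG00000010505"], demo_mapping))
```

Loading the mapping once and passing it in avoids re-reading the file on every call, and skipping malformed lines means a stray blank line does not abort the whole run.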

And don't worry about your results for now. The pickle file means your .run step finished; the generate_results step failed while it was trying to add the gene symbol column to the result, so the program exited.

Hope it works on your end, and sorry for the inconvenience,
Frank

@shengxindaniu
Author

Thank you very much for your help. All issues have been successfully resolved. Thanks again
