Text getting split into multiple entries #89

Open
atulkakrana opened this issue Mar 30, 2023 · 0 comments


I got this error while running train.sh:

2023-03-30 23:04:06 | INFO | fairseq.data.data_utils | loaded 35 examples from: ../../data/BC5CDR/relis-bin/train.x-y.x
2023-03-30 23:04:06 | INFO | fairseq.data.data_utils | loaded 18 examples from: ../../data/BC5CDR/relis-bin/train.x-y.y
Traceback (most recent call last):
  File "/opt/conda/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/opt/conda/lib/python3.8/site-packages/fairseq_cli/train.py", line 557, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/opt/conda/lib/python3.8/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/fairseq_cli/train.py", line 164, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 272, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/trainer.py", line 695, in get_train_iterator
    batch_iterator = self.task.get_batch_iterator(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 286, in get_batch_iterator
    indices = dataset.ordered_indices()
  File "/home/jovyan/github/BioGPT/src/language_model_prompt_dataset.py", line 197, in ordered_indices
    return indices[np.argsort(self.sizes[indices], kind="mergesort")]
IndexError: index 19 is out of bounds for axis 0 with size 18

I noticed that the number of examples loaded from train.x-y.x differs from the number loaded from train.x-y.y (35 vs. 18, see the log above). I checked these files and found that the text in train.x-y.x is being split into multiple separate entries. Here is an example:

Text in my train.json:

title: "miR-338-3p Plays a Significant Role in Casticin-Induced Suppression of Acute Myeloid Leukemia via Targeting PI3K/Akt Pathway."
abstract: "OBJECTIVE: Casticin is generally used in traditional herbal medicine for its anti-inflammatory and anticarcinogenic pharmacological properties. Also, microRNAs are indispensable oncogenes or cancer suppressors being dysregulated in various diseases. In this study, we aimed to elucidate the mechanisms underlying effects of casticin on the progression of acute myeloid leukemia (AML). METHODS: CCK-8 and flow cytometry were utilized to measure the proliferation and apoptosis of AML cell lines, respectively, after treatment with different concentrations of casticin. The alteration of several microRNA expressions in response to casticin treatment was detected by performing qRT-PCR, and the activity of PI3K/Akt pathways was evaluated through immunoblotting. Afterwards, the potential target gene of miR-338-3p was investigated by dual-luciferase reporter assay. In order to evaluate the role of miR-338-3p in the casticin-induced cellular phenotype changes, AML cells were transfected with miR-338-3p mimics or inhibitor and then subjected to proliferation and apoptosis analysis. Finally, a mouse xenograft model system was employed to investigate the role of casticin in AML progression in vivo. RESULTS: Suppressed cellular proliferation and enhanced apoptosis were observed in HL-60 and THP-1 cells after exposure to casticin, accompanied by remarkable upregulation of the miR-338-3p expression as well as a decline in the phosphorylation of PI3K and Akt proteins. RUNX2 was identified as a direct target molecular of miR-338-3p, which might account for the findings that miR-338-3p knockdown enhanced the PI3K/Akt pathway activity, whereas the miR-338-3p overexpression inactivated this signaling pathway. In addition, the inhibition of the miR-338-3p expression attenuated severe cell apoptosis and suppressions of PI3K/Akt pathway induced by casticin. 
Furthermore, casticin treatment retarded tumor growth rate in mouse models, whilst elevating miR-338 expression and repressing the activity of PI3K/Akt pathway in vivo. However, miR-338-3p depletion could also abolish the phenotypic alterations caused by casticin treatment. CONCLUSION: Casticin promotes AML cell apoptosis but inhibits AML cell proliferation in vitro and tumor growth in vivo by upregulating miR-338-3p, which targets RUNX2 and thereafter inactivates PI3K-Akt signaling pathway. Our results provide insights into the mechanisms underlying the action of casticin in the control of AML progression."

The resulting text is split into multiple parts exactly at the capitalized headings (METHODS, RESULTS, CONCLUSION):

mir-338-3p plays a significant role in casticin-induced suppression of acute myeloid leukemia via targeting pi3k/akt pathway. objective: casticin is generally used in traditional herbal medicine for its anti-inflammatory and anticarcinogenic pharmacological properties. also, micrornas are indispensable oncogenes or cancer suppressors being dysregulated in various diseases. in this study, we aimed to elucidate the mechanisms underlying effects of casticin on the progression of acute myeloid leukemia (aml).

methods: cck-8 and flow cytometry were utilized to measure the proliferation and apoptosis of aml cell lines, respectively, after treatment with different concentrations of casticin. the alteration of several microrna expressions in response to casticin treatment was detected by performing qrt-pcr, and the activity of pi3k/akt pathways was evaluated through immunoblotting. afterwards, the potential target gene of mir-338-3p was investigated by dual-luciferase reporter assay. in order to evaluate the role of mir-338-3p in the casticin-induced cellular phenotype changes, aml cells were transfected with mir-338-3p mimics or inhibitor and then subjected to proliferation and apoptosis analysis. finally, a mouse xenograft model system was employed to investigate the role of casticin in aml progression in vivo.

results: suppressed cellular proliferation and enhanced apoptosis were observed in hl-60 and thp-1 cells after exposure to casticin, accompanied by remarkable upregulation of the mir-338-3p expression as well as a decline in the phosphorylation of pi3k and akt proteins. runx2 was identified as a direct target molecular of mir-338-3p, which might account for the findings that mir-338-3p knockdown enhanced the pi3k/akt pathway activity, whereas the mir-338-3p overexpression inactivated this signaling pathway. in addition, the inhibition of the mir-338-3p expression attenuated severe cell apoptosis and suppressions of pi3k/akt pathway induced by casticin. furthermore, casticin treatment retarded tumor growth rate in mouse models, whilst elevating mir-338 expression and repressing the activity of pi3k/akt pathway in vivo. however, mir-338-3p depletion could also abolish the phenotypic alterations caused by casticin treatment.

conclusion: casticin promotes aml cell apoptosis but inhibits aml cell proliferation in vitro and tumor growth in vivo by upregulating mir-338-3p, which targets runx2 and thereafter inactivates pi3k-akt signaling pathway. our results provide insights into the mechanisms underlying the action of casticin in the control of aml progression.
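A mismatch like the 35 vs. 18 above can probably be caught before training by counting entries in the plain-text source and target files. This is only a minimal sketch: `count_lines` and `compare_parallel_files` are hypothetical helper names, and it assumes the preprocessing step writes one example per line to a pair of UTF-8 text files.

```python
def count_lines(path):
    """Count newline-delimited entries in a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def compare_parallel_files(source_path, target_path):
    """Return (source_count, target_count, counts_match) for a file pair.

    Both paths are supplied by the caller; nothing here is specific to
    the BioGPT scripts.
    """
    n_source = count_lines(source_path)
    n_target = count_lines(target_path)
    return n_source, n_target, n_source == n_target
```

If the two counts differ, at least one example in the longer file has been broken across several lines.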

Is there any way I can fix this, or stop the splitting?
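One possible workaround, sketched under an unconfirmed assumption: if the split comes from line-break characters embedded in the abstract (a plain newline, or a less visible separator such as U+2028), collapsing all whitespace before the text is written out should keep each example on a single line. The names `flatten_text` and `build_source_line` are hypothetical and are not part of the BioGPT scripts.

```python
def flatten_text(text):
    """Collapse every whitespace run, including embedded line breaks, to one space.

    str.split() with no arguments splits on any whitespace character, which
    covers "\n", "\r", and Unicode separators such as U+2028 and U+0085.
    """
    return " ".join(text.split())


def build_source_line(title, abstract):
    """Join title and abstract into a single-line entry (hypothetical helper)."""
    return flatten_text(f"{title} {abstract}")
```

Applying `flatten_text` to the `title` and `abstract` fields of train.json before preprocessing would rule embedded line breaks in or out as the cause. If the entries still split afterwards, the break is being introduced somewhere else in the pipeline.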
