Sentimental-Analysis-for-financial-texts

This is a mini-project which analysis financial documents available on the sec website and returns the polarity, complex words, positive score, negative score and several other attributes in an excel sheet
Make sure to download the list of stop words, positive words , negative words/

Objective Objective of this assignment is to extract some sections (which are mentioned below) from SEC / EDGAR financial reports and perform text analysis to compute variables those are explained below. Link to SEC / EDGAR financial reports are given in excel spreadsheet “cik_list.xlsx”
Please add https://www.sec.gov/Archives/ to every cells of column F (cik_list.xlsx) to access link to the financial report.
Example: Row 2, column F contains edgar/data/3662/0000950170-98-000413.txt
Add https://www.sec.gov/Archives/ to form financial report link i.e.
https://www.sec.gov/Archives/edgar/data/3662/0000950170-98-000413.txt

Variables: “Text Analysis.docx” you need to compute following:

Section 1.1: Positive score, negative score, polarity score
Section 2: Average Sentence Length, percentage of complex words, fog index
Section 4: Complex word count
Section 5: Word count

In addition to these eight variables, compute two more items: “uncertainty” and “constraining”. These variables are calculated similar to the ones in Section 1.1 or Section 4. Attached the lists of words that are classified as uncertain or constraining.

For uncertainty: “uncertainty_dictionary.xlsx” For constraining: “constraining_dictionary.xlsx”

That means you need to collect/compute 10 variables in total.

Sections: For each report (financial reports, links available in excel, cik list), we would like these 10 variables calculated for three sections. These are “Management's Discussion and Analysis”, “Quantitative and Qualitative Disclosures about Market Risk”, and “Risk Factors”. If a report does not include any of these sections, leave those fields blank.

In other words, we need 10 x 3 = 30 variables. Attached the spreadsheet “cik_list.xlsx”, which also contains the links to reports. It would be ideal if you could add 30 columns to each row, so that we would have the # rows unchanged after your data collection.

Additional Variables: positive/negative and uncertainty/constraining word proportion The absolute values of “Positive/Negative Scores” are equal to the number of positive/negative words in each section of 10-Q/K; so the (Loughran-McDonald) positive/negative word proportion can be simply calculated as “Positive/Negative Scores divided by Word Count – compute these measure in addition to Polarity Score. And, the “uncertainty score” and “constraining score” will be also just equal to the number of corresponding words and you can calculate the portion of these words as same as above.

Additional Variable: Constraining words for whole report Add one variable to the mix, which will be calculated only once for the whole report (i.e., not three times). It’s the number of “constraining” words over the whole report rather than in any specific section.

Output Data Structure Notations: “Management's Discussion and Analysis”: MDA “Quantitative and Qualitative Disclosures about Market Risk”: QQDMR “Risk Factors”: RF

Output Variables:

All input variables in “cik_list.xlsx”
mda_positive_score
mda_negative_score
mda_polarity_score
mda_average_sentence_length
mda_percentage_of_complex_words
mda_fog_index
mda_complex_word_count
mda_word_count
mda_uncertainty_score
mda_constraining_score
mda_positive_word_proportion
mda_negative_word_proportion
mda_uncertainty_word_proportion
mda_constraining_word_proportion
qqdmr_positive_score
qqdmr_negative_score
qqdmr_polarity_score
qqdmr_average_sentence_length
qqdmr_percentage_of_complex_words
qqdmr_fog_index
qqdmr_complex_word_count
qqdmr_word_count
qqdmr_uncertainty_score
qqdmr_constraining_score
qqdmr_positive_word_proportion
qqdmr_negative_word_proportion
qqdmr_uncertainty_word_proportion
qqdmr_constraining_word_proportion
rf_positive_score
rf_negative_score
rf_polarity_score
rf_average_sentence_length
rf_percentage_of_complex_words
rf_fog_index
rf_complex_word_count
rf_word_count
rf_uncertainty_score
rf_constraining_score
rf_positive_word_proportion
rf_negative_word_proportion
rf_uncertainty_word_proportion
rf_constraining_word_proportion
constraining_words_whole_report

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md
tryscraper.py		tryscraper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentimental-Analysis-for-financial-texts

About

Releases

Packages

Languages

adit-negi/Sentimental-Analysis-for-financial-texts

Folders and files

Latest commit

History

Repository files navigation

Sentimental-Analysis-for-financial-texts

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages