Skip to content

This is a mini-project which analysis financial documents available on the sec website and returns the polarity, complex words, positive score, negative score and several other attributes.

Notifications You must be signed in to change notification settings

aryantandon01/Sentimental-Analysis-for-financial-texts

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 

Repository files navigation

Sentimental-Analysis-for-financial-texts

This is a mini-project which analysis financial documents available on the sec website and returns the polarity, complex words, positive score, negative score and several other attributes in an excel sheet
Make sure to download the list of stop words, positive words , negative words/


Objective Objective of this assignment is to extract some sections (which are mentioned below) from SEC / EDGAR financial reports and perform text analysis to compute variables those are explained below. Link to SEC / EDGAR financial reports are given in excel spreadsheet “cik_list.xlsx”
Please add https://www.sec.gov/Archives/ to every cells of column F (cik_list.xlsx) to access link to the financial report.
Example: Row 2, column F contains edgar/data/3662/0000950170-98-000413.txt
Add https://www.sec.gov/Archives/ to form financial report link i.e.
https://www.sec.gov/Archives/edgar/data/3662/0000950170-98-000413.txt

Variables: “Text Analysis.docx” you need to compute following:

  1. Section 1.1: Positive score, negative score, polarity score
  2. Section 2: Average Sentence Length, percentage of complex words, fog index
  3. Section 4: Complex word count
  4. Section 5: Word count

In addition to these eight variables, compute two more items: “uncertainty” and “constraining”. These variables are calculated similar to the ones in Section 1.1 or Section 4. Attached the lists of words that are classified as uncertain or constraining.

For uncertainty: “uncertainty_dictionary.xlsx” For constraining: “constraining_dictionary.xlsx”

That means you need to collect/compute 10 variables in total.

Sections: For each report (financial reports, links available in excel, cik list), we would like these 10 variables calculated for three sections. These are “Management's Discussion and Analysis”, “Quantitative and Qualitative Disclosures about Market Risk”, and “Risk Factors”. If a report does not include any of these sections, leave those fields blank.

In other words, we need 10 x 3 = 30 variables. Attached the spreadsheet “cik_list.xlsx”, which also contains the links to reports. It would be ideal if you could add 30 columns to each row, so that we would have the # rows unchanged after your data collection.

Additional Variables: positive/negative and uncertainty/constraining word proportion The absolute values of “Positive/Negative Scores” are equal to the number of positive/negative words in each section of 10-Q/K; so the (Loughran-McDonald) positive/negative word proportion can be simply calculated as “Positive/Negative Scores divided by Word Count – compute these measure in addition to Polarity Score. And, the “uncertainty score” and “constraining score” will be also just equal to the number of corresponding words and you can calculate the portion of these words as same as above.

Additional Variable: Constraining words for whole report Add one variable to the mix, which will be calculated only once for the whole report (i.e., not three times). It’s the number of “constraining” words over the whole report rather than in any specific section.

Output Data Structure Notations: “Management's Discussion and Analysis”: MDA “Quantitative and Qualitative Disclosures about Market Risk”: QQDMR “Risk Factors”: RF

Output Variables:

  • All input variables in “cik_list.xlsx”
  • mda_positive_score
  • mda_negative_score
  • mda_polarity_score
  • mda_average_sentence_length
  • mda_percentage_of_complex_words
  • mda_fog_index
  • mda_complex_word_count
  • mda_word_count
  • mda_uncertainty_score
  • mda_constraining_score
  • mda_positive_word_proportion
  • mda_negative_word_proportion
  • mda_uncertainty_word_proportion
  • mda_constraining_word_proportion
  • qqdmr_positive_score
  • qqdmr_negative_score
  • qqdmr_polarity_score
  • qqdmr_average_sentence_length
  • qqdmr_percentage_of_complex_words
  • qqdmr_fog_index
  • qqdmr_complex_word_count
  • qqdmr_word_count
  • qqdmr_uncertainty_score
  • qqdmr_constraining_score
  • qqdmr_positive_word_proportion
  • qqdmr_negative_word_proportion
  • qqdmr_uncertainty_word_proportion
  • qqdmr_constraining_word_proportion
  • rf_positive_score
  • rf_negative_score
  • rf_polarity_score
  • rf_average_sentence_length
  • rf_percentage_of_complex_words
  • rf_fog_index
  • rf_complex_word_count
  • rf_word_count
  • rf_uncertainty_score
  • rf_constraining_score
  • rf_positive_word_proportion
  • rf_negative_word_proportion
  • rf_uncertainty_word_proportion
  • rf_constraining_word_proportion
  • constraining_words_whole_report

Note this code needs to be rewritten

About

This is a mini-project which analysis financial documents available on the sec website and returns the polarity, complex words, positive score, negative score and several other attributes.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%