Prevent duplicated sections
ferblape committed Nov 18, 2024
1 parent df4e95b commit d4d7b24
Showing 2 changed files with 3 additions and 3 deletions.
4 changes: 2 additions & 2 deletions lib/section_extractor/document_parser.rb
@@ -25,7 +25,7 @@ def extract_sections(content, tocs)

0.upto(toc.toc_items.size - 1) do |index|
section = Section.new(content, toc.toc_items[index], toc.toc_items[index + 1])
- sections << section
+ sections << section unless sections.find{ |s| s.raw_title == section.raw_title && s.positions&.first == section.positions&.first }
# TODO: review
# Skip empty sections, because they are not real sections, but just sentences that start with
# toc item title format
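The duplicate check added in this hunk treats two sections as the same when they share `raw_title` and the first element of `positions`. Here is a minimal Ruby sketch of that rule, assuming `positions` is an array (or nil) whose first element marks where the section starts; `FakeSection`, `append_unless_duplicate` and `dedup_sections` are hypothetical names, not part of this repository. It also suggests that `Array#uniq` with a block, which keeps the first element for each key, should give the same result in one pass for string titles and integer positions, instead of rescanning `sections` for every new item.

```ruby
# Hypothetical stand-in for Section: only the attributes the duplicate check reads.
FakeSection = Struct.new(:raw_title, :positions)

# Same rule as the commit: skip a section whose raw_title and first position
# match one that is already collected (any? used here instead of find).
def append_unless_duplicate(sections, section)
  duplicate = sections.any? do |s|
    s.raw_title == section.raw_title && s.positions&.first == section.positions&.first
  end
  sections << section unless duplicate
  sections
end

# One-pass alternative: uniq keeps the first section seen for each key.
def dedup_sections(sections)
  sections.uniq { |s| [s.raw_title, s.positions&.first] }
end

candidates = [
  FakeSection.new("1. Introduction", [0, 120]),
  FakeSection.new("1. Introduction", [0, 120]),   # repeated TOC match
  FakeSection.new("1. Introduction", [900, 950]), # same title, different start
  FakeSection.new("2. Scope", [121, 300])
]

kept = candidates.each_with_object([]) { |s, acc| append_unless_duplicate(acc, s) }
puts kept.size                          # prints 3
puts dedup_sections(candidates) == kept # prints true
```

Note that a same-titled section starting elsewhere is still kept, so genuinely repeated headings survive the check.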
@@ -39,7 +39,7 @@ def extract_sections(content, tocs)
puts "- Skipping #{toc_items_to_skip.join(", ")} empty sections" if toc_items_to_skip.any?
toc_items_to_skip.each { |index| toc.toc_items.delete_at(index) }
end
- sections
+ sections.sort_by{ |s| s.positions.first }
end

def extract_tocs(content)
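The second hunk now returns `sections.sort_by{ |s| s.positions.first }` instead of the sections in collection order. The sketch below, again using a hypothetical `FakeSection` struct, shows that ordering. The committed sort calls `s.positions.first` without the `&.` used in the duplicate check, so it appears to assume every kept section has a non-empty `positions` array; `sort_by_first_position` is a hypothetical nil-tolerant variant that moves such sections to the end rather than raising.

```ruby
# Hypothetical stand-in for Section with only the fields used here.
FakeSection = Struct.new(:raw_title, :positions)

sections = [
  FakeSection.new("2. Scope", [121, 300]),
  FakeSection.new("1. Introduction", [0, 120]),
  FakeSection.new("3. Budget", [301, 500])
]

# Ordering as in the commit: by the first recorded position.
ordered = sections.sort_by { |s| s.positions.first }
puts ordered.map(&:raw_title).inspect
# prints ["1. Introduction", "2. Scope", "3. Budget"]

# Hypothetical nil-tolerant variant: a nil or empty positions array can make
# the committed block raise (NoMethodError or a failed nil comparison), so
# such sections are sorted last here instead.
def sort_by_first_position(sections)
  sections.sort_by { |s| s.positions&.first || Float::INFINITY }
end

with_missing = sections + [FakeSection.new("Annex", nil)]
puts sort_by_first_position(with_missing).map(&:raw_title).last # prints Annex
```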
2 changes: 1 addition & 1 deletion lib/section_extractor/section.rb
@@ -22,7 +22,7 @@ def initialize(document_content, toc_item, next_toc_item)
def inspect
# Restore
# "#<Section title: #{@raw_title}, content: #{@content.slice(0, 50)}>"
- "#<Section title: #{@raw_title}>"
+ "#<Section title: #{@raw_title} positions: #{@positions}>"
end

def full_content
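The `section.rb` change makes `Section#inspect` print `@positions` next to the title, which helps tell apart same-titled sections like the ones the duplicate check compares. A minimal sketch reproducing only that format string follows; `SectionSketch` is a hypothetical class, not the repository's `Section`.

```ruby
# Hypothetical minimal class: only the ivars the committed #inspect uses.
class SectionSketch
  def initialize(raw_title, positions)
    @raw_title = raw_title
    @positions = positions
  end

  # Same format string as the committed Section#inspect.
  def inspect
    "#<Section title: #{@raw_title} positions: #{@positions}>"
  end
end

p SectionSketch.new("1. Introduction", [0, 120])
# prints #<Section title: 1. Introduction positions: [0, 120]>
```

Because `@positions` is interpolated via `to_s`, an array prints as `[0, 120]` and a nil value prints as nothing after `positions:`.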
