
FluentBit is unable to recover from too many accumulated buffers with filesystem storage type & multiline filter #9794

Open
mohannara opened this issue Jan 3, 2025 · 0 comments


Bug Report

Describe the bug

Fluent Bit sends data to S3 and works fine for a while, then it gets stuck and stops sending data to S3.
A large number of chunks stay pending in the location below:
sh-4.2$ pwd
/var/fluent-bit/state/flb-storage/emitter.9
sh-4.2$ ls -ltr|wc
1104
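
For reference, the pending chunk count can also be confirmed with the commands below. The HTTP endpoint is available because HTTP_Server and storage.metrics are enabled in the configuration further down; port 2020 is the Fluent Bit default and stands in for ${HTTP_PORT} here.

    # Storage-layer metrics (total chunks, chunks up/down per input) exposed by
    # the built-in HTTP server when storage.metrics is On.
    curl -s http://127.0.0.1:2020/api/v1/storage

    # Raw count of queued chunk files in the multiline emitter's filesystem buffer.
    ls /var/fluent-bit/state/flb-storage/emitter.9 | wc -l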

To Reproduce

  • Example log messages:
    [2024/12/18 12:33:43] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1734160287.631424024.flb
    [2024/12/18 12:33:43] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1734160331.698451330.flb
    [2024/12/18 12:33:43] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1734160364.698626403.flb
    [2024/12/18 12:33:43] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1734160430.908663992.flb
    [2024/12/18 12:33:43] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1734160462.846875919.flb
    [2024/12/18 12:33:43] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1734160468.165929808.flb

Expected behavior

  • the multiline filter should keep working when its emitter uses filesystem storage
  • all log records should be processed and delivered to S3

Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     5
        Grace                     30
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               ${HTTP_SERVER}
        HTTP_Listen               0.0.0.0
        HTTP_Port                 ${HTTP_PORT}
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.metrics           On
        storage.backlog.mem_limit 5M

        #storage.max_chunks_up     1024
        

    @INCLUDE application-log.conf
    @INCLUDE dataplane-log.conf
    @INCLUDE host-log.conf

  application-log.conf: |
    [INPUT]
        Name                tail
        Tag                 application.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/*.log
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}

        #storage.pause_on_chunks_overlimit   true
        

    [INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/fluent-bit*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_log.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}

    [INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/cloudwatch-agent*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_cwagent.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}

    [FILTER]
        Name                  multiline
        match                 application.*
        multiline.key_content log
        multiline.parser      java_multiline
        emitter_storage.type  filesystem
        # emitter_mem_buf_limit 100MB
        # flush_ms              2000

        #buffer                off



    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Use_Kubelet         On
        Kubelet_Port        10250
        Buffer_Size         0


    [OUTPUT]
        Name s3
        Match application.*
        bucket dte-crm-subsystem-prod-stdout
        region ${AWS_REGION}
        store_dir /var/log/s3-dte
        total_file_size 5MB
        upload_timeout 1m
        retry_limit 3
        preserve_data_ordering On

        #store_dir_limit_size 300MB

 
  parsers.conf: |
    [PARSER]
        Name                syslog
        Format              regex
        Regex               ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key            time
        Time_Format         %b %d %H:%M:%S

    [PARSER]
        Name                container_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name                cwagent_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

    [MULTILINE_PARSER]
        Name          java_multiline
        type          regex
        flush_timeout 1000
        
        # Regex rules for multiline parsing
        # ---------------------------------
        
        # configuration hints:
        
        #  - first state always has the name: start_state
        #  - every field in the rule must be inside double quotes
        
        # rules |   state name  | regex pattern                  | next state
        # ------|---------------|--------------------------------------------
        rule      "start_state"   "(^\[?\d{4}-\d{1,2}-\d{1,2}[T\s]\d{1,2}:\d{1,2}:\d{1,2}.*)$"  "cont"
        rule      "cont"          "/^(?!(\[?\d{4}-\d{1,2}-\d{1,2}).*)/"                     "cont"
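
For illustration only, the limit-related options that appear commented out in the configuration above look like this when enabled. The values are placeholders copied from those comments, not recommended settings, and this is not a confirmed workaround.

    [SERVICE]
        storage.max_chunks_up                1024

    [INPUT]
        Name                                 tail
        storage.type                         filesystem
        storage.pause_on_chunks_overlimit    true

    [FILTER]
        Name                                 multiline
        emitter_storage.type                 filesystem
        emitter_mem_buf_limit                100MB
        flush_ms                             2000

    [OUTPUT]
        Name                                 s3
        store_dir_limit_size                 300MB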

Your Environment

  • Fluent Bit is running as a Daemonset. Image ID -> public.ecr.aws/aws-observability/aws-for-fluent-bit:2.32.2.20241008
  • Version used: version=1.9.10, commit=eba89f4660
  • Environment name and version: EKS 1.30
  • Server type and version: EC2 -> c5a.8xlarge
  • Operating System and version: Amazon Linux 2
  • Filters and plugins: filter -> multiline, output plugin -> s3

Additional context
Some of the logs are missing from the S3 bucket, and Fluent Bit memory usage is high.
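
As a quick check for the memory growth on the DaemonSet pods (assuming metrics-server is installed and the pods carry the same k8s-app=fluent-bit label as the ConfigMap above):

    kubectl -n amazon-cloudwatch top pod -l k8s-app=fluent-bit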
