pcap_stats woefully under-reports ifdropped packets on Linux #1328
Comments
That page lists several different receive errors.
Some others, even though they say "See the network driver for the exact meaning of this value.", are a bit more specific, such as:
So that'd count
That sounds like packets dropped for some other reason, whether it's one of the above or a reason for which there is no statistic.
Yes, I think you're right about the vagueness there. Would a patch adding one or more of the above be welcomed? I'd like to do a bit more investigation on this to figure out what might be going on with various popular drivers before coming up with such a patch.
Adding one or more Linux-specific Rx error counters to the libpcap API would not solve the problem reliably, because at least some Linux network interfaces do not correctly report at least some counters, depending on:
Moreover, after you pinpoint a particular counter bug and want it fixed, the responsible party may decline to fix it on the grounds of backward compatibility, or not respond at all. If you would prefer to focus on your project instead, let me recommend picking a particular combination of network hardware and driver that experiences the fewest problems with the counters, and using that for the debugging.
Right now, our assumption is that while drivers may report stats to different places, none are double-counting stats into multiple counters. If so, adding any missing counters into libpcap's calculations may not solve the problem 100%, but it can only bring the results returned by libpcap closer to the truth. But yes, we need to do more investigation to confirm that that is the case.
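To make that assumption concrete, here is a minimal sketch of how one might compare a narrow and a wider counter set over the same capture interval. The function names and the snapshot format (a dict of counter name to value) are hypothetical and not part of libpcap, and the numbers in the usage note are invented for illustration only.

```python
def counter_deltas(before, after):
    """Per-counter increase between two snapshots (dicts of name -> value).

    A counter absent from `before` is treated as having started at 0.
    """
    return {name: after[name] - before.get(name, 0) for name in after}


def drops_over_interval(before, after, names):
    """Sum the increases of the selected counters over the interval.

    This is only meaningful if no packet is counted in more than one of
    the selected counters (the assumption stated above).
    """
    deltas = counter_deltas(before, after)
    return sum(deltas.get(name, 0) for name in names)
```

For example, with invented snapshots `before = {"rx_missed_errors": 10, "rx_fifo_errors": 0, "rx_dropped": 50}` and `after = {"rx_missed_errors": 12, "rx_fifo_errors": 0, "rx_dropped": 950}`, the narrow set `("rx_missed_errors", "rx_fifo_errors")` yields 2 while the set that also includes `rx_dropped` yields 902. Because the counters only ever increase, the wider total can never be smaller than the narrow one; whether it is closer to the truth depends on the no-double-counting assumption holding for the driver in question.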
We performed a set of tests in laboratory conditions and we have confirmed that adding
We had previously concluded, erroneously, that some packets were still unaccounted for after this fix. The site we tested it on had an unusually large average packet size (due to jumbo frames and unusual traffic patterns), so we were simply getting better-than-normal performance. I shall prepare a PR for your consideration.
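The effect of average packet size on the packet rate a capture host has to sustain can be sketched with a little arithmetic. This is an illustrative calculation, not a measurement from the tests above: the 10 Gbps link speed and the frame sizes are assumptions, and the 20 bytes of per-frame overhead are the standard Ethernet preamble, start-of-frame delimiter, and inter-frame gap.

```python
# On-wire bytes per frame that are not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(bits_per_second, frame_bytes,
                       overhead_bytes=ETHERNET_OVERHEAD_BYTES):
    """Packet rate at a given bit rate for fixed-size Ethernet frames.

    frame_bytes is the full frame size including header and FCS.
    """
    return bits_per_second / (8 * (frame_bytes + overhead_bytes))
```

At an assumed 10 Gbps, minimum-size 64-byte frames work out to roughly 14.88 million packets per second, while 9018-byte jumbo frames work out to on the order of 138 thousand. Since per-packet cost dominates capture overhead, the same bit rate is far easier to keep up with when packets are large, which is consistent with the better-than-normal performance described above.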
Okay, so we started testing on a wider array of NICs and we ran into some weird stuff. The NIC
It would be great if any Linux net guys could comment? @davem330 :) ?
Many drivers do not report anything into `rx_missed_errors` or `rx_fifo_errors`; see vmxnet for example.
At least one issue appears to be that linux_if_drops() does not report the value of `rx_dropped`, which indicates the number of packets not forwarded to the upper layers for processing.
The impact of this problem is that if I do a controlled test sending fixed-size packets at line rate, I know a system can only handle 2-3 Gbps of traffic on a single core. But when I start capturing with libpcap and look at the drop rates, they stay at zero (or unbelievably low), even when the traffic is in excess of 15 Gbps...
Although, when we added the `rx_dropped` stats in, we still saw the problem on `vmxnet3`, so maybe that's a red herring, or maybe there are multiple issues here.
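For anyone who wants to check what a given driver actually reports, here is a minimal sketch that reads the three counters named in this issue from the Linux sysfs statistics directory and sums them. It is not libpcap's code: the function names are hypothetical, and treating a missing or unreadable counter as 0 is a choice made for this sketch. The `/sys/class/net/<iface>/statistics/` location is the standard sysfs path for per-interface counters.

```python
from pathlib import Path

# Counters named in this issue; which ones a driver fills in varies.
DROP_COUNTERS = ("rx_missed_errors", "rx_fifo_errors", "rx_dropped")


def stats_dir_for(iface):
    """Standard Linux sysfs location for an interface's statistics."""
    return Path("/sys/class/net") / iface / "statistics"


def read_counter(stats_dir, name):
    """Return the counter value, or 0 if the file is missing or unreadable.

    Treating a missing counter as 0 is an assumption of this sketch.
    """
    try:
        return int((Path(stats_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return 0


def sum_drop_counters(stats_dir, names=DROP_COUNTERS):
    """Sum the selected receive-drop counters found under stats_dir."""
    return sum(read_counter(stats_dir, name) for name in names)
```

Calling `sum_drop_counters(stats_dir_for("eth0"))` before and after a capture run (with `eth0` standing in for the real interface name) and comparing the difference against the known send rate would show whether the driver accounts for the missing packets in any of these counters at all.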