Explaining the labels_correlogram.jpg? #5138

Closed
zxq309 opened this issue Oct 12, 2021 · 60 comments
Labels
question (Further information is requested) · Stale (stale and scheduled for closing soon)

Comments

@zxq309

zxq309 commented Oct 12, 2021

❔Question

Can you explain this? I don't understand what it means. Thank you

[attached image: labels_correlogram.jpg]

Additional context

@zxq309 zxq309 added the question (Further information is requested) label Oct 12, 2021
@glenn-jocher
Member

Correlogram is a group of 2d histograms showing each axis of your data against each other axis. The labels in your image are in xywh space.
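If you want to reproduce a similar plot yourself, here is a minimal sketch (assuming seaborn and an (N, 4) array of normalized xywh labels parsed from your *.txt files; the random array and column names below are placeholders):

# Minimal sketch: build a labels correlogram from normalized xywh box labels.
import numpy as np
import pandas as pd
import seaborn as sns

labels = np.random.rand(1000, 4)  # placeholder: replace with your parsed labels
df = pd.DataFrame(labels, columns=["x", "y", "width", "height"])

# Off-diagonal panels are 2D histograms of one dimension against another;
# the diagonal shows the 1D distribution of each dimension.
g = sns.pairplot(df, corner=True, diag_kind="hist", kind="hist")
g.savefig("labels_correlogram_sketch.jpg")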

@zxq309
Author

zxq309 commented Oct 12, 2021

Correlogram is a group of 2d histograms showing each axis of your data against each other axis. The labels in your image are in xywh space.

ok

@github-actions
Contributor

github-actions bot commented Nov 12, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Access additional Ultralytics ⚡ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions github-actions bot added the Stale (stale and scheduled for closing soon) label Nov 12, 2021
@IlamSaran

What should I interpret from the label correlogram obtained for a custom dataset?

@glenn-jocher
Member

@IlamSaran the label correlogram provides insight into the relationships between different label dimensions in your custom dataset. It can help identify patterns or correlations that may be useful for understanding the distribution of object annotations in your data.

@IlamSaran

IlamSaran commented Nov 17, 2023

Thank you. Do you mean patterns or correlations of multi-scale objects, or something else? Could you please clarify?
How does the label correlogram help identify patterns or correlations that are useful for understanding the distribution of multi-class object annotations?

@glenn-jocher
Member

@IlamSaran The label correlogram can help identify patterns or correlations in the distribution of object annotations across different classes and scales. For example, it can reveal if certain classes tend to co-occur frequently in the same image or if certain classes are more likely to appear at specific scales. This information can be valuable for understanding the characteristics of your dataset and for informing decisions related to model training and evaluation.

@IlamSaran

My DL model for an object detection task achieves mAP@0.5 = 90% and mAP@0.5:0.95 = 78% on my custom dataset.
But the same model achieves 96% mAP@0.5 and only 55% mAP@0.5:0.95 on a public benchmark dataset (both datasets contain the same classes). Although mAP@0.5 is higher on the public dataset, its mAP@0.5:0.95 is comparatively much lower than on the custom dataset. Can you explain this, and can I conclude that my model works better on my custom dataset?

@glenn-jocher
Member

@IlamSaran The difference in mAP@0.5:0.95 between your custom dataset and the public benchmark dataset suggests that while your model is good at detecting objects at a specific IoU threshold (0.5), it may not be as robust across a range of IoU thresholds (0.5 to 0.95). This could be due to various factors such as differences in object scale, aspect ratios, or occlusions between the datasets.

The higher mAP@0.5:0.95 on your custom dataset does indicate that your model is better at generalizing across different levels of localization accuracy on that dataset. However, the lower mAP@0.5:0.95 on the public dataset suggests that there may be room for improvement in the model's ability to accurately localize objects across all scales and aspect ratios present in the public dataset.

In conclusion, your model seems to perform better on your custom dataset, but you should consider investigating the discrepancies on the public dataset to improve the model's robustness across various IoU thresholds.
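For reference, mAP@0.5:0.95 is simply the mean of the average precision computed at ten IoU thresholds (COCO-style evaluation):

$$\text{mAP}@0.5{:}0.95 = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \text{AP}_t$$

so a model with loosely fitting boxes loses most of its score at the higher thresholds, which explains the gap you are seeing.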

@IlamSaran

[label_correlogram image]

Can you explain the four images in the above figure of a custom dataset?

@glenn-jocher
Member

Sure! The figure shows a label correlogram of a custom dataset, broken into four sections, each representing a 2D histogram of label dimensions:

  1. Top-left: Object class correlations. It might show if certain classes are more likely to appear together.
  2. Top-right & Bottom-left: These are often mirror images displaying correlations between object dimensions (like width and height) or positions (like center x, center y) across different axes. They can highlight common sizes or aspect ratios.
  3. Bottom-right: Distributions of individual label attributes such as width, height, or even object classes. It gives a quick overview of the common dimensions or the prevalence of classes within your dataset.

This correlogram provides insights into your dataset's internal structure, which can be invaluable for tuning your model or understanding its performance.

@IlamSaran

Thank you for detailed information on label correlogram.

@glenn-jocher
Member

You're welcome! If you have any more questions or need further assistance, feel free to ask. Happy coding! 😊

@IlamSaran

IlamSaran commented Apr 22, 2024

Can you please clarify the splitting strategy for a custom dataset containing multiple object classes (70:30 or 80:20)? Should it be split randomly, or do we have to follow some structure? If it is random, how realistic will the results be?

@glenn-jocher
Member

@IlamSaran, for splitting your custom dataset with multiple classes, you can go with either a 70:30 or 80:20 train-test split based on your dataset size and diversity. A random split is commonly used and can provide realistically varied results if your dataset is sufficiently large and representative.

However, ensure roughly equal representation of each class in both training and testing sets to avoid biases. This might involve stratified sampling if your classes are unevenly distributed.

A simple way to do a random split in Python could look like this:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

Replace X and y with your image paths and labels, respectively, and adjust the test_size as needed.

Happy training! 😄

@IlamSaran

Thank you. Further, if the dataset contains images from different cameras, locations, and varying lighting conditions, will a random 70:30 split perform well?

@glenn-jocher
Member

Absolutely, a random split can still be effective for a diverse dataset with images from various cameras, locations, and lighting conditions. It ensures that both your training and validation sets contain a mix of these variations, helping your model generalize better across unseen data. Just make sure your dataset is sufficiently large and representative of all classes and conditions. If certain conditions or classes are rare, you might consider stratification to maintain balance across your splits. Happy modeling! 😊

@IlamSaran

Thank you very much for detailed information on train and test split.

@glenn-jocher
Member

@IlamSaran you're welcome! If you have any more questions as you move forward or need further clarification on anything else, don't hesitate to ask. Happy training! 😊

@IlamSaran

IlamSaran commented May 23, 2024

I annotated my image dataset using polygon annotation. The intended task is object detection with the YOLOv5 model, and I exported the annotations in YOLOv5 text format. The trained model now outputs a bounding box over each detected object. Although polygon annotations were used for the ground-truth objects, the results appear as bounding boxes. Is this correct? How are the IoU computations possible? Please clarify.

@glenn-jocher
Member

Hello! Yes, it's correct that YOLOv5 uses bounding boxes for detection, even if your original annotations were polygons. When you export your annotations in YOLO format, they are converted to bounding boxes by taking the minimum bounding rectangle that encloses the polygon.
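As a rough illustration of that conversion, here is a minimal sketch (the function name is illustrative; it assumes a polygon given as absolute-pixel (x, y) vertices and returns the normalized YOLO xywh box):

# Minimal sketch: convert a polygon annotation to a YOLO-format bounding box.
def polygon_to_yolo_bbox(polygon, img_w, img_h):
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x_min, x_max = min(xs), max(xs)  # minimum bounding rectangle
    y_min, y_max = min(ys), max(ys)
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return [x_center, y_center, width, height]

# Example: polygon_to_yolo_bbox([(10, 20), (50, 25), (40, 80)], 640, 480)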

For IoU (Intersection over Union) computations, it compares the overlap between the predicted bounding box and the ground truth bounding box. Even though the original annotations were polygons, the IoU is calculated based on their bounding box representations. This is standard practice for models like YOLOv5 that are designed to predict rectangular bounding boxes.

If you need more detailed guidance on preparing your data or understanding the output, check out the training custom data section here: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/. Happy training! 😊

@IlamSaran

Hi. This is regarding integrating YOLOv5 with the ByteTrack tracking algorithm. YOLOv5 carries out NMS post-processing to remove redundant/low-confidence detections, while ByteTrack involves splitting detections into high-confidence and low-confidence sets. How does this work together? Can you please clarify the process of integrating the YOLOv5 detection model with the ByteTrack tracking algorithm?

@glenn-jocher
Member

@IlamSaran hello! Thanks for reaching out with your question about integrating YOLOv5 with the ByteTrack tracking algorithm.

You're correct that YOLOv5 performs Non-Maximum Suppression (NMS) to filter out redundant and low-confidence detections. ByteTrack, on the other hand, splits detections into high-confidence and low-confidence categories to improve tracking performance.

Here's a brief overview of how you can integrate YOLOv5 with ByteTrack:

  1. YOLOv5 Detection: First, YOLOv5 processes the input frames and outputs bounding boxes with associated confidence scores and class labels. This includes the NMS step to remove redundant detections.

  2. ByteTrack Integration: After obtaining the YOLOv5 detections, you can feed these into ByteTrack. ByteTrack will then split the detections into high-confidence and low-confidence categories. High-confidence detections are used to update existing tracks, while low-confidence detections are used to recover tracks that might have been missed in previous frames.

Here's a simplified code example to illustrate the integration:

import torch
# The ByteTrack import/interface depends on the implementation you use;
# it is shown here as a placeholder class with an update() method.
from bytetrack import ByteTrack

# Load a YOLOv5 model via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Initialize the tracker
tracker = ByteTrack()

# Process a video frame-by-frame (video_frames and visualize are placeholders)
for frame in video_frames:
    # Perform detection with YOLOv5 (NMS is applied internally)
    results = model(frame)
    det = results.xyxy[0]  # tensor of shape (N, 6): x1, y1, x2, y2, conf, cls

    # Extract bounding boxes, confidence scores, and class labels
    bboxes = det[:, :4]
    scores = det[:, 4]
    class_ids = det[:, 5]

    # Feed detections to the tracker (adjust arguments to your ByteTrack API)
    tracked_objects = tracker.update(bboxes, scores, class_ids)

    # Visualize or process tracked objects
    visualize(frame, tracked_objects)

This is a high-level overview, and you might need to adjust the integration based on your specific requirements and the ByteTrack implementation details.

If you encounter any issues or need further assistance, please ensure you provide a minimum reproducible code example. This helps us better understand and reproduce the issue. You can find more details on creating a minimum reproducible example here: https://docs.ultralytics.com/help/minimum_reproducible_example.

Also, please verify that you are using the latest versions of torch and https://github.com/ultralytics/yolov5 to ensure compatibility and access to the latest features and fixes.

Feel free to reach out if you have any more questions. Happy coding! 😊

@IlamSaran

IlamSaran commented Jul 12, 2024 via email

@glenn-jocher
Member

Hi @IlamSaran,

Thank you for your follow-up question!

You bring up a great point about the interaction between YOLOv5's NMS and ByteTrack's handling of detections. Here's a more detailed explanation:

  1. YOLOv5 NMS: YOLOv5 performs Non-Maximum Suppression (NMS) to remove redundant and low-confidence detections, ensuring that only the most confident and non-overlapping bounding boxes are retained.

  2. ByteTrack's Role: ByteTrack further processes these detections by splitting them into high-confidence and low-confidence categories. The key reason for this additional step is to enhance the tracking performance. While YOLOv5's NMS outputs high-confidence detections, ByteTrack uses the low-confidence detections to help recover tracks that might have been missed in previous frames. This is particularly useful in scenarios where an object might be partially occluded or momentarily lost.

By leveraging both high and low-confidence detections, ByteTrack can maintain more robust and continuous tracking, even in challenging conditions.

If you have any further questions or need additional clarification, feel free to ask. We're here to help! 😊

@IlamSaran

IlamSaran commented Jul 13, 2024 via email

@glenn-jocher
Member

Hi @IlamSaran,

Thank you for your insightful question!

You are correct that YOLOv5's NMS typically outputs only high-confidence detections. However, for integrating with ByteTrack, you can modify the NMS step to retain both high and low-confidence detections. This way, ByteTrack can utilize the low-confidence detections to help recover tracks that might have been missed.

Here's how you can adjust the NMS step to retain both high and low-confidence detections:

  1. Modify NMS Thresholds: Adjust the NMS confidence threshold to a lower value to retain more detections, including those with lower confidence scores.

  2. Separate High and Low-Confidence Detections: After obtaining the detections, you can split them into high and low-confidence categories based on a secondary threshold.

Here's a simplified code example to illustrate this:

import torch
# The ByteTrack import/interface depends on the implementation you use;
# it is shown here as a placeholder class with an update() method.
from bytetrack import ByteTrack

# Load a YOLOv5 model via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Define confidence thresholds
high_conf_thresh = 0.5
low_conf_thresh = 0.1

# Lower the NMS confidence threshold so low-confidence detections survive NMS
model.conf = low_conf_thresh

# Initialize the tracker
tracker = ByteTrack()

# Process a video frame-by-frame (video_frames and visualize are placeholders)
for frame in video_frames:
    # Perform detection with YOLOv5 (NMS is applied internally)
    results = model(frame)
    det = results.xyxy[0]  # tensor of shape (N, 6): x1, y1, x2, y2, conf, cls
    scores = det[:, 4]

    # Split detections into high- and low-confidence sets
    high_conf_detections = det[scores >= high_conf_thresh]
    low_conf_detections = det[(scores >= low_conf_thresh) & (scores < high_conf_thresh)]

    # Feed both sets to the tracker (adjust arguments to your ByteTrack API)
    tracked_objects = tracker.update(high_conf_detections, low_conf_detections)

    # Visualize or process tracked objects
    visualize(frame, tracked_objects)

This approach ensures that ByteTrack receives both high and low-confidence detections, allowing it to perform more robust tracking.

If you encounter any issues or need further assistance, please ensure you provide a minimum reproducible code example. This helps us better understand and reproduce the issue. You can find more details on creating a minimum reproducible example here: https://docs.ultralytics.com/help/minimum_reproducible_example.

Also, please verify that you are using the latest versions of torch and https://github.com/ultralytics/yolov5 to ensure compatibility and access to the latest features and fixes.

Feel free to reach out if you have any more questions. We're here to help! 😊

@glenn-jocher
Member

Hello!

You're very welcome! I'm glad you found the explanation helpful. 😊

To add a bit more context, the mAP@0.5:0.95 metric is indeed a more rigorous and informative measure of a model's performance, especially in scenarios where precise localization is crucial. It provides a balanced view by considering multiple IoU thresholds, making it a preferred choice for evaluating modern object detection models.

Additional Tips for Computing mAP Metrics

When working with Vision Transformers like DETR or Swin Transformer, you can typically use the evaluation scripts provided by the respective repositories. These scripts are designed to compute mAP metrics and other evaluation metrics efficiently.

For example, if you're using DETR, you can follow their evaluation guidelines:

# Clone the DETR repository
git clone https://github.com/facebookresearch/detr.git
cd detr

# Install the required dependencies
pip install -r requirements.txt

# Evaluate a trained checkpoint on the COCO dataset (point --resume at your model weights)
python3 -m torch.distributed.launch --nproc_per_node=NUM_GPUS --use_env main.py --coco_path /path/to/coco --resume /path/to/checkpoint.pth --eval

This will compute the mAP@0.5:0.95 along with other metrics.

Ensuring Reproducibility

If you encounter any issues or bugs while computing these metrics, please ensure that you are using the latest versions of the packages and that the issue is reproducible with the latest codebase. This helps in diagnosing and resolving the problem more effectively.

Community and Resources

Feel free to explore the Ultralytics YOLOv5 documentation for more insights and resources. The community is also a great place to share your experiences and get support from fellow developers.

If you have any more questions or need further assistance, don't hesitate to ask. We're here to help!

Happy coding and best of luck with your projects! 🚀

@IlamSaran

IlamSaran commented Aug 11, 2024

Hello Mr. Glenn,
This is a follow-up question.
It is said that mAP@0.5:0.95 is more stringent and provides a better assessment of the model's ability to precisely localize objects.
Can you please give deeper insight into how this metric assures precise localization of objects, and comment on its impact on the classification accuracy for multiple objects?
Thank you.

@glenn-jocher
Member

Hello @IlamSaran,

Thank you for your follow-up question! I'm happy to provide more insights into how the mAP@0.5:0.95 metric ensures precise localization and its impact on classification accuracy.

Deep Insight into mAP@0.5:0.95

1. Multiple IoU Thresholds:

  • IoU (Intersection over Union) measures the overlap between the predicted bounding box and the ground truth bounding box.
  • mAP@0.5:0.95 calculates the mean Average Precision at multiple IoU thresholds (from 0.5 to 0.95 in increments of 0.05). This means the model's predictions are evaluated at various levels of overlap, not just a single threshold.

2. Stringency and Precision:

  • At lower IoU thresholds (e.g., 0.5), the model only needs to achieve a 50% overlap between the predicted and ground truth boxes to be considered a correct detection. This is relatively lenient.
  • At higher IoU thresholds (e.g., 0.95), the model needs to achieve a 95% overlap, which is much more stringent. This requires the model to predict bounding boxes that closely match the ground truth, ensuring precise localization.

3. Holistic Performance Evaluation:

  • By averaging the precision across multiple IoU thresholds, mAP@0.5:0.95 provides a more comprehensive evaluation of the model's performance. It ensures that the model is not only good at detecting objects but also at accurately localizing them.

Impact on Classification Accuracy

1. Localization and Classification:

  • Precise localization inherently impacts classification accuracy. If a model can accurately localize an object, it is more likely to correctly classify it as well. Mislocalized objects can lead to incorrect classifications, especially in cluttered scenes with multiple objects.

2. Balanced Metric:

  • mAP@0.5:0.95 balances the need for both high localization precision and classification accuracy. It penalizes models that may have high classification accuracy but poor localization, ensuring that only well-rounded models score high.

Example

To illustrate, consider two models:

  • Model A: detects most objects but with loosely fitting boxes, so it scores well at mAP@0.5 but poorly at the higher IoU thresholds.
  • Model B: detects the same objects with tightly fitting boxes that closely match the ground truth.

Model B would have a higher mAP@0.5:0.95, reflecting its superior performance in both detection and localization.

Conclusion

In summary, mAP@0.5:0.95 is a stringent and comprehensive metric that ensures models are evaluated on their ability to both detect and precisely localize objects. This leads to better overall performance, including improved classification accuracy.

If you have any further questions or need additional details, feel free to ask. We're here to help! 😊

@IlamSaran

Hi Mr. Glenn,
Thank you very much for the deep insight into my previous queries; it will greatly support my research findings.
Recently, I came across the F-beta score metric. What is the F-beta score, how does it differ from the F1 score, and how should the beta value be set when computing it? Can you please share some valuable information in this regard?
Thank you.

@glenn-jocher
Member

Hello @IlamSaran,

Thank you for your kind words! I'm glad the previous insights were helpful for your research. 😊

Understanding F1-beta Score

1. F1 Score:

  • The F1 Score is the harmonic mean of Precision and Recall. It is a balanced metric that considers both false positives and false negatives.
  • Formula:
    $$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

2. F-beta Score:

  • The F-beta Score is a generalized form of the F1 Score that allows you to weigh Precision and Recall differently.
  • Formula:
    $$F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$
  • Here, β (beta) is a parameter that determines the weight of Recall in the combined score.
    • If β = 1, the F-beta score is equivalent to the F1 score.
    • If β > 1, Recall is weighted more heavily.
    • If β < 1, Precision is weighted more heavily.

Choosing the Beta Value

  • Application-Specific: The choice of β depends on the specific requirements of your application.
    • For example, in medical diagnostics, you might prioritize Recall (sensitivity) to ensure that all potential cases are identified, even if it means having more false positives. In this case, you would choose a β > 1.
    • Conversely, in spam detection, you might prioritize Precision to minimize false positives, choosing a β < 1.

Example Code

Here's a simple example of how you might compute the F-beta score using Python:

from sklearn.metrics import fbeta_score

# Example ground-truth and predicted labels
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 1]

# Compute F1 score (beta=1)
f1 = fbeta_score(y_true, y_pred, beta=1)
print(f"F1 Score: {f1}")

# Compute F-beta score with beta=2 (favoring recall)
f2 = fbeta_score(y_true, y_pred, beta=2)
print(f"F2 Score: {f2}")

# Compute F-beta score with beta=0.5 (favoring precision)
f05 = fbeta_score(y_true, y_pred, beta=0.5)
print(f"F0.5 Score: {f05}")

Conclusion

The F-beta score is a flexible metric that allows you to tailor the balance between Precision and Recall to suit your specific needs. By adjusting the β value, you can emphasize the aspect that is more critical for your application.

If you have any further questions or need additional details, feel free to ask. We're here to help! 😊

@IlamSaran

Hello Mr. Glenn
Thank you very much for your prompt reply. Here are some more queries regarding the learning rate. Why does the initial learning rate (lr0), which has to be preset before training, vary between optimizers such as SGD and Adam?
Can you please give details on how to set initial learning rates, and whether I can use the same lr0 for all optimizers or need different values?

@glenn-jocher
Member

Hello,

The initial learning rate (lr0) varies for different optimizers because each optimizer has unique characteristics and convergence behaviors. For instance, SGD typically requires a smaller learning rate compared to Adam, which can handle larger learning rates due to its adaptive nature. It's generally recommended to start with the default values provided in the YOLOv5 repository and adjust based on your specific dataset and training results. If you have further questions, please refer to the YOLOv5 documentation for detailed guidance.
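As a rough illustration only (YOLOv5's default hyperparameter files start SGD at lr0=0.01; the Adam value below is a common starting point rather than an official recommendation, and should be tuned for your dataset):

# Illustrative sketch: typical starting learning rates differ by optimizer.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model

# SGD usually starts higher and relies on momentum (YOLOv5 default: lr0=0.01)
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)

# Adam adapts per-parameter step sizes, so a smaller lr0 (~1e-3) is a common start
adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.937, 0.999))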

@IlamSaran

Thank you very much for the information.

@glenn-jocher
Member

You're welcome! If you have any more questions or need further assistance, feel free to ask.

@IlamSaran

Hi
This question is regarding the train and test split. Initially we split the data samples into either a 70:30 or 80:20 train/test ratio. Only after splitting is the 70% training data subjected to different data augmentation techniques to increase the volume of training samples. Now that the training set contains more samples, how is the 70:30 ratio maintained?

@glenn-jocher
Member

Hi,

When you apply data augmentation, the 70:30 ratio still refers to the original, unique samples: augmentation only generates variations of images that are already in the training set, and the test set remains unchanged. The augmented copies therefore don't alter how the data was partitioned; they simply increase the effective size of the training set. If you have further questions, please refer to the YOLOv5 documentation for detailed guidance.
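To make that concrete, here is a minimal sketch (image-only torchvision transforms for brevity; a detection pipeline such as YOLOv5 also transforms the boxes, which it handles internally): the augmentation lives only in the training pipeline, so the held-out test images, and hence the original 70:30 split, stay untouched.

# Illustrative sketch: augment only the training split; leave the test split as-is.
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # on-the-fly augmentation
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
])

test_transforms = transforms.Compose([
    transforms.ToTensor(),                   # no augmentation for evaluation
])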

@IlamSaran

IlamSaran commented Aug 19, 2024

Hi

Thank you for your detailed explanation regarding my previous question. It was incredibly helpful.
I have another question about multi-class object detection, say with 4 fruit classes: apple, orange, banana, and guava.
When calculating accuracy, we use TP, FP, and FN in the confusion matrix as per the YOLOv5 documentation. What about TN? Do we consider TN in the confusion matrix?
Please clarify.

@IlamSaran

IlamSaran commented Aug 19, 2024

[confusion_matrix image]
Hi Glenn
This is a follow-up to the previous question: how do I interpret this confusion matrix? Does the confusion matrix contribute significantly to evaluating a multi-class object detection model?

@glenn-jocher
Member

Hi,

In multi-class object detection, the confusion matrix helps evaluate model performance by showing the counts of true positives (TP), false positives (FP), and false negatives (FN) for each class. True negatives (TN) are typically not included in object detection confusion matrices since they represent the absence of objects, which is less informative for this task. The confusion matrix is a valuable tool for understanding class-specific performance and identifying areas for improvement.

@IlamSaran

Hi
Thanks a lot. More detailed information will help me conclude my research results. In multi-class object detection, what are FP and FN? In the literature, the confusion matrix contains TP, FP, FN, and TN; can we neglect TN when computing the accuracy metrics? Please justify.

@glenn-jocher
Member

Hi,

In multi-class object detection, FP (False Positives) are incorrect detections, and FN (False Negatives) are missed detections. TN (True Negatives) are typically not used in object detection metrics as they represent the absence of objects. Metrics like precision, recall, and mAP focus on TP, FP, and FN to evaluate model performance effectively.

@IlamSaran

IlamSaran commented Aug 22, 2024

[confusion_matrix image]
Can you please explain what the circled values mean in multi-class classification?
Also, how are the values 0.96 for car, 0.96 for bus, and 0.98 for person computed? Kindly help me in this regard.

@glenn-jocher
Member

The circled values in the confusion matrix represent the precision for each class. Precision is calculated as TP / (TP + FP). For example, a precision of 0.96 for the car class means that 96% of the detected cars are true positives. The same calculation applies to the bus and person classes.
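As a minimal NumPy sketch of that calculation (the counts and the row/column orientation below are illustrative assumptions; check which axis holds predictions in your own plot):

# Illustrative: per-class precision/recall from a raw-count confusion matrix,
# assuming rows = predicted class and columns = true class.
import numpy as np

cm = np.array([
    [96, 2, 2],   # predicted car   (example counts, not real data)
    [3, 96, 1],   # predicted bus
    [1, 1, 98],   # predicted person
])

tp = np.diag(cm)
precision = tp / cm.sum(axis=1)  # TP / (TP + FP) per class -> 0.96, 0.96, 0.98
recall = tp / cm.sum(axis=0)     # TP / (TP + FN) per class
print(precision, recall)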

@pderrenger
Member

Yes, YOLOv5 uses bounding boxes for detection, even if the ground truth is polygonal. The IoU is computed based on these bounding boxes. If you need polygonal outputs, consider post-processing techniques or models designed for instance segmentation.
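For reference, a minimal sketch of how IoU between two axis-aligned boxes in [x1, y1, x2, y2] format is computed:

# Minimal IoU sketch for two axis-aligned boxes in [x1, y1, x2, y2] format.
def box_iou(box_a, box_b):
    ix1 = max(box_a[0], box_b[0])  # intersection rectangle
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: box_iou([0, 0, 10, 10], [5, 5, 15, 15]) -> ~0.14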

@IlamSaran

Hello Glenn

This query is regarding transformer-based object detector models (e.g., DETR). The architecture uses the Hungarian algorithm to solve the matching problem between predictions and ground-truth objects. If we want to link an object tracker to this model, and the tracker also uses a Hungarian strategy to associate detected objects over time, how should we proceed? Should we use the Hungarian algorithm twice? Kindly help in this regard.


@pderrenger
Member

Yes, YOLOv5 uses bounding boxes for detection, even if your annotations are polygons. The IoU is computed based on these bounding boxes, which is standard for object detection tasks.

@IlamSaran

Yes, YOLOv5 uses bounding boxes for detection, even if your annotations are polygons. The IoU is computed based on these bounding boxes, which is standard for object detection tasks.

I got it. Thank you for the reply.


@ultralytics ultralytics deleted a comment from pderrenger Nov 3, 2024
@ultralytics ultralytics deleted a comment from pderrenger Nov 3, 2024
@Leod12

Leod12 commented Nov 20, 2024

The circled values in the confusion matrix represent the precision for each class. Precision is calculated as TP / (TP + FP). For example, a precision of 0.96 for the car class means that 96% of the detected cars are true positives. The same calculation applies to the bus and person classes.
@pderrenger @glenn-jocher
Hi, could you explain to me what the graph on the top right means?
[labels.jpg image]

@pderrenger
Member

The graph on the top right is likely a visual representation of class distribution within your dataset. It shows the frequency of each class label, helping you understand whether your dataset is balanced or if there are any classes with significantly more or fewer instances. A balanced dataset is generally preferable for training robust models. If you have further questions, feel free to check out the YOLOv5 documentation for more details.
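If you want to inspect that distribution yourself, here is a minimal sketch (assuming YOLO-format *.txt label files where the first field on each line is the class index; the directory path is a placeholder):

# Minimal sketch: count class instances across YOLO-format label files.
from collections import Counter
from pathlib import Path

counts = Counter()
for label_file in Path("path/to/labels").glob("*.txt"):  # adjust to your dataset
    for line in label_file.read_text().splitlines():
        if line.strip():
            counts[int(line.split()[0])] += 1            # first field = class index

print(counts)  # e.g. Counter({0: 1200, 1: 340, 2: 85})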
