Managing Song Storage for Rare vs Common species #796
My SD card was running out of room, so I just deleted all 220 GB of bird songs in /Extracted/By_Date, which was every song recorded on this station from October 2022 until now. I foolishly thought a 250 GB SD card would last me at least 2 years of recordings... I don't have the time to go through many thousands of recordings, but what I would love is a way to delete (or perhaps never even store in Extracted) song recordings from the 10-15 most common species, which make up ~98% of all recordings, while keeping the rarer detections so I can go in and manually verify them (which I do sometimes). Does anyone know of a good way to handle this?
Replies: 1 comment 1 reply
This script should work, or at least give you some ideas. You'll have to un-comment the rm if you actually want to delete the directories; otherwise it'll just print out "REMOVING". It only keeps the bottom 10% of species by count. Change 0.1 in the script as desired.

#!/bin/bash
# Set variables
db_path=~/BirdNET-Pi/scripts/birds.db
dir_path=~/BirdSongs/Extracted/By_Date
# Count the detections for each species, rarest first
counts=$(sqlite3 "$db_path" "SELECT Com_Name || '|' || COUNT(*) FROM detections GROUP BY Com_Name ORDER BY COUNT(*)")
# Get the species that are in the bottom 10% of this distribution by count;
# apostrophes are dropped so the names can be compared with directory names
bottom_10=$(echo "$counts" | awk -F'|' -v n="$(echo "$counts" | wc -l)" 'NR <= n * 0.1 {print $1}' | tr -d "'")
# Visit only the species directories, assuming the layout By_Date/<date>/<species>
find "$dir_path" -mindepth 2 -maxdepth 2 -type d | while IFS= read -r dir; do
  # Directory names are assumed to use underscores where Com_Name has spaces
  dir_name=$(basename "$dir" | tr '_' ' ' | tr -d "'")
  # -x matches the whole name, -F treats it as a literal string
  if ! echo "$bottom_10" | grep -qxF -- "$dir_name"; then
    echo "REMOVING $dir"
    # rm -r "$dir"
  fi
done
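If you would rather target the most common species directly (the 10-15 species making up ~98% of recordings mentioned in the question), here is a minimal sketch of one way to pick them. The function name `common_species` and its share argument are hypothetical, not part of BirdNET-Pi. It assumes input lines of the form `Com_Name|count`, such as the query above produces, and prints the most common species until their combined detections reach the requested share of the total.

```shell
#!/bin/bash
# Hypothetical helper, not part of BirdNET-Pi.
# Reads "name|count" lines on stdin and prints the most common species
# whose detections together reach the given share of all detections.
common_species() {
  local share="${1:-0.98}"   # cumulative share to cover (default 98%)
  # Sort by count, largest first, then accumulate counts in awk
  sort -t'|' -k2,2nr | awk -F'|' -v share="$share" '
    { name[NR] = $1; count[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++) {
        if (running >= share * total) break   # requested share already covered
        running += count[i]
        print name[i]
      }
    }'
}

# Possible usage (assumes the same db_path as the script above):
# sqlite3 "$db_path" "SELECT Com_Name || '|' || COUNT(*) FROM detections GROUP BY Com_Name" | common_species 0.98
```

The printed names could then be used as a delete list instead of the "keep the bottom 10%" rule, which may match the 98% figure more closely on stations where a few species dominate.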