Audio PR - rocAL Audio decoder support (#118)
* Audio Decoder PR 1

* Change image_info to sample_info to maintain a generic name for all the use-cases

* Change the copyright year from 2023 to 2024

* formatting the files

* Resolve PR comments

* Resolve PR comments

* Change decoded_img_info to decoded_video_info

* Change the file_path() function to virtual from pure virtual

* Minor change

* Minor changes

* Add the unit test file

* Revert "Add the unit test file"

This reverts commit e79cc06.

* Introduce CMake for sndfile

Modify CMakeLists.txt for the same

* Resolve 1st set of PR comments

* Remove commented code for last batch policies and unused imports

* ROI related changes - change from xy to wh to use for samples and channels

* Fix seg fault with ROI

* Remove opencv usage from the unit test

* Resolve the PR comments

* Remove instances of the audio_*_time - use the existing variables from Timing struct

* Formatting changes in rocal_api_data_loader.cpp and add the opencl and hip conditions for audio loaders

* Resolve the internal PR comments

* Reformatting the file_source_reader.cpp

* Remove _input_path from audio_source_evaluator and audio_read_and_decode as it is unnecessary

* Change the header formatting

* Changes in copy_data() for audio samples

* Initialize the status at the beginning

* Cmake related changes for audio

* Resolve PR comments

* Add condition check to eliminate any file extensions other than a wav file / other image formats and call open_folder from subfolder_reading() function

* Update audio_read_and_decode.cpp

* Revert file source reader changes

* Update master_graph.cpp

* Update tensor.cpp - Remove a commented line of code

* Introduce ROCAL_AUDIO flag

Introduce flag for audio code, to be disabled when sndfile not found

* Minor changes

* Minor changes

* Add output comparison for Audio outputs

* Minor changes

* Minor changes to unit test

* Remove max_frames and max_channels args

* Remove max_frames, max_channels and sample rate from unit test

* Minor change

* Add python script to run audio unittests

* Clean up C++ audio unit test

* Modify rocal audio unit test

Update README

* Minor change

* Minor change

* Minor variable name change

* Minor changes

Add wav extension in file reader
Add reader in unit test

* Update C++ unit test

* Name change from sample to data

* Change from decoded_data_info to DecodedDataInfo

* Remove audio_decoder_factory.cpp file

* Minor change

* Change variable name

* Update the struct variable name in audio files

* Minor changes

* Change ROCAL_DATA_PATH to exclude rocal_data

* Use Pascal case for function names in audio decoder

* Modify cmake to have SNDFILE in all capital

* Minor changes

* Add struct for audio info in AudioReadAndDecode

* Fix merge conflict

* Renaming crop_image_info to CropImageInfo

* Remove - actual_host_buffers - Unused

* Rename TimingDBG to TimingDbg

* Move the instances of DecodedDataInfo to its base class LoaderModule

* Fix a WRN msg in master_graph.cpp

* Remove a dangling comment

* Rename _circ_data_info to _circ_buff_data_info

* Add Glob to CMakeLists.txt

* Rename SndFileDecoder to GenericAudioDecoder

* Fix build issues

* Minor change

* Update audio unit test README

* Revert "Add Glob to CMakeLists.txt"

This reverts commit 47263d9.

* Fix include headers for Audio files

* Fix copy data 2D

* Minor changes

* Pass decoded data info to load routine instead of separate vectors

* Update CHANGELOG.md

* Change swap_handle_time variable name in loader

* Formatting changes

Add comments

* Update doxygen comments

* Move file source reader from readers/image to readers folder

* Update README for audio test

* Minor fix

* Minor changes to shard_count argument name

* Rename set and get functions of data_info to decoded_data_info

---------

Co-authored-by: root <[email protected]>
Co-authored-by: Swetha B S <[email protected]>
Co-authored-by: swetha097 <[email protected]>
Co-authored-by: swetha097 <[email protected]>
Co-authored-by: Swetha B S <>
Co-authored-by: Rajy Rawther <[email protected]>
6 people authored Apr 19, 2024
1 parent 20b2fdc commit 422cbe5
Showing 42 changed files with 2,177 additions and 34 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -10,7 +10,9 @@

### Added

-* Packages - dev & test
+* Packages - dev & tests
+* Support for audio loader and decoder, which uses libsndfile library to decode wav files
+* C++ rocAL audio unit test and python script to run and compare the outputs

### Optimizations

@@ -41,6 +43,7 @@
* OpenCV - [4.6.0](https://github.com/opencv/opencv/releases/tag/4.6.0)
* Turbo JPEG - [Version 3.0.1](https://libjpeg-turbo.org/)
* PyBind11 - [v2.10.4](https://github.com/pybind/pybind11)
+* libsndfile - [1.0.31](https://github.com/libsndfile/libsndfile/releases/tag/1.0.31)
* rocAL Setup Script - `V2.0.0`
* Dependencies for all the above packages

69 changes: 69 additions & 0 deletions cmake/FindSndFile.cmake
@@ -0,0 +1,69 @@
################################################################################
#
# MIT License
#
# Copyright (c) 2024 Advanced Micro Devices, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
################################################################################
find_path(SNDFILE_INCLUDE_DIRS
NAMES sndfile.h
HINTS
$ENV{SNDFILE_PATH}/include
PATHS
/usr/local/include
/usr/include
)
mark_as_advanced(SNDFILE_INCLUDE_DIRS)

find_library(SNDFILE_LIBRARIES
NAMES sndfile libsndfile
HINTS
$ENV{SNDFILE_PATH}/lib
$ENV{SNDFILE_PATH}/lib64
PATHS
${CMAKE_SYSTEM_PREFIX_PATH}
${SNDFILE_PATH}
/usr/local/
PATH_SUFFIXES lib lib64
)
mark_as_advanced(SNDFILE_LIBRARIES)

if(SNDFILE_LIBRARIES AND SNDFILE_INCLUDE_DIRS)
set(SNDFILE_FOUND TRUE)
endif()

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(SndFile
FOUND_VAR SNDFILE_FOUND
REQUIRED_VARS
SNDFILE_LIBRARIES
SNDFILE_INCLUDE_DIRS
)

set(SNDFILE_FOUND ${SNDFILE_FOUND} CACHE INTERNAL "")
set(SNDFILE_LIBRARIES ${SNDFILE_LIBRARIES} CACHE INTERNAL "")
set(SNDFILE_INCLUDE_DIRS ${SNDFILE_INCLUDE_DIRS} CACHE INTERNAL "")

if(SNDFILE_FOUND)
message("-- ${White}Using SndFile -- \n\tLibraries:${SNDFILE_LIBRARIES} \n\tIncludes:${SNDFILE_INCLUDE_DIRS}${ColourReset}")
else()
message( "-- ${Yellow}NOTE: FindSndFile failed to find -- SndFile${ColourReset}" )
endif()
10 changes: 10 additions & 0 deletions rocAL/CMakeLists.txt
@@ -44,6 +44,7 @@ find_package(LMDB QUIET)
find_package(RapidJSON QUIET)
find_package(StdFilesystem QUIET)
find_package(HALF QUIET)
find_package(SndFile QUIET)

# HIP Backend
if(GPU_SUPPORT AND "${BACKEND}" STREQUAL "HIP")
@@ -295,6 +296,15 @@ if(${BUILD_ROCAL})
else()
message(FATAL_ERROR "No filesystem library found.")
endif()
# SndFile
if(NOT SNDFILE_FOUND)
message("-- ${Yellow}NOTE: rocAL built without SndFile - Audio Functionalities will not be supported${ColourReset}")
else()
include_directories(${SNDFILE_INCLUDE_DIRS})
set(LINK_LIBRARY_LIST ${LINK_LIBRARY_LIST} ${SNDFILE_LIBRARIES})
message("-- ${White}rocAL built with Audio Functionality${ColourReset}")
target_compile_definitions(${PROJECT_NAME} PUBLIC -DROCAL_AUDIO)
endif()
# -Wall -- Enable most warning messages
# -mavx2 -- Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and AVX2 built-in functions and code generation
# -mfma -- Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation
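
The `ROCAL_AUDIO` definition added in this hunk is the macro that the audio headers later in this commit test with `#ifdef`. A minimal sketch of how calling code might branch on it is shown here. Both helper names are hypothetical and are not part of rocAL; only the `ROCAL_AUDIO` macro comes from the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a rocAL function): reports whether this
// translation unit was compiled with -DROCAL_AUDIO.
constexpr bool audio_support_compiled_in() {
#ifdef ROCAL_AUDIO
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a status line loosely mirroring the two CMake messages.
inline std::string audio_support_message() {
    return audio_support_compiled_in()
               ? "built with audio functionality"
               : "built without SndFile - audio functionality unavailable";
}
```

Because the definition is attached with `PUBLIC` visibility, targets that link against the library would normally see the same macro, so a check like this should agree between the library and its consumers.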
40 changes: 40 additions & 0 deletions rocAL/include/api/rocal_api_data_loaders.h
@@ -824,4 +824,44 @@ extern "C" RocalTensor ROCAL_API_CALL rocalJpegExternalFileSource(RocalContext p
RocalDecoderType rocal_decoder_type = RocalDecoderType::ROCAL_DECODER_TJPEG,
RocalExternalSourceMode external_source_mode = RocalExternalSourceMode::ROCAL_EXTSOURCE_FNAME);

/*! Creates Audio file reader and decoder. It allocates the resources and objects required to read and decode audio files stored on the file systems. It has internal sharding capability to load/decode in parallel if the user wants.
* If the files are not in a standard audio compression format, they are ignored. Currently, the wav format is supported.
* \param [in] context Rocal context
* \param [in] source_path A NULL terminated char string pointing to the location of files on the disk
* \param [in] shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
* \param [in] is_output Boolean variable to enable the audio to be part of the output.
* \param [in] shuffle Boolean variable to shuffle the dataset.
* \param [in] loop Boolean variable to indefinitely loop through audio.
* \param [in] downmix Boolean variable to downmix all input channels to mono. If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels in case of multichannel audio.
* \return Reference to the output audio
*/
extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSource(RocalContext context,
const char* source_path,
unsigned shard_count,
bool is_output,
bool shuffle = false,
bool loop = false,
bool downmix = false);

/*! Creates Audio file reader and decoder. It allocates the resources and objects required to read and decode audio files stored on the file systems. It has internal sharding capability to load/decode in parallel if the user wants.
* If the files are not in standard audio compression formats they will be ignored.
* \param [in] context Rocal context
* \param [in] source_path A NULL terminated char string pointing to the location of files on the disk
* \param [in] shard_id Shard id for this loader
* \param [in] shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
* \param [in] is_output Boolean variable to enable the audio to be part of the output.
* \param [in] shuffle Boolean variable to shuffle the dataset.
* \param [in] loop Boolean variable to indefinitely loop through audio.
* \param [in] downmix Boolean variable to downmix all input channels to mono. If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels in case of multichannel audio.
* \return Reference to the output audio
*/
extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSourceSingleShard(RocalContext p_context,
const char* source_path,
unsigned shard_id,
unsigned shard_count,
bool is_output,
bool shuffle = false,
bool loop = false,
bool downmix = false);

#endif // MIVISIONX_ROCAL_API_DATA_LOADERS_H
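
The `downmix` parameter documented above turns interleaved multichannel audio into a 1D mono signal. The commit page does not show how rocAL performs the downmix, so the sketch below assumes equal-weight averaging of the channels in each frame; `downmix_to_mono` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages interleaved multichannel samples into one mono sample per frame.
// Assumption: equal-weight averaging; the actual rocAL downmix may differ.
std::vector<float> downmix_to_mono(const std::vector<float>& interleaved, std::size_t channels) {
    if (channels == 0) return {};
    const std::size_t frames = interleaved.size() / channels;  // a trailing partial frame is ignored
    std::vector<float> mono(frames, 0.0f);
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        // Interleaved layout: samples of one frame are adjacent, channel by channel.
        for (std::size_t c = 0; c < channels; ++c)
            sum += interleaved[f * channels + c];
        mono[f] = sum / static_cast<float>(channels);
    }
    return mono;
}
```

For a stereo input `{L0, R0, L1, R1}` this yields `{(L0 + R0) / 2, (L1 + R1) / 2}`, which is the 1D shape the parameter description refers to.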
6 changes: 5 additions & 1 deletion rocAL/include/api/rocal_api_types.h
@@ -259,7 +259,11 @@ enum RocalDecoderType {
ROCAL_DECODER_VIDEO_FFMPEG_SW = 3,
/*! \brief AMD ROCAL_DECODER_VIDEO_FFMPEG_HW
*/
-ROCAL_DECODER_VIDEO_FFMPEG_HW = 4
+ROCAL_DECODER_VIDEO_FFMPEG_HW = 4,
+/*! \brief AMD ROCAL_DECODER_AUDIO_GENERIC
+* Uses SndFile library to read audio files
+*/
+ROCAL_DECODER_AUDIO_GENERIC = 5
};

enum RocalOutputMemType {
51 changes: 51 additions & 0 deletions rocAL/include/decoders/audio/audio_decoder.h
@@ -0,0 +1,51 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include <cstddef>
#include <vector>

#ifdef ROCAL_AUDIO
#include "sndfile.h"

class AudioDecoder {
public:
enum class Status {
OK = 0,
HEADER_DECODE_FAILED,
CONTENT_DECODE_FAILED,
UNSUPPORTED,
FAILED,
NO_MEMORY
};
virtual AudioDecoder::Status Initialize(const char* src_filename) = 0;
virtual AudioDecoder::Status Decode(float* buffer) = 0;
virtual AudioDecoder::Status DecodeInfo(int* samples, int* channels, float* sample_rates) = 0;
virtual void Release() = 0;
virtual ~AudioDecoder() = default;

protected:
SF_INFO _sfinfo;
SNDFILE* _sf_ptr;
};
#endif
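
The interface above exposes four steps: `Initialize`, `DecodeInfo`, `Decode` and `Release`. The following sketch shows one plausible way a caller might sequence them. It uses a simplified stand-in class without the libsndfile members so that it compiles on its own. `AudioDecoderSketch`, `RampDecoder` and `decode_all` are hypothetical names, the call order is an assumption, and so is sizing the buffer as `samples * channels`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the AudioDecoder interface, without SF_INFO/SNDFILE.
class AudioDecoderSketch {
   public:
    enum class Status { OK = 0, HEADER_DECODE_FAILED, CONTENT_DECODE_FAILED, UNSUPPORTED, FAILED, NO_MEMORY };
    virtual Status Initialize(const char* src_filename) = 0;
    virtual Status DecodeInfo(int* samples, int* channels, float* sample_rate) = 0;
    virtual Status Decode(float* buffer) = 0;
    virtual void Release() = 0;
    virtual ~AudioDecoderSketch() = default;
};

constexpr int kSamples = 4;   // hypothetical clip length
constexpr int kChannels = 2;  // hypothetical channel count

// Hypothetical decoder that fills the buffer with a ramp instead of reading a file.
class RampDecoder : public AudioDecoderSketch {
   public:
    Status Initialize(const char* src_filename) override {
        if (src_filename == nullptr) return Status::FAILED;
        _open = true;
        return Status::OK;
    }
    Status DecodeInfo(int* samples, int* channels, float* sample_rate) override {
        if (!_open) return Status::HEADER_DECODE_FAILED;
        *samples = kSamples;
        *channels = kChannels;
        *sample_rate = 16000.0f;
        return Status::OK;
    }
    Status Decode(float* buffer) override {
        if (!_open) return Status::CONTENT_DECODE_FAILED;
        for (int i = 0; i < kSamples * kChannels; ++i) buffer[i] = static_cast<float>(i);
        return Status::OK;
    }
    void Release() override { _open = false; }

   private:
    bool _open = false;
};

// Assumed call order: Initialize, DecodeInfo to size the buffer, Decode, Release.
// Returns an empty vector if any step fails.
std::vector<float> decode_all(AudioDecoderSketch& decoder, const char* path) {
    using Status = AudioDecoderSketch::Status;
    int samples = 0, channels = 0;
    float sample_rate = 0.0f;
    if (decoder.Initialize(path) != Status::OK) return {};
    if (decoder.DecodeInfo(&samples, &channels, &sample_rate) != Status::OK) {
        decoder.Release();
        return {};
    }
    std::vector<float> buffer(static_cast<std::size_t>(samples) * static_cast<std::size_t>(channels));
    const Status status = decoder.Decode(buffer.data());
    decoder.Release();
    if (status != Status::OK) buffer.clear();
    return buffer;
}
```

Querying the header information before decoding lets the caller allocate the output buffer once, which is presumably why `DecodeInfo` is a separate method from `Decode`.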
37 changes: 37 additions & 0 deletions rocAL/include/decoders/audio/audio_decoder_factory.hpp
@@ -0,0 +1,37 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include "decoders/audio/audio_decoder.h"
#include "decoders/audio/generic_audio_decoder.h"

#ifdef ROCAL_AUDIO
static std::shared_ptr<AudioDecoder> create_audio_decoder(DecoderConfig config) {
switch (config.type()) {
case DecoderType::AUDIO_SOFTWARE_DECODE:
return std::make_shared<GenericAudioDecoder>();
default:
THROW("Unsupported decoder type " + TOSTR(config.type()));
}
}
#endif
38 changes: 38 additions & 0 deletions rocAL/include/decoders/audio/generic_audio_decoder.h
@@ -0,0 +1,38 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include "decoders/audio/audio_decoder.h"

#ifdef ROCAL_AUDIO
class GenericAudioDecoder : public AudioDecoder {
public:
//! Default constructor
GenericAudioDecoder();
AudioDecoder::Status Initialize(const char* src_filename) override;
AudioDecoder::Status Decode(float* buffer) override;
AudioDecoder::Status DecodeInfo(int* samples, int* channels, float* sample_rates) override;
void Release() override;
~GenericAudioDecoder() override;
};
#endif
1 change: 1 addition & 0 deletions rocAL/include/decoders/image/decoder.h
@@ -37,6 +37,7 @@ enum class DecoderType {
OVX_FFMPEG = 5, //!< Uses FFMPEG to decode video streams, can decode up to 4 video streams simultaneously
FFMPEG_SOFTWARE_DECODE = 6,
FFMPEG_HARDWARE_DECODE = 7,
AUDIO_SOFTWARE_DECODE = 8 //!< Uses sndfile to decode audio files
};

class DecoderConfig {
86 changes: 86 additions & 0 deletions rocAL/include/loaders/audio/audio_loader.h
@@ -0,0 +1,86 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include <string>
#include <thread>
#include <vector>

#include "loaders/audio/audio_read_and_decode.h"
#include "loaders/circular_buffer.h"
#include "pipeline/commons.h"
#include "meta_data/meta_data_reader.h"

#ifdef ROCAL_AUDIO

// AudioLoader runs an internal thread for loading and decoding of audios asynchronously
// It uses a circular buffer to store decoded audios for the user
class AudioLoader : public LoaderModule {
public:
explicit AudioLoader(void* dev_resources);
~AudioLoader() override;
LoaderModuleStatus load_next() override;
void initialize(ReaderConfig reader_cfg, DecoderConfig decoder_cfg, RocalMemType mem_type, unsigned batch_size, bool keep_orig_size = false) override;
void set_output(Tensor* output_audio) override;
size_t remaining_count() override; // returns number of remaining items to be loaded
void reset() override; // Resets the loader to load from the beginning of the media
Timing timing() override;
void start_loading() override;
LoaderModuleStatus set_cpu_affinity(cpu_set_t cpu_mask);
LoaderModuleStatus set_cpu_sched_policy(struct sched_param sched_policy);
std::vector<std::string> get_id() override;
DecodedDataInfo get_decode_data_info() override;
void set_prefetch_queue_depth(size_t prefetch_queue_depth) override;
void set_gpu_device_id(int device_id);
void shut_down() override;
void feed_external_input(const std::vector<std::string>& input_images_names, const std::vector<unsigned char*>& input_buffer,
const std::vector<ROIxywh>& roi_xywh, unsigned int max_width, unsigned int max_height, unsigned int channels,
ExternalSourceFileMode mode, bool eos) override { THROW("external source feed is not supported in audio loader") }

private:
bool is_out_of_data();
void de_init();
void stop_internal_thread();
LoaderModuleStatus update_output_audio();
LoaderModuleStatus load_routine();
std::shared_ptr<AudioReadAndDecode> _audio_loader;
Tensor* _output_tensor;
std::vector<std::string> _output_names; // audio file name/ids that are stored in the _output_audio
MetaDataBatch* _meta_data = nullptr; // The output of the meta_data_graph
bool _internal_thread_running;
size_t _output_mem_size, _batch_size, _max_decoded_samples, _max_decoded_channels;
std::thread _load_thread;
RocalMemType _mem_type;
DecodedDataInfo _decoded_audio_info;
DecodedDataInfo _output_decoded_audio_info;
CircularBuffer _circ_buff;
TimingDbg _swap_handle_time;
bool _is_initialized;
bool _stopped = false;
bool _loop; // If true the reader will wrap around at the end of the media (files/audios/...) and wouldn't stop
size_t _prefetch_queue_depth = 0; // Used for circular buffer's internal buffer allocation
size_t _audio_counter = 0; // How many audios have been loaded already
size_t _remaining_audio_count; // How many audios are there yet to be loaded
int _device_id;
};
#endif
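
The header comment describes `AudioLoader` as running an internal thread that decodes audio asynchronously into a circular buffer. The sketch below illustrates that producer/consumer pattern with standard C++ only. `BoundedQueue` is a hypothetical stand-in for rocAL's `CircularBuffer`, whose real interface is not shown in this commit page, and the "decoded" clips are fake data.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue standing in for rocAL's CircularBuffer.
class BoundedQueue {
   public:
    // A depth of 0 would block forever, so it is raised to 1.
    explicit BoundedQueue(std::size_t depth) : _depth(depth == 0 ? 1 : depth) {}

    // Blocks while the queue is full.
    void push(std::vector<float> item) {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_full.wait(lock, [this] { return _items.size() < _depth; });
        _items.push_back(std::move(item));
        _not_empty.notify_one();
    }

    // Blocks while the queue is empty.
    std::vector<float> pop() {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_empty.wait(lock, [this] { return !_items.empty(); });
        std::vector<float> item = std::move(_items.front());
        _items.pop_front();
        _not_full.notify_one();
        return item;
    }

   private:
    std::size_t _depth;
    std::deque<std::vector<float>> _items;
    std::mutex _mutex;
    std::condition_variable _not_full, _not_empty;
};

// Starts a loader thread that produces `count` fake clips while the caller
// consumes them. Returns the first sample of each clip in arrival order.
std::vector<float> load_clips(std::size_t count, std::size_t prefetch_depth) {
    BoundedQueue queue(prefetch_depth);
    std::thread loader([&queue, count] {
        for (std::size_t i = 0; i < count; ++i)
            queue.push(std::vector<float>(16, static_cast<float>(i)));  // fake decoded audio
    });
    std::vector<float> firsts;
    for (std::size_t i = 0; i < count; ++i) firsts.push_back(queue.pop().front());
    loader.join();
    return firsts;
}
```

The queue depth plays the role that `_prefetch_queue_depth` appears to play in the loader: it bounds how far decoding can run ahead of consumption, which caps memory use.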