Audio PR - rocAL Audio decoder support #118

Merged
merged 111 commits into from
Apr 19, 2024
be883dc
Audio Decoder PR 1
Mar 6, 2024
b180b78
channge image_info to sample_info to maintain a generic name for all …
Mar 8, 2024
3863a62
Merge branch 'generic-name-change' into swbs/audio/pr1
swetha097 Mar 8, 2024
853ea8e
Change the copyright year from 2023 to 2024
Mar 11, 2024
3bdcf8e
formatting the files
Mar 11, 2024
bc34817
Resolve PR comments
Mar 11, 2024
a6c5727
Resolve PR comments
Mar 12, 2024
89fd4cb
Change decoded_img_info to decoded_video_info
Mar 12, 2024
5f627b2
Change the file_path() function to virtual from pure virtual
Mar 12, 2024
97d0628
Minor change
Mar 12, 2024
6d53e57
Minor changes
Mar 12, 2024
e79cc06
Add the unit test file
Mar 12, 2024
9b7e839
Revert "Add the unit test file"
Mar 12, 2024
a52ae62
Introduce CMake for sndfile
fiona-gladwin Mar 13, 2024
2d6eaa5
Resolve 1st set of PR commenst
Mar 14, 2024
d248676
Merge remote-tracking branch 'swe_fork/swbs/audio/pr1' into swbs/audi…
Mar 14, 2024
eac59e3
Remove commented code for last batch polices and unsued imports
Mar 14, 2024
ab165ee
ROI related changes - change from xy to wh to use for samples and cha…
Mar 14, 2024
e2fe45c
Fix seg fault with ROI
Mar 14, 2024
e5ff5e4
Remove opencv usage from the unit test
Mar 14, 2024
3988643
Resolve the PR comments
Mar 14, 2024
8790f57
Remove instances of the audio_*_time - use the existing variables fro…
Mar 14, 2024
8134705
Formatting changes in rocal_api_data_loader.cpp and add the opencl an…
Mar 14, 2024
e37bcad
Resolve the internal PR comments
Mar 14, 2024
760ba3a
Reformatting the file_source_reader.cpp
Mar 14, 2024
2534ca9
Remove _input_path from audio_source_evaluator and audio_read_and_dec…
Mar 14, 2024
73f8a31
Change the header formatting
Mar 14, 2024
6feedda
Changes in copy_data() for audio samples
Mar 14, 2024
c3babe0
Initialize the status at the beginning
Mar 14, 2024
3d1bbc0
Cmake related changes for audio
Mar 18, 2024
a0c3b9e
Resolve PR comments
Mar 18, 2024
ca06049
Add condition check to eliminate any other file extensions other than…
Mar 18, 2024
5527cc0
Update audio_read_and_decode.cpp
swetha097 Mar 19, 2024
0b058bb
Revert file source reader changes
fiona-gladwin Mar 19, 2024
18a8a83
Update master_graph.cpp
swetha097 Mar 19, 2024
e5f3840
Update tensor.cpp - Remove a commented line of code
swetha097 Mar 19, 2024
a8507bf
Introduce ROCAL_AUDIO flag
fiona-gladwin Mar 19, 2024
421cc29
Merge branch 'swbs/audio/pr1' of https://github.com/swetha097/rocAL i…
fiona-gladwin Mar 19, 2024
ea6158b
Minor changes
fiona-gladwin Mar 19, 2024
4874e4f
Minor changes
fiona-gladwin Mar 19, 2024
75c360f
Add output comparision for Audio outputs
swetha097 Mar 19, 2024
38dc894
Merge branch 'swbs/audio/pr1' of https://github.com/swetha097/rocAL i…
swetha097 Mar 19, 2024
13ea715
Minor changes
fiona-gladwin Mar 19, 2024
033f967
Minor changes to unit test
fiona-gladwin Mar 20, 2024
f4bf553
Remove max_frames and max_channels args
fiona-gladwin Mar 20, 2024
1e6240a
Remove max_frames, max_channels and sample rate from unit test
fiona-gladwin Mar 20, 2024
d6f60ac
Minor change
fiona-gladwin Mar 20, 2024
b9a39a9
Add python script to run audio unittests
swetha097 Mar 20, 2024
6018273
Merge branch 'swbs/audio/pr1' of https://github.com/swetha097/rocAL i…
swetha097 Mar 20, 2024
08cf352
Clean up C++ audio unit test
fiona-gladwin Mar 20, 2024
7789333
Modify rocal audio unit test
fiona-gladwin Mar 20, 2024
741a677
Minor change
fiona-gladwin Mar 20, 2024
2150bbc
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Mar 20, 2024
e1eda57
Merge branch 'swbs/audio/pr1' of https://github.com/swetha097/rocAL i…
fiona-gladwin Mar 20, 2024
ebb6212
Minor change
fiona-gladwin Mar 21, 2024
db7f126
Minor variable name change
fiona-gladwin Mar 22, 2024
87ab2c3
Minor changes
fiona-gladwin Mar 22, 2024
e21d617
Update C++ unit test
fiona-gladwin Mar 22, 2024
c9a6dbb
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Mar 26, 2024
ca6f311
Merge branch 'develop' of https://github.com/ROCm/rocAL into generic-…
fiona-gladwin Mar 26, 2024
8d34902
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Mar 26, 2024
ffb284d
Name change from sample to data
Mar 26, 2024
ff12843
Merge branch 'generic-name-change' of https://github.com/swetha097/ro…
Mar 26, 2024
e53388f
Change from decoded_data_info to DecodedDataInfo
Mar 26, 2024
0774f69
Remove audio_decoder_factory.cpp file
fiona-gladwin Mar 26, 2024
90b9d83
Minor change
fiona-gladwin Mar 26, 2024
531e5fb
Change variable name
fiona-gladwin Mar 26, 2024
98ce527
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Mar 26, 2024
7d4c1fd
Update the struct variable name in audio files
fiona-gladwin Mar 26, 2024
a9e6497
Minor changes
fiona-gladwin Mar 27, 2024
85d21e6
Change ROCAL_DATA_PATH to exclude rocal_data
fiona-gladwin Mar 27, 2024
3a86507
Use Pascal case for function names in audio decoder
fiona-gladwin Mar 27, 2024
7f46a25
Modify cmake to have SNDFILE in all capital
fiona-gladwin Apr 2, 2024
70aa700
Minor changes
fiona-gladwin Apr 2, 2024
0693605
Add struct for audio info in AudioReadAndDecode
fiona-gladwin Apr 2, 2024
44e654d
Merge branch 'develop' of https://github.com/ROCm/rocAL into generic-…
fiona-gladwin Apr 2, 2024
c77140c
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Apr 2, 2024
f96a92b
Fix merge conflict
fiona-gladwin Apr 2, 2024
91d0615
Renaming crop_image_info to CropImageInfo
swetha097 Apr 3, 2024
bb4e5a5
Remove - actual_host_buffers - Unused
swetha097 Apr 3, 2024
50829f6
Rename TimingDBG to TimingDbg
swetha097 Apr 3, 2024
d0a456b
Move the instances of DecodedDataInfo to its base class LoaderModule
swetha097 Apr 3, 2024
a80a3a6
Fix a WRN msg in master_graph.cpp
swetha097 Apr 3, 2024
f648feb
Remove a dangling comment
swetha097 Apr 3, 2024
6146bac
Rename _circ_data_info to _circ_buff_data_info
swetha097 Apr 3, 2024
47263d9
Add Glob to CMakeLists.txt
fiona-gladwin Apr 4, 2024
8623be3
Rename SndFileDecoder to GenericAudioDecoder
fiona-gladwin Apr 4, 2024
c4af22c
Merge branch 'develop' of https://github.com/ROCm/rocAL into generic-…
fiona-gladwin Apr 4, 2024
5b9be3d
Merge branch 'generic-name-change' of https://github.com/swetha097/ro…
fiona-gladwin Apr 4, 2024
47dea85
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Apr 4, 2024
660071b
Fix build issues
fiona-gladwin Apr 4, 2024
4f9ab6b
Minor change
fiona-gladwin Apr 4, 2024
0f0be88
Update audio unit test README
fiona-gladwin Apr 4, 2024
e496f3a
Revert "Add Glob to CMakeLists.txt"
fiona-gladwin Apr 10, 2024
5df0055
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 10, 2024
7dc7092
Fix include headers for Audio files
fiona-gladwin Apr 10, 2024
19e30cf
Fix copy data 2D
fiona-gladwin Apr 10, 2024
4c02dfb
Minor changes
fiona-gladwin Apr 11, 2024
e3f350f
Pass decoded data info to load routine instead of separate vectors
fiona-gladwin Apr 11, 2024
67cda83
Update CHANGELOG.md
fiona-gladwin Apr 11, 2024
8b1c59f
Change swap_handle_time variable name in loader
fiona-gladwin Apr 11, 2024
91fed39
Formatting changes
fiona-gladwin Apr 11, 2024
6a80714
Update doxygen comments
fiona-gladwin Apr 11, 2024
689985d
Move file source reader from readers/image to readers folder
fiona-gladwin Apr 11, 2024
d000af0
Update README for audio test
fiona-gladwin Apr 11, 2024
7415447
Minor fix
fiona-gladwin Apr 12, 2024
f6bffef
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 12, 2024
689c55f
Minor changes shard_count argument name
fiona-gladwin Apr 12, 2024
1079d50
Rename set and get functions of data_info to decoded_data_info
fiona-gladwin Apr 12, 2024
d928c48
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 17, 2024
df5c0d3
Merge branch 'develop' into swbs/audio/pr1
rrawther Apr 19, 2024
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -10,7 +10,9 @@

### Added

* Packages - dev & test
* Packages - dev & tests
* Support for audio loader and decoder, which uses libsndfile library to decode wav files
* C++ rocAL audio unit test and python script to run and compare the outputs

### Optimizations

@@ -41,6 +43,7 @@
* OpenCV - [4.6.0](https://github.com/opencv/opencv/releases/tag/4.6.0)
* Turbo JPEG - [Version 3.0.1](https://libjpeg-turbo.org/)
* PyBind11 - [v2.10.4](https://github.com/pybind/pybind11)
* libsndfile - [1.0.31](https://github.com/libsndfile/libsndfile/releases/tag/1.0.31)
* rocAL Setup Script - `V2.0.0`
* Dependencies for all the above packages

69 changes: 69 additions & 0 deletions cmake/FindSndFile.cmake
@@ -0,0 +1,69 @@
################################################################################
#
# MIT License
#
# Copyright (c) 2024 Advanced Micro Devices, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
################################################################################
find_path(SNDFILE_INCLUDE_DIRS
NAMES sndfile.h
HINTS
$ENV{SNDFILE_PATH}/include
PATHS
/usr/local/include
/usr/include
)
mark_as_advanced(SNDFILE_INCLUDE_DIRS)

find_library(SNDFILE_LIBRARIES
NAMES sndfile libsndfile
HINTS
$ENV{SNDFILE_PATH}/lib
$ENV{SNDFILE_PATH}/lib64
PATHS
${CMAKE_SYSTEM_PREFIX_PATH}
${SNDFILE_PATH}
/usr/local/
PATH_SUFFIXES lib lib64
)
mark_as_advanced(SNDFILE_LIBRARIES)

if(SNDFILE_LIBRARIES AND SNDFILE_INCLUDE_DIRS)
set(SNDFILE_FOUND TRUE)
endif()

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(SndFile
FOUND_VAR SNDFILE_FOUND
REQUIRED_VARS
SNDFILE_LIBRARIES
SNDFILE_INCLUDE_DIRS
)

set(SNDFILE_FOUND ${SNDFILE_FOUND} CACHE INTERNAL "")
set(SNDFILE_LIBRARIES ${SNDFILE_LIBRARIES} CACHE INTERNAL "")
set(SNDFILE_INCLUDE_DIRS ${SNDFILE_INCLUDE_DIRS} CACHE INTERNAL "")

if(SNDFILE_FOUND)
message("-- ${White}Using SndFile -- \n\tLibraries:${SNDFILE_LIBRARIES} \n\tIncludes:${SNDFILE_INCLUDE_DIRS}${ColourReset}")
else()
message( "-- ${Yellow}NOTE: FindSndFile failed to find -- SndFile${ColourReset}" )
endif()
10 changes: 10 additions & 0 deletions rocAL/CMakeLists.txt
@@ -44,6 +44,7 @@ find_package(LMDB QUIET)
find_package(RapidJSON QUIET)
find_package(StdFilesystem QUIET)
find_package(HALF QUIET)
find_package(SndFile QUIET)

# HIP Backend
if(GPU_SUPPORT AND "${BACKEND}" STREQUAL "HIP")
@@ -295,6 +296,15 @@ if(${BUILD_ROCAL})
else()
message(FATAL_ERROR "No filesystem library found.")
endif()
# SndFile
if(NOT SNDFILE_FOUND)
message("-- ${Yellow}NOTE: rocAL built without SndFile - Audio Functionalities will not be supported${ColourReset}")
else()
include_directories(${SNDFILE_INCLUDE_DIRS})
set(LINK_LIBRARY_LIST ${LINK_LIBRARY_LIST} ${SNDFILE_LIBRARIES})
message("-- ${White}rocAL built with Audio Functionality${ColourReset}")
target_compile_definitions(${PROJECT_NAME} PUBLIC -DROCAL_AUDIO)
endif()
# -Wall -- Enable most warning messages
# -mavx2 -- Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and AVX2 built-in functions and code generation
# -mfma -- Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation
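
The hunk above defines `ROCAL_AUDIO` only when SndFile is found, and the new audio headers in this PR wrap their declarations in `#ifdef ROCAL_AUDIO`. A minimal sketch of how that compile-time gate behaves is shown below; the function names are hypothetical and do not appear in the PR.

```cpp
#include <cassert>
#include <string>

// ROCAL_AUDIO is the macro the CMake hunk defines when SndFile is found.
// The function names below are hypothetical and exist only for this sketch.
constexpr bool AudioSupportCompiledIn() {
#ifdef ROCAL_AUDIO
    return true;
#else
    return false;
#endif
}

// Returns a short human-readable description of the build configuration.
std::string DescribeAudioSupport() {
    const std::string description = AudioSupportCompiledIn()
                                        ? "audio: enabled"
                                        : "audio: disabled (built without SndFile)";
    assert(!description.empty());
    return description;
}
```

Because the definition is `PUBLIC`, targets that link against the library would see the same macro, so a check like this gives the same answer in the library and in its consumers.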
40 changes: 40 additions & 0 deletions rocAL/include/api/rocal_api_data_loaders.h
@@ -824,4 +824,44 @@ extern "C" RocalTensor ROCAL_API_CALL rocalJpegExternalFileSource(RocalContext p
RocalDecoderType rocal_decoder_type = RocalDecoderType::ROCAL_DECODER_TJPEG,
RocalExternalSourceMode external_source_mode = RocalExternalSourceMode::ROCAL_EXTSOURCE_FNAME);

/*! Creates an audio file reader and decoder. It allocates the resources and objects required to read and decode audio files stored on the file system. It has internal sharding capability to load/decode in parallel if the user wants.
* Files that are not in a standard audio compression format are ignored. Currently, the wav format is supported.
* \param [in] context Rocal context
* \param [in] source_path A NULL terminated char string pointing to the location of files on the disk
* \param [in] shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
* \param [in] is_output Boolean variable to enable the audio to be part of the output.
* \param [in] shuffle Boolean variable to shuffle the dataset.
* \param [in] loop Boolean variable to indefinitely loop through audio.
* \param [in] downmix Boolean variable to downmix all input channels to mono. If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels in case of multichannel audio.
* \return Reference to the output audio
*/
extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSource(RocalContext context,
const char* source_path,
unsigned shard_count,
bool is_output,
bool shuffle = false,
bool loop = false,
bool downmix = false);
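
The `downmix` parameter above turns interleaved multichannel audio into a 1D mono signal. The diff does not show how rocAL combines the channels, so the sketch below assumes a plain per-frame average; `DownmixToMono` is a hypothetical helper, not a rocAL function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages interleaved frames down to one channel (assumed downmix rule).
// Input layout: [f0c0, f0c1, ..., f1c0, f1c1, ...] with `channels` samples per frame.
std::vector<float> DownmixToMono(const std::vector<float>& interleaved, int channels) {
    assert(channels > 0);
    assert(interleaved.size() % static_cast<std::size_t>(channels) == 0);
    const std::size_t frames = interleaved.size() / static_cast<std::size_t>(channels);
    std::vector<float> mono(frames, 0.0f);
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (int c = 0; c < channels; ++c)
            sum += interleaved[f * static_cast<std::size_t>(channels) + static_cast<std::size_t>(c)];
        mono[f] = sum / static_cast<float>(channels);  // one output sample per frame
    }
    return mono;
}
```

With two stereo frames `{1, 3, 2, 4}` the helper returns `{2, 3}`, which matches the 2D-to-1D shape change the doc comment describes.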

/*! Creates an audio file reader and decoder. It allocates the resources and objects required to read and decode audio files stored on the file system. It has internal sharding capability to load/decode in parallel if the user wants.
* Files that are not in a standard audio compression format are ignored.
* \param [in] context Rocal context
* \param [in] source_path A NULL terminated char string pointing to the location of files on the disk
* \param [in] shard_id Shard id for this loader
* \param [in] shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
Review comment (Contributor):
Can the shard_count be greater than 1? This function is singleShard

Reply (Contributor Author):
The reader requires the num of shards in the pipeline to split dataset.

Reply (Contributor, @rrawther, Mar 27, 2024):
Why is the shard_count used without shard_id?
We used to use internal_shard_count before which is not used anymore. So shard_count is not needed for this API

* \param [in] is_output Boolean variable to enable the audio to be part of the output.
* \param [in] shuffle Boolean variable to shuffle the dataset.
* \param [in] loop Boolean variable to indefinitely loop through audio.
* \param [in] downmix Boolean variable to downmix all input channels to mono. If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels in case of multichannel audio.
* \return Reference to the output audio
*/
extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSourceSingleShard(RocalContext p_context,
const char* source_path,
unsigned shard_id,
unsigned shard_count,
bool is_output,
bool shuffle = false,
bool loop = false,
bool downmix = false);

#endif // MIVISIONX_ROCAL_API_DATA_LOADERS_H
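
The review thread on `rocalAudioFileSourceSingleShard` turns on why a single-shard loader still needs `shard_count`: the reader has to know the total number of shards to decide which files belong to `shard_id`. The sketch below illustrates that with a round-robin split. The partitioning rule and the `FilesForShard` helper are assumptions for illustration; the reader's actual scheme is not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical round-robin split: file i goes to shard (i % shard_count).
// Both shard_id and shard_count are needed to pick one shard's files.
std::vector<std::string> FilesForShard(const std::vector<std::string>& files,
                                       unsigned shard_id,
                                       unsigned shard_count) {
    assert(shard_count > 0 && shard_id < shard_count);
    std::vector<std::string> mine;
    for (std::size_t i = 0; i < files.size(); ++i) {
        if (i % shard_count == shard_id)
            mine.push_back(files[i]);
    }
    return mine;
}
```

Under this assumed rule, five files split across two shards give shard 0 three files and shard 1 two files, and no file is read twice.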
6 changes: 5 additions & 1 deletion rocAL/include/api/rocal_api_types.h
@@ -259,7 +259,11 @@ enum RocalDecoderType {
ROCAL_DECODER_VIDEO_FFMPEG_SW = 3,
/*! \brief AMD ROCAL_DECODER_VIDEO_FFMPEG_HW
*/
ROCAL_DECODER_VIDEO_FFMPEG_HW = 4
ROCAL_DECODER_VIDEO_FFMPEG_HW = 4,
/*! \brief AMD ROCAL_DECODER_AUDIO_GENERIC
* Uses SndFile library to read audio files
*/
ROCAL_DECODER_AUDIO_GENERIC = 5
};

enum RocalOutputMemType {
51 changes: 51 additions & 0 deletions rocAL/include/decoders/audio/audio_decoder.h
@@ -0,0 +1,51 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include <cstddef>
#include <vector>

#ifdef ROCAL_AUDIO
#include "sndfile.h"

class AudioDecoder {
public:
enum class Status {
OK = 0,
HEADER_DECODE_FAILED,
CONTENT_DECODE_FAILED,
UNSUPPORTED,
FAILED,
NO_MEMORY
};
virtual AudioDecoder::Status Initialize(const char* src_filename) = 0;
virtual AudioDecoder::Status Decode(float* buffer) = 0;
virtual AudioDecoder::Status DecodeInfo(int* samples, int* channels, float* sample_rates) = 0;
virtual void Release() = 0;
virtual ~AudioDecoder() = default;

protected:
SF_INFO _sfinfo;
SNDFILE* _sf_ptr;
};
#endif
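
The interface above implies a call order: `Initialize`, then `DecodeInfo` to size the buffer, then `Decode`, then `Release`. The sketch below walks through that order with a stand-in interface that copies the method signatures from the diff but omits the libsndfile members, so it builds without the library. `AudioDecoderSketch`, `FakeRampDecoder`, and `DecodeAll` are hypothetical names, and sizing the buffer as `samples * channels` is an assumption about what `DecodeInfo` reports.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in mirroring the virtual interface in audio_decoder.h, minus SF_INFO / SNDFILE.
class AudioDecoderSketch {
public:
    enum class Status { OK = 0, HEADER_DECODE_FAILED, CONTENT_DECODE_FAILED, UNSUPPORTED, FAILED, NO_MEMORY };
    virtual Status Initialize(const char* src_filename) = 0;
    virtual Status Decode(float* buffer) = 0;
    virtual Status DecodeInfo(int* samples, int* channels, float* sample_rates) = 0;
    virtual void Release() = 0;
    virtual ~AudioDecoderSketch() = default;
};

// Hypothetical decoder that "decodes" a fixed 4-frame stereo ramp; no file is read.
class FakeRampDecoder : public AudioDecoderSketch {
public:
    Status Initialize(const char* src_filename) override {
        _open = (src_filename != nullptr);
        return _open ? Status::OK : Status::HEADER_DECODE_FAILED;
    }
    Status Decode(float* buffer) override {
        if (!_open) return Status::CONTENT_DECODE_FAILED;
        for (int i = 0; i < kFrames * kChannels; ++i) buffer[i] = static_cast<float>(i);
        return Status::OK;
    }
    Status DecodeInfo(int* samples, int* channels, float* sample_rates) override {
        if (!_open) return Status::HEADER_DECODE_FAILED;
        *samples = kFrames;
        *channels = kChannels;
        *sample_rates = 16000.0f;
        return Status::OK;
    }
    void Release() override { _open = false; }

private:
    static constexpr int kFrames = 4;
    static constexpr int kChannels = 2;
    bool _open = false;
};

// Drives one decoder through the full sequence and returns the decoded samples.
std::vector<float> DecodeAll(AudioDecoderSketch& decoder, const char* path) {
    using Status = AudioDecoderSketch::Status;
    if (decoder.Initialize(path) != Status::OK) throw std::runtime_error("open failed");
    int samples = 0, channels = 0;
    float sample_rate = 0.0f;
    if (decoder.DecodeInfo(&samples, &channels, &sample_rate) != Status::OK) {
        decoder.Release();
        throw std::runtime_error("header decode failed");
    }
    assert(samples >= 0 && channels >= 0);
    std::vector<float> buffer(static_cast<std::size_t>(samples) * static_cast<std::size_t>(channels));
    const Status status = decoder.Decode(buffer.data());
    decoder.Release();  // release on both success and failure
    if (status != Status::OK) throw std::runtime_error("content decode failed");
    return buffer;
}
```

Querying the header before decoding lets the caller own the buffer, which fits the `Decode(float* buffer)` signature: the decoder writes into memory it did not allocate.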
37 changes: 37 additions & 0 deletions rocAL/include/decoders/audio/audio_decoder_factory.hpp
@@ -0,0 +1,37 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include "decoders/audio/audio_decoder.h"
#include "decoders/audio/generic_audio_decoder.h"

#ifdef ROCAL_AUDIO
static std::shared_ptr<AudioDecoder> create_audio_decoder(DecoderConfig config) {
switch (config.type()) {
case DecoderType::AUDIO_SOFTWARE_DECODE:
return std::make_shared<GenericAudioDecoder>();
default:
THROW("Unsupported decoder type " + TOSTR(config.type()));
}
}
#endif
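
The factory above maps a decoder type to a concrete decoder and throws for anything else. The sketch below reproduces that shape with stand-in types so it compiles on its own. All names ending in `Sketch` are hypothetical, and `std::invalid_argument` stands in for the `THROW` macro, whose definition is not shown in this diff.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for the two DecoderType values that appear in this diff.
enum class DecoderTypeSketch { FFMPEG_SOFTWARE_DECODE = 6, AUDIO_SOFTWARE_DECODE = 8 };

struct AudioDecoderBaseSketch {
    virtual ~AudioDecoderBaseSketch() = default;
    virtual std::string Name() const = 0;
};

struct GenericAudioDecoderSketch : AudioDecoderBaseSketch {
    std::string Name() const override { return "generic"; }
};

// `inline` lets the definition live in a header without one copy per translation unit.
inline std::shared_ptr<AudioDecoderBaseSketch> CreateAudioDecoderSketch(DecoderTypeSketch type) {
    switch (type) {
        case DecoderTypeSketch::AUDIO_SOFTWARE_DECODE:
            return std::make_shared<GenericAudioDecoderSketch>();
        default:
            throw std::invalid_argument("Unsupported decoder type " +
                                        std::to_string(static_cast<int>(type)));
    }
}
```

The PR declares its factory `static` in the header, which gives every including translation unit a private copy; `inline`, as used in this sketch, is the usual alternative when a single shared definition is preferred.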
38 changes: 38 additions & 0 deletions rocAL/include/decoders/audio/generic_audio_decoder.h
@@ -0,0 +1,38 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include "decoders/audio/audio_decoder.h"

#ifdef ROCAL_AUDIO
class GenericAudioDecoder : public AudioDecoder {
public:
//! Default constructor
GenericAudioDecoder();
AudioDecoder::Status Initialize(const char* src_filename) override;
AudioDecoder::Status Decode(float* buffer) override;
AudioDecoder::Status DecodeInfo(int* samples, int* channels, float* sample_rates) override;
void Release() override;
~GenericAudioDecoder() override;
};
#endif
1 change: 1 addition & 0 deletions rocAL/include/decoders/image/decoder.h
@@ -37,6 +37,7 @@ enum class DecoderType {
OVX_FFMPEG = 5, //!< Uses FFMPEG to decode video streams, can decode up to 4 video streams simultaneously
FFMPEG_SOFTWARE_DECODE = 6,
FFMPEG_HARDWARE_DECODE = 7,
AUDIO_SOFTWARE_DECODE = 8 //!< Uses sndfile to decode audio files
};

class DecoderConfig {
Expand Down
86 changes: 86 additions & 0 deletions rocAL/include/loaders/audio/audio_loader.h
@@ -0,0 +1,86 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include <string>
#include <thread>
#include <vector>

#include "loaders/audio/audio_read_and_decode.h"
#include "loaders/circular_buffer.h"
#include "pipeline/commons.h"
#include "meta_data/meta_data_reader.h"

#ifdef ROCAL_AUDIO

// AudioLoader runs an internal thread that loads and decodes audio files asynchronously
// It uses a circular buffer to store the decoded audio for the user
class AudioLoader : public LoaderModule {
public:
explicit AudioLoader(void* dev_resources);
~AudioLoader() override;
LoaderModuleStatus load_next() override;
void initialize(ReaderConfig reader_cfg, DecoderConfig decoder_cfg, RocalMemType mem_type, unsigned batch_size, bool keep_orig_size = false) override;
void set_output(Tensor* output_audio) override;
size_t remaining_count() override; // returns number of remaining items to be loaded
void reset() override; // Resets the loader to load from the beginning of the media
Timing timing() override;
void start_loading() override;
LoaderModuleStatus set_cpu_affinity(cpu_set_t cpu_mask);
LoaderModuleStatus set_cpu_sched_policy(struct sched_param sched_policy);
Review comment (Contributor):
Are we using these functions in audio?

Reply (Contributor Author):
Yes, They are being used here link

std::vector<std::string> get_id() override;
DecodedDataInfo get_decode_data_info() override;
void set_prefetch_queue_depth(size_t prefetch_queue_depth) override;
void set_gpu_device_id(int device_id);
void shut_down() override;
void feed_external_input(const std::vector<std::string>& input_images_names, const std::vector<unsigned char*>& input_buffer,
const std::vector<ROIxywh>& roi_xywh, unsigned int max_width, unsigned int max_height, unsigned int channels,
ExternalSourceFileMode mode, bool eos) override { THROW("external source feed is not supported in audio loader") }

private:
bool is_out_of_data();
void de_init();
void stop_internal_thread();
LoaderModuleStatus update_output_audio();
LoaderModuleStatus load_routine();
std::shared_ptr<AudioReadAndDecode> _audio_loader;
Tensor* _output_tensor;
std::vector<std::string> _output_names; // audio file name/ids that are stored in the _output_audio
MetaDataBatch* _meta_data = nullptr; // The output of the meta_data_graph
bool _internal_thread_running;
size_t _output_mem_size, _batch_size, _max_decoded_samples, _max_decoded_channels;
std::thread _load_thread;
RocalMemType _mem_type;
DecodedDataInfo _decoded_audio_info;
DecodedDataInfo _output_decoded_audio_info;
CircularBuffer _circ_buff;
TimingDbg _swap_handle_time;
bool _is_initialized;
bool _stopped = false;
bool _loop; // If true the reader will wrap around at the end of the media (files/audios/...) and wouldn't stop
size_t _prefetch_queue_depth = 0; // Used for circular buffer's internal buffer allocation
size_t _audio_counter = 0; // How many audios have been loaded already
size_t _remaining_audio_count; // How many audios are there yet to be loaded
int _device_id;
};
#endif
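
The class comment says `AudioLoader` decodes on an internal thread and hands results to the user through a circular buffer. The sketch below shows that producer/consumer arrangement with a small bounded queue. `BoundedBatchQueue` and `LoadAndCollect` are hypothetical, and rocAL's `CircularBuffer` (not shown in this diff) may work differently; the `depth` argument plays the role that `_prefetch_queue_depth` appears to play in the header.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue standing in for the loader's circular buffer.
class BoundedBatchQueue {
public:
    explicit BoundedBatchQueue(std::size_t depth) : _depth(depth) { assert(depth > 0); }

    // Blocks while the queue is full, so the loader cannot run ahead of the consumer.
    void Push(std::vector<float> batch) {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_full.wait(lock, [this] { return _queue.size() < _depth; });
        _queue.push(std::move(batch));
        _not_empty.notify_one();
    }

    // Blocks while the queue is empty, then returns the oldest batch.
    std::vector<float> Pop() {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_empty.wait(lock, [this] { return !_queue.empty(); });
        std::vector<float> batch = std::move(_queue.front());
        _queue.pop();
        _not_full.notify_one();
        return batch;
    }

private:
    std::size_t _depth;
    std::queue<std::vector<float>> _queue;
    std::mutex _mutex;
    std::condition_variable _not_full, _not_empty;
};

// Starts a loader thread that produces `batches` batches and consumes them on the caller's thread.
// Returns the first value of each batch in the order received.
std::vector<float> LoadAndCollect(std::size_t batches, std::size_t depth) {
    BoundedBatchQueue queue(depth);
    std::thread loader([&queue, batches] {
        for (std::size_t i = 0; i < batches; ++i)
            queue.Push(std::vector<float>(2, static_cast<float>(i)));  // stand-in for a decoded batch
    });
    std::vector<float> firsts;
    for (std::size_t i = 0; i < batches; ++i)
        firsts.push_back(queue.Pop().front());
    loader.join();
    return firsts;
}
```

Bounding the queue caps memory use at `depth` decoded batches while still letting decoding overlap with consumption.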