Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input file problem #124

Open
ShevchenkoAlla opened this issue Nov 18, 2024 · 7 comments
Open

Input file problem #124

ShevchenkoAlla opened this issue Nov 18, 2024 · 7 comments
Labels
question Further information is requested

Comments

@ShevchenkoAlla
Copy link

Hello,
I have the output file from analysis of 16s dataset, picrust2 plagin was used in qiime2, then I have converted the biom file to tsv format. Now I am trying to visualise results, I have some issues. I have attached the tsv table picture, I think something wrong with it.

First I tried -

"""""
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)

Load necessary data: abundance data and metadata

abundance_file <- "/home/output/path_exported/ko_feature_table.biom.tsv"
metadata <- read_delim(
"/home/sample-metadata.tsv",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE
)

Run ggpicrust2 with input file path

results_file_input <- ggpicrust2(file = abundance_file,
metadata = metadata,
group = "Sample type", # For example dataset, group = "Environment"
reference = "Controls",
pathway = "KO",
daa_method = "LinDA",
ko_to_kegg = TRUE,
order = "pathway_class",
p_values_bar = TRUE,
x_lab = "pathway_name")

metadata$Sample type <- as.factor(metadata$Sample type)
levels(metadata$Sample type)
"""
I got next mistakes-
"""
Starting the ggpicrust2 analysis...

Converting KO to KEGG...

Loading data from file...
Rows: 10556 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): # Constructed from biom file

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Performing pathway differential abundance analysis...

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: #SampleID . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running LinDA analysis...
Error in relevel.factor(LinDA_metadata_df$Group_group_nonsense_, ref = reference) :
'ref' must be an existing level
In addition: Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)

metadata$Sample type <- as.factor(metadata$Sample type)
levels(metadata$Sample type)
[1] "categorical" "Controls" "Permafrost"

Run ggpicrust2 with input file path

results_file_input <- ggpicrust2(file = abundance_file,

  •                              metadata = metadata,
    
  •                              group = "Sample type", 
    
  •                              reference = "Controls",
    
  •                              pathway = "KO",
    
  •                              daa_method = "LinDA",
    
  •                              ko_to_kegg = TRUE,
    
  •                              order = "pathway_class",
    
  •                              p_values_bar = TRUE,
    
  •                              x_lab = "pathway_name")
    

Starting the ggpicrust2 analysis...

Converting KO to KEGG...

Loading data from file...
Rows: 10556 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): # Constructed from biom file

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Performing pathway differential abundance analysis...

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: #SampleID . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running LinDA analysis...
Error in relevel.factor(LinDA_metadata_df$Group_group_nonsense_, ref = reference) :
'ref' must be an existing level
In addition: Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)

"""""""
Screenshot from 2024-11-17 02-12-16

"""""""
I am not sure what this is about.
When I tried to check problems - seems there are no such things, but there is no input file which is there - the path are written correctly

""""problems(metadata)

A tibble: 0 × 5

ℹ 5 variables: row , col , expected , actual , file

problems(results_file_input)
Error: object 'results_file_input' not found
problems(abundance_file)
""""""
So, I have tried to do a different way

"""""""""""
metadata <- read_delim("/home/sample-metadata.tsv", delim = "\t", escape_double = FALSE, trim_ws = TRUE)
Rows: 24 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): #SampleID, Composition, Freezing time, Sample type
dbl (2): count_reads, Layers

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

"""""""'''
then
"""""""""

kegg_abundance <- ko2kegg_abundance("/home/output/path_exported/ko_feature_table.biom.tsv")
Loading data from file...
Rows: 10556 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): # Constructed from biom file

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)

"""""""""

I have tried next -

""""""
ko_abundance_file <- "/home/output/path_exported/ko_feature_table.biom.tsv"

kegg_abundance <- ko2kegg_abundance(ko_abundance_file)

Loading data from file...
Rows: 10556 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): # Constructed from biom file

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
""""""""""""'""
here i have an emply "kegg_abundance" variable in the Rstudio

I think something wrong with the input tsv table, but I cant understand what and how to fix it.

I would appreciate any help
Thank you for your time
Best,
Alla

@cafferychen777
Copy link
Owner

Hi Alla,

After reviewing your error messages, I noticed the issue likely stems from your TSV file format. Looking at the error message, it seems the first line contains "# Constructed from biom file" which is causing the parsing issues.

Could you try:

  1. Opening the ko_feature_table.biom.tsv file
  2. Removing the first line that starts with "#"
  3. Save the file and run your analysis again

This should resolve the parsing error and allow ggpicrust2 to properly read your feature table.

Let me know if you need any further assistance.

Best regards,
Chen

@ShevchenkoAlla
Copy link
Author

ShevchenkoAlla commented Nov 18, 2024

Hi,
Thank you so much.
It worked)

But I can't do the analysis, I got another errors after that(

""""'''

Run ggpicrust2 with input file path

results_file_input <- ggpicrust2(file = abundance_file,

  •                              metadata = metadata,
    
  •                              group = "Sample type", # For example dataset, group = "Environment"
    
  •                              reference = "Controls",
    
  •                              pathway = "KO",
    
  •                              daa_method = "LinDA",
    
  •                              ko_to_kegg = TRUE,
    
  •                              order = "pathway_class",
    
  •                              p_values_bar = TRUE,
    
  •                              x_lab = "pathway_name")
    

Starting the ggpicrust2 analysis...

Converting KO to KEGG...

Loading data from file...
Rows: 10543 Columns: 25
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #OTU ID
dbl (24): 1A, 1B, 1C, 2A, 2B, 2C, 3AA, 3AB, 3AC, 3BA, 3BB, 3BC, 4A, 4B, 4C, 5A, 5B, 5C, 5D, 5E, ...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|==========================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 31.63 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Performing pathway differential abundance analysis...

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: #SampleID . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running LinDA analysis...
Performing LinDA analysis...
0 features are filtered!
The filtered data has 24 samples and 299 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
Processing LinDA results...
LinDA analysis is complete.
Success: Found 222 statistically significant biomarker(s) in the dataset.
Annotating pathways...

Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to TRUE. Proceeding with KEGG pathway annotations...
We are connecting to the KEGG database to get the latest results, please wait patiently.

The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation.

Returning DAA results filtered annotation data frame...
Creating pathway error bar plots...

The following pathways are missing annotations and have been excluded: ko05340, ko00564, ko00562, ko00563, ko03030, ko00561, ko00440, ko00250, ko04062, ko00740, ko00195, ko04650, ko03450, ko00920, ko00311, ko00310, ko04146, ko00600, ko04140, ko04142, ko00604, ko04260, ko05142, ko04540, ko04710, ko04712, ko00909, ko00513, ko05110, ko04974, ko04976, ko00450, ko01051, ko00565, ko00904, ko00524, ko00300, ko00905, ko00402, ko03440, ko00750, ko00950, ko05140, ko00592, ko00591, ko00590, ko00062, ko04662, ko03070, ko00253, ko03060, ko04370, ko04730, ko04740, ko00380, ko00500, ko05120, ko04666, ko04966, ko05322, ko04964, ko05320, ko04962, ko04960, ko04660, ko00625, ko00624, ko00627, ko00626, ko00623, ko00622, ko00270, ko04380, ko00941, ko00943, ko00100, ko00945, ko01057, ko01056, ko05016, ko01058, ko04145, ko00071, ko00072, ko04360, ko05219, ko05218, ko05216, ko05215, ko05213, ko05211, ko01055, ko00902, ko05330, ko00534, ko04910, ko00531, ko04916, ko00533, ko00532, ko00360, ko00633, ko00363, ko00364, ko05130, ko00121, ko04914, ko00130, ko03050, ko00361, ko00040, ko00730, ko00362, ko01040, ko00603, ko03018, ko04270, ko00281, ko00280, ko03013, ko04626, ko05200, ko00601, ko03015, ko00312, ko05143, ko00523, ko00520, ko00521, ko05146, ko00052, ko00051, ko00400, ko04020, ko00350, ko00480, ko00643, ko00640, ko00720, ko00120, ko00965, ko04614, ko04340, ko00980, ko00410, ko00983, ko05150, ko00791, ko05131, ko04711, ko00020, ko00710, ko00196, ko02060, ko00340, ko00785, ko00550, ko00650, ko03320, ko04744, ko04745, ko00522, ko04612, ko04621, ko04620, ko04623, ko04622, ko04971, ko00460, ko04970, ko00830, ko00780, ko00511, ko00970, ko00030, ko00232, ko00230, ko04120, ko04350, ko00540, ko03022, ko03020, ko00982, ko04630, ko03010, ko05100, ko00331, ko05310, ko00908, ko04930, ko04320, ko03430, ko00906, ko00901, ko04520, ko00903, ko00471, ko00472, ko00473, ko04510, ko00942, ko04810, ko04210, ko00240, ko04012, ko04011, ko00944, ko04113, ko04640, ko04310, ko03420, ko04912, ko00670, ko04672, ko04920, ko05160, ko04144, ko00930, ko04112, ko04720, ko04722, ko04075
You can use the 'pathway_annotation' function to add annotations for these pathways.
The 'method' column in the 'daa_results_df' data frame contains more than one method. Please filter it to contain only one method.
The 'group1' or 'group2' column in the 'daa_results_df' data frame contains more than one group. Please filter each to contain only one group.
Error in pathway_errorbar(abundance = abundance, daa_results_df = daa_sub_method_results_df, :
Visualization with 'pathway_errorbar' cannot be performed because there are no features with statistical significance. For possible solutions, please check the FAQ section of the tutorial.

"""""""""""
Somthing confising
First there are significant difference, than not..
I dont have - 'daa_results_df' data frame
Not sure what is it
Can you explain or suggest something?

I checked metadata file

it looks ok to me

metadata$Sample type <- as.factor(metadata$Sample type)

levels(metadata$Sample type)
[1] "Controls" "Permafrost"

Thank you,
Best,
Alla

@cafferychen777
Copy link
Owner

Hi Alla,

I see you've resolved the first issue, but now encountering errors with pathway annotations and visualizations. From the error messages, it seems the all-in-one ggpicrust2() function is having trouble handling your dataset.

I suggest using our step-by-step pipeline instead, which gives you more control over each stage of the analysis. You can follow these steps:

  1. First, convert KO to KEGG:
kegg_abundance <- ko2kegg_abundance(file = abundance_file)
  1. Then perform differential abundance analysis:
daa_results_df <- pathway_daa(abundance = kegg_abundance, 
                             metadata = metadata, 
                             group = "Sample type", 
                             daa_method = "LinDA",
                             select.taxa = NULL,
                             reference = "Controls")
  1. Add pathway annotations:
pathway_annotation_df <- pathway_annotation(pathway = "KO",
                                          daa_results_df = daa_results_df, 
                                          ko_to_kegg = TRUE)
  1. Finally, create the visualization:
pathway_errorbar_plot <- pathway_errorbar(abundance = kegg_abundance,
                                        daa_results_df = daa_results_df,
                                        pathway_annotation_df = pathway_annotation_df,
                                        group = "Sample type",
                                        p_values_threshold = 0.05,
                                        order = "pathway_class",
                                        select_pathway = NULL,
                                        p_value_bar = TRUE,
                                        x_lab = "pathway_name")

This approach will help you identify where exactly the analysis might be failing and give you more flexibility in adjusting parameters at each step.

Let me know if you need any clarification or run into other issues.

Best regards,
Chen

@cafferychen777 cafferychen777 added the question Further information is requested label Nov 18, 2024
@ShevchenkoAlla
Copy link
Author

Hello Chen,
Thank you so much for your response!

I have tried to follow your instructions and have encountered the following problems

"""""""

daa_results_df <- pathway_daa(abundance = kegg_abundance,

  •                           metadata = metadata, 
    
  •                           group = "Sample type", 
    
  •                           daa_method = "LinDA",
    
  •                           select.taxa = NULL,
    
  •                           reference = "Controls")
    

Error in pathway_daa(abundance = kegg_abundance, metadata = metadata, :
unused argument (select.taxa = NULL)
""""""""""'
So I just removed taxa, and left "select = Null"

""""""""'
daa_results_df <- pathway_daa(abundance = kegg_abundance,

  •                           metadata = metadata, 
    
  •                           group = "Sample type", 
    
  •                           daa_method = "LinDA",
    
  •                           select = NULL,
    
  •                           reference = "Controls")
    

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: #SampleID . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running LinDA analysis...
Performing LinDA analysis...
Registered S3 method overwritten by 'rmutil':
method from
print.response httr
0 features are filtered!
The filtered data has 24 samples and 299 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
Processing LinDA results...
LinDA analysis is complete.

pathway_annotation_df <- pathway_annotation(pathway = "KO",

  •                                         daa_results_df = daa_results_df, 
    
  •                                         ko_to_kegg = TRUE)
    

Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to TRUE. Proceeding with KEGG pathway annotations...
We are connecting to the KEGG database to get the latest results, please wait patiently.

The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation.

Returning DAA results filtered annotation data frame...
""'''''''''
Next I have tried

""""
pathway_annotation_df <- pathway_annotation(pathway = "KO",

  •                                         daa_results_df = daa_results_df, 
    
  •                                         ko_to_kegg = TRUE)
    

Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to TRUE. Proceeding with KEGG pathway annotations...
We are connecting to the KEGG database to get the latest results, please wait patiently.

The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation.

Returning DAA results filtered annotation data frame...
""""""""""'

If I understand correctly, I have too many pathways in df, I need to reduce them.
Is there an easy way to reduce it to e.g. top 100 pathways? Or something like that?

Also in pathway_annotation_df I have N/A in the pathway_name, description etc columns, so I can't even get those names on a heatmap for example.

Maybe there is a way to just get the pathway names and groups and visualise it against the sample names, without even statistics, just take top of some ammount?

Thank you for your time
I appreciate your help
Best,
Alla

@cafferychen777
Copy link
Owner

Hi Alla,

Thank you for providing the detailed information about your issues. Let me help address them one by one:

  1. Regarding the "too many pathways" issue:
    You can filter the significant pathways before annotation by selecting the top ones based on p-values. Here's how you can modify your code:
# After running pathway_daa, filter for top 100 pathways based on p-values
daa_results_df <- daa_results_df[order(daa_results_df$p_values), ]  # Sort by p-values
daa_results_df <- daa_results_df[1:100, ]  # Keep top 100 pathways

# Then proceed with pathway annotation
pathway_annotation_df <- pathway_annotation(
  pathway = "KO",
  daa_results_df = daa_results_df,
  ko_to_kegg = TRUE
)
  1. For the NA values in pathway annotations:
    First, let's understand the extent of NA values in your results:
# Check how many pathways have NA values
na_count <- sum(is.na(pathway_annotation_df$pathway_name))
total_count <- nrow(pathway_annotation_df)

# Print the results
print(paste0("Pathways with NA values: ", na_count, " out of ", total_count))

# Look at some examples of pathways with and without annotations
# Show first few rows of pathways with NA values
print("Examples of pathways with NA values:")
head(pathway_annotation_df[is.na(pathway_annotation_df$pathway_name), ])

# Show first few rows of pathways with annotations
print("Examples of pathways with annotations:")
head(pathway_annotation_df[!is.na(pathway_annotation_df$pathway_name), ])

After checking this, if you want to proceed with only the annotated pathways:

# Remove rows with NA values in pathway_name
pathway_annotation_df <- pathway_annotation_df[!is.na(pathway_annotation_df$pathway_name), ]
  1. If you want to visualize without statistics, you can use the raw abundance data:
# Get top pathways by mean abundance
mean_abundances <- rowMeans(kegg_abundance)
top_pathways <- names(sort(mean_abundances, decreasing = TRUE)[1:50])  # Top 50 pathways

# Subset the abundance data
filtered_abundance <- kegg_abundance[top_pathways, ]

# Create a basic heatmap
library(pheatmap)
pheatmap(filtered_abundance,
         scale = "row",
         clustering_distance_rows = "correlation",
         clustering_distance_cols = "correlation",
         show_rownames = TRUE,
         show_colnames = TRUE)
  1. For the "select.taxa" error:
    This was a parameter naming issue. The correct parameter is just "select" as you figured out.

Please let me know:

  1. What the NA value check reveals about your data
  2. If you need any clarification on the suggested solutions
  3. If you run into any other issues while implementing these changes

Best regards,
Chen

@cafferychen777
Copy link
Owner

Hi Alla,

Let me add some additional tips about pathway_annotation, especially for handling those unannotated pathways:

  1. For pathways that failed to get annotations, you can try to annotate them separately:
# First, identify unannotated pathways
unannotated_pathways <- daa_results_df[daa_results_df$feature %in% 
  pathway_annotation_df[is.na(pathway_annotation_df$pathway_name), "feature"], ]

# Try to annotate these pathways again in smaller batches
batch_size <- 20
for(i in seq(1, nrow(unannotated_pathways), batch_size)) {
  end_idx <- min(i + batch_size - 1, nrow(unannotated_pathways))
  batch_pathways <- unannotated_pathways[i:end_idx, ]
  
  # Add some delay between batches to avoid overwhelming the KEGG API
  if(i > 1) Sys.sleep(2)
  
  batch_annotation <- pathway_annotation(
    pathway = "KO",
    daa_results_df = batch_pathways,
    ko_to_kegg = TRUE
  )
  
  # Merge successful annotations back
  if(!is.null(batch_annotation) && nrow(batch_annotation) > 0) {
    pathway_annotation_df <- rbind(
      pathway_annotation_df[!is.na(pathway_annotation_df$pathway_name), ],
      batch_annotation[!is.na(batch_annotation$pathway_name), ]
    )
  }
}
  1. You can also try different approaches to filter pathways before annotation:
# Filter by adjusted p-value
significant_pathways <- daa_results_df[daa_results_df$p_adjust < 0.05, ]

# Or filter by absolute log fold change
if("log2FoldChange" %in% colnames(daa_results_df)) {
  high_impact_pathways <- daa_results_df[abs(daa_results_df$log2FoldChange) > 1, ]
}

# Then annotate these filtered pathways
filtered_annotations <- pathway_annotation(
  pathway = "KO",
  daa_results_df = significant_pathways, # or high_impact_pathways
  ko_to_kegg = TRUE
)
  1. Save intermediate results to avoid losing progress:
# Save annotations after each successful batch
saveRDS(pathway_annotation_df, "pathway_annotations_progress.rds")

# Load saved progress if needed
pathway_annotation_df <- readRDS("pathway_annotations_progress.rds")
  1. Check the quality of annotations:
# Summary of annotation completeness
annotation_summary <- data.frame(
  total_pathways = nrow(pathway_annotation_df),
  annotated = sum(!is.na(pathway_annotation_df$pathway_name)),
  unannotated = sum(is.na(pathway_annotation_df$pathway_name))
)

# Check which columns have the most missing values
na_by_column <- sapply(pathway_annotation_df, function(x) sum(is.na(x)))
print(na_by_column)
  1. If you're working with a specific subset of pathways, you can create a custom list:
# Example: Focus on metabolism-related pathways
metabolism_pathways <- pathway_annotation_df[
  grepl("metabolism|biosynthesis|degradation", 
        pathway_annotation_df$pathway_name, 
        ignore.case = TRUE), ]

Let me know if you need any clarification on these approaches or if you'd like to try a different strategy!

Best regards

@ShevchenkoAlla
Copy link
Author

Thank you so much, for the detailed explanations!

I have tried to get a heatmap, as you have suggested.

""""

Get top pathways by mean abundance

mean_abundances <- rowMeans(kegg_abundance)
top_pathways <- names(sort(mean_abundances, decreasing = TRUE)[1:50]) # Top 50 pathways

Subset the abundance data

filtered_abundance <- kegg_abundance[top_pathways, ]

Create a basic heatmap

library(pheatmap)
pheatmap(filtered_abundance,
scale = "row",
clustering_distance_rows = "correlation",
clustering_distance_cols = "correlation",
show_rownames = TRUE,
show_colnames = TRUE)
"""""""

It worked perfectly, but I have "k00067"s, which is confusing :) can I create a heatmap with the "pathways names", and write the order of my samples the way I want to compare them? I used to do something like this for a phyloseq obect

""
sample_data(ASV_table)$SampleID <-factor(sample_data(ASV_table)$SampleID, levels = c("1", "2", "3", "4"))
""
Will it worked here?

I have tried to use another object - pathway_annotation_df, wich containes a comarison of two layers, and have only 100 features, but it didn't worked.

"""

pheatmap(pathway_annotation_df,

  •      scale = "row",
    
  •      clustering_distance_rows = "correlation",
    
  •      clustering_distance_cols = "correlation",
    
  •      show_rownames = TRUE,
    
  •      show_colnames = TRUE)
    

Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
is.atomic(x) is not TRUE
In addition: There were 50 or more warnings (use warnings() to see the first 50)
"""""
when I checked warnings i got
"""

warnings()
Warning messages:
1: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
""""
Maybe because I have "NA" in the half of the pathway description column, not sure here.

About statistics, I have a question can I compare multiple groups? I want to see differences among the Layers, we have permafrost sample from 5 diffrent core layers. Column "Layer" is in metadata file.

I also tried next

"""
pathway_errorbar_plot <- pathway_errorbar(abundance = kegg_abundance,
daa_results_df = pathway_annotation_df,
Group = "Layers",
p_values_threshold = 0.05,
order = "pathway_class",
select = NULL,
p_value_bar = TRUE,
x_lab = "pathway_name")

The number of features with statistical significance exceeds 30, leading to suboptimal visualization. Please use 'select' to reduce the number of features.
Currently, you have these features: "ko05340", "ko00564", "ko00680", "ko00562", "ko00563", "ko00561", "ko00440", "ko04062", "ko04060", "ko00740", "ko04111", "ko04940", "ko05412", "ko04650", "ko00310", "ko00600", "ko04140", "ko04141", "ko04142", "ko00604", "ko04260", "ko03040", "ko04110", "ko05142", "ko04710", "ko00513", "ko04973", "ko00510", "ko05110", "ko04974", "ko04976", "ko01051", "ko00565", "ko00904", "ko00524", "ko00514", "ko00260", "ko00190", "ko05140", "ko00592", "ko00591", "ko00590", "ko00062", "ko00061", "ko00253", "ko04370", "ko04730", "ko04740", "ko00380", "ko00500", "ko05120", "ko04666", "ko04964", "ko04962", "ko04960", "ko04660", "ko00624", "ko00627", "ko00621", "ko00620", "ko00623", "ko00622", "ko00940", "ko00941", "ko01053", "ko00943", "ko00100", "ko00945".
You can find the statistically significant features with the following command:
daa_results_df %>% filter(p_adjust < 0.05) %>% select(c("feature","p_adjust"))
Error in pathway_errorbar(abundance = kegg_abundance, daa_results_df = pathway_annotation_df, :
"""""""'

The dataframe pathway_annotation_df, same object that wich containes a comarison of two layers, and have only 100 features. Picture attached.
Screenshot from 2024-11-21 17-26-10

Thank you so much for your help,
Best,
Alla

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants