BRENDA content collaboration #2
Comments
Rik van Rosmalen has also written a BRENDA parser. Currently it dumps all of BRENDA into either a SQLite DB or a JSON file. One of the main issues right now is that BRENDA's download does not include a metabolite reference table or any cross-references. However, UniChem does cross-reference metabolites to BRENDA via InChI, and all of its data is open. This could make integration possible.
@Midnighter, we're finally starting to work on BRENDA. We're trying to determine whether BRENDA contains a record of the reaction associated with each K_cat and K_m (which SABIO-RK clearly displays). Neither the website nor the text file shows this information, but the BRENDA SBML output seems to contain it. I suspect that the SBML output contains inferred kinetic parameters, rather than directly measured kinetic constants. Do you know what information is encoded in the SBML output? Any code we write will be shared via this repo. We tried to use Rik's code. Unfortunately, it appears to be out of date with respect to the current format of the BRENDA text file.
I don't fully understand what you want to achieve. Given a specific Kcat or Km value, you want to list all reactions (by EC-code) that have this value? This should be possible with a SQL query, however, there are many reactions in BRENDA that specify Kcat and Km as ranges rather than fixed values. The same EC-code can also have different Kcat and Km values in different organisms, of course. I still haven't finished my BRENDA work as it was not high priority to me. I do have a branch that uses pyparsing to go over the flat file and it's quite promising. I can try to deliver a working version by the end of May. |
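A minimal sketch of the kind of SQL query meant here, assuming a hypothetical `kinetic_value` table in which every entry is stored as a range (`value_min`, `value_max`) and a single measurement has both bounds equal. The schema, the parameter labels, the organism names, and the numeric values are all invented placeholders, not the layout of any existing BRENDA parser.

```python
import sqlite3

# Hypothetical schema: one row per kinetic entry, ranges stored as two bounds.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE kinetic_value (
        ec_number TEXT,
        organism  TEXT,
        parameter TEXT,   -- 'KM' or 'KCAT' (assumed labels)
        substrate TEXT,
        value_min REAL,
        value_max REAL
    )
    """
)
# Placeholder rows for illustration only; these are not real BRENDA values.
rows = [
    ("2.7.4.8", "Organism A", "KM", "GMP", 0.02, 0.02),
    ("2.7.4.8", "Organism B", "KM", "GMP", 0.05, 0.30),
    ("1.1.1.1", "Organism A", "KM", "ethanol", 1.0, 1.0),
]
conn.executemany("INSERT INTO kinetic_value VALUES (?, ?, ?, ?, ?, ?)", rows)


def find_ec_numbers(conn, parameter, value):
    """Return (ec_number, organism) pairs whose stored range contains value."""
    query = """
        SELECT DISTINCT ec_number, organism
        FROM kinetic_value
        WHERE parameter = ? AND value_min <= ? AND value_max >= ?
        ORDER BY ec_number, organism
    """
    return conn.execute(query, (parameter, value, value)).fetchall()


matches = find_ec_numbers(conn, "KM", 0.1)
```

Storing fixed values as degenerate ranges keeps the lookup to a single `WHERE` clause. The organism column is returned because the same EC code can carry different values in different organisms.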
SABIO-RK contains information about the exact reaction associated with each measured kinetic parameter. In addition, SABIO-RK often presents pairs of kinetic parameters that were measured together (e.g., paired k_cat, K_m). In contrast, the BRENDA website, text file, and SOAP interface present coarser information. This is why we have preferred to work with SABIO-RK, even though SABIO-RK is also difficult to scrape. The BRENDA website only displays the EC number associated with each kinetic measurement, and the website doesn't present pairs of parameters. It appears that BRENDA annotates reactions more coarsely than SABIO-RK. However, BRENDA's SBML output suggests that the underlying BRENDA database might have finer-grained reaction information than what is presented in the BRENDA website, text file, and SOAP interface. We haven't found any documentation about the SBML output. We're trying to understand what those files mean, and if they are a way to pull more information out of BRENDA than what is provided in the text file. |
I have not found a way to reliably scrape all SBML output files from BRENDA, as I think this previously required paid access. It would, of course, be preferable to the terrible text format. With regard to the information that you are looking for: BRENDA gives entries for the K_cat value divided by the K_m value, for example,
So one could look at the matching K_m value (by protein and citation), in this case
FYI, this is for EC-code 2.7.4.8 and this specific entry is for
So that would give you what you are looking for? |
Basically, we're trying to infer the link between the I don't think the BRENDA text files provide enough information to reconstruct this.
This is what motivated us to look at the other BRENDA outputs, to try to extract this mapping out of BRENDA. |
I'll contact BRENDA to ask them about the SBML output. I can share what I learn. |
It would be super nice to just get a database dump rather than having to jump through so many hoops. |
I'm looking to understand if the text file lacks relationships between A database dump would be nice. Any format with this relational information would be an improvement. |
I still think it's possible to tell these apart, however, if you look at the comment in each entry.
There is only one entry in each section that has the same protein reference I'm not sure what you gain from the So if you start with I've only looked at a few examples, though, so I'm easily proven wrong. Also, it'd be painful to parse the information in this way so something structured is definitely preferable 👍 |
It shouldn't be this hard. Inferring the reaction associated with each
Okay, that's a clear counterexample. Let's see if you get a reply from BRENDA. I tried once some years back and never got an answer. I was probably not persistent enough. Given the way that the textual data is structured, I would definitely check a number of examples manually to see if the associations presented by BRENDA are correct...
FYI, I think the SBML output would also be difficult to use. It times out easily. You'd have to figure out how to make the queries small enough not to time out. One possibility is to iterate over each EC number and each organism.
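A rough sketch of that iteration, assuming the caller supplies a `fetch` function that performs a single request and raises `TimeoutError` when it times out. `fetch`, `iter_small_queries`, and `run_all` are hypothetical names for illustration; nothing here reflects an actual BRENDA endpoint or its parameters.

```python
from itertools import product


def iter_small_queries(ec_numbers, organisms):
    """Yield one (EC number, organism) query at a time so each request stays small."""
    for ec_number, organism in product(ec_numbers, organisms):
        yield {"ec_number": ec_number, "organism": organism}


def run_all(ec_numbers, organisms, fetch, max_retries=2):
    """Call fetch(query) for every pair, retrying when it raises TimeoutError.

    `fetch` is a caller-supplied (hypothetical) function. Pairs that still time
    out after the retries are returned so they can be split up further.
    """
    results, failed = {}, []
    for query in iter_small_queries(ec_numbers, organisms):
        key = (query["ec_number"], query["organism"])
        for _attempt in range(max_retries + 1):
            try:
                results[key] = fetch(query)
                break
            except TimeoutError:
                pass  # try again until the retry budget is used up
        else:
            # The loop ended without a successful fetch.
            failed.append(key)
    return results, failed
```

Returning the failed pairs separately makes it easy to see which queries are still too large.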
Also, the SBML output is missing some of the information from the HTML preview of the SBML.
The SBML does give insight into how to parse temperature and pH from the comments:
I'm looking into your suggestion about matching tuples of protein ids, comments, and references. This might work for pairing k_cats with K_ms, but I don't think it works for inferring the reaction associated with each k_cat/K_m. It doesn't look like these relationships have been encoded into the text file. While you can find pairs of entries with overlapping protein ids, substrates, comments, and references, it appears to be difficult to resolve the relationships unambiguously. I think trying to infer relationships is likely to produce false relationships that are not present in the underlying database. At least for our purposes, we're hesitant to add additional interpretation on top of the BRENDA data. In spite of these problems, I think BRENDA is doing exactly what you've suggested to build the SBML output. However, I think this is difficult to replicate because we don't know the details of how BRENDA's data is encoded into the text file.
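To make the ambiguity concrete, here is a small sketch of the tuple matching discussed above. It assumes the flat-file entries have already been parsed into records with protein ids, a substrate, and references. The `Entry` type, the function names, and all values are hypothetical placeholders. It only pairs k_cat entries with K_m entries and reports ambiguous candidates instead of resolving them; it does not infer reactions.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    protein_ids: frozenset  # assumed to come from the "#...#" part of an entry
    value: str
    substrate: str
    references: frozenset   # assumed to come from the "<...>" part of an entry


def match_keys(entry):
    """All (protein id, substrate, reference) triples an entry could match on."""
    return {(p, entry.substrate, r) for p in entry.protein_ids for r in entry.references}


def pair_entries(kcats, kms):
    """Split k_cat entries into unambiguous pairs and ambiguous or unmatched ones."""
    index = defaultdict(list)
    for i, km in enumerate(kms):
        for key in match_keys(km):
            index[key].append(i)
    paired, unresolved = [], []
    for kcat in kcats:
        candidates = sorted({i for key in match_keys(kcat) for i in index.get(key, [])})
        if len(candidates) == 1:
            paired.append((kcat, kms[candidates[0]]))
        else:
            # Zero or several candidates: report them, do not guess.
            unresolved.append((kcat, [kms[i] for i in candidates]))
    return paired, unresolved


# Placeholder data for illustration only; these are not real BRENDA values.
kms = [
    Entry(frozenset({1}), "0.02", "GMP", frozenset({3})),
    Entry(frozenset({2}), "0.05", "GMP", frozenset({4})),
    Entry(frozenset({2}), "0.30", "GMP", frozenset({4})),  # same key as the previous entry
]
kcats = [
    Entry(frozenset({1}), "12", "GMP", frozenset({3})),
    Entry(frozenset({2}), "40", "GMP", frozenset({4})),
]
paired, unresolved = pair_entries(kcats, kms)
```

In this toy input the first k_cat has exactly one matching K_m, while the second matches two K_m entries with the same protein id, substrate, and reference, so it ends up in `unresolved`.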
I got a response from the BRENDA team:
For Datanator, we're hesitant to infer false relationships. We want Datanator to be as free of interpretation as possible so that our downstream projects have as much control over the representation of experimental data as possible.
Thanks for the input. Any word on accessing all of the SBML output or another structured data set?
The BRENDA team didn't respond to my question about the SBML output. I suspect the reactions in the SBML output are inferred from common enzymes, comments, and references. I think the temperature and pH are also inferred by similar string pattern matching of the comments. There's no other more structured output available. In any case, this wouldn't have the missing relationships because they have never been recorded. If you're looking for a more structured dataset, I recommend SABIO-RK.
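For illustration, string pattern matching of this kind could look roughly like the sketch below. It assumes comments contain fragments such as `pH 7.5` and `25°C`; that format, the regular expressions, and the function name are my assumptions, and how BRENDA actually derives these values is not known. Ranges such as `pH 6.5-8.0` are deliberately returned as `None` rather than guessed, and negative temperatures are not handled.

```python
import re

# A pH value must not be part of a range ("6.5-8.0") or a longer number.
PH_RE = re.compile(r"\bpH\s*(\d+(?:\.\d+)?)(?!\.?\d|\s*[-–]\s*\d)")
# A temperature must not be the upper end of a range ("25-37°C").
TEMP_RE = re.compile(r"(?<![\d.\-–])(\d+(?:\.\d+)?)\s*°\s*C\b")


def parse_conditions(comment):
    """Return (pH, temperature in °C) found in a free-text comment, or None for each."""
    ph = PH_RE.search(comment)
    temp = TEMP_RE.search(comment)
    return (
        float(ph.group(1)) if ph else None,
        float(temp.group(1)) if temp else None,
    )


conditions = parse_conditions("pH 7.5, 25°C")
```

Anything the patterns cannot read unambiguously comes back as `None`, so interpretation is not added silently on top of the data.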
Hi,
I'm currently working on upgrading my parser for the BRENDA flat file download. I've implemented a few SQLAlchemy models that seemed fitting for the content. Is there any interest on your side in the content of BRENDA?