Annotating KEGG compounds to pathways
To annotate a list of KEGG compounds with the KEGG pathways in which they are involved, I used the R package KEGGREST from Bioconductor.
library(KEGGREST)
So, having a list of KEGG compounds saved in a character vector like kegg_compounds, we use the function keggGet in batches of at most 10 compounds to annotate them, since keggGet returns at most 10 entries per call.
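As a minimal sketch of the batching step on its own, assuming nothing beyond base R, the vector can be split into chunks of at most ten identifiers before any query is sent. The helper name split_in_batches and the example identifiers are hypothetical and used only for illustration; they are not part of KEGGREST.

```r
# Hypothetical helper (not part of KEGGREST): split a character vector
# into consecutive batches of at most `size` elements.
split_in_batches <- function(x, size = 10) {
  if (length(x) == 0) return(list())
  unname(split(x, ceiling(seq_along(x) / size)))
}

# Made-up example: 23 identifiers formatted like KEGG compound IDs
example_ids <- sprintf("C%05d", 1:23)
batches <- split_in_batches(example_ids)

# Number of identifiers in each batch: 10 10 3
lengths(batches)
```

Each element of batches could then be passed to keggGet in turn, which avoids the manual index arithmetic at the cost of one extra helper.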
The following (rudimentary) code queries the database in batches of ten compounds and fills a list (pathways) in which it creates one entry per pathway and updates the field compounds with the compounds from kegg_compounds that belong to each pathway.
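# Pathways found so far, indexed by pathway ID
pathways <- list()
# Start index of each batch of (at most) ten compounds
sequence <- seq(1, length(kegg_compounds), by = 10)
for (ii in sequence) {
  jj <- min(ii + 9, length(kegg_compounds))
  # paste0() builds IDs like "cpd:C00022" without the space paste() would insert
  query <- keggGet(paste0("cpd:", kegg_compounds[seq(ii, jj)]))
  message("Query / ", ii, " - ", jj, " / ", length(query))
  for (kk in seq_along(query)) {
    # PATHWAY is a named vector: names are pathway IDs, values are pathway names
    for (id in names(query[[kk]]$PATHWAY)) {
      if (id %in% names(pathways)) {
        # Known pathway: add this compound, avoiding duplicates
        pathways[[id]]$compounds <- unique(c(pathways[[id]]$compounds, as.character(query[[kk]]$ENTRY)))
      } else {
        # New pathway: create its entry with the first compound
        pathways[[id]] <- list(
          name = as.character(query[[kk]]$PATHWAY[id]),
          id = id,
          compounds = as.character(query[[kk]]$ENTRY)
        )
      }
    }
  }
}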
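Once the list is filled, it is often easier to inspect as a table. The following is a sketch under assumptions rather than part of the original workflow: pathways_example is a toy stand-in with the same structure as pathways (its entries are illustrative, not actual query results), and the column names are my own choice.

```r
# Toy stand-in for the `pathways` list built above (illustrative entries,
# not actual query results)
pathways_example <- list(
  map00010 = list(name = "Glycolysis / Gluconeogenesis", id = "map00010",
                  compounds = c("C00022", "C00031")),
  map00020 = list(name = "Citrate cycle (TCA cycle)", id = "map00020",
                  compounds = "C00022")
)

# Build one row per pathway/compound pair; the scalar id and name are
# recycled across the compounds of each pathway
pathway_table <- do.call(rbind, lapply(pathways_example, function(p) {
  data.frame(pathway_id = p$id, pathway_name = p$name,
             compound = p$compounds, stringsAsFactors = FALSE)
}))
rownames(pathway_table) <- NULL
pathway_table
```

Replacing pathways_example with the real pathways list should give the same long-format table, which can then be filtered or merged with other annotation.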