Welcome to the IntLIM Shiny app!
The goal of this app is to provide a user-friendly platform for integrating multi-omics data. Specifically, the software finds analyte relationships that are specific to a given phenotype (e.g. cancer vs. non-cancer). For example, an analyte pair could show a strong correlation in one phenotype (e.g. cancer) and no correlation in the other (e.g. non-cancer).
More details can be found in our publication “IntLIM: integration using linear models of metabolomics and gene expression data”.
Getting started (loading in data)
Please make sure that the CSV input file and all files listed in it are in the same folder. Do not include path names in the filenames.
Users will need to input files for analyte levels for analyte type 1 (e.g. metabolite abundance data), analyte levels for analyte type 2 (e.g. gene expression data), sample meta-data, analyte type 1 meta-data (optional) and analyte type 2 meta-data (optional).
Users also need to input a CSV file named 'input.csv' with two required columns: 'type' and 'filenames'.
The CSV file is expected to have the following 2 columns and 6 rows:
- type,filenames
- analyteType1,myfilename
- analyteType2,myfilename
- analyteType1MetaData,myfilename (optional)
- analyteType2MetaData,myfilename (optional)
- sampleMetaData,myfilename
NOTE: For the Shiny app, this CSV file must be named 'input.csv'.
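As an illustration, the input file described above could be written from R as follows. This is a minimal sketch: the data file names (metabolites.csv, genes.csv, and so on) are hypothetical placeholders, and a temporary directory stands in for the folder that holds your data files.

```r
# Hypothetical file names; replace them with the names of your own files.
input <- data.frame(
  type = c("analyteType1", "analyteType2",
           "analyteType1MetaData", "analyteType2MetaData",
           "sampleMetaData"),
  filenames = c("metabolites.csv", "genes.csv",
                "metabolite_meta.csv", "gene_meta.csv",
                "samples.csv"),
  stringsAsFactors = FALSE
)

# A temporary directory is used here; in practice, write to the folder
# that contains the data files listed above.
out_dir <- tempdir()
write.csv(input, file.path(out_dir, "input.csv"),
          row.names = FALSE, quote = FALSE)

# Inspect the result: a header line followed by one line per file.
readLines(file.path(out_dir, "input.csv"))
```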
Note also that the input data files should be in a specific format:
- analyteType1: rows are analytes, columns are samples; the first row is assumed to have sample ids and these ids should be unique; the first column is assumed to have feature ids and those should be unique.
- analyteType2: rows are analytes, columns are samples; the first row is assumed to have sample ids and these ids should be unique; the first column is assumed to have feature ids and those should be unique.
- analyteType1MetaData: rows are analytes, features are columns
- analyteType2MetaData: rows are analytes, features are columns
- sampleMetaData: rows are samples, features are columns
NOTE: The first column of the sampleMetaData file is assumed to be the sample id, and those sample ids should match the first row of analyteType1 and analyteType2 (i.e. all sample ids in analyteType1 and analyteType2 must also be in the sampleMetaData file).
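The format requirements above can be checked before uploading. The sketch below uses small in-memory toy data; the objects type1, type2, sample_meta and the helper check_sample_ids are hypothetical and not part of IntLIM. With real files, you would first read each CSV (for example with read.csv, using the first column as row names).

```r
# Toy analyte data: rows are analytes, columns are samples.
type1 <- matrix(c(1.2, 3.4, 2.1, 0.5, 4.0, 1.1), nrow = 2,
                dimnames = list(c("met1", "met2"), c("S1", "S2", "S3")))
type2 <- matrix(c(5, 6, 7, 8, 9, 10), nrow = 2,
                dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))

# Toy sample metadata: the first column holds the sample ids.
sample_meta <- data.frame(
  id = c("S1", "S2", "S3", "S4"),
  phenotype = c("cancer", "cancer", "non-cancer", "non-cancer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: checks uniqueness of ids and reports any sample
# ids that are absent from the sample metadata.
check_sample_ids <- function(analyte_data, sample_ids) {
  ids <- colnames(analyte_data)
  list(
    unique_samples    = anyDuplicated(ids) == 0,
    unique_features   = anyDuplicated(rownames(analyte_data)) == 0,
    missing_from_meta = setdiff(ids, sample_ids)
  )
}

res1 <- check_sample_ids(type1, sample_meta[[1]])
res2 <- check_sample_ids(type2, sample_meta[[1]])
```

An empty missing_from_meta element means every sample in the analyte file has a matching row in the sample metadata. Extra samples in the metadata (S4 here) do not violate the requirement as stated.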
Test data
The package includes a reduced set of the original NCI-60 dataset. The location of the CSV input file for this test dataset can be found by running the following in the R console:
dir <- system.file("extdata", package="IntLIM", mustWork=TRUE)
csvfile <- file.path(dir, "NCItestinput.csv")
csvfile
Please see the vignette at https://github.com/ncats/IntLIM/tree/liz_dev/vignettes/IntLimVignette.Rmd for additional information.
Additional NCI-60 and breast cancer demo datasets can be found at https://github.com/ncats/IntLIM2.0ExtraDataVignettes.
Contact
If you have any questions, comments, or concerns about how to use IntLIM, please contact Ewy Mathe at ewy.mathe@nih.gov or Tara Eicher at tara.eicher@nih.gov.
Load Data
This step takes all the relevant CSV files as input, including the following (see About for more details):
- input.csv (required): contains the names of all input files (see About)
- analyteType1Data (required): rows are analytes of the first type, columns are samples; the first row is assumed to have sample ids and these ids should be unique; the first column is assumed to have feature ids and those should be unique.
- sampleMetaData (required): rows are samples, features are columns
- analyteType2Data (required): rows are analytes of the second type, columns are samples; the first row is assumed to have sample ids and these ids should be unique; the first column is assumed to have feature ids and those should be unique.
- analyteType1MetaData (optional): rows are analytes of the first type, features are columns
- analyteType2MetaData (optional): rows are analytes of the second type, features are columns
Filter Data (optional)
This step allows you to filter the data by a user-defined percentile cutoff.
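The quantity that the app filters on is not spelled out here. As a rough illustration only, the sketch below assumes that analytes are dropped when their mean level falls below the chosen percentile of all analyte means. The function filter_by_percentile and the toy matrix are hypothetical and do not reproduce IntLIM's own filtering code.

```r
# Hypothetical helper: keep analytes (rows) whose mean level is at or
# above the given percentile of all analyte means.
# ASSUMPTION: filtering on mean level; the app may use another criterion.
filter_by_percentile <- function(x, cutoff = 0.10) {
  analyte_means <- rowMeans(x, na.rm = TRUE)
  threshold <- quantile(analyte_means, probs = cutoff, na.rm = TRUE)
  x[analyte_means >= threshold, , drop = FALSE]
}

# Toy data: 10 analytes measured in 3 samples, with row means 1 to 10.
x <- matrix(rep(1:10, times = 3), nrow = 10,
            dimnames = list(paste0("a", 1:10), c("S1", "S2", "S3")))

# Drop analytes whose mean lies below the 25th percentile.
filtered <- filter_by_percentile(x, cutoff = 0.25)
rownames(filtered)
```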
Run IntLIM
This step fits a linear model for every pair of analytes and then plots the distribution of p-values.
The linear model fitted is 'a_i ~ a_j + p + a_j:p', where:
- a_i is the outcome analyte level (may be of type 1 or 2)
- a_j is the independent analyte level (may be of type 1 or 2)
- p is the phenotype (e.g. tumor vs non-tumor)
- a_j:p is the interaction between phenotype and independent analyte level
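For a single analyte pair, the model above can be sketched with base R's lm. The data here are simulated purely for illustration, and the sketch assumes that the p-value of interest is the one for the interaction term a_j:p. It does not reproduce the app's internal implementation.

```r
set.seed(1)

# Simulated data: 20 non-tumor and 20 tumor samples.
n <- 40
p <- factor(rep(c("non-tumor", "tumor"), each = n / 2),
            levels = c("non-tumor", "tumor"))
a_j <- rnorm(n)

# a_i tracks a_j in tumor samples only, so the relationship between the
# two analytes is phenotype-specific.
a_i <- ifelse(p == "tumor", 1.5 * a_j, 0) + rnorm(n, sd = 0.5)

# Fit a_i ~ a_j + p + a_j:p for this one pair.
fit <- lm(a_i ~ a_j + p + a_j:p)
coefs <- summary(fit)$coefficients

# ASSUMPTION: the interaction term's p-value is the statistic of interest.
interaction_p <- coefs["a_j:ptumor", "Pr(>|t|)"]
interaction_p
```

A small interaction p-value suggests that the slope relating a_i to a_j differs between the two phenotypes. Repeating the fit over every analyte pair gives the set of p-values whose distribution is plotted.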