Software

miMediation: An R package for performing mediation tests on microbiome data.

The package implements the phyloMed method, which tests for mediation effects in high-dimensional microbial composition. The method leverages the hierarchical phylogenetic relationships among microbial taxa to decompose the complex mediation model on the full microbial composition into multiple simple, independent local mediation models on subcompositions. The phyloMed function (a) performs the mediation test for the subcomposition at each internal node of the phylogenetic tree and pinpoints the mediating nodes with significant test p-values, and (b) combines all subcomposition p-values to assess the overall mediation effect of the entire microbial community.
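
The decomposition into subcompositions can be illustrated with a small sketch. It is not the miMediation code: the nested-tuple tree representation and the function names `leaf_total` and `node_subcompositions` are hypothetical, and the mediation test applied to each subcomposition is omitted. The sketch only shows how, at each internal node, the counts under each child clade can be turned into a local composition.

```python
def leaf_total(node, counts):
    """Sum the counts of all taxa (leaves) below a node."""
    if isinstance(node, str):  # a leaf is represented by its taxon name
        return counts[node]
    return sum(leaf_total(child, counts) for child in node)


def node_subcompositions(tree, counts):
    """Return one subcomposition per internal node, in pre-order.

    Each subcomposition holds the proportions of the node's total count
    that fall under each child clade. Nodes with a zero total give None.
    """
    result = []

    def visit(node):
        if isinstance(node, str):
            return
        totals = [leaf_total(child, counts) for child in node]
        grand = sum(totals)
        result.append([t / grand for t in totals] if grand > 0 else None)
        for child in node:
            visit(child)

    visit(tree)
    return result


# Hypothetical three-taxon tree: ((A, B), C)
tree = (("A", "B"), "C")
counts = {"A": 2, "B": 6, "C": 8}
subcompositions = node_subcompositions(tree, counts)
```

Here the root node compares the clade (A, B) against C, and the inner node compares A against B, so each internal node contributes one low-dimensional local model.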

References: Hong Q et al. 2023.

MTAGEI: An R package for multi-trait analysis of gene-by-environment interactions (GEI).

MTAGEI is a powerful, robust, and computationally efficient method for testing the interaction between a gene and environmental groups on multiple traits in large-scale datasets, such as the UK Biobank. The MTAGEI package has functions to (1) compute summary statistics from different types of input data and adjust for potentially overlapping samples; and (2) perform summary-statistics-based multi-trait tests of GEI, or joint tests of the genetic main effect and GEI, for both common and rare variants.

References: Luo L et al. 2023.

PSCAN: An R package that implements the protein-structure-guided scan (PSCAN) method for testing gene-level associations and identifying signal regions.

PSCAN is a set of gene-level association tests and signal variant detection methods that leverage the tendency of functional variants to cluster in 3D protein space. PSCAN methods are built upon flexibly shaped spatial scan statistics, with scan windows adaptively defined to accommodate diverse topologies of variant positions in protein space. PSCAN performs fast gene-level association tests by combining SNP-set-based testing p-values across windows using the Cauchy method. In addition, PSCAN implements an efficient search algorithm for detecting multiple signal regions in protein space.
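
As a rough illustration of combining window-level p-values, the sketch below applies the standard Cauchy combination rule: each p-value is mapped to a Cauchy variate, the variates are averaged with weights, and the average is converted back to a p-value. The function name `cauchy_combine` and the equal default weights are assumptions made for this example; the windows and weights that PSCAN actually uses are not described here.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Equal weights are used by default (an assumption for this sketch).
    """
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    # Under the null, tan((0.5 - p) * pi) follows a standard Cauchy law.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues))
    # Normalize by the total weight, then take the Cauchy upper tail.
    return 0.5 - math.atan(stat / sum(weights)) / math.pi


# Hypothetical p-values from three scan windows
window_pvalues = [0.01, 0.20, 0.50]
combined = cauchy_combine(window_pvalues)
```

A single small p-value dominates the statistic, so the combined p-value stays small when one window carries a strong signal.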

References: Tang ZZ et al. 2020.

MTAR: An R package to perform multi-trait analysis of rare-variant associations.

MTAR is a framework for joint analysis of association summary statistics between multiple rare variants and different traits, possibly from overlapping samples. MTAR tests leverage genetic correlations to accommodate a wide variety of association patterns across traits and variants and to enrich their associations. Numerical studies demonstrate that MTAR is substantially more powerful than single-trait-based tests and highlight the value of MTAR for novel pleiotropic gene discovery.

References: Luo L et al. 2020.

miLineage: An R package to perform association tests for microbial lineages on a taxonomic tree.

The miLineage package has functions that implement a variety of association tests for microbiome data. These functions allow users to (a) perform tests on multivariate taxon counts; (b) localize the covariate-associated lineages on the taxonomic tree; and (c) assess the overall association of the microbial community with the covariate of interest.

References: Tang ZZ et al. 2017, Tang ZZ & Chen G 2018, Tang ZZ & Chen G 2019.

miProfile: A C program for analyzing microbial community composition.

miProfile is a command-line program written in C that implements the distance-based method described in Tang et al. 2016a for testing the association of microbial community composition with any covariates of interest (e.g., environmental factors, disease status, clinical outcomes, treatment groups). miProfile can construct all commonly used distances, including unweighted UniFrac, weighted UniFrac, generalized UniFrac, presence-weighted UniFrac, Bray-Curtis distance, and Jaccard distance. Users can request any combination of these distances. miProfile produces a p-value for each requested distance and a unified p-value that combines all of them. The interface is simple and reads .biom and .tre files directly. The OTU abundance table and the distance matrices can be written as output and reused in subsequent runs.
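
For readers unfamiliar with these distances, the sketch below computes the two that do not need a phylogenetic tree, using their textbook definitions. It is an illustration only: the function names are hypothetical, the Jaccard distance is taken on presence/absence, and miProfile's own conventions (for example, whether counts are first converted to relative abundances) are not stated here.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def jaccard(x, y):
    """Jaccard distance on presence/absence: 1 - shared taxa / all taxa."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return 1.0 - len(present_x & present_y) / len(union)


# Hypothetical OTU counts for two samples over the same four OTUs
sample_a = [10, 0, 5, 5]
sample_b = [5, 5, 0, 10]
bc = bray_curtis(sample_a, sample_b)
jd = jaccard(sample_a, sample_b)
```

The UniFrac variants additionally weight each taxon by branch lengths on the tree, which is why the program takes a .tre file alongside the .biom table.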

References: Tang ZZ et al. 2016.

PreMeta: A C++ program to facilitate the exchange of information between four software packages for meta-analysis of rare-variant associations: MASS, RAREMETAL, MetaSKAT, and seqMeta.

PreMeta is a software program written in C++ that is designed to facilitate the exchange of information between four software packages for meta-analysis of rare-variant associations: MASS, RAREMETAL, MetaSKAT, and seqMeta. PreMeta has two related purposes: the first is to allow the use of different software packages within the same consortium, and the second is to eliminate the need to recalculate summary statistics when investigators join a new consortium that has adopted a different software package.

References: Tang ZZ et al. 2017, Tang ZZ & Lin DY 2015.

MASS: A C program for the meta-analysis of sequencing studies.

MASS is a command-line program written in C to perform fixed-effects (FE) and random-effects (RE) meta-analysis of sequencing studies by combining the score statistics from multiple studies. It implements three types of tests that encompass all commonly used association tests for rare variants, including the simple burden test, CMC test (Li and Leal, 2008), weighted sum statistic (Madsen and Browning, 2009), variable-threshold (VT) test (Price et al., 2010; Lin and Tang, 2011), C-alpha test (Neale et al., 2011), and SKAT (Wu et al., 2011). The input file can be generated by the accompanying software SCORE-Seq. This bundle of programs allows meta-analysis of sequencing studies in a statistically accurate, numerically stable, and computationally efficient manner.
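
The idea of combining score statistics across studies can be sketched for the simplest case: a single burden-type score per study under a fixed-effects model. The per-study scores and their variances are summed, and the resulting statistic is referred to a chi-square distribution with one degree of freedom. This is a minimal sketch under those assumptions, not the MASS implementation; the function name and input values are hypothetical.

```python
import math


def fixed_effects_score_test(scores, variances):
    """Fixed-effects meta-analysis of one-dimensional score statistics.

    scores[k] is the score statistic from study k and variances[k] is
    its variance estimate. Returns the chi-square statistic (1 df) and
    its p-value.
    """
    u = sum(scores)      # pooled score
    v = sum(variances)   # pooled variance
    stat = u * u / v
    # Upper tail of a chi-square with 1 df, written via the normal tail.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical summary statistics from three studies
stat, p_value = fixed_effects_score_test([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because only the scores and variances are needed, each study can share summary statistics without sharing individual-level data.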

References: Tang ZZ & Lin DY 2013, Tang ZZ & Lin DY 2014.

SCORE-SeqTDS: A C program for analyzing primary and secondary traits in sequencing studies under trait-dependent sampling.

SCORE-SeqTDS is a software program developed for analyzing primary and secondary quantitative traits under trait-dependent sampling. The primary trait is the trait that is used to select subjects for sequencing, and all other traits are treated as secondary. Each quantitative trait is related to a genetic variable and possibly covariates through a linear regression model. Both the maximum likelihood estimation (MLE) and standard least-squares (LS) methods are available. The MLE method properly accounts for trait-dependent sampling, whereas the LS method does not. The LS method is the ideal choice for random sampling and is approximately correct for analyzing secondary quantitative traits in case-control or case-only studies with rare diseases. SCORE-SeqTDS performs the LS analysis on secondary quantitative traits for random sampling, case-control sampling, and case-only sampling. For random sampling, all traits are treated as secondary (because the sampling does not depend on any particular trait).

References: Lin DY, Zeng D, & Tang ZZ 2013.

SCORE-Seq: A C program which implements score statistics for detecting disease associations with rare variants in sequencing studies.

SCORE-Seq is a command-line program that implements score statistics for detecting disease associations with rare variants in sequencing studies. The mutation information is aggregated across multiple variant sites of a gene through a weighted linear combination and then related to disease phenotypes through appropriate regression models. The weights can be constant or dependent on allele frequencies and phenotypes. The association testing is based on score statistics. The allele-frequency threshold can be fixed or variable. Statistical significance can be assessed by using an asymptotic normal approximation or resampling. The current release covers binary and continuous traits with arbitrary covariates under case-control and cross-sectional sampling.
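
The weighted aggregation and score test can be sketched for the simplest setting: a continuous trait, an intercept-only linear model with no other covariates, fixed weights, and the asymptotic approximation. The function name and toy data are hypothetical, and SCORE-Seq itself supports more general models, frequency-dependent weights, variable thresholds, and resampling.

```python
import math


def burden_score_test(genotypes, trait, weights):
    """Score test for a weighted burden of variants on a continuous trait.

    genotypes[i][j] is the allele count of subject i at variant j.
    Returns the chi-square statistic (1 df) and its asymptotic p-value.
    """
    n = len(trait)
    # Weighted linear combination of allele counts for each subject.
    burden = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    y_mean = sum(trait) / n
    s_mean = sum(burden) / n
    resid = [y - y_mean for y in trait]
    u = sum(r * s for r, s in zip(resid, burden))     # score statistic
    sigma2 = sum(r * r for r in resid) / n            # null residual variance
    v = sigma2 * sum((s - s_mean) ** 2 for s in burden)
    if v == 0:
        raise ValueError("no variation in the trait or the burden score")
    stat = u * u / v
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Toy data: four subjects, two variants, constant weights
genotypes = [[0, 0], [1, 0], [0, 1], [1, 1]]
trait = [0.0, 1.0, 1.0, 2.0]
stat, p_value = burden_score_test(genotypes, trait, weights=[1.0, 1.0])
```

In this setting the statistic equals the sample size times the squared correlation between the burden score and the trait, which gives a quick sanity check on the toy data.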

References: Lin DY & Tang ZZ 2011.

genoCN: An R package for identifying copy number states and genotype calls using SNP array data.

genoCN is software that simultaneously identifies copy number states and genotype calls. Different strategies are implemented for the study of Copy Number Variations (CNVs) and Copy Number Aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be shorter and more sparsely located in the genome than CNAs. genoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. genoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are run on both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue.

References: Sun et al. 2009.