TRAPT package
November 15, 2024
Title TRAPT: A multi-stage fused deep learning framework for transcriptional regulators prediction via integrating large-scale epigenomic data
Maintainer Guorui Zhang <mp798378522@gmail.com>
Description TRAPT is a multi-omics integration framework designed for inferring transcriptional regulator activity from a set of query genes. TRAPT employs a multi-stage fusion strategy to address the issues of incomplete cis-regulatory profile coverage and TRBP problems. By leveraging two-stage self-knowledge distillation to extract the activity embedding of regulatory elements, TRAPT can predict key regulatory factors for sets of query genes through a fusion strategy.
TRAPT.CalcSampleRPMatrix module
- TRAPT.CalcSampleRPMatrix.dhs2gene(args, sample)
Calculate the Epi regulatory potential score.
- Parameters:
args – argparse.Namespace Global parameters.
sample – str Epi sample name.
vec – np.array Epi PRE score.
- Returns:
Epi sample name, and Epi-RP score.
TRAPT.CalcTRAUC module
- class TRAPT.CalcTRAUC.CalcTRAUC(args, RP_Matrix_TR_Sample, w)
Bases:
object
Calculate the area under the curve (AUC) for each TR curve.
- args
Global parameters for TRAPT.
- Type:
- RP_Matrix_TR_Sample
The sum of TR-RP scores and D-RP scores.
- Type:
anndata.AnnData
- w
U-RP scores.
- Type:
np.array
Notes
The input is the RP matrix, and the calculation is performed as follows:
\[IRP = (TRRP + DRP) \times URP\]
- static get_auc(params)
Parallel computing module.
- Parameters:
i – int The i-th row of the I-RP matrix.
j – int Default is 0.
labels – np.array Gene vector.
vec – np.array I-RP vector.
- Returns:
i, j, and the AUC score of the i-th TR.
- iter_params(gene_vec, trunk)
Parallel parameter module.
- Parameters:
gene_vec – np.array Gene vector.
trunk – int Number of blocks.
- Returns:
An iterator.
- run()
TR AUC calculation module execution entry point.
- Returns:
A pd.DataFrame of AUC scores for TRs.
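The formula in the Notes and the per-TR AUC step can be illustrated with a small NumPy sketch. It is not TRAPT's implementation. It assumes dense arrays with TRs in rows and genes in columns, a per-gene U-RP vector broadcast across rows, and a ROC AUC computed with the rank-sum identity. The names `integrate_rp` and `roc_auc` are hypothetical.

```python
import numpy as np

def integrate_rp(tr_rp, d_rp, u_rp):
    """I-RP = (TR-RP + D-RP) * U-RP; u_rp is broadcast over rows (assumed layout)."""
    return (np.asarray(tr_rp, dtype=float) + np.asarray(d_rp, dtype=float)) * np.asarray(u_rp, dtype=float)

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank-sum identity, using average ranks for ties."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("labels must contain both positive and negative genes")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        # Extend j to cover the run of tied scores starting at sorted position i.
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Toy data: 2 TRs x 4 genes; gene_vec marks the query genes.
tr_rp = np.array([[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.8]])
d_rp = np.zeros_like(tr_rp)
u_rp = np.ones(4)
gene_vec = np.array([1, 0, 1, 0])

irp = integrate_rp(tr_rp, d_rp, u_rp)
aucs = [roc_auc(gene_vec, row) for row in irp]
```

Each row of the I-RP matrix is scored independently, which is why the computation can be split into blocks and run in parallel.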
TRAPT.CalcTRRPMatrix module
- TRAPT.CalcTRRPMatrix.dhs2gene(params)
Calculate the TR regulatory potential score.
- Parameters:
args – argparse.Namespace Global parameters.
sample – str TR sample name.
vec – np.array TR PRE score.
- Returns:
TR sample name, and TR-RP score.
- TRAPT.CalcTRRPMatrix.str2bool(v)
TRAPT.CalcTRSampleRPMatrix module
- class TRAPT.CalcTRSampleRPMatrix.CalcTRSampleRPMatrix(library='library', output='library', type='H3K27ac')
Bases:
object
The D-RP model network aggregation module.
- library
Path to the background library.
- Type:
str
- output
Output path, default is ‘library’.
- Type:
str
- type
H3K27ac/ATAC.
- Type:
str
- run()
D-RP model network aggregation module execution entry point.
TRAPT.DLFS module
- class TRAPT.DLFS.CustomSigmoid(*args, **kwargs)
Bases:
Layer
- call(x)
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs – Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class TRAPT.DLFS.FeatureSelection(args, data_ad, type)
Bases:
object
- TSFS(X, T)
Teacher-Student Feature Selection.
- Parameters:
X – np.array Epi-RP matrix.
T – str Input genes vector.
- Returns:
Index values sorted by Epi sample weights, and Epi sample weights.
- get_act(t=1)
U-RP teacher model activation function.
- Parameters:
t – float Temperature value.
- get_corr(v1, v2)
Correlation calculation.
- get_loss()
U-RP teacher model loss function.
- run()
U-RP model execution entry point.
- Returns:
A pd.DataFrame of U-RP scores for query genes, and selected sample information.
- sort_by_group(vec)
Grouping function.
- train(X, y)
U-RP model training entry function.
- Parameters:
X – np.array Epi-RP matrix.
y – str Input genes vector.
- Returns:
U-RP model.
- class TRAPT.DLFS.SparseGroupLasso(l1=0.01, l2=0.01, groups=None)
Bases:
Regularizer
- get_config()
Returns the config of the regularizer.
A regularizer config is a Python dictionary (serializable) containing all configuration parameters of the regularizer. The same regularizer can be reinstantiated later (without any saved state) from this configuration.
This method is optional if you are just training and executing models, exporting to and from SavedModels, or using weight checkpoints.
This method is required for Keras model_to_estimator, saving and loading models to HDF5 formats, Keras model cloning, some visualization utilities, and exporting models to and from JSON.
- Returns:
Python dictionary.
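The source does not document the penalty that SparseGroupLasso applies. The sketch below shows the textbook sparse group lasso penalty in NumPy. It assumes that `groups` holds one integer group label per weight, and that `l1` scales the element-wise term while `l2` scales the group-wise term. The function name is hypothetical and TRAPT's regularizer may differ.

```python
import numpy as np

def sparse_group_lasso_penalty(weights, groups, l1=0.01, l2=0.01):
    """Textbook sparse group lasso: l1 * sum|w| + l2 * sum_g sqrt(p_g) * ||w_g||_2.

    `groups` is assumed to be one integer label per weight (an illustration,
    not TRAPT's documented format).
    """
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)
    lasso_term = np.abs(weights).sum()
    group_term = 0.0
    for g in np.unique(groups):
        w_g = weights[groups == g]
        # sqrt(p_g) keeps large groups from being favored over small ones.
        group_term += np.sqrt(w_g.size) * np.linalg.norm(w_g)
    return l1 * lasso_term + l2 * group_term

penalty = sparse_group_lasso_penalty([3.0, 4.0, 0.0], groups=[0, 0, 1], l1=1.0, l2=1.0)
```

The element-wise term drives individual weights to zero, while the group term can switch off a whole group of related samples at once.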
- TRAPT.DLFS.seed_tensorflow(seed=2023)
TRAPT.DLVGAE module
- class TRAPT.DLVGAE.CVAE(input_dim, condition_dim, h_dim, z_dim)
Bases:
Module
- forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- kl_div()
- predict_h(x)
- reparametrize(mu, logstd)
- Return type:
Tensor
- training: bool
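The reparametrize(mu, logstd) and kl_div() members are listed without descriptions. For orientation, here is a NumPy sketch of the standard variational autoencoder reparameterization trick and the Gaussian KL term. It assumes `logstd` is the log of the standard deviation and that the KL term is averaged over samples. The explicit `rng` argument is added for reproducibility and is not part of the documented signature.

```python
import numpy as np

def reparametrize(mu, logstd, rng):
    """Sample z = mu + eps * exp(logstd), with eps ~ N(0, I)."""
    mu = np.asarray(mu, dtype=float)
    logstd = np.asarray(logstd, dtype=float)
    eps = rng.standard_normal(mu.shape)
    return mu + eps * np.exp(logstd)

def kl_div(mu, logstd):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims and averaged over samples."""
    mu = np.asarray(mu, dtype=float)
    logstd = np.asarray(logstd, dtype=float)
    per_sample = -0.5 * np.sum(1.0 + 2.0 * logstd - mu ** 2 - np.exp(2.0 * logstd), axis=1)
    return per_sample.mean()

rng = np.random.default_rng(2023)
mu = np.zeros((4, 8))
logstd = np.zeros((4, 8))
z = reparametrize(mu, logstd, rng)  # shape (4, 8), drawn from N(0, I)
kl = kl_div(mu, logstd)             # zero when the posterior equals the prior
```

Sampling through `mu + eps * exp(logstd)` keeps the random draw outside the parameters, which is what lets gradients flow through `mu` and `logstd` in a real training loop.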
- class TRAPT.DLVGAE.CalcSTM(RP_Matrix, type, checkpoint_path, device='cuda')
Bases:
object
D-RP model network reconstruction module.
- Parameters:
RP_Matrix – TRAPT.Tools.RP_Matrix TR-RP matrix and Epi-RP matrix.
type – str Epi-RP type.
checkpoint_path – str Model save path.
device – str, optional cpu/cuda.
- static get_cos_similar_matrix(m1, m2)
Matrix cosine similarity calculation.
- get_edge_index(A, B, n=10)
Construct a heterogeneous network.
- Parameters:
A – anndata.AnnData TR-RP matrix.
B – anndata.AnnData Epi-RP matrix.
n – int Number of nearest neighbors for TR.
- Returns:
TR-Epi heterogeneous network.
- init_cvae()
D-RP teacher model training function.
- init_vgae(h, use_kd)
D-RP student model training function.
- Parameters:
h – torch.Tensor Latent representation of the D-RP teacher model.
use_kd – bool Utilize knowledge distillation.
- recon_loss(z, data, norm, weight)
VGAE reconstruction loss.
- run(use_kd=True)
D-RP model network reconstruction module execution entry point.
- save_graph()
- static sparse_to_tensor(data, type='sparse')
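The two network-construction helpers, get_cos_similar_matrix and get_edge_index, can be pictured with the NumPy sketch below. It assumes dense row-vector inputs and that each TR is linked to its `n` most cosine-similar Epi samples. The function names and the 2 x E edge layout are illustrative choices, not TRAPT's code.

```python
import numpy as np

def cos_similar_matrix(m1, m2, eps=1e-12):
    """Pairwise cosine similarity between the rows of m1 and the rows of m2."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    n1 = np.linalg.norm(m1, axis=1, keepdims=True)
    n2 = np.linalg.norm(m2, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows.
    return (m1 @ m2.T) / np.maximum(n1 * n2.T, eps)

def top_n_edges(sim, n=10):
    """Return a 2 x E array of (row, column) edges to each row's n most similar columns."""
    n = min(n, sim.shape[1])
    neighbors = np.argsort(-sim, axis=1)[:, :n]
    rows = np.repeat(np.arange(sim.shape[0]), n)
    return np.stack([rows, neighbors.ravel()])

tr = np.array([[1.0, 0.0], [0.0, 1.0]])               # 2 TR profiles
epi = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])  # 3 Epi profiles
sim = cos_similar_matrix(tr, epi)
edges = top_n_edges(sim, n=1)
```

Keeping only the top `n` neighbors per TR yields a sparse bipartite graph instead of a dense similarity matrix.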
- class TRAPT.DLVGAE.InnerProductDecoderWeight(A_e, *args, **kwargs)
Bases:
InnerProductDecoder
- forward(z, edge_index=None, sigmoid=True)
Decodes the latent variables z into edge probabilities for the given node-pairs edge_index.
- Parameters:
z (torch.Tensor) – The latent space \(\mathbf{Z}\).
sigmoid (bool, optional) – If set to False, does not apply the logistic sigmoid function to the output. (default: True)
- Return type:
Tensor
- training: bool
- class TRAPT.DLVGAE.VariationalGCNEncoder(in_channels, h_dim, z_dim)
Bases:
Module
- forward(x, edge_index)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- predict_h(x, edge_index)
- training: bool
- TRAPT.DLVGAE.seed_torch(seed=2023)
TRAPT.Run module
- TRAPT.Run.main()
TRAPT method entry function.
- TRAPT.Run.runTRAPT(args)
TRAPT execution entry function.
- Parameters:
args – Global parameters for TRAPT.
- Returns:
A pd.DataFrame of TR activity.
- TRAPT.Run.str2bool(v)
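str2bool(v) is listed without a description. Helpers with this name are commonly used as an argparse `type=` converter so that flags such as `--use_kd false` parse to a real boolean. The sketch below shows that common pattern. It assumes TRAPT follows it, and the accepted spellings and the `--use_kd` flag are illustrative.

```python
import argparse

def str2bool(v):
    """Convert common textual spellings of true/false to a bool (assumed behavior)."""
    if isinstance(v, bool):
        return v
    value = str(v).strip().lower()
    if value in ("yes", "true", "t", "y", "1"):
        return True
    if value in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")

# Hypothetical command line: the flag name mirrors the use_kd parameter of Args.
parser = argparse.ArgumentParser(description="Sketch of a boolean CLI option")
parser.add_argument("--use_kd", type=str2bool, default=True)
parsed = parser.parse_args(["--use_kd", "false"])
```

Without such a converter, `type=bool` would treat any non-empty string, including "false", as True.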
TRAPT.Tools module
- class TRAPT.Tools.Args(input, output, library='library', threads=16, trunk_size=32768, background_genes=6000, use_kd=True, tr_type='all', source='all')
Bases:
object
TRAPT global parameters.
- input
Input path for the gene set.
- Type:
str
- output
Output path for TRAPT results.
- Type:
str
- library
Path to the background library, default is the ‘library’ path in the current directory.
- Type:
str, optional
- threads
Number of processes used for TRAPT inference.
- Type:
int, optional
- trunk_size
Size of the chunks.
- Type:
int, optional
- background_genes
Number of background genes selected.
- Type:
int, optional
- use_kd
Use knowledge distillation.
- Type:
bool, optional
- tr_type
all/tf/tcof/cr.
- Type:
str, optional
- source
all/cistrome/chip_altas/gtrd/remap/chip-atlas/remap/encode/geo.
- Type:
str, optional
- class TRAPT.Tools.RPMatrix(library, name, to_array=True)
Bases:
object
- add(data)
- binarization()
- get_data()
- Return type:
AnnData
- minmax_scale(axis=1)
- norm(type='l2', axis=1)
- standard_scale(axis=1)
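The scaling methods of RPMatrix are listed without descriptions. The NumPy sketch below shows what min-max scaling, standardization, and L2 normalization along an axis conventionally mean. It assumes dense arrays and leaves constant or all-zero rows unchanged in scale to avoid division by zero. The standalone functions are hypothetical stand-ins for the methods.

```python
import numpy as np

def minmax_scale(x, axis=1):
    """Rescale values along `axis` to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    low = x.min(axis=axis, keepdims=True)
    span = x.max(axis=axis, keepdims=True) - low
    return (x - low) / np.where(span == 0, 1.0, span)

def standard_scale(x, axis=1):
    """Shift to zero mean and scale to unit variance along `axis`."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / np.where(std == 0, 1.0, std)

def l2_norm(x, axis=1):
    """Divide by the Euclidean norm along `axis`."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.where(norm == 0, 1.0, norm)

rp = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 0.0]])
scaled = minmax_scale(rp)
standardized = standard_scale(rp)
normalized = l2_norm(rp)
```

With `axis=1`, each row (for example, one sample's regulatory potential profile) is scaled independently of the others.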
- class TRAPT.Tools.RP_Matrix(library)
Bases:
object