TRAPT package

November 15, 2024

Title TRAPT: A multi-stage fused deep learning framework for transcriptional regulators prediction via integrating large-scale epigenomic data

Maintainer Guorui Zhang <mp798378522@gmail.com>

Description TRAPT is a multi-omics integration framework for inferring transcriptional regulator activity from a set of query genes. TRAPT uses a multi-stage fusion strategy to address the incomplete coverage of cis-regulatory profiles and TRBP problems. By using two-stage self-knowledge distillation to extract activity embeddings of regulatory elements, TRAPT predicts key regulators for sets of query genes through a fusion strategy.

TRAPT.CalcSampleRPMatrix module

TRAPT.CalcSampleRPMatrix.dhs2gene(args, sample)

Calculate the Epi regulatory potential score.

Parameters:

args (argparse.Namespace) – Global parameters.
sample (str) – Epi sample name.
vec (np.array) – Epi PRE score.

Returns:

The Epi sample name and its Epi-RP score.

TRAPT.CalcTRAUC module

class TRAPT.CalcTRAUC.CalcTRAUC(args, RP_Matrix_TR_Sample, w)

Bases: object

Calculate the area under the curve (AUC) for each TR curve.

args

Global parameters for TRAPT.

Type:: TRAPT.Tools.Args

RP_Matrix_TR_Sample

The sum of TR-RP scores and D-RP scores.

Type:: anndata.AnnData

w

U-RP scores.

Type:: np.array

Notes

The input is the RP matrix, and the calculation is performed as follows:

\[\text{I-RP} = (\text{TR-RP} + \text{D-RP}) \times \text{U-RP}\]
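To make the formula concrete, the NumPy sketch below applies it element-wise. It assumes (this is not stated above) that TR-RP and D-RP are TR-by-gene arrays of the same shape and that the U-RP scores w hold one value per gene, broadcast across every TR row. `integrated_rp` is a hypothetical helper, not part of TRAPT.

```python
import numpy as np


def integrated_rp(tr_rp, d_rp, u_rp):
    """I-RP = (TR-RP + D-RP) x U-RP.

    Assumption: tr_rp and d_rp are (n_TR, n_genes) arrays and u_rp is a
    length-n_genes vector that is broadcast across every TR row.
    """
    tr_rp = np.asarray(tr_rp, dtype=float)
    d_rp = np.asarray(d_rp, dtype=float)
    u_rp = np.asarray(u_rp, dtype=float)
    if tr_rp.shape != d_rp.shape:
        raise ValueError("TR-RP and D-RP must have the same shape")
    # Sum the two RP sources, then weight each gene column by its U-RP score.
    return (tr_rp + d_rp) * u_rp[np.newaxis, :]


tr = np.array([[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]])
d = np.array([[0.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
w = np.array([1.0, 2.0, 0.5])
irp = integrated_rp(tr, d, w)
```

Broadcasting w along the gene axis keeps the computation a single vectorized expression, which matters when the matrix has many TR rows.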

static get_auc(params)

Parallel computing module.

Parameters: i : int

The i-th row of the I-RP matrix.

j : int

Default is 0.

labels : np.array

Gene vector.

vec : np.array

I-RP vector.

Returns:: i, j, and the AUC score of the i-th TR.
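A per-row AUC like the one get_auc returns can be computed from ranks alone. The sketch below assumes params is the tuple (i, j, labels, vec), with labels marking query genes as 1 and background genes as 0, and uses the Mann-Whitney rank formula (ties receive average ranks). It is an illustration, not TRAPT's own implementation; `average_ranks` is a hypothetical helper.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size, dtype=float)
    start = 0
    while start < values.size:
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < values.size and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def get_auc(params):
    """Return (i, j, AUC) for one TR row of the I-RP matrix.

    Assumed layout: params = (i, j, labels, vec), where labels marks query
    genes with 1 and background genes with 0, and vec is the I-RP vector.
    """
    i, j, labels, vec = params
    labels = np.asarray(labels).astype(bool)
    vec = np.asarray(vec, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return i, j, float("nan")  # AUC is undefined without both classes
    # Mann-Whitney U divided by n_pos * n_neg equals the ROC AUC.
    rank_sum = average_ranks(vec)[labels].sum()
    auc = (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
    return i, j, float(auc)


print(get_auc((0, 0, [1, 1, 0, 0], [0.9, 0.8, 0.1, 0.2])))  # (0, 0, 1.0)
```

Returning i and j alongside the score lets results arriving out of order from a process pool be written back to the right row.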

iter_params(gene_vec, trunk)

Parallel parameter module.

Parameters: gene_vec : np.array

Gene vector.

trunk : int

Number of blocks.

Returns:: An iterator.
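A parameter iterator of this kind typically yields one work item per TR row so that a process pool can consume it lazily. The sketch below is a hypothetical stand-alone version: the I-RP matrix `irp` is passed explicitly instead of being read from the instance, rows are split into `trunk` roughly equal blocks with `np.array_split`, and each item matches the assumed (i, j, labels, vec) layout with j fixed at its default of 0.

```python
import numpy as np


def iter_params(irp, gene_vec, trunk):
    """Yield (i, j, labels, vec) tuples, one per TR row, block by block.

    Hypothetical: `irp` is passed in explicitly, and j is always 0.
    """
    irp = np.asarray(irp, dtype=float)
    labels = np.asarray(gene_vec)
    # Split the row indices into `trunk` blocks of near-equal size.
    for block in np.array_split(np.arange(irp.shape[0]), trunk):
        for i in block:
            yield int(i), 0, labels, irp[i]


irp = np.arange(12, dtype=float).reshape(4, 3)
params = list(iter_params(irp, [1, 0, 0], trunk=2))
print([p[0] for p in params])  # [0, 1, 2, 3]
```

Because it is a generator, the tuples can be passed straight to something like `multiprocessing.Pool.imap` without materializing every row at once.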

run()

TR AUC calculation module execution entry point.

Returns:: A pd.DataFrame of AUC scores for TRs.

TRAPT.CalcTRRPMatrix module

TRAPT.CalcTRRPMatrix.dhs2gene(params)

Calculate the TR regulatory potential score.

Parameters:

args (argparse.Namespace) – Global parameters.
sample (str) – TR sample name.
vec (np.array) – TR PRE score.

Returns:

The TR sample name and its TR-RP score.

TRAPT.CalcTRRPMatrix.str2bool(v)
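str2bool is listed without a description. Helpers with this name are commonly used as an argparse `type=` converter so that boolean options accept strings such as "True" or "no". The sketch below shows that common pattern; the accepted spellings and the `--use_kd` flag name are assumptions, not taken from TRAPT's source.

```python
import argparse


def str2bool(v):
    """Parse common yes/no spellings into a bool (assumed behavior)."""
    if isinstance(v, bool):
        return v
    value = v.strip().lower()
    if value in ("yes", "true", "t", "y", "1"):
        return True
    if value in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


# Hypothetical flag, used only to show the converter in an argparse parser.
parser = argparse.ArgumentParser()
parser.add_argument("--use_kd", type=str2bool, default=True)
print(parser.parse_args(["--use_kd", "False"]).use_kd)  # False
```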

TRAPT.CalcTRSampleRPMatrix module

class TRAPT.CalcTRSampleRPMatrix.CalcTRSampleRPMatrix(library='library', output='library', type='H3K27ac')

Bases: object

The D-RP model network aggregation module.

library

Path to the background library.

Type:: str

output

Output path, default is ‘library’.

Type:: str

type

Epigenomic data type: H3K27ac or ATAC.

Type:: str

run(): D-RP model network aggregation module execution entry point.

TRAPT.DLFS module

class TRAPT.DLFS.CustomSigmoid(*args, **kwargs)

Bases: Layer

call(x)

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class TRAPT.DLFS.FeatureSelection(args, data_ad, type)

Bases: object

TSFS(X, T)

Teacher-Student Feature Selection.

Parameters: X : np.array

Epi-RP matrix.

T : str

Input gene vector.

Returns:: Index values sorted by Epi sample weights, and Epi sample weights.

get_act(t=1)

U-RP teacher model activation function.

Parameters: t : float

Temperature value.

get_corr(v1, v2): Correlation calculation.

get_loss(): U-RP teacher model loss function.

run()

U-RP model execution entry point.

Returns:: A pd.DataFrame of U-RP scores for the query genes, and the selected sample information.

sort_by_group(vec): Grouping function.

train(X, y)

U-RP model training entry function.

Parameters: X : np.array

Epi-RP matrix.

y : str

Input gene vector.

Returns:: U-RP model.

class TRAPT.DLFS.SparseGroupLasso(l1=0.01, l2=0.01, groups=None)

Bases: Regularizer

get_config()

Returns the config of the regularizer.

A regularizer config is a Python dictionary (serializable) containing all configuration parameters of the regularizer. The same regularizer can be reinstantiated later (without any saved state) from this configuration.

This method is optional if you are just training and executing models, exporting to and from SavedModels, or using weight checkpoints.

This method is required for Keras model_to_estimator, saving and loading models to HDF5 formats, Keras model cloning, some visualization utilities, and exporting models to and from JSON.

Returns:: Python dictionary.
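The constructor arguments (l1, l2, groups) match the textbook sparse group lasso penalty, l1 * ||w||_1 + l2 * sum over groups g of sqrt(|g|) * ||w_g||_2, which zeroes out whole groups of weights as well as individual weights. The NumPy sketch below computes that penalty. It assumes groups is a list of index arrays and that groups=None disables the group term; both are assumptions about how SparseGroupLasso interprets its arguments.

```python
import numpy as np


def sparse_group_lasso_penalty(w, l1=0.01, l2=0.01, groups=None):
    """Textbook sparse group lasso penalty (assumed to mirror SparseGroupLasso).

    Assumption: groups is a list of index arrays into the flattened weights;
    groups=None means no group term is added.
    """
    w = np.asarray(w, dtype=float).ravel()
    penalty = l1 * np.sum(np.abs(w))  # element-wise sparsity (lasso)
    for idx in groups or []:
        idx = np.asarray(idx)
        # Group term, scaled by sqrt(group size) so large groups are not favored.
        penalty += l2 * np.sqrt(idx.size) * np.linalg.norm(w[idx])
    return float(penalty)


w = np.array([3.0, -4.0, 0.0, 0.0])
print(sparse_group_lasso_penalty(w, l1=0.1, l2=0.5, groups=[[0, 1], [2, 3]]))
```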

TRAPT.DLFS.seed_tensorflow(seed=2023)

TRAPT.DLVGAE module

class TRAPT.DLVGAE.CVAE(input_dim, condition_dim, h_dim, z_dim)

Bases: Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

kl_div()

predict_h(x)

reparametrize(mu, logstd)

Return type:: Tensor
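The names reparametrize(mu, logstd) and kl_div() suggest the standard VAE reparameterization trick and the closed-form KL divergence to a standard normal. The sketch below writes both in NumPy rather than PyTorch so it runs without a GPU stack; the optional `rng` argument and the per-row averaging in `kl_div` are assumptions, not details confirmed for CVAE.

```python
import numpy as np


def reparametrize(mu, logstd, rng=None):
    """z = mu + exp(logstd) * eps with eps ~ N(0, I) (standard VAE trick)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    logstd = np.asarray(logstd, dtype=float)
    eps = rng.standard_normal(mu.shape)  # noise is sampled outside the parameters
    return mu + np.exp(logstd) * eps


def kl_div(mu, logstd):
    """Mean over rows of KL(N(mu, exp(logstd)^2) || N(0, I))."""
    mu = np.asarray(mu, dtype=float)
    logstd = np.asarray(logstd, dtype=float)
    kl = -0.5 * np.sum(1 + 2 * logstd - mu ** 2 - np.exp(logstd) ** 2, axis=-1)
    return float(np.mean(kl))


rng = np.random.default_rng(2023)
mu = np.zeros((4, 2))
logstd = np.full((4, 2), -1.0)
z = reparametrize(mu, logstd, rng)
print(z.shape, kl_div(mu, logstd))
```

Sampling eps separately keeps z a differentiable function of mu and logstd, which is why the trick is used during training.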

training: bool

class TRAPT.DLVGAE.CalcSTM(RP_Matrix, type, checkpoint_path, device='cuda')

Bases: object

D-RP model network reconstruction module.

Parameters: RP_Matrix : TRAPT.Tools.RP_Matrix

TR-RP matrix and Epi-RP matrix.

type : str

Epi-RP type.

checkpoint_path : str

Model save path.

device : str, optional

cpu or cuda.

static get_cos_similar_matrix(m1, m2): Matrix cosine similarity calculation.
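A matrix cosine similarity compares every row of m1 with every row of m2 in one matrix product. The NumPy sketch below shows the usual formulation; treating zero-norm rows as similarity 0 (instead of NaN) is an assumption about how edge cases are handled.

```python
import numpy as np


def get_cos_similar_matrix(m1, m2):
    """Cosine similarity between every row of m1 and every row of m2."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    num = m1 @ m2.T  # pairwise dot products
    denom = np.linalg.norm(m1, axis=1)[:, None] * np.linalg.norm(m2, axis=1)[None, :]
    # Assumption: rows with zero norm get similarity 0 instead of NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.where(denom > 0, num / denom, 0.0)
    return sim


sim = get_cos_similar_matrix([[1, 0], [0, 1]], [[1, 0], [1, 1], [0, 0]])
print(sim.shape)  # (2, 3)
```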

get_edge_index(A, B, n=10)

Construct a heterogeneous network.

Parameters: A : anndata.AnnData

TR-RP matrix.

B : anndata.AnnData

Epi-RP matrix.

n : int

Number of nearest neighbors for each TR.

Returns:: TR-Epi heterogeneous network.
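One way to build such a TR-Epi network is to link each TR sample to its n most cosine-similar Epi samples. The sketch below does that in NumPy and returns a 2 x E edge_index array in the style used by PyTorch Geometric. The similarity measure, the one-directional edges, and the shared node numbering (TR nodes first, then Epi nodes) are all assumptions; `knn_edge_index` is a hypothetical name.

```python
import numpy as np


def knn_edge_index(A, B, n=10):
    """Connect each row of A (TR samples) to its n most similar rows of B.

    Assumption: TR nodes are numbered 0..len(A)-1 and Epi nodes
    len(A)..len(A)+len(B)-1, so both share one node index space.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Row-normalize so that the dot product equals cosine similarity.
    A_n = A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)
    B_n = B / np.clip(np.linalg.norm(B, axis=1, keepdims=True), 1e-12, None)
    sim = A_n @ B_n.T
    k = min(n, B.shape[0])
    # Column indices of the k most similar Epi samples for every TR row.
    top = np.argsort(-sim, axis=1, kind="stable")[:, :k]
    src = np.repeat(np.arange(A.shape[0]), k)
    dst = top.ravel() + A.shape[0]
    return np.vstack([src, dst])


A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edge_index = knn_edge_index(A, B, n=2)
print(edge_index)
```

Capping k at the number of Epi samples keeps the function valid when n exceeds the library size.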

init_cvae(): D-RP teacher model training function.

init_vgae(h, use_kd)

D-RP student model training function.

Parameters: h : torch.Tensor

Latent representation from the D-RP teacher model.

use_kd : bool

Whether to use knowledge distillation.

recon_loss(z, data, norm, weight): VGAE reconstruction loss.

run(use_kd=True): D-RP model network reconstruction module execution entry point.

save_graph()

static sparse_to_tensor(data, type='sparse')

class TRAPT.DLVGAE.InnerProductDecoderWeight(A_e, *args, **kwargs)

Bases: InnerProductDecoder

forward(z, edge_index=None, sigmoid=True)

Decodes the latent variables z into edge probabilities for the given node-pairs edge_index.

Parameters:

z (torch.Tensor) – The latent space \(\mathbf{Z}\).
sigmoid (bool, optional) – If set to False, does not apply the logistic sigmoid function to the output. (default: True)

Return type:

Tensor
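The base inner-product decoder scores a node pair (u, v) as sigmoid of the dot product of z_u and z_v. The NumPy sketch below reproduces that base computation only; how InnerProductDecoderWeight uses its extra A_e argument is not documented here, so any edge weighting is deliberately left out.

```python
import numpy as np


def inner_product_decode(z, edge_index, sigmoid=True):
    """Score each (src, dst) pair in a 2 x E edge_index by <z_src, z_dst>."""
    z = np.asarray(z, dtype=float)
    src, dst = np.asarray(edge_index)
    value = np.sum(z[src] * z[dst], axis=1)  # row-wise dot products
    return 1.0 / (1.0 + np.exp(-value)) if sigmoid else value


z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = np.array([[0, 0], [1, 2]])
print(inner_product_decode(z, edges, sigmoid=False))
```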

training: bool

class TRAPT.DLVGAE.VariationalGCNEncoder(in_channels, h_dim, z_dim)

Bases: Module

forward(x, edge_index)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

predict_h(x, edge_index)

training: bool

TRAPT.DLVGAE.seed_torch(seed=2023)

TRAPT.Run module

TRAPT.Run.main(): TRAPT method entry function.

TRAPT.Run.runTRAPT(args)

TRAPT execution entry function.

Parameters: args : TRAPT.Tools.Args

Global parameters for TRAPT.

Returns:: A pd.DataFrame of TR activity.

TRAPT.Run.str2bool(v)

TRAPT.Tools module

class TRAPT.Tools.Args(input, output, library='library', threads=16, trunk_size=32768, background_genes=6000, use_kd=True, tr_type='all', source='all')

Bases: object

TRAPT Global Parameters.

input

Input path for the gene set.

Type:: str

output

Output path for TRAPT results.

Type:: str

library

Path to the background library, default is the ‘library’ path in the current directory.

Type:: str, optional

threads

Number of processes used for TRAPT inference.

Type:: int, optional

trunk_size

Size of the chunks.

Type:: int, optional

background_genes

Number of background genes selected.

Type:: int, optional

use_kd

Whether to use knowledge distillation.

Type:: bool, optional

tr_type

all/tf/tcof/cr.

Type:: str, optional

source

all/cistrome/chip_altas/gtrd/remap/chip-atlas/remap/encode/geo.

Type:: str, optional

class TRAPT.Tools.RPMatrix(library, name, to_array=True)

Bases: object

add(data)

binarization()

get_data()

Return type:: AnnData

minmax_scale(axis=1)

norm(type='l2', axis=1)

standard_scale(axis=1)
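The RPMatrix methods minmax_scale, standard_scale, norm and binarization are listed without descriptions. The NumPy sketch below shows the conventional meaning of each operation along axis=1 (one row at a time). The guards for constant or all-zero rows and the binarization threshold of 0 are assumptions, and the functions are stand-alone hypotheticals rather than the RPMatrix methods themselves.

```python
import numpy as np


def minmax_scale(X, axis=1):
    """Rescale each row to [0, 1]; constant rows become 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=axis, keepdims=True)
    hi = X.max(axis=axis, keepdims=True)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)


def standard_scale(X, axis=1):
    """Z-score each row (population std); constant rows become 0."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=axis, keepdims=True)
    return (X - X.mean(axis=axis, keepdims=True)) / np.where(sd > 0, sd, 1.0)


def norm(X, type="l2", axis=1):
    """Divide each row by its L1 or L2 norm; all-zero rows are left unchanged."""
    X = np.asarray(X, dtype=float)
    n = np.linalg.norm(X, ord={"l1": 1, "l2": 2}[type], axis=axis, keepdims=True)
    return X / np.where(n > 0, n, 1.0)


def binarization(X, threshold=0.0):
    """Assumption: entries above the threshold become 1, the rest 0."""
    return (np.asarray(X) > threshold).astype(int)


X = np.array([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
print(minmax_scale(X))
```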

class TRAPT.Tools.RP_Matrix(library): Bases: object

class TRAPT.Tools.Type

Bases: object

ATAC = 'ATAC'

H3K27ac = 'H3K27ac'
