Diffusion Maps

This protocol is extracted from research article:

Tetranucleosome Interactions Drive Chromatin Folding

**ACS Cent Sci**, May 7, 2021; DOI: 10.1021/acscentsci.1c00085

Procedure

Diffusion maps are a nonlinear manifold
learning technique that has found extensive application in generating
low-dimensional embeddings of high-dimensional molecular trajectories.^{41−43} Assuming that the distance metric used to compare pairs of configurational
microstates is a good proxy for short-time kinetic distance and that
the conformational dynamics over the state space may be approximated
as a diffusion process, the leading collective variables of the diffusion
map correspond to the large-scale, high-variance collective motions
of the system, and kinetically close configurational microstates are
embedded close together.^{24} We employ the
density-adaptive variant of diffusion maps, which we find to be particularly
useful for handling the large inhomogeneities in sampling densities
observed in our chromatin simulations.^{44} We provide a brief summary of the approach below, but direct the
reader to prior publications for mathematical and algorithmic details.^{24,41−43}

Pairwise distances *d*_{ij} between data points *x*_{i} and *x*_{j} in
our set are computed as the
RMSD between translationally and rotationally aligned nucleosomal
coordinates in frames *i* and *j* of
the simulation. A Gaussian kernel is applied to *d*_{ij} to construct a thresholded pairwise
distance matrix **A**,

$$A_{ij} = \exp\left(-\frac{d_{ij}^{2\alpha}}{2\epsilon}\right),$$

where ϵ
is the kernel bandwidth, which
defines the local neighborhood of each point, and α is a parameter
that globally rescales pairwise distances to smooth out large density
fluctuations between densely and sparsely sampled regions of configurational
state space.^{44} Matrix **A** is
then row-normalized to form the transition matrix,

$$\mathbf{M} = \mathbf{D}^{-1}\mathbf{A},$$

where **D** is a
diagonal matrix with elements

$$D_{ii} = \sum_{j} A_{ij}.$$

The transition matrix, **M**, is
then diagonalized to calculate its eigenvectors ψ_{i} and eigenvalues λ_{i}. By the Markov property, the top eigenvalue–eigenvector pair
(ψ_{0} = $\vec{1}$, λ_{0} = 1) is trivial,
corresponding to the steady-state distribution of a random walk. A
gap in the eigenvalue spectrum after the *k*th nontrivial
eigenvalue identifies the *k* leading eigenvectors,
which correspond to the leading high-variance nonlinear collective modes
of the system. Snapshot *i* of the molecular simulation
trajectory is embedded into these collective variables spanning the
so-called intrinsic manifold of the system under the mapping,

$$x_i \mapsto \left(\psi_1(i), \psi_2(i), \ldots, \psi_k(i)\right).$$

The ψ_{k} are
the leading nonlinear collective variables identified by the diffusion
map that correspond to the high-variance dynamical modes of the system
and are responsible for large-scale conformational rearrangements.
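
The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes the kernel form A_ij = exp(−d_ij^{2α}/(2ϵ)), uses Euclidean distances between synthetic 2D points as a stand-in for the nucleosomal RMSDs, and the function names `diffusion_map` and `spectral_gap`, the parameter values, and the gap heuristic are all hypothetical choices made for this example.

```python
import numpy as np


def diffusion_map(d, eps, alpha=1.0):
    """Return eigenvalues and right eigenvectors of M = D^-1 A, sorted by eigenvalue.

    d     : (n, n) symmetric matrix of pairwise distances (e.g. RMSDs)
    eps   : kernel bandwidth
    alpha : distance-rescaling exponent (alpha = 1 gives the standard Gaussian kernel)
    """
    d = np.asarray(d, dtype=float)
    # Assumed kernel form: A_ij = exp(-d_ij^(2*alpha) / (2*eps))
    A = np.exp(-d ** (2.0 * alpha) / (2.0 * eps))
    row_sums = A.sum(axis=1)  # diagonal elements D_ii
    # M = D^-1 A is similar to the symmetric matrix S = D^-1/2 A D^-1/2,
    # so we diagonalize S and map its eigenvectors back to those of M.
    inv_sqrt = 1.0 / np.sqrt(row_sums)
    S = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]  # descending eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs * inv_sqrt[:, None]  # right eigenvectors of M
    psi = psi / psi[0, 0]            # scale so the trivial eigenvector is all ones
    return evals, psi


def spectral_gap(evals, n_check=10):
    """Crude heuristic: k such that the largest gap follows the k-th nontrivial eigenvalue."""
    lead = evals[1:n_check + 2]
    gaps = lead[:-1] - lead[1:]
    return int(np.argmax(gaps)) + 1


# Synthetic stand-in data: two well-separated clusters of 2D points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                    rng.normal(3.0, 0.1, (20, 2))])
d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

evals, psi = diffusion_map(d, eps=1.0, alpha=0.5)
k = spectral_gap(evals)
embedding = psi[:, 1:k + 1]  # row i holds (psi_1(i), ..., psi_k(i))
print("leading eigenvalues:", np.round(evals[:4], 3))
print("number of retained eigenvectors k =", k)
```

Diagonalizing the symmetric conjugate of **M** rather than **M** itself is a common numerical convenience, since it guarantees real eigenvalues and allows a symmetric eigensolver to be used. In practice the spectrum would normally be inspected by eye instead of relying on an automatic gap heuristic.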

Free energy surfaces over the intrinsic manifold *G*(Ψ) are computed by collecting histogram approximations $\widehat{P}$(Ψ) to the observed distribution of configurational
microstates projected onto the *k* leading eigenvectors
Ψ = {ψ_{i}}_{i=1}^{k} and then inverting this distribution using the relation

$$\beta G(\Psi) = -\ln \widehat{P}(\Psi) + C,$$

where
β = 1/(*k*_{B}*T*) is
the inverse temperature and *C* is an arbitrary additive
constant that sets an absolute
free energy scale.^{45} By virtue of the interpretability
of the eigenvectors as the leading collective modes of the system,
the free energy surface constructed over the intrinsic manifold can
resolve both the metastable macrostates of the chromatin structure
and the interconversion pathways between them.^{24} Diffusion maps have already been used successfully to examine
the dynamics of DNA around histone proteins, thereby providing precedent
for our approach,^{43} but we note that we
could have employed tICA, VAMPnets, or SRVs in conjunction with Markov-state
models to identify kinetic microstates and macrostates.^{46−50} These approaches have the benefit of furnishing kinetic networks
without requiring that the assumption of diffusive dynamics be made.
In the present work, it is the structure and thermodynamics of the
metastable states that are of primary interest, as opposed to the
kinetic transition rates, and for this reason we favor the smooth,
continuous, and more structurally interpretable free energy surfaces
furnished by diffusion maps.
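
As an illustration of the histogram inversion used to build the free energy surface, the sketch below estimates β*G* = −ln $\widehat{P}$ + *C* on a grid. It is a hedged example under stated assumptions, not the authors' code: the function name `free_energy_surface`, the bin count, and the histogram range are hypothetical, and the input is a synthetic one-dimensional sample drawn from a standard normal distribution (for which β*G* should be close to ψ²/2 up to the additive constant) rather than a real diffusion map embedding.

```python
import numpy as np


def free_energy_surface(embedding, bins=30, value_range=None):
    """Estimate beta*G on a histogram grid over embedded coordinates.

    embedding   : (n,) or (n, k) array of diffusion-map coordinates
    bins        : number of histogram bins per dimension
    value_range : optional list of (low, high) tuples, one per dimension
    """
    embedding = np.asarray(embedding, dtype=float)
    if embedding.ndim == 1:
        embedding = embedding[:, None]
    # Histogram approximation to the probability density over the manifold.
    hist, edges = np.histogramdd(embedding, bins=bins,
                                 range=value_range, density=True)
    with np.errstate(divide="ignore"):
        beta_G = -np.log(hist)  # unsampled bins become +inf
    # Fix the additive constant C so that the global minimum is zero.
    beta_G -= beta_G[np.isfinite(beta_G)].min()
    return beta_G, edges


# Synthetic stand-in for a 1D embedding: samples from a standard normal.
rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, 200_000)
beta_G, edges = free_energy_surface(samples, bins=40,
                                    value_range=[(-4.0, 4.0)])
centers = 0.5 * (edges[0][:-1] + edges[0][1:])
for c, g in zip(centers[18:24], beta_G[18:24]):
    print(f"psi = {c:+.1f}   beta*G = {g:.3f}")
```

Multiplying the returned values by *k*_{B}*T* converts them to energy units. Setting the minimum to zero is only one convenient choice for the constant *C*.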

Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
