Notes: Gene Analyses (Web, YouTube, Copilot)

Amino acid code

Name	Single Letter Code	Multiple Letter Code	D-Amino Acid Code
Alanine	A	{ALA}	{d-ALA}
Arginine	R	{ARG}	{d-ARG}
Asparagine	N	{ASN}	{d-ASN}
Aspartic Acid	D	{ASP}	{d-ASP}
Cysteine	C	{CYS}	{d-CYS}
Glutamic Acid	E	{GLU}	{d-GLU}
Glutamine	Q	{GLN}	{d-GLN}
Glycine	G	{GLY}
Homocysteine		{HCY}	{d-HCY}
Histidine	H	{HIS}	{d-HIS}
Homoserine		{HSE}	{d-HSE}
Isoleucine	I	{ILE}	{d-ILE}
Leucine	L	{LEU}	{d-LEU}
Lysine	K	{LYS}	{d-LYS}
Methionine	M	{MET}	{d-MET}
Norleucine		{NLE}	{d-NLE}
Norvaline		{NVA}	{d-NVA}
Ornithine		{ORN}	{d-ORN}
Penicillamine		{PEN}	{d-PEN}
Phenylalanine	F	{PHE}	{d-PHE}
Proline	P	{PRO}	{d-PRO}
Serine	S	{SER}	{d-SER}
Threonine	T	{THR}	{d-THR}
Tryptophan	W	{TRP}	{d-TRP}
Tyrosine	Y	{TYR}	{d-TYR}
Valine	V	{VAL}	{d-VAL}
Pyroglutamate		{pGLU}	{d-pGLU}
Dinitrophenylation (LYS)		{Lys(Dnp)}
Phosphorylation (THR)		{pTHR}
Phosphorylation (SER)		{pSER}
Phosphorylation (TYR)		{pTYR}
Gamma-GLU		{gamma-GLU}	{d-gamma-GLU}
Citrulline		{CIT}
N-methylated ALA		{nme-ALA}
N-methylated Isoleucine		{nme-ILE}
N-methylated Leucine		{nme-LEU}
N-methylated Phenylalanine		{nme-PHE}
N-methylated Valine		{nme-VAL}
N-methylated Serine		{nme-SER}
N-methylated Threonine		{nme-THR}
N-methylated Tyrosine		{nme-TYR}
Alpha Amino-Butyric Acid		{alpha-ABA}
Isoaspartic Acid		{Beta-Asp}
Acetylation at alpha amine group		{Ac-LYS}
2-Methyl Alanine		{2-Me-ALA}
Oxamic Acid		{OXA}
Acetylation at the side chain		{Lys-Ac}
Methionine sulfoxide		{Met(O)}
Methionine sulfone		{Met(O)2}
Cyclopentylglycine		{Cpg}
Propargylglycine		{Pra}
1,2,3,4-Tetrahydroisoquinoline-3-carboxylic acid		{Tic}
Selenocysteine		{Sec}
Selenomethionine		{Se-Met}
Azido-Lysine		{Lys(N3)}
Beta-HomoLeucine		{Beta-HomoLeu}
Carboxyamidomethylated Cysteine		{Cys(Cam)}
Methylation at the side chain of Arginine		{Arg(Me)}
Arg(Me)2 asymmetrical		{ADMA}
Arg(Me)2 symmetrical		{SDMA}
Beta-Alanine		{Beta-Ala}
4-Aminobutyric acid		{GABA}
6-amino-hexanoic acid		{Ahx}
NH2-(PEG)2-CH2COOH		{PEG2}
NH2-(PEG)6-CH2CH2COOH		{PEG6}
NH2-(PEG)11-CH2COOH		{PEG11}
NH2-(PEG)12-CH2CH2COOH		{PEG12}
2,4-Diaminobutyric Acid		{Dab}
2,3-Diaminopropionic Acid		{Dap}
alpha-aminoisobutyric acid (alpha-methyl-alanine)		{AIB}
L-alpha-aminobutyric acid		{ABU}
Hydroxyproline		{HYP}
beta-cyclohexyl-L-alanine		{Cha}	{D-Cha}
Phenylglycine		{Phg}
Biotin Lysine		{Lys(biotin)}
1-Naphthylalanine		{L-1-NAL}	{d-1-NAL}
2-Pyridylalanine		{L-2-PAL}	{d-2-PAL}
4-Chlorophenylalanine		{L-4-CL-PHE}	{d-4-CL-PHE}
Cyclohexylglycine		{Chg}	{D-Chg}
2-Naphthylalanine		{L-2-Nal}	{D-2-Nal}
3-Pyridylalanine		{L-3-Pal}	{D-3-Pal}
4-Fluorophenylalanine		{L-4-F-Phe}	{D-4-F-Phe}
4-Iodophenylalanine		{L-4-I-Phe}	{D-4-I-Phe}
4-Nitrophenylalanine		{L-4-NO2-Phe}	{D-4-NO2-Phe}
4-Pyridylalanine		{L-4-Pal}	{D-4-Pal}

https://www.genscript.com/Amino_Acid_Code.html
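A small Python sketch of how the brace notation in the table can be generated from a one-letter sequence and split back into residue codes. The dictionary covers only the 20 standard residues copied from the table; `to_brace_notation` and `split_brace_notation` are hypothetical helper names, not part of any GenScript tool.

```python
import re

# Single-letter -> multiple-letter codes, copied from the table above
# (the 20 standard L-amino acids only; modified residues have no one-letter code).
ONE_TO_MULTI = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "E": "GLU", "Q": "GLN", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def to_brace_notation(seq: str) -> str:
    """Convert a one-letter sequence such as 'ACG' into '{ALA}{CYS}{GLY}'."""
    return "".join("{" + ONE_TO_MULTI[aa] + "}" for aa in seq.upper())


def split_brace_notation(text: str) -> list:
    """Split a brace-delimited sequence into its residue codes.

    Only the braces act as delimiters, so codes containing parentheses
    or digits (e.g. 'Met(O)2', 'd-PHE') are kept intact.
    """
    return re.findall(r"\{([^{}]+)\}", text)


print(to_brace_notation("ACG"))
print(split_brace_notation("{ALA}{d-PHE}{Met(O)2}"))
```

For example, `to_brace_notation("ACG")` gives `{ALA}{CYS}{GLY}`, and `split_brace_notation("{ALA}{d-PHE}{Met(O)2}")` gives `['ALA', 'd-PHE', 'Met(O)2']`.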

Kullback–Leibler (KL) divergence

https://www.youtube.com/watch?v=q0AkK8aYbLY

https://www.youtube.com/watch?v=SxGYPqCgJWM

https://www.youtube.com/watch?v=jmXDQUxgzsU
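As a quick reference for the videos above: for discrete distributions, \( D_{KL}(P \| Q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \). A minimal Python sketch, assuming both distributions are given as lists over the same outcomes and using the natural log (so the result is in nats); `kl_divergence` is a hypothetical helper name.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) = sum p(x) * log(p(x) / q(x)), in nats.

    Outcomes with p(x) == 0 contribute nothing; q(x) must be > 0 wherever p(x) > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same outcomes")
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            total += px * math.log(px / qx)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # D_KL(P || Q)
print(kl_divergence(q, p))  # D_KL(Q || P): a different value, KL is not symmetric
```

Each term is a log-ratio of an observed probability to a reference probability, weighted by the observed probability; the D-motif score discussed below uses the same kind of log-ratio for individual motifs.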

Copilot:

What D-motifs Are

D-motifs are amino acid pairs (or triplets) whose frequency deviates significantly from what you would expect if amino acids were randomly distributed. This deviation is quantified by the log-ratio:

\[ k(p, q, b^{(n)}) = \log \frac{p(b^{(n)})}{q(b^{(n)})} \]

Where:

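\( b^{(n)} = b_1 b_2 \cdots b_n \) is a motif of n consecutive amino acids.
\( p(b^{(n)}) \) is the observed probability of the motif.
\( q(b^{(n)}) \) is the expected probability assuming the amino acids occur independently, i.e. \( q(b^{(n)}) = p(b_1)\,p(b_2)\cdots p(b_n) \) (from Eq. 2.8).
\( k(p, q, b^{(n)}) \) is the log-ratio that measures how much more or less frequent the motif is than expected.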
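A minimal Python sketch of these quantities for pairs (n = 2). It assumes p is estimated by counting adjacent pairs and the single-residue probabilities by counting residues across the given sequences, with the natural log; the original paper's exact estimator may differ, and `pair_log_ratios` is a hypothetical name.

```python
import math
from collections import Counter


def pair_log_ratios(sequences):
    """Return {pair: k} for every adjacent amino acid pair observed in `sequences`.

    k = log(p(pair) / q(pair)), where
      p(pair) = observed frequency of the pair among all adjacent pairs,
      q(pair) = p(a1) * p(a2), the frequency expected under independence.
    Pairs that never occur have p = 0 (k undefined) and are left out.
    """
    residue_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        pair_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))

    n_residues = sum(residue_counts.values())
    n_pairs = sum(pair_counts.values())
    k = {}
    for pair, count in pair_counts.items():
        p = count / n_pairs
        q = (residue_counts[pair[0]] / n_residues) * (residue_counts[pair[1]] / n_residues)
        k[pair] = math.log(p / q)
    return k


print(pair_log_ratios(["ACAC"]))
```

In the toy input `"ACAC"`, A and C each have probability 0.5, so every pair has q = 0.25; AC occurs in 2 of 3 pairs and gets a positive k, as does CA (1 of 3), while the unseen pairs AA and CC are omitted.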

The (D, σ)-Criterion

To decide whether a motif is statistically significant, Shen et al. compare \( k(p, q, b^{(n)}) \) to the mean and standard deviation of k over all such motifs:

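D-positive: \( k(p, q, b^{(n)}) > \bar{k} + D\sigma \)

→ The motif occurs more often than expected, possibly due to mutation or functional selection.
D-negative: \( k(p, q, b^{(n)}) < \bar{k} - D\sigma \)

→ The motif occurs less often than expected, possibly because it is suppressed or structurally unfavorable.

Here \( \bar{k} \) and \( \sigma \) are the mean and standard deviation of k over all motifs of the same length.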

Shen et al. use D = 2, a common threshold in statistics: if k were normally distributed, about 95% of motifs would fall within two standard deviations of the mean.
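The (D, σ)-criterion can be sketched in Python as below. It assumes the log-ratios are already available as a dict mapping motif to k, uses the population standard deviation over all motifs, and uses strict inequalities; these choices and the name `classify_motifs` are assumptions, not details taken from the paper.

```python
import statistics


def classify_motifs(k, D=2.0):
    """Label each motif 'D-positive', 'D-negative' or 'neutral'.

    A motif is D-positive if k > mean + D * sigma and D-negative if
    k < mean - D * sigma, where mean and sigma (population standard
    deviation) are taken over all motifs in `k`.
    """
    values = list(k.values())
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    labels = {}
    for motif, value in k.items():
        if value > mean + D * sigma:
            labels[motif] = "D-positive"
        elif value < mean - D * sigma:
            labels[motif] = "D-negative"
        else:
            labels[motif] = "neutral"
    return labels


# Toy data (hypothetical): nine unremarkable motifs and one strong outlier.
k = {f"X{i}": 0.0 for i in range(9)}
k["CC"] = 10.0
print(classify_motifs(k)["CC"])  # D-positive: mean 1, sigma 3, threshold 7
```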

Mutation Implications

This connects directly to mutation detection:

If a motif is D-positive, it might result from a mutation that increases its frequency. This could be biologically meaningful, for example by stabilizing a protein or forming a disulfide bridge (e.g., CC).
If a motif is D-negative, it’s unlikely to occur naturally. So if it does appear, it might signal a rare mutation or an error.

For example:

CC is D-positive: occurs more than expected → could be a stabilizing mutation.
CM is D-negative: occurs less than expected → unlikely to be a natural mutation unless something disrupts the usual pattern.

Practical Use

This framework lets you:

Detect mutations by scanning for motifs that cross the D-threshold.
Classify proteins based on motif content.
Design sensors or diagnostics that look for these motifs as biomarkers.
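The first item above can be sketched as a simple scan: list the positions of flagged pairs in a sequence and report those that appear in a mutant but not in the wild type. The flagged set is assumed to come from an earlier (D, σ) classification; here `{"CM"}` is just the D-negative example from the text, and the sequences and function names are hypothetical.

```python
def flag_motifs(seq, flagged_pairs):
    """Return (position, pair) for each adjacent pair in `seq` found in `flagged_pairs`.

    Positions are 0-based indices of the first residue of the pair.
    """
    return [(i, seq[i:i + 2]) for i in range(len(seq) - 1)
            if seq[i:i + 2] in flagged_pairs]


def new_flags(wild_type, mutant, flagged_pairs):
    """Flagged pairs present in the mutant but not at the same position in the wild type."""
    before = set(flag_motifs(wild_type, flagged_pairs))
    return [hit for hit in flag_motifs(mutant, flagged_pairs) if hit not in before]


d_negative = {"CM"}  # hypothetical flagged set, from the CM example above
print(new_flags("ACKMLE", "ACCMLE", d_negative))  # [(2, 'CM')]
```

The K to C substitution at index 2 creates the D-negative pair CM, which is absent from the wild type, so it is reported as a candidate mutation site.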

To connect motif density to actual shape, you'd need:

Structural databases like PDB to map motifs onto 3D folds.
Structure prediction tools (e.g. PSIPRED for secondary structure, AlphaFold for 3D structure).
Domain annotation tools like InterPro or Pfam.

But the beauty of Shen et al.'s method is that it gives you a fast, sequence-only proxy for domain inference, which is especially useful when the structure is unknown.

PROSITE – A curated database of protein domains and motifs

CryoDomain – Uses cryo-EM density maps to identify domains without sequence alignment

Nature study on essential domains – Uses transposon mutagenesis to identify essential domains based on density and disruption pattern

Common Methods to Amplify and Read DNA/RNA:

PCR (Polymerase Chain Reaction): Amplifies specific DNA segments.
Sanger Sequencing: Classic method using chain termination.
Next-Generation Sequencing (NGS): High-throughput, massively parallel sequencing.
Nanopore Sequencing: Reads DNA/RNA by detecting changes in electrical current as molecules pass through a pore.
RNA-Seq: Specifically sequences RNA to study gene expression.

To understand how a mutation arose, and not only what it is, researchers combine:

Genomic context (e.g. mutation hotspots)
Environmental data (e.g. carcinogen exposure)
Functional assays (e.g. protein activity tests)
Evolutionary analysis (e.g. selective pressure)

Based on the article by Shiyi Shen, Bo Kai, Jishou Ruan, J. Torin Huzil, Eric Carpenter, and Jack A. Tuszynski on probabilistic analyses of amino acid motifs.

bazylevnik0
