
notes Gene Analyses (Web, YouTube, Copilot)

Amino acid code

Name | Single Letter Code | Multiple Letter Code | D-Amino Acid Code
Alanine | A | {ALA} | {d-ALA}
Arginine | R | {ARG} | {d-ARG}
Asparagine | N | {ASN} | {d-ASN}
Aspartic Acid | D | {ASP} | {d-ASP}
Cysteine | C | {CYS} | {d-CYS}
Glutamic Acid | E | {GLU} | {d-GLU}
Glutamine | Q | {GLN} | {d-GLN}
Glycine | G | {GLY} |
Homocysteine | | {HCY} | {d-HCY}
Histidine | H | {HIS} | {d-HIS}
Homoserine | | {HSE} | {d-HSE}
Isoleucine | I | {ILE} | {d-ILE}
Leucine | L | {LEU} | {d-LEU}
Lysine | K | {LYS} | {d-LYS}
Methionine | M | {MET} | {d-MET}
Norleucine | | {NLE} | {d-NLE}
Norvaline | | {NVA} | {d-NVA}
Ornithine | | {ORN} | {d-ORN}
Penicillamine | | {PEN} | {d-PEN}
Phenylalanine | F | {PHE} | {d-PHE}
Proline | P | {PRO} | {d-PRO}
Serine | S | {SER} | {d-SER}
Threonine | T | {THR} | {d-THR}
Tryptophan | W | {TRP} | {d-TRP}
Tyrosine | Y | {TYR} | {d-TYR}
Valine | V | {VAL} | {d-VAL}
Pyroglutamate | | {pGLU} | {d-pGLU}
Dinitrobenzylation (LYS) | | {Lys(Dnp)} |
Phosphorylation (THR) | | {pTHR} |
Phosphorylation (SER) | | {pSER} |
Phosphorylation (TYR) | | {pTYR} |
Gamma-GLU | | {gamma-GLU} | {d-gamma-GLU}
Citrulline | | {CIT} |
N-methylated ALA | | {nme-ALA} |
N-methylated Isoleucine | | {nme-ILE} |
N-methylated Leucine | | {nme-LEU} |
N-methylated Phenylalanine | | {nme-PHE} |
N-methylated Valine | | {nme-VAL} |
N-methylated Serine | | {nme-SER} |
N-methylated Threonine | | {nme-THR} |
N-methylated Tyrosine | | {nme-TYR} |
Alpha Amino-Butyric Acid | | {alpha-ABA} |
Iso Aspartic Acid | | {Beta-Asp} |
Acetylation at alpha amine group | | {Ac-LYS} |
2-Methyl Alanine | | {2-Me-ALA} |
Oxamic Acid | | {OXA} |
Acetylation at the side chain | | {Lys-Ac} |
Methionine sulfoxide | | {Met(O)} |
Methionine sulfone | | {Met(O)2} |
Cyclopentylglycine | | {Cpg} |
Propargylglycine | | {Pra} |
1,2,3,4-Tetrahydroisoquinoline-3-carboxylic acid | | {Tic} |
Selenocysteine | | {Sec} |
Selenomethionine | | {Se-Met} |
Azido-Lysine | | {Lys(N3)} |
Beta-HomoLeucine | | {Beta-HomoLeu} |
Carboxyamidomethylated Cysteine | | {Cys(Cam)} |
Methylation at the side chain of Arginine | | {Arg(Me)} |
Arg(Me)2 asymmetrical | | {ADMA} |
Arg(Me)2 symmetrical | | {SDMA} |
Beta-Alanine | | {Beta-Ala} |
4-Aminobutyric acid | | {GABA} |
6-amino-hexanoic acid | | {Ahx} |
NH2-(PEG)2-CH2COOH | | {PEG2} |
NH2-(PEG)6-CH2CH2COOH | | {PEG6} |
NH2-(PEG)11-CH2COOH | | {PEG11} |
NH2-(PEG)12-CH2CH2COOH | | {PEG12} |
2,4-Diaminobutyric Acid | | {Dab} |
2,3-Diaminopropionic Acid | | {Dap} |
alpha-aminoisobutyric acid (alpha-methyl-alanine) | | {AIB} |
L-alpha-aminobutyric acid | | {ABU} |
Hydroxy Proline | | {HYP} |
beta-cyclohexyl-L-alanine | | {Cha} | {D-Cha}
Phenylglycine | | {Phg} |
Biotin Lysine | | {Lys(biotin)} |
(L) 1-NAL | | {L-1-NAL} | {d-1-NAL}
(L) 2-PAL | | {L-2-PAL} | {d-2-PAL}
(L) 4-CL-PHE | | {L-4-CL-PHE} | {d-4-CL-PHE}
{Chg} | | {Chg} | {D-Chg}
{L-2-Nal} | | {L-2-Nal} | {D-2-Nal}
{L-3-Pal} | | {L-3-Pal} | {D-3-Pal}
{L-4-F-Phe} | | {L-4-F-Phe} | {D-4-F-Phe}
{L-4-I-Phe} | | {L-4-I-Phe} | {D-4-I-Phe}
{L-4-NO2-Phe} | | {L-4-NO2-Phe} | {D-4-NO2-Phe}
{L-4-Pal} | | {L-4-Pal} | {D-4-F-Phe}

https://www.genscript.com/Amino_Acid_Code.html


Kullback–Leibler (KL) divergence


https://www.youtube.com/watch?v=q0AkK8aYbLY

https://www.youtube.com/watch?v=SxGYPqCgJWM

https://www.youtube.com/watch?v=jmXDQUxgzsU


Copilot:

What D-motifs Are

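D-motifs are amino acid pairs (or triplets) whose observed frequency deviates significantly from what you'd expect if amino acids were distributed independently of one another. For a motif \( b^{(n)} \) of \( n \) residues, the deviation is quantified using:

\[ k(p, q, b^{(n)}) = \log \frac{p(b^{(n)})}{q(b^{(n)})} \]

Where:

  • \( p(b^{(n)}) \) is the observed probability of a motif.

  • \( q(b^{(n)}) \) is the expected probability assuming independence (from Eq. 2.8).

  • \( k(p, q, b^{(n)}) \) is the log-ratio that measures how much more or less frequent the motif is than expected: positive when the motif is over-represented, negative when it is under-represented.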
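As a rough illustration of the quantities above, here is a minimal Python sketch for pairs (n = 2). It assumes that p is estimated from counts of adjacent residue pairs and that q is the product of the two single-residue frequencies; the exact form of Eq. 2.8 is not reproduced in these notes, so that choice is an assumption. The function names `pair_log_ratios` and `kl_divergence` are made up for this example. The second function ties the log-ratio back to the Kullback–Leibler divergence, which is the p-weighted sum of k.

```python
from collections import Counter
from math import log


def pair_log_ratios(sequences):
    """Return {pair: (p, q, k)} for adjacent residue pairs.

    p: observed pair frequency.
    q: product of single-residue frequencies (independence assumption).
    k: natural log of p / q.
    """
    singles = Counter()
    pairs = Counter()
    for seq in sequences:
        singles.update(seq)
        # Overlapping adjacent pairs: "ACAC" -> "AC", "CA", "AC"
        pairs.update(seq[i:i + 2] for i in range(len(seq) - 1))

    n_single = sum(singles.values())
    n_pair = sum(pairs.values())

    result = {}
    for pair, count in pairs.items():
        p = count / n_pair
        q = (singles[pair[0]] / n_single) * (singles[pair[1]] / n_single)
        result[pair] = (p, q, log(p / q))
    return result


def kl_divergence(ratios):
    """KL divergence of p from q: the sum of p * log(p / q) over observed pairs."""
    return sum(p * k for p, _q, k in ratios.values())
```

Only pairs that actually occur are returned, so a pair with zero observations has no k value here; a real analysis would need to decide how to treat those.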

The (D, σ)-Criterion

To decide whether a motif is statistically significant, they compare \( k(p, q, b^{(n)}) \) to the mean \( \mu \) and standard deviation \( \sigma \) of \( k \) over all such motifs:

  • D-positive: \( k(p, q, b^{(n)}) > \mu + D\sigma \)

    → Motif occurs more often than expected. Possibly due to mutation or functional selection.

  • D-negative: \( k(p, q, b^{(n)}) < \mu - D\sigma \)

    → Motif occurs less often than expected. Possibly suppressed or structurally unfavorable.

They use D = 2, a common threshold in statistics: for a normal distribution, roughly 95% of values lie within two standard deviations of the mean.
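The criterion can be sketched in a few lines of Python. This is an illustration, not the authors' code: it assumes the k values have already been computed and are passed in as a dictionary, it uses the population standard deviation (the notes do not say whether the population or sample form is used), and the function name `classify_motifs` and the label "neutral" are invented for the example.

```python
from statistics import mean, pstdev


def classify_motifs(k_values, d=2.0):
    """Label each motif by comparing its k to mean +/- d * standard deviation.

    k_values: {motif: k} mapping of log-ratios.
    d: the D threshold (2 in the notes above).
    """
    mu = mean(k_values.values())
    sigma = pstdev(k_values.values())  # population form: an assumption

    labels = {}
    for motif, k in k_values.items():
        if k > mu + d * sigma:
            labels[motif] = "D-positive"
        elif k < mu - d * sigma:
            labels[motif] = "D-negative"
        else:
            labels[motif] = "neutral"
    return labels
```

With only a handful of motifs no value can sit two standard deviations from the mean, so the criterion is only meaningful over a full set such as all 400 possible pairs.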

Mutation Implications

You're absolutely right in connecting this to mutation detection:

  • If a motif is D-positive, it might be the result of a mutation that increases its frequency. This could be biologically meaningful, for example by stabilizing a protein or forming a disulfide bridge (e.g., CC).

  • If a motif is D-negative, it occurs less often than chance would predict. So if it does appear, it might signal a rare mutation or an error.

For example:

  • CC is D-positive: occurs more than expected → could be a stabilizing mutation. 

  • CM is D-negative: occurs less than expected → unlikely to be a natural mutation unless something disrupts the usual pattern. 

Practical Use

This framework lets you:

  • Detect mutations by scanning for motifs that cross the D-threshold. 

  • Classify proteins based on motif content. 

  • Design sensors or diagnostics that look for these motifs as biomarkers. 
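The first use, scanning a sequence for motifs that cross the D-threshold, might look like the sketch below. It assumes a precomputed mapping from pair to label ("D-positive" or "D-negative"); the function name `find_flagged_motifs` and the example sequence are hypothetical, while the CC and CM labels come from the examples in these notes.

```python
def find_flagged_motifs(sequence, labels):
    """Return (position, pair, label) for every flagged adjacent pair.

    labels: {pair: "D-positive" | "D-negative" | anything else}.
    Positions are 0-based indices of the first residue of the pair.
    """
    hits = []
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        label = labels.get(pair)
        if label in ("D-positive", "D-negative"):
            hits.append((i, pair, label))
    return hits


# Labels taken from the CC / CM examples in the notes.
example_labels = {"CC": "D-positive", "CM": "D-negative"}
print(find_flagged_motifs("ACCMA", example_labels))
```

A hit on a D-negative pair is only a prompt for a closer look, not evidence of a mutation on its own.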




To connect motif density to actual shape, you'd need:

  • Structural databases like PDB to map motifs onto 3D folds.

  • Secondary structure prediction tools (e.g. PSIPRED, AlphaFold).

  • Domain annotation tools like InterPro or Pfam.

But the beauty of Shen et al.'s method is that it gives you a fast, sequence-only proxy for domain inference, which is especially useful when the structure is unknown.

Other resources for identifying domains:

  • PROSITE – A curated database of protein domains and motifs

  • CryoDomain – Uses cryo-EM density maps to identify domains without sequence alignment

  • Nature study on essential domains – Uses transposon mutagenesis to identify essential domains based on density and disruption pattern

Common Methods to Extract and Read DNA/RNA:

  • PCR (Polymerase Chain Reaction): Amplifies specific DNA segments.

  • Sanger Sequencing: Classic method using chain termination.

  • Next-Generation Sequencing (NGS): High-throughput, massively parallel sequencing.

  • Nanopore Sequencing: Reads DNA/RNA by detecting changes in electrical current as molecules pass through a pore.

  • RNA-Seq: Specifically sequences RNA to study gene expression.


To understand how a mutation arose, and not just what it is, researchers combine:

  • Genomic context (e.g. mutation hotspots)

  • Environmental data (e.g. carcinogen exposure)

  • Functional assays (e.g. protein activity tests)

  • Evolutionary analysis (e.g. selective pressure)

Based on the article on probabilistic analysis by Shiyi Shen, Bo Kai, Jishou Ruan, J. Torin Huzil, Eric Carpenter, and Jack A. Tuszynski.
