Challenges in Enzyme Design

(and How to Address Them)

Designing effective enzymes – especially novel de novo enzymes intended for therapeutic use – involves overcoming numerous scientific and engineering challenges. Below is an overview of key challenges in enzyme design, along with possible solution approaches for each. Each challenge is described, followed by strategies (focused on computational methods) that can help address it.

This page aims to become a comprehensive reference for enzyme design. If you identify additional challenges or solutions that are not included here, please send them to contact@fixaging.ai, so they can be reviewed and added.

Catalytic Efficiency and Active Site Design

Challenge: Natural enzymes are extraordinarily efficient catalysts, often accelerating reactions to near diffusion-controlled limits. In contrast, most designed enzymes have much lower activities – sometimes 6–8 orders of magnitude below those of natural enzymes. Achieving a high turnover rate and strong substrate affinity in a de novo enzyme is a formidable challenge because it requires precisely arranging catalytic residues and stabilizing the reaction's transition state. Early designs may show little or no detectable activity if the active site residues are not positioned optimally, or if the protein scaffold lacks the subtle dynamics that evolved enzymes use for catalysis. In short, matching the catalytic power of natural enzymes in a completely new protein design remains one of the hardest problems in enzyme engineering.

Solution Approaches: To bridge the activity gap, enzyme designers use advanced computational strategies to create and optimize active sites for higher catalytic performance. Modern AI-driven tools and physics-based models are employed to improve the positioning of key residues and enhance transition-state stabilization:

High-throughput in silico pipeline: Use an AI-driven design pipeline to generate a large virtual library of enzyme candidates and then filter them computationally. For example, thousands of candidate proteins can be created with generative tools (such as RFdiffusion to generate backbones and ProteinMPNN to design sequences for them), each containing a plausible active site. These are then screened in silico using techniques like transition-state docking, Rosetta energy scoring, molecular dynamics (MD) simulations, or quantum mechanics/molecular mechanics (QM/MM) calculations. Only the top-ranked 10–50 designs (those predicted to bind the substrate tightly and stabilize the transition state) are selected for wet-lab testing. This triage approach, used by leading protein design labs, improves the odds of finding a highly active enzyme while minimizing wet-lab experiments.
Theozyme-guided design: Define an ideal "theozyme" (theoretical enzyme) geometry for the reaction – i.e. the perfect arrangement of catalytic residues around the substrate's transition state – and then build the protein around it. By constraining the design to achieve this geometry, one can place nucleophiles, general acids/bases, or metal ions in exactly the right positions for catalysis from the start.
AI-generative active site modeling: Leverage state-of-the-art AI models that generate enzyme structures conditioned on the chemical reaction. For instance, new design tools (such as GENzyme, EnzyGen, or EnzymeFlow) use deep learning to propose protein backbones and active sites tailored to a given substrate or reaction mechanism. Because these models learn from thousands of known enzyme structures, they can produce novel designs with pre-optimized catalytic pockets, increasing the likelihood of high activity.
Structure-aware diffusion models: Apply protein structure diffusion models (like RFdiffusion) with added constraints to ensure proper catalysis. These models can be guided by known catalytic motifs or provided with a bound transition-state model during generation. By integrating a required reactive conformation into the design process, the resulting enzymes are more likely to align structurally with the intended chemical mechanism.
Fragment-based scaffold building: Construct enzyme candidates by combining fragments of natural protein backbones with idealized catalytic site motifs. Using fragments from real proteins (that are known to fold well) provides stable scaffolds, while grafting in a designed active-site configuration ensures the key residues are in place. This approach improves the chances that the enzyme will both fold correctly and present the catalytic residues in the right orientation.
Multiresidue catalytic networks: Design active sites with multiple coordinated catalytic residues (such as catalytic dyads or triads, reminiscent of those in natural enzymes). Incorporating several residues that work together (for example, a nucleophile, a general base, and a stabilizing residue) can greatly enhance the reaction rate. Modern design algorithms can explicitly include such networks, creating enzymes that use cooperative catalysis to approach the efficiency of natural enzymes.
Iterative AI-guided optimization: After an initial design is generated, apply iterative computational mutagenesis and optimization to boost its activity. Tools like ProteinMPNN can suggest mutations around the active site to improve binding or geometry, and machine-learning frameworks (e.g., Bayesian optimization methods such as BOLT or BoTorch) can guide the search through sequence space. In practice, one can simulate cycles of mutation and scoring, refining enzyme candidates until the models predict much higher k_cat/K_m values.
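The triage step behind the high-throughput and theozyme-guided strategies above can be sketched as a filter-then-rank pass. This is a minimal sketch under stated assumptions: the `Candidate` fields, the hypothetical theozyme distances in `THEOZYME`, and all score values are placeholders for whatever a real pipeline produces (e.g., Rosetta energies or transition-state docking scores, where lower is assumed to be better). It first discards designs whose catalytic atoms deviate too far from the theozyme geometry, then ranks the survivors by summing their per-metric ranks, which avoids having to weight scores that have different units.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical theozyme: ideal distances (Å) between pairs of catalytic atoms.
THEOZYME: dict[tuple[str, str], float] = {
    ("nucleophile", "general_base"): 3.0,
    ("nucleophile", "oxyanion_hole"): 4.5,
}


@dataclass
class Candidate:
    name: str
    rosetta_score: float  # hypothetical total energy; lower assumed better
    ts_dock_score: float  # hypothetical transition-state docking score; lower assumed better
    atoms: dict[str, tuple[float, float, float]]  # catalytic atom name -> xyz (Å)


def geometry_rmsd(c: Candidate) -> float:
    """RMS deviation (Å) of catalytic-atom distances from the theozyme targets."""
    sq = [
        (math.dist(c.atoms[a], c.atoms[b]) - target) ** 2
        for (a, b), target in THEOZYME.items()
    ]
    return math.sqrt(sum(sq) / len(sq))


def ranks(values: list[float]) -> list[int]:
    """Rank position of each value (0 = lowest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for r, i in enumerate(order):
        out[i] = r
    return out


def triage(cands: list[Candidate], max_rmsd: float = 0.5, top_n: int = 50) -> list[Candidate]:
    # 1. Geometric filter: keep only designs that reproduce the theozyme.
    ok = [c for c in cands if geometry_rmsd(c) <= max_rmsd]
    if not ok:
        return []
    # 2. Rank aggregation: sum per-metric ranks so differing units don't matter.
    r_ts = ranks([c.ts_dock_score for c in ok])
    r_ros = ranks([c.rosetta_score for c in ok])
    combined = sorted(range(len(ok)), key=lambda i: r_ts[i] + r_ros[i])
    return [ok[i] for i in combined[:top_n]]


ideal = {"nucleophile": (0.0, 0.0, 0.0),
         "general_base": (3.0, 0.0, 0.0),
         "oxyanion_hole": (0.0, 4.5, 0.0)}
distorted = {**ideal, "general_base": (6.0, 0.0, 0.0)}
pool = [
    Candidate("design_1", -300.0, -12.0, ideal),
    Candidate("design_2", -280.0, -10.0, ideal),
    Candidate("design_3", -400.0, -20.0, distorted),  # best scores, wrong geometry
]
print([c.name for c in triage(pool, top_n=2)])  # ['design_1', 'design_2']
```

Note that `design_3` is dropped despite having the best scores: filtering on catalytic geometry before ranking keeps good-looking energies from masking a misplaced active site.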

These cutting-edge approaches are rapidly narrowing the performance gap between designed and natural enzymes. By generating and filtering designs in silico, it is increasingly feasible to create de novo enzymes with high catalytic turnover while reducing reliance on lengthy rounds of directed evolution in the lab.
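A stripped-down version of the iterative mutate-and-score loop described under "Iterative AI-guided optimization" is shown below. It is plain greedy hill climbing, not Bayesian optimization; a library such as BoTorch would replace the random proposals with an acquisition function over a probabilistic surrogate model. `toy_activity` and `TARGET` are hypothetical stand-ins for a learned predictor of k_cat/K_m, and `protected` holds catalytic positions that must not be mutated.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hill_climb(
    seq: str,
    score_fn: Callable[[str], float],
    protected: set[int],
    n_rounds: int = 500,
    seed: int = 0,
) -> tuple[str, float]:
    """Greedy in-silico mutagenesis: propose one random point mutation per
    round at an unprotected position and keep it only if the score improves."""
    rng = random.Random(seed)
    mutable = [i for i in range(len(seq)) if i not in protected]
    best, best_score = seq, score_fn(seq)
    for _ in range(n_rounds):
        i = rng.choice(mutable)
        cand = best[:i] + rng.choice(AMINO_ACIDS) + best[i + 1:]
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


# Hypothetical surrogate standing in for a learned activity model:
# it simply rewards agreement with an arbitrary "optimal" sequence.
TARGET = "HKLDEHKLDE"


def toy_activity(s: str) -> float:
    return float(sum(a == b for a, b in zip(s, TARGET)))


start = "A" * 10
best, score = hill_climb(start, toy_activity, protected={0})
print(best, score)
```

Because position 0 is protected, it stays unchanged even though the toy score would reward mutating it; in a real pipeline this is how catalytic residues are kept fixed while the surrounding sequence is optimized.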

Protein Stability vs. Degradability

Challenge: A de novo enzyme must fold into a stable, functional structure, but it also should not be so stable that it evades the cell's normal recycling systems. On one hand, insufficient stability can lead to misfolding or denaturation under physiological conditions, meaning the enzyme would quickly lose activity in the cell. On the other hand, an overly stable protein might resist proteolytic degradation and persist indefinitely inside cells. If a therapeutic enzyme cannot be broken down by proteasomes or lysosomes, it could accumulate as "indigestible" material — ironically becoming a new form of intracellular junk. The challenge is to achieve a balance: the enzyme should be stable enough to perform its catalytic function reliably, yet still be susceptible to regulated degradation so that the cell can remove it when it's no longer needed.

Solution Approaches: Enzyme designers address this balance by tuning the protein's inherent stability through sequence and structure choices. Computational tools can predict stability changes from mutations, allowing fine control over the fold's strength while introducing features to enable degradation:

Use moderately stable scaffolds: Design or select protein scaffolds that are sufficiently stable to fold correctly at 37°C and function in the cell, but not ultra-stable. If starting with a very stable design, introduce a few flexibility-increasing or destabilizing mutations so the final enzyme can still be unfolded and degraded by cellular proteases when required. Gradual stability optimization (rather than maximizing stability outright) ensures the enzyme isn't completely impervious to the degradation machinery.
Introduce degron tags or cleavage sites: If a designed enzyme proves too long-lived, incorporate specific peptide sequences that signal for degradation. For example, adding a short degron motif like a PEST sequence (rich in proline, glutamic acid, serine, and threonine) can tag the enzyme for rapid turnover via the ubiquitin-proteasome system. Similarly, engineering in one or two recognition sites for cellular proteases (placed where they do not affect enzyme function) can help ensure that the protein is cleaved and recycled after fulfilling its role. These modifications can be planned into the sequence from the beginning using bioinformatics tools.
Thermostability tuning via computation: Predict the stability impact of mutations using software like FoldX, Rosetta ΔΔG predictions, or deep learning models (e.g., DeepDDG or protein language models). These tools allow designers to adjust the enzyme's thermal and conformational stability in silico. By scoring many variants, one can select those that maintain structure but have a slightly lower stability margin, making them easier for the cell to degrade.
Avoid overly stabilizing features: Certain structural features (e.g., multiple disulfide bonds, a highly hydrophobic core, extensive secondary structure packing) can make proteins extremely rigid and protease-resistant. Unless necessary for function, it's wise to avoid or minimize these in the design. The computational design process can be guided to favor soluble, protease-accessible regions on the protein surface – for example, ensuring some flexible loops or less-ordered regions that proteases can latch onto. This way, the enzyme is not too hardened against degradation.
In silico stability-versus-degradation screening: Incorporate predictions of degradation signals into the design pipeline. Machine-learning predictors (for example, ubiquitination-site predictors, or MHC-binding predictors such as NetMHCpan) can identify potential ubiquitination sites or MHC-binding peptides in the sequence. Designers can then modify those regions to either include a desirable degradation tag or remove an unwanted immunogenic or degradation-related signal. By scoring candidates for both stability and the presence of degradation motifs, one can computationally filter for enzymes that hit the sweet spot: they fold well but are not likely to accumulate.
Iterative refinement with simulations: Use molecular dynamics simulations and structure prediction (e.g., AlphaFold) in iterative loops to test how the enzyme behaves under cellular-like conditions. For example, an MD simulation can reveal whether the enzyme remains stable over time or shows signs of partial unfolding; this data helps in deciding whether to add stabilizing mutations. Conversely, simulations can identify extremely rigid regions that might need destabilizing tweaks. By iterating between sequence design and simulation, the enzyme's stability, and with it a rough expectation of its half-life in the cell, can be tuned computationally before any real-world testing.
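One way to make "a slightly lower stability margin" concrete is the two-state folding model, in which the equilibrium fraction of molecules that are transiently unfolded (and therefore more exposed to proteases) is K_U/(1 + K_U), with K_U = exp(−ΔG_unfold/RT). The sketch below assumes that model, a hypothetical wild-type ΔG_unfold, hypothetical predicted ΔΔG values, and the common convention that positive ΔΔG means destabilizing (used, for example, by FoldX; the convention of whichever tool is used should be checked). The 5–9 kcal/mol target window is an illustrative choice, not a recommended value.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 310.15     # 37 °C in kelvin


def fraction_unfolded(dg_unfold: float) -> float:
    """Two-state model: K_U = exp(-dG_unfold / RT); f_U = K_U / (1 + K_U)."""
    k_u = math.exp(-dg_unfold / (R * T))
    return k_u / (1.0 + k_u)


def select_variants(
    wt_dg_unfold: float,
    predicted_ddg: dict[str, float],
    window: tuple[float, float] = (5.0, 9.0),
) -> list[str]:
    """Keep variants whose predicted dG_unfold lands inside the target window.
    Assumes positive ddG = destabilizing, so dG_unfold(variant) = wt - ddG."""
    lo, hi = window
    return sorted(
        name for name, ddg in predicted_ddg.items()
        if lo <= wt_dg_unfold - ddg <= hi
    )


# Hypothetical numbers for illustration only.
wt = 12.0
ddgs = {"var_1": 1.0, "var_2": 4.5, "var_3": 8.0, "var_4": -1.0}
chosen = select_variants(wt, ddgs)
for name in chosen:
    dg = wt - ddgs[name]
    print(name, dg, f"{fraction_unfolded(dg):.1e}")
```

The exponential form is the point of the exercise: each ~1.4 kcal/mol of lost stability at 37 °C raises the unfolded fraction roughly tenfold, which is why modest destabilization can matter for proteolytic turnover.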

The overall goal is a protein that remains folded and active long enough to do its job, but which the cell can ultimately recognize and recycle via normal pathways. By computationally co-optimizing stability and degradability, it is possible to avoid both inactive, flimsy enzymes and overly persistent ones.

Substrate Specificity and Off-Target Effects

Challenge: Therapeutic enzymes intended to degrade specific intracellular "junk" must exhibit extremely high substrate specificity to avoid harming healthy molecules. Even subtle similarities between a target molecule and normal cellular components could cause an enzyme with a permissive active site to bind or cleave something it shouldn't. De novo designed enzymes haven't undergone millions of years of evolution to hone their specificity, so early designs may bind unintended substrates, especially if the active site is too broad or flexible. In a therapeutic context, any off-target activity could lead to serious side effects – for example, breaking down a critical structural protein or metabolite by mistake could be toxic. Therefore, a key challenge is ensuring each designed enzyme recognizes and acts on only the intended molecular target and nothing else. (A design philosophy can be to create one highly specific enzyme per type of damage; enzymes with broad activity can be avoided because they carry higher off-target risks and are harder to control safely.)

Solution Approaches: Achieving high specificity starts with careful active-site design and extensive computational screening for off-target interactions. Several strategies can be employed to make sure a designed enzyme will only bind and degrade the intended target molecule:

Precise active site shaping: Design the enzyme's binding pocket to uniquely complement the target molecule's features (shape, charge, hydrophobic pattern) so that no other molecule fits as well. Computer-aided docking studies and electrostatic surface analyses can guide modifications to ensure that if a molecule doesn't have the exact correct shape or functional groups, it won't bind strongly. Essentially, this builds a "lock" that only the right "key" can turn.
Multiple specific interactions: Incorporate a combination of binding interactions that collectively are unique to the target. For instance, the enzyme can be designed to make several hydrogen bonds and salt bridges to a specific arrangement of atoms on the target. If a potential off-target lacks even one of those interaction points, the binding affinity drops off sharply. By requiring a unique constellation of contacts between enzyme and substrate, the chance that other molecules will satisfy all of them simultaneously is drastically reduced.
Constraint-driven design for selectivity: Guide the generative design process with selectivity constraints from the outset. Modern protein design AIs can take into account a bound substrate or even be "told" to prioritize certain interactions. By using reaction-conditioned design tools (for example, instructing RFdiffusion or other models to build the enzyme around the substrate or a transition-state analog), the enzyme's structure can be intrinsically aligned to one target chemistry. This way, many potential off-target interactions are disfavored during generation rather than having to be filtered out afterwards.
In silico off-target screening: Before any actual deployment, virtually screen each candidate enzyme against libraries of other compounds (normal metabolites, proteins, lipids, etc.). Using high-throughput docking simulations or AI-based binding prediction, one can check whether the enzyme shows any tendency to latch onto molecules it shouldn't. Any design that shows significant binding to an off-target in these computational panels is eliminated or redesigned. This computational "safety panel" amounts to testing the enzyme against every relevant molecule in simulation, reducing (though not eliminating) the risk of unexpected off-target activity.
Molecular dynamics confirmation: Employ MD simulations to observe the enzyme-substrate interaction over time and under various conditions. These simulations can reveal if the enzyme's active site might flex open and accommodate other ligands inadvertently. By testing the stability and specificity of the enzyme-substrate complex in silico (including challenging it with potential competitors in the simulation), confidence can be gained that the enzyme will stay focused on its target in the dynamic cellular environment.
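The decision rule behind an in silico safety panel can be written down explicitly. The sketch below assumes each candidate comes with predicted binding free energies (kcal/mol, more negative = tighter binding) for the target and for a panel of off-targets; the off-target names, values, and the 2.8 kcal/mol cutoff are hypothetical. At 37 °C a free-energy gap ΔΔG corresponds to roughly an exp(ΔΔG/RT)-fold equilibrium preference, i.e. about 1.4 kcal/mol per 10-fold and about 2.8 kcal/mol per 100-fold. Docking scores are only rough proxies for free energies, so margins like these are best treated as a coarse filter.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 310.15     # 37 °C in kelvin


def selectivity_margin(target_dg: float, off_target_dgs: dict[str, float]) -> float:
    """Gap (kcal/mol) between the tightest off-target binder and the target.
    Positive means the target binds more tightly than any off-target."""
    return min(off_target_dgs.values()) - target_dg


def fold_preference(margin_kcal: float) -> float:
    """Approximate equilibrium preference for the target: exp(margin / RT)."""
    return math.exp(margin_kcal / (R * T))


def passes_panel(target_dg: float, off_target_dgs: dict[str, float],
                 min_margin: float = 2.8) -> bool:
    """Hypothetical rule: require ~100-fold preference over every off-target."""
    return selectivity_margin(target_dg, off_target_dgs) >= min_margin


# Hypothetical predictions for one design against a small off-target panel.
target = -10.0
panel = {"metabolite_1": -6.0, "metabolite_2": -7.5, "protein_1": -5.0}
m = selectivity_margin(target, panel)
print(m, round(fold_preference(m)), passes_panel(target, panel))  # 2.5 58 False
```

Using the minimum over the panel means a single strong off-target binder is enough to fail a design, matching the "eliminate or redesign" rule above.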

By using these techniques, it is possible to maximize the likelihood that each designed enzyme binds only its intended target and catalyzes only the desired reaction. This minimizes the risk of side effects and makes the therapeutic approach as precise as possible, with much of the screening done computationally before any enzyme ever enters a cell.

Immunogenicity and Immune Response

Challenge: Introducing a novel protein into the human body – especially one that is wholly new or from a non-human source – runs the risk of the immune system recognizing it as "foreign." If a therapeutic enzyme is injected or produced in cells, the immune system might generate antibodies against it or mount a T-cell response. Inside cells, proteins are routinely broken down into peptides, and these fragments can be presented on MHC class I molecules on the cell surface. A de novo enzyme's peptide fragments might be unfamiliar to the immune system, potentially flagging the cell for destruction by cytotoxic T cells. Likewise, extracellular enzymes can be taken up by antigen-presenting cells, which display their fragments on MHC class II molecules to helper T cells that in turn drive antibody production. Immunogenicity has proven to be a major hurdle for many enzyme therapies: patients can experience allergic reactions or develop neutralizing antibodies that reduce efficacy. The challenge, therefore, is to design an enzyme that the immune system will tolerate – performing its job without provoking an unwanted immune attack.

Solution Approaches: Several strategies can be employed (during the design and delivery of the enzyme) to reduce immunogenicity. The focus can be on computational "de-immunization" of the protein sequence and smart delivery choices:

Sequence humanization (de-immunization): Use immunoinformatics tools (such as NetMHCpan or the IEDB analysis resource) to predict peptide segments of the enzyme that are likely to bind human MHC molecules and be seen by T cells. Those potentially immunogenic epitopes can then be altered through point mutations to disrupt their MHC binding, without changing the enzyme's function. By iterating this process – identify risky epitope, mutate, re-check – many of the immune-alerting features can be masked or removed from the protein. Additionally, scaffolds or amino acid motifs that are common in human proteins can be chosen, so that the enzyme "looks" more self-like to the immune system. This humanization process, now often assisted by AI, aims to make the engineered enzyme as immunologically bland as possible.
Incorporate non-natural amino acids: In some cases, replacing certain amino acids with chemically modified or D-amino acid versions can reduce immune recognition. For example, D-amino acids (the mirror images of the normal L-amino acids) are not readily processed by human proteases and thus yield fewer peptides for MHC presentation. Strategic placement of a few D-amino acids or other stabilizing modifications can make the enzyme less visible to immune surveillance. (This must be done carefully to ensure the enzyme still folds and works, and it applies only to enzymes that are made synthetically and delivered as protein, since ribosomes cannot incorporate D-amino acids; it is more of an experimental strategy, but it is an option for reducing immunogenic epitopes.)
Immune-privileged expression sites: Take advantage of the body's own tolerance mechanisms by choosing where and how the enzyme is delivered. Notably, the liver is an immune-tolerant organ; proteins expressed in the liver tend to induce immune tolerance rather than attack. In gene therapy approaches, if the DNA or mRNA for the enzyme is delivered specifically to the liver (for example, using an AAV vector that targets hepatocytes), the enzyme produced in those cells can teach the immune system to accept it as non-threatening. This phenomenon, a form of peripheral (rather than thymic, or central) tolerance induction, can become systemic – meaning once the immune system tolerates the enzyme from liver exposure, it's less likely to react to that enzyme anywhere in the body. (Some experimental therapies use this tactic to achieve long-term acceptance of a therapeutic protein.) Similarly, one could target other relatively immune-privileged sites or introduce the enzyme in a way that favors tolerance, although the liver is a prime choice for such strategies.
Stealth delivery and shielding: If delivering the enzyme as a drug (protein), formulation can mitigate immune detection. Encapsulating the enzyme in protective vehicles like nanoparticles, liposomes, or exosomes can conceal it from immediate recognition by immune cells and also control its release. Another proven approach is PEGylation – attaching polyethylene glycol chains to the enzyme's surface – which shields the protein from the immune system and can significantly reduce the formation of anti-drug antibodies. Many therapeutic enzymes and cytokines are PEGylated for this reason. Such chemical or nanoparticle coatings can be considered to make enzymes as "invisible" as possible to the immune system while in circulation.
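The identify, mutate, re-check loop of sequence de-immunization can be sketched as follows. A real pipeline would call a trained predictor such as NetMHCpan; here, purely as a hypothetical stand-in, `toy_a0201_score` checks only the two main anchor positions of the HLA-A*02:01 binding motif (Leu/Met at position 2, Val/Leu at the C-terminal position of a 9-mer). The lysine replacement residue and the `protected` set (e.g., catalytic positions) are illustrative assumptions; any real substitution would also need to be checked for effects on folding and activity.

```python
WINDOW = 9                   # typical MHC class I peptide length
P2_ANCHORS = set("LM")       # HLA-A*02:01 favors Leu/Met at position 2
PC_ANCHORS = set("VL")       # ...and Val/Leu at the C-terminal position


def toy_a0201_score(peptide: str) -> float:
    """Hypothetical stand-in for an MHC-binding predictor (0.0 to 1.0)."""
    return 0.5 * (peptide[1] in P2_ANCHORS) + 0.5 * (peptide[-1] in PC_ANCHORS)


def flag_epitopes(seq: str, score_fn=toy_a0201_score, threshold: float = 1.0) -> list[int]:
    """Start indices of 9-mer windows predicted to bind."""
    return [i for i in range(len(seq) - WINDOW + 1)
            if score_fn(seq[i:i + WINDOW]) >= threshold]


def deimmunize(seq: str, protected: set[int], score_fn=toy_a0201_score,
               threshold: float = 1.0, replacement: str = "K", max_iter: int = 10) -> str:
    """Identify -> mutate an unprotected anchor -> re-check, until clean or stuck."""
    res = list(seq)
    for _ in range(max_iter):
        flagged = flag_epitopes("".join(res), score_fn, threshold)
        if not flagged:
            break
        changed = False
        for start in flagged:
            if score_fn("".join(res[start:start + WINDOW])) < threshold:
                continue  # already disrupted by an earlier edit in this round
            for pos in (start + 1, start + WINDOW - 1):  # P2 anchor, then C-terminal anchor
                if pos not in protected:
                    res[pos] = replacement
                    changed = True
                    break
        if not changed:
            break  # remaining epitopes sit on protected positions
    return "".join(res)


seq = "GLAAAAAAVGGG"
print(flag_epitopes(seq))                 # [0]
print(deimmunize(seq, protected=set()))   # GKAAAAAAVGGG
print(deimmunize(seq, protected={1}))     # GLAAAAAAKGGG
```

When both anchors of a flagged window are protected, the loop stops and leaves the epitope in place, reflecting the trade-off in the text between removing epitopes and preserving function.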

By combining rational sequence design (to remove immune epitopes), choosing favorable expression sites, and using protective delivery methods, enzymes can be created that do their work without alerting the body's immune defenses.

Delivery and Targeting to Cells

Challenge: Even a brilliantly designed enzyme will fail as a therapy if it cannot reach the cells or organelles where its target resides. Delivering enzymes to the correct location in the body and into the interior of cells presents multiple sub-challenges:

Route of administration: If taken orally, protein enzymes face the harsh environment of the gastrointestinal tract – stomach acid and digestive enzymes will likely denature or digest them before any can be absorbed. Intravenous (IV) injection bypasses the GI tract, but then the enzyme must survive in the bloodstream and avoid rapid clearance. The kidneys can filter out and excrete small proteins, and the liver and spleen's reticuloendothelial system can capture and break down larger particles. Many enzymes injected into the blood have short half-lives (often only minutes to hours) unless specifically modified to evade degradation and clearance.
Cellular uptake: For intracellular targets, the enzyme must cross the cell membrane to get inside cells (and sometimes even further into a specific organelle like the lysosome or mitochondrion). Large proteins generally do not cross cell membranes on their own. Without a special mechanism, a therapeutic enzyme will mostly remain outside cells, where it cannot act on intracellular targets. Similarly, reaching certain organs like the brain is extremely challenging due to the blood–brain barrier (BBB), which prevents large molecules in the bloodstream from entering the brain tissue. If a target form of molecular damage is in the brain, delivering an enzyme there requires clever solutions to traverse or bypass the BBB.

Solution Approaches: Both protein-based delivery enhancements and gene-based delivery methods can be considered to ensure the enzyme reaches its site of action:

Protective formulations for oral delivery: Although oral delivery of proteins is very difficult, advanced formulation can help. Enteric coatings on capsules can prevent an enzyme from being released until it reaches the higher-pH environment of the intestines (bypassing the stomach). Enzyme formulations can also include protease inhibitors or be designed to be resistant to digestive enzymes to improve the chance that some active enzyme is absorbed. (For most therapies, injections or gene therapy rather than oral routes are anticipated, due to these challenges.)
Extending plasma half-life for IV delivery: If administering the enzyme directly into the bloodstream, modifications can be employed to help it circulate longer. One common technique is PEGylation (mentioned above), which not only helps with immunogenicity but also increases molecular size and shields degradation sites, thereby slowing kidney filtration and protease attack. Another approach is to fuse the enzyme to an antibody Fc domain or to an albumin-binding peptide – this can greatly increase serum half-life because IgG and albumin are both rescued from degradation and recycled by the neonatal Fc receptor (FcRn). These strategies give the enzyme more time in the bloodstream to find and enter target cells.
Cell-targeting and uptake mechanisms: To get enzymes into cells, specific "transport tags" can be added to the protein. One option is a cell-penetrating peptide such as TAT (from HIV) or penetratin, which can help drag large cargo across cell membranes. Another option is to harness receptor-mediated uptake: for example, attaching a ligand or antibody that targets a receptor on the surface of target cells. Many enzyme replacement therapies for lysosomal storage diseases rely on mannose-6-phosphate (M6P) sugar tags on the enzyme, which bind the M6P receptor on the cell surface; the receptor then internalizes the enzyme and routes it to lysosomes, where it works. Similarly, to cross the blood–brain barrier, an enzyme can be fused to an antibody or ligand that targets receptors on the BBB (like the transferrin or insulin receptor), allowing the enzyme to be ferried across into the brain via a process called transcytosis (a "Trojan horse" strategy). Such fusion tags can be explored so that each enzyme is delivered to the specific tissue or cellular compartment where it's needed.
Gene therapy delivery (DNA): Instead of delivering the enzyme protein, the gene that encodes the enzyme can be delivered using viral vectors (such as adeno-associated virus, AAV, or lentivirus). By packaging the enzyme's DNA sequence into a virus targeted to the tissue of interest (for example, AAV serotypes with a natural tropism for liver or muscle), the patient's own cells can be made to produce the enzyme. This approach has the advantage of continuous enzyme production inside cells, including proper localization if the gene includes an organelle targeting sequence. It also avoids repeated dosing – a one-time gene therapy could provide long-term enzyme supply. The challenge here is selecting a safe and efficient vector and controlling the expression level so it's effective but not harmful.
mRNA delivery: A newer modality is to deliver messenger RNA (mRNA) encoding the enzyme, typically in a lipid nanoparticle (LNP) formulation. mRNA can be injected and taken up by cells, which then translate it into the enzyme protein. This is the same technology behind the mRNA COVID-19 vaccines. For enzyme therapy, an LNP can be formulated to target certain organs (for instance, LNPs often accumulate in the liver by default). The mRNA approach can achieve intracellular production of the enzyme without the long-term persistence of DNA-based gene delivery – the protein is made for a period of time and the mRNA eventually degrades, which can be advantageous for dose control. mRNA-LNP delivery can be considered especially when transient expression is sufficient or when gene therapy is too risky.
Encapsulation and nano-delivery systems: Enzymes can be packaged into various carriers that aid in delivery. For example, encapsulating enzymes in biocompatible nanoparticles or polymers can protect them from degradation and help target them to specific cells (by functionalizing the nanoparticle with targeting molecules). Another intriguing approach is loading enzymes into engineered exosomes or even into resealed red blood cell ghosts, which can then circulate and deliver enzymes to certain tissues while evading the immune system. These advanced delivery vehicles can be designed so that once the enzyme-packages reach the target tissue, they release the enzyme into the extracellular space or even directly into cells.
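Why half-life extension matters for protein delivery can be seen with simple first-order elimination, C(t) = C0 · 2^(−t/t½). The sketch assumes a one-compartment model (real enzyme pharmacokinetics are often multi-phasic) and uses hypothetical half-lives to compare an unmodified enzyme with a half-life-extended version, such as one modified by PEGylation or Fc fusion.

```python
import math


def fraction_remaining(t_hours: float, half_life_hours: float) -> float:
    """First-order elimination: C(t)/C0 = exp(-ln(2) * t / t_half)."""
    return math.exp(-math.log(2) * t_hours / half_life_hours)


def time_above(c0: float, c_min: float, half_life_hours: float) -> float:
    """Hours until concentration falls from c0 to c_min: t_half * log2(c0 / c_min)."""
    return half_life_hours * math.log2(c0 / c_min)


# Hypothetical half-lives for illustration only.
for label, t_half in [("unmodified", 0.5), ("half-life extended", 24.0)]:
    print(label,
          f"{fraction_remaining(24, t_half):.2e} left after 24 h;",
          f"{time_above(100.0, 1.0, t_half):.1f} h above 1% of the starting level")
```

Because time above a threshold scales linearly with t½, a modification that extends half-life 48-fold also extends the useful exposure window 48-fold for the same dose.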

In summary, effective delivery often requires a combination of strategies: protecting the enzyme during transit, targeting it to the right location, and using the right modality (protein vs gene vs mRNA) for the situation. By integrating delivery considerations early in the design process, the likelihood that therapeutic enzymes will reach the intracellular sites of damage in a functional state can be improved.

Compatibility with Cellular Environment (pH & Cofactors)

Challenge: The inside of a cell (and particularly certain organelles) is a very specific chemical environment – and a designed enzyme needs to remain functional in those exact conditions. Key factors include pH, the presence of necessary cofactors or ions, the redox conditions, and the temperature. For example, many of the "junk" molecules that need to be degraded reside in lysosomes, which are acidic compartments (pH ~4.5–5.0). An enzyme that works well at neutral pH in a test tube might become sluggish or denatured in the acidic lysosome. Likewise, if an enzyme requires a cofactor (such as a metal ion or a vitamin-derived coenzyme) that is scarce in the target compartment, the enzyme will not be able to carry out its reaction efficiently in vivo. Another aspect of compatibility is ensuring that the enzyme's activity does not disrupt the cell: if breaking down the target produces harmful by-products (e.g. reactive oxygen species or toxic fragments), those need to be handled safely by the cell. In short, a challenge in enzyme design is adapting the enzyme to the cell's environment (or vice versa) so that it works optimally where it's supposed to, without causing collateral damage.

Solution Approaches: This can be addressed by designing enzymes with the target environment in mind and by planning for any necessary support the enzyme might need:

Design for the target pH: If an enzyme is intended to operate in an acidic organelle like the lysosome, features that confer acid stability and activity can be incorporated. This might include selecting or mutating amino acids that maintain favorable charge and structure at low pH (many natural lysosomal enzymes have evolved acid-stable folds). Computational tools can predict protonation states and simulate protein behavior at different pH values, guiding adjustments so that the enzyme remains properly folded and catalytically active under acidic conditions. Conversely, enzymes targeting the cytosol should be optimized for activity near neutral pH (about 7.2).
Match cofactor availability: It is preferable to design enzymes that use cofactors readily available in the target compartment rather than something exotic. Mg²⁺, for example, is abundant in the cytosol, whereas free Ca²⁺ is plentiful outside cells and in the endoplasmic reticulum but kept at very low levels in the cytosol. If a particular catalytic mechanism absolutely requires a less-common cofactor, ways to supply it can be considered. For instance, if using a metal like manganese or an uncommon cofactor, one solution might be co-delivering a supplement or engineering the cell to upregulate transport of that cofactor. Alternatively, modifying the enzyme's active site to use a different chemistry that doesn't rely on a scarce component can be explored. Computational enzyme design can sometimes swap out dependencies – for example, redesigning a metal-binding site to accept Mg²⁺ instead of Zn²⁺ if magnesium is more abundant in the target environment.
Temperature and conditions adaptation: Enzymes used as starting points may come from bacteria or extremophiles adapted to temperatures or salt conditions very different from those in the human body. Introducing mutations that mimic features of human proteins can help such enzymes function at 37°C and in physiological salt. If an enzyme is borrowed from a thermophilic organism (which might normally prefer 60°C), some of its stabilizing interactions can be softened so it remains dynamic enough at body temperature. These changes are guided by both intuition and software tools that predict activity profiles.
Plan for by-products and pathways: When an enzyme breaks down a target, it's important to consider what the breakdown products are and whether the cell can handle them. The degradation pathway can be analyzed: for example, if the enzyme cleaves a complex molecule into smaller pieces, are those pieces harmlessly excreted or further metabolized by the cell? If there's a risk of a harmful intermediate, a tandem system might be designed – either engineer the enzyme to perform the reaction in two steps (all in one protein) or co-deliver a second enzyme to immediately neutralize the by-product. In silico modeling of reaction pathways and cell metabolism can highlight such issues early.
Testing in simulated environments: Before moving to real cells, enzyme candidates can be tested in vitro under conditions that mimic the target compartment (e.g., in a test tube at pH 5.0 for a lysosomal enzyme, or in the presence of high oxidative stress if relevant). These compatibility tests, even when done computationally or in cell-free systems, help validate that the enzyme will fold and function as expected in the actual cellular milieu. Any design that fails under these conditions can be iteratively redesigned with computational feedback.
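A first-pass check for the "design for the target pH" strategy is to estimate how a sequence's net charge shifts between cytosolic and lysosomal pH. The sketch applies the Henderson–Hasselbalch equation to each ionizable group using approximate textbook pKa values; published tables differ, and real pKa values depend strongly on the local structural environment, which is why structure-based predictors such as PROPKA or constant-pH MD are used in practice. The example sequence is hypothetical.

```python
# Approximate free-amino-acid pKa values (assumed; tables vary by source).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 9.0, 2.0


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    # Basic groups: fraction protonated (+1) = 1 / (1 + 10^(pH - pKa))
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(1 / (1 + 10 ** (ph - PKA_BASIC[aa])) for aa in seq if aa in PKA_BASIC)
    # Acidic groups: fraction deprotonated (-1) = 1 / (1 + 10^(pKa - pH))
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(1 / (1 + 10 ** (PKA_ACIDIC[aa] - ph)) for aa in seq if aa in PKA_ACIDIC)
    return positive - negative


design = "MKDEHLLKEDAHYRGE"  # hypothetical sequence
for ph in (7.2, 5.0, 4.5):
    print(f"pH {ph}: net charge ~ {net_charge(design, ph):+.2f}")
```

Histidines (pKa near 6) switch from mostly neutral to mostly charged between cytosolic and lysosomal pH, so His-rich surfaces or active sites are a common place for pH-dependent behavior to appear.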

By consciously engineering enzymes for the environment in which they will act, their efficacy and safety can be improved. The enzyme, the target, and the cellular context all have to fit together like pieces of a puzzle for successful therapy.

Proper Folding and Solubility (Avoiding Aggregation)

Challenge: Not every designed protein folds correctly into its intended 3D shape once produced in a cell. Some designs may misfold or form insoluble aggregates, especially in the crowded cytosol or when expressed at high levels. Aggregation means not only a loss of functional enzyme but also a potential toxicity problem: aggregates can interfere with cellular machinery or even seed pathological aggregates such as amyloids. Achieving proper folding is difficult because de novo designed sequences have not been refined by evolution's quality control; even if a design is predicted to fold, the cell's chaperones might not recognize it or assist it effectively. Moreover, highly hydrophobic or misfolding-prone regions can cause the protein to stick to itself or to other proteins, leading to aggregation. A key challenge, therefore, is ensuring that the enzyme is soluble and correctly folded in the cellular environment.

Solution Approaches: Both design-stage filters and targeted sequence modifications can be used to promote correct folding and prevent aggregation:

Optimize surface properties: Ensure that the solvent-exposed surface of the enzyme is rich in hydrophilic, charged amino acids that keep it soluble. Large hydrophobic patches on the surface should be avoided, because they can cause molecules to stick together. Computational tools can analyze a protein's surface for aggregation-prone patches, for instance by examining hydrophobicity patterns or by using predictors such as the structure-based Aggrescan3D and the sequence-based Tango. If such patches are found in a design, those regions can be modified (e.g., by replacing some residues with more polar ones or by redesigning the segment) to improve solubility.
Remove aggregation-prone sequences: Certain sequence motifs and structural elements are known to promote aggregation, for example stretches of hydrophobic amino acids or beta-sheet-prone segments that can form amyloid-like fibrils. During design refinement, the sequence can be scanned for these motifs. If a segment has a high aggregation-propensity score, it can be redesigned, perhaps by introducing proline or glycine residues to disrupt beta-sheet formation or by adding charged residues to break up hydrophobic runs. This can be done iteratively with AI tools that suggest low-aggregation mutations while trying to preserve the enzyme's function.
Leverage stable scaffold designs: Starting from, or incorporating, hyperstable scaffold elements helps the protein fold correctly. For example, some de novo design methods (such as those based on RFdiffusion or Rosetta) tend to produce proteins with idealized, stable secondary-structure patterns, so the resulting enzyme is more likely to fold on its own without extensive chaperone assistance. Designs can also be cross-checked with structure prediction: if AlphaFold confidently predicts the intended fold for the sequence (e.g., high pLDDT and close agreement with the design model), that is a good sign the protein will fold in reality. Designs with such high-confidence predictions can then be prioritized.
Include solubility-enhancing features: If an enzyme is only marginally soluble, known strategies can be used to improve it. These include adding, at one terminus, a short solubility tag or peptide rich in charged or hydrophilic residues, which can be removed later if necessary. Another option is engineering sites that preferentially bind cellular chaperones (proteins that assist folding); this is a newer area, but some research suggests that adding specific motifs can recruit chaperones to help fold the enzyme. In production settings (when manufacturing the enzyme in cells), co-expressing certain chaperones or using secretion pathways can also yield properly folded enzyme. Production-only measures such as these are not part of the final therapeutic enzyme, but they help obtain a well-folded product in the first place.
Computational folding checks: Throughout the design process, folding simulations and stability calculations can be used to weed out designs that are likely to misfold. For example, molecular dynamics can reveal whether a protein rapidly unfolds in silico or, in simulations with multiple copies, tends to clump together. Evolution-based analysis (checking whether parts of the sequence violate patterns seen in natural proteins) can also flag designs that are unnatural and prone to misfolding. By running these checks in advance, any design that tends to unfold or aggregate can be discarded or reworked.
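A crude sequence-level version of the aggregation scan described above is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale and flags windows whose average exceeds a threshold; the window size (7) and threshold (1.6) are assumptions chosen for illustration, and this is a rough proxy, not a reimplementation of Tango or Aggrescan3D.

```python
"""Sketch: Kyte-Doolittle sliding-window scan for hydrophobic, aggregation-prone stretches."""

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list[float]:
    """Average hydropathy of each window; raises KeyError on non-standard residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def flag_hydrophobic_segments(seq: str, window: int = 7,
                              threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return merged, 0-based half-open (start, end) spans of windows above threshold."""
    spans: list[tuple[int, int]] = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        end = start + window
        if spans and start <= spans[-1][1]:
            # Overlapping or adjacent window: extend the previous span.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    design = "MSEEKRKDLIVVILLAGSEKRDE"  # hypothetical sequence with a hydrophobic run
    for start, end in flag_hydrophobic_segments(design):
        print(f"residues {start + 1}-{end}: {design[start:end]}")
```

Flagged spans would be candidates for the proline/glycine or charged-residue substitutions described above; a dedicated aggregation predictor should still be consulted before committing to changes.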

The result of these efforts is an enzyme that is not only active against its target, but is also well-behaved in the cell: it folds into the intended shape and stays soluble, avoiding the formation of any toxic aggregates.

Regulatory and Safety Barriers

Challenge: Any novel bioengineered enzyme intended for therapeutic use must clear a high bar for safety and regulatory approval. Even if all the scientific design challenges are solved, developers must show that introducing the enzyme into patients will not cause unacceptable side effects, and the product must satisfy regulatory agencies such as the FDA or EMA. Potential safety concerns include immunogenicity (discussed above), off-target effects (breaking something it should not), toxicity of breakdown products, and long-term effects (for example, if the gene is delivered permanently, could it integrate into the genome and cause problems? Does lifetime exposure raise cancer risk or cause other problems?). In addition, because this is a cutting-edge approach, regulators will scrutinize the manufacturing process, purity, delivery method, and consistency of the product. This challenge is not scientific per se, but it is crucial: safety and regulatory considerations must be addressed proactively in the design and testing plan to smooth the path toward clinical use.

Solution Approaches: Incorporating safety features into the designs and following best practices and guidelines from the outset help ensure that enzymes have a viable path to approval:

Built-in safety switches: Where applicable, enzymes or their delivery vectors can be designed with control mechanisms. For example, in gene therapy the enzyme can be placed under an inducible or tissue-specific promoter, limiting expression to the desired times or places and reducing systemic risk. For the protein itself, known toxin-like or otherwise harmful domains should be avoided, and fusing the enzyme to a degron tag (as discussed) can be considered to prevent long-term accumulation. These design choices act as safety valves.
Regulatory consultation and guidelines: The development strategy can be planned in line with regulatory guidelines for biologics. Early on, this means studying precedents, such as existing enzyme therapies (e.g., those for lysosomal storage disorders) or gene therapies, to understand what regulators expect in terms of safety testing and product characterization. Consulting regulatory experts, or the agencies themselves, at an early stage can identify red flags (for example, sequence motifs of concern or delivery methods that require additional safeguards) so that they can be addressed ahead of time. Aligning development with ICH (International Council for Harmonisation) guidelines on biologics, such as those on product quality and preclinical safety, and with agency guidance on immunogenicity assessment reduces the risk of surprises late in development.
Thorough preclinical testing: Before any human use, enzyme candidates must go through extensive laboratory and animal testing to validate safety and efficacy. This includes testing in cell cultures and in animal models (such as mice, possibly including "humanized" mice with human-like immune systems) to check for immune reactions, toxicity, distribution in the body (biodistribution), and effectiveness at reducing the targeted damage. Although the approach emphasizes computational design to reduce trial and error, these preclinical studies remain essential to confirm that computational predictions hold true in living systems. The studies can be designed to gather maximum information, for instance by using sensitive assays to detect off-target tissue damage or unwanted immune activation.
Risk mitigation and monitoring: Risk mitigation plans can be integrated from the start, such as assays to monitor patients in future trials for signs of immune response or off-target effects. The long-term behavior of the enzymes also needs close attention: for gene therapies, minimizing the risk of insertional mutagenesis (e.g., by using predominantly non-integrating vectors such as AAV and assessing vector genomes in cells), and for protein therapies, confirming that there is no accumulation or unforeseen interaction over time. A strong safety profile and a clear monitoring strategy help demonstrate to regulators that the therapy's risks are under control.

Navigating the regulatory landscape is a challenge in its own right, but designing with safety in mind and following established guidance improve the chances of delivering a therapy that is not only effective but also meets the safety standards required for approval.

Accurate Prediction and Validation of Function

Challenge: Computational models – as powerful as they are – can sometimes misestimate how well an enzyme will actually perform in real biological conditions. Designing an enzyme entirely on computers involves many approximations: force fields in molecular simulations, assumptions in machine learning models, or limited data on the exact transition state of a novel reaction. There is a risk that a protein which scores well in silico might turn out to have little activity in vitro or in vivo, due to factors the models didn't capture. Conversely, some designs that were filtered out might actually have worked. The challenge here is ensuring in silico predictions truly correlate with real-world function, and iteratively improving models and designs based on experimental feedback. In short, it is necessary to validate that enzymes do what they're supposed to, and refine the design approach when they don't.
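One simple way to quantify whether in silico scores track real-world function is the Spearman rank correlation between predicted scores and measured activities for a tested batch of designs. The sketch below implements it with average ranks for ties; the scores and activities are toy numbers for illustration, not experimental data. (Python 3.10+ also offers `statistics.correlation`, and 3.12 adds a `method="ranked"` option; the manual version here keeps the logic visible.)

```python
"""Sketch: Spearman rank correlation between in silico scores and measured activity."""


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(predicted: list[float], measured: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    # Toy values: higher predicted score = expected to be more active.
    predicted = [0.9, 0.7, 0.8, 0.2, 0.4]
    measured = [12.0, 3.0, 8.0, 0.0, 0.0]  # two inactive designs tie at zero
    print(f"Spearman rho = {spearman(predicted, measured):.3f}")
```

A rank-based measure is used here because in silico scores and activities are often on different, nonlinear scales; what matters for triage is whether the ordering holds.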

Solution Approaches: To address this, multiple modeling techniques can be combined, and experimental validation can be incorporated strategically, focusing on a small number of informative tests:

Multi-pronged computational evaluation: Rather than relying on a single scoring method, each design can be evaluated with diverse computational predictors. For example, a design can be assessed with a physics-based simulation (to estimate binding or reaction energies), a machine learning model trained on enzyme data (to predict likely activity or stability), and a knowledge-based comparison of its functional site with those of known enzymes. A candidate that looks good through all these lenses deserves more confidence; if the predictors disagree, the design may be risky and need more scrutiny or adjustment. This cross-validation among in silico methods helps catch potential false positives and false negatives early.
Quantum and molecular simulations of catalysis: For critical reactions, higher-level computational chemistry can be employed to double-check that the enzyme's active site can facilitate the reaction mechanism. Quantum mechanics (density functional theory, for instance) can be used on a simplified model of the active site to verify transition state stabilization. Meanwhile, molecular dynamics can simulate the enzyme and substrate over time to observe if the substrate stays in the right place and if catalytic residues adopt the correct conformations. Together, these give a more realistic picture of the enzyme's function beyond static snapshots.
Iterative design-build-test cycles: Although wet-lab work should ideally be minimized, a few strategic experiments are invaluable for improving designs. A small set of top-ranked designs can be tested in vitro to measure their actual activity on the target substrate. The results (which designs worked, which did not, and why) are then fed back into the computational models. For instance, if a pattern emerges (such as designs with a particular motif performing consistently better), scoring functions or design constraints can be updated to favor it in the next round. This closed loop, alternating between computation and experiment, lets the design algorithms become more accurate over time by learning from real-world data.
Innovative high-sensitivity assays: One difficulty in validation is that a weakly active enzyme might be dismissed as non-functional when it could in fact be improved. To detect even low levels of activity, and thus avoid missing promising starting points, sensitive assays can be used; for example, amplified or fluorescent readouts can detect very small amounts of product. There is also research into microfluidic or coacervate-based assay systems that concentrate enzymes and substrates to strengthen the detectable signal. Such techniques yield more reliable activity measurements for computational designs, which in turn helps refine the models and keeps decisions from being based on false negatives.
Continual model refinement: As data accumulate, both from in-house testing and from the broader field of enzyme engineering, AI models can be retrained and updated. For example, if a neural network turns out to overestimate k_cat for a class of designs, it can be retrained on the new data, including those failures, so that its predictions for that class are calibrated downward. In this way, computational tools become more predictive as more designs are validated, narrowing the gap between predicted and actual function with each iteration.
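The multi-pronged evaluation described above can be sketched as a consensus score: each predictor's raw scores are converted to z-scores across the candidate set, oriented so that higher is better, and averaged, and any design whose predictors disagree strongly (a large spread between its best and worst z-score) is flagged for extra scrutiny. The predictor names, the spread threshold, and the scores are hypothetical placeholders.

```python
"""Sketch: consensus ranking across several predictors, flagging disagreement."""
from statistics import mean, pstdev


def zscores(values: list[float]) -> list[float]:
    """Standardize values; a constant column contributes zeros."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def consensus(designs, predictors, max_spread=1.5):
    """Rank designs by mean z-score across predictors.

    designs: {design_name: {predictor_name: raw_score}}
    predictors: {predictor_name: +1 if higher is better, -1 if lower is better}
    Returns (name, mean_z, spread, flagged) tuples, best first; a design is
    flagged when its best and worst per-predictor z-scores differ by > max_spread.
    """
    names = list(designs)
    z_by_pred = {}
    for pred, sign in predictors.items():
        z = zscores([sign * designs[n][pred] for n in names])
        z_by_pred[pred] = dict(zip(names, z))
    rows = []
    for n in names:
        zs = [z_by_pred[p][n] for p in predictors]
        spread = max(zs) - min(zs)
        rows.append((n, mean(zs), spread, spread > max_spread))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical predictors: a Rosetta-style energy (lower is better),
# an ML activity score, and active-site similarity to known enzymes.
PREDICTORS = {"energy": -1, "ml_activity": +1, "site_similarity": +1}
DESIGNS = {
    "A": {"energy": -30.0, "ml_activity": 0.8, "site_similarity": 0.7},
    "B": {"energy": -25.0, "ml_activity": 0.6, "site_similarity": 0.6},
    "C": {"energy": -35.0, "ml_activity": 0.2, "site_similarity": 0.3},
    "D": {"energy": -20.0, "ml_activity": 0.3, "site_similarity": 0.2},
}

if __name__ == "__main__":
    for name, mz, spread, flagged in consensus(DESIGNS, PREDICTORS):
        note = "  <- predictors disagree" if flagged else ""
        print(f"{name}: mean z = {mz:+.2f}, spread = {spread:.2f}{note}")
```

In this toy set, design C ranks third overall but is flagged because the physics-based score favors it while the other two predictors do not; that disagreement is a signal to examine it more closely rather than discard it automatically.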

Buried Cross-Link Accessibility

Challenge: Some molecular damage targets, particularly protein cross-links and adducts (e.g., glucosepane in collagen, HNE-His adducts, MDA-Lys adducts), are buried deep within densely packed or highly ordered protein assemblies. These can include fibrillar extracellular matrix proteins (such as collagen), lens crystallins, amyloid aggregates, or protein–protein interfaces in large complexes. In such cases, the cross-link is shielded from solvent and physically inaccessible to the catalytic site of an enzyme. Steric hindrance, conformational rigidity, and stabilization of the protein fold by the cross-link itself can make it difficult for even small catalytic domains to reach the damage site. Aggregation or membrane association can further limit accessibility, and any attempt to force exposure risks disrupting essential structures or causing unwanted proteolysis. In short, buried cross-links pose a major physical accessibility barrier: even if the chemistry to cleave them is tractable, the enzyme may not be able to reach the bond in situ.
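Before choosing an access strategy, it helps to quantify how buried a cross-linked residue is. The sketch below uses a simple contact-number proxy: the number of other Cα atoms within 10 Å of a residue's Cα. The 10 Å radius, the cutoff of 20 contacts, and the lattice coordinates in the example are assumptions for illustration; a per-residue solvent-accessible surface area calculation (for example, Biopython's `ShrakeRupley` class in `Bio.PDB.SASA`) is the more standard measure.

```python
"""Sketch: contact-number burial proxy for cross-linked residues (toy coordinates)."""
import math


def contact_numbers(ca_coords, radius=10.0):
    """Count, for each residue, the other C-alpha atoms within `radius` angstroms."""
    counts = [0] * len(ca_coords)
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts


def classify_burial(ca_coords, residue_indices, radius=10.0, buried_cutoff=20):
    """Label selected residues 'buried' or 'exposed' by contact number (assumed cutoff)."""
    cn = contact_numbers(ca_coords, radius)
    return {i: ("buried" if cn[i] >= buried_cutoff else "exposed") for i in residue_indices}


def toy_grid(n=5, spacing=3.8):
    """Hypothetical cubic lattice of C-alpha positions standing in for a packed assembly."""
    return [(spacing * x, spacing * y, spacing * z)
            for x in range(n) for y in range(n) for z in range(n)]


if __name__ == "__main__":
    grid = toy_grid()
    corner, center = 0, 62  # index = 25*x + 5*y + z
    print(classify_burial(grid, [corner, center]))
```

On a real structure, the same comparison between the cross-linked residues and the rest of the protein indicates whether a surface-acting enzyme could plausibly reach the lesion or whether one of the access strategies below is needed.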

Solution Approaches: Computationally guided strategies can be developed to overcome steric shielding and enable enzymatic access to buried adducts without harming surrounding healthy structures:

Chaperone- or unfoldase-assisted access: Fuse the repair enzyme to peptides or domains that recruit cellular chaperones (e.g., Hsp70/Hsp40/Hsp110) or engineered unfoldases. These can induce transient local unfolding or “breathing” of the target protein, exposing the cross-link for cleavage. Molecular dynamics (MD) simulations can be used to model breathing motions and design linkers that allow the enzyme to reach the site during these transient openings.
Exosite-guided opening: Add an auxiliary binding site (exosite) to the enzyme that docks near the lesion and exerts mechanical leverage on the local structure. Computational docking and MD can optimize the exosite position and linker length so that binding both exposes the target and positions the catalytic pocket correctly.
Protease pre-treatment with logic gating: Use a highly specific protease to make a precise cut near the cross-link, increasing accessibility without destabilizing the whole protein. Specificity can be built in through computational design of proteases with sequence/structure gating, so that they are activated only when bound to an adduct-specific epitope.
Aggregate or fibril deconstruction prior to repair: For aggregated proteins, employ a sequential approach: first apply (or co-deliver) disaggregase activity—natural or engineered—to loosen packing, then follow with the repair enzyme. Computational crowding simulations can help model the aggregate environment and identify optimal intervention points.
Size minimization and flexible linkers: Use AI-guided scaffold design to generate smaller, more compact catalytic domains (≤25–35 kDa) that can fit into narrow clefts. Flexible Gly/Ser linkers can be modeled to extend catalytic reach from an anchoring exosite into buried regions.
Organelle targeting for pre-conditioning: Direct the target protein (or the enzyme) to an environment where conditions naturally increase flexibility—such as the lysosome—before repair. Computational prediction of trafficking sequences and modeling of pH-dependent unfolding can help ensure both enzyme and target remain functional until catalysis.
Environment modulation: Design enzymes that tolerate mild, localized changes to their chemical environment (e.g., low, safe cosolvent concentrations or redox agents) that temporarily soften the target protein's structure. In silico stability and compatibility screening can help ensure that these changes do not denature the enzyme or harm surrounding proteins.
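The size and linker considerations above can be estimated with two rough polymer-physics formulas. The sketch treats a flexible Gly/Ser linker as a worm-like chain, whose mean-square end-to-end distance is 2PL[1 - (P/L)(1 - exp(-L/P))] for contour length L and persistence length P, and estimates a compact domain's radius of gyration with the empirical relation Rg ≈ 2.2·N^0.38 Å for N residues. The 3.8 Å per-residue contour spacing, the 4.5 Å persistence length, and the 110 Da average residue mass are assumed values, and an RMS distance is only an ensemble average, not a guaranteed reach.

```python
"""Sketch: worm-like-chain linker reach and compact-domain size estimates."""
import math

RESIDUE_SPACING_A = 3.8       # assumed contour length per residue (angstroms)
PERSISTENCE_GS_A = 4.5        # assumed persistence length of a flexible Gly/Ser linker (angstroms)
MEAN_RESIDUE_MASS_DA = 110.0  # assumed average residue mass (daltons)


def wlc_rms_end_to_end(n_residues, persistence=PERSISTENCE_GS_A, spacing=RESIDUE_SPACING_A):
    """Root-mean-square end-to-end distance of a worm-like chain, in angstroms."""
    contour = n_residues * spacing
    p = persistence
    mean_sq = 2 * p * contour * (1 - (p / contour) * (1 - math.exp(-contour / p)))
    return math.sqrt(mean_sq)


def shortest_linker(required_reach_a, max_residues=60):
    """Fewest residues whose RMS end-to-end distance reaches the target, or None."""
    for n in range(1, max_residues + 1):
        if wlc_rms_end_to_end(n) >= required_reach_a:
            return n
    return None


def globular_rg(mass_kda):
    """Empirical radius of gyration (angstroms) of a compact domain: Rg ~ 2.2 * N**0.38."""
    n_residues = mass_kda * 1000 / MEAN_RESIDUE_MASS_DA
    return 2.2 * n_residues ** 0.38


if __name__ == "__main__":
    for kda in (25, 30, 35):
        print(f"{kda} kDa domain: Rg ~ {globular_rg(kda):.1f} A")
    print("linker residues for ~20 A RMS reach:", shortest_linker(20.0))
```

With these assumptions, a linker of about 13 residues gives an RMS end-to-end distance of roughly 20 Å, and a 30 kDa domain has an estimated Rg near 18.5 Å; MD with explicit anchoring geometry would be needed to refine either number.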

By integrating targeting, binding, and local unfolding strategies into the design phase—and verifying accessibility improvements with structural modeling and MD simulations—buried cross-links can be rendered susceptible to enzymatic repair without compromising the integrity of the surrounding biological structure.

Hydrophobic Core / Hydrophobic Environment Accessibility

Challenge: Some molecular damage resides in hydrophobic environments — either deep in the non-polar core of a folded protein, in a transmembrane domain, or within membrane-proximal protein layers. In these locations, the lesion is shielded from solvent not only by tight packing but also by an energetically unfavorable hydrophilic–hydrophobic interface: polar catalytic residues of a typical enzyme cannot easily approach without destabilizing the surrounding structure or themselves becoming unstable. Enzymes optimized for aqueous environments may misfold, lose activity, or aggregate if forced into hydrophobic surroundings. Moreover, hydrophobic cores often exclude water, which is required for many catalytic mechanisms, further complicating repair chemistry. The dual problem is (1) gaining physical access to the site and (2) maintaining enzyme stability and activity in a low-dielectric, hydrophobic microenvironment.
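The energetic side of this problem can be illustrated with the Born model, which estimates the cost of moving a charge q of radius a from water into a low-dielectric medium as 166·q²/a·(1/ε_low - 1/ε_water) kcal/mol (a in Å). The sketch below assumes a 2 Å ion radius and dielectric constants of 78.5 for water and 4 (or 2) for a protein interior (or membrane core); the Born model ignores hydrogen bonding, ion pairing, and local water, so the numbers are only order-of-magnitude estimates.

```python
"""Sketch: Born-model estimate of the cost of burying a charge in a low-dielectric site."""

COULOMB_KCAL_A = 332.06  # kcal*angstrom/(mol*e^2), electrostatic conversion factor


def born_transfer_energy(charge, radius_a, eps_from=78.5, eps_to=4.0):
    """Free energy (kcal/mol) to move a charge from eps_from to eps_to (Born model)."""
    return (COULOMB_KCAL_A / 2) * charge ** 2 / radius_a * (1 / eps_to - 1 / eps_from)


if __name__ == "__main__":
    # Hypothetical 2 A ion radius; eps 4 and 2 as rough protein-interior and membrane-core values.
    for eps in (4.0, 2.0):
        print(f"eps = {eps}: {born_transfer_energy(1, 2.0, eps_to=eps):.1f} kcal/mol")
```

Under these assumptions, burying a single unpaired charge costs roughly 20 kcal/mol, and about twice that at ε = 2, which illustrates why an isolated charged catalytic residue is strongly disfavored in a hydrophobic core.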

Solution Approaches:

Hydrophobic-compatible scaffold design: Use computational design tools (e.g., Rosetta, RFdiffusion with custom constraints) to create enzymes or binding domains whose surface properties match the target environment. For membrane or core-access enzymes, increase the fraction of hydrophobic surface residues on the approach face, while keeping the catalytic pocket functional. Such designs can be modeled with membrane-protein force fields to ensure stability.
Membrane-protein–like enzymes: Borrow scaffolds from naturally occurring membrane proteins or lipid-associated enzymes (lipases, phospholipases) and graft the desired catalytic site into them. These scaffolds are already evolutionarily adapted to hydrophobic surroundings. AlphaFold structure predictions, followed by MD simulations in explicit membrane models (e.g., systems built with CHARMM-GUI), can be used to verify insertion and stability.
Hydrophobic-binding carriers: Fuse the catalytic domain to a targeting/binding domain that naturally interacts with hydrophobic regions (e.g., amphipathic helices, lipid-binding domains, or nanobodies raised against transmembrane epitopes). This allows the enzyme to be ferried close to the lesion without exposing its entire structure to hydrophobic stress.
Local environment modification: Pre-treat the site with agents (delivered in a targeted, minimal way) that transiently increase polarity or local hydration — for example, small amphiphiles or lipid-exchanging chaperones — making the lesion accessible to a more conventional enzyme. Computational docking and MD can be used to screen such agents for minimal off-target disruption.
Hydrophobic-pocket–compatible catalysis: Redesign the catalytic mechanism to work without bulk water if hydration cannot be increased. Quantum mechanics/molecular mechanics (QM/MM) simulations can help identify catalytic residue arrangements that function in low-dielectric media, possibly relying on substrate-bound water or non-polar proton shuttles.
Two-component systems: Deliver an “exposure” component first — e.g., a small hydrophobic-binding peptide that pries open the hydrophobic core or flips out a damaged side chain — followed immediately by the catalytic enzyme. Molecular docking and flexible linker modeling ensure that both components act cooperatively at the site.
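For the amphipathic-helix carriers mentioned above, a standard quick check is Eisenberg's hydrophobic moment, which treats each residue's hydrophobicity as a vector pointing at its angle on a helical wheel (100° per residue for an α-helix) and measures how strongly hydrophobicity is concentrated on one face. The sketch below substitutes the Kyte-Doolittle scale for Eisenberg's consensus scale so that it is self-contained, and the example peptides are hypothetical.

```python
"""Sketch: Eisenberg-style mean hydrophobic moment (Kyte-Doolittle scale substituted)."""
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment: |sum of h_i * unit vector at i*angle| / N."""
    sin_sum = cos_sum = 0.0
    for i, aa in enumerate(seq.upper()):
        theta = math.radians(i * angle_deg)
        h = KYTE_DOOLITTLE[aa]
        sin_sum += h * math.sin(theta)
        cos_sum += h * math.cos(theta)
    return math.hypot(sin_sum, cos_sum) / len(seq)


if __name__ == "__main__":
    amphipathic = "LKKLLKKLLKLLKKLLKK"  # hypothetical: Leu on one helical face, Lys on the other
    uniform = "L" * 18                   # 18 residues x 100 deg = 5 full turns, so vectors cancel
    print(f"amphipathic: {hydrophobic_moment(amphipathic):.2f}")
    print(f"uniform:     {hydrophobic_moment(uniform):.2f}")
```

A large mean moment suggests a single hydrophobic face suited to membrane or core association, while a uniform or randomly arranged sequence scores near zero.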

By co-optimizing scaffold hydrophobicity, catalytic chemistry, and delivery mode, it becomes possible to design enzymes that operate effectively in hydrophobic protein cores or membrane-proximal damage sites, where traditional aqueous enzymes would fail.