Quantao's Blog - Medicinal Chemistry Topics

The csh script below comes from the default CHARMM-GUI solution input generator, and it is very handy to use on Colab. However, csh turns out to be hard to run on the AI Studio platform because of permission limitations. So the csh script needs to be converted to a bash one for systems where we can't simply install a csh shell.

#!/bin/csh

# Generated by CHARMM-GUI (http://www.charmm-gui.org) v3.5
#
# The following shell script assumes your NAMD executable is namd2 and that
# the NAMD inputs are located in the current directory.
#
# Only one processor is used below. To parallelize NAMD, use this scheme:
#     charmrun namd2 +p4 input_file.inp > output_file.out
# where the "4" in "+p4" is replaced with the actual number of processors you
# intend to use.

set equi_prefix = step4_equilibration
set prod_prefix = step5_production
set prod_step   = step5

# Running equilibration step
namd2 ${equi_prefix}.inp > ${equi_prefix}.out

# Running production for 10 nanoseconds
set cnt    = 1
set cntmax = 10

while ( ${cnt} <= ${cntmax} )
    # create appropriate input file using ${prod_prefix}.inp as template
    if ( ${cnt} == 1 ) then
        set outputname = "${prod_step}_${cnt}"
        # change only the output name
        sed "s/${prod_prefix}/${outputname}/" ${prod_prefix}.inp > ${prod_step}_run.inp
    else
        @ cntprev = ${cnt} - 1
        set inputname  = "${prod_step}_${cntprev}"
        set outputname = "${prod_step}_${cnt}"
        # change input and output names from template file
        sed "s/${equi_prefix}/${inputname}/" ${prod_prefix}.inp | \
            sed "s/${prod_prefix}/${outputname}/" > ${prod_step}_run.inp
    endif

    # run the simulation for 1 nanosecond
    namd2 ${prod_step}_run.inp > ${outputname}.out

    @ cnt += 1
end
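
For comparison, a direct bash translation of the csh loop above might look like the sketch below. The NAMD and DRY_RUN variables and the two function names are my own additions for illustration, not part of the CHARMM-GUI script. With DRY_RUN=1 (the default here) the script only prints the NAMD commands it would run.

```shell
#!/bin/bash
# Sketch of a bash port of the csh job-control loop above.
# NAMD and DRY_RUN are illustrative additions, not part of the CHARMM-GUI script.
NAMD="${NAMD:-namd2}"        # name or full path of the NAMD executable
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the NAMD commands; 0 = really run them

equi_prefix=step4_equilibration
prod_prefix=step5_production
prod_step=step5
cntmax=10

# Run NAMD on $1.inp and write the log to $2.out
run_namd() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$NAMD $1.inp > $2.out"
    else
        "$NAMD" "$1.inp" > "$2.out"
    fi
}

# Build ${prod_step}_run.inp for segment $1 from the ${prod_prefix}.inp template
write_run_input() {
    local cnt=$1
    local outputname="${prod_step}_${cnt}"
    if (( cnt == 1 )); then
        # first segment: change only the output name
        sed "s/${prod_prefix}/${outputname}/" "${prod_prefix}.inp" > "${prod_step}_run.inp"
    else
        # later segments: read from the previous segment, then rename the output
        local inputname="${prod_step}_$((cnt-1))"
        sed "s/${equi_prefix}/${inputname}/" "${prod_prefix}.inp" |
            sed "s/${prod_prefix}/${outputname}/" > "${prod_step}_run.inp"
    fi
}

# Equilibration, then ten production segments
run_namd "$equi_prefix" "$equi_prefix"
for (( cnt=1; cnt<=cntmax; cnt++ )); do
    if [ "$DRY_RUN" != 1 ]; then
        write_run_input "$cnt"
    fi
    run_namd "${prod_step}_run" "${prod_step}_${cnt}"
done
```

Once the printed commands look right, run it again with DRY_RUN=0 in the folder that holds the .inp files.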

The original folder looks like this:

bash    restraints       step3_input.pdb  step4_equilibration.inp  sysinfo.dat
README  step3_input.crd  step3_input.psf  step5_production.inp     toppar

Let's use the "cat" command to generate 10 configuration files:

for (( a=1; a<=10; a++ ))
do
    cat step5_production.inp > step5_production.$a.inp
done

The folder now looks like this (you should not have "bash", since that was only created for temporary use):

bash             step3_input.psf          step5_production.3.inp  step5_production.8.inp
README           step4_equilibration.inp  step5_production.4.inp  step5_production.9.inp
restraints       step5_production.10.inp  step5_production.5.inp  step5_production.inp
step3_input.crd  step5_production.1.inp   step5_production.6.inp  sysinfo.dat
step3_input.pdb  step5_production.2.inp   step5_production.7.inp  toppar

Now we need to substitute the input and output names in the inp files. Remember that step5_production.1.inp is a little different, since its input comes from the previous equilibration step, i.e., step4_equilibration. For each step5_production.$a.inp, two things need to be fixed, and we can do that with two for loops, each wrapping a "sed" command.

In the 1st loop, we only change the output name from "step5_production" to "step5_production.$a".

1st "for" loop:

for (( a=1; a<=10; a++ ))
do
    sed -i 's/step5_production/step5_production.$a/' step5_production.$a.inp
done

BUT this does not work. "step5_production" was indeed substituted, but with the literal text "step5_production.$a" instead of the expected "step5_production.1" (2, 3, 4, etc.). The following version does the job: just switch the single quotes to double quotes.

for (( a=1; a<=10; a++ )); 
do sed -i "s/step5_production/step5_production.$a/" step5_production.$a.inp; 
done
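
The difference comes from shell quoting: inside single quotes bash hands $a to sed literally, while inside double quotes bash expands the variable before sed ever sees it. A minimal sketch on a made-up sample line (the variable names are only for illustration):

```shell
a=3
line="outputName step5_production;"

# Single quotes: $a is NOT expanded, sed receives the literal text "$a"
single=$(echo "$line" | sed 's/step5_production/step5_production.$a/')

# Double quotes: bash expands $a to 3 before sed runs
double=$(echo "$line" | sed "s/step5_production/step5_production.$a/")

echo "$single"    # outputName step5_production.$a;
echo "$double"    # outputName step5_production.3;
```

Keep in mind that sed -i edits the files in place, so running the loop a second time on already edited files would turn "step5_production.3" into "step5_production.3.3".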

2nd "for" loop

In this for loop we replace the input file name "step4_equilibration" with "step5_production.$[a-1]" in files 2 to 10, while leaving the 1st file, step5_production.1.inp, untouched.

for (( a=2; a<=10; a++ )) 
do sed -i "s/step4_equilibration/step5_production.$[a-1]/" step5_production.$a.inp
done

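Pay attention to the expression: it is $[a-1], not $(a-1) or $($a-1). $[a-1] is the older bash spelling of arithmetic expansion; $((a-1)) is the current equivalent.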
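
To see why the form of the expression matters, the short sketch below puts arithmetic expansion next to plain text substitution. It uses $((a-1)), the current spelling of $[a-1]. By contrast, $(a-1) is command substitution and would try to run a command named "a-1". The variable names are only for illustration.

```shell
a=4
prev=$((a-1))      # arithmetic expansion: the number 3
literal="$a-1"     # no arithmetic: just the text "4-1"

echo "arithmetic: $prev"
echo "plain text: $literal"

# The same expansion inside a loop, printing which file each segment reads from
for (( a=2; a<=4; a++ )); do
    echo "step5_production.$a.inp reads from step5_production.$((a-1))"
done
```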

Now, we have finished all the substitutions for the input and output names inside the configuration files.

Let's take step5_production.4.inp as an example. If the output and input names carry the numbers 4 and 3 respectively, everything is fine.

set temp                303.15;
outputName              step5_production.4; # base name for output from this run
                                            # NAMD writes two files at the end, final coord and vel
                                            # in the format of first-dyn.coor and first-dyn.vel

set inputname           step5_production.3;
binCoordinates          $inputname.coor;    # coordinates from last run (binary)
binVelocities           $inputname.vel;     # velocities from last run (binary)
extendedSystem          $inputname.xsc;     # cell dimensions from last run (binary)

Let's not forget that the 1st file, step5_production.1.inp, should keep "step4_equilibration" as its input:

set temp                303.15;
outputName              step5_production.1; # base name for output from this run
                                            # NAMD writes two files at the end, final coord and vel
                                            # in the format of first-dyn.coor and first-dyn.vel

set inputname           step4_equilibration;
binCoordinates          $inputname.coor;    # coordinates from last run (binary)
binVelocities           $inputname.vel;     # velocities from last run (binary)
extendedSystem          $inputname.xsc;     # cell dimensions from last run (binary)
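
Rather than opening every file by hand, the same check can be scripted. The sketch below is one possible way, assuming the "outputName" and "set inputname" lines have exactly the layout shown above; the function name check_inputs is my own.

```shell
# Check that step5_production.N.inp writes segment N and reads segment N-1
# (step4_equilibration for N=1). Prints a message for each problem found.
check_inputs() {
    local last=$1 a f expected_in bad=0
    for (( a=1; a<=last; a++ )); do
        f=step5_production.$a.inp
        if (( a == 1 )); then
            expected_in=step4_equilibration
        else
            expected_in=step5_production.$((a-1))
        fi
        if [ ! -f "$f" ]; then
            echo "$f: missing"; bad=1; continue
        fi
        # the trailing ";" keeps ".1;" from also matching ".10;"
        grep -q "^outputName  *step5_production\.$a;" "$f" ||
            { echo "$f: unexpected outputName"; bad=1; }
        grep -q "^set inputname  *$expected_in;" "$f" ||
            { echo "$f: unexpected inputname"; bad=1; }
    done
    return "$bad"
}

if check_inputs 10; then
    echo "all 10 input files look consistent"
fi
```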

Finally, we can run on AI Studio in bash style, not csh style.

With the namd2 run command inserted inside the loop, it looks something like this:

#Equilibration
/home/aistudio/NAMD_Git-2021-05-17_Linux-x86_64-multicore-CUDA/namd2 step4_equilibration.inp > step4_equilibration.out &&
#Production
for (( a=1; a <=10; a++ ))
do
/home/aistudio/NAMD_Git-2021-05-17_Linux-x86_64-multicore-CUDA/namd2 step5_production.$a.inp > step5_production.$a.out
done

If you run this for loop inside the "work" directory on AI Studio, all the MD output is saved. Even if the run breaks unexpectedly (an unstable internet connection, for example), you still keep what has already been done.
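
Because finished segments stay on disk, the loop can also be written so that it can be re-run after an interruption and continue at the first unfinished segment. This is a sketch of my own, not part of the CHARMM-GUI script. It assumes a segment counts as finished once its final step5_production.N.coor file exists, and NAMD and DRY_RUN are placeholder variables (with DRY_RUN=1, the default here, the commands are only printed).

```shell
NAMD="${NAMD:-namd2}"       # set this to the full path of your namd2 binary
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands; 0 = really run them

# Run production segments 1..$1, skipping those that already finished
run_production() {
    local last=$1 a
    for (( a=1; a<=last; a++ )); do
        if [ -f "step5_production.$a.coor" ]; then
            echo "segment $a already finished, skipping"
            continue
        fi
        if [ "$DRY_RUN" = 1 ]; then
            echo "$NAMD step5_production.$a.inp > step5_production.$a.out"
        else
            # stop at the first failure so later segments never start
            # from restart files that do not exist
            "$NAMD" "step5_production.$a.inp" > "step5_production.$a.out" || return 1
        fi
    done
}

run_production 10
```

Stopping at the first failed segment matters here because every segment reads the files written by the one before it.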

In case you need to restart the equilibration simulation (say your laptop shut down for no reason), just add the three lines "binCoordinates" etc., then change "firsttimestep" to the last step of the previous run, which you can read from the end of the step4_equilibration.out file.

NOTICE: YOU DON'T NEED THE THREE LINES OF binCoordinates ETC. JUST CHANGE THE FIRST STEP, AND NAMD WILL AUTOMATICALLY DETECT THESE THREE FILES IN YOUR WORKING FOLDER.

outputName              $outputname;        # base name for output from this run
#binCoordinates          step4_equilibration.restart.coor;    # coordinates from last run (binary)
#binVelocities           step4_equilibration.restart.vel;     # velocities from last run (binary)
#extendedSystem          step4_equilibration.restart.xsc;     # cell dimensions from last run (binary)

                                            # NAMD writes two files at the end, final coord and vel
                                            # in the format of first-dyn.coor and first-dyn.vel
firsttimestep           72000;                  # last step of previous run
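
The step number can also be changed with sed instead of a text editor. A small sketch, assuming the line starts with "firsttimestep" as shown above and that you have already read the step (72000 in this example) from the log yourself. The function name set_first_step is hypothetical, and "sed -i" is used in the same GNU form as earlier in this post.

```shell
# Set firsttimestep in a NAMD input file.
# $1 = input file, $2 = step number read from the previous log
set_first_step() {
    # on the line that starts with "firsttimestep", replace the first number
    sed -i "/^firsttimestep/s/[0-9][0-9]*/$2/" "$1"
}

# Example (not run here):
#   set_first_step step4_equilibration.inp 72000
```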

One last word: if you can install the csh shell, forget everything above and just run the next few lines. Job done!

sudo apt-get install csh
cat README > job_control.csh
csh job_control.csh

NAMD job control script for Baidu AI Studio
