Split NMR-style multiple model pdb files into individual models

From CCP4 wiki

Latest revision as of 19:14, 3 June 2016

This assumes that you have a correctly formatted pdb file that contains both MODEL and ENDMDL records.
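
Before splitting, it can be worth confirming that the MODEL and ENDMDL records really do alternate. The following is a minimal Python sketch of such a check; the function name check_model_records is made up for this illustration and is not part of any PDB toolkit. It assumes the record name occupies the first six columns of each line.

```python
def check_model_records(lines):
    """Return the number of models, or raise ValueError if the
    MODEL and ENDMDL records do not strictly alternate."""
    inside = False  # True while we are between MODEL and ENDMDL
    count = 0
    for number, line in enumerate(lines, start=1):
        record = line[:6].strip()  # record name lives in columns 1-6
        if record == "MODEL":
            if inside:
                raise ValueError(f"line {number}: MODEL before previous ENDMDL")
            inside = True
            count += 1
        elif record == "ENDMDL":
            if not inside:
                raise ValueError(f"line {number}: ENDMDL without MODEL")
            inside = False
    if inside:
        raise ValueError("last MODEL has no ENDMDL")
    return count
```

To check a file, pass the open file object: with open("models.pdb") as handle: print(check_model_records(handle))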


Bash/awk one-liner

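This one-liner splits the file models.pdb into individual PDB files named model_001.pdb, model_002.pdb, and so on. Only the lines between each MODEL/ENDMDL pair are copied; the MODEL and ENDMDL records themselves are left out. The patterns are anchored to the start of the line so that other records that merely mention MODEL (for example in a REMARK) are not counted.

 grep -n '^MODEL\|^ENDMDL' models.pdb | cut -d: -f 1 | \
 awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp models.pdb > model_%03d.pdb\n", $1-1,NR/2;}' | bash -sf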
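
The one-liner works in two stages: grep -n reports the line numbers of the MODEL and ENDMDL records, and awk pairs them up into sed commands that print the lines in between. The Python sketch below reproduces that range computation so the generated commands can be inspected before anything is run. The function names model_ranges and sed_commands are hypothetical, and the sketch matches the record names at the start of the line only.

```python
def model_ranges(lines):
    """Return (first, last) 1-based line numbers of the lines that lie
    strictly between each MODEL record and its ENDMDL record."""
    ranges = []
    start = None  # first line of the current model body, if any
    for number, line in enumerate(lines, start=1):
        if line.startswith("MODEL"):
            start = number + 1
        elif line.startswith("ENDMDL") and start is not None:
            ranges.append((start, number - 1))
            start = None
    return ranges


def sed_commands(lines, source="models.pdb"):
    """Build the same kind of sed commands the awk stage prints."""
    return [
        f"sed -n {first},{last}p {source} > model_{index:03d}.pdb"
        for index, (first, last) in enumerate(model_ranges(lines), start=1)
    ]
```

Calling sed_commands(open("models.pdb")) returns the list of commands without executing them.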

Bash script

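 i=1
 # IFS= and -r keep each line verbatim, so the fixed-column PDB layout is preserved
 while IFS= read -r line; do
   printf '%s\n' "$line" >> "model_${i}.pdb"
   # an ENDMDL record closes the current model, so switch to the next file
   [[ $line == ENDMDL* ]] && ((i++))
 done < /path/to/file.pdb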


Awk script

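Save the script below as script.awk and call it as

 awk -f script.awk < models.pdb

The output files are numbered from zero, starting with model_0.pdb, and the ENDMDL records are not copied.

 BEGIN {file = 0; filename = "model_"  file ".pdb"}
 # On ENDMDL: close the finished file, skip the record, and switch to the next file name
 /^ENDMDL/ {close(filename); file++; filename = "model_" file ".pdb"; next}
 {print $0 > filename}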


Perl script

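 $base='1g9e';   # the input file is $base.pdb
 open(IN,"<$base.pdb") or die "Cannot open $base.pdb: $!";
 @indata = <IN>;close(IN);$i=0;
 foreach $line(@indata) {
 # each MODEL record starts a new output file named ${base}_N.pdb
 if($line =~ /^MODEL/) {++$i;$file="${base}_$i.pdb";open(OUT,">$file") or die "Cannot open $file: $!";next}
 if($line =~ /^ENDMDL/) {next}
 # only coordinate records are copied
 if($line =~ /^ATOM/ || $line =~ /^HETATM/) {print OUT "$line"}
 }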

Python script

For this kludgy version using Python 2.x, you need to paste the entire PDB file into the script where it says "PASTE YOUR PDB FILE TEXT HERE".

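You can fork the code at GitHub: https://github.com/fomightez/structurework/blob/master/python_scripts/super_basic_multiple_model_PDB_file_splitter.py

A more full-featured version, which takes a file (or a folder of files) as a command-line argument, is available at https://github.com/fomightez/structurework/blob/master/python_scripts/multiple_model_PDB_file_splitter.py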

 PDB_text = """
 PASTE YOUR PDB FILE TEXT HERE
 """
 
 model_number = 1
 new_file_text = ""
 for line in filter(None, PDB_text.splitlines()):
     line = line.strip()  # for better control of ends of lines
     if line == "ENDMDL":
         # save file with file number in name
         output_file = open("model_" + str(model_number) + ".pdb", "w")
         output_file.write(new_file_text.rstrip('\r\n')) #rstrip to remove trailing newline
         output_file.close()
         # reset everything for next model
         model_number += 1
         new_file_text = ""
     elif not line.startswith("MODEL"):
         new_file_text += line + '\n'
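
If pasting the file into the script is inconvenient, the same idea can be applied to a file on disk. The following is a minimal Python 3 sketch, not the GitHub script mentioned above; the function name split_models and its out_dir parameter are assumptions made for this illustration. Unlike the paste-in version, it copies only the lines between a MODEL record and its ENDMDL record, so header records before the first MODEL are dropped.

```python
import sys
from pathlib import Path


def split_models(pdb_path, out_dir="."):
    """Write each MODEL...ENDMDL block to out_dir/model_N.pdb
    and return the list of paths that were written."""
    written = []
    current = None  # lines of the model being collected, None outside a model
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("MODEL"):
                current = []
            elif line.startswith("ENDMDL"):
                if current is not None:
                    target = Path(out_dir) / f"model_{len(written) + 1}.pdb"
                    target.write_text("".join(current))
                    written.append(target)
                current = None
            elif current is not None:
                current.append(line)  # kept verbatim, columns untouched
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_models.py models.pdb
    for path in split_models(sys.argv[1]):
        print(path)
```

Lines are written back unchanged, so the fixed-column layout of the ATOM and HETATM records is preserved.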


Back to Useful scripts (aka smart piece of code)