Split NMR-style multiple model pdb files into individual models

From CCP4 wiki

Revision as of 20:19, 10 April 2015

This assumes that you have a correctly formatted PDB file in which every model starts with a MODEL record and ends with an ENDMDL record.
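Before running any of the scripts below, it can help to confirm that the MODEL and ENDMDL records really are paired. The following is a minimal sketch, not part of the original page; the function name `count_models` is hypothetical, and it relies only on the PDB convention that the record name occupies the first six columns of a line.

```python
def count_models(lines):
    """Return the number of MODEL/ENDMDL pairs; raise ValueError if unpaired."""
    inside_model = False
    n_models = 0
    for line in lines:
        record = line[:6].strip()  # record name sits in columns 1-6
        if record == "MODEL":
            if inside_model:
                raise ValueError("MODEL record found before previous ENDMDL")
            inside_model = True
            n_models += 1
        elif record == "ENDMDL":
            if not inside_model:
                raise ValueError("ENDMDL record found without a MODEL")
            inside_model = False
    if inside_model:
        raise ValueError("last MODEL record has no ENDMDL")
    return n_models
```

It can be called with an open file, for example `count_models(open("models.pdb"))`, where models.pdb is the example file name used on this page.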

Bash/awk one-liner

This one-liner splits the file models.pdb into individual pdb files named model_###.pdb.

 grep -n '^\(MODEL\|ENDMDL\)' models.pdb | cut -d: -f 1 | \
 awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp models.pdb > model_%03d.pdb\n", $1-1,NR/2;}' |  bash -sf
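The one-liner works by finding the line numbers of the MODEL and ENDMDL records, pairing them up, and generating one sed command per pair that extracts the lines strictly between the two records. The Python sketch below only illustrates that pairing logic; the function name `sed_commands` is hypothetical, and it assumes the records are matched at the start of a line and alternate strictly (MODEL, ENDMDL, MODEL, ...).

```python
def sed_commands(lines, source="models.pdb"):
    """Build the sed commands that the one-liner pipes into bash."""
    # 1-based line numbers of every MODEL or ENDMDL record (like grep -n)
    marks = [n for n, line in enumerate(lines, start=1)
             if line.startswith(("MODEL", "ENDMDL"))]
    commands = []
    # odd entries are MODEL lines, even entries the matching ENDMDL lines
    for k in range(0, len(marks) - 1, 2):
        first = marks[k] + 1      # line after MODEL
        last = marks[k + 1] - 1   # line before ENDMDL
        commands.append("sed -n %d,%dp %s > model_%03d.pdb"
                        % (first, last, source, k // 2 + 1))
    return commands
```

Note that the MODEL and ENDMDL lines themselves, and anything outside a model, are not copied to the output files.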

Bash script

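 i=1
 # IFS= and -r keep each line verbatim; PDB records are fixed-column, so whitespace must not be collapsed
 while IFS= read -r line; do
   printf '%s\n' "$line" >> "model_${i}.pdb"   # appends, so remove old model_*.pdb files first
   [[ $line == ENDMDL* ]] && ((i++))
 done < /path/to/file.pdb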

Awk script

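Save the following as script.awk. Output files are numbered from model_0.pdb, and any records that precede the first MODEL record end up in model_0.pdb together with the first model.

 BEGIN {file = 0; filename = "model_"  file ".pdb"}
 /^ENDMDL/ {close(filename); file++; filename = "model_" file ".pdb"; next}
 {print $0 > filename}

It should be called as

 awk -f script.awk < models.pdb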

Perl script

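This script reads $base.pdb (here 1g9e.pdb) and writes only the ATOM and HETATM records of each model to 1g9e_1.pdb, 1g9e_2.pdb, and so on.

 $base='1g9e';open(IN,"<$base.pdb") or die "Cannot open $base.pdb: $!";@indata = <IN>;close(IN);$i=0;
 foreach $line(@indata) {
 if($line =~ /^MODEL/) {++$i;$file="${base}_$i.pdb";open(OUT,">$file") or die "Cannot open $file: $!";next}
 if($line =~ /^ENDMDL/) {close(OUT);next}
 if($line =~ /^ATOM/ || $line =~ /^HETATM/) {print OUT "$line"}
 }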

Python script

For this kludgy version using Python 2.x, you need to paste the entire PDB file into the script where it says "PASTE YOUR PDB FILE TEXT HERE".

You can fork the code on GitHub.

(Eventually, I hope to have a more full-featured version there that you can just point at your file using an argument at the command line, and after that a web-hosted service to do it for you right on a web page.)

 PDB_text = """PASTE YOUR PDB FILE TEXT HERE"""
 model_number = 1
 new_file_text = ""
 for line in filter(None, PDB_text.splitlines()):
     line = line.strip()  # for better control of ends of lines
     if line == "ENDMDL":
         # save file with file number in name
         output_file = open("model_" + str(model_number) + ".pdb", "w")
         output_file.write(new_file_text.rstrip('\r\n'))  # rstrip to remove trailing newline
         output_file.close()
         # reset everything for next model
         model_number += 1
         new_file_text = ""
     elif not line.startswith("MODEL"):
         new_file_text += line + '\n'
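For readers who would rather not paste the file into the script, here is a minimal sketch of a version that takes the file name as a command-line argument. It is not the author's GitHub code; the function names `split_models` and `write_models` and the script name split_models.py are hypothetical. Unlike the script above, it keeps only the lines between each MODEL and ENDMDL record and ignores everything outside a model.

```python
import sys


def split_models(lines):
    """Return a list of models, each a list of the lines inside MODEL...ENDMDL."""
    models = []
    current = None  # None while outside a model
    for line in lines:
        line = line.rstrip("\r\n")
        record = line[:6].strip()  # record name sits in columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return models


def write_models(models, prefix="model_"):
    """Write each model to prefix + number + '.pdb'; return the file names."""
    names = []
    for number, model in enumerate(models, start=1):
        name = "%s%d.pdb" % (prefix, number)
        with open(name, "w") as handle:
            handle.write("\n".join(model) + "\n")
        names.append(name)
    return names


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as pdb_file:
            for written in write_models(split_models(pdb_file)):
                print(written)
    else:
        sys.stderr.write("usage: python split_models.py models.pdb\n")
```

Run as `python split_models.py models.pdb`, this writes model_1.pdb, model_2.pdb, and so on to the current directory, using the same naming scheme as the script above.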
