libStatGen Software  1
SamQuerySeqWithRefIter Class Reference

Iterates through the query and compare with reference. More...

#include <SamQuerySeqWithRefHelper.h>

Public Member Functions

 SamQuerySeqWithRefIter (SamRecord &record, GenomeSequence &refSequence, bool forward=true)
 
bool reset (bool forward=true)
 Reset to start at the beginning of the record. More...
 
bool getNextMatchMismatch (SamSingleBaseMatchInfo &matchMismatchInfo)
 Returns information for the next position where the query and the reference match or mismatch. More...
 

Detailed Description

Iterates through the query and compare with reference.

NOTE: References to the GenomeSequence and SamRecord are stored, the objects are not copied, so they must remain valid as long as this class is used.

Definition at line 58 of file SamQuerySeqWithRefHelper.h.

Member Function Documentation

◆ getNextMatchMismatch()

bool SamQuerySeqWithRefIter::getNextMatchMismatch ( SamSingleBaseMatchInfo matchMismatchInfo)

Returns information for the next position where the query and the reference match or mismatch.

To be a match or mismatch, both the query and reference must have a base that is not 'N'. This means: insertions and deletions are not mismatches or matches. 'N' bases are not matches or mismatches

Parameters
matchMismatchInforeturn parameter with the information about the matching/mismatching base.
Returns
true if there was another match/mismatch (matchMismatchInfo was set); false if not.

Definition at line 102 of file SamQuerySeqWithRefHelper.cpp.

103 {
104  // Check whether or not this read is mapped.
105  // If the read is not mapped, return no matches.
106  if(!SamFlag::isMapped(myRecord.getFlag()))
107  {
108  // Not mapped.
109  return(false);
110  }
111 
112  // Check that the Cigar is set.
113  if(myCigar == NULL)
114  {
115  // Error.
116  throw(std::runtime_error("Cannot determine matches/mismatches since failed to retrieve the cigar"));
117  return(false);
118  }
119 
120  // If myStartOfReadOnRefIndex is the default (unset) value, then
121  // the reference was not found, so return false, no matches/mismatches.
122  if(myStartOfReadOnRefIndex == INVALID_GENOME_INDEX)
123  {
124  // This reference name was not found in the reference file, so just
125  // return no matches/mismatches.
126  return(false);
127  }
128 
129 
130  // Repull the read length from the record to check just in case the
131  // record has changed length.
132  // Loop until a match or mismatch is found as long as query index
133  // is still within the read (Loop is broken by a return).
134  while((myQueryIndex < myRecord.getReadLength()) &&
135  (myQueryIndex >= 0))
136  {
137  // Still more bases, look for a match/mismatch.
138 
139  // Get the reference offset for this read position.
140  int32_t refOffset = myCigar->getRefOffset(myQueryIndex);
141  if(refOffset == Cigar::INDEX_NA)
142  {
143  // This is either a softclip or an insertion
144  // which do not count as a match or a mismatch, so
145  // go to the next index.
146  nextIndex();
147  continue;
148  }
149 
150  // Both the reference and the read have a base, so get the bases.
151  char readBase = myRecord.getSequence(myQueryIndex, SamRecord::NONE);
152  char refBase = myRefSequence[myStartOfReadOnRefIndex + refOffset];
153 
154  // If either the read or the reference bases are unknown, then
155  // it does not count as a match or a mismatch.
156  if(BaseUtilities::isAmbiguous(readBase) ||
158  {
159  // Either the reference base or the read base are unknown,
160  // so skip this position.
161  nextIndex();
162  continue;
163  }
164 
165  // Both the read & the reference have a known base, so it is either
166  // a match or a mismatch.
167  matchMismatchInfo.setQueryIndex(myQueryIndex);
168 
169  // Check if they are equal.
170  if(BaseUtilities::areEqual(readBase, refBase))
171  {
172  // Match.
173  matchMismatchInfo.setType(SamSingleBaseMatchInfo::MATCH);
174  // Increment the query index to the next position.
175  nextIndex();
176  return(true);
177  }
178  else
179  {
180  // Mismatch
181  matchMismatchInfo.setType(SamSingleBaseMatchInfo::MISMATCH);
182  // Increment the query index to the next position.
183  nextIndex();
184  return(true);
185  }
186  }
187 
188  // No matches or mismatches were found, so return false.
189  return(false);
190 }
static bool isAmbiguous(char base)
Returns whether or not the specified bases is an indicator for ambiguity.
static bool areEqual(char base1, char base2)
Returns whether or not two bases are equal (case insensitive), if one of the bases is '=',...
static const int32_t INDEX_NA
Value associated with an index that is not applicable/does not exist, used for converting between que...
Definition: Cigar.h:492
int32_t getRefOffset(int32_t queryIndex)
Return the reference offset associated with the specified query index or INDEX_NA based on this cigar...
Definition: Cigar.cpp:187
@ NONE
Leave the sequence as is.
Definition: SamRecord.h:58
uint16_t getFlag()
Get the flag (FLAG).
Definition: SamRecord.cpp:1384
int32_t getReadLength()
Get the length of the read.
Definition: SamRecord.cpp:1391
const char * getSequence()
Returns the SAM formatted sequence string (SEQ), translating the base as specified by setSequenceTran...
Definition: SamRecord.cpp:1568
void setQueryIndex(int32_t queryIndex)
Set the query index for this object.
void setType(Type newType)
Set the type (match/mismatch/unkown) for this object.

References BaseUtilities::areEqual(), SamRecord::getFlag(), SamRecord::getReadLength(), Cigar::getRefOffset(), SamRecord::getSequence(), Cigar::INDEX_NA, BaseUtilities::isAmbiguous(), SamRecord::NONE, SamSingleBaseMatchInfo::setQueryIndex(), and SamSingleBaseMatchInfo::setType().

Referenced by SamFilter::clipOnMismatchThreshold(), and SamFilter::sumMismatchQuality().

◆ reset()

bool SamQuerySeqWithRefIter::reset ( bool  forward = true)

Reset to start at the beginning of the record.

This will re-read values from SamRecord, so can be used if it has changed to contain information for a new record.

Parameters
forwardtrue means to start from the beginning and go to the end; false means to start from the end and go to the beginning.
Returns
true if successfully reset; false if failed to read the Cigar.

Definition at line 58 of file SamQuerySeqWithRefHelper.cpp.

59 {
60  myCigar = myRecord.getCigarInfo();
61  if(myCigar == NULL)
62  {
63  // Failed to get Cigar.
64  return(false);
65  }
66 
67  // Get where the position of where this read starts as mapped to the
68  // reference.
69  myStartOfReadOnRefIndex =
70  myRefSequence.getGenomePosition(myRecord.getReferenceName());
71 
72  if(myStartOfReadOnRefIndex != INVALID_GENOME_INDEX)
73  {
74  // This reference name was found in the reference file, so
75  // add the start position.
76  myStartOfReadOnRefIndex += myRecord.get0BasedPosition();
77  }
78 
79  myForward = forward;
80 
81  if(myForward)
82  {
83  myQueryIndex = 0;
84  }
85  else
86  {
87  // reverse, so start at the last entry.
88  myQueryIndex = myRecord.getReadLength() - 1;
89  }
90  return(true);
91 }
genomeIndex_t getGenomePosition(const char *chromosomeName, unsigned int chromosomeIndex) const
given a chromosome name and position, return the genome position
const char * getReferenceName()
Get the reference sequence name (RNAME) of the record.
Definition: SamRecord.cpp:1298
Cigar * getCigarInfo()
Returns a pointer to the Cigar object associated with this record.
Definition: SamRecord.cpp:1836
int32_t get0BasedPosition()
Get the 0-based(BAM) leftmost position of the record.
Definition: SamRecord.cpp:1319

References SamRecord::get0BasedPosition(), SamRecord::getCigarInfo(), GenomeSequence::getGenomePosition(), SamRecord::getReadLength(), and SamRecord::getReferenceName().


The documentation for this class was generated from the following files: