3  Strings

In programming, a string is a sequence of characters that can act as a variable or a constant. It is one of the most commonly used data types in Python. In fact, the growing prevalence of Python in bioinformatics is due in large part to the ease with which it performs operations on strings. A string variable can be assigned a value using either single or double quotes.

x = 'String A' 
var2 = "Another variable"
print(x)
print(var2)
String A
Another variable

To check the data type of a variable, we can use the type function. To get the number of characters in a string variable, use the len function. Note that a blank space also counts as a character.

print(type(x))
print(len(var2))
<class 'str'>
16

3.1 String concatenation

The process of joining two or more things is called concatenation. The arithmetic operators + and * can be used directly with strings to concatenate (+) or repeat (*) them. Giving an operator additional behaviour beyond its existing function is called operator overloading. For example, the plus (+) operator performs addition when the operands are numbers. However, if the operands are strings, it acts as a concatenation operator instead.

# The plus (+) operator with two numbers
print(2 + 3)
5
# The plus (+) operator with two strings
var1 = "Hello"
var2 = "World!"
print(var1 + var2)
HelloWorld!

The asterisk (*) operator, which is used to multiply two numbers, repeats a string when the operands are a string (s) and a number (n): the output is s repeated n times. This behaviour is analogous to the multiplication of two numbers. For instance, let's say we want to multiply 5 by 3 (5*3). This multiplication can also be represented as the sum of 5 three times, i.e. 5+5+5. So, when we use a string (s) and a number (n) as operands for the * operator, we get s+s+s… (n times).

var3 = var1*3
print(var3)
HelloHelloHello
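
Because + only concatenates when both operands are strings, mixing a string with a number raises a TypeError. The sketch below shows one way to handle this by converting the number with str() first. The names gene, count, label and separator are hypothetical and chosen only for illustration.

```python
# Hypothetical example values, not taken from the text above
gene = "BRCA"
count = 2

try:
    label = gene + count          # str + int is not defined
except TypeError:
    label = gene + str(count)     # convert the number to a string first

print(label)                      # BRCA2

# Repetition works with the operands in either order
separator = "-" * 10
print(separator)                  # ----------
print(3 * "AT")                   # ATATAT
```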

3.2 Slice of a string

Slicing is another very useful operation for manipulating strings. The slice operator [] returns the characters between the start and end positions, which are separated by a colon. The numbering of characters within a string starts from 0. Note that the character at the start position is included in the output but the character at the end position is not. Slicing effectively returns a substring of the given string. The general syntax for slicing a string is as follows:

string[start:end]
string[start:]
string[:end]
string[start:end:step]

Let’s see some examples to get a better understanding of the slice operation.

var4 = "ABCDEFG"
print(var4)
print(var4[1:5])
ABCDEFG
BCDE

If no value is specified before or after the colon, the slice starts from the beginning or runs to the end of the string, respectively.

print(var3)
print(var3[:7])
print(var3[3:])
HelloHelloHello
HelloHe
loHelloHello

The step part of the slice operator specifies how many positions to advance at a time when going from the start position to the end position. The default step size is 1. We can change the default by specifying the step parameter in the slice.

print(var3)
print(var3[2::2])
HelloHelloHello
loelHlo

Quiz: Write a command that outputs ‘HHH’ given a string ‘HelloHelloHello’.

Answer:
var3 = "HelloHelloHello"
print(var3[::5])
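
Slices with a fixed width are handy for cutting a sequence into equal-sized pieces. As a minimal sketch, assuming a short made-up DNA sequence (seq and codons are hypothetical names), the loop below collects consecutive three-character slices and then uses a step of 3 to pick every third character.

```python
# Hypothetical DNA sequence used only for illustration
seq = "ATGGCCTAA"

# Collect consecutive three-character slices (codons)
codons = []
for start in range(0, len(seq), 3):
    codons.append(seq[start:start + 3])
print(codons)       # ['ATG', 'GCC', 'TAA']

# Every third character, starting from position 0
print(seq[::3])     # AGT
```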

3.3 String comparison

String comparison is one of the most frequently required tasks in programming. In Python, comparison operators can be used to compare two strings. The == operator (two equals signs without a space between them) tests for equality. The output of a comparison is a boolean value, i.e. either True or False. String comparison is case sensitive.

var1 = 'Hello'
var2 = "Hello"
var3 = 'Hi'
print(var1 == var2)
print(var1 == var3)
print(var1 == "hello")
True
False
False
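
When case should be ignored, one common approach is to convert both strings to the same case before comparing them. The sketch below uses the lower method (described in section 3.5) for this. The variables a and b are hypothetical.

```python
# Hypothetical example strings
a = "Hello"
b = "hello"

print(a == b)                    # False: the case differs
print(a.lower() == b.lower())    # True: both sides are lowercased first
print(a != b)                    # True: != tests for inequality
```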

3.4 Splitting string

Sometimes there is a need to split a string based on a certain delimiter; the split function is designed for that task. Python strings have a split function associated with them that returns a list of elements after splitting the string at the delimiter. The default delimiter is whitespace: any run of spaces, tabs or newlines is treated as a single separator.

s1 = "This is a sentence."
words1 = s1.split()
print(words1)

#split with comma as a delimiter
s2 = 'This is an another sentence, a longer one.'
words2 = s2.split(",")
print(words2)
['This', 'is', 'a', 'sentence.']
['This is an another sentence', ' a longer one.']

Quiz: What would be the output if we split s2 using “is” as a delimiter?

Answer:
s2 = 'This is an another sentence, a longer one.'
print(s2.split("is"))

##Output would be a list with three elements:
##['Th', ' ', ' an another sentence, a longer one.']
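
The default behaviour of split differs slightly from passing a single space explicitly, which matters when the input contains repeated spaces. A minimal sketch, using made-up input (line, record and fields are hypothetical names):

```python
# Hypothetical input with two spaces between the words
line = "ATG  TAA"

print(line.split())       # ['ATG', 'TAA']
print(line.split(" "))    # ['ATG', '', 'TAA']

# A tab-separated record split into its fields
record = "gene1\t120\t+"
fields = record.split("\t")
print(fields)             # ['gene1', '120', '+']
```

With an explicit " " delimiter, each space is a separate split point, so two consecutive spaces produce an empty string in the result.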

3.5 String functions

Python strings have several methods for working with string objects. Below are examples of some of the functions available in class ‘str’. These methods do not modify the original string; they return a new value such as a modified string, a boolean, or an integer position. For additional functions, please refer to the Python documentation.

s1 = "Apple"

String function           Output
————————————————————————————————
s1.upper()                APPLE
s1.lower()                apple
s1.startswith("a")        False
s1.startswith("A")        True
s1.index("l")             3
s1.replace("e","es")      Apples
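
Since these methods return a new value rather than changing the string they are called on, the result has to be stored or printed to be of use. The sketch below illustrates this with a hypothetical variable fruit, and also shows that method calls can be chained.

```python
fruit = "Apple"                     # hypothetical variable name

upper_fruit = fruit.upper()
print(upper_fruit)                  # APPLE
print(fruit)                        # Apple (the original is unchanged)

# Each call returns a new string, so calls can be chained
plural = fruit.replace("e", "es").upper()
print(plural)                       # APPLES
```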

3.6 Doc strings

To declare a variable whose value is a long string that spans multiple lines, triple quotes can be used. All whitespace inside the quotes, such as tabs and newlines, is considered part of the string. Strings of this type are generally used for documentation purposes, e.g. writing help text for custom functions.

var4 = """This is an example of
a long string that spans 
three lines."""
print(var4)
This is an example of
a long string that spans 
three lines.
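
To illustrate the documentation use case, a triple-quoted string placed as the first statement of a function becomes that function's help text. The function gc_content below is a hypothetical example written for this sketch, not something defined elsewhere in this chapter.

```python
def gc_content(seq):
    """Return the fraction of G and C characters in seq.

    seq is expected to be a non-empty string of uppercase letters.
    """
    return (seq.count("G") + seq.count("C")) / len(seq)

# The documentation string is stored in the __doc__ attribute
print(gc_content.__doc__)
print(gc_content("ATGC"))    # 0.5
```

The same text is displayed when calling help(gc_content) in an interactive session.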

3.7 Strings – key characteristics

  • A string variable stores text data

A string variable can store characters including white spaces (space, tab, newline). A string variable is an object of class ‘str’. To initialize a string variable, single or double quotes can be used.

  • A string variable is immutable

In Python, strings are immutable, i.e. the contents of a string cannot be changed once it has been created. A string variable can, however, be reassigned: the variable then refers to a new string, while the original string’s data remains unchanged.

  • A string variable is a sequence

A string is also a sequence, i.e. an ordered collection of characters. We can iterate through the characters in a string just as we can iterate through a list. Unlike lists, however, characters cannot be appended to a string in place because strings are an immutable data type.
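
The characteristics above can be sketched in a few lines. The names word and chars are hypothetical. Converting to a list and back with join is one possible way to work around immutability, shown here only as an illustration.

```python
word = "Hello"            # hypothetical variable name

try:
    word[0] = "J"         # changing a character in place is not allowed
except TypeError as err:
    print("Error:", err)

# Build a new string instead and reassign the variable
word = "J" + word[1:]
print(word)               # Jello

# Iterate over the characters, as with a list
for ch in word:
    print(ch)

# Convert to a list when in-place edits are needed, then join back
chars = list(word)
chars.append("!")
print("".join(chars))     # Jello!
```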