While the previous article dealt with the syntax for handling string data in Python, this article discusses methods for editing string objects. Editing in this sense includes slicing a sequence or recombining two different sequences using built-in functions and methods. These are useful when handling large amounts of string data in Python.
Length of string value by using len()
Before editing or manipulating a string object, it helps to know the length of its character sequence, that is, the number of characters (including spaces) in the string. This information is useful when dealing with data of a specific length, such as setting up passwords or creating objects of consistent length. The built-in len() function returns this value.
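As a minimal sketch of how len() behaves, the snippet below uses a made-up string and an assumed minimum length of 8; neither value comes from the original article.

```python
# Hypothetical example string; any str value behaves the same way.
password = "open sesame"

# len() counts every character, including the space between the words.
length = len(password)
print(length)  # 11

# Assumed threshold, used only to illustrate a length check.
MIN_LENGTH = 8
is_long_enough = length >= MIN_LENGTH
print(is_long_enough)  # True
```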
Indexing in Python refers to referencing each character in a string sequence by its position. Positions start at 0 rather than 1. For example, the string sequence ‘HELLO’ has five letters, indexed as follows: H is 0, E is 1, L is 2, L is 3 and O is 4.
Therefore, the character ‘E’ has an index of 1 (it is the second character of the object), while ‘O’ has an index of 4. In the same way, every character in a string object, including spaces between words and special characters, corresponds to a particular index number, which is placed inside the bracket operator, [ ].
Placing the index number 10 inside the brackets returns the character at index position 10 of the string object a (as shown in Image 3). Indexing can be used in this way to extract a specific character, or a range of characters, from an existing string sequence.
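The short sketch below illustrates zero-based indexing. The value given to a is an assumption, borrowed from the sentence used in the slicing example later in the article, since the original image is not available here.

```python
word = "HELLO"

# Index positions start at 0, so the last of five letters is at index 4.
first = word[0]
second = word[1]
last = word[4]
print(first, second, last)  # H E O

# Assumed value for the string object a.
a = "This is a beautiful painting"

# Spaces occupy index positions too: indexes 4, 7 and 9 are spaces here.
tenth = a[10]
print(tenth)  # b
```

Asking for an index beyond the end of the string, such as word[5], raises an IndexError.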
Counting in string objects
To find out how many times a particular character or sequence of characters appears within a string object, the built-in str.count() method can be used. When the character or character sequence to be counted is passed within the parentheses, the Python interpreter returns the number of non-overlapping times it appears. Longer sequences, such as whole words, can be counted in the same way.
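A brief sketch of str.count() follows, using an assumed example sentence rather than the one from the original image.

```python
# Assumed example sentence.
a = "This is a beautiful painting"

# Count a single character.
i_count = a.count("i")
print(i_count)  # 5

# Count a sequence of characters: "is" occurs inside "This" and as the word "is".
is_count = a.count("is")
print(is_count)  # 2

# Counting is case-sensitive: the capital "T" in "This" is not counted here.
t_count = a.count("t")
print(t_count)  # 2
```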
Finding characters in string objects
Along with determining the length of a string object and counting certain characters in it, one can locate character(s) within the string object using the str.find() method, which returns the index of the first occurrence, or -1 if there is none.
This is also helpful for finding the start position of a string sequence (such as a word) within a string object. As seen in Image 5(b), searching for the word ‘national’ gives the output 18, the index at which its first letter, ‘n’, is located.
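The sentence containing ‘national’ is not reproduced in the text, so the sketch below shows the same idea on an assumed example sentence instead.

```python
# Assumed example sentence (not the one from Image 5).
a = "This is a beautiful painting"

# find() returns the index of the first occurrence of a character...
first_i = a.find("i")
print(first_i)  # 2

# ...or the index where the first letter of a longer sequence starts.
word_start = a.find("beautiful")
print(word_start)  # 10

# If the sequence is not present, find() returns -1 instead of raising an error.
missing = a.find("z")
print(missing)  # -1
```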
To extract a range of characters, the relevant index numbers should be known. This is known as slicing, and it uses a colon (:) to specify the range of index values.
Note that the first index number of the range is where the slice starts, while the ending index number is one past the last character to be included.
Here, the colon specifies a range of index values: index number 10 refers to the character ‘b’, while index number 19 is one past the last desired character, ‘l’. The Python interpreter reads and executes this to give the output ‘beautiful’ (shown in Image 4).
The output, ‘beautiful’, is called a substring, that is, a string within the original string, ‘This is a beautiful painting’. It can be referred to by its variable name, c in this case.
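The slicing described above can be sketched as follows. The name a for the original string is an assumption; c is the variable name mentioned in the text.

```python
# Assumed variable name for the original string.
a = "This is a beautiful painting"

# The slice starts at index 10 ('b') and stops before index 19,
# so the last character included is 'l' at index 18.
c = a[10:19]
print(c)  # beautiful

# Leaving out one side of the colon slices from the start or to the end.
start = a[:4]
end = a[20:]
print(start)  # This
print(end)    # painting
```

Slicing returns a new string; the original string a is left unchanged.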
Splitting of string objects using str.split()
While slicing extracts a substring from a bigger string, splitting breaks the entire sequence into multiple components, using the str.split() method. An optional separator to split around can be passed to the Python interpreter within the parentheses ().
The output can be seen in the above image, where the string has been split around the spaces. This is what happens when no parameters are provided. Similarly, to split the string object around the letter ‘h’ (note that Python is case-sensitive, so ‘h’ and ‘H’ are different), the letter is passed as the separator. Python then splits the sequence at the points where ‘h’ is present and gives an output without the character ‘h’ (Image 6b). In this case, the string is split into 3 strings.
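The string used in the original images is not available in the text, so the sketch below uses a made-up sentence that also happens to split into 3 strings around ‘h’.

```python
# Hypothetical example sentence (not the one from Image 6).
sentence = "Hello there, how are you"

# With no arguments, split() breaks the string at whitespace.
words = sentence.split()
print(words)  # ['Hello', 'there,', 'how', 'are', 'you']

# With 'h' as the separator, the string is cut at each lowercase 'h'.
# The capital 'H' in "Hello" is left alone, and the separator itself is dropped.
pieces = sentence.split("h")
print(pieces)       # ['Hello t', 'ere, ', 'ow are you']
print(len(pieces))  # 3
```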