Strings in Python, their Conversion and Formatting
Strings in Python:
You already got to know strings in Python in my previous article. To repeat the most important key points very briefly:
- Strings are placed in either single or double quotation marks, so 'abc' or "abc". Both variants are equivalent.
- Special characters marked with \ can be embedded in character strings, e.g. \n for a line break (see Table 1).
- If strings in the code are to extend over several lines, they have to be placed in triple quotation marks, so either ''' or """.
- The slicing syntax is used to access parts of a character string. s[n] returns the nth character, where n = 0 means the first character. s[start:end] returns the characters from start (inclusive) to end (exclusive). Negative values specify the offset from the end of the character string.
- The function len(s) determines the number of characters in a string.
- Python strings can be edited with various functions and methods (see Table 2).
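The slicing rules above can be tried out with a short sketch. The sample string and the variable names are chosen here purely for illustration:

```python
s = 'Raspberry'

first = s[0]       # 'R', index 0 is the first character
last = s[-1]       # 'y', negative values count from the end
middle = s[1:4]    # 'asp', start is inclusive, end is exclusive
tail = s[-5:]      # 'berry', the last five characters
length = len(s)    # 9

print(first, last, middle, tail, length)
```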
Table 1: Selected escape sequences
Character sequence | Meaning |
\a | Bell (beep) |
\f | Form feed (new page) |
\n | New line |
\r | Carriage return (for Windows text files) |
\t | Tab character |
\unnnn | Unicode character with the hex code nnnn |
\' | the character ' |
\" | the character " |
\\ | the character \ |
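A few of the escape sequences from Table 1 in action. This is only a small sketch; the strings are made-up examples:

```python
# \t inserts a tab, \n a line break, \\ a single backslash
text = 'Name:\tPi\nPath: C:\\temp'
print(text)

# \' allows an apostrophe inside a single-quoted string
quote = 'It\'s a "test"'
print(quote)

# \u20ac is the Unicode code point of the euro sign
euro = '\u20ac'
print(euro)
```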
Table 2: Selected string methods and functions
Method or function | Meaning |
len(s) | Finds the number of characters. |
str(x) | Converts x to a string. |
sub in s | Tests whether sub appears in s. |
s.count(sub) | Finds how often sub occurs in s. |
s.endswith(sub) | Tests whether s ends with sub. |
s.expandtabs() | Replaces tab characters with spaces. |
s.find(sub) | Searches for sub in s and returns the starting position or -1. |
s.isxxx() | Tests properties of s (islower(), isupper() etc.). |
s.join(x) | Connects the strings in x (list, set, tuple), using s as the separator. |
s.lower() | Returns s in all lower case letters. |
s.partition(sub) | Splits s at the first occurrence of sub and returns the three parts as a tuple. |
s.replace(old, new) | Returns s, where old is replaced by new. |
s.rfind(sub) | Like find, but the search starts at the end of the string. |
s.split(sub) | Splits s at each occurrence of sub and returns a list. |
s.startswith(sub) | Tests whether s starts with sub. |
s.upper() | Returns s in all upper case letters. |
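Some of the methods from Table 2 can be tried out as follows. The sample string is only an illustration:

```python
s = 'spam, eggs, spam'

count = s.count('spam')               # 2
pos = s.find('eggs')                  # 6
rpos = s.rfind('spam')                # 12
missing = s.find('ham')               # -1, because 'ham' does not occur
parts = s.split(', ')                 # ['spam', 'eggs', 'spam']
joined = ' / '.join(parts)            # 'spam / eggs / spam'
head, sep, rest = s.partition(', ')   # 'spam', ', ', 'eggs, spam'
replaced = s.replace('spam', 'ham')   # 'ham, eggs, ham'

print(count, pos, rpos, missing)
print(parts, joined, replaced)
```

Note that none of these methods changes s itself; each one returns a new object.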
Unicode
From version 3, Python internally represents all character strings in Unicode and expects the source text in UTF-8 encoding. If your scripts are to be compatible with Python 2, add a comment in the first or second line of the script with the instruction -*- coding: utf-8 -*-.
# -*- coding: utf-8 -*-
# only required in Python 2 if the code contains
# UTF-8 encoded characters
When processing text files, Python generally assumes UTF-8 encoding. On the Raspberry Pi, this assumption is correct for almost all files. If you still have to read or save files in a different encoding, specify the required character set in the encoding parameter of open:
f = open('readme.txt', encoding='latin-1')
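The following sketch writes and reads a small file in Latin-1 encoding. The temporary directory and the sample text are assumptions made only so that the example runs on its own:

```python
import os
import tempfile

text = 'Grüße aus Köln'

# Hypothetical file location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'readme.txt')

# Write the text in Latin-1 instead of the default encoding
with open(path, 'w', encoding='latin-1') as f:
    f.write(text)

# Read it back with the same encoding
with open(path, encoding='latin-1') as f:
    restored = f.read()

# In Latin-1 every character of this text occupies exactly one byte
size = os.path.getsize(path)
print(restored, size)
```

The with statement closes the file automatically, which is why no explicit f.close() is needed.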
Raw strings
Python interprets \ sequences as special characters (see Table 1). If you do not want this, and every \ character should be taken literally, precede the entire character string with the letter r (raw):
latexcode = r'\section{heading}'
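The difference becomes visible when you compare the lengths of a normal and a raw string. A small sketch with made-up strings:

```python
normal = 'a\tb'    # \t is interpreted as one tab character
raw = r'a\tb'      # backslash and t remain two separate characters

print(len(normal))   # 3
print(len(raw))      # 4

latexcode = r'\section{heading}'
print(latexcode)     # \section{heading}
```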
String conversion and formatting
Often you have to create strings from numbers, dates, times and so on. In the simplest case you use the functions str(x) or repr(x), each of which represents any object as a character string. The repr function works in such a way that, where possible, the resulting string can be read in again with eval. str, on the other hand, tries to format the string so that it is easy for humans to read.
However, you have no control over the formatting with either function. If you want to right-justify numbers or display them with thousands separators, you need special formatting functions. In Python you have the choice between several methods. The most popular are the % operator and the format method:
- format string % (data, data, data): Here the format string is written in the syntax of the printf function of the C programming language. Within this string, % characters indicate the positions at which the data is to be inserted.
- format string.format(data, data, data): The structure of the character string is very similar to that of the method of the same name in the .NET framework from Microsoft. Within this string, {} pairs of braces indicate the positions of the parameters.
First, three examples of the % method:
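The difference between str and repr can be sketched as follows. The sample values are chosen only for illustration:

```python
import datetime

s = 'Hi\n'
d = datetime.date(2020, 1, 2)

print(str(d))     # 2020-01-02
print(repr(d))    # datetime.date(2020, 1, 2)

# repr of a string contains the quotes and the escaped line break
print(repr(s))    # 'Hi\n'

# For these objects, eval turns the repr string back into an equal object
same_string = eval(repr(s)) == s
same_date = eval(repr(d)) == d
print(same_string, same_date)
```

Keep in mind that eval executes arbitrary code, so it should only ever be applied to strings you trust.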
>>> '%s is %d years old.' % ('Engr Fahad', 30)
'Engr Fahad is 30 years old.'
>>> '1/7 with three decimal places: %.3f' % (1/7)
'1/7 with three decimal places: 0.143'
>>> '<img src="%s" alt="%s" width="%d">' % ('foto.jpg',
...     'Portrait', 200)
'<img src="foto.jpg" alt="Portrait" width="200">'
Here are some examples of the newer format method. Its greatest advantage is that the order of the placeholders can be chosen freely using {n}:
>>> '{} is {} years old.'.format('Fawad khan', 20)
'Fawad khan is 20 years old.'
>>> '{1} is {0} years old.'.format(20, 'Fawad khan')
'Fawad khan is 20 years old.'
>>> '{name} is {age} years old.'.format(age=20,
...     name='Fawad khan')
'Fawad khan is 20 years old.'
>>> '1/7 with three decimal places: {:.3f}'.format(1/7)
'1/7 with three decimal places: 0.143'
>>> 'select * from table where id = {:d}'.format(324)
'select * from table where id = 324'
There are countless codes for building the character strings for the two formatting systems (see Table 3 and Table 4). Unfortunately, there is no room here for a full reference.
Table 3: Selected codes for % formatting (printf syntax)
Code | Meaning |
%d | Whole number (decimal) |
%5d | Whole number, five characters wide, right-justified |
%-5d | Whole number, five characters wide, left-justified |
%f | Floating point number (float) |
%.2f | Floating point number with two decimal places |
%r | String, Python uses repr. |
%s | String, Python uses str. |
%10s | String, ten characters wide, right-justified |
%-10s | String, ten characters wide, left-justified |
%x | Whole number in hexadecimal |
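The codes from Table 3 can be checked with a few lines. This is only a sketch; the | character is added here so that the padding becomes visible:

```python
right = '%5d|' % 42          # '   42|'
left = '%-5d|' % 42          # '42   |'
rounded = '%.2f' % 3.14159   # '3.14'
padded = '%10s|' % 'abc'     # '       abc|'
hexval = '%x' % 255          # 'ff'
quoted = '%r' % 'abc'        # "'abc'", because %r uses repr

for result in (right, left, rounded, padded, hexval, quoted):
    print(result)
```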
Table 4: Selected codes for the format method
Code | Meaning |
{} | Parameter, any data type |
{0}, {1}, … | Numbered parameters |
{one}, {two}, … | Named parameters |
{:d} | Integer |
{:<7d} | Integer, seven characters wide, left-justified |
{:>7d} | Integer, seven characters wide, right-justified |
{:^7d} | Integer, seven characters wide, centered |
{:f} | Floating point number |
{:.5f} | Floating point number with five decimal places |
{:s} | String |
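The alignment codes from Table 4 work as in the following sketch. The | character again only marks the end of the field. The code {:,d} for thousands separators is not listed in the table, but it is part of the same format syntax:

```python
left = '{:<7d}|'.format(42)         # '42     |'
right = '{:>7d}|'.format(42)        # '     42|'
centered = '{:^7d}|'.format(42)     # 42 surrounded by spaces, 7 wide
precise = '{:.5f}'.format(1 / 7)    # '0.14286'
grouped = '{:,d}'.format(1234567)   # '1,234,567'

for result in (left, right, centered, precise, grouped):
    print(result)
```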
Regular expressions
Regular expressions are a language of their own for describing search patterns for character strings. For example, you can use them to extract all links from an HTML document or to carry out complex search-and-replace operations. In Python, the corresponding functions are bundled in the re module (see Table 5).
Table 5: Selected functions of the re module
Function | Meaning |
match(pattern, s) | Tests whether the beginning of s matches the pattern. |
search(pattern, s) | Searches s for the pattern and returns a match object (or None). |
split(pattern, s) | Splits s at each occurrence of the search pattern. |
sub(pattern, r, s) | Replaces every occurrence of the pattern in s with r. |
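The four functions from Table 5 can be compared with a short sketch. The sample string is made up for this purpose:

```python
import re

s = 'pi 3 and pi 4'

# match only looks at the beginning of the string
starts_with_pi = re.match(r'pi', s) is not None      # True
starts_with_digit = re.match(r'\d', s) is not None   # False

# search scans the whole string and returns a match object
m = re.search(r'\d', s)
first_digit_pos = m.start()                          # 3

# split cuts the string at every run of whitespace
words = re.split(r'\s+', s)

# sub replaces all occurrences unless count limits it
all_replaced = re.sub(r'\d', '#', s)
one_replaced = re.sub(r'\d', '#', s, count=1)

print(starts_with_pi, starts_with_digit, first_digit_pos)
print(words, all_replaced, one_replaced)
```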
The following lines show a simple application: The program uses input to ask for an email address. pattern contains a regular expression to test whether the address meets formal criteria. To do this, the first part of the address has to consist of letters, the digits 0-9 and some special characters. This is followed by an @ sign, then another block of letters, digits and special characters, then a period and finally a pure letter block for the top-level domain (e.g. info).
import re

# local part, @ sign, domain, period, top-level domain
pattern = r'^[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]+$'
email = input("Please enter an email address: ")
if re.match(pattern, email):
    print("The email address is OK.")
else:
    print("The email address looks incorrect.")
The example also makes it immediately clear that composing regular expressions is anything but easy. For common problems, you can often find suitable solutions on the Internet. Otherwise, you'll have to put together the regular expression yourself (see Table 6).
Table 6: Structure of regular expressions
Code | Meaning |
^ | Beginning of the string |
$ | End of the string |
. | Any character |
[a-z] | A lowercase letter between a and z |
[abf-h] | A letter from a, b, f, g and h |
[^0-9] | Any character except 0 through 9 |
<pattern>* | The pattern can appear any number of times (including 0 times). |
<pattern>+ | The pattern can appear any number of times (at least once). |
\x | Escapes special characters (\$ stands for a '$' character). |
For example, the pattern [a-z]+ matches any combination of lowercase letters, e.g. abc or x or xxx, but not a b (space), Abc (capital letter), a1 (digit) or äöü (international letters).
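These statements can be verified with a small sketch. It uses re.fullmatch, which requires the entire string to match. With re.match and without the anchors ^ and $, a string like a1 would still be accepted because its beginning matches:

```python
import re

pattern = r'[a-z]+'
candidates = ['abc', 'x', 'xxx', 'a b', 'Abc', 'a1', 'äöü']

# fullmatch returns a match object only if the whole string fits
results = {c: re.fullmatch(pattern, c) is not None for c in candidates}

for candidate, ok in results.items():
    print(candidate, ok)

# match alone only checks the beginning of the string
partial = re.match(pattern, 'a1') is not None   # True
```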