10  File Handling

Saving and retrieving information is a frequent requirement when working with computer programs. Python has built-in functions to read and write files, and there are libraries that facilitate working with specialized file types. To read a file, a file handler needs to be created, which is an object that acts as an interface to that file. A file handler holds information about the file, such as its path and the mode in which the file is available for processing. The mode refers to the way the file is opened, i.e., for reading, writing, etc. The open function takes the name of the file as an argument along with the mode and returns the corresponding file handler. This file handler object is an iterator and can be considered as a collection of all the lines in the file. Once the contents of the file have been read, the file handler should be closed using the close function.

10.1 Reading files

To read a file using the open function, we pass the file name as the first argument and the mode, i.e., ‘r’, as the second argument. Note that when we give only the file name as an argument, the file should be present in the current working directory. If a file from another folder needs to be accessed, then we need to give the full path for that file. To check the working directory path, the pwd command can be used in a Jupyter notebook.

pwd
'C:\\Users\\bioinfo guru\\OneDrive - bioinfo.guru\\Documents\\python_book'
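
The working directory can also be queried from Python itself, which may help when the code runs outside Jupyter. The following is a minimal sketch using os.getcwd and os.path.join from the standard library. The file name test_file.txt is reused from this chapter, and the variable names cwd and full_path are illustrative assumptions.

```python
import os

# Current working directory, as reported by Python itself.
cwd = os.getcwd()
print(cwd)

# Build a full path to a file in that directory.
# 'test_file.txt' is only an example name; the file need not exist.
full_path = os.path.join(cwd, 'test_file.txt')
print(full_path)

# Check whether the file exists before trying to open it.
print(os.path.exists(full_path))
```

The printed paths depend on the machine on which the code is run.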

We can use the Jupyter magic command %load to display the contents of a text file within the notebook, as shown below.

# %load test_file.txt
This is a test file for Python.
This file has .txt extension.
# test_file.txt should be present in the current working directory.

FH1 = open('test_file.txt','r')

print(f"The file {FH1.name} is open in {FH1.mode} mode.")

# print all the lines in the file.
for x in FH1:
    print(x)
FH1.close()
The file test_file.txt is open in r mode.
This is a test file for Python.

This file has .txt extension.

Often it is useful to remove the newline character (\n) at the end of each line, since the print function adds a newline by default. This can be achieved using the rstrip() function. Called without any argument, this function removes all trailing whitespace (including the newline character) and returns a modified copy of the string. To remove only the newline, pass it as an argument, i.e., rstrip('\n').
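
The behaviour of rstrip can be checked on a single string. The sketch below uses a made-up line that ends with a space, a tab, and a newline; repr is used so that the whitespace characters are visible in the output.

```python
line = "This is a test file for Python. \t\n"

# Without an argument, rstrip() removes all trailing whitespace
# (spaces, tabs and newlines), not just one character.
print(repr(line.rstrip()))

# With '\n' as the argument, only trailing newlines are removed.
print(repr(line.rstrip('\n')))

# The original string is left unchanged, since strings are immutable.
print(repr(line))
```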

10.2 Writing files

For writing content to a file, the open function should be called with ‘w’ as the mode. When a file is opened in write mode, a new file is created. If the file already exists, its contents are overwritten (without warning!). The write function is used to write content to a file; note that it does not add a newline at the end, so one must be included explicitly. We can also use print with the file argument to write to a file instead of printing on the screen.

FH_out = open('temp_file.txt', 'w')
FH_out.write("This is the first sentence.\n")
print("This is the second sentence.", file=FH_out)
FH_out.close()
# %load temp_file.txt
This is the first sentence.
This is the second sentence.
FH2 = open('temp_file.txt','r')
for lines in FH2:
    print(lines)
FH2.close()
This is the first sentence.

This is the second sentence.
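
The overwriting behaviour of the ‘w’ mode can be seen by writing to the same file twice. The sketch below works in a temporary folder created with the tempfile module so that no existing file is affected. The file name overwrite_demo.txt is hypothetical, and the read function, which returns the whole file as one string, is used here only to inspect the result.

```python
import os
import tempfile

# Work in a temporary folder so that no existing file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'overwrite_demo.txt')  # hypothetical name

    # First write: the file is created.
    FH_out = open(demo_path, 'w')
    FH_out.write("Original content.\n")
    FH_out.close()

    # Second write in 'w' mode: the earlier content is discarded.
    FH_out = open(demo_path, 'w')
    FH_out.write("New content.\n")
    FH_out.close()

    # Read the whole file back as a single string.
    FH_in = open(demo_path, 'r')
    contents = FH_in.read()
    FH_in.close()

print(contents)
```

Only the second sentence survives, since the second call to open in ‘w’ mode emptied the file before writing.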

To append content to an existing file, it should be opened with the ‘a’ mode instead of ‘w’.

FH_out = open('temp_file.txt','a')
FH_out.write("This is the third sentence.\n")
FH_out.write("This is the \t fourth sentence.\n")
FH_out.close()
FH3 = open('temp_file.txt','r')
for lines in FH3:
    lines = lines.rstrip('\n')
    print(lines)
FH3.close()
This is the first sentence.
This is the second sentence.
This is the third sentence.
This is the      fourth sentence.

The readlines function returns a list with the lines of the file as its elements. Each element retains its trailing newline character.

FH4 = open('temp_file.txt','r')
all_lines = FH4.readlines()
print(all_lines)
FH4.close()
['This is the first sentence.\n', 'This is the second sentence.\n', 'This is the third sentence.\n', 'This is the \t fourth sentence.\n']
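
Since every element returned by readlines still ends with the newline character, a common follow-up step is to strip it from each element. The sketch below is one possible way to do this with a list comprehension. It assumes a small hypothetical file, lines_demo.txt, created in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'lines_demo.txt')  # hypothetical name

    # Create a small file with three lines.
    FH_out = open(demo_path, 'w')
    FH_out.write("first\nsecond\nthird\n")
    FH_out.close()

    # Read all the lines into a list.
    FH_in = open(demo_path, 'r')
    raw_lines = FH_in.readlines()
    FH_in.close()

# Strip the trailing newline from every element.
clean_lines = [line.rstrip('\n') for line in raw_lines]

print(raw_lines)
print(clean_lines)
```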

10.3 The with keyword

We can also read and write files using the with keyword. Here, some actions are performed on the file object within the with block. This approach automatically closes the file when the with block is over.

with open("temp123.txt", "w") as FH_OUT:
    print("Hello", file=FH_OUT, end=" ") 
    print("World!", file=FH_OUT)
print("Done")
Done
with open("temp123.txt", "a") as FH_OUT:
    print("hi", file=FH_OUT)
with open("temp123.txt","r") as FH:
    for line in FH.readlines():
        print(line)
Hello World!

hi
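
The automatic closing can be verified through the closed attribute of the file object. The sketch below checks it inside and after a with block, and also after a deliberately raised error. The file name with_demo.txt and the variable names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'with_demo.txt')  # hypothetical name

    with open(demo_path, 'w') as FH_OUT:
        print("Hello", file=FH_OUT)
        inside = FH_OUT.closed   # still open inside the block

    after = FH_OUT.closed        # closed once the block ends

    # The file is closed even if an error occurs inside the block.
    try:
        with open(demo_path, 'r') as FH:
            raise ValueError("simulated problem")  # deliberate error
    except ValueError:
        pass
    after_error = FH.closed

print(inside, after, after_error)
```

This is the main reason the with keyword is often preferred over calling close explicitly: a close call placed after the failing line would never be reached.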

Quiz: Write a program to print the third line of a text file.

Answer:
# temp_file.txt should be present in the current directory
FH = open('temp_file.txt','r')
all_lines = FH.readlines()
FH.close()
print(all_lines[2])
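
An alternative answer is sketched below for cases where the file may be large or may have fewer than three lines. It reads the file line by line with enumerate and stops at the third line. The function name third_line and the demo file names are hypothetical and are not part of the original answer.

```python
import os
import tempfile

def third_line(path):
    """Return the third line of the file at path without its newline,
    or None if the file has fewer than three lines."""
    with open(path, 'r') as FH:
        for line_number, line in enumerate(FH, start=1):
            if line_number == 3:
                return line.rstrip('\n')
    return None

with tempfile.TemporaryDirectory() as tmp_dir:
    # A file with four lines.
    demo_path = os.path.join(tmp_dir, 'quiz_demo.txt')  # hypothetical name
    with open(demo_path, 'w') as FH_OUT:
        FH_OUT.write("one\ntwo\nthree\nfour\n")
    result = third_line(demo_path)

    # A file with only one line.
    short_path = os.path.join(tmp_dir, 'short_demo.txt')  # hypothetical name
    with open(short_path, 'w') as FH_OUT:
        FH_OUT.write("only one line\n")
    short_result = third_line(short_path)

print(result)
print(short_result)
```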

10.4 Working with multiple files

Often we need to process so many files that opening each one by its individual name is impractical, and we need a way to programmatically access all the files in a folder. Python offers a few options for this. Libraries such as os and glob provide functionality to iterate over files in a folder. Using the glob function from the glob library, the code below creates a list of files whose names start with “temp” and have a “.txt” extension.

import glob
txt_files = glob.glob("temp*.txt")
print(txt_files)
['temp123.txt', 'temp_file.txt']
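
The os library mentioned above can produce a similar list. The sketch below compares glob.glob with os.listdir followed by a filter on the file name. It creates a few empty files with hypothetical names in a temporary folder, and sorted is applied because neither function guarantees a particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a few empty example files (hypothetical names).
    for name in ['temp_b.txt', 'temp_a.txt', 'notes.txt', 'temp_c.csv']:
        open(os.path.join(tmp_dir, name), 'w').close()

    # glob: pattern matching on the path; keep only the file names.
    pattern = os.path.join(tmp_dir, 'temp*.txt')
    glob_names = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # os.listdir: list everything in the folder, then filter by name.
    os_names = sorted(
        name for name in os.listdir(tmp_dir)
        if name.startswith('temp') and name.endswith('.txt')
    )

print(glob_names)
print(os_names)
```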

We can now iterate through this list of files and print their contents. The line_count variable stores the number of lines in each file and is printed after iterating through the contents of a particular file.

for f in txt_files:
    line_count = 0
    print(f'Reading file {f}')
    FH = open(f, 'r')
    for line in FH:
        print("\t"+line.rstrip("\n"))
        line_count += 1
    FH.close()
    print(f'{f} has {line_count} line(s).')
Reading file temp123.txt
    Hello World!
temp123.txt has 1 line(s).
Reading file temp_file.txt
    This is the first sentence.
    This is the second sentence.
    This is the third sentence.
    This is the      fourth sentence.
temp_file.txt has 4 line(s).
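
When only the line counts are needed, the same loop can collect them in a dictionary instead of printing each line. The sketch below is one way to do this using the with keyword. The helper function count_lines and the example files are hypothetical and are created in a temporary folder.

```python
import glob
import os
import tempfile

def count_lines(path):
    """Return the number of lines in the text file at path."""
    with open(path, 'r') as FH:
        return sum(1 for _ in FH)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Example files (hypothetical names and contents).
    samples = {'temp_one.txt': "a\nb\n", 'temp_two.txt': "x\ny\nz\n"}
    for name, text in samples.items():
        with open(os.path.join(tmp_dir, name), 'w') as FH_OUT:
            FH_OUT.write(text)

    # Map each file name to its line count.
    line_counts = {}
    for path in sorted(glob.glob(os.path.join(tmp_dir, 'temp*.txt'))):
        line_counts[os.path.basename(path)] = count_lines(path)

print(line_counts)
```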