While working with files in Python, we may need to fetch files whose names follow a specific pattern. The glob module in Python's standard library provides functions that return lists of paths matching a pattern written with Unix shell-style wildcards such as *, and it can optionally search recursively with "**". Here's how we can import the module and use it:
import glob
txt_files = glob.glob('*.txt') # Fetches all .txt files in the current directory
print(txt_files, type(txt_files)) # Prints the list of .txt files and its type
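Suppose there are three text files in the current directory: file1.txt, file2.txt, and file3.txt. The output would then be something like:
['file1.txt', 'file2.txt', 'file3.txt'] <class 'list'>
The order of the list is not guaranteed, because glob returns matches in whatever order the file system provides.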
The glob function takes a pattern as a string and returns a list of file and directory paths that match the pattern.
In the example above, we used the wildcard character *, which matches zero or more characters. Hence, *.txt matches every file whose name ends with .txt (by default, names starting with a dot are not matched by *).
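To see the "zero or more characters" rule and the fact that directories can match too, here is a minimal sketch that builds a throwaway directory with tempfile. The file names a.txt, ab.txt, notes.md and the directory a_dir are hypothetical, chosen only for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: three empty files and one subdirectory.
    for name in ["a.txt", "ab.txt", "notes.md"]:
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "a_dir"))

    # glob.escape protects the temp path in case it contains wildcard characters.
    base = glob.escape(tmp)

    # '*' can match zero characters, so 'a*.txt' matches 'a.txt' as well as 'ab.txt'.
    txt_names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, "a*.txt")))

    # Without an extension in the pattern, the directory 'a_dir' matches too.
    all_names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, "a*")))

print(txt_names)  # ['a.txt', 'ab.txt']
print(all_names)  # ['a.txt', 'a_dir', 'ab.txt']
```

The results are sorted before printing only to make the output stable, since glob itself does not promise any order.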
We can also search files in subdirectories recursively:
nested_files = glob.glob('**/*.txt', recursive=True) # Fetches all .txt files recursively
print(nested_files, type(nested_files)) # Prints the list of .txt files and its type
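Assuming that, in addition to our previous files, the current directory also contains a folder named folder1 holding file4.txt and file5.txt, the output would be something like:
['file1.txt', 'file2.txt', 'file3.txt', 'folder1/file4.txt', 'folder1/file5.txt'] <class 'list'>
On Windows the nested paths use a backslash as the separator, and, as before, the order may vary.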
Here, we've used ** in the pattern. With recursive=True, ** matches any files and zero or more directories and subdirectories; without that argument, ** behaves like a single * and matches exactly one directory level. Therefore, glob.glob('**/*.txt', recursive=True) fetches all the .txt files in the current directory and in all of its subdirectories.
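The difference that recursive=True makes can be checked with a small sketch. It assumes a hypothetical temporary layout (top.txt, folder1/inner.txt, folder1/deep/leaf.txt) and runs the same '**/*.txt' pattern with and without the flag.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout three levels deep.
    os.makedirs(os.path.join(tmp, "folder1", "deep"))
    for rel in ["top.txt",
                os.path.join("folder1", "inner.txt"),
                os.path.join("folder1", "deep", "leaf.txt")]:
        open(os.path.join(tmp, rel), "w").close()

    pattern = os.path.join(glob.escape(tmp), "**", "*.txt")

    # recursive=True: '**' matches zero or more directories, so all three files are found.
    rec = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern, recursive=True))

    # Default (recursive=False): '**' acts like '*' and matches exactly one directory level.
    flat = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern))

print(rec)
print(flat)
```

On a POSIX system, rec is ['folder1/deep/leaf.txt', 'folder1/inner.txt', 'top.txt'] while flat is only ['folder1/inner.txt']: top.txt is found only because ** can match zero directories, and leaf.txt only because it can match more than one.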
This makes glob a convenient tool for working with file paths and automating file-handling tasks in Python.
Challenge: Find .py files
As a new data scientist at a tech startup, you are given the task of organizing all the Python files in your current project. Because of the rush of recent weeks, the Python (.py) files are scattered across the current directory. Your job is to write a Python program that uses glob to list all Python files (.py) in the current directory. Since you are only interested in the file names and not their location, the output should contain only the file names, without the directory path.
The program does not need to take any input.
The output of the program should contain the names of the Python files in the current directory, one name per line, sorted alphabetically. Do not include the directory path in the output.
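One possible solution is sketched below. It is not the only valid approach. The os.path.isfile filter is an added assumption that skips any directory whose name happens to end in .py.

```python
import glob
import os

# '*.py' is relative to the current working directory and does not descend into subdirectories.
py_paths = glob.glob("*.py")

# Keep regular files only, strip any directory part, and sort,
# because glob returns matches in arbitrary order.
py_names = sorted(os.path.basename(path) for path in py_paths if os.path.isfile(path))

for name in py_names:
    print(name)
```

With a bare '*.py' pattern the matches already have no directory part, so os.path.basename mostly makes the intent explicit and keeps the code correct if a directory prefix is later added to the pattern. Note that sorted() orders by Unicode code point, so names starting with an uppercase letter come before lowercase ones; pass key=str.lower if a case-insensitive order is preferred.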