In this tutorial we will explore how to merge PDF files using Python.
Table of Contents
- Introduction
- Sample PDF files
- Merge two PDF files using Python
- Merge many PDF files using Python
- Conclusion
Introduction
Merging PDF files is often necessary after scanning a document page by page, or after saving several pages as individual documents on your computer.
There are several software tools, such as Adobe Acrobat, as well as online services that can perform this task quickly. However, most of them are either paid or may not provide adequate security for your documents.
In this tutorial, we will merge PDF files locally on your computer using Python and just a few lines of code.
To follow this tutorial, we will need the following Python library: PyPDF2.
If you don’t have it installed, open “Command Prompt” (on Windows) or a terminal (on macOS/Linux) and install it using the following command:
pip install PyPDF2
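Recent PyPDF2 releases (3.0 and later) removed the PdfFileMerger class in favor of PdfMerger, and using the old name raises a deprecation error. A small sketch to check which version pip installed, using only the standard library's importlib.metadata (Python 3.8+); installed_pypdf2_version is a hypothetical helper name for this example:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pypdf2_version():
    """Return the installed PyPDF2 version string, or None if PyPDF2 is not installed."""
    try:
        return version("PyPDF2")
    except PackageNotFoundError:
        return None


v = installed_pypdf2_version()
if v is None:
    print("PyPDF2 is not installed")
elif int(v.split(".")[0]) >= 3:
    # PyPDF2 3.x: the class is called PdfMerger
    print(f"PyPDF2 {v}: use PdfMerger (PdfFileMerger was removed in 3.0)")
else:
    # Older releases still accept the PdfFileMerger name
    print(f"PyPDF2 {v}: the PdfFileMerger name still works in this version")
```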
Sample PDF files
To continue with this tutorial, we will need some PDF files to work with.
Here are the three PDF files we will use in this tutorial:
These PDF files are stored in the pdf_files folder, which sits in the same directory as main.py, the file containing our code.
Here is what the structure of my files looks like:
Merge two PDF files using Python
To merge PDF files in Python, we need to import the PdfFileMerger class from the PyPDF2 library and create an instance of it (in PyPDF2 3.0 and later, this class is named PdfMerger).
In this example we will merge two files: sample_page1.pdf and sample_page2.pdf.
We place the two file paths in a list, iterate over it, and append each file to the merger in order:
from PyPDF2 import PdfMerger  # older PyPDF2 versions used the name PdfFileMerger
#Create an instance of the PdfMerger class
merger = PdfMerger()
#Create a list with the file paths
pdf_files = ['pdf_files/sample_page1.pdf', 'pdf_files/sample_page2.pdf']
#Iterate over the list of the file paths
for pdf_file in pdf_files:
    #Append each PDF file to the end of the merged document
    merger.append(pdf_file)
#Write out the merged PDF file
merger.write("merged_2_pages.pdf")
merger.close()
You should now see merged_2_pages.pdf created in the same directory as main.py.
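If any path in the list is misspelled, merger.append() raises an error partway through the loop. A minimal pre-flight check using only the standard library might look like this; find_missing_files is a hypothetical helper name, and the paths are the sample files assumed above:

```python
import os


def find_missing_files(paths):
    """Return the paths from the list that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


pdf_files = ['pdf_files/sample_page1.pdf', 'pdf_files/sample_page2.pdf']
missing = find_missing_files(pdf_files)
if missing:
    print("Cannot merge, these files were not found:", missing)
else:
    print("All files found, ready to merge")
```

Running this before creating the merger means no partial merged_2_pages.pdf is attempted when a file is missing.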
Merge many PDF files using Python
In this section we will explore how to merge many PDF files using Python.
One way to merge many PDF files would be to add the name of every PDF file to a list manually and then perform the same operation as in the previous section.
But what if we have 100 PDF files in the folder? Using Python's built-in os module, we can get all of the file names in a given directory as a list and iterate over it:
from PyPDF2 import PdfMerger  # older PyPDF2 versions used the name PdfFileMerger
import os
#Create an instance of the PdfMerger class
merger = PdfMerger()
#Define the path to the folder with the PDF files
path_to_files = 'pdf_files'
#Get the file names in the folder, sorted so the files are merged in a predictable order
file_names = sorted(os.listdir(path_to_files))
#Iterate over the list of the file names
for file_name in file_names:
    #Append only PDF files, skipping anything else in the folder
    if file_name.lower().endswith('.pdf'):
        merger.append(os.path.join(path_to_files, file_name))
#Write out the merged PDF file
merger.write("merged_all_pages.pdf")
merger.close()
You should now see merged_all_pages.pdf created in the same directory as main.py.
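Pages end up in the merged file in the same order in which the files are passed to append(). Directory listings come back in no guaranteed order, and even a plain alphabetical sort places sample_page10.pdf before sample_page2.pdf. A minimal sketch of a "natural" sort key that compares the digit runs in file names as numbers, assuming names that follow a pattern like sample_pageN.pdf; natural_key is a hypothetical helper name:

```python
import re


def natural_key(file_name):
    """Split a file name into text and number chunks so that numbers compare by value."""
    # re.split with a capturing group alternates text chunks (even positions)
    # and digit runs (odd positions), so the types always line up when compared
    chunks = re.split(r"(\d+)", file_name)
    return [int(chunk) if i % 2 else chunk.lower() for i, chunk in enumerate(chunks)]


file_names = ["sample_page10.pdf", "sample_page2.pdf", "sample_page1.pdf"]
print(sorted(file_names))
# ['sample_page1.pdf', 'sample_page10.pdf', 'sample_page2.pdf']
print(sorted(file_names, key=natural_key))
# ['sample_page1.pdf', 'sample_page2.pdf', 'sample_page10.pdf']
```

To apply it when merging a folder, pass key=natural_key to sorted() when building the list of file names before appending them.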
Conclusion
In this article we explored how to merge multiple PDF files using Python.
Feel free to leave a comment below if you have any questions or suggestions for edits, and check out more of my Python for PDF tutorials.