Pdf Comparison In Robot Framework Python

PDF comparison is a challenging task in test automation. In the example below, you will learn how to compare PDF files in Robot Framework with Python. To compare PDF files, you need to (1) install PDFMiner on your PC and (2) install the code provided by Selenium Master under the folder C:\Python27\Lib\site-packages\Pdf2TextLibrary.
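
Before going further, it can help to confirm that PDFMiner is importable from the Python installation that Robot Framework uses. The following is only a minimal sketch and not part of the Selenium Master code. The helper name is_installed is made up for illustration, and the sketch assumes that the import name is "pdfminer", as in the import statements of the library code in Step 1.

```python
def is_installed(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "pdfminer" is the package name assumed from the imports in Step 1.
    print("pdfminer importable: %s" % is_installed("pdfminer"))
```

If the script reports False, PDFMiner is probably installed for a different Python interpreter, or not installed at all.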

 

PDFMiner is a tool for extracting information from PDF documents. It focuses on getting and analyzing text data. Selenium Master wrote Python code that gets the page count of a PDF file and extracts its text. This example uses the three PDF files listed in the table below.

Pdf File Name    Page Count    Text Content
smpdf1.pdf       1             Selenium Master Pdf Comparison
smpdf2.pdf       1             Selenium Master Pdf Comparison
smpdf3.pdf       2             Selenium Master Pdf Comparison
                               Page 1
                               Sunday, August 17, 2014
                               Selenium Master Pdf Comparison
                               Page 2
                               Sunday, August 17, 2014

When we compare these three files, the page count and text content should be equal for smpdf1.pdf and smpdf2.pdf, but not for smpdf1.pdf and smpdf3.pdf. The Python library code, the Robot Framework code, and the test results follow.
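
The expected outcome can be sketched in plain Python before any PDF is read. This is an illustration only: the values are copied by hand from the table above, the names EXPECTED and compare are made up, and joining the text lines with a newline is an assumption (the real extracted text contains more whitespace, as the log at the end shows).

```python
# Expected values copied from the table above; nothing is read from disk.
EXPECTED = {
    "smpdf1.pdf": {"pages": 1, "text": "Selenium Master Pdf Comparison"},
    "smpdf2.pdf": {"pages": 1, "text": "Selenium Master Pdf Comparison"},
    "smpdf3.pdf": {
        "pages": 2,
        "text": "\n".join([
            "Selenium Master Pdf Comparison", "Page 1", "Sunday, August 17, 2014",
            "Selenium Master Pdf Comparison", "Page 2", "Sunday, August 17, 2014",
        ]),
    },
}


def compare(name_a, name_b, data=EXPECTED):
    """Return (same_page_count, same_text) for two entries of data."""
    a, b = data[name_a], data[name_b]
    return a["pages"] == b["pages"], a["text"] == b["text"]


if __name__ == "__main__":
    print(compare("smpdf1.pdf", "smpdf2.pdf"))
    print(compare("smpdf1.pdf", "smpdf3.pdf"))
```

The four Robot Framework test cases in Step 5 check the same two properties, one property and one file pair per test case.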

Step 1: write the python code "pdftotext.py" with Python IDLE. 

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO

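class PdfToText(object):
    """Robot Framework keyword library for reading PDF files with PDFMiner."""
    # One library instance is shared by the whole test run.
    ROBOT_LIBRARY_SCOPE = 'GLOBAL'

    def __init__(self):
        print('pdf to text library')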

    def convert_pdf_to_txt(self, path):
        """Return the text of every page of the PDF file at path."""
        rsrcmgr = PDFResourceManager()
        retstr = StringIO()
        device = TextConverter(rsrcmgr, retstr, codec='utf-8', laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        fp = open(path, 'rb')
        try:
            # An empty page-number set and maxpages=0 mean "process all pages".
            for page in PDFPage.get_pages(fp, set(), maxpages=0, password="",
                                          caching=True, check_extractable=True):
                interpreter.process_page(page)
        finally:
            # Close the file and the converter even if a page fails to parse.
            fp.close()
            device.close()
        text = retstr.getvalue()
        retstr.close()
        return text

    def count_pdf_pages(self, path):
        """Return the number of pages in the PDF file at path."""
        fp = open(path, 'rb')
        try:
            # Counting the page objects is enough; no text conversion is needed.
            pagenumber = 0
            for page in PDFPage.get_pages(fp, set(), maxpages=0, password="",
                                          caching=True, check_extractable=True):
                pagenumber = pagenumber + 1
        finally:
            fp.close()
        return pagenumber
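
When the text of two files is expected to match but does not, a line-by-line diff of the two extracted strings can show where they differ. The following sketch uses only difflib from the Python standard library. It is not part of the Selenium Master library, the function name diff_text is made up for illustration, and the sample strings are hand-written stand-ins for the text that convert_pdf_to_txt would return.

```python
import difflib


def diff_text(text_a, text_b, name_a="a", name_b="b"):
    """Return the unified diff of two extracted texts as a list of lines."""
    return list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm=""))


if __name__ == "__main__":
    # Stand-in strings; real input would come from convert_pdf_to_txt.
    text1 = "Selenium Master Pdf Comparison"
    text3 = "Selenium Master Pdf Comparison\nPage 1\nSunday, August 17, 2014"
    for line in diff_text(text1, text3, "smpdf1.pdf", "smpdf3.pdf"):
        print(line)
```

An empty list means the two texts are identical line by line.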

Step 2: write the python code "__init__.py" with Python IDLE.

from pdftotext import PdfToText
__version__ = '1.0'

class Pdf2TextLibrary(PdfToText):
       
    ROBOT_LIBRARY_SCOPE = 'GLOBAL'
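
Because Pdf2TextLibrary inherits from PdfToText, the public methods of the parent class become available as keywords of the library. The sketch below illustrates the idea with stand-in classes so that it runs without PDFMiner. The class names FakePdfToText and FakePdf2TextLibrary and both helper functions are made up for illustration, and Robot Framework's own keyword discovery is more involved than this.

```python
import inspect


class FakePdfToText(object):
    """Stand-in with the same public method names as PdfToText."""

    def convert_pdf_to_txt(self, path):
        return ""

    def count_pdf_pages(self, path):
        return 0


class FakePdf2TextLibrary(FakePdfToText):
    """Stand-in for Pdf2TextLibrary; it adds no methods of its own."""
    ROBOT_LIBRARY_SCOPE = 'GLOBAL'


def public_method_names(cls):
    """List the inherited and own methods whose names do not start with '_'."""
    members = inspect.getmembers(cls, predicate=inspect.isroutine)
    return sorted(name for name, _ in members if not name.startswith("_"))


def as_keyword_name(method_name):
    """Turn a method name such as count_pdf_pages into 'Count Pdf Pages'."""
    return method_name.replace("_", " ").title()


if __name__ == "__main__":
    for name in public_method_names(FakePdf2TextLibrary):
        print(as_keyword_name(name))
```

The two names printed here are the keywords that the test cases in Step 5 call.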

Step 3: create a folder named "Pdf2TextLibrary" under the "C:\Python27\Lib\site-packages\" folder and copy the two files above into it. See the screenshot below.
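
A quick way to check the folder layout from Step 3 is to test that both files exist in the package folder. This is a minimal sketch using only the standard library; the function name missing_library_files is made up for illustration, and the folder path in the usage example is the one named in this step.

```python
import os

# The two files written in Step 1 and Step 2.
REQUIRED_FILES = ("__init__.py", "pdftotext.py")


def missing_library_files(package_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in package_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(package_dir, name))]


if __name__ == "__main__":
    folder = r"C:\Python27\Lib\site-packages\Pdf2TextLibrary"
    print("missing files: %s" % missing_library_files(folder))
```

An empty list means both files are in place.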

 

Step 4: create a robot framework project in Robot Framework RIDE. See the screenshot below. 

 

Step 5: write the following test steps for each test case under the test suite "PdfTest". 

Counts Should Be Equal

Counts Should Not Be Equal

Text Should Be Equal

Text Should Not Be Equal

The whole test suite in text format looks like this:

*** Settings ***
Library           Pdf2TextLibrary

*** Test Cases ***
Counts Should Be Equal
    ${file1count}=    Count Pdf Pages    smpdf1.pdf
    Log    ${file1count}=
    ${file2count}=    Count Pdf Pages    smpdf2.pdf
    Should Be Equal    ${file2count}    ${file1count}

Counts Should Not Be Equal
    ${file1count}    Count Pdf Pages    smpdf1.pdf
    Log    ${file1count}
    ${file3count}    Count Pdf Pages    smpdf3.pdf
    Log    ${file3count}
    Should Not Be Equal    ${file1count}    ${file3count}

Text Should Be Equal
    ${File1ExtractedText}    Convert Pdf To Txt    smpdf1.pdf
    Log    ${File1ExtractedText}
    ${File2ExtractedText}    Convert Pdf To Txt    smpdf2.pdf
    Log    ${File2ExtractedText}
    Should Be Equal    ${File1ExtractedText}    ${File2ExtractedText}

Text Should Not Be Equal
    ${File1ExtractedText}    Convert Pdf to Txt    smpdf1.pdf
    Log    ${File1ExtractedText}
    ${File3ExtractedText}    Convert Pdf to Txt    smpdf3.pdf
    Log    ${File3ExtractedText}
    Should Not Be Equal    ${File1ExtractedText}    ${File3ExtractedText}
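
The test cases above call the same keyword as both "Convert Pdf To Txt" and "Convert Pdf to Txt". Both work because Robot Framework compares keyword names with method names case-insensitively and ignores spaces and underscores. The sketch below imitates that comparison in plain Python; the function names are made up for illustration, and this is not Robot Framework's actual implementation.

```python
def normalize_keyword(name):
    """Lower-case the name and drop spaces and underscores."""
    return name.lower().replace(" ", "").replace("_", "")


def matches(keyword_in_test, method_name):
    """Return True if the keyword used in a test maps to the method name."""
    return normalize_keyword(keyword_in_test) == normalize_keyword(method_name)


if __name__ == "__main__":
    for keyword in ("Convert Pdf To Txt", "Convert Pdf to Txt", "Count Pdf Pages"):
        print("%s -> %s" % (keyword, matches(keyword, "convert_pdf_to_txt")))
```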


Step 6: run the above tests with the arguments "--timestampoutputs -d TestResult". The results are written to the TestResult folder. All four tests passed. See the RIDE log below.

RobotFrameworkPdfComparison                                                                                                                                           
======================================================================================================================================================================
RobotFrameworkPdfComparison.Pdf Test                                                                                                                                  
======================================================================================================================================================================
Counts Should Be Equal                                                                                                                                        | PASS |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Counts Should Not Be Equal                                                                                                                                    | PASS |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Text Should Be Equal                                                                                                                                          | PASS |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Text Should Not Be Equal                                                                                                                                      | PASS |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
RobotFrameworkPdfComparison.Pdf Test                                                                                                                          | PASS |
4 critical tests, 4 passed, 0 failed
4 tests total, 4 passed, 0 failed
======================================================================================================================================================================
RobotFrameworkPdfComparison                                                                                                                                   | PASS |
4 critical tests, 4 passed, 0 failed
4 tests total, 4 passed, 0 failed
======================================================================================================================================================================
Output:  C:\RobotFrameworkPdfComparison\TestResult\output-20140817-224200.xml
Log:     C:\RobotFrameworkPdfComparison\TestResult\log-20140817-224200.html
Report:  C:\RobotFrameworkPdfComparison\TestResult\report-20140817-224200.html
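
The file names above show the effect of the two arguments: "-d TestResult" selects the output folder and "--timestampoutputs" adds a timestamp to each file name. The sketch below builds the argument list and reproduces the naming pattern. Both function names are made up for illustration, the suite file name PdfTest.txt is hypothetical, and the timestamp pattern is inferred from the file names listed above rather than taken from the Robot Framework documentation.

```python
from datetime import datetime


def build_robot_args(suite_path, output_dir="TestResult"):
    """Return the command-line arguments used in Step 6 for one suite."""
    return ["--timestampoutputs", "-d", output_dir, suite_path]


def timestamped_name(base, extension, when):
    """Build a name such as output-20140817-224200.xml from a datetime."""
    return "%s-%s.%s" % (base, when.strftime("%Y%m%d-%H%M%S"), extension)


if __name__ == "__main__":
    # "PdfTest.txt" is a hypothetical suite file name.
    print(" ".join(build_robot_args("PdfTest.txt")))
    run_time = datetime(2014, 8, 17, 22, 42, 0)
    for base, extension in (("output", "xml"), ("log", "html"), ("report", "html")):
        print(timestamped_name(base, extension, run_time))
```

Timestamped names keep earlier results from being overwritten when the tests are run again.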

test finished 20140817 22:42:01
Starting test: RobotFrameworkPdfComparison.Pdf Test.Counts Should Be Equal
20140817 22:42:00.581 :  INFO : ${file1count} = 1
20140817 22:42:00.582 :  INFO : 1=
20140817 22:42:00.600 :  INFO : ${file2count} = 1
20140817 22:42:00.602 :  INFO : 
Argument types are:
<type 'int'>
<type 'int'>
Ending test:   RobotFrameworkPdfComparison.Pdf Test.Counts Should Be Equal

Starting test: RobotFrameworkPdfComparison.Pdf Test.Counts Should Not Be Equal
20140817 22:42:00.624 :  INFO : ${file1count} = 1
20140817 22:42:00.625 :  INFO : 1
20140817 22:42:00.656 :  INFO : ${file3count} = 2
20140817 22:42:00.657 :  INFO : 2
20140817 22:42:00.658 :  INFO : 
Argument types are:
<type 'int'>
<type 'int'>
Ending test:   RobotFrameworkPdfComparison.Pdf Test.Counts Should Not Be Equal

Starting test: RobotFrameworkPdfComparison.Pdf Test.Text Should Be Equal
20140817 22:42:00.700 :  INFO : 
${File1ExtractedText} = Selenium Master Pdf Comparison 


20140817 22:42:00.702 :  INFO : 
Selenium Master Pdf Comparison 


20140817 22:42:00.728 :  INFO : 
${File2ExtractedText} = Selenium Master Pdf Comparison 


20140817 22:42:00.729 :  INFO : 
Selenium Master Pdf Comparison 


20140817 22:42:00.731 :  INFO : 
Argument types are:
<type 'str'>
<type 'str'>
Ending test:   RobotFrameworkPdfComparison.Pdf Test.Text Should Be Equal

Starting test: RobotFrameworkPdfComparison.Pdf Test.Text Should Not Be Equal
20140817 22:42:00.759 :  INFO : 
${File1ExtractedText} = Selenium Master Pdf Comparison 


20140817 22:42:00.760 :  INFO : 
Selenium Master Pdf Comparison 


20140817 22:42:00.824 :  INFO : 
${File3ExtractedText} = Selenium Master Pdf Comparison 

Page 1 

Sunday, August 17, 2014 


Selenium Master Pdf Comparison 

Page 2 

Sunday, August 17, 2...
20140817 22:42:00.825 :  INFO : 
Selenium Master Pdf Comparison 

Page 1 

Sunday, August 17, 2014