Automating Document Comparison w/ Python & Tkinter

So, you’ve got a client with some seriously specific document requirements, huh? I feel your pain!

Processes are important, especially for change tracking, but sometimes they can feel like they’re eating up all your time!

I recently worked on a project where the client needed the documents to be updated in a very specific way…

Every single update meant creating:

  • A new MS Word document with the latest changes.
  • The old MS Word document with tracked changes.
  • Another old MS Word document, but this time with colour-coded highlights to show what changed and why.
  • PDFs of all three documents.

So, updating one document meant creating six documents in total!

I’m keen on automation, so I couldn’t resist seeing whether I could automate this in my own time. The code doesn’t contain any project-specific info and was written in my spare time, so I figured I’d explain how it works in case it helps anyone else facing a similar document-pocalypse.

Screenshot of the software

The Goal

Create a tool that could:

  • Compare two Word docs (old vs new).
  • Automatically generate tracked changes.
  • Automatically highlight differences according to the specific colour-coding requirements.
  • Automatically generate the corresponding PDFs.

Building the GUI

First, we need a user-friendly way to control this process. To do this, I used Tkinter to create a simple GUI.

from tkinter import Tk, Label, Button, filedialog, messagebox
import os

import docx2pdf
import win32com.client

root = Tk()
root.title("FTC - V1.1 - GHothi")
root.geometry("400x470")
root.configure(background="DodgerBlue4")
root.iconbitmap(r"\\FTC_Logo.ico")

# Labels and buttons for user input
Label(text="F.T.C", bg="DodgerBlue4", fg="White", font="Arial 30 bold").pack(pady=(20,10))
Label(text="Ensure *ALL* MS Word files are closed before running this application", bg="DodgerBlue4", fg="firebrick2", font="Arial 12 bold").pack(pady=(5,30))

Selecting Files and Directory

For selecting the documents and output directory, I used the filedialog from Tkinter. This lets the user browse and choose the files and folders interactively rather than having to provide full directory paths.

def getDoc1():
    # Superseded (old) document
    global document1
    document1 = filedialog.askopenfilename(filetypes=(("docx files", "*.docx"), ("all files", "*.*")))
    # Tkinter returns forward slashes; Word is given Windows-style backslashes
    document1 = document1.replace("/", "\\")

def getDoc2():
    # Updated (new) document; its file name is reused for the output files
    global document2
    global documentName
    document2 = filedialog.askopenfilename(filetypes=(("docx files", "*.docx"), ("all files", "*.*")))
    document2 = document2.replace("/", "\\")
    # splitext removes only the final extension, even if ".docx" also appears earlier in the name
    documentName = os.path.splitext(os.path.basename(document2))[0]

def getOutputDirectory():
    global outputDirectory
    outputDirectory = filedialog.askdirectory()
    outputDirectory = outputDirectory.replace("/", "\\")
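
The output file names used later in the script can all be derived from these selections in one place. Here is a minimal sketch of that idea, not code from the original tool: `build_output_paths` is a hypothetical helper, the example paths are made up, and it assumes each PDF sits next to its Word file with the same base name. It uses the standard library's `ntpath` module so that Windows path rules apply wherever the snippet is run.

```python
import ntpath  # Windows path rules, importable on any platform


def build_output_paths(updated_doc, output_dir):
    """Return the Word and PDF paths for both generated documents.

    Hypothetical helper: the suffixes mirror the ones the script uses
    (" - Tracked Changes" and " - Highlight").
    """
    # Drop the directory and only the final extension from the file name
    name = ntpath.splitext(ntpath.basename(updated_doc))[0]
    paths = {}
    for key, suffix in (("tracked", " - Tracked Changes"),
                        ("highlight", " - Highlight")):
        stem = ntpath.join(output_dir, name + suffix)
        # Assumption: the PDF shares the Word file's folder and base name
        paths[key] = {"docx": stem + ".docx", "pdf": stem + ".pdf"}
    return paths


if __name__ == "__main__":
    # Made-up example paths, purely for illustration
    result = build_output_paths(r"C:\docs\Spec Rev B.docx", r"C:\out")
    for kind, files in result.items():
        print(kind, files["docx"], files["pdf"])
```

Keeping the naming in one function means the save and convert steps cannot drift apart if a suffix ever changes.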

Comparing Documents

The core functionality happens in the compareText function. Here, I used win32com to open Word, compare the documents, and save the result with tracked changes. I then loop through each revision, keeping both the deleted and the inserted text in the document so they can be highlighted in their required colours.

def compareText():
    # Build both output paths once so the save and convert calls always match
    trackedPath = outputDirectory + "\\" + documentName + " - Tracked Changes.docx"
    highlightPath = outputDirectory + "\\" + documentName + " - Highlight.docx"

    # Pass 1: let Word compare old vs new and save the result with tracked changes
    word = win32com.client.gencache.EnsureDispatch("Word.Application")
    word.CompareDocuments(word.Documents.Open(document1),
                          word.Documents.Open(document2))
    word.ActiveDocument.SaveAs(trackedPath)
    docx2pdf.convert(trackedPath)
    word.Quit()

    # Pass 2: reopen the tracked-changes file and resolve every revision
    word = win32com.client.gencache.EnsureDispatch("Word.Application")
    doc = word.Documents.Open(trackedPath)
    doc.TrackRevisions = False

    for change in doc.Revisions:
        if change.Type == 2:    # wdRevisionDelete: rejecting keeps the removed text
            change.Reject()
        elif change.Type == 1:  # wdRevisionInsert: accepting keeps the added text
            change.Accept()

    doc.SaveAs(highlightPath)
    docx2pdf.convert(highlightPath)
    word.Quit()
    messagebox.showinfo("Complete", "Processing is complete")
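
To make the colour-coding step concrete, here is a rough sketch of how a highlight could be applied to each revision before it is accepted or rejected. Treat it as an illustration under assumptions rather than the tool's actual logic: `highlight_revisions` and `HIGHLIGHT_BY_TYPE` are hypothetical names, the green and red choices are placeholders (the client's real scheme isn't shown here), and it assumes tracking has already been switched off on the document. The numeric values are Word's `WdRevisionType` and `WdColorIndex` constants, and `Range.HighlightColorIndex` is the Word property that sets a highlight. How Word treats the highlight for every revision type would still need checking against real documents.

```python
from types import SimpleNamespace

# Word enumeration values (WdRevisionType and WdColorIndex)
WD_REVISION_INSERT = 1
WD_REVISION_DELETE = 2
WD_BRIGHT_GREEN = 4
WD_RED = 6

# Assumed colour scheme, for illustration only
HIGHLIGHT_BY_TYPE = {
    WD_REVISION_INSERT: WD_BRIGHT_GREEN,
    WD_REVISION_DELETE: WD_RED,
}


def highlight_revisions(revisions):
    """Highlight each insertion or deletion, then turn it into ordinary text."""
    counts = {"inserted": 0, "deleted": 0, "other": 0}
    # Snapshot first: accepting or rejecting removes items from the collection
    for change in list(revisions):
        colour = HIGHLIGHT_BY_TYPE.get(change.Type)
        if colour is None:
            counts["other"] += 1  # e.g. formatting changes are left untouched
            continue
        change.Range.HighlightColorIndex = colour
        if change.Type == WD_REVISION_DELETE:
            change.Reject()  # keeps the removed text in the document
            counts["deleted"] += 1
        else:
            change.Accept()  # keeps the added text in the document
            counts["inserted"] += 1
    return counts


if __name__ == "__main__":
    # Stand-in objects so the logic can be tried without Word installed
    def fake_revision(rev_type):
        return SimpleNamespace(
            Type=rev_type,
            Range=SimpleNamespace(HighlightColorIndex=0),
            Accept=lambda: None,
            Reject=lambda: None,
        )

    print(highlight_revisions([fake_revision(t) for t in (1, 2, 2, 3)]))
```

With Word available, the same function could be called as `highlight_revisions(doc.Revisions)` in place of the loop. The returned counts are handy for a summary message once processing finishes.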

Putting It All Together

Finally, we need to tie everything together with some Tkinter buttons for initiating the document selection and processing.

# pack() returns None, so the buttons are created and packed without being assigned to names
Button(text="1. Select Superseded Document", command=getDoc1).pack()
Button(text="2. Select Updated Document", command=getDoc2).pack()
Button(text="3. Select Output Directory", command=getOutputDirectory).pack()
Button(text="4. Markup Differences", command=compareText).pack()
Label(text="GHothi - V1.1", bg="DodgerBlue4", fg="LightSlateGray", font="Arial 9", justify="center").pack(pady=15)

root.mainloop()

And that’s it!

This little tool turned a mind-numbing task into something I could knock out faster than you can say “track changes”.

Although the code could definitely be optimised, it got the job done and saved my sanity in the process.

If I were to turn the tool into something more distributable, I would consider:

  • Improving the error handling. Right now, it’s pretty much non-existent.
  • Improving the GUI so it’s more intuitive to use.
  • Adding features so it could be as useful to others in my team as it was to me. This is a big one: fortunately, I had the easiest set of changes to make in my team. The others had additional colour-coding requirements that would not work with the current script’s logic. Supporting them would need something much more complex, probably with a human in the loop to decide how each change should be classified according to the client’s requirements.
  • Adding a progress bar so you don’t start doubting whether the software has crashed whilst it’s running (though in all honesty, it only takes a few seconds to run, so this feature is more of a luxury than a necessity).
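
As a starting point for the error handling, the three selections could be checked before Word is ever launched. This is only a sketch under assumptions: `find_input_problems` is a hypothetical function, the messages are placeholders, and it assumes the three values start out as empty strings if the user hasn't picked anything or cancels a dialog.

```python
import os
import tempfile


def find_input_problems(old_doc, new_doc, output_dir):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for label, path in (("superseded", old_doc), ("updated", new_doc)):
        if not path:
            problems.append(f"No {label} document selected.")
        elif not path.lower().endswith(".docx"):
            problems.append(f"The {label} document is not a .docx file.")
        elif not os.path.isfile(path):
            problems.append(f"The {label} document could not be found.")
    if not output_dir:
        problems.append("No output directory selected.")
    elif not os.path.isdir(output_dir):
        problems.append("The output directory does not exist.")
    if old_doc and old_doc == new_doc:
        problems.append("The same file was selected twice.")
    return problems


if __name__ == "__main__":
    # Try it against a throwaway folder containing one empty .docx file
    with tempfile.TemporaryDirectory() as folder:
        old = os.path.join(folder, "old.docx")
        open(old, "w").close()
        print(find_input_problems(old, "", folder))
```

In the GUI, the returned list could be joined into one string and shown with `messagebox.showerror` before the comparison starts, so a missing selection gives a readable message instead of a traceback.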

Hope this helps someone out there drowning in documents!
