Using Python to Automate Word Report | by Yeung WONG | Jun, 2023

To update your report, you need to follow three steps:

1. Find the location of the target part by getting the paragraph index or Relationship ID (rID)
2. Remove the outdated data if necessary
3. Replace / Add the new data

In my example, I need to update 7 components. My code is written in the same order as the labels in the image.

# For updating:
from docx import Document

document = Document("work_dir/Currencies_template.docx")
all_paras = document.paragraphs

# Manual way to locate the index of the target paragraph:
# print every non-empty paragraph once, together with its index
for i, paragraph in enumerate(all_paras):
    if paragraph.text.strip():
        print(i, paragraph.text)

# Use the 'search_text' function to locate the index of the target_text
def search_text(document, target_text):
    target_para_index = []
    for i, paragraph in enumerate(document.paragraphs):
        if target_text in paragraph.text:
            target_para_index.append(i)
    return target_para_index

print(search_text(document, 'Updated on'))

Firstly, we load the report using Document('your_report_path').

document contains different objects, and text is mainly stored in paragraphs.

Paragraph List

I will print out the text content to help me locate the index of the part that I need to update.

If you have a lot of pages to go through, you can define a search_text function to help you locate the position. In my example, I want to update the datetime, which is written after the words “Updated on”. Running search_text(document, 'Updated on') returns [12], so 12 is the target paragraph index. I need this index in the later scripts to indicate where to replace the data.

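# Update last update date
from datetime import datetime
from docx.shared import Pt

datetime_now = datetime.now()  # assumption: skip this line if datetime_now is already defined
target_para_index = 12  # replace with your own target paragraph index
# %#d (day without a leading zero) works on Windows only; use %-d on Linux/macOS
update_dt = datetime_now.strftime("%#d %b %Y %H:%M")
paragraph = document.paragraphs[target_para_index]
for i, run in enumerate(paragraph.runs):
    run.font.size = Pt(18)
    # Write the text into the first run only and clear the others,
    # otherwise a paragraph with several runs repeats the text
    run.text = f'Updated on {update_dt}' if i == 0 else ''
print(f"Update datetime: {update_dt}")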

paragraph stores the datetime that I need to update. As I would like to keep the same font style as before, I set the font size as well.
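The "%#d" directive in the strftime call above drops the leading zero of the day on Windows only; Linux and macOS expect "%-d" instead. One platform-independent alternative, sketched here with a hypothetical helper name (format_update_dt is not part of the original script), is to format the day from the integer dt.day:

```python
from datetime import datetime


def format_update_dt(dt):
    """Format like '5 Jun 2023 09:07' without a leading zero on the day."""
    # dt.day is an int, so no zero padding is applied on any platform
    return f"{dt.day} {dt:%b %Y %H:%M}"


print(format_update_dt(datetime(2023, 6, 5, 9, 7)))  # 5 Jun 2023 09:07
```

The month abbreviation still follows the active locale, which is English unless the script changes it.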

# Update top 5 figure label
target_para_index = 30  # replace with your own target paragraph index
paragraph = document.paragraphs[target_para_index]
for i, run in enumerate(paragraph.runs):
    run.bold = True
    run.font.size = Pt(18)
    # Keep the text in the first run only so that it is not repeated
    run.text = top_df['Name'][0].replace('/', ' / ') if i == 0 else ''

# Update bottom 5 figure label
target_para_index = 32  # replace with your own target paragraph index
paragraph = document.paragraphs[target_para_index]
for i, run in enumerate(paragraph.runs):
    run.bold = True
    run.font.size = Pt(18)
    run.text = bottom_df['Name'][0].replace('/', ' / ') if i == 0 else ''
print("Figure labels are updated.")

Similar to updating the datetime data above, target_para_index is used to indicate the location of the text that we need to update. run properties are used to restore the font styles.

Table List

Tables are read in the order in which they appear in the document, so we can refer to the right table by its index.

# Update top table
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

top_table = document.tables[0]
for i in range(5):
    for j in range(4):
        cell = top_table.cell(i + 1, j)  # i + 1 skips the first row of the table
        cell.text = str(top_df.iloc[i, j])
        for paragraph in cell.paragraphs:
            paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # center the text

# Update bottom table
bottom_table = document.tables[1]
for i in range(5):
    for j in range(4):
        cell = bottom_table.cell(i + 1, j)
        cell.text = str(bottom_df.iloc[i, j])
        # A cell has no alignment of its own, so center each paragraph in it
        for paragraph in cell.paragraphs:
            paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
print("Table figures are updated.")

WD_PARAGRAPH_ALIGNMENT.CENTER is used to change the text alignment to the center. Depending on your needs, you can set the alignment at the paragraph level or at the table level.

Relationship ID (rID) List

In the context of the Microsoft Office Open XML (OOXML) file format, which is used by Microsoft Word documents (.docx), rID is an identifier that represents a relationship between different parts of the document.

OOXML files consist of multiple parts, such as the main document, styles, settings, images, etc. These parts are connected through relationships defined in the document. Each relationship is identified by a unique rID.

Relationships are used to link different parts of the document together. For example, a relationship can link an image file to the document, connect the styles part to the main document, or link an embedded object.

When working with the python-docx library, you may come across rIDs when dealing with relationships between parts of the document, such as images, hyperlinks, or embedded objects.

It’s important to note that rIDs are specific to the structure of the OOXML file format and are not directly visible or editable in the Word document itself. They are internal identifiers used by the document format and its associated libraries.
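Although rIDs are not visible in Word, a .docx file is a ZIP package, so they can be inspected with the standard library alone. The sketch below assumes the main document sits at the usual word/document.xml location, which means its relationships are stored in word/_rels/document.xml.rels. The function name list_image_rels and the in-memory sample package are made up for the demonstration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_PATH = "word/_rels/document.xml.rels"
OFFICE_RELS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
IMAGE_TYPE = OFFICE_RELS + "/image"


def list_image_rels(docx_file):
    """Return {rId: target} for the image relationships of the main document."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read(RELS_PATH))
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root              # each child is a <Relationship> element
        if rel.get("Type") == IMAGE_TYPE
    }


# Minimal stand-in for a .docx package, just enough to demonstrate the lookup
sample_rels = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    f'<Relationship Id="rId7" Type="{IMAGE_TYPE}" Target="media/image1.png"/>'
    f'<Relationship Id="rId1" Type="{OFFICE_RELS}/styles" Target="styles.xml"/>'
    "</Relationships>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr(RELS_PATH, sample_rels)
buffer.seek(0)

print(list_image_rels(buffer))  # {'rId7': 'media/image1.png'}
```

With a real report, passing the path of the .docx file instead of the buffer should list every image rID next to the media file it points to.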

We use rID to identify images and replace them in the report.

# Find rID of target picture
for rId, rel in document.part.rels.items():
    if "image" in rel.reltype:
        # Print the rID together with the image file it points to
        print(rId, rel.target_ref)

# Update pictures
from docx.opc.constants import RELATIONSHIP_TYPE as RT

def replace_picture_by_rID(document, target_rID, new_image_path):
    # target_rID = 'rId7'         # Specify the rID of the target picture
    # new_image_path = 'top.png'  # Specify the path to the new image file
    rel = document.part.rels.get(target_rID)
    if rel is None or rel.reltype != RT.IMAGE:
        return False
    # Overwrite the image bytes in place. _blob is a private attribute of
    # python-docx, so this may break in a future version of the library.
    with open(new_image_path, 'rb') as f:
        rel.target_part._blob = f.read()
    # The picture keeps the size stored in the document. Setting width/height
    # on the image part has no effect; to resize an inline picture, change the
    # width/height of the matching shape in document.inline_shapes instead.
    return True

# Call the function to replace the pictures
results = [
    replace_picture_by_rID(document, 'rId7', 'top.png'),
    replace_picture_by_rID(document, 'rId8', 'bottom.png'),
]
if all(results):  # check both calls, not only the last one
    print("Pictures are replaced.")
else:
    print("Target picture not found.")

We first identify all the rIDs related to the images. Based on the rID, we can replace an image with the updated PNG file given in new_image_path.

# Save the modified document
# Zero-padded %d works on every platform and keeps the date unambiguous
update_d = datetime_now.strftime("%Y%m%d")
document.save(f"work_dir/Currencies_{update_d}.docx")
print("Document is saved.")

Finally, we save the document.

from docx2pdf import convert

# Convert .docx to .pdf
convert(f"work_dir/Currencies_{update_d}.docx", f"work_dir/Currencies_{update_d}.pdf")

Many people prefer to view the report in PDF format. docx2pdf can be used to convert the Word report to PDF. Note that it relies on Microsoft Word being installed, so it works on Windows and macOS only.
