Working With Xls And Xlsx Files In Python

Working With Xls And Xlsx Files In Python

Let's start working with xls and xlsx files in python. We can read data from xls or xlsx files using python programming and we can also write to xls or xlsx files using python programming. We do this by using the python package "openpyxl". The package "openpyxl" can be found in Python Package Index. So, we can easily install it with python pip. I recommend you to use python virtualenv.

Steps to work with xls and xlsx files in python:

  1. Create a virtualenv
     virtualenv -p python3 env
    
  2. Activate the virtualenv
     source env/bin/activate
    
  3. Install the "openpyxl" python package in virtualenv
     pip install openpyxl
    

Reading data from xls and xlsx files in python using openpyxl

from openpyxl import load_workbook

workbook = load_workbook('/path/to/file.xlsx')

# get all worksheets
all_sheets = workbook.get_sheet_names()

first_sheet = all_sheets[0]

# get a worksheet by its name
worksheet = workbook.get_sheet_by_name(first_sheet)

# print all rows in a worksheet
for row in worksheet.iter_rows():
    print(row)

    # all the values in a row
    for cell in row:
        print(cell)
        # get cell value
        print(cell.value)
# get cell with row number and column number
worksheet.cell(row=1, column=2).value

In above code we have loaded the xlsx file using the function "load_workbook" . It takes the file path and returns the workbook. we can get all sheet names using method "get_sheet_names". We can get the worksheet by its name by using the function "get_sheet_by_name". If we want we can iterate throught all sheets using "for loop". In above code we have worked on only a single sheet of the xlsx file. In general xlsx sheets contains rows and columns. To get all the rows in the sheet we can use the method "iter_rows" on the worksheet object. To get column of a row we can iterate through "for loop" and get the cell like above code. To get the value of the cell with the attribute "value".

Writing data to xls and xlsx files in python using openpyxl

from openpyxl import Workbook

workbook = Workbook()

# create xls workbook sheet
workbook.create_sheet(index=0, title='Sheet1')
workbook.create_sheet(index=0, title='Sheet2')

# remove xls workbook sheet
workbook.remove(workbook['Sheet2'])

# get active work sheet
work_sheet = workbook.active

data = [
    ['Naveen', 'M', 'Software Engineer'],
    ['David', 'N', 'Designer'],
    ['Shera', 'H', 'Web Developer']
]

rows = 3
cols = 3
# iterate to set data
for row_num in range(0, rows):
    for col_num in range (0, cols):
        work_sheet.cell(
            row=row_num + 1,
            column=col_num + 1
        ).value = data[row_num][col_num]

# save data to xls file
workbook.save('output.xlsx')

We can create the workbook with class "Workbook". Create a sheet using the method "create_sheet". After creating the sheet take the active work sheet and then retrieve the cell with it's row and column values and set the value. After save the work book just like shown in the above code.

I've just showed how we can read and write the data from and to the xls or xlsx files. It's just the basics of the "openpyxl". We can do more with it. For example we can add images to xls file, we can merge the two or more cells and more. All these information is well documented in the "openpyxl".

References: https://openpyxl.readthedocs.io/en/stable/