working with xls and xlsx files in python


working with xls and xlsx files in python

Let's start working with xls and xlsx files in python. We can read data from xls or xlsx files using python programming and we can also write to xls or xlsx files using python programming. We do this by using the python package "openpyxl".  The package "openpyxl" can be found in Python Package Index. So, we can easily install it with python pip. I recommend you to use python virtualenv

Steps to work with xls and xlsx files in python:

  1. Create a virtualenv
    virtualenv -p python3 env
  2. Activate the virtualenv
    source env/bin/activate
  3. Install the "openpyxl" python package in virtualenv
    pip install openpyxl

Reading data from xls and xlsx files in python using openpyxl

from openpyxl import load_workbook

workbook = load_workbook('/path/to/file.xlsx')

# get all worksheets
all_sheets = workbook.get_sheet_names()

first_sheet = all_sheets[0]

# get a worksheet by its name
worksheet = workbook.get_sheet_by_name(first_sheet)

# print all rows in a worksheet
for row in worksheet.iter_rows():
    print(row)

    # all the values in a row
    for cell in row:
        print(cell)
        # get cell value
        print(cell.value)
# get cell with row number and column number
worksheet.cell(row=1, column=2).value

In above code we have loaded the xlsx file using the function "load_workbook" . It takes the file path and returns the workbook. we can get all sheet names using method "get_sheet_names". We can get the worksheet by its name by using the function "get_sheet_by_name". If we want we can iterate throught all sheets using "for loop". In above code we have worked on only a single sheet of the xlsx file. In general xlsx sheets contains rows and columns. To get all the rows in the sheet we can use the method "iter_rows" on the worksheet object.  To get column of a row we can iterate through "for loop" and get the cell like above code.
To get the value of the cell  with the attribute "value".

Writing data to xls and xlsx files in python using openpyxl

from openpyxl import Workbook

workbook = Workbook()

# create xls workbook sheet
workbook.create_sheet(index=0, title='Sheet1')
workbook.create_sheet(index=0, title='Sheet2')

# remove xls workbook sheet
workbook.remove(workbook['Sheet2'])

# get active work sheet
work_sheet = workbook.active

data = [
	['Naveen', 'M', 'Software Engineer'],
	['David', 'N', 'Designer'],
	['Shera', 'H', 'Web Developer']
]

rows = 3
cols = 3
# iterate to set data
for row_num in range(0, rows):
    for col_num in range (0, cols):
        work_sheet.cell(
        	row=row_num + 1,
        	column=col_num + 1
        ).value = data[row_num][col_num]

# save data to xls file
workbook.save('output.xlsx')

We can create the workbook with class "Workbook".  Create a sheet using the method "create_sheet". After creating the sheet take the active work sheet and then retrieve the cell with it's row and column values and  set the value. After save the work book just like shown in the above code.

I've just showed how we can read and write the data from and to the xls  or xlsx files. It's just the basics of the "openpyxl". We can do more with it. For example we can add images to xls file, we can merge the two or more cells and more. All these information is well documented in the "openpyxl".

References: https://openpyxl.readthedocs.io/en/stable/

 

Read Next: custom path converters in django

Read Prev: django image and file upload using ajax

Blog Archive