working with xls and xlsx files in python
Let's start working with xls and xlsx files in python. We can read data from xls or xlsx files using python programming and we can also write to xls or xlsx files using python programming. We do this by using the python package "openpyxl". The package "openpyxl" can be found in Python Package Index. So, we can easily install it with python pip. I recommend you to use python virtualenv.
Steps to work with xls and xlsx files in python:
- Create a virtualenv
virtualenv -p python3 env
- Activate the virtualenv
- Install the "openpyxl" python package in virtualenv
pip install openpyxl
Reading data from xls and xlsx files in python using openpyxl
from openpyxl import load_workbook workbook = load_workbook('/path/to/file.xlsx') # get all worksheets all_sheets = workbook.get_sheet_names() first_sheet = all_sheets # get a worksheet by its name worksheet = workbook.get_sheet_by_name(first_sheet) # print all rows in a worksheet for row in worksheet.iter_rows(): print(row) # all the values in a row for cell in row: print(cell) # get cell value print(cell.value) # get cell with row number and column number worksheet.cell(row=1, column=2).value
In above code we have loaded the xlsx file using the function "load_workbook" . It takes the file path and returns the workbook. we can get all sheet names using method "get_sheet_names". We can get the worksheet by its name by using the function "get_sheet_by_name". If we want we can iterate throught all sheets using "for loop". In above code we have worked on only a single sheet of the xlsx file. In general xlsx sheets contains rows and columns. To get all the rows in the sheet we can use the method "iter_rows" on the worksheet object. To get column of a row we can iterate through "for loop" and get the cell like above code.
To get the value of the cell with the attribute "value".
Writing data to xls and xlsx files in python using openpyxl
from openpyxl import Workbook workbook = Workbook() # create xls workbook sheet workbook.create_sheet(index=0, title='Sheet1') workbook.create_sheet(index=0, title='Sheet2') # remove xls workbook sheet workbook.remove(workbook['Sheet2']) # get active work sheet work_sheet = workbook.active data = [ ['Naveen', 'M', 'Software Engineer'], ['David', 'N', 'Designer'], ['Shera', 'H', 'Web Developer'] ] rows = 3 cols = 3 # iterate to set data for row_num in range(0, rows): for col_num in range (0, cols): work_sheet.cell( row=row_num + 1, column=col_num + 1 ).value = data[row_num][col_num] # save data to xls file workbook.save('output.xlsx')
We can create the workbook with class "Workbook". Create a sheet using the method "create_sheet". After creating the sheet take the active work sheet and then retrieve the cell with it's row and column values and set the value. After save the work book just like shown in the above code.
I've just showed how we can read and write the data from and to the xls or xlsx files. It's just the basics of the "openpyxl". We can do more with it. For example we can add images to xls file, we can merge the two or more cells and more. All these information is well documented in the "openpyxl".