.xlsx file parsing doesn't parse currency the same way .csv does #1338

Open
opened 2018-11-02 14:36:07 +00:00 by mandros1 · 4 comments
mandros1 commented 2018-11-02 14:36:07 +00:00 (Migrated from github.com)

I have actually commented on one of the threads, but to be sure that you can see it I will open my own.
I have looked around the Issues a bit, trying to find a solution, but as it didn't amount to much I have the following issue:
I take the xlsx file and save it as csv and this is what I get
1. XLSX
Software,"$6,526",$227,"$45,797","$42,354","$7,931","$15,802","$45,210","$49,233","$32,049","$7,519","$49,916","$21,721"
2. CSV
Software,$6.526,$227,$45.797,$42.354,$7.931,$15.802,$45.210,$49.233,$32.049,$7.519,$49.916,$21.721

CSV is the format that is desirable, so is there any way I could achieve the same thing with xlsx, from what I read from other threads I believe it has something to do with the format of the cell, but I have no clue how to go about fixing it.
Any input would be highly appreciated,
Thank you :)

edit: it is connected explicitly with the currency, as the percentage is parsed fine

I have actually commented on one of the threads, but to be sure that you can see it I will open my own. I have looked around the Issues a bit, trying to find a solution, but as it didn't amount to much I have the following issue: I take the xlsx file and save it as csv and this is what I get **1. XLSX** Software,"$6,526",$227,"$45,797","$42,354","$7,931","$15,802","$45,210","$49,233","$32,049","$7,519","$49,916","$21,721" **2. CSV** Software,$6.526,$227,$45.797,$42.354,$7.931,$15.802,$45.210,$49.233,$32.049,$7.519,$49.916,$21.721 CSV is the format that is desirable, so is there any way I could achieve the same thing with xlsx, from what I read from other threads I believe it has something to do with the format of the cell, but I have no clue how to go about fixing it. Any input would be highly appreciated, Thank you :) edit: it is connected explicitly with the currency, as the percentage is parsed fine
SheetJSDev commented 2018-11-06 02:18:50 +00:00 (Migrated from github.com)

If you're trying to get the currency using a different thousands and decimal symbol, that option is not currently exposed by the number formatting library (Excel doesn't actually store the symbols; it's inherited from the Windows regional settings, just like the format code 14 Date format)

If you're trying to get the currency using a different thousands and decimal symbol, that option is not currently exposed by the number formatting library (Excel doesn't actually store the symbols; it's inherited from the Windows regional settings, just like the format code 14 Date format)
mandros1 commented 2018-11-11 14:30:24 +00:00 (Migrated from github.com)

I force fixed it in my code by converting that comma into a dot inside my program. Thanks for the reply though. Good thing is that, even tough the format isn't the same, it surrounded it with quotes, so I was able to scope it and change it.
Really helpful tool and much the work!
Best regards

I force fixed it in my code by converting that comma into a dot inside my program. Thanks for the reply though. Good thing is that, even tough the format isn't the same, it surrounded it with quotes, so I was able to scope it and change it. Really helpful tool and much the work! Best regards
techcreatix commented 2019-04-08 18:54:10 +00:00 (Migrated from github.com)

@mandros1 Can you share how you scoped the quotes ? I am having same issue need to fix it urgently.

@mandros1 Can you share how you scoped the quotes ? I am having same issue need to fix it urgently.
mandros1 commented 2019-04-08 20:35:14 +00:00 (Migrated from github.com)

@techcreatix I worked on a Django project, so this might not be useful for you, but I will post the whole method I used if anyone else is to stumble upon this.
` if request.method == 'POST':
excel_filename = request.POST['fileName']
excel_filedata = request.POST['fileData']
if (excel_filename != None and excel_filedata != None):

        dataArr = excel_filedata.split("\r\n\r\n")  # gets sheet specific data
        dataframe_collection = {}
        # Loop through every sheet
        for x in range(len(dataArr)):
            sheetArray = dataArr[x].split("\r\n")
            sheetName = sheetArray[0]  # current sheet name
            sheetData = sheetArray[1:]  # sheet data
            data_array = []

            #  for every row in the sheet data
            for y in range(len(sheetData)):
                # Split every row by a comma
                convSheetData = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '.'), sheetData[y])
                data_array.append(convSheetData.split(","))
                if(y+1==len(sheetData)):
                    data_array.pop()
            final_data_arr = np.array(data_array)  # data array for pandas dataframe
            df = pd.DataFrame(final_data_arr)
            dataframe_collection[sheetName] = df
            #df.to_excel(excel_writer=writer, sheet_name=sheetName, header=False, index=False)
        #writer.save()
        stringified_json = utils.convert(dataframe_collection, remove_extension(filename=excel_filename))
    store_file_data_in_database(excel_name=remove_extension(excel_filename), excel_data=dataframe_collection, json_data=stringified_json)
    print(get_all_file_names_of_placeholder())`

I pasted the whole function, but I believe this is the part you need "convSheetData = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '.'), sheetData[y])".
I hope I remember that correctly

@techcreatix I worked on a Django project, so this might not be useful for you, but I will post the whole method I used if anyone else is to stumble upon this. ` if request.method == 'POST': excel_filename = request.POST['fileName'] excel_filedata = request.POST['fileData'] if (excel_filename != None and excel_filedata != None): dataArr = excel_filedata.split("\r\n\r\n") # gets sheet specific data dataframe_collection = {} # Loop through every sheet for x in range(len(dataArr)): sheetArray = dataArr[x].split("\r\n") sheetName = sheetArray[0] # current sheet name sheetData = sheetArray[1:] # sheet data data_array = [] # for every row in the sheet data for y in range(len(sheetData)): # Split every row by a comma convSheetData = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '.'), sheetData[y]) data_array.append(convSheetData.split(",")) if(y+1==len(sheetData)): data_array.pop() final_data_arr = np.array(data_array) # data array for pandas dataframe df = pd.DataFrame(final_data_arr) dataframe_collection[sheetName] = df #df.to_excel(excel_writer=writer, sheet_name=sheetName, header=False, index=False) #writer.save() stringified_json = utils.convert(dataframe_collection, remove_extension(filename=excel_filename)) store_file_data_in_database(excel_name=remove_extension(excel_filename), excel_data=dataframe_collection, json_data=stringified_json) print(get_all_file_names_of_placeholder())` I pasted the whole function, but I believe this is the part you need "convSheetData = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '.'), sheetData[y])". I hope I remember that correctly
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1338
No description provided.