himavantht

New Member
Hi All,

I need help on extracting data from Excel file.

I cannot use MS Excel VBO - because Sheet has more than 10 duplicate column names(but data inside them is different). So if I try to extract using Excel VBO I get an error stating column name already exists in the data table.

So I tried using OLEDB with a query("Select * from [Sheet1$]") - This is extracting data into a collection as per my demands
But, extracted data differs in formatting. Ex: Date cells are adding 06:30:00 along with Date. Decimals are considering as round up values(1.0000 to 1). (Formatting is getting changed while reading and getting data to collection).

Some workarounds like - Changing cell formatting using excel VBO cannot be done because there are more than 50+ columns and there are 16 diff files I need to extract data. So it's very difficult to change cell formatting individually.

Is there any way we can extract the same Data(with the same formatting) from the sheet to the collection using OLEDB?

Thanks in advance!
 

gil.silva

Active Member
Hello himavantht,

I can't help you much with the OLEDB, but taking your first approach, you can use the action Get Worksheet as Collection Offset with the StartCell: A2 and Use Header: False OR if you wish to keep the name of the columns somehow, you can use StartCell: A1 and Use Header: False, this way the first row will have your column names from the Excel.

Note: Even if you use OLEDB, keep in mind that in Blue Prism collections it's not possible to have multiple columns with the same name.
 

himavantht

New Member
Hi gil.silva,

I sorted out this issue couple of days ago using Get worksheet as collection offset(same approach you suggested)

Yeah, I'm aware even using OLEDB, BP collection concatenates _1,_2 wherever it finds duplicate column names in a sheet.

Thanks for you help Man! Have a good day ahead.
 

corporateCat

New Member
Hi,

I know that the topic was closed long time but if someone wants an answer for OLEDB approach then the solution is simply to extend connection string with two more parameters: "HDR=YES;IMEX=1".

HDR=NO indicates that first row doesn't contain headers. Therefore duplicates should not cause any errors.
IMEX=1 will treat all data types as text.

Cheers!
 
Top