Sunday, July 29, 2012

Edit a CSV Without Changing Number Formats

This tutorial demonstrates how to prevent Excel from changing your csv data and points out some other problems that can occur when editing a csv file. You can jump to the video at the end or read on for a bit more detailed version.

Many users believe csv is just another type of Excel file, however this is not the case. Excel can open csv files and sets itself as the default editor when it is installed, but csv is nothing more than a text file with a different file extension. The most common use of the csv file format is in moving data (represented in the form of a table) from one database or program to another incompatible database or program. A csv cannot store formatting or data that is not text (such as images)

Editing a csv file in Excel can cause fields with numbers to change automatically into some undesired format. This can go unnoticed in large data sets, only to be discovered weeks or even months later in your eCommerce site or other destination database. At this point it can be difficult to track down the erroneous records and correct the information. The most common cause of surprise format changes is opening the csv directly either by double-clicking the file or using the File > Open dialog in Excel. Importing a csv into an blank Excel sheet will prevent the majority of undesired results.

The most common cause of surprise format changes is opening the csv directly either by double-clicking the file or using the File > Open dialog in Excel. Importing a csv into an blank Excel sheet will prevent the majority of undesired results. Here's how:
  1. Open a new Excel sheet, select the Data tab, then click 'From Text' in the Get External Data group.
  2. Browse to the CSV file and select 'Import'.
  3. In step 1 of the Import Wizard choose 'Delimited' as the original data type. Typically you don't need to make any other changes here but be sure your header row is actually on row 1 and change this number accordingly if it begins further down the sheet. Click 'Next'.
  4. In step 2 of the Import Wizard choose Comma as the delimiter (deselect the Tab check box) and click 'Next'.
  5. Step 3 is where you tell Excel not to change your formats. With the first column in the Data Preview selected, scroll across to the last column and select it while holding the SHIFT key (all columns should now be selected).
  6. Select 'Text' from the Column Data Format group and then 'Finish'. Click OK to insert the data into cell A1.
Note that if the source data contains forced line breaks (ALT + ENTER was used to create a new line of text) these will come in as a broken row when importing. If this happens you need to fix the source data or open the file in a text editor such as Notepad and remove the line breaks (they are represented in Notepad as a small square). If these are not fixed the import will most likely fail completely or partially when you attempt to get the csv data to it's destination.

Summary of Best Practices:
  • If you are the one generating the csv, retrieve only the data you need (get the least amount of columns and rows to accomplish the task).
  • Import the csv as text into Excel (don't open it directly)
  • If the editing task is extensive save it as an Excel file and re-save it as a csv when finished (this will retain column widths, formatting and formulas until you're done)
  • Fix spelling and data errors, line breaks, etc. in the source database if possible.

No comments:

Post a Comment