Bulk Fix for CSV (Macintosh) Excel Files On Windows

I have a bunch of CSV files that I received from someone using Excel on Mac. When she saved them, she did not choose a special CSV format that's Windows-friendly. As a result, if I open one of her files, make a small change, save the file, and reopen it later, the columns all end up jammed together. The program that is ultimately supposed to consume the CSV data also has a problem reading the files that were saved on a Mac.

Using Notepad++, I inspected the line-endings and the file encoding. The line-endings were correct for Windows (CRLF), but I noticed that the file encoding was "UTF-8 BOM". The BOM stands for byte-order mark. When I opened CSV files saved from Excel for Windows, the file encoding was just "UTF-8" -- no byte-order mark.

I wanted a way to bulk-edit all the files, switching the encoding to UTF-8. I thought about writing a program in C#, but it seemed likely that I could use existing command-line tools. With a little searching, I found this sed command on Stack Exchange. That was great for updating a single file, but I wanted to process well over a hundred files -- many of them with spaces in their names. Using find and piping the results to xargs, I thought this would be easy, but it turned out that find doesn't wrap the resulting file paths in quotes. I also discovered that sed for Cygwin writes out files using the Unix default line-ending, "LF".

What I ended up with is this:

find myTargetDirectory -name "*.csv" | sed -e 's/^/"/g' -e 's/$/"/g' | tr '\n' ' ' | xargs sed -b -i '1s/^\xEF\xBB\xBF//'

The find command is straightforward enough, but the results are piped to sed and tr in order to wrap the file paths in double-quotes, using an approach I lifted from this Stack Overflow answer. And I had to add the -b flag to the BOM-removal sed in order to get sed on Cygwin to leave the Windows end-of-line characters (CRLF) as-is.

End result: defused the BOM.

I've noted this here for my own future reference. I hope it helps someone else, too!

If you found this helpful, you'd like to thank me, and you enjoy reading science fiction, consider buying my book! More info here.


comments powered by Disqus