Processing Text Files In Excel
In a recent project, I had to export corrupted database files in a text format (Comma separated values) and then clean them by reading the text files in Excel and performing cleanup routines. Text files can be read using Windows Scripting Host routines or native VBA functions that read and write the files. Both methods are illustrated here.
Program Code Example 1
Option Explicit ' ******************************************************** ' Remove Non-Printable Characters From A Text File ' Also Remove CR & LF Characters That aren't paired ' Use Windows Scripting Host Objects ' ******************************************************** Dim intCharInAscii As Integer Dim intLastCharInAscii As Integer Dim strError As String Dim lngBoundary As Long Public Sub RemoveNonPrintables() Dim objFSO As Object Dim objInput As Object Dim objOutput As Object Dim fd As FileDialog Dim fdo As FileDialog Dim curInputLine As String Dim curOutputLine As String Dim strFileNameSelected As String Dim strOutputFileName As String lngBoundary = 1700019 On Error GoTo HandleScriptError: ' ******************************************************** ' Set Up File System Objects That Allow The Program ' To Read A Text File Sequentially ' ******************************************************** Set objFSO = CreateObject("Scripting.FileSystemObject") ' ******************************************************** ' Use the File Dialog Method to Present an Open File ' Dialog Box Allowing the User To Select The File ' They want to clean ' ******************************************************** Set fd = Application.FileDialog(msoFileDialogFilePicker) ' *************************************************** ' Present the File Open Dialog Box ' *************************************************** With fd .AllowMultiSelect = False .InitialFileName = "" .Show ' *************************************************** ' Process The Result Of The Open Dialog Box ' Save the path to the file to open ' *************************************************** If .SelectedItems.Count < 1 Then MsgBox ("No File Selected") Exit Sub Else strFileNameSelected = .SelectedItems(1) End If End With Set fd = Nothing ' *************************************************** ' Present the File Save Dialog Box ' Ask the user where they want to store the ' clean version of the file ' *************************************************** Set fdo = Application.FileDialog(msoFileDialogSaveAs) fdo.Show If fdo.SelectedItems.Count < 1 Then Exit Sub End If ' *************************************************** ' Save the Output File Path ' *************************************************** strOutputFileName = fdo.SelectedItems(1) Set fdo = Nothing intLastCharInAscii = 0 strError = "N" On Error GoTo HandleScriptError ' *************************************************** ' Use Windows Scripting Host Objects To Open, Read ' And Write The Corrected Text File ' *************************************************** Set objInput = objFSO.OpenTextFile(strFileNameSelected, 1) Set objOutput = objFSO.CreateTextFile(strOutputFileName, True) ' *************************************************** ' lngBoundary Is The Buffer Size To Read A Block ' of data from the input file. When read, the ' buffer includes all control characters, including ' carriage return and line feed ' *************************************************** Do While objInput.AtEndOfStream <> True If strError = "Y" Then Exit Sub End If curInputLine = objInput.Read(lngBoundary) ' *************************************************** ' The Corrupted File Contains Non-Printable Garbage ' Characters When Need To Be Removed. Save all ' Carriage Return and Line Feed Characters ' *************************************************** curOutputLine = PrintableOnly(curInputLine) objOutput.Write (curOutputLine) Loop ReadTextFileExit: objInput.Close objOutput.Close Exit Sub HandleScriptError: MsgBox "An Error Occurred In This Application" & vbCrLf & _ "Please Contact The Developer" & vbCrLf & vbCrLf & _ "Error Number = " & Err.Number & " Description = " & _ Err.Description, vbCritical Resume ReadTextFileExit End Sub Function PrintableOnly(ByVal strIn As String) As String Dim i As Long ' Point To Character To Be Read Dim j As Long ' Point To Character To Be Written Dim strChar As String Dim strCharNext As String j = 0 For i = 1& To Len(strIn) strChar = Mid$(strIn, i, 1&) intCharInAscii = Asc(strChar) ' *************************************************** ' Don't Allow A Carriage Return Without a Line Feed ' Or a Line Feed Without A Preceeding Carriage Return ' *************************************************** If intCharInAscii = 13 Then If i = lngBoundary Then MsgBox ("Boundary Error - CR at Boundary Character") strError = "Y" Exit Function End If strCharNext = Mid$(strIn, i + 1, 1&) If Asc(strCharNext) <> 10 Then GoTo Skip_Character End If End If ' *************************************************** ' If Not A Line Feed Then Do Usual Screening ' *************************************************** If intCharInAscii <> 10 Then Select Case intCharInAscii Case 13, 32 To 127 j = j + 1 Mid$(strIn, j, 1) = strChar End Select ' *************************************************** ' If A Line Feed Then Don't Pass Through Unless Last ' Character Was A Carriage Return ' *************************************************** ElseIf intLastCharInAscii = 13 Then Select Case intCharInAscii Case 10, 13, 32 To 127 j = j + 1 Mid$(strIn, j, 1) = strChar End Select End If ' *************************************************** ' Save Last Character ' *************************************************** Skip_Character: intLastCharInAscii = intCharInAscii Next i PrintableOnly = Left$(strIn, j) ' The String Is Shorter Now End Function
Program Code Example 2
Option Explicit ' *************************************************** ' Use Standard Excel I/O Functions Instead of ' Windows Scripting Host - The options for this ' Input Method Are More Limited But Simpler ' To Code ' *************************************************** Public Sub CountDelimitersCreateText() Dim intHandleIn As Integer Dim intHandleOut As Integer Dim strInputLine As String Dim fd As FileDialog Dim fdo As FileDialog Dim lngCount As Long Dim strFileNameSelected As String Dim strOutputFileName As String Dim lngLineLength As Long Dim lngDelimiterCount As Long Dim i As Long Dim boolPipe As Boolean Set fd = Application.FileDialog(msoFileDialogFilePicker) ' *************************************************** ' Open the file dialog ' *************************************************** With fd .AllowMultiSelect = False .InitialFileName = "" .Show ' *************************************************** ' Process The Result Of The Dialog Box ' *************************************************** If .SelectedItems.Count < 1 Then MsgBox ("No File Selected") Exit Sub Else strFileNameSelected = .SelectedItems(1) End If End With Set fd = Nothing ' *************************************************** ' Present the File Save Dialog Box ' *************************************************** Set fdo = Application.FileDialog(msoFileDialogSaveAs) fdo.Show If fdo.SelectedItems.Count < 1 Then Exit Sub End If strOutputFileName = fdo.SelectedItems(1) Set fdo = Nothing On Error GoTo ReadTextFileError ' *********************************************************************** ' Open The Input File ' *********************************************************************** intHandleIn = FreeFile Open strFileNameSelected For _ Input Access Read _ As intHandleIn ' *********************************************************************** ' Open The Output File ' *********************************************************************** intHandleOut = FreeFile Open strOutputFileName For _ Output Access Write _ As intHandleOut ' *********************************************************************** ' Loop Through Entries ' *********************************************************************** lngDelimiterCount = 0 While Not EOF(intHandleIn) Line Input #intHandleIn, strInputLine lngLineLength = Len(strInputLine) lngDelimiterCount = 0 boolPipe = False For i = 1 To lngLineLength If Mid(strInputLine, i, 1) = """" Then boolPipe = Not boolPipe Else If boolPipe = False Then If Mid(strInputLine, i, 1) = "," Then lngDelimiterCount = lngDelimiterCount + 1 End If End If End If Next i Print #intHandleOut, "*" & Format(lngDelimiterCount, "0#") & "*" & strInputLine Wend ReadTextFileExit: Close intHandleIn Close intHandleOut Exit Sub ReadTextFileError: MsgBox "An Error Occurred In This Application" & vbCrLf & _ "Please Contact The Developer" & vbCrLf & vbCrLf & _ "Error Number = " & Err.Number & " Description = " & _ Err.Description, vbCritical Resume ReadTextFileExit End Sub