I've just read Federico Kereki's article about interrogating a Linux
system titled “What's in the Box? Interrogate Your Hardware” in the December 2015 issue. I love this kind of article and hope to see more!
—
Brian Clark
Federico Kereki replies: Thanks, Mr Clark, for your kind words. The article grew out of my actual need to know about the hardware in my own machine, and because of Linux's openness, I learned even more than I had expected. I'm glad you liked my results!
Regarding Greg Bledsoe's “Server Hardening” article in the November 2015
issue: great article—lots of detailed help in one source. One question: is it possible to
mention specific logs you reference in the article? I get lost quickly when seeing the
large number of logs scattered about my Debian server.
—
Tom Browder
I believe that the final version of the script that Dave Taylor came up with in his Work the Shell column titled “Analyzing Comma-Separated Values (CSV) Files” in the December 2015 issue of Linux Journal contains an oversight. Specifically, it does not handle the case in which more than one field contains commas (for example, the dollar amount field and the comment field). I have modified Dave's script to take this into account. Hopefully, this will be of some help. I always enjoy Dave's column and have learned a lot from it. Here's the modified script:
#! /bin/bash -
# fixcsv
# fix CSV files with embedded commas
# The problem is that some spreadsheet fields may contain
commas. In the sample case, this includes the dollar
amount and comment fields. I believe you overlooked the
case in which both the dollar amount and comment fields
contain commas. Your script assumes that there is at
most one such instance.
# The simplest solution is to export the spreadsheet
contents with some field delimiter that can never appear
in any field, e.g., a tab. Then write the script
using this delimiter.
# Original code
# while read inline
# do
# if [ ! -z "$(echo $inline | grep \")" ]
# then
# f1=$(echo $inline | cut -d\" -f1)
# f2=$(echo $inline | cut -d\" -f2)
# f3=$(echo $inline | cut -d\" -f3)
# echo $f1`echo $f2|sed 's/,//g'`$f3
# else
# echo $inline
# fi
# done
# exit 0
# This works correctly ONLY when there is EXACTLY ONE
field enclosed in double quotes.
# Revised code
# For each line that contains at least one field enclosed
in double quotes, process each such field from left to
right until no fields are enclosed in double quotes and
all remaining commas are field separators. The steps are:
(1) replace the double quotes enclosing the field being
processed with a temporary delimiter to isolate that
specific field, (2) remove any commas embedded in the
isolated field, (3) reconstruct the line without the
temporary delimiters. The temporary delimiter must be
a single character (for the cut command) that cannot
appear in the input file. I selected an asterisk (*),
but other characters can be used. Some characters (such
as asterisk, colon, hyphen, and equals) work fine, while
others (such as tab and semicolon) do not.
td=* # temporary delimiter
while read inline
do
while [ ! -z "$(echo $inline | grep \")" ]
do
inline=$(echo $inline | sed "s/\"/$td/" | sed "s/\"/$td/")
f1=$(echo $inline | cut -d"$td" -f1)
f2=$(echo $inline | cut -d"$td" -f2)
f3=$(echo $inline | cut -d"$td" -f3)
inline=$(echo "$f1$(echo $f2 | sed 's/,//g')$f3")
done
echo $inline
done
exit 0
# Test input file fixcsvtest.txt:
$ cat fixcsvtest.txt
4/7/14,subscriptions,199.99,Ask Dave Taylor Monthly
4/10/14,subscriptions,"1,300.99",Linux Journal
4/10/14,subscriptions,"1,300.99","Linux Journal, APR 2014"
4/10/14,subscriptions,19.99,"Linux Journal, annual"
ab,cd,ef,gh
ab,cd,ef,"g,h"
ab,cd,"e,f",gh
ab,cd,"e,f","g,h"
ab,"c,d",ef,gh
ab,"c,d",ef,"g,h"
ab,"c,d","e,f",gh
ab,"c,d","e,f","g,h"
"a,b",cd,ef,gh
"a,b",cd,ef,"g,h"
"a,b",cd,"e,f",gh
"a,b",cd,"e,f","g,h"
"a,b","c,d",ef,gh
"a,b","c,d",ef,"g,h"
"a,b","c,d","e,f",gh
"a,b","c,d","e,f","g,h"
$
# Test Results
$ ./fixcsv < fixcsvtest.txt
4/7/14,subscriptions,199.99,Ask Dave Taylor Monthly
4/10/14,subscriptions,1300.99,Linux Journal
4/10/14,subscriptions,1300.99,Linux Journal APR 2014
4/10/14,subscriptions,19.99,Linux Journal annual
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
ab,cd,ef,gh
$
Dave Taylor replies: Thanks for your note, Jeff, and I do believe you're correct that I didn't test the case where more than a single field of the input data included commas. Bah, pesky debugging! I like your mods, and yet still have a niggling sense that the entire problem can be sidestepped with the perfect regular expression. If I only had a few weeks to create it!
I thought you would like to see an unusual place where LJ is being read this month: 49
degrees north, 35 degrees west. That's the middle of the Atlantic Ocean at 20 knots
heading for NYC. The satellite Internet costs are rather steep on board so I brought a
few issues with me. Must dash as the sun is over the yard arm.
—
Roger Greenwood