Sign in to follow this  

I need a sed ninja

This topic is 3658 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have a large text file that's the output of mysqldump. This means that the output looks like this:
-- This is a comment
-- another comment.

USE `dbname`;

-- More comments here

DROP TABLE IF EXISTS `achentry`;
CREATE TABLE `achentry` (
  `EntryId` int(11) NOT NULL auto_increment,
  `BatchId` int(11) NOT NULL,
  `Info` blob NOT NULL,
  `Created` datetime NOT NULL,
  `Amount` decimal(50,2) NOT NULL,
  PRIMARY KEY  (`EntryId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

My goal is to get each SQL statement onto a single line. As far as I can tell, comments never appear on a line with any actual SQL code, so we don't have to worry about stripping trailing comments. Essentially, I see this working like this: 1) Get rid of all comments 2) Remove the newline at the end of all lines that don't end in a semicolon Step 1 is easy: cat dump.sql | sed 's/--.*//' > dump2.sql Step 2 is tough. sed operates on single lines--how would I make it strip out the newline at the end of lines not ending in semicolons? If it matters, I'm using bash on CentOS5 (essentially RHEL5).

Share this post


Link to post
Share on other sites
sed -n -e '/--/d; H; $ { g; s/\n//g; s/ \+/ /g; s/;/;\n/g; p }' dump.sql > result.sql


sed does not really operate on individual lines. It merely adds text to the pattern space one line at a time, and flushes the pattern space in-between lines. If you wish to work on several lines at once, just append these lines to the work space using the H command. The above source does, for every line inside the input file:


/--/d # if a line contains '--' stop processing it
H # append the line to the work buffer, separated by newline

$ { # only do this stuff when done reading all input
g # load the work space into the pattern space
s/\n//g # remove all newlines in the pattern space
s/ \+/ /g # remove all redundant whitespace in the pattern space
s/;/;\n/g # add a newline after every semicolon in pattern space
p # print pattern space to output
}


The -n argument specifies that the only script output shall come from the 'p' instruction inside, instead of echoing the pattern space after each line read.

Share this post


Link to post
Share on other sites
I'm playing with uni/cluster, a middleware clustering solution for MySQL. Its command-line client to interface with the MySQL cluster (the equivalent of the mysql CLI) will run each line of the file as a query if the file is piped into the CLI (unlike the regular mysql CLI).

Share this post


Link to post
Share on other sites
Sign in to follow this