I need a sed ninja

Started by
4 comments, last by BeanDog 16 years, 4 months ago
I have a large text file that's the output of mysqldump. This means that the output looks like this:

-- This is a comment
-- another comment.

USE `dbname`;

-- More comments here

DROP TABLE IF EXISTS `achentry`;
CREATE TABLE `achentry` (
  `EntryId` int(11) NOT NULL auto_increment,
  `BatchId` int(11) NOT NULL,
  `Info` blob NOT NULL,
  `Created` datetime NOT NULL,
  `Amount` decimal(50,2) NOT NULL,
  PRIMARY KEY  (`EntryId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

My goal is to get each SQL statement onto a single line. As far as I can tell, comments never appear on a line with any actual SQL code, so we don't have to worry about stripping trailing comments. Essentially, I see this working like this: 1) Get rid of all comments 2) Remove the newline at the end of all lines that don't end in a semicolon Step 1 is easy: cat dump.sql | sed 's/--.*//' > dump2.sql Step 2 is tough. sed operates on single lines--how would I make it strip out the newline at the end of lines not ending in semicolons? If it matters, I'm using bash on CentOS5 (essentially RHEL5).
Advertisement
Use tr to delete newlines, then use sed to change all semicolons to semicolon-newlines.
sed -n -e '/--/d; H; $ { g; s/\n//g; s/ \+/ /g; s/;/;\n/g; p }' dump.sql > result.sql


sed does not really operate on individual lines. It merely adds text to the pattern space one line at a time, and flushes the pattern space in-between lines. If you wish to work on several lines at once, just append these lines to the work space using the H command. The above source does, for every line inside the input file:

/--/d       # if a line contains '--' stop processing itH           # append the line to the work buffer, separated by newline$ {         # only do this stuff when done reading all input  g         # load the work space into the pattern space  s/\n//g   # remove all newlines in the pattern space  s/ \+/ /g # remove all redundant whitespace in the pattern space  s/;/;\n/g # add a newline after every semicolon in pattern space  p         # print pattern space to output}


The -n argument specifies that the only script output shall come from the 'p' instruction inside, instead of echoing the pattern space after each line read.
Quote:Original post by ToohrVyk
sed -n -e '/--/d; H; $ { g; s/\n//g; s/ \+/ /g; s/;/;\n/g; p }' dump.sql > result.sql


Thanks, that worked perfectly! Rating++
Out of curiosity: Why do you need them to be on a single line?

<hr />
Sander Marechal<small>[Lone Wolves][Hearts for GNOME][E-mail][Forum FAQ]</small>

I'm playing with uni/cluster, a middleware clustering solution for MySQL. Its command-line client to interface with the MySQL cluster (the equivalent of the mysql CLI) will run each line of the file as a query if the file is piped into the CLI (unlike the regular mysql CLI).

This topic is closed to new replies.

Advertisement