grumpyOldDude

Java Pattern Matcher reads white space wrong


As my project begins to take shape, I've been able to solve most of the problems I've encountered, but this latest one has me beaten, not because it is complex but because I don't know what's going on under the hood of the parser.

I am using a regex in Java, Pattern.compile("\\d+") with a Matcher, to extract and read floats, but whitespace seems to be interpreted as 0.0. The values I read back look like this:

808.00.0472.00.036.00.0202.00.018.00.024.00.0
782.00.096.00.036.00.0202.00.018.00.024.00.0
909.00.01028.00.036.00.0202.00.018.00.024.00.0
931.00.01149.00.036.00.0202.00.018.00.024.00.0

but they should be something like this:

808.0  472.0  36.0  202.0  18.0  24.0
782.0  96.0  36.0  202.0  18.0  24.0
909.0  1028.0  36.0  202.0  18.0  24.0
931.0  1149.0  36.0  202.0  18.0  24.0

It goes wrong because whitespace is being turned into 0.0.

My quick fix was an if statement that excludes 0.0.

Well, I got away with it until the inevitable happened: some of the real data turned out to be 0.0, so my if statement was excluding real data from being read. Any help on how to fix this? I need whitespace to be read as whitespace, not as 0.0.

public void readDataFromSelectedTextFile(File fPathplusName) {
	List<Float> numbers = new LinkedList<Float>();
	try {
		bufferedReader = new BufferedReader(new FileReader(fPathplusName));
		while ((stringObjectData = bufferedReader.readLine()) != null) {
			Pattern p = Pattern.compile("\\d+");
			Matcher m = p.matcher(stringObjectData);
			while (m.find()) {
				numbers.add(Float.parseFloat(m.group()));
			}
		}
		ListIterator<Float> floatIterator = numbers.listIterator();
		int i = 0, t = 7, n = 0;
		float s, size = 0;

		while (floatIterator.hasNext()) {
			if ((s = floatIterator.next()) > 0) {
				...
				...  plenty of good coding here ...
				...
			}
		}
	} catch (FileNotFoundException e) {
		e.printStackTrace();
	} catch (IOException e) {
		e.printStackTrace();
	}
}

 

Edited by grumpyOldDude


I don't think whitespace is your problem.

 

I think your problem is that \d ("digit") doesn't match '.' characters (I use C#, not Java, but regular expressions should be basically the same, shouldn't they?). So when the input contains x.y, your regular expression finds x, skips the '.', and then returns a second match for y.
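To make that concrete, here is a minimal Java sketch of what "\\d+" does to one pair of fields from the sample data. The class name DigitRunDemo and the helper findAll are made up for this illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustrating the explanation above.
final class DigitRunDemo {
    // Returns every substring matched by the given regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Two fields from the sample data, separated by whitespace.
        String line = "808.0  472.0";
        // \d+ matches runs of digits only, so the '.' ends each match.
        System.out.println(findAll("\\d+", line)); // prints [808, 0, 472, 0]
        for (String s : findAll("\\d+", line)) {
            System.out.print(Float.parseFloat(s)); // prints 808.00.0472.00.0
        }
        System.out.println();
    }
}
```

The extra 0.0 values come from the digit after each decimal point being parsed as a separate number, not from the whitespace.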

 

Try using "[0-9\\.]+" instead, or if you want to be more exact: "[\\-\\+]?[0-9]*(\\.[0-9]+)?" (this does not include the 10E+5 syntax, which is left as an exercise for the reader).
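One caveat if the more exact pattern is used with Matcher.find(): every part of it is optional, so it also matches the empty string, and Float.parseFloat("") throws a NumberFormatException. The sketch below uses a variant that requires at least one digit. The pattern, the class name SignedDecimalDemo, and the method extractFloats are assumptions for illustration, not something from the posts above, and exponent syntax is still not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: a signed-decimal pattern that cannot match an empty string.
final class SignedDecimalDemo {
    // Optional sign, then either digits with an optional fraction ("42", "808.0", "7.")
    // or a bare fraction (".5"). Compiled once rather than once per line.
    static final Pattern NUMBER = Pattern.compile("[-+]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Extracts every number found in the line and parses it as a float.
    static List<Float> extractFloats(String line) {
        List<Float> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            numbers.add(Float.parseFloat(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractFloats("808.0  0.0  -3.5  42")); // prints [808.0, 0.0, -3.5, 42.0]
    }
}
```

A genuine 0.0 in the data is kept here, so no special-case filter on zero is needed.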

Edited by Nypyren


Oh dear, I completely forgot... Many thanks Nypyren, your answer triggered further inspiration on how it works. So I replaced Pattern p = Pattern.compile("\\d+"); with Pattern p = Pattern.compile("\\d+\\.\\d+"); and the algorithm read my data correctly. Thanks.

Edited by grumpyOldDude

1 hour ago, grumpyOldDude said:

Oh dear, I completely forgot... Many thanks Nypyren, your answer triggered further inspiration on how it works. So I replaced Pattern p = Pattern.compile("\\d+"); with Pattern p = Pattern.compile("\\d+\\.\\d+"); and the algorithm read my data correctly. Thanks.

This will probably break if you ever have negative values, or if a value is written without its fractional part. That may or may not be a problem, depending on what your data is.

Nypyren's more exact regex would get around that.
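A quick Java sketch of both failure modes with "\\d+\\.\\d+". The class name FractionOnlyDemo, the helper findAll, and the input string are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing what \d+\.\d+ misses.
final class FractionOnlyDemo {
    // Returns every substring matched by the given regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // A negative value, a value without a fractional part, and a real zero.
        String line = "-3.5 42 0.0";
        // The sign is not part of the match and "42" is skipped entirely.
        System.out.println(findAll("\\d+\\.\\d+", line)); // prints [3.5, 0.0]
    }
}
```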

If I knew I had a file full of whitespace-delimited floats, I would probably not bother with a regex at all. Instead, I would do something like this (also a C# guy...):

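using System;
using System.Collections.Generic;
using System.IO;

var numbers = new List<float>();
var lines = File.ReadAllLines("whatever.txt");
float f;
foreach (var line in lines) {
	// Split on whitespace and drop the empty entries produced by repeated separators.
	var values = line.Split(new[] {' ', '\t', '\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
	foreach (var value in values) {
		// TryParse returns false instead of throwing, so non-numeric tokens are skipped.
		if (float.TryParse(value, out f)) {
			numbers.Add(f);
		}
	}
}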
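Since the original question is in Java, here is a rough Java counterpart of the C# snippet above, assuming the same whitespace-delimited layout. The class name WhitespaceSplitDemo and the method parseLine are hypothetical, and reading the file line by line is left to the existing BufferedReader loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: parse whitespace-delimited floats without a number regex.
final class WhitespaceSplitDemo {
    // Splits one line on runs of whitespace and parses each token as a float.
    static List<Float> parseLine(String line) {
        List<Float> numbers = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a blank line yields a single empty token
            }
            try {
                numbers.add(Float.parseFloat(token));
            } catch (NumberFormatException e) {
                // Skip tokens that are not numbers, similar to TryParse returning false.
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("808.0  472.0\t36.0 ")); // prints [808.0, 472.0, 36.0]
    }
}
```

Negative values and values without a fractional part parse here without any extra work, because Float.parseFloat handles the number syntax itself.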

The Jamie Zawinski quote gets thrown around probably more than it should, but...

Quote

"Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."

 
