
big text file


#1 Jarwulf   Members   -  Reputation: 222


Posted 06 May 2012 - 08:05 PM

I have a text file of about 600 MB. WordPad freezes when trying to open it, and Notepad and OpenOffice Writer refuse to work with it at all. My computer is only a few years old and runs Win7, so I don't understand why a file this size is such a challenge, but apparently it is. Is there any way I can open it smoothly? I just need to copy a few lines.

I tried some text-file splitter programs, but they run out of memory trying to load the file, which makes me wonder what their purpose is.




#2 jefferytitan   Crossbones+   -  Reputation: 2125


Posted 06 May 2012 - 09:21 PM

Notepad and WordPad aren't great at that kind of thing. They try to load the whole file into memory in one go and then perform word-wrapping; it's almost Microsoft demonstrating how not to write an editor. ;) I think TextPad may do a better job. In general it's problematic because many editors represent the file in a larger format in memory (for example, to handle word-wrapping, syntax highlighting, formatting, etc.), and often they don't do paging. It's odd that the text splitters didn't work, though.

If you have no luck with that, try it yourself: open the file as a stream (or manually buffer into a byte array), keep reading chunks until you hit both the size limit and a line break, then dump the chunk to disk as a new file and repeat.
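
In Python, a minimal sketch of that approach might look like the following (the chunk size and output naming are arbitrary choices, not anything specific to this thread):

    # Split a large text file into pieces, cutting only at line breaks.
    CHUNK_SIZE = 64 * 1024 * 1024  # target piece size: ~64 MB

    def split_file(path, chunk_size=CHUNK_SIZE):
        part = 0
        with open(path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                # Read on to the next newline so no line is cut in half.
                chunk += src.readline()
                with open(f"{path}.part{part}", "wb") as dst:
                    dst.write(chunk)
                part += 1

    split_file("bigfile.txt")  # hypothetical file name

Each piece ends up small enough for an ordinary editor, and because the reads are streamed, memory use stays near the chunk size no matter how big the input is.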

#3 Krohm   Crossbones+   -  Reputation: 3117


Posted 07 May 2012 - 12:58 AM

For a really quick test, I've been slightly more successful with Notepad++; I think I've opened files exceeding 100 MB with it. In general, though, this is going to be problematic for most text editors.

#4 RedArgo   Members   -  Reputation: 102


Posted 09 May 2012 - 02:29 PM

I use Programmer's File Editor at work and frequently open much bigger files than that. It can take a while, but it can handle multi-gigabyte files.

#5 Bacterius   Crossbones+   -  Reputation: 8861


Posted 09 May 2012 - 03:14 PM

If you just want to copy some lines, you may be better off opening the text file in a hexadecimal editor; hex editors are fundamentally designed to handle large files efficiently.




#6 Oolala   Members   -  Reputation: 810


Posted 09 May 2012 - 10:57 PM

Perhaps this is a dumb question, but why on earth are you manually sifting through 600+ MB text files by hand? That's WAY too much for a human being to do anything useful with. If you know what information you want, the better route would be to write a short script that pulls out the part you need.

I routinely deal with huge text files, typically on the order of 100 MB to 1 GB, and tens of thousands of them at a time. There is nothing even remotely productive I can do with that information manually. I can, however, write a paragraph-long script that parses 100 GB in one pass and generates a graph of the feature I'm actually interested in. That's information I can, as a human being, actually do something with.
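
As a rough sketch of what such a script might look like in Python (the file name and search string are just placeholders):

    # Print every line containing a search string, without ever
    # loading the whole file into memory.
    def matching_lines(path, needle):
        # Iterating over the file object yields one line at a time,
        # so memory use stays flat regardless of file size.
        with open(path, "r", errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if needle in line:
                    yield lineno, line.rstrip("\n")

    for lineno, line in matching_lines("bigfile.txt", "ERROR"):
        print(lineno, line)

A 600 MB file takes seconds this way, and you get exactly the lines you wanted to copy.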

#7 rip-off   Moderators   -  Reputation: 8217


Posted 10 May 2012 - 04:16 AM

For future sanity, think about how you are generating this data and see if you can make it more manageable. If it is a log, perhaps it can be broken into a number of distinct sub-systems. You might want to rotate the log periodically too.
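
For example, if the log is written from Python, the standard library's logging.handlers.RotatingFileHandler handles the rotation for you (the size limit and backup count below are arbitrary):

    import logging
    from logging.handlers import RotatingFileHandler

    handler = RotatingFileHandler(
        "app.log",                   # hypothetical log file name
        maxBytes=50 * 1024 * 1024,   # start a new file at ~50 MB
        backupCount=10,              # keep at most 10 old files
    )
    logging.basicConfig(level=logging.INFO, handlers=[handler])

    logging.info("log lines now land in manageable ~50 MB files")

Most logging frameworks in other languages offer an equivalent, and the standalone logrotate tool on Unix systems can rotate files written by any program.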

#8 swiftcoder   Senior Moderators   -  Reputation: 9992


Posted 10 May 2012 - 07:13 AM

Load it up on a Linux box (or install some Unix tools on your Windows box). less and more have no trouble with multi-gigabyte files, nor does Vim.



#9 jnmacd   Members   -  Reputation: 197


Posted 10 May 2012 - 09:02 AM

Notepad++ has worked on 600 MB files for me.



