Channel: How to strip headers/footers from Project Gutenberg texts? - Stack Overflow
Browsing latest articles

Answer by Marco for How to strip headers/footers from Project Gutenberg texts?

I am also trying to figure out a way to clean Project Gutenberg text files for text analysis purposes, but I use Julia and I am probably just trying to reinvent the wheel. So I wonder if it is...


Answer by wordsforthewise for How to strip headers/footers from Project...

The gutenbergr package in R seems to do an OK job of removing headers, including junk after the 'official' end of the header. First you'll need to install R/RStudio,...

Answer by hippietrail for How to strip headers/footers from Project Gutenberg...

I've also wanted a tool to strip Project Gutenberg headers and footers for years, for playing with natural language processing without contaminating the analysis with boilerplate mixed in with the etext....

Answer by Beta for How to strip headers/footers from Project Gutenberg texts?

You weren't kidding. It's almost as if they were trying to make the job AI-complete. I can think of only two approaches, neither of them perfect. 1) Set up a script in, say, Perl, to tackle the most...

How to strip headers/footers from Project Gutenberg texts?

I've tried various methods to strip the license from Project Gutenberg texts, for use as a corpus for a language learning project, but I can't seem to come up with an unsupervised, reliable approach....
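
To make the problem concrete, here is a minimal sketch of the simplest marker-based approach, in Python. It assumes the file contains lines of the form `*** START OF THE/THIS PROJECT GUTENBERG EBOOK ...` and `*** END OF THE/THIS PROJECT GUTENBERG EBOOK ...`. The regular expressions and the function name `strip_gutenberg_boilerplate` are illustrative assumptions, not taken from any of the answers, and files that use other marker wordings will pass through unchanged, which is exactly why a reliable unsupervised approach is hard.

```python
import re

# Assumed marker patterns. Real Project Gutenberg files vary, and some
# files may match neither pattern.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the start and end marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(text)
    body_start = start.end() if start else 0

    # Only look for the end marker after the start marker.
    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)

    return text[body_start:body_end].strip()
```

A caller would read a downloaded file into a string and pass it to the function. Anything the markers do not cover, such as transcriber notes placed after the start marker, stays in the output and needs separate handling.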
