Russell Bird Posted July 26, 2019

Hi, I'm hoping to use an external tab-delimited TXT file to store a master price list for use with our CAD drawings. The file contains over 12,000 item codes, which means I can't load it into a worksheet in VectorWorks, as the row limit on those is 4094. I've proven I can open and read a line from the file, but then I wanted to test how quickly VectorWorks could scan through the rows, so I simply put in a while loop with an incremental counter variable. At the end of the file (tested via EOF), the loop should exit and report the number of lines read (loops completed, in effect) by displaying the counter variable in a dialog (using AlrtDialog). I set this running and waited (a long time), but the dialog never appeared and the interface was locked out, so I assume the script was still running. Two possibilities: 1) the loop has a fault causing it to loop indefinitely, or 2) VectorWorks is really slow at reading lines from a file. I don't think it's option 1, as I've checked the loop code and am pretty sure it's fine. Could anyone tell me whether there are limits to the number of lines or file size that can be read with VectorScript, and/or whether VectorScript is just really slow (in which case pretty useless) at reading files? Thanks, Russell
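For reference, the counting test described above is only a few lines in Python (the language the replies below recommend). A minimal sketch, assuming a plain text file; the path is a placeholder:

```python
# Sketch of the line-counting test described above, in Python rather than
# VectorScript. Iterating over a file object stops automatically at EOF,
# so no explicit end-of-file test is needed.

def count_lines(path):
    count = 0
    with open(path, mode="r", encoding="utf-8") as f:
        for _ in f:          # reads one line per iteration, stops at EOF
            count += 1
    return count
```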
JBenghiat Posted July 26, 2019

I strongly recommend switching to Python for file access. File reading and data parsing are native to the language, while the VS routines are proprietary and less flexible. (Not to mention, numerous Python examples and tutorials exist on the web.) https://www.guru99.com/reading-and-writing-files-in-python.html You should be able to read the file and load the data into arrays or objects with a couple of lines.
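As a sketch of the "couple of lines" idea: loading a tab-delimited price list into a dict keyed by item code. The file name and the two-column layout (code, then price) are assumptions for illustration; adjust the split/indexing to the real columns:

```python
# Hedged sketch: load a tab-delimited price list into a dict keyed by item
# code. Assumes at least two columns per row: item code, then price.

def load_price_list(path):
    prices = {}
    with open(path, mode="r", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:          # skip blank or malformed lines
                prices[fields[0]] = fields[1]
    return prices
```

Lookup is then a direct dict access, e.g. prices["A100"], with no row-by-row scan.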
Peter Vandewalle Posted July 26, 2019

I agree on the Python idea. And I would suggest using an XML file instead of a TXT file. XML data can be accessed by address instead of by looping.
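The access-by-address idea can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names ("item", "code", "price") are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hedged sketch: look up one item directly in an XML price list instead of
# looping over every row. Element/attribute names are hypothetical.
XML_DATA = """<pricelist>
  <item code="A100"><price>12.50</price></item>
  <item code="B200"><price>3.99</price></item>
</pricelist>"""

root = ET.fromstring(XML_DATA)
# ElementTree supports a limited XPath syntax for addressing nodes:
node = root.find(".//item[@code='B200']/price")
print(node.text)  # prints 3.99
```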
LarryO Posted July 27, 2019

It sounds like you are in an infinite loop. The fix could be as simple as checking for NULL data from your read-line request to determine EOF. I don't believe files embed a readable ASCII 4 (EOT) character, because EOF is a control state by definition.
MullinRJ Posted July 29, 2019

Russell, I just finished a script in Python today that imports a 220 MB text file into a LIST (5,321,708 lines) - one text line per list element. I was able to do some serious parsing of the data (extracting names, searching for complex patterns, combining lines, and marking most for deletion), then write it out and back in to repeat the process a dozen times, all in under 35 seconds. This is not possible in VectorScript (VS), as the largest array in VS is limited to 32K elements. Your 12K item codes would fit into a VS array, but you'll have to code most of the routines you need yourself. Chances are, what you want to do is already available in Python as a set of canned routines. For many things, VS and Python are nearly equal in speed. But if you're going to do a lot of string manipulation, Python has way more text-handling features than Pascal. It's worth the effort to explore the language. Also, there is a seemingly unending wealth of help sites and fora on the web for Python. Let Google lead you to the answers you seek. My 2¢, Raymond
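The one-line-per-list-element import step described above is essentially this in Python (path is a placeholder; this does load the whole file into memory at once, which is fine for files of this size on a modern machine):

```python
# Minimal sketch: load a text file into a list, one line per element,
# with trailing newlines stripped by splitlines().

def read_into_list(path):
    with open(path, mode="r", encoding="utf-8") as f:
        return f.read().splitlines()
```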
Nicolas Goutte Posted July 29, 2019

Indeed, that is what I would recommend too: use Python. VS was not meant for handling large files. You could even use the csv module if your data is organized in columns (be it comma-separated, tab-separated, or any similar tabular format that the csv module supports).
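A minimal sketch of the csv-module approach for a tab-delimited file; the column layout is an assumption:

```python
import csv

# Hedged sketch: parse a tab-delimited file with the standard csv module.
# delimiter="\t" switches the reader from commas to tabs; newline="" is the
# recommended open() mode when using csv.

def read_rows(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t"))
```

Each row comes back as a list of column strings, so quoted fields and embedded delimiters are handled for you.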
warrenfelsh Posted May 6, 2020

If you are working with big data, readlines() is not very efficient, as it can result in a MemoryError: that function loads the entire file into memory and then iterates over it. A slightly better approach for large files is to use the fileinput module, as follows:

import fileinput

for line in fileinput.input(['sample.txt']):
    print(line)

The fileinput.input() call reads lines sequentially and doesn't keep them in memory after they've been read. You can do this even more simply, since a file object in Python is itself iterable.
Nicolas Goutte Posted May 25, 2020

On 5/6/2020 at 7:31 AM, warrenfelsh said:

If you are working with big data, readlines() is not very efficient, as it can result in a MemoryError [...] A slightly better approach for large files is to use the fileinput module.

The Python 3.5 documentation does not recommend using the fileinput module for just one file: https://docs.python.org/3.5/library/fileinput.html Instead, https://docs.python.org/3.5/library/io.html (the paragraph on readlines) recommends using:

for line in file:

So the code sample above would then read something like:

for line in open('sample.txt', mode='r'):
    print(line)