
This is a common struggle, and there is nothing like a test to answer it. Here are my assumptions:

1. A well-formatted ASCII file, containing two columns of numbers.
2. The method must scale to reading files that are too large to be contained in memory (although my patience is limited, so my test file is only 500,000 lines).
3. The actual operation (what the OP calls "do stuff with nums") must be performed one row at a time and cannot be vectorized.

With that in mind, the answers and comments seem to be encouraging efficiency in three areas, among them performing the string to number conversion more efficiently (either via batching, or using better functions).

More than half of the original time (68 -> 27 sec) was consumed by inefficiencies in the str2num call, which can be removed by switching to sscanf.

About another 2/3 of the remaining time (27 -> 8 sec) can be saved by using larger batches for both file reading and string to number conversions.

If we are willing to violate rule number three in the original post, another 7/8 of the time can be saved by switching to fully numeric processing. However, some algorithms do not lend themselves to this, so we leave it alone. (Note that the "check" value does not match for the last entry.)

Finally, in direct contradiction of a previous edit of mine within this response, no savings are available by switching to the available cached Java single line readers. In fact, that solution is 2 - 3 times slower than the comparable single line result using native readers.

Sample code for all of the solutions described above is included below.

MATLAB TEXTSCAN CODE

    fprintf(1,'Using sscanf, once per line.

    ScannedData = reshape(fscanf(fid, '%d, %d', bufferSize),2,[])';
    fprintf(1,'Using fscanf in large batches. %d check \n', t, CHECK);

    ScannedData = textscan(fid, '%d, %d \n', bufferSize);
    fprintf(1,'Using textscan in large batches.

    %% Reading in large batches into memory, incrementing to end-of-line, sscanf
    dataBatch = fread(fid,bufferSize,'uint8=>char')';
    dataIncrement = fread(fid,1,'uint8=>char');
    while ~isempty(dataIncrement) & (dataIncrement(end) ~= eol) & ~feof(fid)
        dataIncrement(end+1) = fread(fid,1,'uint8=>char');  %This can be slightly optimized
    ScannedData = reshape(sscanf(data,'%d, %d'),2,[])';
    Nums =
    fprintf(1,'Reading large batches into memory, then sscanf.

    %% Using Java single line readers + sscanf
    Reader = java.io.LineNumberReader(java.io.FileReader('demo_file.txt'),bufferSize );
    fprintf(1,'Using java single line file reader and sscanf on single lines.
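To make the batching idea concrete, here is a minimal, self-contained sketch that compares one sscanf call per line with fscanf in batches while still handling one row at a time. It is not the timing code from this answer: the temporary file name, the small line count, the batch size, and the running sum used as a stand-in for "do stuff with nums" are all assumptions chosen for illustration.

```matlab
% Sketch only: fileName, nLines, bufferSize and the running sums are
% illustrative assumptions, not values from the original test.
fileName = [tempname '.txt'];   % hypothetical temporary test file
nLines = 1000;                  % kept small so the sketch runs quickly

% Write a two-column test file in the "a, b" format used in this answer.
fid = fopen(fileName, 'w');
for ix = 1:nLines
    fprintf(fid, '%d, %d \n', ix, ix + 1);
end
fclose(fid);

% Approach 1: one fgetl call and one sscanf call per line.
sumPerLine = 0;
fid = fopen(fileName);
tline = fgetl(fid);
while ischar(tline)
    nums = sscanf(tline, '%d, %d');        % 2-by-1 vector for this row
    sumPerLine = sumPerLine + sum(nums);   % stand-in for the per-row work
    tline = fgetl(fid);
end
fclose(fid);

% Approach 2: fscanf reads many rows per call, rows are still handled singly.
bufferSize = 2 * 128;   % numbers per batch; even, so batches end on a full row
sumBatched = 0;
fid = fopen(fileName);
while true
    values = fscanf(fid, '%d, %d', bufferSize);
    if isempty(values)
        break;                              % nothing left to read
    end
    scannedData = reshape(values, 2, [])';  % one row per input line
    for ix = 1:size(scannedData, 1)
        nums = scannedData(ix, :);
        sumBatched = sumBatched + sum(nums);
    end
end
fclose(fid);
delete(fileName);
```

Both loops visit every row exactly once, so the two sums should agree. Only the number of file-read and string-conversion calls differs, which is where the batching savings described in this answer come from. Wrapping each approach in tic/toc would show the difference on a larger file.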
