theweaselking | Geek Pop Quiz: Perl, SQL, and HTML edition!

So, I've got Bugzilla. Bugzilla takes commits from CVS and adds them as comments to bugs. It then, when displaying bugs, parses out those comments and causes them to link to ViewCVS, with diffs when applicable, in an HTML table.

That table should have:
PATH/FILENAME LAST_VERSION THIS_VERSION DIFF

Each of those should be a link to the ViewCVS.cgi file, with the appropriate bits stacked on the end to get you a view of the file as a whole, the version you replaced, the version you committed, and the diff between the two. It handles new files and deleted files just fine.

The problem is that when the path/filename is very long, it causes the parser to produce garbage.

Given this text: \ncompany/docs/ProductDocs/releaseNote.html 1.9 1.10\n
(all one line, pulled right out of a MediumText entry in a MySQL table)

the code correctly parses it into

company/docs/ProductDocs/releaseNote.html 1.9 1.10 View diff

However, given *this* text: \ncompany/runtime/file_struct_templates/company/profiles/TI/languages/c/framework/target/oe_header.txt 1.21 1.22\n
(all one line, pulled right out of a MediumText entry in a MySQL table)

the code produces garbage like this:

View diff

company/runtime/templates/company/output/port/c/zceComponentContainerImplPortsInclude.xslt View diff

1.11 1.12 View diff

The code of the function itself that does the parsing is here. And no, I didn't write it. If I'd written it, there would be more comments.

You can use View Source to see the tables themselves, if you want.

But, here's the question. Why does it garble the parsing when the path/filename is long? And how do I fix that script to make it parse the long ones correctly?

EDIT: There should be newlines before and after the text I'm passing to the function. Sorry.
EDIT#2: Corrected the actual text that's sent to the function.
EDIT#3: Apparently the problem with this code is that there is no problem with this code - it works perfectly for the given inputs, but the inputs provided on the long pathnames are full of extra \ns for no apparent reason. Lovely.

vagabond27.livejournal.com

Why does it garble the parsing when the path/filename is long?
The stars are right.
Ia! Ia! Cthulhu fhtagn!

jerril

OK, I'm amazed that the short version works.

Because the first thing the Perl script does is split on \n (newlines).

Which puts the string into an array, with each line of text as a new element in the array... (so only one line).

And then it "shifts" the array, meaning it pops the top element off the array leaving nothing inside it.

Are you sure there aren't any leading newline characters? That could even include a bare Unix or Mac line ending (just 0x0A or 0x0D, whichever isn't native to your environment), depending on how your server is set up.

Because I can't get the short version to work in the code as presented.
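The split-then-shift behaviour described above can be sketched like this. The actual script is Perl and isn't reproduced in this thread, so this is a Python stand-in under assumptions: `perl_style_lines` and `lines_after_shift` are hypothetical helper names, and the trailing-empty-field trimming imitates what Perl's `split` does by default.

```python
def perl_style_lines(text):
    # Split on newlines, then drop trailing empty fields,
    # which is what Perl's split does when no limit is given.
    lines = text.split("\n")
    while lines and lines[-1] == "":
        lines.pop()
    return lines


def lines_after_shift(text):
    lines = perl_style_lines(text)
    if lines:
        lines.pop(0)  # stand-in for Perl's shift: discard the first element
    return lines


bare = "company/docs/ProductDocs/releaseNote.html 1.9 1.10"
wrapped = "\n" + bare + "\n"

print(lines_after_shift(bare))     # the only line was shifted away
print(lines_after_shift(wrapped))  # the empty leading field was shifted away instead
```

Without a leading newline the one real line is the element that gets discarded; with a leading newline, the discarded element is the empty string in front of it.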

OK, if you wrap the strings in \n's, they work just fine.

so "\nSubject: CVS Checkin BRANCH: HEAD FILES CHECKED IN: company/runtime/file_struct_templates/company/profiles/TI/languages/c/framework/target/oe_header.txt 1.21 1.22\n" works.

Subject: CVS Checkin BRANCH: HEAD FILES CHECKED IN: company/runtime/file_struct_templates/company/profiles/TI/languages/c/framework/target/oe_header.txt

1.21

1.22

View diff

Never mind, that's not quite right. But better.

OK, question. Are you ACTUALLY getting

"Subject: CVS Checkin BRANCH: HEAD FILES CHECKED IN:\ncompany/docs/ProductDocs/releaseNote.html 1.9 1.10\n"

?

Because that works.

theweaselking.livejournal.com

I don't know. I shouldn't be.
But I'm not sure.
It's possible it's only passing the "\ncompany/docs/ProductDocs/releaseNote.html 1.9 1.10\n" part.

mhoye.livejournal.com

Can you rebuild your examples? They're clearly not quite accurate.

Rebuilt, but apparently the problem is not this code, it's the data that's being passed. Extra newlines are being added if the line is too long.

Basically, what's happening is that some function is dropping newlines around the relative_path if it's over some magic number. I suspect the magic number is 80-ish but I can't tell from here.

Once it's wrapped in newlines, what the subroutine sees is basically:
A blank line (thus the first blank line in the output)
A line with nothing but the relative path (which is parsed as being the new version number, due to a quirk in the code)
A line with nothing but the old number and the new number (which are parsed as the relative path of a new entry, and the old number of that entry, as fallout from the same code quirk)
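To make the three lines above concrete, here is a rough Python sketch of what the subroutine would receive. The exact wrapped string is an assumption (an extra newline before the path, and a newline in place of the space after it), and `lines_seen` is a hypothetical helper imitating the split-and-shift step, not the real Perl code.

```python
def lines_seen(text):
    # Imitate split-on-newline (trailing empties dropped) followed by shift.
    lines = text.split("\n")
    while lines and lines[-1] == "":
        lines.pop()
    return lines[1:]


path = ("company/runtime/file_struct_templates/company/profiles/TI/"
        "languages/c/framework/target/oe_header.txt")

# Assumed shape of the input once the extra newlines have been added.
wrapped = "\n\n" + path + "\n1.21 1.22\n"

for line in lines_seen(wrapped):
    # Count the whitespace-separated fields on each line.
    print(len(line.split()), repr(line))
```

None of the three lines carries the three fields (path, old version, new version) that a line-at-a-time parser expects, which fits the garbled rows in the table.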

skington.livejournal.com

What I want to know is: why did the original author decide that the best way to handle arguments to a function was to reverse them, then to split on spaces, then to reverse each resulting component in turn?

Such things are common in specific-use patches to open-source projects.

It actually makes sense!

So.

Items come in groups of three.
The first one will ALWAYS be "company/something"
The second one will ALWAYS be a number, or the phrase "NONE"
The third one will ALWAYS be a number, or the phrase "NONE"

That should be REALLY easy to parse on, and then I could just discard all newlines in text.

And I don't really want to filter based on "company/" because I'm trying to convince them to split the damn module, which would mean that that first word might change up some. And the word appears IN the rest of the path, sometimes. But it will ALWAYS be [thing]/[more things that might or might not have slashies].

Of course, those "more things" might include spaces and might include numbers.

So that makes it harder to parse based on numbers.

Perhaps that's why he reversed it? Reverse, parse for number/NONE, parse for number/NONE, parse for filename now that you're sure the version numbers aren't in it? But then you have to know when to stop - and, when it's reversed, I suppose you could always stop at "[space or newline]company/" - since that's an extremely odd combination to have in the middle of a path. Not totally impossible, though.

Grr!

I wonder if I can just find the bit that's "parsing" the long path names and make it stop sticking newlines in.
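The groups-of-three idea could look something like the sketch below, written in Python rather than Perl. It rests on assumptions not confirmed in the thread: version numbers are dotted like `1.21`, the only other value is the literal `NONE`, and a path never contains two version-like tokens in a row. `parse_entries`, `VERSION` and `ENTRY` are hypothetical names.

```python
import re

# A version is either the word NONE or a dotted number such as 1.21 (assumption).
VERSION = r"(?:NONE|\d+(?:\.\d+)+)"

# Path (may contain spaces), then two versions, ending at whitespace or end of text.
ENTRY = re.compile(r"(\S.*?)\s+(" + VERSION + r")\s+(" + VERSION + r")(?=\s|$)")


def parse_entries(text):
    # Discard all newlines and runs of whitespace, so stray line breaks
    # inside an entry no longer matter.
    flat = " ".join(text.split())
    return [m.groups() for m in ENTRY.finditer(flat)]


path = ("company/runtime/file_struct_templates/company/profiles/TI/"
        "languages/c/framework/target/oe_header.txt")

print(parse_entries("\n\n" + path + "\n1.21 1.22\n"))
print(parse_entries("a/b c.txt 1.9 1.10\ncompany/x NONE 1.1\n"))
```

This doesn't key on "company/" at all, so it would survive a module split, but it does go wrong if a path ever contains something like " 1.2 1.3 " in the middle.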

What he's doing is protecting himself from paths that have spaces in them.

He splits from the end, telling Perl he wants a maximum of three tokens when he's done (two splits). So even if Perl finds more spaces, it won't split on them. This means if your pathname contains a space (like Windows pathnames are wont to do) it won't oversplit and give you a path like:

company\my
documents\my
projects\somewebspace\some
more paths\Frank!
10.2
10.3
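Here is a small Python illustration of the same split-from-the-end idea, since the original Perl isn't shown in this thread. `split_from_end` and `split_via_reversal` are hypothetical names; the second one imitates the reverse, split with a limit of three, reverse-each-piece sequence discussed above.

```python
def split_from_end(line):
    # At most two splits, taken from the right, so at most three tokens.
    return line.rsplit(" ", 2)


def split_via_reversal(line):
    # Reverse the line, split from the left into at most three pieces,
    # then reverse each piece and restore the original order.
    parts = line[::-1].split(" ", 2)
    return [p[::-1] for p in reversed(parts)]


line = r"company\my documents\my projects\somewebspace\some more paths\Frank! 10.2 10.3"

print(line.split(" "))           # naive split: the path is chopped at every space
print(split_from_end(line))      # path stays whole, followed by the two versions
print(split_via_reversal(line))  # same result via the reversal trick
```

In a language where split only takes a limit from the left, reversing first is one way to get "the last two fields, and everything else as one piece".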

I have seen the future.

And it doesn't make sense.
