egghelp/eggheads community

split ||

2003-08-13T01:48:44-04:00

O(8n) = O(2n) = O(n) (you ignore constants). That's why I said both are O(n).

Yes, this is true.. However, because tcl execution is quite slow in comparison to compiled languages, the coefficients become increasingly harder to simply discard.

Also, what does comparing regsub and string map have to do with anything? Ppslim's original proc didn't use regsub so I don't see where that comparison is going.. but even so, it's wrong, because regsub is not always O(n), it can be higher, like O(n^2) for certain operations, or lower, like O(1) for other operations.

I brought it up as a comparison for the short 'chop' procedure...
ie. regsub -all $by $str $re str in place of the string map...

So either (from timed results):
a) regsub does extra work to do the same thing as done by the string map (> O(n))
b) string map doesn't take an iterative-search approach to implementing its changes (< O(n))
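For anyone wanting to repeat that comparison, here is a rough sketch of the two variants side by side. The proc names via_map and via_regsub and the test string are made up for illustration. The regsub pattern is written out with the | characters escaped, since | is a regular-expression metacharacter. Timings depend on the Tcl version and the machine, so none are shown.

```tcl
# Hypothetical test input: 1000 short fields separated by "||".
set str [string repeat "ab||" 1000]
set re \0

# Variant 1: string map treats "||" as a literal substring.
proc via_map {str re} {
    split [string map [list "||" $re] $str] $re
}

# Variant 2: regsub treats its pattern as a regular expression,
# so the delimiter is escaped here.
proc via_regsub {str re} {
    regsub -all {\|\|} $str $re str
    split $str $re
}

puts "string map: [time {via_map $str $re} 1000]"
puts "regsub:     [time {via_regsub $str $re} 1000]"
```

Both procs should return the same list, so any difference in the timings comes from how the replacement step is carried out.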

Nitpick nitpicked nitpicked nitpick nitpick nitpicked

Statistics: Posted by strikelight — Wed Aug 13, 2003 1:48 am

split ||

2003-08-13T01:28:27-04:00

Just to nitpick, you're assuming that the operations in question have the same penalty time-wise, but that's wrong. If you think about it, "string map" and "split" both cycle through the entire string searching. Both procs are O(n).
I was referring to the 'string map' versus the proc initially proposed by ppslim, which uses a while loop, as well as many other functions, which is obviously going to require a larger O notation.

I was referring to those two procs too. Ppslim's does not require a bigger O notation, because "string map" is itself a function that uses a loop and many other functions. It is not a constant-time function. So the two procs are basically the same in terms of efficiency -- although ppslim's is slower overall because it's implemented in tcl instead of C. That doesn't change its O value.

And if you are referring to the proc which does use both split and string map and my calculation of big-O, you will see the word 'about' in my estimation (which is what O notation is).. It would be O(n+n) = O(2n) then.. and even then it's probably less, because when you think about it, if you were to use regsub in place of string map, you would find it takes longer in practical tests.. so assuming the regsub would be O(n), then string map < O(n) .. Nitpick nitpicked.

O(8n) = O(2n) = O(n) (you ignore constants). That's why I said both are O(n).

Just to clear this up: the purpose of big-O notation is to estimate the change in something like memory usage or running time relative to a change in input (n). In this case we're talking about string length. So if you double the string length, an O(n) algorithm will take double the time to finish. You can see that (2n) / (n) = 2, twice the time. If you have O(8n), you get (8 * 2n) / (8 * n) = 2 (same as O(n)). If you have an O(n^2) algorithm, you get ((2n)^2) / (n^2) = 4, which means it takes 4 times as long when you double the input.
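One way to see this in practice is to time the same proc on inputs of length n, 2n and 4n and look at how the time per iteration grows. This is only a sketch: make_input is a hypothetical helper, chop3 is assumed to be the short string map proc from this thread with "||" as the default delimiter, and no timings are shown because they vary by machine.

```tcl
# Hypothetical helper: builds a string of $n "ab||" fields.
proc make_input {n} {
    return [string repeat "ab||" $n]
}

# Assumed to match the short proc from this thread.
proc chop3 {str {by "||"} {re \0}} {
    split [string map [list $by $re] $str] $re
}

# For an O(n) proc, the time per iteration should roughly double
# each time the input length doubles.
foreach n {1000 2000 4000} {
    set s [make_input $n]
    puts "n=$n: [time {chop3 $s} 100]"
}
```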

Also, what does comparing regsub and string map have to do with anything? Ppslim's original proc didn't use regsub so I don't see where that comparison is going.. but even so, it's wrong, because regsub is not always O(n), it can be higher, like O(n^2) for certain operations, or lower, like O(1) for other operations.

nitpicked nitpick nitpick nitpicked :)

Statistics: Posted by stdragon — Wed Aug 13, 2003 1:28 am

Re: Slight bug

2003-08-12T22:53:09-04:00

Hence the \x81 further on..

I still don't get it.

O-notation is largely used in computer science... To call it senseless is pure ignorance. I suggest researching "O Notation" on google.

I didn't call O-notation senseless. What I meant is that it's very inaccurate when used on the uncompiled tcl code.

Statistics: Posted by user — Tue Aug 12, 2003 10:53 pm

Re: Slight bug

2003-08-12T22:43:19-04:00

It most definitely is better for ANY text...
Not if the text can contain any char. Then it's useless.

Hence the \x81 further on..

the previous implementation would render approximately O(8n) instructions whereas the second one only renders about O(3) instructions ... so if the text was 128 chars long, the first one would be issuing about 384 instructions (worst case scenario) to the processor, as opposed to the mere 3 instructions sent out by the shorter proc.
By "instructions" I assume you mean command invocations, and counting them, like stdragon said, makes little sense.

O-notation is largely used in computer science... To call it senseless is pure ignorance. I suggest researching "O Notation" on google.

Statistics: Posted by strikelight — Tue Aug 12, 2003 10:43 pm

Re: Slight bug

2003-08-12T22:34:16-04:00

It most definitely is better for ANY text...

Not if the text can contain any char. Then it's useless.

the previous implementation would render approximately O(8n) instructions whereas the second one only renders about O(3) instructions ... so if the text was 128 chars long, the first one would be issuing about 384 instructions (worst case scenario) to the processor, as opposed to the mere 3 instructions sent out by the shorter proc.

By "instructions" I assume you mean command invocations, and counting them, like stdragon said, makes little sense.

Why think when we've got "time"? I named the three procs from this thread in the order they were posted and timed them:

Code:

set a "ab||cde||fghi||jklmn||opqrst||uvwxyz0||12345678||"
foreach cmd {chop1 chop2 chop3} {
  puts "$cmd: [time [list $cmd $a ||] 10000]"
}

Result:

chop1: 377 microseconds per iteration
chop2: 118 microseconds per iteration
chop3: 36 microseconds per iteration

Although I would have used \x81 instead of \0 myself

WHY?
\x81 can be sent via irc, \0 can't. (unless it's encoded in a ctcp iirc) That's my reason for using \0.
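A small sketch of why the choice of placeholder matters, assuming a hypothetical input that already contains \x81 as ordinary data. The chop proc here takes the placeholder as an explicit argument for illustration; it otherwise follows the short proc from this thread.

```tcl
# Same two-step approach as the short chop proc in this thread,
# with the placeholder passed explicitly.
proc chop {str by re} {
    split [string map [list $by $re] $str] $re
}

# Hypothetical input: the first field contains \x81 as data.
set ph \x81
set s "x${ph}y||z"

# With \x81 as placeholder, the data character is mistaken for a separator.
puts [llength [chop $s "||" \x81]]  ;# 3

# With \0 as placeholder, the two fields survive intact.
puts [llength [chop $s "||" \0]]    ;# 2
```

Whichever character is used, the approach only works if that character cannot occur in the text being split.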

Statistics: Posted by user — Tue Aug 12, 2003 10:34 pm

split ||

2003-08-12T22:09:21-04:00

Just to nitpick, you're assuming that the operations in question have the same penalty time-wise, but that's wrong. If you think about it, "string map" and "split" both cycle through the entire string searching. Both procs are O(n).

I was referring to the 'string map' versus the proc initially proposed by ppslim, which uses a while loop, as well as many other functions, which is obviously going to require a larger O notation.
And if you are referring to the proc which does use both split and string map and my calculation of big-O, you will see the word 'about' in my estimation (which is what O notation is).. It would be O(n+n) = O(2n) then.. and even then it's probably less, because when you think about it, if you were to use regsub in place of string map, you would find it takes longer in practical tests.. so assuming the regsub would be O(n), then string map < O(n) .. Nitpick nitpicked.

Statistics: Posted by strikelight — Tue Aug 12, 2003 10:09 pm

split ||

2003-08-12T21:18:39-04:00

Just to nitpick, you're assuming that the operations in question have the same penalty time-wise, but that's wrong. If you think about it, "string map" and "split" both cycle through the entire string searching. Both procs are O(n).

Statistics: Posted by stdragon — Tue Aug 12, 2003 9:18 pm

Re: Slight bug

2003-08-12T14:19:12-04:00

This check:
if {[string length [string range $in $chunks end]]} {
will lead to invalid results if the last chars of the string are the chars you're "splitting" by. (should result in an empty element at the end)

Here's a rewrite of ppslim's proc that should produce results more like the original split:
Code:
proc chop {str {by " "}} {
  set l [string length $by]
  set i 0
  set j 0
  while {[set j [string first $by $str $i]]>-1} {
    lappend out [string range $str $i [expr {$j-1}]]
    set i [expr {$j+$l}]
  }
  if {$i<=[string len $str]} {
    lappend out [string range $str $i end]
  }
  set out
}
EDIT: I still think
Code:
proc chop {str {by "  "} {re \0}} {
  split [string map [list $by $re] $str] $re
}
is better (at least for text received from irc)

It most definitely is better for ANY text... not only because of code size, but also cpu time wise.. the previous implementation would render approximately O(8n) instructions whereas the second one only renders about O(3) instructions ... so if the text was 128 chars long, the first one would be issuing about 384 instructions (worst case scenario) to the processor, as opposed to the mere 3 instructions sent out by the shorter proc. Although I would have used \x81 instead of \0 myself

Statistics: Posted by strikelight — Tue Aug 12, 2003 2:19 pm

Slight bug

2003-08-12T09:29:08-04:00

This check:

if {[string length [string range $in $chunks end]]} {

will lead to invalid results if the last chars of the string are the chars you're "splitting" by. (should result in an empty element at the end)

Here's a rewrite of ppslim's proc that should produce results more like the original split:

Code:

proc chop {str {by " "}} {
  set l [string length $by]
  set i 0
  set j 0
  while {[set j [string first $by $str $i]]>-1} {
    lappend out [string range $str $i [expr {$j-1}]]
    set i [expr {$j+$l}]
  }
  if {$i<=[string len $str]} {
    lappend out [string range $str $i end]
  }
  set out
}

EDIT: I still think

Code:

proc chop {str {by "  "} {re \0}} {
  split [string map [list $by $re] $str] $re
}

is better (at least for text received from irc)
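To make the trailing-delimiter difference concrete, here is a short check that puts ppslim's chunk and the chop rewrite from this post next to each other. The test string "a||b||" is made up for illustration; the built-in split is included as the reference behavior.

```tcl
# ppslim's proc as posted in this thread.
proc chunk {in chars} {
    if {[string first $chars $in] < 0} { return [list $in] }
    set temp [list]
    set chunks 0
    while {[set chunke [string first $chars $in $chunks]] != -1} {
        lappend temp [string range $in $chunks [expr {$chunke - 1}]]
        set chunks [expr {$chunke + [string length $chars]}]
    }
    # This check skips the final element when it is empty.
    if {[string length [string range $in $chunks end]]} {
        lappend temp [string range $in $chunks end]
    }
    return $temp
}

# The rewrite from this post.
proc chop {str {by " "}} {
    set l [string length $by]
    set i 0
    while {[set j [string first $by $str $i]] > -1} {
        lappend out [string range $str $i [expr {$j - 1}]]
        set i [expr {$j + $l}]
    }
    if {$i <= [string length $str]} {
        lappend out [string range $str $i end]
    }
    set out
}

set s "a||b||"
puts [llength [chunk $s "||"]]     ;# 2: trailing empty element is lost
puts [llength [chop $s "||"]]      ;# 3: a, b and an empty element
puts [llength [split "a|b|" "|"]]  ;# 3: what the built-in split does
```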

Statistics: Posted by user — Tue Aug 12, 2003 9:29 am

split ||

2003-08-12T07:54:50-04:00

Code:

proc chunk {in chars} {
  if {[string first $chars $in] < 0} { return [list $in] }
  set temp [list]
  set chunks 0
  set chunke 0
  while {[set chunke [string first $chars $in $chunks]] != "-1"} {
    lappend temp [string range $in $chunks [expr {$chunke - 1}]]
    set chunks [expr {$chunke + [string length $chars]}]
  }
  if {[string length [string range $in $chunks end]]} {
    lappend temp [string range $in $chunks end]
  }
  return $temp
}

Similar to split; however, it splits on whole chunks like you asked.

% set a "123,@.456,@.789,@.abc,@.def,>.ghi,@.jklm"
123,@.456,@.789,@.abc,@.def,>.ghi,@.jklm

% chunk $a ",@."
123 456 789 abc def,>.ghi jklm

% chunk $a ",>."
123,@.456,@.789,@.abc,@.def ghi,@.jklm

Statistics: Posted by ppslim — Tue Aug 12, 2003 7:54 am

split ||

2003-08-11T09:28:15-04:00

Check the manual.
'split' splits on ALL the chars specified in the second argument (if any).

Code:

regexp -all -inline {[^|]+} $yourString

would return a list like you want, but doesn't care how many |'s there are between the "elements".
Another solution is replacing || with some single char not used anywhere else in your content (using 'string map' or 'regsub') and then split by that char.
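A minimal sketch of the behavior and of both workarounds mentioned above. It assumes the content never contains a NUL character; that placeholder is a choice made for this example.

```tcl
set test "a||b||c||d"

# split treats its second argument as a set of characters, so "||"
# means the same as "|" and every single | is a boundary.
puts [llength [split $test "||"]]   ;# 7 (a, "", b, "", c, "", d)

# Workaround 1: collect the runs of non-| characters.
# Empty fields are dropped, and a single | would also separate fields.
set viaRegexp [regexp -all -inline {[^|]+} $test]
puts $viaRegexp                     ;# a b c d

# Workaround 2: map "||" to a placeholder character, then split on it.
# Assumption: \0 never appears in the content.
set viaMap [split [string map [list "||" \0] $test] \0]
puts $viaMap                        ;# a b c d
```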

Statistics: Posted by user — Mon Aug 11, 2003 9:28 am

split ||

2003-08-11T08:58:20-04:00

hi
does anyone have a solution for this:

i've got a string "a||b||c||d" and i want to split it into a, b, c and d. now my problem is that tcl won't split by "||". it just splits by "|" and gives me too many parts.

Code:

set test "a||b||c||d"
set length [llength [split $test "||"]]

length is now "7".

Code:

set test "a|b|c|d"
set length [llength [split $test "||"]]

length is now "4".

i've tried everything i could think of (split "\\||", split {||}, split "\|\|"...). none of them worked. can you help me?

Statistics: Posted by arcane — Mon Aug 11, 2003 8:58 am