PEAK XOOPS - Let's make joints (2) in englishin japanese

Archive | RSS |
XOOPS
XOOPS : Let's make joints (2)
Poster : GIJOE on 2007-06-11 05:19:05 (8997 reads)

in englishin japanese
I'm itroducing the second sample of "parse joint".
This joint named D3pipesParseLinkhtml can fetch "publish time" and "link of the article" on the other hand, D3pipesParseSimplehtml can fetch only the "heading".

D3pipesParseLinkhtml has been already included the latest archive of d3pipes.

With the sample joint of D3pipesParseLinkhtml, you can make a pipe like this.
(Just a sample)


 0 Fetching from outside  snoopy    (URI of the page)
10 Transfer to UTF-8     (asyoulike)  The encoding of the page
20 Parsing XML           linkhtml   #([0-9/]{10}).*href=\"([^"]+)\"\>(.*)\</a\>#iU
30 Transfer from UTF-8   (asyoulike)  Internal encoding of your site
40 Clipping into local    moduledb   86400


It is defficult to describe the option as a regex pattern for linkhtml.

You have to know the meaning of each ().

1st () corresponds datetime (published)
2nd () corresponds link URI
3rd () corresponds the heading

Or

1st () corresponds link URI
2nd () corresponds the heading
3rd () corresponds datetime (published)

You have to make a regex pattern fitting the site you want to get information.


joints/parse/D3pipesParseLinkhtml.class.php

<?php

require_once dirname(dirname(__FILE__)).'/D3pipesParseAbstract.class.php' ;

class D3pipesParseLinkhtml extends D3pipesParseAbstract {

	function execute( $html_source , $max_entries = '' )
	{
		$items = array() ;

		$result = preg_match_all( $this->option , $html_source , $matches , PREG_SET_ORDER ) ;
		if( ! $result ) {
			$this->errors[] = 'Invalid pattern for this Parser' ;
		}
		foreach( $matches as $match ) {
			if( preg_match( '#[0-9]{2,4}[/.-][0-9]{1,2}[/.-][0-9]{1,2}#' , $match[1] , $regs ) ) {
				$pubtime = strtotime( $regs[0] ) ;
				$link = $match[2] ;
				$headline = $match[3] ;
			} else if( preg_match( '#[0-9]{2,4}[/.-][0-9]{1,2}[/.-][0-9]{1,2}#' , $match[3] , $regs ) ) {
				$pubtime = strtotime( $regs[0] ) ;
				$link = $match[1] ;
				$headline = $match[2] ;
			} else {
				$pubtime = time() ;
				$link = $match[1] ;
				$headline = $match[2] ;
			}

			$items[] = array(
				'headline' => $headline ,
				'pubtime' => $pubtime ,
				'link' => $link ,
				'fingerprint' => $link ,
			) ;
		}

		return $items ;
	}

}

?>

0 comments

Related articles
Printer friendly page Send this story to a friend

Comments list

Login
Username or e-mail:

Password:

Remember Me

Lost Password?

Register now!