Scrapy is a screen scraping framework. Web scrapers and web extractors are similar to screen scrapers.
What can a web scraper do? It extracts the information you want from a website of interest and saves it to a file, which you can then transfer or analyze however you like. For more detail, check its website: Scrapy
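To make "extract the desired information and save it" concrete, here is a minimal sketch that uses only the Python standard library. It is not Scrapy's API; the names LinkExtractor, extract_links and to_tsv, and the sample HTML, are made up for illustration.

```python
# Illustration only: standard library, not Scrapy.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.6/2.7


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag in a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def to_tsv(links):
    """Format the links as tab-separated lines, ready to be written to a file."""
    return "\n".join("%s\t%s" % (href, text) for href, text in links)


sample = '<p>See <a href="http://example.com/">Example</a> and <a href="/docs">the docs</a>.</p>'
print(to_tsv(extract_links(sample)))
```

A framework such as Scrapy adds the parts this sketch leaves out, such as downloading pages and following links.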
Notes on installing Scrapy on Windows:
A1# Requirements and preparation
Step 1:
Install Python (Python 2.6 or 2.7)
(Python is a programming language)
Windows x86 MSI Installer (2.7.3): download Python 2.7.3 (the current version at the time of writing)
After downloading, double-click python-2.7.3.msi to install Python on Windows XP, and follow the installer until it finishes.
Step 2:
Add the Python folders to the Windows system PATH
Open System in the Control Panel => click Environment Variables => select Path and click Edit => append C:\python27\Scripts and C:\python27 (or wherever your Python folder is) to the value (;C:\python27\Scripts;C:\python27)
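To confirm that the PATH change took effect, open a new command prompt and run a short check like the one below. This is only a sketch: missing_from_path is a made-up helper name, and the two folders assume the default C:\python27 install location used above.

```python
import os


def missing_from_path(required_dirs, path_value=None, sep=os.pathsep):
    """Return the directories from required_dirs that are not listed in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(p):
        # Windows paths are case-insensitive; also ignore a trailing slash.
        return p.strip().rstrip("\\/").lower()

    entries = set(normalize(p) for p in path_value.split(sep) if p.strip())
    return [d for d in required_dirs if normalize(d) not in entries]


if __name__ == "__main__":
    # Assumed install location; change it if Python lives somewhere else.
    required = [r"C:\python27", r"C:\python27\Scripts"]
    missing = missing_from_path(required)
    if missing:
        print("Not on PATH: %s" % ", ".join(missing))
    else:
        print("Both Python folders are on PATH.")
```

An empty list means both folders were found. The same check works for c:\openssl-win32\bin.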
Step 3:
Install OpenSSL
Download and install OpenSSL (the regular version) and the Visual C++ 2008 Redistributables (download link) from the Win32 OpenSSL page (Win32 OpenSSL v1.0.1e)
Add c:\openssl-win32\bin to the PATH, the same way as in Step 2
Step 4:
Download pip or easy_install for installing Scrapy
(e.g. easy_install here)
(setuptools-0.6c11.win32-py2.7.exe)
A2# Install Scrapy
To install, run: easy_install scrapy
For more details, see the install section in the documentation: http://doc.scrapy.org/en/latest/intro/install.html
Setup error: "unable to find vcvarsall.bat" while setting up Twisted 12.3.0
Fix: download the Twisted win32 installer and install it directly, then reinstall Scrapy; this succeeded (Twisted-12.3.0.win32-py2.7.exe) (another link: http://twistedmatrix.com/trac/wiki/Downloads)
Or install Visual C++ 2008 Express Edition (I have not tried this yet)
Next step: Scrapy tutorial
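After the install finishes, a quick import check can show whether Scrapy and Twisted are actually usable. The sketch below is not part of Scrapy; check_modules is a made-up helper name. It assumes the usual import names scrapy, twisted and OpenSSL, the last of which is the pyOpenSSL binding and is only present if that package was installed.

```python
import importlib


def check_modules(names):
    """Try to import each module and report its version or the import error."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            report[name] = "MISSING (%s)" % exc
        else:
            # Not every package exposes __version__, so fall back to a note.
            report[name] = getattr(module, "__version__", "installed, version unknown")
    return report


if __name__ == "__main__":
    # Assumed import names; adjust the list to what you installed.
    results = check_modules(["scrapy", "twisted", "OpenSSL"])
    for name in sorted(results):
        print("%-8s %s" % (name, results[name]))
```

A line starting with MISSING for twisted would point back to the vcvarsall.bat problem described above.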
Saturday, March 30, 2013
Friday, March 29, 2013
What's DOM?
DOM stands for Document Object Model. It is a standard, language-neutral interface recommended by the W3C for processing XML (eXtensible Markup Language) documents.
For more detail, check the W3C Document Object Model page.
DOM:"The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page."
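As a small illustration of "access and update the content [and] structure" of a document, here is a sketch that uses xml.dom.minidom, the lightweight DOM implementation in the Python standard library. The function name rename_title, the sample XML and the "previous" attribute are made up for this example.

```python
from xml.dom import minidom


def rename_title(xml_text, new_title):
    """Parse XML, change the first <title> text, and return the updated XML."""
    doc = minidom.parseString(xml_text)

    # Access: find the first <title> element and read its text node.
    title = doc.getElementsByTagName("title")[0]
    old_title = title.firstChild.data

    # Update content: replace the text inside <title>.
    title.firstChild.data = new_title

    # Update structure: record the old value in a new attribute.
    title.setAttribute("previous", old_title)

    # Serialize the root element back to a string.
    return doc.documentElement.toxml()


print(rename_title("<post><title>Old</title></post>", "New"))
```

The same calls (getElementsByTagName, firstChild, setAttribute) come from the W3C DOM interface, so they look much the same in other languages.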