Posts by elias • 3,132 points
37 posts
- 
		0 votes • 1 answer • 45 views • A: "Missing Scheme" error using Scrapy
		Assuming you want to visit all pages (from 1 to 240), you probably wanted to do: def start_requests(self): link = "http://www.jornalpanorama.com.br/site/data-policia.php?page=" for x in range(1,…
- 
		0 votes • 1 answer • 146 views • A: Scrapy different pages
		Replace the return in the parse method with a yield. The return makes the method exit on the first iteration, without going through the rest of the posts in the for loop. Using…
- 
		3 votes • 1 answer • 309 views • A: Information contained in two Scrapy pages
		The traditional way to extract data from multiple pages is to use the mechanism for passing data from one request to another through the meta dictionary. Here’s how it works: in the callback that is…
- 
		1 vote • 4 answers • 14598 views • A: Print only the column of a matrix
		To work with matrices, it’s nice to use numpy. With it you can do: >>> import numpy as np >>> a = np.array([[1, 5], [7, 4], [8, 3]]) >>> print(a.transpose()[0]) [1 7 8]…
- 
		1 vote • 2 answers • 187 views • A: Python code error
		It’s hard to answer your question without seeing the snippet of code you’re trying to execute and the data you’re using. But the error message you are seeing seems to come from an attempt to use the…
- 
		3 votes • 1 answer • 55 views • A: Is there any way to disable Scrapyd’s web interface?
		scrapyd does not offer this option. But you can achieve it by running a custom installation with this line of the code commented out: https://github.com/scrapy/scrapyd/blob/master/scrapyd/website.py#L24…
- 
		3 votes • 4 answers • 1398 views • A: Tool to generate XPath
		Try the Selectorgadget bookmarklet. It works like this: right after you activate it, you click on something you want to capture. It generates a very generic selector, and highlights in yellow the captured…
- 
		3 votes • 1 answer • 126 views • A: How to manage the operation and failure in the execution of Spiders?
		Well, since it is Scrapy that has access to the actual stats (scrapyd only runs the jobs), I think the way to solve this problem is to use a spider middleware to send the crawler statistics to your…
- 
		1 vote • 1 answer • 138 views • A: Deploy queues to manage concurrency between Spiders in Scrapyd
		Well, if you need simple priorities, one option is to use scrapyd’s priority parameter (this is not documented but is implemented here; it is basically a simple priority queue on top of SQLite). To use,…
- 
		5 votes • 1 answer • 7435 views • A: HREF and SRC: what, after all, are the differences in use?
		It is quite possible that the most efficient way is to import all the necessary CSS already in the HTML HEAD, before making any dynamic update via JavaScript: if the update is via Ajax you…
- 
		2 votes • 2 answers • 113 views • A: How to protect my Scrapyd server from unauthorized calls?
		So, this username/password setting is a client-side setting for HTTP basic authentication, which scrapyd currently does not implement. To set this up on your server, the way is to let scrapyd listen…
- 
		2 votes • 2 answers • 126 views • A: How to calculate an optimal value for Scrapyd’s CONCURRENT_REQUESTS variable?
		You can use the AutoThrottle extension, which tries to optimize crawling speed based on estimates of server load and Scrapy processing. Using this extension (code here), you can define a…
- 
		2 votes • 1 answer • 152 views • A: Multiple Pipelines to Treat Different Files in Scrapy
		It is a common pattern in pipelines (and spider middlewares as well) to use spider attributes to decide what to do: class MyPipeline: def process_item(self, item, spider): if getattr(spider,…
- 
		3 votes • 1 answer • 1599 views • A: How to get the average size of the groups with Pandas?
		According to the answer I just got on the English-language SO, the solution is to do another groupby specifying the level: df.groupby(['tipo', 'ano']).size().groupby(level=1).mean() ano 2000 1.5…
- 
		1 vote • 1 answer • 1599 views • Q: How to get the average size of the groups with Pandas?
		Given a Pandas DataFrame with data structured like this: import pandas as pd raw_data = { 'tipo': ['a', 'a', 'b', 'c', 'c', 'c', 'd'], 'ano': [2000, 2000, 2000, 2001, 2001, 2001, 2001], } df =…
- 
		2 votes • 1 answer • 233 views • A: Prevent js file from running
		You can write a user script to change how the site works, using the Greasemonkey extension (if you use Firefox) or Tampermonkey (if you use Chrome), which reverses what the play.html.js…
- 
		4 votes • 6 answers • 3344 views • A: What is an agile way to add and remove code comments in Vim?
		I suggest using the tcomment plugin, which provides uniform shortcuts to comment/uncomment in multiple languages. With tcomment installed, you can use, in Normal mode: gcc to comment/uncomment the current…
- 
		1 vote • 3 answers • 19533 views • A: How to do a submit sending the data filled in the form to an email?
		Browsers do not send email. Your code is the closest you will get using just the browser. You can’t send email with only HTML & JavaScript -- in web applications, you can only send emails securely on…
- 
		4 votes • 3 answers • 70810 views • A: What are the appropriate data types for columns like address, email, phone and mobile phone for SQL database?
		If you are not going to do math with the numbers (add, multiply, etc.), there is not much reason to use numeric types. Use CHAR or VARCHAR, as appropriate.
- 
		16 votes • 8 answers • 5920 views • A: Is it always guaranteed that a multi-threaded application runs faster than using a single thread?
		Be careful not to confuse parallelism with concurrency. Parallelism relates to two tasks running at the same time. Example: two tasks running at the same time, one on each CPU. Concurrency relates…
- 
		3 votes • 2 answers • 1611 views
- 
		3 votes • 1 answer • 109 views • A: How to repeat a Jinja2 block?
		The solution was posted in response to my question on the English-language SO. Just use the special variable self to access the block by name: <title>{% block title %}{% endblock %} - {{ sitename…
- 
		3 votes • 1 answer • 109 views • Q: How to repeat a Jinja2 block?
		I’m using Jinja2 as a template engine to generate a static HTML site in a Python script. I want to repeat the contents of a block (title) in the layout template (layout.html), which looks like:…
- 
		6 votes • 7 answers • 59028 views • A: Refactoring function to remove punctuation, spaces and special characters
		You may want to use the Urlify.php library (source code here), which has extensive tests to support multiple characters and languages, and also supports adding more complex mappings than 1…
- 
		16 votes • 7 answers • 3681 views • A: What are the implications of not declaring variables in PHP?
		Simply declare them. Any impact on performance is irrelevant compared to the impact on maintenance. You or someone else may come to this code snippet later and not be sure if it’s right because you’re not sure where…
- 
		4 votes • 3 answers • 157 views • A: Refactoring function to collect constants with a certain prefix
		I know I’m not exactly answering the question, since you seem to specifically want a performance improvement, but I would like to make some suggestions to improve the clarity of the code. As it stands,…
- 
		4 votes • 3 answers • 2176 views • A: Command to replace characters recursively
		Use perl, which supports lookahead: perl -p -e 's/;(?=;|$)/;\\N/g' arquivo.csv > novo-arquivo.csv Incidentally, if you want to make the change within the same file (without having to redirect to…
- 
		4 votes • 1 answer • 236 views • A: How to make DbUnit recognize the POLYGON data type of PostgreSQL?
		The relevant item in the FAQ is this one: How to replace the default data type factory?, which explains how to set up a custom data type factory for DbUnit -- basically, a class that implements…
- 
		4 votes • 4 answers • 1546 views • A: How can I optimize a recursive method for finding ancestors?
		One way around this problem is to use dynamic programming (or memoization). The idea is to create a kind of "cache" of the function result, so you do not have to re-query the whole hierarchy if you…
- 
		2 votes • 5 answers • 2222 views
- 
		7 votes • 2 answers • 821 views • Q: In Python, how to get the default temporary directory?
		I am making a program that uses a temporary file to save a serialized (pickled) object. At the moment, the program generates it in /tmp, but this path is specific to Unix/Linux; I wanted to get…
- 
		57 votes • 4 answers • 66720 views • A: What is the correct way to do a regex replacement in JavaScript for all occurrences found?
		Use a regular expression in the first argument of replace with the flag g (global): str = str.replace(/_/g, ' '); Read more: Regular Expression Aurelius Page; Regular Expressions in Javascript - MDN…
- 
		51 votes • 1 answer • 24387 views • Q: What is the difference between using return false, event.stopPropagation() and event.preventDefault()?
		In a jQuery event handler, one can use return false, event.stopPropagation() and event.preventDefault() (or combinations thereof) to "cancel the action" of the event. I know there’s a difference in…
- 
		2 votes • 3 answers • 5937 views • A: How do I see which commits change a certain file?
		I recommend using an alias in .gitconfig to query this type of information. Put it in your ~/.gitconfig, in the [alias] section: [alias] ll = log --pretty=format:"%C(yellow)%h%Cred%d\\…
- 
		16 votes • 4 answers • 1969 views • Q: How to save with Ctrl+S in Vim?
		When I press Ctrl+S while using Vim in the terminal, either the terminal hangs or something else strange happens. Since pressing Ctrl+S is already almost an instinct for saving, how do I map this shortcut to save the…
- 
		24 votes • 1 answer • 1187 views • Q: How to use virtualenv to manage dependencies in a Python application?
		I need to manage the dependencies of a Python application that I am developing, so that it is easy for other developers on the team to work on the project using the same versions of the packages that I am…
- 
		24 votes • 1 answer • 1187 views • A: How to use virtualenv to manage dependencies in a Python application?
		virtualenv builds a "virtual" Python environment, storing all the dependencies in a directory. Personally, I like to use virtualenvwrapper, which is a set of scripts that make it a little…
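
One of the questions listed above asks how to avoid hard-coding /tmp when saving a pickled object. The answers to it are not shown in this listing, so the following is only a minimal sketch of one possible approach, using the standard library's tempfile.gettempdir(). The helper names save_pickled and load_pickled and the default file name are hypothetical, chosen for illustration.

```python
import os
import pickle
import tempfile


def save_pickled(obj, filename="example-object.pickle"):
    """Pickle obj into the platform's default temp directory and return the path.

    The function name and default filename are hypothetical.
    """
    # tempfile.gettempdir() returns the default temporary directory
    # for the current platform, so /tmp is not hard-coded.
    path = os.path.join(tempfile.gettempdir(), filename)
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return path


def load_pickled(path):
    """Load a previously pickled object back from path."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

If a unique file name is needed rather than a fixed one, tempfile.NamedTemporaryFile may be a better fit than building the path by hand.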