Posts by elias • 3,132 points
37 posts
-
0 votes • 1 answer • 45 views
A: "Missing Scheme" error using Scrapy
Assuming you want to visit all pages (from 1 to 240), you probably wanted to do: def start_requests(self): link = "http://www.jornalpanorama.com.br/site/data-policia.php?page=" for x in range(1,…
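A minimal sketch of the URL-building step described above, in plain Python with no Scrapy dependency. The helper name `page_urls` is hypothetical; in a real spider each URL would presumably be wrapped in a request and yielded from `start_requests`.

```python
# Base URL taken from the answer; note that it carries the "http://" scheme.
BASE_URL = "http://www.jornalpanorama.com.br/site/data-policia.php?page="


def page_urls(first=1, last=240):
    """Yield one absolute URL per page number (hypothetical helper)."""
    for page in range(first, last + 1):  # range() excludes its upper bound
        yield BASE_URL + str(page)


urls = list(page_urls())
print(len(urls))
print(urls[0])
print(urls[-1])
```

Building every URL from a base that already includes the scheme is one way to make sure no request is created from a scheme-less string.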
-
0 votes • 1 answer • 146 views
A: Scrapy different pages
Replace the return in the parse method with a yield. The return makes the method exit on the first iteration, without going through the rest of the posts in the for loop. Using…
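The difference can be illustrated with plain Python functions, without Scrapy. The function names and the item layout below are made up for the illustration; they are not the code from the original question.

```python
def parse_with_return(posts):
    for post in posts:
        return {"title": post}  # exits the function on the first iteration


def parse_with_yield(posts):
    for post in posts:
        yield {"title": post}  # hands back one item, then resumes the loop


posts = ["first", "second", "third"]
print(parse_with_return(posts))       # a single item
print(list(parse_with_yield(posts)))  # one item per post
```

With `yield` the function becomes a generator, so the caller receives every item the loop produces.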
-
3 votes • 1 answer • 309 views
A: Information contained in two Scrapy pages
The traditional way to extract data from multiple pages is to use the mechanism for passing data from one request to another through the meta dictionary. Here’s how it works: in the callback that is…
-
1 vote • 4 answers • 14598 views
A: Print only the column of a matrix
To work with matrices, it’s nice to use numpy. With it you can do: >>> import numpy as np >>> a = np.array([[1, 5], [7, 4], [8, 3]]) >>> print a.transpose()[0] [1 7 8]…
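For comparison, the same transpose-then-index idea can be sketched with built-in lists only, which may help when NumPy is not available. This is an alternative illustration, not the approach the answer recommends.

```python
a = [[1, 5], [7, 4], [8, 3]]

# zip(*a) pairs up the i-th element of every row, i.e. it transposes the matrix.
transposed = list(zip(*a))
first_column = list(transposed[0])
print(first_column)

# A list comprehension gives the same column without building the transpose.
same_column = [row[0] for row in a]
print(same_column)
```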
-
1 vote • 2 answers • 187 views
A: Python code error
It’s hard to answer your question without seeing the snippet of code you’re trying to execute and the data you’re using. But the error message you are seeing seems to be from an attempt to use the…
-
3 votes • 1 answer • 55 views
A: Is there any way to disable Scrapyd’s web interface?
scrapyd does not offer this option, but you can get the same effect by running a custom installation with this line of the code commented out: https://github.com/scrapy/scrapyd/blob/master/scrapyd/website.py#L24…
-
3 votes • 4 answers • 1398 views
A: Tool to generate Xpath
Try the Selectorgadget bookmarklet. It works like this: right after you activate it, you click on something you want to capture. It generates a very generic selector and highlights in yellow the captured…
-
3 votes • 1 answer • 126 views
A: How to manage the operation and failure in the execution of Spiders?
Well, since it is Scrapy that has access to the actual stats (scrapyd only runs the jobs), I think the way to solve this problem is to use a spider middleware to send the crawler statistics to your…
-
1 vote • 1 answer • 138 views
A: Deploy queues to manage concurrency between Spiders in Scrapyd
Well, if you need simple priorities, one option is to use scrapyd’s priority parameter (it is not documented, but it is implemented here; it is basically a simple priority queue on top of SQLite). To use,…
-
5 votes • 1 answer • 7435 views
A: On HREF and SRC, finally, what are the differences of application?
It is quite possible that the most efficient way is to import all the necessary CSS up front in the HTML HEAD, before making any dynamic update via JavaScript: if the update is via Ajax you…
-
2 votes • 2 answers • 113 views
A: How to protect my Scrapyd server from unauthorized calls?
So this username/password setting is a client-side setting for HTTP basic authentication, which scrapyd itself currently does not implement. To set this up on your server, the way is to have scrapyd listen…
-
2 votes • 2 answers • 126 views
A: How to calculate an optimal value for Scrapyd’s CONCURRENT_REQUESTS variable?
You can use the AutoThrottle extension, which tries to optimize crawling speed based on estimates of the server load and of Scrapy’s own processing load. Using this extension (code here), you can define a…
-
2 votes • 1 answer • 152 views
A: Multiple Pipelines to Treat Different Files in Scrapy
It is a common pattern in pipelines (and Spider middlewares as well) to use Spider attributes to decide what to do: class MyPipeline: def process_item(self, item, spider): if getattr(spider,…
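The pattern can be sketched without installing Scrapy by using stand-in spider classes. The attribute name `export_as_csv`, the `handled_by` field, and both spider classes are hypothetical, chosen only for this illustration; the excerpt above is cut off before it names the real attribute.

```python
class MyPipeline:
    def process_item(self, item, spider):
        # `export_as_csv` is a hypothetical spider attribute; spiders that
        # do not define it fall back to False and the item passes through.
        if getattr(spider, "export_as_csv", False):
            item = dict(item, handled_by="csv")
        return item


class PlainSpider:  # stand-in for a spider without the attribute
    name = "plain"


class CsvSpider:  # stand-in for a spider that opts in
    name = "csv"
    export_as_csv = True


pipeline = MyPipeline()
print(pipeline.process_item({"id": 1}, PlainSpider()))
print(pipeline.process_item({"id": 1}, CsvSpider()))
```

Using `getattr` with a default keeps the pipeline safe for spiders that never declare the attribute.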
-
3 votes • 1 answer • 1599 views
A: How to get the average sizes of a cluster with Pandas?
According to the answer I just got on Stack Overflow in English, the solution is to do another groupby specifying the level parameter: df.groupby(['tipo', 'ano']).size().groupby(level=1).mean() ano 2000 1.5…
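To see what the two chained groupby steps compute, here is a pure-Python cross-check using the `tipo` and `ano` sample data from the question. It is only a sketch of the arithmetic and does not use pandas.

```python
from collections import Counter, defaultdict
from statistics import mean

tipo = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
ano = [2000, 2000, 2000, 2001, 2001, 2001, 2001]

# Step 1: size of each (tipo, ano) group, like groupby([...]).size().
group_sizes = Counter(zip(tipo, ano))

# Step 2: collect those sizes per year, like groupby(level=1).
sizes_by_year = defaultdict(list)
for (_, year), size in group_sizes.items():
    sizes_by_year[year].append(size)

# Step 3: average group size per year, like .mean().
mean_size = {year: mean(sizes) for year, sizes in sizes_by_year.items()}
print(mean_size)
```

For 2000 the group sizes are 2 and 1, so the mean is 1.5, which agrees with the value visible in the excerpt above.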
-
1 vote • 1 answer • 1599 views
Q: How to get the average sizes of a cluster with Pandas?
Given a Pandas Dataframe, with the data in such a structure: import pandas as pd raw_data = { 'tipo': ['a', 'a', 'b', 'c', 'c', 'c', 'd'], 'ano': [2000, 2000, 2000, 2001, 2001, 2001, 2001], } df =…
-
2 votes • 1 answer • 233 views
A: Prevent js file from running
You can write a user script that changes how the site behaves, using the Greasemonkey extension (if you use Firefox) or Tampermonkey (if you use Chrome), and that undoes what the play.html.js…
-
4 votes • 6 answers • 3344 views
A: What is a quick way to add and remove code comments in Vim?
I suggest using the tcomment plugin, which provides a single set of shortcuts to comment/uncomment in multiple languages. With tcomment installed, you can use, in Normal mode: gcc to comment/uncomment the current…
-
1 vote • 3 answers • 19533 views
A: How to do a submit that sends the data filled in the form to an email?
Browsers do not send email. Your code is the closest you will get using just the browser. You can’t send email with HTML and JavaScript alone -- in web applications, you can only send emails securely on…
-
4 votes • 3 answers • 70810 views
A: What are the appropriate data types for columns like address, email, phone and mobile phone for SQL database?
If you are not going to do math with the numbers (add, multiply, etc.), there is little reason to use numeric types. Use CHAR or VARCHAR, as appropriate.
-
16 votes • 8 answers • 5920 views
A: Is it always guaranteed that a multi-threaded application runs faster than using a single thread?
Be careful not to confuse parallelism with concurrency. Parallelism relates to two tasks running at the same time. Example: two tasks running at the same time, one on each CPU. Concurrency relates…
-
3 votes • 2 answers • 1611 views
-
3 votes • 1 answer • 109 views
A: How to repeat a block of Jinja2?
The solution was posted in response to my question on Stack Overflow in English. Just use the special variable self to access the block by name: <title>{% block title %}{% endblock %} - {{ sitename…
-
3 votes • 1 answer • 109 views
Q: How to repeat a block of Jinja2?
I’m using Jinja2 as a template engine to generate a static HTML site in a Python script. I want to repeat the contents of a block (title) in the layout template (layout.html), which looks like this:…
-
6 votes • 7 answers • 59028 views
A: Refactoring function to remove punctuation, spaces and special characters
You may want to use the Urlify.php library (source code here), which is extensively tested to support multiple characters and languages, and also supports adding more complex mappings than 1…
-
16 votes • 7 answers • 3681 views
A: What are the implications of not declaring variables in PHP?
Just declare them. Any impact on performance is irrelevant next to the impact on maintenance. You or someone else may come to this code snippet later and not be sure whether it is right, because you’re not sure where…
-
4 votes • 3 answers • 157 views
A: Refactoring function to collect constants with a certain prefix
I know I’m not exactly answering the question, since you seem to specifically want a performance improvement, but I would like to make some suggestions to improve the clarity of the code. As it stands,…
-
4 votes • 3 answers • 2176 views
A: Command to replace characters recursively
Use perl, which supports lookahead: perl -p -e 's/;(?=;|$)/;\\N/g' arquivo.csv > novo-arquivo.csv Incidentally, if you want to make the change within the same file (without having to redirect to…
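The same lookahead pattern can be tried out in Python's `re` module, which also supports `(?=...)`. This is only a sketch for experimenting with the regex on single lines; the function name `mark_empty_fields` is made up, and the perl one-liner above remains the command the answer proposes.

```python
import re

# A ";" that is immediately followed by another ";" or by the end of the line.
# The lookahead (?=...) checks what follows without consuming it, so
# consecutive empty fields are all matched.
EMPTY_FIELD = re.compile(r";(?=;|$)")


def mark_empty_fields(line):
    """Append a literal \\N after every ';' that precedes an empty field."""
    return EMPTY_FIELD.sub(lambda match: ";\\N", line)


print(mark_empty_fields("a;;b;"))
print(mark_empty_fields("a;;;b"))
print(mark_empty_fields("a;b"))
```

Because the lookahead does not consume the second `;`, that character is still available to be matched on its own, which is what makes runs of empty fields work.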
-
4 votes • 1 answer • 236 views
A: How to make Dbunit recognize the POLYGON data type of Postgresql?
The interesting item in the FAQ is this: How to replace the default data type Factory?, which explains how to set up a custom data type factory for Dbunit -- basically, a class that implements…
-
4 votes • 4 answers • 1546 views
A: How can I optimize a recursive method for finding ancestors?
One way around this problem is to use dynamic programming (or memoization). The idea is to create a kind of "cache" of the function result, so you do not have to re-query the whole hierarchy if you…
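A minimal sketch of the caching idea, assuming the hierarchy can be represented as a child-to-parent mapping. The `PARENT` dictionary and the node names are hypothetical stand-ins for whatever lookup the original code performs.

```python
from functools import lru_cache

# Hypothetical child -> parent mapping standing in for the real hierarchy.
PARENT = {"d": "c", "e": "c", "c": "b", "b": "a", "a": None}


@lru_cache(maxsize=None)
def ancestors(node):
    """Return the ancestors of `node`, nearest first, caching every result."""
    parent = PARENT.get(node)
    if parent is None:
        return ()
    return (parent,) + ancestors(parent)


print(ancestors("d"))
print(ancestors("e"))  # reuses the cached result for "c"
print(ancestors.cache_info())
```

Returning tuples keeps the cached values immutable, so one caller cannot accidentally modify a result that another caller will receive later.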
-
2 votes • 5 answers • 2222 views
-
7 votes • 2 answers • 821 views
Q: In Python, how to get the default temporary directory?
I am making a program that uses a temporary file to save a serialized object (pickled). At the moment, the program creates the file in /tmp, but this path is specific to Unix/Linux; I wanted to get…
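One portable option is the standard library's `tempfile` module. The sketch below is an assumption about what the program needs, not the accepted answer: it looks up the platform's default temporary directory and round-trips a pickled object through a temporary file. The sample object and the `.pickle` suffix are made up.

```python
import os
import pickle
import tempfile

# Platform-specific default temporary directory (e.g. /tmp on Linux).
tmp_dir = tempfile.gettempdir()
print(tmp_dir)

# Let tempfile pick a unique file name; delete=False keeps the file after close.
with tempfile.NamedTemporaryFile(suffix=".pickle", delete=False) as handle:
    pickle.dump({"answer": 42}, handle)
    path = handle.name

with open(path, "rb") as handle:
    restored = pickle.load(handle)
print(restored)

os.remove(path)  # clean up explicitly, since delete=False was used
```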
-
57 votes • 4 answers • 66720 views
A: What is the correct way to make a regular Javascript substitution for all occurrences found?
Use a regular expression in the first argument of replace with the flag g (global): str = str.replace(/_/g, ' '); Read more: Regular Expression Aurelius Page Regular Expressions in Javascript - MDN…
-
51 votes • 1 answer • 24387 views
Q: What is the difference in the use of return false, event.stopPropagation() and event.preventDefault()?
In a jQuery event handler, you can use return false, event.stopPropagation() and event.preventDefault() (or combinations thereof) to "cancel the action" of the event. I know there’s a difference in…
-
2 votes • 3 answers • 5937 views
A: How do I see which commits change a certain file?
I recommend using an alias in .gitconfig to query this type of information. Put it in your ~/.gitconfig, in the aliases section: [alias] ll = log --pretty=format:"%C(yellow)%h%Cred%d\\…
-
16 votes • 4 answers • 1969 views
Q: How to save with Ctrl+S in Vim?
When I press Ctrl+S while using Vim in the terminal, either the terminal hangs or something else strange happens. Since pressing Ctrl+S to save is already almost an instinct, how can I map this shortcut to save the…
-
24 votes • 1 answer • 1187 views
Q: How to use virtualenv to manage dependencies on a Python application?
I need to manage the dependencies of a Python application that I am developing, so that it is easy for other developers on the team to work on the project using the same versions of the packages that I am…
-
24 votes • 1 answer • 1187 views
A: How to use virtualenv to manage dependencies on a Python application?
[virtualenv][1] builds a "virtual" Python environment, storing all the dependencies in a directory. Personally, I like to use [virtualenvwrapper][2], which is a set of scripts that make it a little…