FtpWebResponse.GetResponseStream returning HTML


I am connecting to an FTP server using FtpWebResponse. So far so good: I'm listing the directories as described in this answer.

When I simulate an FTP server locally with the FileZilla Server bundled with XAMPP, the listing comes back from the response stream one entry per line, as in this example:

config/
app/
public/
file.xml

But today I tested against two remote servers and got back a huge HTML document instead:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd
">
<!-- HTML listing generated by Squid 2.6.STABLE21 -->
<!-- Wed, 27 May 2015 17:42:13 GMT -->
<HTML><HEAD><TITLE>
FTP Directory: ftp://[email protected]/
</TITLE>
<STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}--></STYLE>
</HEAD><BODY>
<PRE>
--------- Welcome to Pure-FTPd [privsep] [TLS] ----------
You are user number 2 of 50 allowed.
Local time is now 18:43. Server port: 21.
This is a private system - No anonymous login
IPv6 connections are also welcome on this server.
</PRE>
<HR noshade size="1px">
<H2>
FTP Directory: <A HREF="/">ftp://[email protected]</A>/</H2>
<PRE>
<A HREF="etc/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ic
ons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="etc/">etc</A>. . . . . . . . . . . . . . . Jan 13 20
:39
<A HREF="logs/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/i
cons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="logs/">logs</A> . . . . . . . . . . . . . . May 14
19:06
<A HREF="mail/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/i
cons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mail/">mail</A> . . . . . . . . . . . . . . Dec 16
20:53
<A HREF="public_ftp/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-st
atic/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="public_ftp/">public_ftp</A> . . . . . . . . .
 . . Aug  4  2014
<A HREF="public_html/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-s
tatic/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="public_html/">public_html</A>. . . . . . . .
 . . . May 25 17:21
<A HREF="ssl/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ic
ons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="ssl/">ssl</A>. . . . . . . . . . . . . . . Aug  5  2
014
<A HREF="tmp/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ic
ons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="tmp/">tmp</A>. . . . . . . . . . . . . . . May  5 12
:57
<A HREF="www"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ico
ns/anthony-link.gif" ALT="[LINK]"></A> <A HREF="www">www</A>. . . . . . . . . . . . . . . Sep 30  20
14         <A HREF="www;type=a"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-i
nternal-static/icons/anthony-text.gif" ALT="[VIEW]"></A> <A HREF="www;type=i"><IMG border="0" SRC="h
ttp://proxy.domain.local:8080/squid-internal-static/icons/anthony-box.gif" ALT="[DOWNLOAD]"
></A> -> <A HREF="public_html">public_html</A>
</PRE>
<HR noshade size="1px">
<ADDRESS>
Generated Wed, 27 May 2015 17:42:13 GMT by proxy.domain.local (squid/2.6.STABLE21)
</ADDRESS></BODY></HTML>

I removed some parts so it wouldn’t get too big and also some confidential information...

How can I force the response to contain only the directories and files, one per line, or at least XML?


Edit

I inspected the request with the Fiddler Web Debugger, and the Raw view of the Inspector tab shows the following:

GET ftp://user:[email protected]/ HTTP/1.1
Host: domain.com
Proxy-Connection: Keep-Alive

Note: I don't know why, but Fiddler is intercepting my request and my application never completes it. It hangs until it times out and only finishes when I close Fiddler.

Edit 2

I tested the program at home and it worked normally, returning only the list of directories. As discussed in the comments, the problem is probably the company proxy.

  • Strange. Can you tell what the HTML is about? Are you using a proxy?

  • Yes, I use a proxy. At first glance the HTML seems to be the directory listing; I will post part of it...

  • That's the problem. Add this line to your code: ftpRequest.Proxy = null; =)

  • I added it, and now it doesn't connect...

  • Then set it to your proxy explicitly: ftpRequest.Proxy = new WebProxy("YourProxyHere");

  • The FTP server seems to think the request was made over HTTP rather than FTP, so it is responding with an HTML file containing the formatted listing of directories and files. Try checking with Fiddler2, Charles, or Wireshark exactly how your request is being made.

  • When I don't set the proxy, or set it in either of the following ways: Request.Proxy = WebRequest.GetSystemWebProxy(); or Request.Proxy = new WebProxy("proxy.dominio.local", 8080); it works, but the result comes back as HTML.

  • You are not getting past Squid. One way to solve this problem would be to parse the HTML and extract the information you want. You can use the HTML Agility Pack and select the files like this: document.DocumentNode.SelectNodes("//a/text()").

  • @Guilhermebrancostracini I tried to use Fiddler, but it is not capturing my application's requests, only the browsers'... :/

  • @Guilhermebrancostracini I managed to set it up here, but it is hijacking my request and not letting it complete... Anyway, the Raw option on the Inspector tab contains the following: GET ftp://user:[email protected]/ HTTP/1.1 Host: domain.com Proxy-Connection: Keep-Alive

  • Use the KeepAlive property of FtpWebRequest. As a last resort, if that does not work, consider parsing the HTML. Using the example from the question, I got this result.

  • Can you configure this Squid?

  • No, @Ciganomorrisonmendez, the proxy belongs to the company and I do not have access to its settings. I ran the application at home, where there is no proxy, and it worked normally. So the problem isn't in my application's configuration? Do I have to parse the HTML, as @qmechanik said?

  • You do, but I don't know if it's worth it.

  • @qmechanik, can you post the example parser as an answer? I think this will be the most viable solution... :/ I'll test it later.

  • The proxy is routing your FTP request over HTTP; that is the problem. In this case you will have to parse the HTML that comes back and extract the information. You can use the HTML Agility Pack as already recommended, CsQuery, or write your own reader/parser.


1 answer



This happens because the request reaches the proxy over HTTP, not FTP. The proxy then performs the required FTP commands itself and returns the result to you inside an HTTP response.

HTTP proxies usually return an HTML page as the result, so that a user can click through to the relevant files.
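To confirm from code whether a given request would be routed through a proxy at all, a check along these lines may help. It is only a minimal sketch: ProxyCheck and GoesThroughProxy are hypothetical names introduced here for illustration, and the only framework call involved is IWebProxy.IsBypassed. It says nothing about what the proxy will do with the request; it only reports whether the request is sent to the proxy instead of directly to the server.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only; not part of .NET.
static class ProxyCheck
{
    // Returns true when a request to 'destino' would be sent via 'proxy'
    // rather than directly to the destination server.
    public static bool GoesThroughProxy(Uri destino, IWebProxy proxy)
    {
        // A null proxy (ftpRequest.Proxy = null) means a direct connection.
        return proxy != null && !proxy.IsBypassed(destino);
    }
}
```

For example, ProxyCheck.GoesThroughProxy(new Uri("ftp://domain.com/"), WebRequest.GetSystemWebProxy()) would be expected to return true on the company network and false at home, which matches the behaviour described in the question.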

Since you do not have access to the proxy's settings, an alternative is to parse the HTML and extract the relevant information.

One way to do this in C# is to use the HTML Agility Pack. Below is the code mentioned in the question, adapted accordingly:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
// ...

static List<string> retornarDiretoriosFTP(string URI, string usuario, string senha) {
    FtpWebRequest ftpRequest = (FtpWebRequest)WebRequest.Create(URI);
    List<string> diretorios = new List<string>();
    string resposta;

    ftpRequest.Credentials = new NetworkCredential(usuario, senha);
    ftpRequest.Method = WebRequestMethods.Ftp.ListDirectory;

    // Read the whole response body; the response and the reader are disposed afterwards
    using (FtpWebResponse resultado = (FtpWebResponse)ftpRequest.GetResponse())
    using (StreamReader streamReader = new StreamReader(resultado.GetResponseStream())) {
        resposta = streamReader.ReadToEnd();
    }

    var documento = new HtmlAgilityPack.HtmlDocument();
    documento.LoadHtml(resposta);

    // Collect the href of every link; each entry is linked twice (icon and name), hence Distinct
    List<string> links = documento.DocumentNode.Descendants("a")
        .Select(x => x.GetAttributeValue("href", string.Empty))
        .Where(href => href.Length > 0)
        .Distinct()
        .ToList();

    // If the response parsed as valid HTML and contains links, it is the proxy's HTML listing
    if (!documento.ParseErrors.Any() && links.Count > 0) {
        diretorios.AddRange(links);
    }
    // Otherwise it is probably the plain directory listing, one entry per line
    else {
        diretorios.AddRange(resposta.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries));
    }
    return diretorios;
}

To use it, do:

static void Main(string[] args) {
    // Placeholder values: replace with the real FTP URI, user name and password
    List<string> diretorios = retornarDiretoriosFTP("ftp://domain.com/", "User", "Password");
    foreach (var diretorio in diretorios) {
        Console.WriteLine(diretorio);
    }
}

Note: the project must reference the HTML Agility Pack.
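If adding a package is not an option, the comments also mention writing your own reader/parser. The following is a rough sketch of that idea using only the base class library. SquidListingParser and ExtractEntries are hypothetical names, and the filtering rules are assumptions based solely on the Squid listing shown in the question: the "/" link in the heading and the ";type=" view/download links are skipped. It also assumes the real response is not hard-wrapped in the middle of a tag the way the pasted sample is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
static class SquidListingParser
{
    // Captures the value of HREF="..." inside an <A ...> tag, case-insensitively.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractEntries(string resposta)
    {
        List<string> hrefs = HrefPattern.Matches(resposta)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();

        // No links at all: assume the plain FTP listing, one entry per line.
        if (hrefs.Count == 0)
        {
            return resposta
                .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(line => line.Trim())
                .Where(line => line.Length > 0)
                .ToList();
        }

        // HTML listing: drop the root link and the view/download variants,
        // and collapse the duplicate icon/name links for each entry.
        return hrefs
            .Where(h => h != "/" && !h.Contains(";type="))
            .Distinct()
            .ToList();
    }
}
```

Calling SquidListingParser.ExtractEntries(resposta) in place of the HTML Agility Pack section would return entries such as etc/ and logs/ for the HTML case. Note that the target of a symbolic link (the anchor after "->" in the sample) is also a link and will show up as an entry of its own.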

  • Qmechanik, I could not find documentation for this package. Is there a way to check whether the response is HTML?

  • @Kaduamaral The documentation is scarce, but it is possible to do this by checking whether the HTML was parsed correctly; I updated the answer. I couldn't test it, so I can't say whether it works this way.
