Message Boards Message Boards

[✓] Transfer HTML table to Excel using Wolfram Language?

GROUPS:

In the link below there is a table that I need to transfer to Excel. The table is necessary in a Mathematica program I'm now developing. This must be made in the Mathematica environment, otherwise it will be necessary to stop the program, do it by hand, and import the Excel file. I'm aware about the Mathematica Link for Excel addon, but I can't use it because other users of my program may not have it.

Table currency values in Brazil

I will appreciate any help, even if it is "not possible".

Anderson

POSTED BY: Anderson Gaudio
Answer
2 months ago

Will this do any part of what you need to do?

data = Import["http://www.excelcontabilidade.com.br/indice/5/IPCA", "Data"]
Export["sheet.xlsx", data]

Looking at FullForm[data] may give you some helpful information.

I expect you will need to do some processing between the Import and Export, but this might help get you started.

POSTED BY: Bill Simpson
Answer
2 months ago

Hi Bill,

As copy & paste worked almost perfect in transferring HTML table to XLSX, and then to Mathematica, I thought about creating a code capable of reproducing the same thing. I soon realized that this would be very difficult, maybe impossible.

Some trials using "Import" resulted in a big confusion when Brazilian currency, that uses comma as decimal separator, is transferred to Mathematica. But interestingly I forgot about using "Export".

In fact, what deviated most my attention was another table that I also must import to the program and it is hundreds of times more complicated than that about IPCA.

Well, your solution worked fine, although some cleanup is necessary. How about the complicated table? It also worked as well! That’s what matters.

So, I would like to thank very much for your help.

POSTED BY: Anderson Gaudio
Answer
2 months ago

If the details of the formatting of the web page will not change often then a little bit of list and string hacking might work

h[s_] := If[DigitQ[StringTake[s, 1]] || StringTake[s, 1] == "-", ToExpression[s], s];
g[s_] := h[StringReplace[s, {"%" -> "", "," -> "."}]];
f[v_] := Join[{First[v]}, Map[g, Rest[v]]];
data3 = Map[f, Rest[data[[3]]]]

which gives this

{{1981, 6.82, 6.15, 5.1, 5.04, 7.39, 3.97, 6.69, 6.28, 5.29, 4.57, 5.88, 4.37},
 {1982, 8.32, 6.14, 6.36, 5.18, 6.46, 7.38, 6.54, 5.94, 5.61, 4.59, 4.59, 6.04},
 {1983, 10.45, 6.06, 8.54, 7.8, 5.57, 7.87, 12.06, 8.32, 9.48, 10.9, 6.96, 7.44},
 {1984, 10.14, 9.1, 8.98, 9.23, 9.88, 8.14, 12.62, 7.32, 11.39, 10.87, 10.35, 10.35},
 {1985, 14.61, 8.95, 12.78, 8.8, 6.76, 7.71, 9.27, 12.1, 11.98, 9.6, 11.12, 13.36},
 {1986, 16.23, 14.36, -0.11, 0.78, 1.4, 1.27, 1.71, 3.55, 1.72, 1.9, 5.45, 11.65},
 {1987, 13.21, 12.64, 16.37, 19.1, 21.45, 19.71, 9.21, 4.87, 7.78, 11.22, 15.08, 14.15},
 {1988, 18.89, 15.7, 17.6, 19.29, 17.42, 22., 21.91, 21.59, 27.45, 25.62, 27.94, 28.7},
 {1989, 37.49, 16.78, 6.82, 8.33, 17.92, 28.65, 27.74, 33.71, 37.56, 39.77, 47.82, 51.5},
 {1990, 67.55, 75.73, 82.39, 15.52, 7.59, 11.75, 12.92, 12.88, 14.41, 14.36, 16.81, 18.44},
 {1991, 20.75, 20.72, 11.92, 4.99, 7.43, 11.19, 12.41, 15.63, 15.63, 20.23, 25.21, 23.71},
 {1992, 25.94, 24.32, 21.4, 19.93, 24.86, 20.21, 21.83, 22.14, 24.63, 25.24, 22.49, 25.24},
 {1993, 30.35, 24.98, 27.26, 27.75, 27.69, 30.07, 30.72, 32.96, 35.69, 33.92, 35.56, 36.84},
 {1994, 41.31, 40.27, 42.75, 42.68, 44.03, 47.43, 6.84, 1.86, 1.53, 2.62, 2.81, 1.71},
 {1995, 1.7, 1.02, 1.55, 2.43, 2.67, 2.26, 2.36, 0.99, 0.99, 1.41, 1.47, 1.56},
 {1996, 1.34, 1.03, 0.35, 1.26, 1.22, 1.19, 1.11, 0.44, 0.15, 0.3, 0.32, 0.47},
 {1997, 1.18, 0.5, 0.51, 0.88, 0.41, 0.54, 0.22, -0.02, 0.06, 0.23, 0.17, 0.43},
 {1998, 0.71, 0.46, 0.34, 0.24, 0.5, 0.02, -0.12, -0.51, -0.22, 0.02, -0.12, 0.33},
 {1999, 0.7, 1.05, 1.1, 0.56, 0.3, 0.19, 1.09, 0.56, 0.31, 1.19, 0.95, 0.6},
 {2000, 0.62, 0.13, 0.22, 0.42, 0.01, 0.23, 1.61, 1.31, 0.23, 0.14, 0.32, 0.59},
 {2001, 0.57, 0.46, 0.38, 0.58, 0.41, 0.52, 1.33, 0.7, 0.28, 0.83, 0.71, 0.65},
 {2002, 0.52, 0.36, 0.6, 0.8, 0.21, 0.42, 1.19, 0.65, 0.72, 1.31, 3.02, 2.1},
 {2003, 2.25, 1.57, 1.23, 0.97, 0.61, -0.15, 0.2, 0.34, 0.78, 0.29, 0.34, 0.52},
 {2004, 0.76, 0.61, 0.47, 0.37, 0.51, 0.71, 0.91, 0.69, 0.33, 0.44, 0.69, 0.86},
 {2005, 0.58, 0.59, 0.61, 0.87, 0.49, -0.02, 0.25, 0.17, 0.35, 0.75, 0.55, 0.36},
 {2006, 0.59, 0.41, 0.43, 0.21, 0.1, -0.21, 0.19, 0.05, 0.21, 0.33, 0.31, 0.48},
 {2007, 0.44, 0.44, 0.37, 0.25, 0.28, 0.28, 0.24, 0.47, 0.18, 0.3, 0.38, 0.74},
 {2008, 0.54, 0.49, 0.48, 0.55, 0.79, 0.74, 0.53, 0.28, 0.26, 0.45, 0.36, 0.28},
 {2009, 0.48, 0.55, 0.2, 0.48, 0.47, 0.36, 0.24, 0.15, 0.24, 0.28, 0.41, 0.37},
 {2010, 0.75, 0.78, 0.52, 0.57, 0.43, 0., 0.01, 0.04, 0.45, 0.75, 0.83, 0.63},
 {2011, 0.83, 0.8, 0.79, 0.77, 0.47, 0.15, 0.16, 0.37, 0.53, 0.43, 0.52, 0.5},
 {2012, 0.56, 0.45, 0.21, 0.64, 0.36, 0.08, 0.43, 0.41, 0.57, 0.59, 0.6, 0.79},
 {2013, 0.86, 0.6, 0.47, 0.55, 0.37, 0.26, 0.03, 0.24, 0.35, 0.57, 0.54, 0.92},
 {2014, 0.55, 0.69, 0.92, 0.67, 0.46, 0.4, 0.01, 0.25, 0.57, 0.42, 0.51, 0.78},
 {2015, 1.24, 1.22, 1.32, 0.71, 0.74, 0.79, 0.62, 0.22, 0.54, 0.82, 1.01, 0.96},
 {2016, 1.27, 0.9, 0.43, 0.61, 0.78, 0.35, 0.52, 0.44, 0.08, 0.26, 0.18, 0.3},
 {2017, 0.38, 0.33, 0.25, 0.14, 0.31, -0.23, "---", "---", "---", "---", "---", "---"}}

and which might be acceptable to export as an Excel file.

But list and string hacking raw data seems like it often comes back to bite you later.

Might it be possible to do this in reverse? Can you write one short expression in Excel which would import the web page onto a sheet, transparently deal with localization of currency and export the desired range of cells of that sheet to an xlsx file which you can then import into Mathematica?

If not then many many many years ago there were fairly inexpensive software products which would try to do a reasonable job of extracting data from html pages and putting that into a useful Excel sheet. These were smarter than just pasting the contents. Excel gives a much broader market and you might find something that would be less "brittle" and likely to fail with the smallest change to the web page format than the simple bits which I wrote.

POSTED BY: Bill Simpson
Answer
2 months ago

I really appreciated your set of functions to clean up the file. I'll adapt it to import the big table. Thank you very much once again.

POSTED BY: Anderson Gaudio
Answer
2 months ago

Group Abstract Group Abstract