Department Share: Django and Web Crawlers

Part 1: Simple Django API Development

1. Before writing any code, set up the development environment. Instead of developing on Ubuntu, we create a virtual environment on the Mac and work there.
2. Create the project: django-admin startproject jiekou
3. Create the app: python manage.py startapp myjiekou
4. Open the project and register the app in settings.py:

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'myjiekou',
)

5. In models.py, define the model class with the fields we need:

# encoding=utf-8
from django.db import models

# Create your models here.

class MyModel(models.Model):
    # Name
    name = models.CharField(max_length=20)
    # Age (stored as a string here; an IntegerField would also work)
    age = models.CharField(max_length=100)
    # Hobby
    hobby = models.CharField(max_length=300)

6. Generate the migration files: python manage.py makemigrations
7. Apply the migrations: python manage.py migrate. Once the migration finishes, the auth tables are created automatically.
8. Run python manage.py runserver and open http://127.0.0.1:8000/admin to see the admin site.
9. Before visiting the admin site, create an administrator account: python manage.py createsuperuser
10. After logging in, why isn't our new table showing up?

(Screenshot: admin.png)
The reason: we haven't registered our model class in admin.py. Let's register it now:

from django.contrib import admin

from myjiekou.models import MyModel
# Register your models here.

class MyAdmin(admin.ModelAdmin):
    list_display = ["name", "age", "hobby"]

admin.site.register(MyModel, MyAdmin)

11. Run python manage.py runserver again.
12. Let's look at the admin site once more and add a few entries.

(Screenshot: admin1.png)
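Entries can also be added without going through the admin, straight from the Django shell. A minimal sketch (the field values here are made-up examples, not data from the post):

$ python manage.py shell
>>> from myjiekou.models import MyModel
>>> MyModel.objects.create(name="xiaoming", age="25", hobby="basketball")
<MyModel: MyModel object>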

13. How do we display the data we manage in the admin on a Django web page? Let's take the next step: the goal is to reach it at http://127.0.0.1:8000/index and show our output there. First, a simple version:

# encoding=utf-8
from django.shortcuts import render
from django.http import HttpResponse
# Create your views here.

def index(request):
    return HttpResponse("你好 我的体育老师")
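For http://127.0.0.1:8000/index to actually reach this view, the project's URL configuration also needs a route. A minimal sketch, assuming the project-level jiekou/urls.py (the full configuration appears again in step 15):

from django.conf.urls import include, url
from django.contrib import admin

from myjiekou import views

urlpatterns = [
    url(r'^admin/', include(admin.site.urls)),
    url(r'^index/', views.index),
]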

(Screenshot: admin2.png)
Next, we need to display the SQLite data on the page: configure the template path and import our model class.

In settings.py, configure the template path:

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
In views.py:

# encoding=utf-8
from django.shortcuts import render
from django.http import HttpResponse
from myjiekou.models import MyModel
# Create your views here.

def index(request):
    content = MyModel.objects.all()
    context = {"content": content}
    return render(request, "myjiekou/index.html", context)
index.html (with APP_DIRS enabled, place it at myjiekou/templates/myjiekou/index.html):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<ul>
    {% for item in content %}
        <li>{{ item.name }}</li>
        <li>{{ item.age }}</li>
        <li>{{ item.hobby }}</li>
    {% endfor %}
</ul>
</body>
</html>

Note: a few issues can come up along the way; for example, the middleware below needs to be enabled. Just fix them as they appear:

MIDDLEWARE_CLASSES = [
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
]

14. Localizing the admin interface to Chinese

In settings.py, change LANGUAGE_CODE from the default 'en-us' to Chinese:

LANGUAGE_CODE = 'zh-hans'

15. Now for the Django API itself.

First import the module:

from django.http import JsonResponse
URL configuration in jiekou/urls.py:

from django.conf.urls import include, url
from django.contrib import admin

from myjiekou import views

urlpatterns = [
    url(r'^admin/', include(admin.site.urls)),
    url(r'^index/', views.index),
    url(r'^api/', views.api),
]
The api view:

def api(request):
    data = []
    content = MyModel.objects.all()

    for one in content:
        # Build a fresh dict for every record; reusing a single dict
        # would leave every entry in the list pointing at the same object.
        item = {}
        item["name"] = one.name
        item["age"] = one.age
        item["hobby"] = one.hobby
        data.append(item)

    return JsonResponse({"status": 200, "data": data})

(Screenshot: admin3.png)

Next, I run an Objective-C (OC) program that calls this API to check that the call goes through.
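Before wiring up a client, the endpoint can also be smoke-tested with a few lines of Python (written for Python 2 to match the rest of the post; it assumes the dev server is running on 127.0.0.1:8000, and the file name test_api.py is just an example):

# test_api.py -- quick check of the /api/ endpoint
from __future__ import print_function
import json
import urllib2

response = urllib2.urlopen("http://127.0.0.1:8000/api/")
payload = json.loads(response.read())

print(payload["status"])           # expect 200
for record in payload["data"]:     # one dict per MyModel record
    print(record["name"], record["age"], record["hobby"])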

Part 2: Crawling a Website

First, a quick look at the basic modules a crawler relies on:
1. re: analyze the scraped data with regular-expression matching
2. XPath: locate HTML nodes or elements to filter out the data we need
3. BeautifulSoup4: another HTML/XML parser, used to parse and extract HTML/XML data
4. JSON and JSONPath: parsing JSON data
The example below walks through a real case, mainly using XPath to locate and parse HTML nodes and elements.
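As a warm-up before the full crawler, here is a minimal sketch of how lxml evaluates XPath expressions against an HTML string (the markup below is invented for illustration; it only mimics the class names used on the target site):

# -*- coding:utf-8 -*-
import lxml.etree

html = """
<div class="goods_box">
    <h2 class="floor_name">Snacks</h2>
    <img src="/images/demo.png"/>
</div>
"""

tree = lxml.etree.HTML(html)
print(tree.xpath('//h2[@class="floor_name"]/text()'))    # ['Snacks']
print(tree.xpath('//div[@class="goods_box"]/img/@src'))  # ['/images/demo.png']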

# -*- coding:utf-8 -*-
# Python 2 crawler: scrapes category titles, images and product details from xunmall.com

import os
import urllib2

import lxml.etree


class Xunmall():
    def __init__(self):
        self.url = "http://www.xunmall.com"

    # Fetch a page relative to the site root
    def get_html(self, p1=""):
        # headers = {
        #     "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Mobile Safari/537.36"}
        request = urllib2.Request(self.url + p1)
        response = urllib2.urlopen(request)
        return response.read()

    # Fetch an absolute URL (used for image downloads)
    def get_html1(self, p2):
        request = urllib2.Request(p2)
        response = urllib2.urlopen(request)
        return response.read()

    # Floor (category) titles
    def get_xpath(self):
        xmlcontent = lxml.etree.HTML(self.get_html())
        xmllist = xmlcontent.xpath('//h2[@class="floor_name"]/text()')
        for item in xmllist:
            with open('title.txt', 'a') as f:
                f.write(item.encode('utf-8') + '\n')

    # Floor images
    def get_image(self):
        xmlimage = lxml.etree.HTML(self.get_html())
        imagelist = xmlimage.xpath('//div[@class="color_top"]/img/@src')
        if not os.path.isdir('./imgs'):
            os.mkdir('./imgs')
        for item in imagelist:
            print self.url + item
            with open('imgs/' + (self.url + item)[-8:], 'a+') as f:
                f.write(self.get_html(item))

    # Titles above the images
    def get_theme(self):
        xmltheme = lxml.etree.HTML(self.get_html())

        themelist = xmltheme.xpath('//h3[@class="floor_theme"]/text()')
        for item in themelist:
            with open('theme.txt', 'a') as f:
                f.write(item.encode('utf-8') + '\n')

        sloganlist = xmltheme.xpath('//p[@class="slogan"]/text()')
        for item in sloganlist:
            with open('theme.txt', 'a') as f:
                f.write(item.encode('utf-8') + '\n')

        give_outlist = xmltheme.xpath('//p[@class="give_out"]/text()')
        for item in give_outlist:
            with open('theme.txt', 'a') as f:
                f.write(item.encode('utf-8') + '\n')

    # Snack titles and images
    def foodImageTitle(self):
        foodImage = lxml.etree.HTML(self.get_html())
        foodImageList = foodImage.xpath('//div[@class="pro_image"]/img/@src')
        if not os.path.isdir('./foodimage'):
            os.mkdir('./foodimage')
        for item in foodImageList:
            print item
            with open('foodimage/' + item[-20:], 'a+') as f:
                f.write(self.get_html1(item))

    # Detailed information for each snack (title, images, product number, barcode)
    def detail(self):
        detailLink = lxml.etree.HTML(self.get_html())
        detailLinkList = detailLink.xpath('//div[@class="nth_floor first_floor"]/div[@class="goods_box"]/ul[@class="item_list"]//a/@href')
        for item in detailLinkList:
            detailUrl = lxml.etree.HTML(self.get_html("/" + item[-18:]))

            # Detail images
            detailImageList = detailUrl.xpath('//div[@class="info-panel panel1"]/img/@src')
            for detailitem in detailImageList:
                print 'downloading detail image'
                if not os.path.isdir('./' + item[-18:-5]):
                    os.mkdir('./' + item[-18:-5])
                with open(item[-18:-5] + '/' + detailitem[-9:], 'a+') as f:
                    f.write(self.get_html1(detailitem))

            # Product title
            detailtitleList = detailUrl.xpath('//div[@class="col-lg-7 item-inner"]//h1[@class="fl"]/text()')
            for title in detailtitleList:
                with open('foodtitle.txt', 'a+') as f:
                    f.write(title.encode('utf-8') + '\n')

            # Product number, appended to the 'qrcoder' text file
            goodnumberList = detailUrl.xpath('//div[@class="col-lg-7 item-inner"]//li[@class="col-lg-5 col-md-5"]/text()')
            for number in goodnumberList:
                print number
                with open('qrcoder', 'a+') as f:
                    f.write(number.encode('utf-8') + '\n')

            # Product barcode: data_code attribute
            coderImageList = detailUrl.xpath('//div[@class="clearfixed"]//div[@class="barcode fr"]/img/@data_code')
            for coder in coderImageList:
                print coder
                with open('goodnumber.txt', 'a+') as f:
                    f.write(coder + '\n')


if __name__ == "__main__":
    xunmall = Xunmall()
    # Category titles
    # xunmall.get_xpath()
    # Images
    # xunmall.get_image()
    # Titles above the images
    # xunmall.get_theme()
    # Snack titles and images
    # xunmall.foodImageTitle()
    # Detailed info for each snack
    xunmall.detail()

We'll share some Swift later on. This is just a simple share of what I've been learning, to discuss and study together with the project team.

Reposted from: 掘金
